Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding Paper • 2502.08363 • Published Feb 12, 2025
Dynamically Sacrificing Accuracy for Reduced Computation: Cascaded Inference Based on Softmax Confidence Paper • 1805.10982 • Published May 28, 2018
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26, 2025