Interesting architecture
• FAN: Fourier Analysis Networks (arXiv:2410.02675, 29 upvotes)
• Tensor Product Attention Is All You Need (arXiv:2501.06425, 90 upvotes)
• Scalable-Softmax Is Superior for Attention (arXiv:2501.19399, 24 upvotes)
• EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling (arXiv:2502.09509, 8 upvotes)
• YOLOv12: Attention-Centric Real-Time Object Detectors (arXiv:2502.12524, 12 upvotes)
• SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (arXiv:2502.14786, 158 upvotes)
• Large Language Diffusion Models (arXiv:2502.09992, 126 upvotes)
• ObjectMover: Generative Object Movement with Video Prior (arXiv:2503.08037, 5 upvotes)
• Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (arXiv:2503.09573, 76 upvotes)
• Transformers without Normalization (arXiv:2503.10622, 170 upvotes)
• RWKV-7 "Goose" with Expressive Dynamic State Evolution (arXiv:2503.14456, 153 upvotes)
• Scaling Vision Pre-Training to 4K Resolution (arXiv:2503.19903, 41 upvotes)
• Untitled paper (arXiv:2504.00927, 56 upvotes)
• TransMamba: Flexibly Switching between Transformer and Mamba (arXiv:2503.24067, 21 upvotes)
• Softpick: No Attention Sink, No Massive Activations with Rectified Softmax (arXiv:2504.20966, 31 upvotes)
• MMaDA: Multimodal Large Diffusion Language Models (arXiv:2505.15809, 98 upvotes)
• MiniCPM4: Ultra-Efficient LLMs on End Devices (arXiv:2506.07900, 95 upvotes)
• Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation (arXiv:2506.19852, 42 upvotes)
• Representing Speech Through Autoregressive Prediction of Cochlear Tokens (arXiv:2508.11598, 17 upvotes)
• Untitled paper (arXiv:2508.10104, 298 upvotes)
• 2D Gaussian Splatting with Semantic Alignment for Image Inpainting (arXiv:2509.01964, 7 upvotes)
• Sequential Diffusion Language Models (arXiv:2509.24007, 46 upvotes)
• Untitled paper (arXiv:2510.13998, 59 upvotes)
• AnyUp: Universal Feature Upsampling (arXiv:2510.12764, 12 upvotes)
• Latent Diffusion Model without Variational Autoencoder (arXiv:2510.15301, 49 upvotes)
• Stronger Normalization-Free Transformers (arXiv:2512.10938, 21 upvotes)
• Bolmo: Byteifying the Next Generation of Language Models (arXiv:2512.15586, 17 upvotes)
• ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation (arXiv:2601.03955, 3 upvotes)
• AnyDepth: Depth Estimation Made Easy (arXiv:2601.02760, 10 upvotes)
• ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers (arXiv:2601.05741, 2 upvotes)
• Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings (arXiv:2512.12167, 5 upvotes)
• Implicit Neural Representation Facilitates Unified Universal Vision Encoding (arXiv:2601.14256, 7 upvotes)
• Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding (arXiv:2506.16035, 89 upvotes)
• Scaling Embeddings Outperforms Scaling Experts in Language Models (arXiv:2601.21204, 100 upvotes)
• Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection (arXiv:2602.03216, 12 upvotes)
• dLLM: Simple Diffusion Language Modeling (arXiv:2602.22661, 102 upvotes)