oguzhanercan's Collections: Training Theory
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Paper
• 2502.03738
Better Embeddings with Coupled Adam
Paper
• 2502.08441
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper
• 2502.16894
SALT: Singular Value Adaptation with Low-Rank Transformation
Paper
• 2503.16055
Decoupling Angles and Strength in Low-rank Adaptation
Paper
• 2503.18225
Entropy-Based Adaptive Weighting for Self-Training
Paper
• 2503.23913
Reinforcement Pre-Training
Paper
• 2506.08007
DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion
Paper
• 2506.14202
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
Paper
• 2508.07877
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper
• 2510.04212