Julien Chaumond
AI & ML interests
<3 ML/AI for everyone, building products to propel communities fwd
Recent Mamba Papers
[NB: Notes are from TuringPost]
- EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
  Paper • 2403.09977 • Published • 10
- Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
  Paper • 2403.14520 • Published • 35
- SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
  Paper • 2403.15360 • Published • 13
Research projects on top of vLLM
Papers cited in https://blog.vllm.ai/2024/07/25/lfai-perf.html
- Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
  Paper • 2407.00079 • Published • 5
- Llumnix: Dynamic Scheduling for Large Language Model Serving
  Paper • 2406.03243 • Published
- CacheGen: Fast Context Loading for Language Model Applications
  Paper • 2310.07240 • Published • 1
- vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
  Paper • 2405.04437 • Published • 3
Papers about model merging
Referenced in the mergekit repo: https://github.com/cg123/mergekit
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 8
- Editing Models with Task Arithmetic
  Paper • 2212.04089 • Published • 7
- Resolving Interference When Merging Models
  Paper • 2306.01708 • Published • 17
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  Paper • 2311.03099 • Published • 30
git-theta
Playing with git-theta: https://github.com/r-three/git-theta
Canonical models
This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace