INT4 LLMs for vLLM - a neuralmagic Collection

updated Mar 2

Accurate INT4 quantized models by Neural Magic, ready for use with vLLM!
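Every model ID in this collection ends in a scheme suffix such as w4a16. Read with the usual wXaY convention, that means 4-bit weights and 16-bit activations, so the saving from INT4 is mainly in weight storage. The following is a rough back-of-the-envelope sketch. parse_scheme and weight_gib are hypothetical helpers. The estimate deliberately ignores quantization scales, zero-points and any layers kept at higher precision, so real checkpoint sizes will be somewhat larger.

```python
import re

# Hypothetical helper: matches the "wXaY" suffix at the end of a model ID,
# e.g. "...-quantized.w4a16" -> weight bits 4, activation bits 16.
SCHEME_RE = re.compile(r"\.w(\d+)a(\d+)$")


def parse_scheme(model_id: str) -> tuple:
    """Return (weight_bits, activation_bits) parsed from a model ID."""
    match = SCHEME_RE.search(model_id)
    if match is None:
        raise ValueError(f"no wXaY suffix in {model_id!r}")
    return int(match.group(1)), int(match.group(2))


def weight_gib(num_params: float, weight_bits: int) -> float:
    """Rough weight storage in GiB: params * bits / 8 bytes.

    Assumption: every parameter uses weight_bits; scales, zero-points and
    unquantized layers (e.g. embeddings) are not counted.
    """
    return num_params * weight_bits / 8 / 2**30


model_id = "RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16"
w_bits, a_bits = parse_scheme(model_id)
int4 = weight_gib(8e9, w_bits)  # nominal 8B parameters at 4 bits
bf16 = weight_gib(8e9, 16)      # same parameters at 16 bits for comparison
print(f"{model_id}: W{w_bits}A{a_bits}")
print(f"~{int4:.1f} GiB of INT4 weights vs ~{bf16:.1f} GiB in 16-bit")
```

For a nominal 8B-parameter model this gives roughly a 4x reduction in weight memory. Activations and the KV cache stay in 16-bit, so total serving memory shrinks by less than that.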

RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w4a16

Text Generation • 409B • Updated Oct 10, 2024 • 236 downloads • 12 likes

RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16

Text Generation • 71B • Updated Feb 12, 2025 • 115k downloads • 33 likes

RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16

Text Generation • 8B • Updated May 5 • 63.9k downloads • 30 likes

RedHatAI/Mistral-Nemo-Instruct-2407-quantized.w4a16

Text Generation • 12B • Updated Oct 9, 2024 • 857 downloads • 4 likes

RedHatAI/gemma-2-9b-it-quantized.w4a16

Text Generation • 10B • Updated Oct 9, 2024 • 23 downloads • 2 likes

RedHatAI/gemma-2-2b-it-quantized.w4a16

Text Generation • 3B • Updated Oct 9, 2024 • 77 downloads • 1 like

RedHatAI/Phi-3-medium-128k-instruct-quantized.w4a16

Text Generation • 14B • Updated Oct 9, 2024 • 1.72k downloads • 3 likes

RedHatAI/Phi-3-mini-128k-instruct-quantized.w4a16

Text Generation • 4B • Updated Oct 9, 2024 • 45 downloads • 1 like

RedHatAI/Qwen2-0.5B-Instruct-quantized.w4a16

Text Generation • 0.6B • Updated Jul 18, 2024 • 436 downloads

RedHatAI/Qwen2-1.5B-Instruct-quantized.w4a16

Text Generation • 2B • Updated Jul 18, 2024 • 105 downloads

RedHatAI/Qwen2-72B-Instruct-quantized.w4a16

Text Generation • 73B • Updated Jul 18, 2024 • 28 downloads • 4 likes

RedHatAI/Qwen2-7B-Instruct-quantized.w4a16

Text Generation • 8B • Updated Jul 18, 2024 • 14 downloads

RedHatAI/Meta-Llama-3-8B-Instruct-quantized.w4a16

Text Generation • 8B • Updated Jul 18, 2024 • 1.21k downloads • 2 likes

RedHatAI/Meta-Llama-3-70B-Instruct-quantized.w4a16

Text Generation • 71B • Updated Aug 29, 2024 • 17 downloads • 2 likes

RedHatAI/Llama-2-7b-chat-quantized.w4a16

Text Generation • 7B • Updated Jul 18, 2024 • 10 downloads

RedHatAI/Mistral-7B-Instruct-v0.3-quantized.w4a16

Text Generation • 7B • Updated Mar 13, 2025 • 224 downloads • 2 likes
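Because these checkpoints are meant to run under vLLM, a minimal sketch of assembling a `vllm serve` command for one of them may help. vllm_serve_cmd is a hypothetical helper. --tensor-parallel-size and --max-model-len are existing vLLM options, but the GPU count and context length chosen here are illustrative assumptions, not requirements stated for these models. The sketch also assumes vLLM reads the quantization settings from the checkpoint's own config, so it passes no quantization flag.

```python
import shlex
from typing import List, Optional


def vllm_serve_cmd(model_id: str,
                   tensor_parallel_size: int = 1,
                   max_model_len: Optional[int] = None) -> List[str]:
    """Build (but do not run) a `vllm serve` command line for a model ID."""
    cmd = ["vllm", "serve", model_id]
    if tensor_parallel_size > 1:
        # Shard the model across this many GPUs (assumed value, adjust to hardware).
        cmd += ["--tensor-parallel-size", str(tensor_parallel_size)]
    if max_model_len is not None:
        # Cap the context length to bound KV-cache memory (assumed value).
        cmd += ["--max-model-len", str(max_model_len)]
    return cmd


cmd = vllm_serve_cmd(
    "RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16",
    tensor_parallel_size=2,
    max_model_len=8192,
)
print(shlex.join(cmd))
```

With vLLM installed, subprocess.run(cmd, check=True) would launch the server, which exposes an OpenAI-compatible API. Smaller entries such as the Qwen2-0.5B model can be called with the defaults, which produce just `vllm serve <model>`.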