GPT-5-Distill-Qwen3-Llama3
Model Type: Instruction-tuned Edge LLM (Llama 3.2 Architecture)
Base Model: unsloth/Llama-3.2-3B-Instruct (trained with max_seq_length = 32768)

This model represents a high-efficiency distillation effort, combining the lightweight, edge-ready architecture of Llama-3.2-3B with the high-quality conversational patterns of GPT-5. By filtering the LMSYS dataset for "normal" (flawless) responses, it aims to deliver flagship-level instruction following in a 3B-parameter package.
The model was trained on a curated mix of ~104,000 high-quality samples:
- Jackrong/ShareGPT-Qwen3-235B-A22B-Instuct-2507
- ytz20/LMSYS-Chat-GPT-5-Chat-Response, filtered to flaw == "normal" (removing hallucinations, refusals, and bad formatting)

Training used train_on_responses_only, so the model learns to generate answers rather than reproduce questions.

This model uses the standard Llama 3 / 3.2 prompt template:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{Your Prompt Here}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
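The flaw-based quality filter described in the training notes above can be sketched in plain Python. This is an illustrative sketch, not the actual training script: the record layout (a `flaw` field stored alongside each sample) is an assumption based on the dataset description.

```python
# Illustrative sketch of the quality filter described above.
# The record layout (a "flaw" field per sample) is an assumption
# based on the dataset description, not the actual training code.

def keep_flawless(records):
    """Keep only samples whose responses were judged 'normal'."""
    return [r for r in records if r.get("flaw") == "normal"]

samples = [
    {"prompt": "Hi", "response": "Hello! How can I help?", "flaw": "normal"},
    {"prompt": "Hi", "response": "I cannot assist with that.", "flaw": "refusal"},
    {"prompt": "Hi", "response": "As an AI...", "flaw": "bad_formatting"},
]

clean = keep_flawless(samples)
print(len(clean))  # only the flawless sample survives
```

Filtering on a single judged label keeps the pipeline simple: refusals, hallucinations, and formatting failures are all dropped by one predicate.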
Python Inference Example:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Jackrong/GPT-5-Distill-llama3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum mechanics to a 5-year-old."},
]

# Apply the Llama 3.2 chat template and move the tensors to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
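The train_on_responses_only setting mentioned in the training notes masks prompt tokens out of the loss, so gradients only flow through the assistant's answer. A minimal sketch of that masking, using toy token IDs rather than a real tokenizer:

```python
# Minimal sketch of response-only loss masking (toy token IDs, not a
# real tokenizer). Labeling ignored positions with -100 follows the
# standard PyTorch cross-entropy ignore_index convention.

IGNORE_INDEX = -100

def mask_prompt(token_ids, response_start):
    """Copy token_ids into labels, ignoring everything before the response."""
    labels = list(token_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: positions 0-3 are the prompt, 4-6 are the assistant reply.
tokens = [101, 7592, 2129, 102, 2204, 3185, 102]
labels = mask_prompt(tokens, response_start=4)
print(labels)  # [-100, -100, -100, -100, 2204, 3185, 102]
```

Because the prompt positions contribute no loss, the model is never trained to imitate user questions, only to produce assistant responses.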
| Feature | Description |
|---|---|
| Super Lightweight | 3B Parameters. Runs on almost any modern consumer hardware. |
| GPT-5 Distilled | Learned from 100k+ clean GPT-5 outputs for superior tone. |
| Long Context | Supports up to 32k context, great for long conversations. |
| GGUF Ready | Available in q4_k_m (very fast) and q8_0 quantizations. |
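The GGUF builds in the table above can be run locally with llama.cpp. The commands below are an illustrative sketch: the exact GGUF filename is an assumption and should be checked against the repository's file list.

```shell
# Illustrative llama.cpp usage; the GGUF filename is assumed, check the repo.
# q4_k_m trades a little quality for speed; q8_0 is near-lossless but larger.
llama-cli \
  -m GPT-5-Distill-llama3.2-3B-Instruct.q4_k_m.gguf \
  -p "Explain quantum mechanics to a 5-year-old." \
  -n 256 --temp 0.7
```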
This project is an open research effort to bring "Big Model Intelligence" to "Small Model Footprints."
Base model: meta-llama/Llama-3.2-3B-Instruct