Instructions for using Columbia-NLP/gemma-2b-zephyr-dpo with Python libraries and local serving apps.

Libraries

How to use Columbia-NLP/gemma-2b-zephyr-dpo with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Columbia-NLP/gemma-2b-zephyr-dpo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (Gemma 2B is a text-only causal LM)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Columbia-NLP/gemma-2b-zephyr-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Format the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```


vLLM

How to use Columbia-NLP/gemma-2b-zephyr-dpo with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Columbia-NLP/gemma-2b-zephyr-dpo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Columbia-NLP/gemma-2b-zephyr-dpo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
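You can send the same request from Python with only the standard library. The sketch below assumes the vLLM server above is listening on `localhost:8000`. For the SGLang server described below, pass `base_url="http://localhost:30000"`. The helper names `build_chat_request` and `chat` are hypothetical. The only API assumption is the standard OpenAI chat-completions response shape, which places the reply text at `choices[0].message.content`.

```python
import json
import urllib.request

# Assumption: vLLM's default port, as in the curl example above.
BASE_URL = "http://localhost:8000"
MODEL_ID = "Columbia-NLP/gemma-2b-zephyr-dpo"


def build_chat_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat-completions payload with one user turn (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST the payload to /v1/chat/completions and return the reply text (hypothetical helper)."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the generated text here
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` returns the model's answer as a string.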

Use Docker

```shell
docker model run hf.co/Columbia-NLP/gemma-2b-zephyr-dpo
```

SGLang

How to use Columbia-NLP/gemma-2b-zephyr-dpo with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Columbia-NLP/gemma-2b-zephyr-dpo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Columbia-NLP/gemma-2b-zephyr-dpo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Columbia-NLP/gemma-2b-zephyr-dpo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Columbia-NLP/gemma-2b-zephyr-dpo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use Columbia-NLP/gemma-2b-zephyr-dpo with Docker Model Runner:
```
docker model run hf.co/Columbia-NLP/gemma-2b-zephyr-dpo
```

Model Card for Gemma 2B Zephyr DPO

Starting from our SFT checkpoint Columbia-NLP/gemma-2b-zephyr-sft (itself fine-tuned from google/gemma-2b), we trained the model with DPO on data from argilla/dpo-mix-7k. We tuned the hyperparameters carefully to get the best DPO performance.
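For reference, here is the standard DPO objective from Rafailov et al. (2023). For a prompt x with a preferred completion y_w and a rejected completion y_l, the per-pair loss is −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). Here π_ref is a frozen reference policy, typically the SFT starting point.

The minimal sketch below computes this loss for a single pair from summed completion log-probabilities. It is not the authors' training code. The value β = 0.1 is an illustrative assumption, not the hyperparameter used for this model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss; each argument is a summed log-prob of a full completion.

    beta=0.1 is an illustrative assumption, not this model's setting.
    """
    # Implicit rewards: how much the policy moved away from the reference
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))
```

When the policy equals the reference, the loss is log 2. It falls below log 2 as the policy raises the chosen completion's likelihood relative to the rejected one. Libraries such as Hugging Face TRL (`DPOTrainer`) provide batched implementations of this objective.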

Model description

Model type: A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
Language(s) (NLP): Primarily English
License: Gemma Terms of Use
Finetuned from model: google/gemma-2b

License

This model has the same license as the original Gemma model collection.

Open LLM Leaderboard Performance

| Model | Avg. | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k |
|---|---|---|---|---|---|---|---|
| google/gemma-2b | 46.37 | 48.38 | 71.77 | 41.77 | 33.08 | 66.77 | 16.91 |
| google/gemma-2b-it | 42.75 | 43.94 | 62.70 | 37.65 | 45.82 | 60.93 | 5.46 |
| wandb/gemma-2b-zephyr-sft | 47.18 | 49.74 | 72.38 | 41.37 | 34.42 | 66.93 | 18.27 |
| wandb/gemma-2b-zephyr-dpo | 46.92 | 49.66 | 72.23 | 41.13 | 34.47 | 66.54 | 17.51 |
| Columbia-NLP/gemma-2b-zephyr-sft | 48.75 | 51.80 | 72.63 | 42.20 | 41.96 | 63.85 | 20.09 |
| Columbia-NLP/gemma-2b-zephyr-dpo | 49.14 | 52.22 | 73.11 | 42.55 | 42.64 | 64.40 | 19.94 |
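The Avg. column appears to be the unweighted mean of the six benchmark scores, following the Open LLM Leaderboard (v1) convention; this is an assumption, not stated in the card. The sketch below recomputes it for the DPO row. It also lists the per-benchmark change from the SFT checkpoint, using only the numbers in the table.

```python
# Scores copied from the table above.
dpo = {"ARC": 52.22, "HellaSwag": 73.11, "MMLU": 42.55,
       "TruthfulQA": 42.64, "Winogrande": 64.40, "GSM8k": 19.94}
sft = {"ARC": 51.80, "HellaSwag": 72.63, "MMLU": 42.20,
       "TruthfulQA": 41.96, "Winogrande": 63.85, "GSM8k": 20.09}


def leaderboard_average(scores: dict) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


# Change from SFT to DPO on each benchmark (positive means DPO is higher)
deltas = {name: round(dpo[name] - sft[name], 2) for name in dpo}

print(leaderboard_average(dpo))  # 49.14
print(deltas)
```

By this comparison, DPO improves on five of the six benchmarks and is slightly lower on GSM8k.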

MT-Bench

We evaluate our model with GPT-4-0125-preview as the judge.

| Model | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
|---|---|---|---|---|---|---|---|---|---|
| google/gemma-2b-it | 4.71 | 2.95 | 4.35 | 6.15 | 2.90 | 3.50 | 5.60 | 5.50 | 6.70 |
| wandb/gemma-2b-zephyr-sft | 4.03 | 3.10 | 3.15 | 5.00 | 2.70 | 2.65 | 5.10 | 4.80 | 5.75 |
| wandb/gemma-2b-zephyr-dpo | 4.06 | 2.80 | 2.90 | 5.55 | 2.65 | 2.70 | 5.20 | 4.80 | 5.85 |
| anakin87/gemma-2b-orpo | 4.14 | 3.00 | 3.70 | 6.30 | 2.70 | 2.35 | 5.68 | 4.75 | 4.75 |
| Columbia-NLP/gemma-2b-zephyr-sft | 4.34 | 3.10 | 3.70 | 6.25 | 2.65 | 2.70 | 5.55 | 5.25 | 5.50 |
| Columbia-NLP/gemma-2b-zephyr-dpo | 4.75 | 3.50 | 4.05 | 6.75 | 3.30 | 3.70 | 5.85 | 5.40 | 5.53 |


Weights format: Safetensors
Model size: 3B params
Tensor type: BF16

Model tree for Columbia-NLP/gemma-2b-zephyr-dpo

Base model: google/gemma-2b
Fine-tuned: Columbia-NLP/gemma-2b-zephyr-sft
Fine-tuned from that: Columbia-NLP/gemma-2b-zephyr-dpo (this model)

Dataset used to train Columbia-NLP/gemma-2b-zephyr-dpo: argilla/dpo-mix-7k

Evaluation results (all self-reported)

| Benchmark | Shots | Metric | Split | Score |
|---|---|---|---|---|
| AI2 Reasoning Challenge | 25 | normalized accuracy | test | 52.22 |
| HellaSwag | 10 | normalized accuracy | validation | 73.11 |
| MMLU | 5 | accuracy | test | 42.55 |
| TruthfulQA | 0 | mc2 | validation | 42.64 |
| Winogrande | 5 | accuracy | validation | 64.40 |
| GSM8k | 5 | accuracy | test | 19.94 |