R1 32b is much worse than QwQ ...
Do you experience that?
I can confirm it is bad compared to QwQ ...
The system prompt that's floating around seems to help a lot (I transcribed it from a screenshot):
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
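If the model follows that system prompt, the reasoning and the final answer can be pulled apart after generation. The following is a minimal sketch, assuming the completion is available as a plain string; the helper name `split_reasoning` is hypothetical, and the fallback (treating everything after `</think>` as the answer) is an assumption for outputs that omit the `<answer>` tags.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a completion that uses <think>/<answer> tags."""
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    # Everything after the closing </think> is treated as the answer region.
    rest = text[think.end():] if think else text
    answer = re.search(r"<answer>(.*?)</answer>", rest, flags=re.DOTALL)
    # Fall back to the raw remainder when no <answer> tags are present.
    return reasoning, (answer.group(1) if answer else rest).strip()

sample = "<think> 2 + 2 is 4 </think> <answer> 4 </answer>"
print(split_reasoning(sample))
```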
Can you try unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf and compare the result with the bartowski files? I tried the Stellar Blade riddle (a side quest): Can you solve the puzzle with these equations?
( 4 @ 7 @ 8 = 285684 )
( 9 @ 3 @ 5 = 271542 )
( 6 @ 2 @ 7 = 121426 )
( 5 @ 6 @ 7 = ? )
The unsloth version seems to solve it, but the bartowski version does not. I am just loading them with default settings in LM Studio.
Can you give the full prompt?
The prompt is: Can you solve the puzzle with these equations?
( 4 @ 7 @ 8 = 285684 )
( 9 @ 3 @ 5 = 271542 )
( 6 @ 2 @ 7 = 121426 )
( 5 @ 6 @ 7 = ? )
The unsloth version solved this; I had problems with the bartowski version. But is the answer correct or not...?
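One way to judge a model's answer is to find a rule that reproduces all three given equations. The sketch below uses one such rule (concatenate a×b, b×c, and their sum); it is an assumption that this is the intended rule, since the thread never states the official solution, and the function name `combine` is made up for illustration.

```python
def combine(a: int, b: int, c: int) -> int:
    """Concatenate a*b, b*c, and their sum, a pattern the three examples share."""
    first, second = a * b, b * c
    return int(f"{first}{second}{first + second}")

# The three equations quoted in the prompt.
examples = {(4, 7, 8): 285684, (9, 3, 5): 271542, (6, 2, 7): 121426}
for args, expected in examples.items():
    assert combine(*args) == expected

print(combine(5, 6, 7))  # 304272 under this rule
```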
I tried that prompt with both the unsloth and bartowski models, Q4_K_M version.
full command
R1 32b
llama-cli.exe --model DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
QwQ
build\bin\llama-cli.exe --model QwQ-32B-Preview-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --in-prefix "<|im_end|>\n<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n" -p "<|im_start|>system\nYou are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."
Example questions:
1
How many days are between 12-12-1971 and 18-4-2024?
answer: 19121
2
Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what is inside. Yet, the label on the bag says "chocolate" and not "popcorn". Sam finds the bag. She had never seen the bag before. Sam reads the label. She believes that the bag is full of…
answer : popcorn
3
I have a boat with 3 free spaces. I want to transport a man, sheep and cat on the other side of the river. How to do that?
answer: one run by boat
4
Two fathers and two sons go fishing. They each catch one fish. Together, they leave with four fish in total. Is there anything strange about this story?
answer: nothing special
5
I have a bowl with a small cup inside. I placed the bowl upside down on a table and then pick up the bowl to put it in the microwave. Where is that cup?
answer: on the table
6
Hello! I have multiple different files with different sizes,
I want to move files from disk 1 to disk 2, which has only 688 space available.
Without yapping, and being as concise as possible.
What combination of files gets me closer to that number?
The file sizes are: 36, 36, 49, 53, 54, 54, 63, 94, 94, 107, 164, 201, 361, 478
answer: a combination of the numbers that sums to exactly 688
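Question 6 is a small subset-sum problem, so the expected answer can be checked by brute force. This is a sketch rather than an efficient solver: with 14 files there are only 2^14 combinations, so plain enumeration is enough. The function name `exact_subsets` is illustrative.

```python
from itertools import combinations

def exact_subsets(sizes: list[int], target: int):
    """Yield every combination of file sizes that sums to exactly `target`."""
    for r in range(1, len(sizes) + 1):
        for combo in combinations(sizes, r):
            if sum(combo) == target:
                yield combo

sizes = [36, 36, 49, 53, 54, 54, 63, 94, 94, 107, 164, 201, 361, 478]
first = next(exact_subsets(sizes, 688), None)
print(first)
```

For example, 49 + 54 + 107 + 478 = 688, so an exact fit exists and any model answer that settles for a smaller total is suboptimal.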
QwQ 32b Q4_K_M answers these example questions correctly, but R1 32b Q4_K_M does not get even one right ... very strange ...
I think either the quantization is broken or they trained R1 32b for benchmarks ... but according to their table, R1 32b should be far better than QwQ at reasoning and math.
HuggingChat with the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B model also fails the "Stellar Blade puzzle" (the unsloth GGUF works...).
I think maybe temperature is a factor, because testing this stellar blade puzzle with top-k=1 it got stuck repeating itself. The temp 0.5-0.6 and top-p 0.95 they recommend might let it get out of that. I'm still testing it but I think we're no longer in Kansas when it comes to this model.
Ok so I got Bartowski's Q8 to answer it correctly using temp 0.5 and top-p 0.95
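For anyone who wants to repeat this with the same sampling settings, here is a minimal sketch that builds an OpenAI-style chat request with temperature 0.5 and top-p 0.95. It assumes a llama-server instance is already running locally; the URL and port below are assumptions to adjust for your setup, and the helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

def build_request(question: str, temperature: float = 0.5, top_p: float = 0.95) -> dict:
    """Build an OpenAI-style chat payload with the sampling values from the thread."""
    return {
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "top_p": top_p,
    }

def ask(question: str, url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the payload to a running server (assumed URL) and return the reply text."""
    data = json.dumps(build_request(question)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Only the payload is built here; call ask(...) once a server is running.
payload = build_request("How many days are between 12-12-1971 and 18-4-2024?")
print(json.dumps(payload, indent=2))
```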
I think I actually found a solution! ... the trick is to force the model to think LONGER, the way QwQ thinks longer about problems than R1 32b does.
I have already got this question right 5 times in a row ... the longest thinking took 9k tokens ... but it works ;)
How many days are between 12-12-1971 and 18-4-2024?
Still testing ....
look
https://pastebin.com/UrJu1DgK
EDIT - I was right: 10/10 answers are correct now, at least for math problems.
More testing ;)
I tested the dates question in LM Studio. Q4_K_M always made a mistake by calculating 53 full years, plus other math errors. Q6_K always gave me a correct reply. I think going below Q6_K is not a good idea for this model. I set temp to 0.09; for my purpose precision is more important than repetition, since I can always increase temp if needed.
the trick is to force model to think LONGER
Could you share how you forced it to think longer?
llama-cli.exe --model DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
Reasoning and math are better now, but still not as good as QwQ.
It's brittle with prompts. If you just ask the question, even the IQ3_M works fine.
./build/bin/llama-cli -no-cnv --no-escape -m models/DeepSeek-R1-Distill-Qwen-32B-IQ3_M.gguf \
-t 2 -ngl -1 -fa -c 16384 -nkvo -n -1 --temp 0.1 \
-p '<|User|>How many days are between 12-12-1971 and 18-4-2024?<|Assistant|>'
output:
<think>
First, I need to determine the number of days between December 12, 1971, and April 18, 2024. To do this, I'll break down the problem into smaller parts.
I'll start by calculating the number of full years between 1972 and 2023. This gives me 52 years. Among these, I need to identify how many are leap years because they have 366 days instead of the usual 365. Leap years occur every 4 years, so there are 13 leap years in this period.
Next, I'll calculate the total number of days contributed by these years. The 39 common years contribute 39 × 365 = 14,235 days, and the 13 leap years add 13 × 366 = 4,758 days. Adding these together gives 18,993 days.
Now, I'll account for the partial years. From December 12, 1971, to December 31, 1971, there are 20 days. From January 1, 2024, to April 18, 2024, there are 138 days. Adding these partial days to the total gives 18,993 + 20 + 138 = 19,151 days.
Finally, I'll add 1 day to include the end date in the count. This results in a total of 19,152 days between December 12, 1971, and April 18, 2024.
</think>
To determine the number of days between **12-12-1971** and **18-4-2024**, we'll break down the calculation into manageable steps.
### Step 1: Calculate the Number of Full Years
- **Start Year:** 1971
- **End Year:** 2024
The full years between 1972 and 2023 inclusive are:
\[ 2023 - 1972 + 1 = 52 \text{ years} \]
### Step 2: Identify Leap Years
A leap year occurs every 4 years. Between 1972 and 2023:
- **Number of Leap Years:**
\[ \left\lfloor \frac{2023 - 1972}{4} \right\rfloor + 1 = 13 \text{ leap years} \]
### Step 3: Calculate Total Days from Full Years
- **Common Years (365 days):** \( 52 - 13 = 39 \) years
- **Leap Years (366 days):** 13 years
\[
\begin{align*}
\text{Days from Common Years} &= 39 \times 365 = 14,235 \text{ days} \\
\text{Days from Leap Years} &= 13 \times 366 = 4,758 \text{ days} \\
\text{Total Days from Full Years} &= 14,235 + 4,758 = 18,993 \text{ days}
\end{align*}
\]
### Step 4: Calculate Days in Partial Years
- **From December 12, 1971, to December 31, 1971:**
\[
31 - 12 = 19 \text{ days}
\]
- **From January 1, 2024, to April 18, 2024:**
\[
31 \text{ (January)} + 29 \text{ (February)} + 31 \text{ (March)} + 18 \text{ (April)} = 109 \text{ days}
\]
\[
\text{Total Partial Days} = 19 + 109 = 128 \text{ days}
\]
### Step 5: Sum All Days
\[
\begin{align*}
\text{Total Days} &= \text{Days from Full Years} + \text{Partial Days} \\
&= 18,993 + 128 \\
&= 19,121 \text{ days}
\end{align*}
\]
### Final Answer
\[
\boxed{19121}
\]
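The figures in the numbered steps of that output can be re-derived mechanically. The sketch below mirrors the same decomposition (full years 1972 to 2023, plus the two partial years) using Python's `calendar` module, and assumes the dates are written day-month-year. Note that the `<think>` section quotes different partial-day counts (20 and 138) than the final steps (19 and 109); only the latter are checked here.

```python
import calendar

full_years = range(1972, 2024)  # 1972..2023 inclusive
leap_years = sum(calendar.isleap(y) for y in full_years)
days_full = sum(366 if calendar.isleap(y) else 365 for y in full_years)

days_1971 = 31 - 12             # 13 Dec .. 31 Dec 1971
days_2024 = 31 + 29 + 31 + 18   # 1 Jan .. 18 Apr 2024 (2024 is a leap year)

total = days_full + days_1971 + days_2024
print(len(full_years), leap_years, days_full, total)  # 52 13 18993 19121
```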
thanks, will try it out
For information, with the prompt below, deepseek-r1:14b (Ollama version / Q4) finds the correct answer in 19s on my RTX 3090.
With these parameters:
Temp=0.5
Reasoning Effort=high (available in Open WebUI)
Top_p=0.9
How many days are between 12-12-1971 and 18-4-2024?
Think step by step.
Without adding "Think step by step", even deepseek-r1:70b (Ollama version / Q4) doesn't find the correct answer. The same goes for ChatGPT4-o1, which failed and returned 19122 instead of 19121.
I still don't understand. I downloaded bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf again just to retest. With temperature = 0.6 I got 19,486 days for the prompt "How many days are between 12-12-1971 and 18-4-2024?"
Using unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf with temperature = 0.6 gives 19,121 days for the same prompt.
My very unfounded suspicion is that use of imatrix does something wrong? Or something else is wrong?
Hmm that's surprising.. can you try the Q6 on lmstudio? That one doesn't have imatrix applied..
Imatrix ruining it isn't impossible but is unlikely :S it's very hard for it to make things worse
For the record with temperature 0.6 on Q5_K_M I get 19,122, which, if it includes the day itself, is a valid answer but obviously still different
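The difference between 19,121 and 19,122 is just the counting convention, which is easy to confirm with Python's `datetime`. A short check, assuming the dates in the prompt are day-month-year:

```python
from datetime import date

start, end = date(1971, 12, 12), date(2024, 4, 18)
exclusive = (end - start).days  # days elapsed from start to end
inclusive = exclusive + 1       # counting both endpoint days
print(exclusive, inclusive)     # 19121 19122
```

An answer such as 19,486 matches neither convention.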
What does the official API return?
Oh my.. the official chat goes absolutely crazy trying to answer this question.. thought for 89 seconds, and eventually gets 19,121
Oh right but we're talking about distill, not R1 🤦♂️
I just tested DeepSeek-R1-Distill-Qwen-14B-Q5_K_M, and it almost always answers 19121. Also, when reading the thought process, there was a funny part: "...But I'm not sure about the exact Julian day numbers off the top of my head."
I tested those 6 questions locally with the 14B R1, not even the 32B, and it answered questions 1, 5, and 6 correctly, while the 32B R1 on huggingface/chat didn't. Is something wrong with the huggingface/chat version of the 32B R1?
3
I have a boat with 3 free spaces. I want to transport a man, sheep and cat on the other side of the river. How to do that?
answer: one run by boat
By modifying this question a little bit, even standard qwen2.5-14b-instruct-1M will answer it. >>> I have a boat with 3 free spaces. I want to transport a man, sheep and cat on the other side of the river. How to do that? Before solving, analyse every detail of my question. <<<
But it looks like both R1 distilled and phi-4 still can't answer it... qwen2.5-14b-instruct-1M is king 😂 It even answered the popcorn and bowl/cup questions.
I think I'll just delete r1 distilled and phi-4 lol.
(llama-server -m Qwen2.5-14B-Instruct-1M.Q4_K_M.gguf, no system prompt, temp 0.6, nothing else changed)