qwen3-1.7b-promql
A fine-tuned version of Qwen3-1.7B for generating PromQL queries from natural language descriptions.
Model Details
- Base model: Qwen/Qwen3-1.7B
- Fine-tuning method: QLoRA (4-bit) via Unsloth
- Training data: ~6,400 curated PromQL instruction examples covering Kubernetes, node metrics, application metrics, and alerting patterns
- Training time: ~24 minutes on an A100 GPU
- Formats available: LoRA adapter weights + GGUF (Q4_K_M)
Evaluation
The model was evaluated against the base Qwen3-1.7B on 100 held-out examples, using PromQL parser validation and LLM-as-judge scoring on a 1-5 scale:

| Model | Valid PromQL | Correct | Avg judge score (1-5) |
|---|---|---|---|
| qwen3-1.7b-promql (this model) | 90% | 35% | 3.55 |
| qwen3:1.7b (base) | 6% | 4% | n/a |
Per-category breakdown (this model):

| Category | Valid PromQL | Correct |
|---|---|---|
| General metrics | 90% | 45% |
| Hard / multi-step | 93% | 48% |
| Expert / subqueries | 87% | 12% |
The model performs well on common Kubernetes and infrastructure monitoring queries. Complex nested subqueries (e.g. `min_over_time(rate(...)[6h:5m])`) are the current weak spot: only 12% of the Expert / subqueries examples were correct.
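The figures in the tables above are ratios over per-example outcomes. As a minimal sketch of how such a summary could be tallied, assume each held-out example has already been reduced to a record holding its category, a parser-validity flag, a correctness flag, and a 1-5 judge score. The `EvalRecord` type and its field names are hypothetical; this is not the card's actual evaluation code, and how "correct" was decided is not specified here, so it is treated as a precomputed flag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalRecord:
    # Hypothetical per-example result; field names are illustrative only.
    category: str
    valid: bool    # the generated query parsed as PromQL
    correct: bool  # assumed precomputed correctness flag
    score: float   # LLM-as-judge score on a 1-5 scale


def summarize(records):
    """Return (valid %, correct %, mean judge score) for a list of records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    valid_pct = 100.0 * sum(r.valid for r in records) / n
    correct_pct = 100.0 * sum(r.correct for r in records) / n
    avg_score = sum(r.score for r in records) / n
    return valid_pct, correct_pct, avg_score


def by_category(records):
    """Group records by category and summarize each group separately."""
    groups = defaultdict(list)
    for r in records:
        groups[r.category].append(r)
    return {cat: summarize(rs) for cat, rs in groups.items()}
```

Calling `summarize` on all 100 records would give one row of the first table, and `by_category` would give the per-category rows.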
Usage
Ollama (recommended)
```bash
# Download the GGUF file from this repo, then:
cat > Modelfile << 'EOF'
FROM ./qwen3-1.7b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
You are a PromQL expert. Given a monitoring request and context, return only the PromQL query with no explanation.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.1
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
EOF
ollama create promql -f Modelfile
ollama run promql "Request: Show HTTP error rate over 5 minutes
Context: Metric http_requests_total with labels code, method"
```
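Once `ollama create` has registered the model as `promql`, it can also be called from code through Ollama's local REST API. The following is a minimal sketch using only the Python standard library. It assumes the Ollama server is running at its default address `http://localhost:11434`. Because the Modelfile's TEMPLATE already adds the system message and chat markers, only the Request/Context text is sent. The helper names `build_payload` and `generate_promql` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(request: str, context: Optional[str] = None, model: str = "promql") -> dict:
    """Build a non-streaming /api/generate body in the Request/Context format."""
    prompt = f"Request: {request}"
    if context:
        prompt += f"\nContext: {context}"
    return {"model": model, "prompt": prompt, "stream": False}


def generate_promql(request: str, context: Optional[str] = None, url: str = OLLAMA_URL) -> str:
    """POST the prompt to a running Ollama server and return the generated query."""
    body = json.dumps(build_payload(request, context)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        # With "stream": false, Ollama returns a single JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"].strip()
```

For example, `generate_promql("Show HTTP error rate over 5 minutes", "Metric http_requests_total with labels code, method")` sends the same prompt as the `ollama run` command above.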
Transformers + LoRA adapter
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use float16 on a GPU; fall back to float32 on CPU, where fp16 generation is slow.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype=dtype).to(device)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
# Load the fine-tuned LoRA adapter from this repo on top of the base model.
model = PeftModel.from_pretrained(base, "AsyncBuilds/qwen3-1.7b-promql")
model.eval()

SYSTEM = "You are a PromQL expert. Given a monitoring request and context, return only the PromQL query with no explanation."
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Request: Show CPU usage per node\nContext: Metric node_cpu_seconds_total"},
]

# enable_thinking=False turns off Qwen3's <think> reasoning block.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response.strip())
```
Input Format
The model expects input in this format:

```
Request: <natural language description of what you want to measure>
Context: <relevant metric names and labels>
```

The Context field is optional but improves accuracy: include the metric name(s) you want to query, and their labels, when known.
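A small helper keeps prompts in this format consistent across calls. The sketch below builds the chat messages in the same shape as the Transformers example. It reuses the system prompt from this card. When no context is given, it assumes the `Context:` line should be left out entirely rather than sent empty; the card does not say which of the two the model was trained on. The function names are hypothetical.

```python
from typing import List, Optional

# System prompt copied from the usage examples in this card.
SYSTEM = (
    "You are a PromQL expert. Given a monitoring request and context, "
    "return only the PromQL query with no explanation."
)


def format_user_prompt(request: str, context: Optional[str] = None) -> str:
    """Render the Request/Context input; the Context line is omitted if unknown (assumption)."""
    lines = [f"Request: {request.strip()}"]
    if context:
        lines.append(f"Context: {context.strip()}")
    return "\n".join(lines)


def build_messages(request: str, context: Optional[str] = None) -> List[dict]:
    """Chat messages in the shape passed to tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": format_user_prompt(request, context)},
    ]
```

`build_messages("Show CPU usage per node", "Metric node_cpu_seconds_total")` reproduces the `messages` list from the Transformers example.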
Training Data
Trained on a curated dataset of ~6,400 PromQL instruction examples covering:
- Kubernetes cluster metrics (kube-state-metrics, cAdvisor)
- Node/infrastructure metrics (node_exporter)
- Application metrics (HTTP, gRPC, database)
- Alerting patterns (absent, rate thresholds)
- Hard negatives (common mistakes and their corrections)
The dataset was validated using a combination of PromQL parser validation and LLM-as-judge scoring before training.
Limitations
- Complex nested subqueries with multiple aggregation levels may be inaccurate
- Non-standard or custom metric names must be supplied explicitly in the Context field
- Not a substitute for understanding PromQL: always validate generated queries before using them in production alerting
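One way to validate a generated query is to let a running Prometheus server parse it before it goes into an alerting rule. The following is a minimal sketch. It assumes a reachable Prometheus at the hypothetical address `http://localhost:9090` and sends the query to the standard instant-query endpoint `/api/v1/query`. Syntax errors come back as an HTTP 400 response whose JSON body has `"status": "error"`. A query over metrics that do not exist still succeeds with an empty result, so this catches syntax problems only. It does not tell you whether the query measures what was asked. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical server address


def build_query_url(query: str, base_url: str = PROMETHEUS_URL) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def check_query(query: str, base_url: str = PROMETHEUS_URL) -> tuple:
    """Return (ok, error_message) after asking Prometheus to evaluate the query."""
    try:
        with urllib.request.urlopen(build_query_url(query, base_url), timeout=10) as resp:
            body = json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Parse errors are reported as HTTP 400 with a JSON error body.
        body = json.loads(err.read())
    if body.get("status") == "success":
        return True, ""
    return False, f"{body.get('errorType')}: {body.get('error')}"
```

Connection failures (for example, no server listening) raise `urllib.error.URLError` and are not caught here, so an unreachable server is not mistaken for an invalid query.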
License
Apache 2.0, the same license as the base Qwen3 model.