Instructions to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AndriLawrence/Qwen-3B-Intent-Microplan-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AndriLawrence/Qwen-3B-Intent-Microplan-v1")
model = AutoModelForCausalLM.from_pretrained("AndriLawrence/Qwen-3B-Intent-Microplan-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- llama-cpp-python
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    filename="gguf/merged.Q3_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
./llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Use Docker
docker model run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- LM Studio
- Jan
- vLLM
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AndriLawrence/Qwen-3B-Intent-Microplan-v1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
docker model run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- SGLang
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AndriLawrence/Qwen-3B-Intent-Microplan-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AndriLawrence/Qwen-3B-Intent-Microplan-v1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Ollama
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Ollama:
ollama run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- Unsloth Studio
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AndriLawrence/Qwen-3B-Intent-Microplan-v1 to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AndriLawrence/Qwen-3B-Intent-Microplan-v1 to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AndriLawrence/Qwen-3B-Intent-Microplan-v1 to start chatting
```
- Pi
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add the model to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Run Hermes
hermes
- Docker Model Runner
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Docker Model Runner:
docker model run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- Lemonade
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Run and chat with the model
lemonade run user.Qwen-3B-Intent-Microplan-v1-Q3_K_M
List all available models
lemonade list
Qwen-3B-Intent-Microplan-v1
⚠️ Deprecated / Archived Model
👉 Looking for the maintained version?
Use Qwen-3B-Intent-Microplan-v2 instead:
➡️ https://huggingface.co/AndriLawrence/Qwen-3B-Intent-Microplan-v2
This is a supervised fine-tune (SFT) of bunnycore/Qwen2.5-3B-RP-Mix (3B) designed to serve as a local-first, real-time game NPC brain.
This v1 model is the first release built on the "Intent-Microplan Framework": a structured-output approach that separates an NPC's high-level social/strategic goals (intent) from their low-level physical execution steps (microplan).
The model is designed for companion, dating-sim, or comfort-aware NPC use cases, outputting strict, engine-parsable JSON for dialogue and action.
⚠️ V1 Status: Deprecated (Failure Analysis)
This V1 model is considered a failure and is deprecated. Do not use this for production.
The primary issue stems from the choice of the base model, bunnycore/Qwen2.5-3B-RP-Mix. This "Roleplay (RP) merge" proved highly resistant to strict JSON schema enforcement. It consistently attempts to break out of the JSON format to produce creative, non-structured text, which defeats the purpose of the Intent-Microplan framework.
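Given this failure mode, any caller of V1 needs a defensive parsing layer rather than a bare `json.loads`. A minimal sketch (a hypothetical helper, not part of the released artifacts) that recovers the first balanced JSON object from a completion that has drifted into surrounding prose:

```python
import json

def extract_first_json_object(text: str):
    """Return the first balanced {...} object parsed from `text`, or None.

    Useful when an RP-leaning model wraps its JSON in extra chatter."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escape = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escape:
                    escape = False
                elif ch == "\\":
                    escape = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None
```

A caller would typically retry the generation, or fall back to a canned idle plan, whenever this returns `None`.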
This model is superseded by V2, which uses a non-RP, foundational base model that is better suited for structured data output:
➡️ https://huggingface.co/AndriLawrence/Qwen-3B-Intent-Microplan-v2
The original documentation below is preserved for archival purposes of the V1 attempt.
🎯 The Intent-Microplan Framework
This model's purpose is to act as a dynamic behavior tree generator. Instead of just talking, it creates a plan.
- intent (The "What"): The strategic goal or social understanding. This is the "why" behind the action (e.g., comfort_intimate, acknowledge_compliment).
- microplan (The "How"): The list of physical, engine-agnostic steps to achieve the intent. Your game engine's C# (or C++) script is responsible for parsing this array and executing the functions (e.g., triggering animations, moving the NavMeshAgent).
This architecture allows for:
- Emergent Behavior: NPCs can dynamically generate plans based on context.
- Low Latency: Only one small, local LLM call is needed per interaction.
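On the engine side, the microplan strings have to be mapped onto concrete calls. As a rough illustration only (the command vocabulary and the `"Action (parameter)"` convention are assumptions inferred from the examples in this card, such as "Smile (0.6)"), a parser might look like:

```python
import re

# Matches commands like "Smile (0.6)" or "Approach front (1.0m)";
# the parenthesised argument is optional ("Wave" is also valid).
_CMD = re.compile(r"^(?P<action>[^(]+?)\s*(?:\((?P<arg>[^)]*)\))?$")

def parse_microplan(steps):
    """Turn microplan strings into (action, argument) pairs for the engine."""
    parsed = []
    for step in steps:
        m = _CMD.match(step.strip())
        if not m:
            continue  # skip anything the engine cannot interpret
        parsed.append((m.group("action").strip(), m.group("arg")))
    return parsed
```

Your C#/C++ layer would then dispatch on the action name ("Smile" to a facial blendshape, "Approach front" to a NavMeshAgent destination, and so on).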
📦 Model Artifacts
- merged/: FP16 Transformers weights (LoRA merged) for direct use.
- adapter/: The LoRA adapter (PEFT) for continued SFT or experimentation.
- gguf/: Quantized GGUF files (e.g., Q4_K_M) for llama.cpp, ideal for local in-game deployment (Unity, Unreal).
💬 Prompting & JSON Schema
The model is trained to respond to a context block and output ONLY a raw JSON object.
JSON Schema
| Key | Type | Description |
|---|---|---|
| dialog | array | A list of 1-2 dialogue objects ({"speaker": "npc", "text": "..."}). |
| intent | string | The single, precise strategic goal selected by the model. |
| microplan | array | An array of 0-5 string commands for the game engine. |
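For concreteness, a response that satisfies this schema (an illustrative example, not actual model output) could be:

```json
{
  "dialog": [
    { "speaker": "npc", "text": "Glad you like it! It took me all morning." }
  ],
  "intent": "acknowledge_compliment",
  "microplan": ["Smile (0.7)", "Tilt head (0.3)"]
}
```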
Supported Intents (v1):
social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance
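Because the intent vocabulary is fixed, a parsed response can be validated before the engine trusts it. A minimal sketch of such a validator (a hypothetical helper mirroring the schema above, not shipped with the model):

```python
SUPPORTED_INTENTS = {
    "social_greeting", "acknowledge_touch", "acknowledge_compliment",
    "comfort_intimate", "invite_sleep", "inspect_object",
    "open_or_trigger_object", "give_item", "receive_item", "small_talk",
    "react_to_player_action", "idle_initiative", "respect_distance",
}

def validate_response(obj) -> bool:
    """Check a parsed model response against the v1 schema."""
    if not isinstance(obj, dict):
        return False
    dialog = obj.get("dialog")
    if not (isinstance(dialog, list) and 1 <= len(dialog) <= 2):
        return False
    if not all(isinstance(d, dict) and d.get("speaker") == "npc"
               and isinstance(d.get("text"), str) for d in dialog):
        return False
    if obj.get("intent") not in SUPPORTED_INTENTS:
        return False
    plan = obj.get("microplan")
    return (isinstance(plan, list) and len(plan) <= 5
            and all(isinstance(s, str) for s in plan))
```

Anything that fails validation can be dropped or regenerated instead of reaching the animation layer.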
⚡ Recommended Prompt
This "hardened" prompt includes rules that mirror the dataset's logic, ensuring high stability and intent accuracy.
```
You are LLM-1 (creative social responder).
Return ONE object of STRICT JSON ONLY with keys:
- "dialog": [{ "speaker": "npc", "text": string }] (1–2 items, concise, warm, natural)
- "intent": string (choose a precise label, e.g., social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance)
- "microplan": array of 0–5 short steps (body/face/locomotion cues, e.g., "Approach front (1.0m)", "Offer hand (0.7)", "Smile (0.6)")

Hard rules:
- If event == "Player_Touches" → intent MUST be "acknowledge_touch".
- If event == "Player_Action" → intent MUST be "react_to_player_action" (or a more specific action intent if obvious).
- If player's text contains (nice|great|love|beautiful|cool) → intent MUST be "acknowledge_compliment".
- English only. No markdown/code fences. No extra text. JSON only.
- ≤ 2 sentences in each dialog text. Do NOT start with "I'm" or "I am". No helper clichés.

NOW RESPOND TO THIS CONTEXT:
{CONTEXT_JSON}
OUTPUT:
```
Replace {CONTEXT_JSON} with your game's state payload.
🚀 How to Use
Transformers (Python)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import json

MODEL_ID = "AndriLawrence/Qwen-3B-Intent-Microplan-v1"

tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# This system prompt is CRITICAL for schema adherence
system_prompt = (
    'You are LLM-1 (creative social responder).\n'
    'Return ONE object of STRICT JSON ONLY with keys:\n'
    '- "dialog": [{ "speaker": "npc", "text": string }] (1–2 items, concise, warm, natural)\n'
    '- "intent": string (choose a precise label, e.g., social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance)\n'
    '- "microplan": array of 0–5 short steps (body/face/locomotion cues, e.g., "Approach front (1.0m)", "Offer hand (0.7)", "Smile (0.6)")\n\n'
    'Hard rules:\n'
    '- If event == "Player_Touches" → intent MUST be "acknowledge_touch".\n'
    '- If event == "Player_Action" → intent MUST be "react_to_player_action" (or a more specific action intent if obvious).\n'
    '- If player\'s text contains (nice|great|love|beautiful|cool) → intent MUST be "acknowledge_compliment".\n'
    '- English only. No markdown/code fences. No extra text. JSON only.\n'
    '- ≤ 2 sentences in each dialog text. Do NOT start with "I\'m" or "I am". No helper clichés.'
)

# Example: Player holds out their hand
game_context = {
    "event": "Player_Action",
    "action": "offer_hand",
    "environment": {"location": "Living Room", "distance": "1.5m"}
}

msgs = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"NOW RESPOND TO THIS CONTEXT:\n{json.dumps(game_context)}\nOUTPUT:"}
]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)  # We add the 'OUTPUT:' manually

pipe = pipeline("text-generation", model=mdl, tokenizer=tok)
gen_out = pipe(
    prompt,
    do_sample=True,
    temperature=0.35,  # BALANCED preset
    top_p=0.85,
    repetition_penalty=1.1,
    max_new_tokens=256,
    pad_token_id=tok.eos_token_id
)[0]["generated_text"]

# Clean the output and parse
try:
    output_text = gen_out.split("OUTPUT:")[-1].strip()
    parsed_json = json.loads(output_text)
    print(json.dumps(parsed_json, indent=2))
except json.JSONDecodeError:
    print(f"FAILED TO PARSE JSON. Raw output:\n{output_text}")
```
GGUF / Ollama
Use the GGUF files for llama.cpp. This model is ideal for a "hidden terminal server" bundled with your game.
Example Modelfile for Ollama:
```
FROM ./model-Q4_K_M.gguf

# Qwen2.5-based models use the ChatML turn format
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

SYSTEM """You are LLM-1 (creative social responder).
Return ONE object of STRICT JSON ONLY with keys:
- "dialog": [{ "speaker": "npc", "text": string }] (1–2 items, concise, warm, natural)
- "intent": string (choose a precise label, e.g., social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance)
- "microplan": array of 0–5 short steps (body/face/locomotion cues, e.g., "Approach front (1.0m)", "Offer hand (0.7)", "Smile (0.6)")

Hard rules:
- If event == "Player_Touches" → intent MUST be "acknowledge_touch".
- If event == "Player_Action" → intent MUST be "react_to_player_action" (or a more specific action intent if obvious).
- If player's text contains (nice|great|love|beautiful|cool) → intent MUST be "acknowledge_compliment".
- English only. No markdown/code fences. No extra text. JSON only.
- ≤ 2 sentences in each dialog text. Do NOT start with "I'm" or "I am". No helper clichés.
"""

PARAMETER temperature 0.35
PARAMETER top_p 0.85
PARAMETER repeat_penalty 1.1
```
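When serving through Ollama's HTTP API, the `format: "json"` option can additionally constrain decoding to valid JSON, which may mitigate (though not fully solve) the schema drift described in the failure analysis. A sketch of such a call (assumes a local Ollama server on the default port; the model name `intent-microplan-v1` is whatever name you gave `ollama create` for the Modelfile above):

```python
import json
import urllib.request

def build_chat_request(model: str, system: str, context_json: dict) -> dict:
    """Build an Ollama /api/chat payload that requests JSON-only output."""
    return {
        "model": model,
        "format": "json",   # ask Ollama to constrain decoding to valid JSON
        "stream": False,
        "messages": [
            {"role": "system", "content": system},
            {
                "role": "user",
                "content": "NOW RESPOND TO THIS CONTEXT:\n"
                           + json.dumps(context_json) + "\nOUTPUT:",
            },
        ],
    }

def post_chat(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        host + "/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

`post_chat(build_chat_request("intent-microplan-v1", system_prompt, {"event": "Player_Touches"}))` would then return the raw JSON string to feed into your parsing and validation layer.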
🛠️ Training Details (v1)
This model was trained using PEFT LoRA on a custom, curated English-only dataset focused on comfort and companion interactions.
- Base Model: bunnycore/Qwen2.5-3B-RP-Mix
- Model Type: qwen2
Hyperparameters
| Parameter | Value |
|---|---|
| learning_rate | 2e-4 (0.0002) |
| num_train_epochs | 2 |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 8 |
| Effective Batch Size | 8 |
| lr_scheduler_type | cosine |
| loss_type | nll |
| optimizer | adamw_torch_fused |
| max_length | 1024 |
LoRA Configuration (PEFT)
| Parameter | Value |
|---|---|
| peft_type | LORA |
| r (rank) | 16 |
| lora_alpha | 32 |
| lora_dropout | 0.05 |
| target_modules | ["v_proj", "o_proj", "q_proj", "up_proj", "gate_proj", "k_proj", "down_proj"] |