Instructions to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AndriLawrence/Qwen-3B-Intent-Microplan-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AndriLawrence/Qwen-3B-Intent-Microplan-v1")
model = AutoModelForCausalLM.from_pretrained("AndriLawrence/Qwen-3B-Intent-Microplan-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- llama-cpp-python
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    filename="gguf/merged.Q3_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
./llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Use Docker
docker model run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- LM Studio
- Jan
- vLLM
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AndriLawrence/Qwen-3B-Intent-Microplan-v1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
docker model run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- SGLang
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AndriLawrence/Qwen-3B-Intent-Microplan-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AndriLawrence/Qwen-3B-Intent-Microplan-v1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AndriLawrence/Qwen-3B-Intent-Microplan-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Ollama
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Ollama:
ollama run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- Unsloth Studio
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AndriLawrence/Qwen-3B-Intent-Microplan-v1 to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AndriLawrence/Qwen-3B-Intent-Microplan-v1 to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AndriLawrence/Qwen-3B-Intent-Microplan-v1 to start chatting
```
- Pi
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add the model to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Run Hermes
hermes
- Docker Model Runner
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Docker Model Runner:
docker model run hf.co/AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
- Lemonade
How to use AndriLawrence/Qwen-3B-Intent-Microplan-v1 with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull AndriLawrence/Qwen-3B-Intent-Microplan-v1:Q3_K_M
```
Run and chat with the model
lemonade run user.Qwen-3B-Intent-Microplan-v1-Q3_K_M
List all available models
lemonade list
Qwen-3B-Intent-Microplan-v1
⚠️ Deprecated / Archived Model
👉 Looking for the maintained version?
Use Qwen-3B-Intent-Microplan-v2 instead:
➡️ https://huggingface.co/AndriLawrence/Qwen-3B-Intent-Microplan-v2
This is a supervised fine-tune (SFT) of bunnycore/Qwen2.5-3B-RP-Mix (3B) designed to serve as a local-first, real-time game NPC brain.
This v1 model is the first release built on the "Intent-Microplan Framework": a structured-output approach that separates an NPC's high-level social/strategic goals (intent) from their low-level physical execution steps (microplan).
The model is designed for companion, dating-sim, or comfort-aware NPC use cases, outputting strict, engine-parsable JSON for dialogue and action.
⚠️ V1 Status: Deprecated (Failure Analysis)
This V1 model is considered a failure and is deprecated. Do not use this for production.
The primary issue stems from the choice of the base model, bunnycore/Qwen2.5-3B-RP-Mix. This "Roleplay (RP) merge" proved highly resistant to strict JSON schema enforcement. It consistently attempts to break out of the JSON format to produce creative, non-structured text, which defeats the purpose of the Intent-Microplan framework.
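Given this failure mode, any caller of V1 needs a defensive parsing layer rather than a bare `json.loads`. A minimal sketch (a hypothetical helper, not part of the released artifacts) that recovers the first balanced JSON object from a completion that has drifted into surrounding prose:

```python
import json

def extract_first_json_object(text: str):
    """Return the first balanced {...} object parsed from `text`, or None.

    Useful when an RP-leaning model wraps its JSON in extra chatter."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escape = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escape:
                    escape = False
                elif ch == "\\":
                    escape = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None
```

A caller would typically retry the generation, or fall back to a canned idle plan, whenever this returns `None`.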
This model is superseded by V2, which uses a non-RP, foundational base model that is better suited for structured data output:
➡️ https://huggingface.co/AndriLawrence/Qwen-3B-Intent-Microplan-v2
The original documentation below is preserved for archival purposes of the V1 attempt.
🎯 The Intent-Microplan Framework
This model's purpose is to act as a dynamic behavior tree generator. Instead of just talking, it creates a plan.
- intent (The "What"): The strategic goal or social understanding. This is the "why" behind the action (e.g., comfort_intimate, acknowledge_compliment).
- microplan (The "How"): The list of physical, engine-agnostic steps to achieve the intent. Your game engine's C# (or C++) script is responsible for parsing this array and executing the functions (e.g., triggering animations, moving the NavMeshAgent).
This architecture allows for:
- Emergent Behavior: NPCs can dynamically generate plans based on context.
- Low Latency: Only one small, local LLM call is needed per interaction.
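On the engine side, the microplan strings have to be mapped onto concrete calls. As a rough illustration only (the command vocabulary and the `"Action (parameter)"` convention are assumptions inferred from the examples in this card, such as "Smile (0.6)"), a parser might look like:

```python
import re

# Matches commands like "Smile (0.6)" or "Approach front (1.0m)";
# the parenthesised argument is optional ("Wave" is also valid).
_CMD = re.compile(r"^(?P<action>[^(]+?)\s*(?:\((?P<arg>[^)]*)\))?$")

def parse_microplan(steps):
    """Turn microplan strings into (action, argument) pairs for the engine."""
    parsed = []
    for step in steps:
        m = _CMD.match(step.strip())
        if not m:
            continue  # skip anything the engine cannot interpret
        parsed.append((m.group("action").strip(), m.group("arg")))
    return parsed
```

Your C#/C++ layer would then dispatch on the action name ("Smile" to a facial blendshape, "Approach front" to a NavMeshAgent destination, and so on).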
📦 Model Artifacts
- merged/: FP16 Transformers weights (LoRA merged) for direct use.
- adapter/: The LoRA adapter (PEFT) for continued SFT or experimentation.
- gguf/: Quantized GGUF files (e.g., Q4_K_M) for llama.cpp, ideal for local in-game deployment (Unity, Unreal).
💬 Prompting & JSON Schema
The model is trained to respond to a context block and output ONLY a raw JSON object.
JSON Schema
| Key | Type | Description |
|---|---|---|
| dialog | array | A list of 1-2 dialogue objects ({"speaker": "npc", "text": "..."}). |
| intent | string | The single, precise strategic goal selected by the model. |
| microplan | array | An array of 0-5 string commands for the game engine. |
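For concreteness, a response that satisfies this schema (an illustrative example, not actual model output) could be:

```json
{
  "dialog": [
    { "speaker": "npc", "text": "Glad you like it! It took me all morning." }
  ],
  "intent": "acknowledge_compliment",
  "microplan": ["Smile (0.7)", "Tilt head (0.3)"]
}
```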
Supported Intents (v1):
social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance
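Because the intent vocabulary is fixed, a parsed response can be validated before the engine trusts it. A minimal sketch of such a validator (a hypothetical helper mirroring the schema above, not shipped with the model):

```python
SUPPORTED_INTENTS = {
    "social_greeting", "acknowledge_touch", "acknowledge_compliment",
    "comfort_intimate", "invite_sleep", "inspect_object",
    "open_or_trigger_object", "give_item", "receive_item", "small_talk",
    "react_to_player_action", "idle_initiative", "respect_distance",
}

def validate_response(obj) -> bool:
    """Check a parsed model response against the v1 schema."""
    if not isinstance(obj, dict):
        return False
    dialog = obj.get("dialog")
    if not (isinstance(dialog, list) and 1 <= len(dialog) <= 2):
        return False
    if not all(isinstance(d, dict) and d.get("speaker") == "npc"
               and isinstance(d.get("text"), str) for d in dialog):
        return False
    if obj.get("intent") not in SUPPORTED_INTENTS:
        return False
    plan = obj.get("microplan")
    return (isinstance(plan, list) and len(plan) <= 5
            and all(isinstance(s, str) for s in plan))
```

Anything that fails validation can be dropped or regenerated instead of reaching the animation layer.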
⚡ Recommended Prompt
This "hardened" prompt includes rules that mirror the dataset's logic, ensuring high stability and intent accuracy.
```
You are LLM-1 (creative social responder).
Return ONE object of STRICT JSON ONLY with keys:
- "dialog": [{ "speaker": "npc", "text": string }] (1–2 items, concise, warm, natural)
- "intent": string (choose a precise label, e.g., social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance)
- "microplan": array of 0–5 short steps (body/face/locomotion cues, e.g., "Approach front (1.0m)", "Offer hand (0.7)", "Smile (0.6)")

Hard rules:
- If event == "Player_Touches" → intent MUST be "acknowledge_touch".
- If event == "Player_Action" → intent MUST be "react_to_player_action" (or a more specific action intent if obvious).
- If player's text contains (nice|great|love|beautiful|cool) → intent MUST be "acknowledge_compliment".
- English only. No markdown/code fences. No extra text. JSON only.
- ≤ 2 sentences in each dialog text. Do NOT start with "I'm" or "I am". No helper clichés.

NOW RESPOND TO THIS CONTEXT:
{CONTEXT_JSON}
OUTPUT:
```
Replace {CONTEXT_JSON} with your game's state payload.
🚀 How to Use
Transformers (Python)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import json

MODEL_ID = "AndriLawrence/Qwen-3B-Intent-Microplan-v1"

tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# This system prompt is CRITICAL for schema adherence
system_prompt = (
    'You are LLM-1 (creative social responder).\n'
    'Return ONE object of STRICT JSON ONLY with keys:\n'
    '- "dialog": [{ "speaker": "npc", "text": string }] (1–2 items, concise, warm, natural)\n'
    '- "intent": string (choose a precise label, e.g., social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance)\n'
    '- "microplan": array of 0–5 short steps (body/face/locomotion cues, e.g., "Approach front (1.0m)", "Offer hand (0.7)", "Smile (0.6)")\n\n'
    'Hard rules:\n'
    '- If event == "Player_Touches" → intent MUST be "acknowledge_touch".\n'
    '- If event == "Player_Action" → intent MUST be "react_to_player_action" (or a more specific action intent if obvious).\n'
    '- If player\'s text contains (nice|great|love|beautiful|cool) → intent MUST be "acknowledge_compliment".\n'
    '- English only. No markdown/code fences. No extra text. JSON only.\n'
    '- ≤ 2 sentences in each dialog text. Do NOT start with "I\'m" or "I am". No helper clichés.'
)

# Example: Player holds out their hand
game_context = {
    "event": "Player_Action",
    "action": "offer_hand",
    "environment": {"location": "Living Room", "distance": "1.5m"}
}

msgs = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"NOW RESPOND TO THIS CONTEXT:\n{json.dumps(game_context)}\nOUTPUT:"}
]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)  # We add the 'OUTPUT:' manually

pipe = pipeline("text-generation", model=mdl, tokenizer=tok)
gen_out = pipe(
    prompt,
    do_sample=True,
    temperature=0.35,  # BALANCED preset
    top_p=0.85,
    repetition_penalty=1.1,
    max_new_tokens=256,
    pad_token_id=tok.eos_token_id
)[0]["generated_text"]

# Clean the output and parse
try:
    output_text = gen_out.split("OUTPUT:")[-1].strip()
    parsed_json = json.loads(output_text)
    print(json.dumps(parsed_json, indent=2))
except json.JSONDecodeError:
    print(f"FAILED TO PARSE JSON. Raw output:\n{output_text}")
```
GGUF / Ollama
Use the GGUF files for llama.cpp. This model is ideal for a "hidden terminal server" bundled with your game.
Example Modelfile for Ollama:
```
FROM ./model-Q4_K_M.gguf

# Qwen2.5-based models use the ChatML turn format
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

SYSTEM """You are LLM-1 (creative social responder).
Return ONE object of STRICT JSON ONLY with keys:
- "dialog": [{ "speaker": "npc", "text": string }] (1–2 items, concise, warm, natural)
- "intent": string (choose a precise label, e.g., social_greeting, acknowledge_touch, acknowledge_compliment, comfort_intimate, invite_sleep, inspect_object, open_or_trigger_object, give_item, receive_item, small_talk, react_to_player_action, idle_initiative, respect_distance)
- "microplan": array of 0–5 short steps (body/face/locomotion cues, e.g., "Approach front (1.0m)", "Offer hand (0.7)", "Smile (0.6)")

Hard rules:
- If event == "Player_Touches" → intent MUST be "acknowledge_touch".
- If event == "Player_Action" → intent MUST be "react_to_player_action" (or a more specific action intent if obvious).
- If player's text contains (nice|great|love|beautiful|cool) → intent MUST be "acknowledge_compliment".
- English only. No markdown/code fences. No extra text. JSON only.
- ≤ 2 sentences in each dialog text. Do NOT start with "I'm" or "I am". No helper clichés.
"""

PARAMETER temperature 0.35
PARAMETER top_p 0.85
PARAMETER repeat_penalty 1.1
```
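When serving through Ollama's HTTP API, the `format: "json"` option can additionally constrain decoding to valid JSON, which may mitigate (though not fully solve) the schema drift described in the failure analysis. A sketch of such a call (assumes a local Ollama server on the default port; the model name `intent-microplan-v1` is whatever name you gave `ollama create` for the Modelfile above):

```python
import json
import urllib.request

def build_chat_request(model: str, system: str, context_json: dict) -> dict:
    """Build an Ollama /api/chat payload that requests JSON-only output."""
    return {
        "model": model,
        "format": "json",   # ask Ollama to constrain decoding to valid JSON
        "stream": False,
        "messages": [
            {"role": "system", "content": system},
            {
                "role": "user",
                "content": "NOW RESPOND TO THIS CONTEXT:\n"
                           + json.dumps(context_json) + "\nOUTPUT:",
            },
        ],
    }

def post_chat(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        host + "/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

`post_chat(build_chat_request("intent-microplan-v1", system_prompt, {"event": "Player_Touches"}))` would then return the raw JSON string to feed into your parsing and validation layer.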
🛠️ Training Details (v1)
This model was trained using PEFT LoRA on a custom, curated English-only dataset focused on comfort and companion interactions.
- Base Model: bunnycore/Qwen2.5-3B-RP-Mix
- Model Type: qwen2
Hyperparameters
| Parameter | Value |
|---|---|
| learning_rate | 2e-4 (0.0002) |
| num_train_epochs | 2 |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 8 |
| Effective Batch Size | 8 |
| lr_scheduler_type | cosine |
| loss_type | nll |
| optimizer | adamw_torch_fused |
| max_length | 1024 |
LoRA Configuration (PEFT)
| Parameter | Value |
|---|---|
| peft_type | LORA |
| r (rank) | 16 |
| lora_alpha | 32 |
| lora_dropout | 0.05 |
| target_modules | ["v_proj", "o_proj", "q_proj", "up_proj", "gate_proj", "k_proj", "down_proj"] |