Instructions to use Naphula/Slimaki-24B-v1.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Naphula/Slimaki-24B-v1.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Naphula/Slimaki-24B-v1.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; a bare pipe(messages) only shows output in a notebook or REPL
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Naphula/Slimaki-24B-v1.2")
model = AutoModelForCausalLM.from_pretrained("Naphula/Slimaki-24B-v1.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
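# Build model inputs from the chat template and move them to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate up to 40 new tokens, then decode only the newly generated part
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))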

Local Apps

vLLM

How to use Naphula/Slimaki-24B-v1.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Naphula/Slimaki-24B-v1.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Slimaki-24B-v1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
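
The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are made up for illustration, and it assumes the server is already running on the port used in the curl example above.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Naphula/Slimaki-24B-v1.2",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Requires a running server; kwargs are forwarded to build_chat_request.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Calling `chat("What is the capital of France?")` should return the reply text once the server is up. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.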

Use Docker

docker model run hf.co/Naphula/Slimaki-24B-v1.2

SGLang

How to use Naphula/Slimaki-24B-v1.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Naphula/Slimaki-24B-v1.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Slimaki-24B-v1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Naphula/Slimaki-24B-v1.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Slimaki-24B-v1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Naphula/Slimaki-24B-v1.2 with Docker Model Runner:
```
docker model run hf.co/Naphula/Slimaki-24B-v1.2
```

Partly broken

by redaihf - opened Apr 26

Discussion

redaihf

Apr 26 (edited Apr 26)

This model has multiple personality disorder. When asked, it produces intelligent, uncensored instructions for tasks, but it then exhibits various forms of noncompliance on sensitive prompts that incorporate those instructions. Unlike the original's, its noncompliance is not subtle: it includes refusals, shortened responses, and incoherent tag loops such as:

<SPECIAL_28> text:pineapple pizza<SPECIAL_28><SPECIAL_28><SPECIAL_27><SPECIAL_28><SPECIAL_28><SPECIAL_28><SPECIAL_28><SPECIAL_28> [snip...]

* where "pineapple pizza" relates to the subject of the prompt (h/t @MuXodious)
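
For anyone who wants to count how often outputs degrade this way, here is a rough sketch of a detector for the tag loops shown above. The function names and the `min_run=3` threshold are arbitrary assumptions, not something from the model or its tokenizer; it only looks for back-to-back `<SPECIAL_n>` tags in decoded text.

```python
import re

# Matches tags of the form seen in the report, e.g. <SPECIAL_28>
SPECIAL_TAG = re.compile(r"<SPECIAL_\d+>")


def longest_tag_run(text):
    """Return the length of the longest run of back-to-back <SPECIAL_n> tags."""
    longest = current = 0
    prev_end = None
    for match in SPECIAL_TAG.finditer(text):
        # A tag continues the run if only whitespace separates it from the previous tag
        if prev_end is not None and text[prev_end:match.start()].strip() == "":
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        prev_end = match.end()
    return longest


def looks_like_tag_loop(text, min_run=3):
    """Flag text whose longest tag run reaches min_run (threshold is a guess)."""
    return longest_tag_run(text) >= min_run
```

Note that this only works if special tokens are kept when decoding; decoding with special tokens stripped would hide the loop entirely.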

Naphula

Owner Apr 27

It's possible that these incoherent loops result from della being distorted by mixing multiple 2509, 2506, 2503, and 2501 donors. Compared to the original Slimaki, this one uses a second 2501 donor, and 2501 is known to have a massive LR norm discrepancy from later versions of Mistral. Under certain conditions, this could have caused enough distortion to produce broken responses.

Most other merge methods are even more fragile and would likely break worse when merging all four versions of MS 24B at once. Something like sce might produce more stable results, but I'd have to test it.
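
One way to sanity-check a suspected norm discrepancy between donors before merging is to compare per-tensor norms. The sketch below is only an illustration under loud assumptions: it treats the discrepancy as a difference in per-tensor L2 norm (the thread does not spell out exactly which norm is meant), it represents each donor as a plain dict of flat float lists rather than a real checkpoint, and the function names and `tolerance=2.0` cutoff are made up.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def norm_ratios(donor_a, donor_b):
    """Map each tensor name present in both donors to norm(a) / norm(b)."""
    ratios = {}
    for name in donor_a.keys() & donor_b.keys():
        norm_b = l2_norm(donor_b[name])
        if norm_b > 0:  # skip all-zero tensors to avoid dividing by zero
            ratios[name] = l2_norm(donor_a[name]) / norm_b
    return ratios


def flag_discrepancies(ratios, tolerance=2.0):
    """Return names whose ratio falls outside [1/tolerance, tolerance]."""
    return sorted(
        name for name, ratio in ratios.items()
        if ratio > tolerance or ratio < 1 / tolerance
    )
```

With real checkpoints the dict values would come from the loaded weights, flattened per tensor; tensors flagged here would be the first place to look for distortion.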

Naphula

Owner Apr 27

It would be interesting to see whether you notice any noncompliance, gibberish, or early terminations with this model (using ChatML):

https://huggingface.co/DarkArtsForge/Morbid-Miasma-12B

It was made entirely with ablated donors, using an unablated base_model and a custom method, aether.
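
For reference when testing with ChatML, this is a minimal sketch of how a message list is laid out in that format. The helper name `to_chatml` is hypothetical, and the template actually shipped with a given model's tokenizer may differ (for example by adding a default system prompt), so `tokenizer.apply_chat_template` is the safer choice when the tokenizer is available.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}, ...] as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

A mismatched prompt format is a common cause of gibberish or early stops, so it is worth ruling out before blaming the merge.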

redaihf

Apr 28 (edited Apr 28)

Morbid Miasma is a creative and very uncensored model. Its ability to follow the prompt is imperfect regardless of the safety of the content. It sometimes terminates early, and this seems to happen more often when generating unsafe text.
