Partly broken
This model has multiple personality disorder. It produces intelligent, uncensored instructions for a task when asked directly, but exhibits various forms of noncompliance when a sensitive prompt incorporates the same task. Unlike the original's, its noncompliance is not subtle: it includes refusals, shortened responses, and incoherent tag loops such as:
<SPECIAL_28> text:pineapple pizza<SPECIAL_28><SPECIAL_28><SPECIAL_27><SPECIAL_28><SPECIAL_28><SPECIAL_28><SPECIAL_28><SPECIAL_28> [snip...]
* where "pineapple pizza" relates to the subject of the prompt (h/t @MuXodious)
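A tag loop like the one above can be flagged automatically when testing many prompts. The following is only a minimal sketch: the function names and the `min_run` threshold are hypothetical, and it assumes the broken output always uses tokens of the form `<SPECIAL_n>` as in the sample.

```python
import re

# Matches placeholder tokens such as <SPECIAL_27> or <SPECIAL_28>.
SPECIAL_TAG = re.compile(r"<SPECIAL_\d+>")


def longest_special_run(text: str) -> int:
    """Return the length of the longest run of back-to-back <SPECIAL_n> tags."""
    longest = 0
    current = 0
    prev_end = 0
    for match in SPECIAL_TAG.finditer(text):
        # A tag extends the current run only if it starts exactly
        # where the previous tag ended (no text in between).
        if current > 0 and match.start() == prev_end:
            current += 1
        else:
            current = 1
        prev_end = match.end()
        longest = max(longest, current)
    return longest


def looks_like_tag_loop(text: str, min_run: int = 3) -> bool:
    """Heuristic: treat min_run or more consecutive tags as a loop (threshold is an assumption)."""
    return longest_special_run(text) >= min_run


sample = (
    "<SPECIAL_28> text:pineapple pizza<SPECIAL_28><SPECIAL_28><SPECIAL_27>"
    "<SPECIAL_28><SPECIAL_28><SPECIAL_28><SPECIAL_28><SPECIAL_28>"
)
print(looks_like_tag_loop(sample))
```

A single stray tag does not trip the check, so ordinary responses that happen to contain one placeholder token are not counted as loops.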
These incoherent loops may be a result of della getting distorted by mixing multiple 2509, 2506, 2503, and 2501 donors. Compared to the original Slimaki, this one uses a second 2501 donor, and 2501 is known to have a massive LR norm discrepancy from later versions of Mistral. That may have introduced enough distortion under certain conditions to produce broken responses.
Most other merge methods are even more fragile and would likely break harder when merging all four versions of MS 24B at once. Something like sce might produce more stable results, but I'd have to test it.
It would be interesting to see if you notice any noncompliance, gibberish, or early terminations with this model (using ChatML):
https://huggingface.co/DarkArtsForge/Morbid-Miasma-12B
It was made entirely with ablated donors, using an unablated base_model and a custom method, aether.
Morbid Miasma is a creative and very uncensored model. Its ability to follow the prompt is imperfect regardless of the safety of the content. It sometimes exhibits early terminations and these seem to occur more often when generating unsafe text.
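For anyone testing with ChatML by hand rather than through a frontend, the prompt layout can be sketched as below. This is the standard ChatML layout, not something taken from the model's own tokenizer config; whether the model ships this exact template is an assumption, and `to_chatml` is a hypothetical helper name.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

A response that stops well short of a natural ending, before the model emits `<|im_end|>` on its own terms, is the kind of early termination worth noting.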