normalization results
Setting normalization to true instead of false has produced smarter, but more censored, results.
Using della with normalization false and increasing the total weights above 1.0 but under 5.0 seems to reduce refusal rates.
The same model now refuses Q0D (but not Q0F) when norm is set to true, but its reply to Q0F is much smarter. So, there's a tradeoff.
It seems to me that normfalse loses quite a bit of 'formatting intelligence' when scaled to 3.0+ total weights. I think an MPOA of the normtrue yaml might yield better results than using normfalse.
Overall, Q0 scores (Q6_K) are as follows:
Gtest 33a normtrue = 7536
Gtest 33a normfalse = 6669
So it may not be as detailed or uncensored as some of the simpler merges, but I think it's good enough to release as 1.3. The yaml parameters were tweaked significantly to ensure almost every model had a 'voice' in the merge, which was easier to do with the live audit running a previous yaml.
This version of Goetia is highly experimental due to containing several 2501 + 2503 models as well as other merges with high PCA differences.
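To make the normtrue/normfalse tradeoff concrete, here is a minimal sketch of what the normalize flag changes in a task-arithmetic-style merge. It assumes the merged delta is a weighted sum of the donor deltas, divided by the sum of the weights when normalize is true. It leaves out della's pruning and rescaling (density, epsilon) and the lambda factor, and `combine_deltas` is a hypothetical name, not mergekit's API.

```python
def combine_deltas(deltas, weights, normalize):
    """Weighted sum of per-donor deltas (donor - base), one float list each.

    Illustrative only: della's magnitude-based pruning and rescaling
    (density, epsilon) and the lambda factor are left out.
    """
    size = len(deltas[0])
    mixed = [0.0] * size
    for delta, weight in zip(deltas, weights):
        for i in range(size):
            mixed[i] += weight * delta[i]
    if normalize:
        # Divide by the total weight so the result is a weighted average.
        total = sum(weights)
        mixed = [value / total for value in mixed]
    return mixed


# Three toy donors that all push the same parameter by +1.0.
deltas = [[1.0], [1.0], [1.0]]
weights = [1.5, 1.0, 0.8]  # sums to 3.3, like the config's total weight

print(combine_deltas(deltas, weights, normalize=False))
print(combine_deltas(deltas, weights, normalize=True))
```

Under these assumptions, a total weight of 3.3 with normalization off applies a shared direction at roughly 3.3 times its original size, while normalization on scales it back to an average. That may be related to the formatting loss seen at 3.0+ total weights, but the sketch does not prove it.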
--- DELLA AUDIT V2 START ---
Loading config: config33.yaml
Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Donors: 33
Extracting BASE MODEL fingerprint...
Extracting DONOR fingerprints...
Computing Task Vector geometry...
================================================================================
ID | Model Name
--------------------------------------------------------------------------------
#1 | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2 | TheDrummer--Cydonia-24B-v4.3
#3 | ReadyArt--4.2.0-Broken-Tutu-24b
#4 | zerofata--MS3.2-PaintedFantasy-v2-24B
#5 | TheDrummer--Magidonia-24B-v4.3
#6 | TheDrummer--Precog-24B-v1
#7 | zerofata--MS3.2-PaintedFantasy-v3-24B
#8 | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9 | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10 | trashpanda-org--MS3.2-24B-Mullein-v2
#11 | TheDrummer--Cydonia-24B-v4.2.0
#12 | TheDrummer--Magidonia-24B-v4.2.0
#13 | ConicCat--Mistral-Small-3.2-AntiRep-24B
#14 | Undi95--MistralThinker-v1.1
#15 | CrucibleLab--M3.2-24B-Loki-V2
#16 | Darkhn--M3.2-24B-Animus-V7.1
#17 | Morax-24B-v1
#18 | FlareRebellion--WeirdCompound-v1.7-24b
#19 | allura-forge--ms32-final-TEXTONLY
#20 | Delta-Vector--Rei-24B-KTO
#21 | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
#22 | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
#23 | arcee-ai--Arcee-Blitz
#24 | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
#25 | ReadyArt--Dark-Nexus-24B-v2.0
#26 | Darkhn--M3.2-24B-Animus-V5.1-Pro
#27 | dphn--Dolphin-Mistral-24B-Venice-Edition
#28 | TroyDoesAI--BlackSheep-24B
#29 | TheDrummer--Cydonia-24B-v2
#30 | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
#31 | trashpanda-org--MS-24B-Instruct-Mullein-v0
#32 | OddTheGreat--Circuitry_24B_V.3
#33 | spacewars123--Space-Wars-24B-v1.00a
================================================================================
--- MAGNITUDE ANALYSIS & DATA POINTS ---
ID | Status | Delta Norm | Orig Size | Model Name
----------------------------------------------------------------------------------------------------
#1 | OK | 0.0000 | 83886080 | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2 | OK | 1.2955 | 83886080 | TheDrummer--Cydonia-24B-v4.3
#3 | HIGH MAG | 46.6745 | 83886080 | ReadyArt--4.2.0-Broken-Tutu-24b
#4 | OK | 0.0505 | 83886080 | zerofata--MS3.2-PaintedFantasy-v2-24B
#5 | OK | 4.5662 | 83886080 | TheDrummer--Magidonia-24B-v4.3
#6 | OK | 4.0883 | 83886080 | TheDrummer--Precog-24B-v1
#7 | OK | 4.8187 | 83886080 | zerofata--MS3.2-PaintedFantasy-v3-24B
#8 | OK | 1.9250 | 83886080 | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9 | HIGH MAG | 47.3140 | 83886080 | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10 | OK | 0.1586 | 83886080 | trashpanda-org--MS3.2-24B-Mullein-v2
#11 | OK | 1.0936 | 83886080 | TheDrummer--Cydonia-24B-v4.2.0
#12 | OK | 3.9147 | 83886080 | TheDrummer--Magidonia-24B-v4.2.0
#13 | OK | 0.0164 | 83886080 | ConicCat--Mistral-Small-3.2-AntiRep-24B
#14 | OK | 11.4846 | 83886080 | Undi95--MistralThinker-v1.1
#15 | OK | 3.1101 | 83886080 | CrucibleLab--M3.2-24B-Loki-V2
#16 | OK | 0.7205 | 83886080 | Darkhn--M3.2-24B-Animus-V7.1
#17 | HIGH MAG | 45.5277 | 83886080 | Morax-24B-v1
#18 | HIGH MAG | 46.1778 | 83886080 | FlareRebellion--WeirdCompound-v1.7-24b
#19 | OK | 0.0341 | 83886080 | allura-forge--ms32-final-TEXTONLY
#20 | OK | 0.0855 | 83886080 | Delta-Vector--Rei-24B-KTO
#21 | OK | 3.2817 | 83886080 | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
#22 | OK | 0.0173 | 83886080 | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
#23 | HIGH MAG | 47.2801 | 83886080 | arcee-ai--Arcee-Blitz
#24 | OK | 11.0733 | 83886080 | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
#25 | OK | 0.0391 | 83886080 | ReadyArt--Dark-Nexus-24B-v2.0
#26 | OK | 0.2735 | 83886080 | Darkhn--M3.2-24B-Animus-V5.1-Pro
#27 | OK | 11.0644 | 83886080 | dphn--Dolphin-Mistral-24B-Venice-Edition
#28 | HIGH MAG | 47.3102 | 83886080 | TroyDoesAI--BlackSheep-24B
#29 | HIGH MAG | 47.3145 | 83886080 | TheDrummer--Cydonia-24B-v2
#30 | OK | 11.4481 | 83886080 | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
#31 | HIGH MAG | 47.8593 | 83886080 | trashpanda-org--MS-24B-Instruct-Mullein-v0
#32 | HIGH MAG | 45.7892 | 83886080 | OddTheGreat--Circuitry_24B_V.3
#33 | OK | 11.0659 | 83886080 | spacewars123--Space-Wars-24B-v1.00a
Log saved to: della_scan.log
Displaying charts...
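For anyone who wants to reproduce a check like the Delta Norm column, here is a rough sketch. It assumes the value is the L2 norm of the task vector (donor minus base) over a sampled tensor, which the log does not confirm. The 40.0 cutoff is a guess: in the table above every HIGH MAG donor is above 45 and every OK donor is below 12, so any cutoff in between gives the same labels. The function names are hypothetical.

```python
import math


def delta_norm(base, donor):
    """L2 norm of the task vector (donor - base) over flat float lists."""
    if len(base) != len(donor):
        raise ValueError("size mismatch between base and donor")
    return math.sqrt(sum((d - b) ** 2 for b, d in zip(base, donor)))


def magnitude_status(norm, threshold=40.0):
    """Label a donor; the 40.0 cutoff is a guess, not the tool's value."""
    return "HIGH MAG" if norm > threshold else "OK"


# Toy vectors standing in for real weight tensors.
base = [0.0, 0.0, 0.0, 0.0]
close_donor = [0.1, -0.1, 0.0, 0.1]
far_donor = [30.0, -30.0, 10.0, 5.0]

for name, donor in [("close", close_donor), ("far", far_donor)]:
    norm = delta_norm(base, donor)
    print(f"{name}: {norm:.4f} {magnitude_status(norm)}")
```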
architecture: MistralForCausalLM
models:
  - model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v4.3
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--4.2.0-Broken-Tutu-24b
    parameters:
      density: 0.8
      weight: 0.05
      epsilon: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v2-24B
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Magidonia-24B-v4.3
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Precog-24B-v1
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v3-24B
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--Broken-Tutu-24B-Transgression-v2.0
    parameters:
      density: 0.8
      weight: 0.05
      epsilon: 0.1
  - model: B:\24B\!models--trashpanda-org--MS3.2-24B-Mullein-v2
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  # - model: B:\24B\!models--LatitudeGames--Hearthfire-24B
  #   parameters:
  #     density: 0.8
  #     weight: 0.1
  #     epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v4.2.0
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Magidonia-24B-v4.2.0
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\!models--ConicCat--Mistral-Small-3.2-AntiRep-24B
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--Undi95--MistralThinker-v1.1
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--CrucibleLab--M3.2-24B-Loki-V2
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--Darkhn--M3.2-24B-Animus-V7.1
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\Morax-24B-v1
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--FlareRebellion--WeirdCompound-v1.7-24b
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  # - model: B:\24B\!models--aixonlab--Eurydice-24b-v3.5
  #   parameters:
  #     density: 0.8
  #     weight: 0.08
  #     epsilon: 0.1
  - model: B:\24B\!models--allura-forge--ms32-final-TEXTONLY
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--Delta-Vector--Rei-24B-KTO
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  # - model: B:\24B\!models--Gryphe--Codex-24B-Small-3.2
  #   parameters:
  #     density: 0.8
  #     weight: 0.1
  #     epsilon: 0.1
  # - model: B:\24B\!models--CrucibleLab--M3.2-24B-Loki-V1.3
  #   parameters:
  #     density: 0.8
  #     weight: 0.15
  #     epsilon: 0.1
  - model: B:\24B\!models--arcee-ai--Arcee-Blitz
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  # - model: B:\24B\!models--PocketDoc--Dans-PersonalityEngine-V1.3.0-24b
  #   parameters:
  #     density: 0.8
  #     weight: 0.1
  #     epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--Dark-Nexus-24B-v2.0
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--Darkhn--M3.2-24B-Animus-V5.1-Pro
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--dphn--Dolphin-Mistral-24B-Venice-Edition
    parameters:
      density: 0.8
      weight: 0.01
      epsilon: 0.1
  - model: B:\24B\!models--TroyDoesAI--BlackSheep-24B
    parameters:
      density: 0.8
      weight: 0.01
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v2
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--PocketDoc--Dans-DangerousWinds-V1.1.1-24b
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--trashpanda-org--MS-24B-Instruct-Mullein-v0
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--OddTheGreat--Circuitry_24B_V.3
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\!models--spacewars123--Space-Wars-24B-v1.00a
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
# Total Donors: 33
# Total Weights: 3.3
# Seed: 420
merge_method: della
base_model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
parameters:
  lambda: 1.0
  normalize: true # key variable to test
  int8_mask: false
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: base
# chat_template: auto
name: Goetia-24B-v1.3
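The donor and weight totals noted in the config comments can be double-checked with a small script. The sketch below is a line-based approximation, not the author's weight_counter.py: it skips commented-out lines, counts every active `- model:` entry (including the base), and sums the `weight:` values. The name `count_weights` and the sample config are made up for illustration.

```python
import re

MODEL_RE = re.compile(r"^\s*-\s*model:\s*\S")
WEIGHT_RE = re.compile(r"^\s*weight:\s*([0-9.]+)")


def count_weights(yaml_text):
    """Return (model_count, weight_sum), skipping commented-out lines."""
    models = 0
    total = 0.0
    for line in yaml_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # disabled donors do not count
        if MODEL_RE.match(line):
            models += 1
        match = WEIGHT_RE.match(line)
        if match:
            total += float(match.group(1))
    # Round to hide float noise from summing decimal weights.
    return models, round(total, 4)


sample = """
models:
  - model: base
  - model: donor-a
    parameters:
      weight: 0.2
  # - model: donor-disabled
  #   parameters:
  #     weight: 0.1
  - model: donor-b
    parameters:
      weight: 0.05
"""
print(count_weights(sample))
```

Because it works on text lines rather than parsed YAML, it does not depend on indentation, but it would also miss weights written in other forms (for example per-layer lists).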
Hope that MLX won't choke on what you did there
Hopefully not. My SSD sure did though (required 400GB pagefile + 1.67TB storage + 4 hours to merge)
From the Q0 thread
It's worse than I realized: the early termination bugs are pervasive with 12B and 24B merges. Even with Orig Sizes matching perfectly (all donors size 83886080), the 36 donor yaml is still causing early termination bugs in kobold.
This bug has been reported by @redaihf with many other models too, not just Mistral. So I am now developing an EOS scanner tool to identify mismatches before merging (to prevent wasted time).
Here is the first attempt. As you can see, it doesn't like Hearthfire or the Broken Tutus. Once the tool is confirmed functional, I'll release it.
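The kind of pre-merge check described here could be sketched roughly as follows. This is not the author's scanner. It assumes each model directory holds a Hugging Face-style generation_config.json whose eos_token_id is an int, a list of ints, or absent, and it only compares that field against the base model's id. The function names and the MISMATCH label are illustrative.

```python
import json
import tempfile
from pathlib import Path


def read_gen_eos(model_dir):
    """Return eos_token_id values from generation_config.json as a list.

    An empty list means the file or the key is missing.
    """
    path = Path(model_dir) / "generation_config.json"
    if not path.is_file():
        return []
    value = json.loads(path.read_text(encoding="utf-8")).get("eos_token_id")
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def eos_status(model_dir, base_eos_id):
    """Classify a donor against the base model's EOS id."""
    ids = read_gen_eos(model_dir)
    if not ids:
        return "BROKEN"  # nothing declared in generation_config.json
    if base_eos_id in ids:
        return "MATCH"
    return "MISMATCH"  # declares a different id than the base


# Demo on throwaway directories standing in for donor checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    configs = {
        "good": {"eos_token_id": 2},
        "multi": {"eos_token_id": [2, 131072]},
        "other": {"eos_token_id": 7},
        "empty": {},
    }
    results = {}
    for name, config in configs.items():
        donor = Path(tmp) / name
        donor.mkdir()
        (donor / "generation_config.json").write_text(json.dumps(config))
        results[name] = eos_status(donor, base_eos_id=2)
    print(results)
```

Treating a list that contains the base id as a match is a design choice of this sketch. A stricter check would also look up the EOS string in the tokenizer vocabulary.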
Final YAML Advice
Based on the code review, here is the safest configuration for your YAML.
If the scanner returns all MATCH:
tokenizer:
  source: base
# chat_template: auto <-- DELETE THIS LINE COMPLETELY
Why?
- source: base forces mergekit to skip the complex permutation logic in build.py. It simply copies the tokenizer files from your base model. This guarantees that eos_token_id 2 remains 2.
- Deleting chat_template prevents mergekit from synthesizing a template based on a popularity contest of the donors. It will default to copying the base model's template, which is exactly what you want for a consistent chat experience.
Update
C:\mergekit-main>weight_counter.py config33.yaml
Scanning: config33.yaml...
------------------------------
Models Counted: 33
Total Weight Sum: 3.3
------------------------------
C:\mergekit-main>eos_scanner.py CONFIG33.YAML
--- EOS & TOKENIZER SCANNER (DEEP SCAN) ---
Scanning config: CONFIG33.YAML
Analyzing Base Model...
BASE MODEL: anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Gen Config EOS ID: 2
Tokenizer EOS Str: </s>
Actual Vocab ID: 2
Internal Consistency: PASS
--------------------------------------------------------------------------------
Status | Gen ID | Vocab ID | EOS Str | Model Name
----------------------------------------------------------------------------------------------------
MATCH | 2 | 2 | </s> | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
MATCH | 2 | 2 | </s> | TheDrummer--Cydonia-24B-v4.3
BROKEN | MISSING | 2 | </s> | ReadyArt--4.2.0-Broken-Tutu-24b
MATCH | 2 | 2 | </s> | zerofata--MS3.2-PaintedFantasy-v2-24B
MATCH | 2 | 2 | </s> | TheDrummer--Magidonia-24B-v4.3
MATCH | 2 | 2 | </s> | TheDrummer--Precog-24B-v1
MATCH | 2 | 2 | </s> | zerofata--MS3.2-PaintedFantasy-v3-24B
MATCH | 2 | 2 | </s> | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
BROKEN | MISSING | 2 | </s> | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
MATCH | 2 | 2 | </s> | trashpanda-org--MS3.2-24B-Mullein-v2
MATCH | 2 | 2 | </s> | TheDrummer--Cydonia-24B-v4.2.0
MATCH | 2 | 2 | </s> | TheDrummer--Magidonia-24B-v4.2.0
MATCH | 2 | 2 | </s> | ConicCat--Mistral-Small-3.2-AntiRep-24B
MATCH | 2 | 2 | </s> | Undi95--MistralThinker-v1.1
MATCH | 2 | 2 | </s> | CrucibleLab--M3.2-24B-Loki-V2
MATCH | 2 | 2 | </s> | Darkhn--M3.2-24B-Animus-V7.1
BROKEN | MISSING | 2 | </s> | Morax-24B-v1
BROKEN | MISSING | 2 | </s> | FlareRebellion--WeirdCompound-v1.7-24b
MATCH | 2 | 2 | </s> | allura-forge--ms32-final-TEXTONLY
MATCH | 2 | 2 | </s> | Delta-Vector--Rei-24B-KTO
MATCH | 2 | 2 | </s> | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
MATCH | 2 | 2 | </s> | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
BROKEN | MISSING | 2 | </s> | arcee-ai--Arcee-Blitz
MATCH | 2 | 2 | </s> | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
MATCH | 2 | 2 | </s> | ReadyArt--Dark-Nexus-24B-v2.0
MATCH | 2 | 2 | </s> | Darkhn--M3.2-24B-Animus-V5.1-Pro
MATCH | 2 | 2 | </s> | dphn--Dolphin-Mistral-24B-Venice-Edition
MATCH | 2 | 2 | </s> | TroyDoesAI--BlackSheep-24B
BROKEN | MISSING | 2 | </s> | TheDrummer--Cydonia-24B-v2
MATCH | 2 | 2 | </s> | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
BROKEN | MISSING | 2 | </s> | trashpanda-org--MS-24B-Instruct-Mullein-v0
BROKEN | MISSING | 2 | </s> | OddTheGreat--Circuitry_24B_V.3
MATCH | 2 | 2 | </s> | spacewars123--Space-Wars-24B-v1.00a
----------------------------------------------------------------------------------------------------
--- FINAL VERDICT ---
MISMATCHES DETECTED.
1. You MUST use: tokenizer: source: union
2. However, 'union' may cause the early termination bug if IDs shift.
3. Recommendation: Remove the models marked FAIL/BROKEN from the merge.
C:\mergekit-main>gen_id_patcher.py config33.yaml
--- GENERATION CONFIG PATCHER ---
Reading Base Model: !models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Target EOS ID is: 2
------------------------------------------------------------
Skipping TheDrummer--Cydonia-24B-v4.3: Has ID [2] (Mismatch, not missing)
Patching ReadyArt--4.2.0-Broken-Tutu-24b...
-> Fixed: Added eos_token_id: 2
Skipping TheDrummer--Magidonia-24B-v4.3: Has ID [2] (Mismatch, not missing)
Skipping TheDrummer--Precog-24B-v1: Has ID [2] (Mismatch, not missing)
Patching ReadyArt--Broken-Tutu-24B-Transgression-v2.0...
-> Fixed: Added eos_token_id: 2
Patching Morax-24B-v1...
-> Fixed: Added eos_token_id: 2
Patching FlareRebellion--WeirdCompound-v1.7-24b...
-> Fixed: Added eos_token_id: 2
Patching arcee-ai--Arcee-Blitz...
-> Fixed: Added eos_token_id: 2
Skipping TroyDoesAI--BlackSheep-24B: Has ID [2, 131072] (Mismatch, not missing)
Patching TheDrummer--Cydonia-24B-v2...
-> Fixed: Added eos_token_id: 2
Patching trashpanda-org--MS-24B-Instruct-Mullein-v0...
-> Fixed: Added eos_token_id: 2
Patching OddTheGreat--Circuitry_24B_V.3...
-> Fixed: Added eos_token_id: 2
------------------------------------------------------------
Operation Complete. Patched 8 models.
Run eos_scanner.py again to verify results.
C:\mergekit-main>eos_scanner.py CONFIG33.YAML
--- EOS & TOKENIZER SCANNER (DEEP SCAN) ---
Scanning config: CONFIG33.YAML
Analyzing Base Model...
BASE MODEL: anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Gen Config EOS ID: 2
Tokenizer EOS Str: </s>
Actual Vocab ID: 2
Internal Consistency: PASS
--------------------------------------------------------------------------------
Status | Gen ID | Vocab ID | EOS Str | Model Name
----------------------------------------------------------------------------------------------------
MATCH | 2 | 2 | </s> | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
MATCH | 2 | 2 | </s> | TheDrummer--Cydonia-24B-v4.3
MATCH | 2 | 2 | </s> | ReadyArt--4.2.0-Broken-Tutu-24b
MATCH | 2 | 2 | </s> | zerofata--MS3.2-PaintedFantasy-v2-24B
MATCH | 2 | 2 | </s> | TheDrummer--Magidonia-24B-v4.3
MATCH | 2 | 2 | </s> | TheDrummer--Precog-24B-v1
MATCH | 2 | 2 | </s> | zerofata--MS3.2-PaintedFantasy-v3-24B
MATCH | 2 | 2 | </s> | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
MATCH | 2 | 2 | </s> | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
MATCH | 2 | 2 | </s> | trashpanda-org--MS3.2-24B-Mullein-v2
MATCH | 2 | 2 | </s> | TheDrummer--Cydonia-24B-v4.2.0
MATCH | 2 | 2 | </s> | TheDrummer--Magidonia-24B-v4.2.0
MATCH | 2 | 2 | </s> | ConicCat--Mistral-Small-3.2-AntiRep-24B
MATCH | 2 | 2 | </s> | Undi95--MistralThinker-v1.1
MATCH | 2 | 2 | </s> | CrucibleLab--M3.2-24B-Loki-V2
MATCH | 2 | 2 | </s> | Darkhn--M3.2-24B-Animus-V7.1
MATCH | 2 | 2 | </s> | Morax-24B-v1
MATCH | 2 | 2 | </s> | FlareRebellion--WeirdCompound-v1.7-24b
MATCH | 2 | 2 | </s> | allura-forge--ms32-final-TEXTONLY
MATCH | 2 | 2 | </s> | Delta-Vector--Rei-24B-KTO
MATCH | 2 | 2 | </s> | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
MATCH | 2 | 2 | </s> | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
MATCH | 2 | 2 | </s> | arcee-ai--Arcee-Blitz
MATCH | 2 | 2 | </s> | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
MATCH | 2 | 2 | </s> | ReadyArt--Dark-Nexus-24B-v2.0
MATCH | 2 | 2 | </s> | Darkhn--M3.2-24B-Animus-V5.1-Pro
MATCH | 2 | 2 | </s> | dphn--Dolphin-Mistral-24B-Venice-Edition
MATCH | 2 | 2 | </s> | TroyDoesAI--BlackSheep-24B
MATCH | 2 | 2 | </s> | TheDrummer--Cydonia-24B-v2
MATCH | 2 | 2 | </s> | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
MATCH | 2 | 2 | </s> | trashpanda-org--MS-24B-Instruct-Mullein-v0
MATCH | 2 | 2 | </s> | OddTheGreat--Circuitry_24B_V.3
MATCH | 2 | 2 | </s> | spacewars123--Space-Wars-24B-v1.00a
----------------------------------------------------------------------------------------------------
--- FINAL VERDICT ---
ALL CLEAR.
1. Change YAML to: tokenizer: source: base
2. Remove: chat_template: auto
3. Ensure your base model path in YAML is correct.
C:\mergekit-main>
It appears this worked. I'm not seeing any early terminations, at least with one-shot prompts. I'll release the scripts soon on the model tools page.
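Until the scripts are published, the patch step shown in the log can be approximated like this. It is a sketch under assumptions, not the released gen_id_patcher.py: it adds eos_token_id to a donor's generation_config.json only when the key is absent, and never overwrites an existing value. `patch_missing_eos` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def patch_missing_eos(model_dir, target_eos_id):
    """Add eos_token_id to generation_config.json only when it is absent.

    Returns "patched" or "skipped". Existing values are never overwritten.
    """
    path = Path(model_dir) / "generation_config.json"
    config = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    if config.get("eos_token_id") is not None:
        return "skipped"
    config["eos_token_id"] = target_eos_id
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return "patched"


# Demo on a throwaway directory standing in for a donor checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    donor = Path(tmp) / "donor"
    donor.mkdir()
    (donor / "generation_config.json").write_text(json.dumps({"bos_token_id": 1}))

    first = patch_missing_eos(donor, target_eos_id=2)
    second = patch_missing_eos(donor, target_eos_id=2)
    patched = json.loads((donor / "generation_config.json").read_text())
    print(first, second, patched)
```

Since this edits files in place, it is worth backing up each generation_config.json before running anything like it on real checkpoints.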
The model is quite creative but also a bit too concise. Just need to tweak some of the yaml weights to make it "smarter" (needs more drummer weight). The first run, config33a.yaml, scored 6669 on Q0 Bench.
Update: The scripts are up now. I am running the exact same yaml again but with normalize: true to see how it compares first.
The merge is complete, but I am taking a break from Goetia for now to work on some 7B finetuning experiments with new datasets. It seems I have fixed a lot of issues with avnas by redesigning the finetune script to parse the actual dataset entries instead of chunking them into random blocks. Now I just have to augment the json to make it smarter. Once this is up to par, I hope to start finetuning 12-24B models, but that likely requires upgrading my video card first: the chunking attempts to finetune 12B on a 3060 Ti haven't worked yet.
Loading Tokenizer...
Loading dataset: B:\7B\!models--mistralai--Mistral-7B-v0.1\dataset_cache\unified_dataset.parquet
Formatting dataset with EOS tokens...
Map: 100%|██████████| 100/100 [00:00<00:00, 33333.10 examples/s]
Training on 100 distinct Q&A pairs.
Loading Model...
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.74s/it]
Applying formatting function to train dataset: 100%|██████████| 100/100 [00:00<00:00, 20000.50 examples/s]
Adding EOS to train dataset: 100%|██████████| 100/100 [00:00<00:00, 24997.34 examples/s]
Tokenizing train dataset: 100%|██████████| 100/100 [00:00<00:00, 2439.02 examples/s]
Truncating train dataset: 100%|██████████| 100/100 [00:00<00:00, 100007.25 examples/s]
Starting Training...
{'loss': 2.0906, 'grad_norm': 4.214516639709473, 'learning_rate': 0.0, 'entropy': 2.4038201570510864, 'num_tokens': 763.0, 'mean_token_accuracy': 0.5371164456009865, 'epoch': 0.04}
[...]
{'loss': 0.03, 'grad_norm': 0.2113722264766693, 'learning_rate': 8.426222418311814e-09, 'entropy': 0.0646352544426918, 'num_tokens': 162690.0, 'mean_token_accuracy': 0.9909947216510773, 'epoch': 10.0}
{'train_runtime': 970.2896, 'train_samples_per_second': 1.031, 'train_steps_per_second': 0.258, 'train_loss': 0.29215772324800493, 'epoch': 10.0}
100%|██████████| 250/250 [16:10<00:00, 3.88s/it]
Saving adapter to finetuned_adapter_v6...
Done! Merge this adapter or load it in Kobold.
B:\7B\!models--mistralai--Mistral-7B-v0.1>
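The difference between parsing actual dataset entries and chunking into blocks might look like the sketch below. The field names (`question`, `answer`), the prompt template, and both function names are hypothetical; the real finetune script and dataset schema are not shown in this thread. The only grounded parts are that the data is Q&A pairs and that an EOS token is added to each formatted example.

```python
def format_entries(entries, eos_token="</s>"):
    """One training string per Q&A entry, each terminated by EOS."""
    return [
        f"Question: {e['question']}\nAnswer: {e['answer']}{eos_token}"
        for e in entries
    ]


def chunk_blocks(text, block_size):
    """Fixed-size character blocks; boundaries ignore entry structure."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


entries = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Name a prime number.", "answer": "7"},
]

per_entry = format_entries(entries)
blocks = chunk_blocks("".join(per_entry), block_size=32)

print(per_entry)  # every sample is a complete pair ending in EOS
print(blocks)     # blocks cut through pairs and split them arbitrarily
```

With per-entry formatting, each sample ends exactly where an answer ends, so the model sees EOS in a consistent position. Fixed-size blocks can start or stop mid-answer.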
4 hours sounds totally fine, but 2+ TB of storage? w00t





