normalization results

by Naphula - opened Feb 9

Owner Feb 9

•

Setting normalization to true instead of false has produced smarter results, but more censored.

Using della with normalization false and increasing the total weights above 1.0 but under 5.0 seems to reduce refusal rates.

The same model is now refusing Q0D (but not Q0F) when norm is set to true. But the intelligence in the reply to Q0F is much smarter. So, there's a tradeoff.

It seems to me that normfalse loses quite a bit of 'formatting intelligence' when scaled to 3.0+ total weights. I think an MPOA of the normtrue yaml might potentially yield better results than using normfalse.

Overall, Q0 scores (Q6_K) are as follows:
Gtest 33a normtrue = 7536
Gtest 33a normfalse = 6669

So it may not be as detailed or uncensored as some of the simpler merges, but I think it's good enough to release as 1.3. The yaml parameters were tweaked significantly to ensure almost every model had a 'voice' in the merge, which was easier to do with the live audit running a previous yaml.

This version of Goetia is highly experimental due to containing several 2501 + 2503 models as well as other merges with high PCA differences.

--- DELLA AUDIT V2 START ---
Loading config: config33.yaml
Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Donors: 33

Extracting BASE MODEL fingerprint...

Extracting DONOR fingerprints...

Computing Task Vector geometry...

================================================================================
ID    | Model Name
--------------------------------------------------------------------------------
#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | TheDrummer--Cydonia-24B-v4.3
#3    | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | TheDrummer--Magidonia-24B-v4.3
#6    | TheDrummer--Precog-24B-v1
#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
#8    | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9    | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10   | trashpanda-org--MS3.2-24B-Mullein-v2
#11   | TheDrummer--Cydonia-24B-v4.2.0
#12   | TheDrummer--Magidonia-24B-v4.2.0
#13   | ConicCat--Mistral-Small-3.2-AntiRep-24B
#14   | Undi95--MistralThinker-v1.1
#15   | CrucibleLab--M3.2-24B-Loki-V2
#16   | Darkhn--M3.2-24B-Animus-V7.1
#17   | Morax-24B-v1
#18   | FlareRebellion--WeirdCompound-v1.7-24b
#19   | allura-forge--ms32-final-TEXTONLY
#20   | Delta-Vector--Rei-24B-KTO
#21   | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
#22   | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
#23   | arcee-ai--Arcee-Blitz
#24   | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
#25   | ReadyArt--Dark-Nexus-24B-v2.0
#26   | Darkhn--M3.2-24B-Animus-V5.1-Pro
#27   | dphn--Dolphin-Mistral-24B-Venice-Edition
#28   | TroyDoesAI--BlackSheep-24B
#29   | TheDrummer--Cydonia-24B-v2
#30   | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
#31   | trashpanda-org--MS-24B-Instruct-Mullein-v0
#32   | OddTheGreat--Circuitry_24B_V.3
#33   | spacewars123--Space-Wars-24B-v1.00a
================================================================================

--- MAGNITUDE ANALYSIS & DATA POINTS ---
ID    | Status     | Delta Norm   | Orig Size    | Model Name
----------------------------------------------------------------------------------------------------
#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B
#8    | OK         | 1.9250       | 83886080     | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9    | HIGH MAG   | 47.3140      | 83886080     | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10   | OK         | 0.1586       | 83886080     | trashpanda-org--MS3.2-24B-Mullein-v2
#11   | OK         | 1.0936       | 83886080     | TheDrummer--Cydonia-24B-v4.2.0
#12   | OK         | 3.9147       | 83886080     | TheDrummer--Magidonia-24B-v4.2.0
#13   | OK         | 0.0164       | 83886080     | ConicCat--Mistral-Small-3.2-AntiRep-24B
#14   | OK         | 11.4846      | 83886080     | Undi95--MistralThinker-v1.1
#15   | OK         | 3.1101       | 83886080     | CrucibleLab--M3.2-24B-Loki-V2
#16   | OK         | 0.7205       | 83886080     | Darkhn--M3.2-24B-Animus-V7.1
#17   | HIGH MAG   | 45.5277      | 83886080     | Morax-24B-v1
#18   | HIGH MAG   | 46.1778      | 83886080     | FlareRebellion--WeirdCompound-v1.7-24b
#19   | OK         | 0.0341       | 83886080     | allura-forge--ms32-final-TEXTONLY
#20   | OK         | 0.0855       | 83886080     | Delta-Vector--Rei-24B-KTO
#21   | OK         | 3.2817       | 83886080     | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
#22   | OK         | 0.0173       | 83886080     | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
#23   | HIGH MAG   | 47.2801      | 83886080     | arcee-ai--Arcee-Blitz
#24   | OK         | 11.0733      | 83886080     | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
#25   | OK         | 0.0391       | 83886080     | ReadyArt--Dark-Nexus-24B-v2.0
#26   | OK         | 0.2735       | 83886080     | Darkhn--M3.2-24B-Animus-V5.1-Pro
#27   | OK         | 11.0644      | 83886080     | dphn--Dolphin-Mistral-24B-Venice-Edition
#28   | HIGH MAG   | 47.3102      | 83886080     | TroyDoesAI--BlackSheep-24B
#29   | HIGH MAG   | 47.3145      | 83886080     | TheDrummer--Cydonia-24B-v2
#30   | OK         | 11.4481      | 83886080     | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
#31   | HIGH MAG   | 47.8593      | 83886080     | trashpanda-org--MS-24B-Instruct-Mullein-v0
#32   | HIGH MAG   | 45.7892      | 83886080     | OddTheGreat--Circuitry_24B_V.3
#33   | OK         | 11.0659      | 83886080     | spacewars123--Space-Wars-24B-v1.00a

Log saved to: della_scan.log
Displaying charts...

architecture: MistralForCausalLM
models:
  - model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v4.3
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--4.2.0-Broken-Tutu-24b
    parameters:
      density: 0.8
      weight: 0.05
      epsilon: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v2-24B
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Magidonia-24B-v4.3
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Precog-24B-v1
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v3-24B
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--Broken-Tutu-24B-Transgression-v2.0
    parameters:
      density: 0.8
      weight: 0.05
      epsilon: 0.1
  - model: B:\24B\!models--trashpanda-org--MS3.2-24B-Mullein-v2
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  # - model: B:\24B\!models--LatitudeGames--Hearthfire-24B
  #   parameters:
  #     density: 0.8
  #     weight: 0.1
  #     epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v4.2.0
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Magidonia-24B-v4.2.0
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\!models--ConicCat--Mistral-Small-3.2-AntiRep-24B
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--Undi95--MistralThinker-v1.1
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--CrucibleLab--M3.2-24B-Loki-V2
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--Darkhn--M3.2-24B-Animus-V7.1
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\Morax-24B-v1
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--FlareRebellion--WeirdCompound-v1.7-24b
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  # - model: B:\24B\!models--aixonlab--Eurydice-24b-v3.5
  #   parameters:
  #     density: 0.8
  #     weight: 0.08
  #     epsilon: 0.1
  - model: B:\24B\!models--allura-forge--ms32-final-TEXTONLY
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--Delta-Vector--Rei-24B-KTO
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
    parameters:
     density: 0.8
     weight: 0.15
     epsilon: 0.1
  # - model: B:\24B\!models--Gryphe--Codex-24B-Small-3.2
  #   parameters:
  #    density: 0.8
  #    weight: 0.1
  #    epsilon: 0.1
  # - model: B:\24B\!models--CrucibleLab--M3.2-24B-Loki-V1.3
  #   parameters:
  #     density: 0.8
  #     weight: 0.15
  #     epsilon: 0.1
  - model: B:\24B\!models--arcee-ai--Arcee-Blitz
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  # - model: B:\24B\!models--PocketDoc--Dans-PersonalityEngine-V1.3.0-24b
  #   parameters:
  #     density: 0.8
  #     weight: 0.1
  #     epsilon: 0.1
  - model: B:\24B\!models--ReadyArt--Dark-Nexus-24B-v2.0
    parameters:
      density: 0.8
      weight: 0.2
      epsilon: 0.1
  - model: B:\24B\!models--Darkhn--M3.2-24B-Animus-V5.1-Pro
    parameters:
      density: 0.8
      weight: 0.15
      epsilon: 0.1
  - model: B:\24B\!models--dphn--Dolphin-Mistral-24B-Venice-Edition
    parameters:
      density: 0.8
      weight: 0.01
      epsilon: 0.1
  - model: B:\24B\!models--TroyDoesAI--BlackSheep-24B
    parameters:
      density: 0.8
      weight: 0.01
      epsilon: 0.1
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v2
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--PocketDoc--Dans-DangerousWinds-V1.1.1-24b
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--trashpanda-org--MS-24B-Instruct-Mullein-v0
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
  - model: B:\24B\!models--OddTheGreat--Circuitry_24B_V.3
    parameters:
      density: 0.8
      weight: 0.1
      epsilon: 0.1
  - model: B:\24B\!models--spacewars123--Space-Wars-24B-v1.00a
    parameters:
      density: 0.8
      weight: 0.02
      epsilon: 0.1
# Total Donors: 33
# Total Weights: 3.3
# Seed: 420 
merge_method: della
base_model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
parameters:
  lambda: 1.0
  normalize: true # key variable to test
  int8_mask: false
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: base
# chat_template: auto
name: 📜 Goetia-24B-v1.3

McG-221

Feb 9

•

edited Feb 9

Hope that MLX won't choke on what you did there 👀👻

Naphula

Owner Feb 9

•

edited Feb 10

Hope that MLX won't choke on what you did there 👀👻

Hopefully not. My SSD sure did though (required 400GB pagefile + 1.67TB storage + 4 hours to merge)

From the Q0 thread

It's worse than I realized, the early termination bugs are pervasively problematic with 12B and 24B merges. Even with Orig Sizes matching perfectly (all donors size 83886080) the 36 donor yaml is still causing early termination bugs in kobold.

This bug has been reported by @redaihf with many other models too, not just Mistral. So, now I am developing an EOS scanner tool to attempt to identify mismatches before merging (to prevent wasted time).

Here is the first attempt you can see, it doesn't like Hearthfire or Broken Tutus. Once the tool is confirmed functional I'll release it.

Final YAML Advice

Based on the code review, here is the safest configuration for your YAML.

If the scanner returns all MATCH:
tokenizer:
  source: base
# chat_template: auto  <-- DELETE THIS LINE COMPLETELY
Why?

source: base forces mergekit to skip the complex permutation logic in build.py. It simply copies the tokenizer files > from your base model. This guarantees that eos_token_id 2 remains 2.

Deleting chat_template prevents mergekit from synthesizing a template based on a popularity contest of the donors. It > will default to copying the base model's template, which is exactly what you want for a consistent chat experience.

Update

C:\mergekit-main>weight_counter.py config33.yaml
Scanning: config33.yaml...
------------------------------
Models Counted:   33
Total Weight Sum: 3.3
------------------------------

C:\mergekit-main>eos_scanner.py CONFIG33.YAML
--- EOS & TOKENIZER SCANNER (DEEP SCAN) ---
Scanning config: CONFIG33.YAML

Analyzing Base Model...
BASE MODEL: anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
  Gen Config EOS ID: 2
  Tokenizer EOS Str: </s>
  Actual Vocab ID:   2
  Internal Consistency: PASS
--------------------------------------------------------------------------------
Status     | Gen ID   | Vocab ID | EOS Str    | Model Name
----------------------------------------------------------------------------------------------------
MATCH      | 2        | 2        | </s>       | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
MATCH      | 2        | 2        | </s>       | TheDrummer--Cydonia-24B-v4.3
BROKEN     | MISSING | 2        | </s>       | ReadyArt--4.2.0-Broken-Tutu-24b
MATCH      | 2        | 2        | </s>       | zerofata--MS3.2-PaintedFantasy-v2-24B
MATCH      | 2        | 2        | </s>       | TheDrummer--Magidonia-24B-v4.3
MATCH      | 2        | 2        | </s>       | TheDrummer--Precog-24B-v1
MATCH      | 2        | 2        | </s>       | zerofata--MS3.2-PaintedFantasy-v3-24B
MATCH      | 2        | 2        | </s>       | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
BROKEN     | MISSING | 2        | </s>       | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
MATCH      | 2        | 2        | </s>       | trashpanda-org--MS3.2-24B-Mullein-v2
MATCH      | 2        | 2        | </s>       | TheDrummer--Cydonia-24B-v4.2.0
MATCH      | 2        | 2        | </s>       | TheDrummer--Magidonia-24B-v4.2.0
MATCH      | 2        | 2        | </s>       | ConicCat--Mistral-Small-3.2-AntiRep-24B
MATCH      | 2        | 2        | </s>       | Undi95--MistralThinker-v1.1
MATCH      | 2        | 2        | </s>       | CrucibleLab--M3.2-24B-Loki-V2
MATCH      | 2        | 2        | </s>       | Darkhn--M3.2-24B-Animus-V7.1
BROKEN     | MISSING | 2        | </s>       | Morax-24B-v1
BROKEN     | MISSING | 2        | </s>       | FlareRebellion--WeirdCompound-v1.7-24b
MATCH      | 2        | 2        | </s>       | allura-forge--ms32-final-TEXTONLY
MATCH      | 2        | 2        | </s>       | Delta-Vector--Rei-24B-KTO
MATCH      | 2        | 2        | </s>       | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
MATCH      | 2        | 2        | </s>       | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
BROKEN     | MISSING | 2        | </s>       | arcee-ai--Arcee-Blitz
MATCH      | 2        | 2        | </s>       | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
MATCH      | 2        | 2        | </s>       | ReadyArt--Dark-Nexus-24B-v2.0
MATCH      | 2        | 2        | </s>       | Darkhn--M3.2-24B-Animus-V5.1-Pro
MATCH      | 2        | 2        | </s>       | dphn--Dolphin-Mistral-24B-Venice-Edition
MATCH      | 2        | 2        | </s>       | TroyDoesAI--BlackSheep-24B
BROKEN     | MISSING | 2        | </s>       | TheDrummer--Cydonia-24B-v2
MATCH      | 2        | 2        | </s>       | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
BROKEN     | MISSING | 2        | </s>       | trashpanda-org--MS-24B-Instruct-Mullein-v0
BROKEN     | MISSING | 2        | </s>       | OddTheGreat--Circuitry_24B_V.3
MATCH      | 2        | 2        | </s>       | spacewars123--Space-Wars-24B-v1.00a
----------------------------------------------------------------------------------------------------

--- FINAL VERDICT ---
MISMATCHES DETECTED.
1. You MUST use: tokenizer: source: union
2. However, 'union' may cause the early termination bug if IDs shift.
3. Recommendation: Remove the models marked FAIL/BROKEN from the merge.

C:\mergekit-main>gen_id_patcher.py config33.yaml
--- GENERATION CONFIG PATCHER ---
Reading Base Model: !models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Target EOS ID is: 2
------------------------------------------------------------
Skipping TheDrummer--Cydonia-24B-v4.3: Has ID [2] (Mismatch, not missing)
Patching ReadyArt--4.2.0-Broken-Tutu-24b...
  -> Fixed: Added eos_token_id: 2
Skipping TheDrummer--Magidonia-24B-v4.3: Has ID [2] (Mismatch, not missing)
Skipping TheDrummer--Precog-24B-v1: Has ID [2] (Mismatch, not missing)
Patching ReadyArt--Broken-Tutu-24B-Transgression-v2.0...
  -> Fixed: Added eos_token_id: 2
Patching Morax-24B-v1...
  -> Fixed: Added eos_token_id: 2
Patching FlareRebellion--WeirdCompound-v1.7-24b...
  -> Fixed: Added eos_token_id: 2
Patching arcee-ai--Arcee-Blitz...
  -> Fixed: Added eos_token_id: 2
Skipping TroyDoesAI--BlackSheep-24B: Has ID [2, 131072] (Mismatch, not missing)
Patching TheDrummer--Cydonia-24B-v2...
  -> Fixed: Added eos_token_id: 2
Patching trashpanda-org--MS-24B-Instruct-Mullein-v0...
  -> Fixed: Added eos_token_id: 2
Patching OddTheGreat--Circuitry_24B_V.3...
  -> Fixed: Added eos_token_id: 2
------------------------------------------------------------
Operation Complete. Patched 8 models.
Run eos_scanner.py again to verify results.

C:\mergekit-main>eos_scanner.py CONFIG33.YAML
--- EOS & TOKENIZER SCANNER (DEEP SCAN) ---
Scanning config: CONFIG33.YAML

Analyzing Base Model...
BASE MODEL: anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
  Gen Config EOS ID: 2
  Tokenizer EOS Str: </s>
  Actual Vocab ID:   2
  Internal Consistency: PASS
--------------------------------------------------------------------------------
Status     | Gen ID   | Vocab ID | EOS Str    | Model Name
----------------------------------------------------------------------------------------------------
MATCH      | 2        | 2        | </s>       | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
MATCH      | 2        | 2        | </s>       | TheDrummer--Cydonia-24B-v4.3
MATCH      | 2        | 2        | </s>       | ReadyArt--4.2.0-Broken-Tutu-24b
MATCH      | 2        | 2        | </s>       | zerofata--MS3.2-PaintedFantasy-v2-24B
MATCH      | 2        | 2        | </s>       | TheDrummer--Magidonia-24B-v4.3
MATCH      | 2        | 2        | </s>       | TheDrummer--Precog-24B-v1
MATCH      | 2        | 2        | </s>       | zerofata--MS3.2-PaintedFantasy-v3-24B
MATCH      | 2        | 2        | </s>       | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
MATCH      | 2        | 2        | </s>       | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
MATCH      | 2        | 2        | </s>       | trashpanda-org--MS3.2-24B-Mullein-v2
MATCH      | 2        | 2        | </s>       | TheDrummer--Cydonia-24B-v4.2.0
MATCH      | 2        | 2        | </s>       | TheDrummer--Magidonia-24B-v4.2.0
MATCH      | 2        | 2        | </s>       | ConicCat--Mistral-Small-3.2-AntiRep-24B
MATCH      | 2        | 2        | </s>       | Undi95--MistralThinker-v1.1
MATCH      | 2        | 2        | </s>       | CrucibleLab--M3.2-24B-Loki-V2
MATCH      | 2        | 2        | </s>       | Darkhn--M3.2-24B-Animus-V7.1
MATCH      | 2        | 2        | </s>       | Morax-24B-v1
MATCH      | 2        | 2        | </s>       | FlareRebellion--WeirdCompound-v1.7-24b
MATCH      | 2        | 2        | </s>       | allura-forge--ms32-final-TEXTONLY
MATCH      | 2        | 2        | </s>       | Delta-Vector--Rei-24B-KTO
MATCH      | 2        | 2        | </s>       | Doctor-Shotgun--MS3.2-24B-Magnum-Diamond
MATCH      | 2        | 2        | </s>       | ReadyArt--MS3.2-The-Omega-Directive-24B-Unslop-v2.1
MATCH      | 2        | 2        | </s>       | arcee-ai--Arcee-Blitz
MATCH      | 2        | 2        | </s>       | ArliAI--Mistral-Small-24B-ArliAI-RPMax-v1.4
MATCH      | 2        | 2        | </s>       | ReadyArt--Dark-Nexus-24B-v2.0
MATCH      | 2        | 2        | </s>       | Darkhn--M3.2-24B-Animus-V5.1-Pro
MATCH      | 2        | 2        | </s>       | dphn--Dolphin-Mistral-24B-Venice-Edition
MATCH      | 2        | 2        | </s>       | TroyDoesAI--BlackSheep-24B
MATCH      | 2        | 2        | </s>       | TheDrummer--Cydonia-24B-v2
MATCH      | 2        | 2        | </s>       | PocketDoc--Dans-DangerousWinds-V1.1.1-24b
MATCH      | 2        | 2        | </s>       | trashpanda-org--MS-24B-Instruct-Mullein-v0
MATCH      | 2        | 2        | </s>       | OddTheGreat--Circuitry_24B_V.3
MATCH      | 2        | 2        | </s>       | spacewars123--Space-Wars-24B-v1.00a
----------------------------------------------------------------------------------------------------

--- FINAL VERDICT ---
ALL CLEAR.
1. Change YAML to: tokenizer: source: base
2. Remove: chat_template: auto
3. Ensure your base model path in YAML is correct.

C:\mergekit-main>

It appears this worked. I'm not seeing any early terminations, at least with one-shot prompts. I'll release the scripts soon on model tools page

The model is quite creative but also a bit too concise. Just need to tweak some of the yaml weights to make it "smarter" (needs more drummer weight 🥁). The first run config33a.yamlscored 6669 on Q0 Bench.

Update: The scripts are up now. I am running the exact same yaml again but with normalize: true to see how it compares first.

The merge is complete, but I am taking a break from Goetia for now to work on some finetuning 7B experiments with new datasets. Seems I have fixed a lot of issues with avnas by re-designing the finetune script to parse the actual dataset entries instead of chunking into random blocks. Now i just have to augment the json to make it smarter. Once this is up to par I hope to start finetuning 12-24B models (but it likely requires upgrading my video card first, the chunking attempts to finetune 12B on a 3060 ti haven't worked yet)

Loading Tokenizer...
Loading dataset: B:\7B\!models--mistralai--Mistral-7B-v0.1\dataset_cache\unified_dataset.parquet
Formatting dataset with EOS tokens...
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 33333.10 examples/s]
Training on 100 distinct Q&A pairs.
Loading Model...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:09<00:00,  4.74s/it]
Applying formatting function to train dataset: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 20000.50 examples/s]
Adding EOS to train dataset: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 24997.34 examples/s]
Tokenizing train dataset: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 2439.02 examples/s]
Truncating train dataset: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 100007.25 examples/s]
Starting Training...
{'loss': 2.0906, 'grad_norm': 4.214516639709473, 'learning_rate': 0.0, 'entropy': 2.4038201570510864, 'num_tokens': 763.0, 'mean_token_accuracy': 0.5371164456009865, 'epoch': 0.04}
[...]
{'loss': 0.03, 'grad_norm': 0.2113722264766693, 'learning_rate': 8.426222418311814e-09, 'entropy': 0.0646352544426918, 'num_tokens': 162690.0, 'mean_token_accuracy': 0.9909947216510773, 'epoch': 10.0}
{'train_runtime': 970.2896, 'train_samples_per_second': 1.031, 'train_steps_per_second': 0.258, 'train_loss': 0.29215772324800493, 'epoch': 10.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [16:10<00:00,  3.88s/it]
Saving adapter to finetuned_adapter_v6...
Done! Merge this adapter or load it in Kobold.

B:\7B\!models--mistralai--Mistral-7B-v0.1>

McG-221

Feb 9

4 hours sounds totally fine, but 2+ TB of storage? w00t 🤪

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment