manoskary/musicbert-large

@@ -22,14 +22,14 @@ symbolic music sequences extracted from the [GigaMIDI](https://huggingface.co/da
 corpus. It is tailored for symbolic music understanding, fill-mask style infilling,
 and as a backbone for downstream generative tasks.
-- **Checkpoint**: 60000 steps
+- **Checkpoint**: 120000 steps
 - **Hidden size**: 1024
 - **Parameters**: ~330M
-- **Validation loss**: ~1.5
+- **Validation loss**: 1.19900381565094
 ## Training Configuration
 - **Objective**: Masked language modeling with span-aware masking
-- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size 40000)
+- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size 50000)
 - **Sequence length**: 1024
 - **Max events per MIDI**: 2048
@@ -68,7 +68,6 @@ with torch.no_grad():
 print("Predicted token IDs:", predictions.tolist())
 ```
 ## Limitations and Risks
 - Model is trained purely on symbolic data; it does not produce audio directly.
 - The GigaMIDI dataset is biased towards Western tonal music.

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3bd843e666dffdf65611d274987a65a921c7004850878d9321fd61ff1571d783
+oid sha256:cd19dcc018e5aabec992ef2e7ba5cdb3e65c6f7619cc3ffff444f4e67c5edac8
 size 1385892208
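
The pointer above records only the hash algorithm, digest, and byte size of the real weights file. A minimal sketch of checking a locally downloaded `model.safetensors` against the new pointer might look like the following. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical (they are not part of Git LFS or any Hugging Face library), and the local file path is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest named in the pointer."""
    path = Path(path)
    # Cheap check first: a size mismatch rules the file out without hashing it.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Read in chunks so a ~1.4 GB file is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the "+" side of the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cd19dcc018e5aabec992ef2e7ba5cdb3e65c6f7619cc3ffff444f4e67c5edac8
size 1385892208
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

With a downloaded copy at an assumed local path, `matches_pointer("model.safetensors", pointer)` would return `True` only for the new checkpoint. The old and new pointers report the same size, so the digest is the only field that tells the two files apart.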