manoskary/musicbert-large

music-generation

masked-language-modeling

manoskary committed on Oct 9, 2025

Commit db44d98 · verified · 1 Parent(s): 6525545

Update README.md

Files changed (1)

README.md +2 -36

@@ -25,25 +25,14 @@ and as a backbone for downstream generative tasks.
 - **Checkpoint**: 60000 steps
 - **Hidden size**: 1024
 - **Parameters**: ~330M
-- **Training loss**: unknown
-- **Validation loss**: 1.5264089107513428
 ## Training Configuration
 - **Objective**: Masked language modeling with span-aware masking
-- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size 50000)
 - **Sequence length**: 1024
 - **Max events per MIDI**: 2048
-- **Per-device batch size**: 24
-- **Gradient accumulation**: 8
-- **Effective batch size**: 192
-- **Learning rate**: 5e-05
-- **Warmup steps**: 0
-## Tokenizer
-- **Base REMI vocab size**: 532
-- **BPE vocab size**: 50000
-- Includes REMI control tokens for bar, position, tempo, velocity, program, and duration
-- Special tokens: `<PAD>`, `<MASK>`, `<SEP>`, `<CLS>`
 ## Inference Example
@@ -79,29 +68,6 @@ with torch.no_grad():
 print("Predicted token IDs:", predictions.tolist())
 ```
-### Using with pre-tokenized sequences
-```python
-from transformers import BertForMaskedLM
-from miditok import MusicTokenizer
-import torch
-model = BertForMaskedLM.from_pretrained("manoskary/musicbert-large")
-tokenizer = MusicTokenizer.from_pretrained("manoskary/miditok-REMI")
-# Note: The tokenizer uses REMI+BPE encoding
-# For direct token manipulation, work with token IDs
-# The vocabulary includes compressed BPE tokens learned from REMI sequences
-```
-## Training Command (for reproducibility)
-Training was launched with the simplified MusicBERT pretraining script:
-```bash
-python -m music_llm.train.train_pretrain_musicbert_simple \
-    --model_size large \
-    --output_dir ./runs/musicbert_large_gigamidi_bpe \
-    --dataset_path /opt/datasets/music_llm/gigamidi_remi/final \
-    --tokenizer_path /opt/datasets/music_llm/gigamidi_remi/bpe_tokenizer
-```
 ## Limitations and Risks
 - Model is trained purely on symbolic data; it does not produce audio directly.

 - **Checkpoint**: 60000 steps
 - **Hidden size**: 1024
 - **Parameters**: ~330M
+- **Validation loss**: ~1.5
 ## Training Configuration
 - **Objective**: Masked language modeling with span-aware masking
+- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size 40000)
 - **Sequence length**: 1024
 - **Max events per MIDI**: 2048
 ## Inference Example
 print("Predicted token IDs:", predictions.tolist())
 ```
 ## Limitations and Risks
 - Model is trained purely on symbolic data; it does not produce audio directly.
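
For reference, the batch-size entries removed in the first hunk (per-device batch size 24, gradient accumulation 8, effective batch size 192) agree with each other only if training ran on a single device. The README does not state the device count, so `num_devices=1` below is an assumption, and `effective_batch_size` is a hypothetical helper name used purely for illustration. A minimal sketch of the arithmetic:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to a single optimizer step."""
    return per_device * grad_accum * num_devices


# Values taken from the removed README lines; num_devices=1 is an assumption.
PER_DEVICE_BATCH = 24
GRAD_ACCUM_STEPS = 8
SEQUENCE_LENGTH = 1024  # from the "Sequence length" entry kept in the README

batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS)
# Upper bound on tokens per step: padded positions are counted too.
max_tokens_per_step = batch * SEQUENCE_LENGTH

print(batch)                # 192
print(max_tokens_per_step)  # 196608
```

If more than one device was used, the recorded effective batch size of 192 would not match the other two values, which may be one reason to keep these entries out of the card until they are confirmed.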