Tags: Fill-Mask, Transformers, Safetensors, bert, music-generation, masked-language-modeling, remi, midi, symbolic-music, gigamidi
How to use manoskary/musicbert-large with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="manoskary/musicbert-large")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("manoskary/musicbert-large")
    model = AutoModelForMaskedLM.from_pretrained("manoskary/musicbert-large")
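
For each masked position, a `fill-mask` pipeline turns the model's logits into a ranked list of candidate tokens with softmax scores (the pipeline's `top_k` argument controls how many are returned, 5 by default). The sketch below reproduces only that post-processing step in plain Python on made-up logits. The function names `softmax` and `top_k_fill` and the toy 6-token vocabulary are hypothetical. The real pipeline also maps the IDs back to token strings with the tokenizer.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_fill(logits: List[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return the k most likely (token_id, probability) pairs for one masked position.

    `logits` is the model output row at the masked position, one score per vocabulary entry.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical logits over a toy 6-token vocabulary for a single <MASK> position.
    toy_logits = [0.1, 2.3, -1.0, 1.7, 0.0, 3.1]
    for token_id, score in top_k_fill(toy_logits, k=3):
        print(f"token_id={token_id} score={score:.3f}")
```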
Update README.md
README.md
CHANGED

    @@ -25,25 +25,14 @@ and as a backbone for downstream generative tasks.
     - **Checkpoint**: 60000 steps
     - **Hidden size**: 1024
     - **Parameters**: ~330M
    -- **
    -- **Validation loss**: 1.5264089107513428
    +- **Validation loss**: ~1.5
     
     ## Training Configuration
     - **Objective**: Masked language modeling with span-aware masking
    -- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size
    +- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size 40000)
     - **Sequence length**: 1024
     - **Max events per MIDI**: 2048
    -- **Per-device batch size**: 24
    -- **Gradient accumulation**: 8
    -- **Effective batch size**: 192
    -- **Learning rate**: 5e-05
    -- **Warmup steps**: 0
     
    -## Tokenizer
    -- **Base REMI vocab size**: 532
    -- **BPE vocab size**: 50000
    -- Includes REMI control tokens for bar, position, tempo, velocity, program, and duration
    -- Special tokens: `<PAD>`, `<MASK>`, `<SEP>`, `<CLS>`
     
     ## Inference Example
     
    @@ -79,29 +68,6 @@ with torch.no_grad():
     print("Predicted token IDs:", predictions.tolist())
     ```
     
    -### Using with pre-tokenized sequences
    -```python
    -from transformers import BertForMaskedLM
    -from miditok import MusicTokenizer
    -import torch
    -
    -model = BertForMaskedLM.from_pretrained("manoskary/musicbert-large")
    -tokenizer = MusicTokenizer.from_pretrained("manoskary/miditok-REMI")
    -
    -# Note: The tokenizer uses REMI+BPE encoding
    -# For direct token manipulation, work with token IDs
    -# The vocabulary includes compressed BPE tokens learned from REMI sequences
    -```
    -
    -## Training Command (for reproducibility)
    -Training was launched with the simplified MusicBERT pretraining script:
    -```bash
    -python -m music_llm.train.train_pretrain_musicbert_simple \
    ---model_size large \
    ---output_dir ./runs/musicbert_large_gigamidi_bpe \
    ---dataset_path /opt/datasets/music_llm/gigamidi_remi/final \
    ---tokenizer_path /opt/datasets/music_llm/gigamidi_remi/bpe_tokenizer
    -```
     
     ## Limitations and Risks
     - Model is trained purely on symbolic data; it does not produce audio directly.
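
This change rounds the reported validation loss from 1.5264089107513428 to ~1.5. The card does not give the unit or the averaging used. If the figure is the mean cross-entropy in nats over masked positions, which is the usual convention for `transformers` MLM training but an assumption here, it can be read as a masked-token perplexity via exp(loss). The helper name below is hypothetical.

```python
import math


def masked_token_perplexity(mean_ce_loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) over masked tokens into a perplexity.

    Assumes the loss is averaged per masked token and uses the natural log.
    """
    return math.exp(mean_ce_loss_nats)


if __name__ == "__main__":
    exact = masked_token_perplexity(1.5264089107513428)  # value before this change
    rounded = masked_token_perplexity(1.5)               # value after this change
    print(f"exact: {exact:.2f}, rounded: {rounded:.2f}")  # prints "exact: 4.60, rounded: 4.48"
```

Under that assumption, the rounding in the new card shifts the implied perplexity by about 0.12. That gap is small compared with what the figure conveys: the model is choosing among roughly 4.5 effective candidates per masked token.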
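
The objective is listed as "masked language modeling with span-aware masking", but the card does not define the span scheme. One plausible reading is masking contiguous runs of tokens rather than isolated positions, as sketched below. The following are all assumptions, not the model's documented settings:

- the 15% masking ratio,
- span lengths drawn uniformly from 1 to `max_span` (SpanBERT, for comparison, samples lengths from a geometric distribution),
- the placeholder `mask_id`,
- replacing every selected token with the mask (no 80/10/10 split).

The `-100` ignore label is the convention used by PyTorch cross-entropy and `transformers` MLM heads.

```python
import random
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch / transformers convention)


def span_mask(
    token_ids: List[int],
    mask_id: int,
    mask_ratio: float = 0.15,
    max_span: int = 4,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Mask contiguous spans until about `mask_ratio` of the tokens are covered.

    Returns (inputs, labels): masked positions hold `mask_id` in `inputs` and the
    original token in `labels`; every other label is IGNORE_INDEX.
    Span lengths are drawn uniformly from 1..max_span (a simplifying assumption).
    """
    n = len(token_ids)
    if n == 0:
        return [], []
    rng = random.Random(seed)
    budget = max(1, round(n * mask_ratio))  # number of tokens to mask
    masked = [False] * n
    covered = 0
    attempts = 0
    while covered < budget and attempts < 10 * n:  # attempt cap guarantees termination
        attempts += 1
        length = min(rng.randint(1, max_span), budget - covered)
        start = rng.randrange(0, n - length + 1)
        for i in range(start, start + length):
            if not masked[i]:
                masked[i] = True
                covered += 1
    inputs = [mask_id if m else t for t, m in zip(token_ids, masked)]
    labels = [t if m else IGNORE_INDEX for t, m in zip(token_ids, masked)]
    return inputs, labels


if __name__ == "__main__":
    seq = list(range(1000, 1040))  # 40 hypothetical BPE token IDs
    inputs, labels = span_mask(seq, mask_id=3)  # 3 is a placeholder, not the real <MASK> ID
    print(sum(l != IGNORE_INDEX for l in labels), "of", len(seq), "tokens masked")
```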
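
The training configuration removed in this change lists a per-device batch size of 24, gradient accumulation of 8 and an effective batch size of 192. Since 24 × 8 = 192, the numbers are consistent with training on a single device; the card does not state the device count. The arithmetic generalizes as in this sketch, whose function names are hypothetical.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step (data parallelism + gradient accumulation)."""
    if min(per_device, grad_accum_steps, num_devices) < 1:
        raise ValueError("all factors must be positive")
    return per_device * grad_accum_steps * num_devices


def tokens_per_step(per_device: int, grad_accum_steps: int, seq_len: int, num_devices: int = 1) -> int:
    """Upper bound on tokens per optimizer step when each sequence fills seq_len positions."""
    return effective_batch_size(per_device, grad_accum_steps, num_devices) * seq_len


if __name__ == "__main__":
    print(effective_batch_size(24, 8))            # 192, matching the removed card entry
    print(tokens_per_step(24, 8, seq_len=1024))   # 196608
```

At the reported checkpoint of 60000 steps, this amounts to 60000 × 192 = 11,520,000 training sequences. That total assumes the effective batch size stayed fixed for the whole run.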
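
The tokenizer section removed in this change describes a base REMI vocabulary of 532 tokens compressed with BPE into a 50000-token vocabulary. The new dataset line says 40000. The toy sketch below illustrates the BPE idea on REMI-style token strings: it repeatedly replaces the most frequent adjacent pair with a new merged token. The token names, the `+` joiner and the merge budget are hypothetical, and this is not miditok's implementation.

```python
from collections import Counter
from typing import List, Optional, Tuple


def most_frequent_pair(seqs: List[List[str]]) -> Optional[Tuple[str, str]]:
    """Most common adjacent pair across all sequences (ties go to the pair seen first)."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(seq: List[str], pair: Tuple[str, str], new_token: str) -> List[str]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_token`."""
    out: List[str] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(seqs: List[List[str]], num_merges: int) -> Tuple[List[List[str]], List[Tuple[str, str]]]:
    """Learn up to `num_merges` merges; return the re-encoded sequences and the merge list."""
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:  # no adjacent pairs left to merge
            break
        new_token = pair[0] + "+" + pair[1]
        seqs = [merge_pair(s, pair, new_token) for s in seqs]
        merges.append(pair)
    return seqs, merges


if __name__ == "__main__":
    # Hypothetical REMI-style bar: each note is Position, Pitch, Velocity, Duration.
    bar = ["Bar_None",
           "Position_0", "Pitch_60", "Velocity_95", "Duration_1.0.8",
           "Position_8", "Pitch_64", "Velocity_95", "Duration_1.0.8",
           "Position_16", "Pitch_67", "Velocity_95", "Duration_1.0.8"]
    encoded, merges = train_bpe([bar, bar], num_merges=1)
    print(merges)                            # [('Velocity_95', 'Duration_1.0.8')]
    print(len(bar), "->", len(encoded[0]))   # 13 -> 10
```

Each merge adds one token to the vocabulary. Under this simplified scheme, growing a 532-token base vocabulary to 50000 would therefore take 50000 - 532 = 49468 merges, ignoring any special-token bookkeeping.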