Fill-Mask
Transformers
Safetensors
bert
music-generation
masked-language-modeling
remi
midi
symbolic-music
gigamidi
How to use manoskary/musicbert-large with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="manoskary/musicbert-large")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("manoskary/musicbert-large")
model = AutoModelForMaskedLM.from_pretrained("manoskary/musicbert-large")
```
Upload MusicBERT base model (GigaMIDI REMI+BPE, 120K steps)
- README.md +3 -4
- model.safetensors +1 -1
README.md
CHANGED
@@ -22,14 +22,14 @@ symbolic music sequences extracted from the [GigaMIDI](https://huggingface.co/da
 corpus. It is tailored for symbolic music understanding, fill-mask style infilling,
 and as a backbone for downstream generative tasks.
 
-- **Checkpoint**:
+- **Checkpoint**: 120000 steps
 - **Hidden size**: 1024
 - **Parameters**: ~330M
-- **Validation loss**:
+- **Validation loss**: 1.19900381565094
 
 ## Training Configuration
 - **Objective**: Masked language modeling with span-aware masking
-- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size
+- **Dataset**: GigaMIDI (REMI tokens → BPE, vocab size 50000)
 - **Sequence length**: 1024
 - **Max events per MIDI**: 2048
 
@@ -68,7 +68,6 @@ with torch.no_grad():
 print("Predicted token IDs:", predictions.tolist())
 ```
 
-
 ## Limitations and Risks
 - Model is trained purely on symbolic data; it does not produce audio directly.
 - The GigaMIDI dataset is biased towards Western tonal music.
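For a sense of scale, the validation loss added to the README can be converted to a perplexity. This is a minimal sketch, assuming the reported value is a mean cross-entropy in nats over masked tokens; the README states neither the unit nor how the loss is averaged, and the function name `perplexity` is only illustrative.

```python
import math

# Validation loss as reported in the updated README.
# Assumption: mean cross-entropy in nats over masked positions.
VALIDATION_LOSS = 1.19900381565094

# Vocabulary size as reported in the updated README (REMI tokens -> BPE).
VOCAB_SIZE = 50000


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean negative log-likelihood in nats to perplexity."""
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    # Loss of a model that guesses uniformly over the vocabulary,
    # shown only as a reference point.
    uniform_loss = math.log(VOCAB_SIZE)
    print(f"perplexity: {perplexity(VALIDATION_LOSS):.2f}")
    print(f"uniform-guess loss: {uniform_loss:.2f} nats")
```

Under that assumption, the perplexity comes out a little above 3, against a uniform-guess loss of about 10.8 nats for a 50000-token vocabulary.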
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:cd19dcc018e5aabec992ef2e7ba5cdb3e65c6f7619cc3ffff444f4e67c5edac8
 size 1385892208
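The model.safetensors change is a change to a Git LFS pointer, not to the weights file itself: the pointer holds the spec version, the SHA-256 of the real file, and its size in bytes. The following is a minimal sketch of parsing such a pointer and checking a downloaded file against it. The helper names `parse_lfs_pointer` and `verify_file` are hypothetical, not part of any library.

```python
import hashlib
import os

# New pointer contents, copied from the diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cd19dcc018e5aabec992ef2e7ba5cdb3e65c6f7619cc3ffff444f4e67c5edac8
size 1385892208
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a file of over 1 GB is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
```

The size is the same in the old and new pointers, which fits a commit that replaces the weights without changing the architecture. If the tensors are stored as float32 (an assumption; the diff does not state the dtype), 1385892208 bytes corresponds to at most about 346M parameters, in the same range as the ~330M listed in the README.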