Update README.md
README.md
CHANGED
@@ -50,7 +50,7 @@ The model is a 12-layer causal transformer with the following architecture:
 
 ## training
 
-- **Datasets**: HuggingFaceFW/fineweb-edu (~700k docs) + mlfoundations/dclm-baseline-1.0 (~250k docs)
+- **Datasets**: HuggingFaceFW/fineweb-edu (\~700k docs) + mlfoundations/dclm-baseline-1.0 (\~250k docs)
 - **Tokenizer**: Custom ByteLevelBPE (vocab size: 32768)
 - **Batch size**: 524,288 tokens
 - **Sequence length**: 1024