rudyon/linnet-497M

Text Generation

Mixture of Experts

rudyon committed on Mar 25

Commit 0053036 · verified · 1 Parent(s): 0493f64

Update README.md

Files changed (1)

README.md +1 -1

@@ -50,7 +50,7 @@ The model is a 12-layer causal transformer with the following architecture:
 ## training
-- **Datasets**: HuggingFaceFW/fineweb-edu (~700k docs) + mlfoundations/dclm-baseline-1.0 (~250k docs)
+- **Datasets**: HuggingFaceFW/fineweb-edu (\~700k docs) + mlfoundations/dclm-baseline-1.0 (\~250k docs)
 - **Tokenizer**: Custom ByteLevelBPE (vocab size: 32768)
 - **Batch size**: 524,288 tokens
 - **Sequence length**: 1024
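
The batch size and sequence length listed above imply a fixed number of sequences per batch. A minimal sketch of that arithmetic follows. It assumes every sequence is packed to the full sequence length, which the README does not state, and the variable names are illustrative only.

```python
# Values copied from the training section of the README diff.
BATCH_SIZE_TOKENS = 524_288
SEQUENCE_LENGTH = 1024
VOCAB_SIZE = 32_768

# Assumption (not stated in the README): each sequence in a batch is
# packed to the full sequence length, so the batch divides evenly.
sequences_per_batch, remainder = divmod(BATCH_SIZE_TOKENS, SEQUENCE_LENGTH)

print(sequences_per_batch)   # 512
print(remainder)             # 0
print(VOCAB_SIZE == 2**15)   # True
```

Under that assumption, one batch holds 512 sequences of 1024 tokens each.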