Long-Context Marin 8B - OpenThoughts3

This model is the final checkpoint of the exp2199a2_redo2 experiment, which fine-tunes the long-context-extended Marin 8B model on the OpenThoughts3 dataset.

Model Details

Base Model: tootsie-8b-giraffe-phase3-64k (Marin 8B with 64k context extension)
Training Dataset: OpenThoughts3-1.2M (1.2M examples)
Final Checkpoint: step-11718
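
A minimal loading sketch, assuming the repository marin-community/longcontext-marin-8b-openthoughts3 ships a tokenizer and a config that the Hugging Face transformers auto classes can read. The card does not confirm this, and it does not document a prompt or chat format, so the plain-text prompt handling here is an assumption. The helper names load_model and generate_text are hypothetical.

```python
MODEL_ID = "marin-community/longcontext-marin-8b-openthoughts3"


def load_model(model_id: str = MODEL_ID, dtype_name: str = "bfloat16"):
    """Return (tokenizer, model). Downloads the weights on first call."""
    # Imported lazily so this file can be read or imported without the
    # heavy dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Assumption: casting the stored weights to bfloat16 is acceptable
        # for inference; pass "float32" to keep full precision.
        torch_dtype=getattr(torch, dtype_name),
    )
    return tokenizer, model


def generate_text(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a continuation for a plain-text prompt (no chat template)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_text("What is 17 * 23?") would download the checkpoint and run generation on the default device; nothing is downloaded until one of the helpers is called.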

Training Hyperparameters

Parameter	Value
Epochs	5
Batch Size	512
Learning Rate	8e-5
Max Sequence Length	16384
LR Schedule	Cosine
Warmup	10%
Decay	0.9
Weight Decay	0.0
Beta1	0.9
Beta2	0.999
Hardware	TPU v4-512
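
The hyperparameters above can be cross-checked with a little arithmetic. The sketch below assumes "1.2M" means exactly 1,200,000 examples and that the batch size counts examples; under those assumptions, 5 epochs at batch size 512 give 11718 full steps, which is consistent with the step-11718 checkpoint. It also sketches a cosine schedule with 10% linear warmup. The minimum learning rate is not stated in the card, so min_lr_ratio is a hypothetical parameter, and the schedule reads "Decay 0.9" as the remaining 90% of steps being spent in cosine decay, which is an interpretation rather than something the card confirms.

```python
import math

NUM_EXAMPLES = 1_200_000  # assumption: "1.2M" taken as exactly 1,200,000
EPOCHS = 5
BATCH_SIZE = 512
PEAK_LR = 8e-5
WARMUP_FRACTION = 0.10


def total_steps(num_examples: int = NUM_EXAMPLES,
                epochs: int = EPOCHS,
                batch_size: int = BATCH_SIZE) -> int:
    """Number of full optimizer steps (a trailing partial batch is dropped)."""
    return (num_examples * epochs) // batch_size


def lr_at(step: int,
          total: int,
          peak_lr: float = PEAK_LR,
          warmup_fraction: float = WARMUP_FRACTION,
          min_lr_ratio: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to peak_lr * min_lr_ratio."""
    warmup_steps = int(total * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min((step - warmup_steps) / max(1, total - warmup_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    min_lr = peak_lr * min_lr_ratio
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    steps = total_steps()
    print("total steps:", steps)
    for s in (0, steps // 10, steps // 2, steps):
        print(s, lr_at(s, steps))
```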

Training Notes

Era shuffling enabled (dataset shuffled every epoch)
Trained with Llama3-style rotary embeddings configured for 64k context
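
To illustrate what shuffling the dataset every epoch means, here is a minimal sketch that draws a fresh permutation of example indices for each epoch from a seed derived from the epoch number. It is not the training framework's implementation; the function names and the seeding scheme are hypothetical.

```python
import random


def epoch_order(num_examples: int, epoch: int, base_seed: int = 0) -> list[int]:
    """Return a permutation of example indices specific to this epoch."""
    # Deriving the seed from the epoch makes each epoch's order different
    # but reproducible across restarts.
    rng = random.Random(base_seed * 1_000_003 + epoch)
    order = list(range(num_examples))
    rng.shuffle(order)
    return order


def iterate_epochs(num_examples: int, epochs: int, base_seed: int = 0):
    """Yield (epoch, example_index) pairs, reshuffling at every epoch."""
    for epoch in range(epochs):
        for index in epoch_order(num_examples, epoch, base_seed):
            yield epoch, index
```

Every example is still seen exactly once per epoch; only the order changes between epochs.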

Safetensors

Model Size: 8B params
Tensor Type: F32
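
As a rough guide to what the F32 tensor type implies for storage and memory, the sketch below multiplies the parameter count by the bytes per element. It assumes "8B" means exactly 8e9 parameters (the card gives only the rounded figure) and counts weights only, not activations or KV cache.

```python
# Bytes per element for common safetensors dtypes.
BYTES_PER_ELEMENT = {"F32": 4, "BF16": 2, "F16": 2}


def weight_gigabytes(num_params: float, tensor_type: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[tensor_type] / 1e9


if __name__ == "__main__":
    print(weight_gigabytes(8e9, "F32"))   # 32.0
    print(weight_gigabytes(8e9, "BF16"))  # 16.0
```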

Dataset used to train marin-community/longcontext-marin-8b-openthoughts3: OpenThoughts3-1.2M