# 🇲🇳 NLLB-200 Mongolian Fine-tuned
This model is a fine-tuned version of Meta's NLLB-200 (No Language Left Behind) distilled model, specifically optimized for high-quality Mongolian ↔ English translation.
## 📖 Model Description
NLLB-200 is a multilingual model capable of translating across 200+ languages. This specific checkpoint has been further fine-tuned on curated Mongolian datasets to improve:
- Grammar & Syntax: Better handling of Mongolian's SOV (Subject-Object-Verb) structure.
- Vocabulary: Improved subword tokenization for Cyrillic Mongolian.
- Contextual Accuracy: More natural translations for conversational and technical text.
## 🚀 Usage
You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "your-username/nllb-mongolian"

# src_lang tags the input as Halh Mongolian in Cyrillic script (FLORES-200 code khk_Cyrl)
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="khk_Cyrl")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Сайн байна уу? Таны ажил төрөл сайн уу?"  # "Hello! How is your work going?"
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start generating with the English language token.
# convert_tokens_to_ids replaces the lang_code_to_id mapping, which was
# removed in recent transformers releases.
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=128,
)
print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0])
```
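Translation in the opposite direction (English → Mongolian) follows the same pattern with the source and target codes swapped. A minimal sketch, reusing the placeholder model name from above and wrapping the steps in a helper function:

```python
# FLORES-200 codes NLLB-200 uses for this pair
ENG = "eng_Latn"   # English, Latin script
MON = "khk_Cyrl"   # Halh Mongolian, Cyrillic script


def translate_en_to_mn(text: str, model_name: str = "your-username/nllb-mongolian") -> str:
    """Translate English text into Mongolian with an NLLB-style checkpoint."""
    # Imported lazily so the helper can be defined without loading any weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Now English is the source language being tokenized...
    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=ENG)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # ...and Mongolian is forced as the decoder's first token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(MON),
        max_length=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

For example, `translate_en_to_mn("How is your work going?")` returns the Mongolian translation as a plain string.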
## 📊 Training Data
The model was fine-tuned on a combination of:
- Publicly available Mongolian-English parallel corpora.
- Custom datasets of locally sourced translations.
## ⚖️ License
This model is licensed under the Apache License 2.0. You are free to use, modify, and distribute this model, including for commercial purposes, provided you include the original license and copyright notice.