MrBERT-nos-gl-thematic-press-classifier: Thematic Press Classification for Galician
Fine-tuned version of MrBERT-nos-gl for thematic classification of Galician press articles into editorial sections. Developed as part of Proxecto Nós, an initiative to build language technology for the Galician language.
Model Details
| Property | Value |
|---|---|
| Base model | proxectonos/MrBERT-nos-gl |
| Task | Text classification (topic/section classification) |
| Language | Galician (gl) |
| License | Apache 2.0 |
| Source domain | Journalistic Galician text (Praza Pública) |
Thematic Categories
The model classifies articles into the following press sections:
| Label | Description |
|---|---|
| Política | Politics and governance |
| Economía | Economy and finance |
| Cultura | Culture, arts, and literature |
| Movementos sociais | Social movements and civil society |
| Mundo | International news |
| Ciencia e tecnoloxía | Science and technology |
| Deportes | Sports |
| Acontece | Current events / news in brief |
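For downstream code, the table above can be mirrored as a simple lookup. This is a minimal sketch, assuming the label strings emitted by the model match the table exactly; this card does not list the model's `config.id2label`, which remains the authoritative source. The names `SECTION_DESCRIPTIONS` and `describe_label` are hypothetical.

```python
# Hypothetical lookup mirroring the table above. Verify the keys against
# model.config.id2label before relying on them.
SECTION_DESCRIPTIONS = {
    "Política": "Politics and governance",
    "Economía": "Economy and finance",
    "Cultura": "Culture, arts, and literature",
    "Movementos sociais": "Social movements and civil society",
    "Mundo": "International news",
    "Ciencia e tecnoloxía": "Science and technology",
    "Deportes": "Sports",
    "Acontece": "Current events / news in brief",
}


def describe_label(label: str) -> str:
    """Return the English description for a section label, or the label itself if unknown."""
    return SECTION_DESCRIPTIONS.get(label, label)


print(describe_label("Deportes"))
```

Falling back to the raw label keeps the helper usable even if the model exposes a label that is missing from this mapping.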
Training Data
Fine-tuned on the prazapublica subset of proxectonos/corpus_dominio_periodistico, a corpus of Galician journalistic text from the online newspaper Praza Pública (39,700 articles).
Usage
Installation
pip install transformers torch
Quick start
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("proxectonos/MrBERT-nos-gl-thematic-press-classifier")
model = AutoModelForSequenceClassification.from_pretrained("proxectonos/MrBERT-nos-gl-thematic-press-classifier")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

text = "O Concello de Santiago aprobou o novo plan de mobilidade urbana para reducir o tráfico no casco histórico."

# top_k=3 returns the three highest-scoring sections for the text
results = classifier(text, top_k=3)
for i, result in enumerate(sorted(results, key=lambda x: x['score'], reverse=True)):
    print(f"{i+1}. {result['label']:<20} {result['score']*100:.1f}%")
Example output
1. Movementos sociais   82.3%
2. Acontece             9.0%
3. Política             3.0%
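The scores can also be used to flag low-confidence predictions. The sketch below operates on the list-of-dicts shape returned by `classifier(text, top_k=3)` and uses the example output above as sample data. The helper name `top_label` and the 0.5 threshold are illustrative assumptions, not values recommended by the model authors.

```python
from typing import Optional


def top_label(results: list[dict], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if its score is below the threshold."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else None


# Sample data copied from the example output above (scores as fractions).
sample = [
    {"label": "Movementos sociais", "score": 0.823},
    {"label": "Acontece", "score": 0.090},
    {"label": "Política", "score": 0.030},
]

print(top_label(sample))       # Movementos sociais
print(top_label(sample, 0.9))  # None
```

A suitable threshold depends on the application and is best chosen on held-out data.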
Interactive CLI (optional)
For interactive exploration from the command line:
# Reuses the `classifier` pipeline created in the Quick start
while True:
    text = input("Enter text to classify: ").strip()
    if text.lower() in ["quit", "exit", "q"]:
        break
    results = classifier(text, top_k=3)
    for i, r in enumerate(sorted(results, key=lambda x: x['score'], reverse=True)):
        # Draw a bar of up to 20 blocks proportional to the score
        bar = "█" * int(r['score'] * 20)
        print(f" {i+1}. {r['label']:<20} {r['score']*100:5.1f}% {bar}")
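For non-interactive use, several texts can be classified in a single call, since a `transformers` text-classification pipeline accepts a list of strings and returns one result list per input. The following is a rough sketch: `classify_many` is a hypothetical helper, and it takes the pipeline as an argument so that it does not depend on how the model was loaded.

```python
def classify_many(classifier, texts, top_k=3):
    """Classify several texts in one call, skipping blank entries.

    `classifier` is expected to behave like the pipeline from the Quick start:
    called with a list of strings, it returns one list of
    {"label", "score"} dicts per input string.
    """
    cleaned = [t.strip() for t in texts if t and t.strip()]
    if not cleaned:
        return {}
    batch = classifier(cleaned, top_k=top_k)
    # Note: duplicate texts collapse into a single dictionary key.
    return dict(zip(cleaned, batch))
```

With the pipeline from the Quick start, this would be called as `classify_many(classifier, ["first article text", "second article text"])`.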
Acknowledgements
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA.
Citation
@misc{proxectenos2026MrBERT-nos-gl-thematic-press-classifier,
  author = {{Proxecto Nós}},
  title = {{MrBERT-nos-gl-thematic-press-classifier}: Thematic Press Classification for Galician},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-thematic-press-classifier}},
}