MrBERT-nos-gl-thematic-press-classifier: Thematic Press Classification for Galician

Fine-tuned version of MrBERT-nos-gl for thematic classification of Galician press articles into editorial sections. Developed as part of Proxecto Nós, an initiative to build language technology for the Galician language.

Model Details

| Property | Value |
|---|---|
| Base model | `proxectonos/MrBERT-nos-gl` |
| Task | Text classification (topic/section classification) |
| Language | Galician (`gl`) |
| License | Apache 2.0 |
| Source domain | Journalistic Galician text (Praza Pública) |

Thematic Categories

The model classifies articles into the following press sections:

| Label | Description |
|---|---|
| `Política` | Politics and governance |
| `Economía` | Economy and finance |
| `Cultura` | Culture, arts, and literature |
| `Movementos sociais` | Social movements and civil society |
| `Mundo` | International news |
| `Ciencia e tecnoloxía` | Science and technology |
| `Deportes` | Sports |
| `Acontece` | Current events / news in brief |
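To display predictions with the English descriptions above, you can keep the table as a lookup dictionary. This is a minimal sketch: `SECTION_DESCRIPTIONS`, `describe_section` and `missing_labels` are hypothetical names, and the code assumes the label strings the model emits match the table exactly. A model's label set is stored in its config as `model.config.id2label`, so `missing_labels(model.config.id2label)` is a quick way to check that assumption (an empty set means every label is covered).

```python
# English descriptions of the section labels, copied from the table above.
# Assumption: the model emits exactly these label strings.
SECTION_DESCRIPTIONS = {
    "Política": "Politics and governance",
    "Economía": "Economy and finance",
    "Cultura": "Culture, arts, and literature",
    "Movementos sociais": "Social movements and civil society",
    "Mundo": "International news",
    "Ciencia e tecnoloxía": "Science and technology",
    "Deportes": "Sports",
    "Acontece": "Current events / news in brief",
}


def describe_section(label: str) -> str:
    """Return the English description of a predicted section label."""
    return SECTION_DESCRIPTIONS[label]  # raises KeyError for labels not in the table


def missing_labels(id2label: dict) -> set:
    """Labels in a config's id2label mapping that have no description here."""
    return set(id2label.values()) - set(SECTION_DESCRIPTIONS)


print(describe_section("Mundo"))  # International news
```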

Training Data

The model was fine-tuned on the `prazapublica` subset of `proxectonos/corpus_dominio_periodistico`, a corpus of Galician journalistic text. This subset contains 39,700 articles from the online newspaper Praza Pública.

Usage

Installation

```bash
pip install transformers torch
```

Quick start

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

model_id = "proxectonos/MrBERT-nos-gl-thematic-press-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "O Concello de Santiago aprobou o novo plan de mobilidade urbana para reducir o tráfico no casco histórico."
results = classifier(text, top_k=3)  # the three highest-scoring sections

# Print a ranked list; the percentage is right-aligned in a 5-character field.
for i, result in enumerate(sorted(results, key=lambda x: x["score"], reverse=True)):
    print(f"{i+1}. {result['label']:<20} {result['score']*100:5.1f}%")
```

Example output

```text
1. Movementos sociais    82.3%
2. Acontece               9.0%
3. Política               3.0%
```
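When sections are assigned automatically, a prediction with a low top score (for example, an article that straddles several sections) may be better sent to manual review than accepted. The sketch below shows one way to do that on the pipeline's output; `pick_section` is a hypothetical helper, not part of `transformers`, and the 0.5 default threshold is an arbitrary illustration rather than a value validated for this model.

```python
from typing import Optional


def pick_section(results: list, threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if its score is below `threshold`.

    `results` is a list of {"label": str, "score": float} dicts, the shape the
    text-classification pipeline returns for a single text when `top_k` is set.
    """
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else None


# Scores taken from the example output above.
example = [
    {"label": "Movementos sociais", "score": 0.823},
    {"label": "Acontece", "score": 0.090},
    {"label": "Política", "score": 0.030},
]
print(pick_section(example))                 # Movementos sociais
print(pick_section(example, threshold=0.9))  # None
```

Using `max` rather than the first element keeps the helper correct even if the list is not already sorted.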

Interactive CLI (optional)

For interactive exploration from the command line:

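```python
# Reuses the `classifier` pipeline created in the quick start above.
while True:
    try:
        text = input("Enter text to classify (or 'quit' to exit): ").strip()
    except (EOFError, KeyboardInterrupt):
        break  # exit cleanly on Ctrl-D / Ctrl-C
    if text.lower() in {"quit", "exit", "q"}:
        break
    if not text:
        continue  # ignore empty input
    results = classifier(text, top_k=3)
    for i, r in enumerate(sorted(results, key=lambda x: x["score"], reverse=True)):
        bar = "█" * int(r["score"] * 20)  # bar of up to 20 blocks, proportional to the score
        print(f"  {i+1}. {r['label']:<20} {r['score']*100:5.1f}%  {bar}")
```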
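The interactive loop handles one text at a time. To label many articles at once (for example, a list read from a file), the same pipeline accepts a list of strings and processes it in batches. This is a minimal sketch assuming the `classifier` from the quick start; `classify_articles` is a hypothetical helper name and `batch_size=8` an arbitrary choice. `truncation=True` is a tokenizer argument that the pipeline forwards, so inputs longer than the model's maximum length are cut and only their beginning is classified.

```python
from typing import Callable, Iterable


def classify_articles(classifier: Callable, texts: Iterable[str], batch_size: int = 8) -> list:
    """Return the top-1 section label for each text.

    `classifier` is expected to behave like the transformers text-classification
    pipeline: given a list of strings and top_k=1, it returns one list of
    {"label", "score"} dicts per input.
    """
    texts = list(texts)
    if not texts:
        return []  # nothing to classify
    outputs = classifier(texts, top_k=1, batch_size=batch_size, truncation=True)
    return [preds[0]["label"] for preds in outputs]


# Usage with the quick-start pipeline:
# labels = classify_articles(classifier, ["Primeiro artigo...", "Segundo artigo..."])
```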

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA.

Citation

```bibtex
@misc{proxectonos2026MrBERT-nos-gl-thematic-press-classifier,
  author       = {{Proxecto Nós}},
  title        = {{MrBERT-nos-gl-thematic-press-classifier}: Thematic Press Classification for Galician},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-thematic-press-classifier}},
}
```