MrBERT-nos-gl-thematic-press-classifier: Thematic Press Classification for Galician

MrBERT-nos-gl-thematic-press-classifier is a fine-tuned version of MrBERT-nos-gl that classifies Galician press articles by editorial section. It was developed as part of Proxecto Nós, an initiative to build language technology for Galician.

Model Details

| Property | Value |
|---|---|
| Base model | proxectonos/MrBERT-nos-gl |
| Task | Text classification (topic/section classification) |
| Language | Galician (gl) |
| License | Apache 2.0 |
| Source domain | Journalistic Galician text (Praza Pública) |

Thematic Categories

The model classifies articles into the following press sections:

| Label | Description |
|---|---|
| Política | Politics and governance |
| Economía | Economy and finance |
| Cultura | Culture, arts, and literature |
| Movementos sociais | Social movements and civil society |
| Mundo | International news |
| Ciencia e tecnoloxía | Science and technology |
| Deportes | Sports |
| Acontece | Current events / news in brief |
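As a sanity check, the label names a loaded model reports can be compared with the sections listed above. The following is a minimal sketch, not part of the model's own tooling: `check_labels` and `demo_mapping` are hypothetical names introduced here, and it assumes the label strings stored in the model configuration match the table exactly.

```python
# Label names copied from the Thematic Categories table above.
EXPECTED_LABELS = {
    "Política",
    "Economía",
    "Cultura",
    "Movementos sociais",
    "Mundo",
    "Ciencia e tecnoloxía",
    "Deportes",
    "Acontece",
}


def check_labels(id2label):
    """Compare an id-to-label mapping against the documented sections.

    Returns (missing, unexpected): documented labels absent from the mapping,
    and mapping labels that are not documented.
    """
    found = set(id2label.values())
    missing = sorted(EXPECTED_LABELS - found)
    unexpected = sorted(found - EXPECTED_LABELS)
    return missing, unexpected


# Hypothetical mapping used only to show the call. With the real model, pass
# model.config.id2label after loading it as in the Quick start section.
demo_mapping = {0: "Política", 1: "Cultura", 2: "Opinión"}
missing, unexpected = check_labels(demo_mapping)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists come back empty for `model.config.id2label`, the labels returned by the pipeline can be used directly as keys for the sections above.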

Training Data

Fine-tuned on the prazapublica subset of proxectonos/corpus_dominio_periodistico, a corpus of Galician journalistic text from the online newspaper Praza Pública (39,700 articles).

Usage

Installation

pip install transformers torch

Quick start

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "proxectonos/MrBERT-nos-gl-thematic-press-classifier"

# Load the tokenizer and the fine-tuned classification model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

text = "O Concello de Santiago aprobou o novo plan de mobilidade urbana para reducir o tráfico no casco histórico."
# top_k=3 returns the three highest-scoring sections for the text.
results = classifier(text, top_k=3)

# Print the predictions ranked by score, highest first.
for i, result in enumerate(sorted(results, key=lambda x: x["score"], reverse=True), start=1):
    print(f"{i}. {result['label']:<20} {result['score']*100:.1f}%")

Example output

1. Movementos sociais    82.3%
2. Acontece               9.0%
3. Política               3.0%  
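When a single section is needed rather than a ranked list, the scores can be reduced to one decision, with low-confidence predictions left unassigned. This is a minimal sketch: `top_label` is a hypothetical helper, and the 0.5 default threshold is an arbitrary assumption rather than a value recommended for this model.

```python
def top_label(results, min_score=0.5):
    """Return the best label, or None if its score is below min_score.

    `results` is a list of {"label": str, "score": float} dicts, the shape
    returned by classifier(text, top_k=...) for a single text.
    """
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Scores taken from the example output above.
example = [
    {"label": "Movementos sociais", "score": 0.823},
    {"label": "Acontece", "score": 0.090},
    {"label": "Política", "score": 0.030},
]
print(top_label(example))                 # accepted at the default threshold
print(top_label(example, min_score=0.9))  # rejected by a stricter threshold
```

A suitable threshold depends on the application and is best chosen on held-out articles.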

Interactive CLI (optional)

For interactive exploration from the command line:

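# Reuses the `classifier` pipeline created in the Quick start section.
while True:
    text = input("Enter text to classify (or 'quit' to exit): ").strip()
    if text.lower() in {"quit", "exit", "q"}:
        break
    if not text:
        continue  # skip empty input instead of classifying it
    results = classifier(text, top_k=3)
    for i, r in enumerate(sorted(results, key=lambda x: x["score"], reverse=True), start=1):
        bar = "█" * int(r["score"] * 20)  # bar of up to 20 blocks, one per 5%
        print(f"  {i}. {r['label']:<20} {r['score']*100:5.1f}%  {bar}")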
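For non-interactive use, several articles can be passed to the pipeline in one call. The sketch below assumes that calling `classifier` with a list of texts and an explicit `top_k` returns one list of label/score dicts per input text; `classify_batch`, `print_batch`, and `articles` are hypothetical names introduced for illustration.

```python
def classify_batch(classifier, texts, top_k=3):
    """Classify several texts in one call and return the top section for each.

    Assumes classifier(list_of_texts, top_k=...) returns one list of
    {"label", "score"} dicts per input text.
    """
    texts = list(texts)
    if not texts:
        return []
    outputs = classifier(texts, top_k=top_k)
    rows = []
    for text, preds in zip(texts, outputs):
        best = max(preds, key=lambda p: p["score"])
        rows.append((text, best["label"], best["score"]))
    return rows


def print_batch(rows, width=40):
    """Print one line per text: a truncated snippet, its label, and its score."""
    for text, label, score in rows:
        snippet = text if len(text) <= width else text[: width - 3] + "..."
        print(f"{snippet:<{width}}  {label:<20} {score*100:5.1f}%")


# With the pipeline from the Quick start section and a list of strings:
#     print_batch(classify_batch(classifier, articles))
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded.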

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA.

Citation

@misc{proxectonos2026MrBERT-nos-gl-thematic-press-classifier,
  author       = {{Proxecto Nós}},
  title        = {{MrBERT-nos-gl-thematic-press-classifier}: Thematic Press Classification for Galician},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/MrBERT-nos-gl-thematic-press-classifier}},
}