Tags: Token Classification, Transformers, Safetensors, Estonian, camembert, estonian, quantifier-extraction, roberta
How to use ahtokiil/est-roberta-quant-extraction_EKI with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="ahtokiil/est-roberta-quant-extraction_EKI")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("ahtokiil/est-roberta-quant-extraction_EKI")
model = AutoModelForTokenClassification.from_pretrained("ahtokiil/est-roberta-quant-extraction_EKI")
Est-RoBERTa for Quantifier Extraction (Estonian)
This model is a fine-tuned version of EMBEDDIA/est-roberta, trained on a custom dataset to extract quantifier constructions from Estonian text (e.g., "kari koeri", "a pack of dogs"; "hunnik raamatuid", "a pile of books").
It performs token classification using the BIO labeling scheme with the following labels:
- O: outside any quantifier expression
- B-QUANT: beginning of a quantifier expression
- I-QUANT: inside a quantifier expression
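To illustrate how BIO labels map back to phrases, here is a minimal sketch that groups a label sequence into spans. The function name `bio_to_spans`, its lenient handling of an `I-QUANT` that does not follow a `B-QUANT`, and the word-level labels in the usage lines are assumptions for illustration; they are not taken from the model's code or its actual output.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO labels into (start, end_exclusive, type) spans.

    Hypothetical helper for illustration. An I- label that cannot continue
    an open span of the same type is treated leniently as a span start.
    """
    spans = []
    start, ent_type = None, None
    for i, label in enumerate(labels):
        if label == "O":
            # Outside: close any span that is currently open
            if start is not None:
                spans.append((start, i, ent_type))
                start, ent_type = None, None
            continue
        prefix, _, typ = label.partition("-")
        # B- always opens a new span; I- opens one only if it cannot continue the current span
        if prefix == "B" or start is None or typ != ent_type:
            if start is not None:
                spans.append((start, i, ent_type))
            start, ent_type = i, typ
    # Close a span that runs to the end of the sequence
    if start is not None:
        spans.append((start, len(labels), ent_type))
    return spans


# Illustrative word-level labels (hypothetical, not actual model output)
words = ["Arsti", "juures", "tuli", "tükk", "aega", "oodata", "."]
labels = ["O", "O", "O", "B-QUANT", "I-QUANT", "O", "O"]
for start, end, ent_type in bio_to_spans(labels):
    print(ent_type, " ".join(words[start:end]))  # prints: QUANT tükk aega
```

Libraries such as seqeval implement the same span grouping for evaluation; this sketch only shows the idea.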
Training and Evaluation
Epochs: 12
Batch size: 8
Test set: 159 positive cases, 1000 negative cases
Precision: 87.05%
Recall: 94.53%
F1-score: 90.64%
Accuracy: 99.88%
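As a consistency check, the reported F1-score matches the harmonic mean of the reported precision and recall. The card does not state whether these scores are computed per entity span or per token, so this sketch verifies only the arithmetic; `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above
print(f"F1 = {f1_score(0.8705, 0.9453):.2%}")  # prints: F1 = 90.64%
```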
Funding
This work was supported by the Estonian Research Council grant (PRG 1978).
Example Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "ahtokiil/est-roberta-quant-extraction_EKI"
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Estonian: "At the doctor's, one had to wait quite a while."
sentence = "Arsti juures tuli tükk aega oodata."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():  # no gradients needed for prediction
    outputs = model(**inputs)

# Pick the highest-scoring label for each subword token (special tokens included)
predictions = torch.argmax(outputs.logits, dim=2)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
print(list(zip(tokens, labels)))
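The example above prints one label per subword token, including the `<s>` and `</s>` special tokens. To read the result per word, one common option is to keep the label of each word's first subword. Below is a minimal sketch, assuming the tokenizer marks word-initial pieces with "▁" (as SentencePiece tokenizers such as CamemBERT's do); the function `merge_subwords`, the special-token set, and the sample tokens are hypothetical and are not actual output of this model's tokenizer.

```python
from typing import List, Tuple

# Special tokens assumed for a CamemBERT/RoBERTa-style SentencePiece tokenizer
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def merge_subwords(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge SentencePiece pieces into whitespace-delimited words.

    A piece starting with "▁" opens a new word; other pieces are appended
    to the previous word. Each word keeps the label of its first piece.
    """
    words: List[List[str]] = []
    for tok, lab in zip(tokens, labels):
        if tok in SPECIAL_TOKENS:
            continue  # skip <s>, </s>, padding
        if tok.startswith("▁") or not words:
            words.append([tok.lstrip("▁"), lab])
        else:
            words[-1][0] += tok  # continuation piece: extend the current word
    return [(word, lab) for word, lab in words]


# Hypothetical tokenization and labels, for illustration only
tokens = ["<s>", "▁Arst", "i", "▁juures", "▁tuli", "▁tükk", "▁aega", "▁oo", "data", ".", "</s>"]
labels = ["O", "O", "O", "O", "O", "B-QUANT", "I-QUANT", "O", "O", "O", "O"]
print(merge_subwords(tokens, labels))
```

With a fast tokenizer, `inputs.word_ids()` gives the token-to-word mapping directly, which avoids relying on the "▁" marker.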