SetFit with hackathon-pln-es/paraphrase-spanish-distilroberta

This is a SetFit model for text classification. It uses hackathon-pln-es/paraphrase-spanish-distilroberta as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
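
The first step depends on sentence pairs built from the few labeled examples. The following is a minimal sketch of that pairing, assuming same-label pairs get a target of 1.0 and different-label pairs a target of 0.0. The function name and the toy data are hypothetical, and SetFit's own sampler also shuffles and balances the pairs, which this sketch omits.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples for every unordered pair.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical toy data, not taken from the training set.
texts = ["revisar correo", "tramitar correo", "configurar aplicativo", "verificar cámaras"]
labels = [0, 0, 2, 2]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(positives), len(negatives))
```

Once the body is fine-tuned on such pairs, its embeddings of the original labeled texts serve as the features for the classification head in the second step.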

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: hackathon-pln-es/paraphrase-spanish-distilroberta
Classification head: a LogisticRegression instance
Maximum Sequence Length: 128 tokens
Number of Classes: 4 classes

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
2.0	'GESTIÓN DE LA PLANEACIÓN INSTITUCIONALACTIVIDADES TRANSVERSALES PARA EL PROCESO DE PLANEACIÓN INSTITUCIONALConfigurar aplicativo SAILFO' 'GESTIÓN DE LA PLANEACIÓN INSTITUCIONALACTIVIDADES TRANSVERSALES PARA EL PROCESO DE PLANEACIÓN INSTITUCIONALVerificar el almacenamiento de las cámaras en relación con la capacidad, la cantidad de almacenamiento disponible y verificar que esten grabado.' 'GESTIÓN DE LA PLANEACIÓN INSTITUCIONALACTIVIDADES TRANSVERSALES PARA EL PROCESO DE PLANEACIÓN INSTITUCIONALLlevar a cabo la atención telefonica'
1.0	'GESTIÓN ADMINISTRATIVAADMINISTRATIVO - MANEJO DE CORRESPONDENCIAFirmar el recibido en la planilla incluyendo fecha y hora o recibir por sistema.' 'GESTIÓN DEL SERVICIO PERICIALSEGURIDAD Y SALUD EN EL TRABAJO - INVESTIGACIÓN Y SEGUIMIENTO DE LOS ACCIDENTES E INCIDENTES DE TRABAJOVerificar el cumplimiento de los planes de intervención y realizar el respectivo registro a las acciones ejecutadas para cada accidente de presunto origen laboral' 'GESTIÓN DEL SERVICIO PERICIALMETROLOGÍA - ACONDICIONAMIENTO DE LABORATORIO, LIMPIEZA Y DISPOSICIÓN DE DESECHOS EN LAS ÁREAS DEL GRUPO DE METROLOGÍAMonitorear las condiciones ambientales de los laboratorios'
0.0	'GESTIÓN DEL SERVICIO PERICIALACTIVIDADES TRANSVERSALES PARA EL PROCESO DE GESTIÓN DEL SERVICIO PERICIALRevisar el correo institucional de la dependencia y tramitar según el tema' 'GESTIÓN DEL SERVICIO PERICIALPATOLOGÍA - ABORDAJE DE CADÁVERES QUEMADOS, CARBONIZADOS Y CALCINADOSDeterminar y clasificar si el cadáver muestra cambios por quemaduras, carbonización o calcinación para así definir el abordaje de necropsia medicolegal en cadáver quemado, carbonizado o calcinado' 'GESTIÓN DEL SERVICIO PERICIALPATOLOGÍA - ATENCIÓN Y BÚSQUEDA DE UN DESAPARECIDO ENTRE CADÁVERES SOMETIDOS A NECROPSIA MEDICO LEGALIntegrar el informe de identificación al informe pericial de necropsia.'
3.0	'GESTIÓN DEL SISTEMA DE EVALUACIÓN Y CONTROLCONTROL - ASESORÍA CONTROL INTERNO Incluir las necesidades, solicitud de charla o asesoría elaboración de informes en el PUNA' 'GESTIÓN DEL SISTEMA DE EVALUACIÓN Y CONTROLCONTROL - ASESORÍA CONTROL INTERNO Consultar los documentos necesarios con el fin de preparar la temática, en caso de contar acompañante(s), definir las actividades y tareas con ellos.' 'GESTIÓN DEL SISTEMA DE EVALUACIÓN Y CONTROLCONTROL - ENTRENAMIENTO Y REENTRENAMIENTO EN TEMAS DE CONTROL INTERNO Y AUDITORÍASDesarrollar el objetivo y el contenido temático del modulo(s)'

Evaluation

Metrics

Label	Accuracy
all	0.96

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("rovargasc/setfit-model_actividadesMedicinaLegalV1")
# Run inference
preds = model("GESTIÓN DEL SERVICIO PERICIALANTROPOLOGÍA - ANÁLISIS ANTROPOLÓGICO FORENSERealizar la toma de muestras de la escrictura osea con la anuencia del Médico.")
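
The label examples and the inference input above look like three fields (a process, a procedure, and an activity) joined with no separator. Assuming that is how the training texts were built, new inputs probably work best when formatted the same way. The helper below is a hypothetical sketch of that formatting; the field names and the split points are inferred from the examples, not documented by the author.

```python
def build_input(process: str, procedure: str, activity: str) -> str:
    """Join the three fields with no separator, mirroring the label examples."""
    return f"{process}{procedure}{activity}"


# Field boundaries here are an assumption read off one of the label examples.
text = build_input(
    "GESTIÓN ADMINISTRATIVA",
    "ADMINISTRATIVO - MANEJO DE CORRESPONDENCIA",
    "Firmar el recibido en la planilla incluyendo fecha y hora o recibir por sistema.",
)
print(text)
```

The resulting string can then be passed to the model in the same way as the inference example.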

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	6	26.1733	65

Label	Training Sample Count
0.0	69
1.0	79
2.0	75
3.0	77

Training Hyperparameters

batch_size: (64, 64)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
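
The loss listed above, CosineSimilarityLoss, compares the cosine similarity of two sentence embeddings with a target value for the pair. The sketch below illustrates the idea in plain Python, assuming a mean squared error between similarity and target (the default in Sentence Transformers). The function names and the 2-dimensional embeddings are hypothetical stand-ins for real model outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cos(u, v) and each pair's target."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Hypothetical embeddings: (embedding_a, embedding_b, target)
batch = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, same label: no error
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, different labels: no error
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # identical vectors, different labels: error of 1
]
loss = cosine_similarity_loss(batch)
print(loss)
```

Minimizing this quantity pulls same-label embeddings together and pushes different-label embeddings toward orthogonality.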

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0009	1	0.1977	-
0.0474	50	0.0986	-
0.0949	100	0.0514	-
0.1423	150	0.0025	-
0.1898	200	0.0012	-
0.2372	250	0.0014	-
0.2846	300	0.0003	-
0.3321	350	0.0003	-
0.3795	400	0.0002	-
0.4269	450	0.0001	-
0.4744	500	0.0002	-
0.5218	550	0.0001	-
0.5693	600	0.0002	-
0.6167	650	0.0001	-
0.6641	700	0.0001	-
0.7116	750	0.0002	-
0.7590	800	0.0001	-
0.8065	850	0.0001	-
0.8539	900	0.0001	-
0.9013	950	0.0001	-
0.9488	1000	0.0001	-
0.9962	1050	0.0001	-
1.0	1054	-	0.0517
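
The final step count of 1054 can be cross-checked against the training sample counts, the embedding batch size of 64, and the oversampling strategy. The arithmetic below is a sketch under two assumptions: the 300 labeled samples listed above were the whole contrastive training set, and oversampling draws as many pairs of each kind (same-label and different-label) as the larger of the two groups. Variable names are illustrative.

```python
import math

# Per-label counts from the Training Sample Count table.
sample_counts = {0.0: 69, 1.0: 79, 2.0: 75, 3.0: 77}
batch_size = 64  # embedding-phase batch size from the hyperparameters

total = sum(sample_counts.values())

# Unordered same-label pairs, summed over labels; the rest are different-label pairs.
positive_pairs = sum(math.comb(n, 2) for n in sample_counts.values())
negative_pairs = math.comb(total, 2) - positive_pairs

# Assumption: oversampling draws max(positive, negative) pairs of each kind.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)

print(positive_pairs, negative_pairs, pairs_per_epoch, steps_per_epoch)
```

Under these assumptions the result is 1054 steps per epoch, which agrees with the last row of the table; the logged epoch fractions follow as step / 1054 (for example, 50 / 1054 ≈ 0.0474).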

Framework Versions

Python: 3.10.13
SetFit: 1.0.3
Sentence Transformers: 3.0.1
Transformers: 4.40.0
PyTorch: 2.1.2
Datasets: 2.20.0
Tokenizers: 0.19.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

Model size: 0.1B params (Safetensors)
Tensor type: F32

Base model: somosnlp-hackathon-2022/paraphrase-spanish-distilroberta
