---
library_name: transformers
tags:
- contrastive-learning
- Spanish-UMLS
- Hierarchical-enrichment
license: mit
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-es
---

# HERBERT: Leveraging UMLS Hierarchical Knowledge to Enhance Clinical Entity Normalization in Spanish

**HERBERT-GP** is a contrastive-learning-based bi-encoder for medical entity normalization in Spanish. It leverages hierarchical relationships from UMLS (parents and grandparents) to enhance the candidate retrieval step for entity linking in Spanish clinical texts.

**Key features:**

- Base model: [PlanTL-GOB-ES/roberta-base-biomedical-clinical-es](https://huggingface.co/PlanTL-GOB-ES/roberta-base-biomedical-clinical-es)
- Trained with 30 positive pairs per anchor using synonyms, parents, and grandparents from UMLS/SNOMED-CT.
- Task: Normalization of disease, procedure, and symptom mentions to SNOMED-CT/UMLS codes.
- Domain: Spanish biomedical/clinical texts.
- Corpora: DisTEMIST, MedProcNER, SympTEMIST.

---

## Evaluation (top-k accuracy):

| Corpus     | Top-1 | Top-5 | Top-25 | Top-200 |
|------------|-------|-------|--------|---------|
| DisTEMIST  | 0.585 | 0.727 | 0.808  | 0.871   |
| SympTEMIST | 0.632 | 0.783 | 0.884  | 0.948   |
| MedProcNER | 0.655 | 0.770 | 0.840  | 0.891   |
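
Top-k accuracy counts a mention as correct when its gold code appears among the k highest-ranked candidates. The sketch below illustrates how such a score could be computed for a bi-encoder retrieval step. It is not the evaluation script behind the reported numbers. It assumes cosine similarity as the ranking function and that several terms sharing a code are collapsed into one ranked entry. The function names, the placeholder codes (`C1`, `C2`, `C3`), and the 2-d toy vectors are all hypothetical stand-ins for real embeddings and SNOMED-CT/UMLS codes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(mention_vec, candidates):
    """Return candidate codes sorted by decreasing similarity to the mention.

    `candidates` is a list of (code, vector) pairs. A code may appear several
    times (one entry per term or synonym); duplicates are collapsed so each
    code keeps its best-scoring position. This is an assumption of the sketch.
    """
    scored = sorted(
        candidates,
        key=lambda pair: cosine(mention_vec, pair[1]),
        reverse=True,
    )
    ranked, seen = [], set()
    for code, _ in scored:
        if code not in seen:
            seen.add(code)
            ranked.append(code)
    return ranked


def top_k_accuracy(rankings, gold_codes, k):
    """Fraction of mentions whose gold code is among the first k ranked codes."""
    hits = sum(1 for ranked, gold in zip(rankings, gold_codes) if gold in ranked[:k])
    return hits / len(gold_codes)


# Toy 2-d embeddings standing in for bi-encoder outputs (made-up values).
candidates = [
    ("C1", [1.0, 0.0]),
    ("C2", [0.0, 1.0]),
    ("C3", [1.0, 1.0]),
    ("C1", [0.9, 0.1]),  # a second term (e.g. a synonym) for the same code
]
mentions = [[1.0, 0.1], [0.1, 1.0]]
gold = ["C1", "C3"]

rankings = [rank_candidates(m, candidates) for m in mentions]
top1 = top_k_accuracy(rankings, gold, k=1)
top2 = top_k_accuracy(rankings, gold, k=2)
print(f"Top-1: {top1:.3f}  Top-2: {top2:.3f}")
```

In practice the toy vectors would be replaced by embeddings of the mentions and of the terminology terms produced by the model, and k would take the values reported in the table (1, 5, 25, 200).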