KlarKI: EU AI Act Named Entity Recognition (spaCy)

Named entity recognition: extracts 8 compliance-specific entity types from EU AI Act and GDPR documents.

Part of KlarKI, a local-first EU AI Act + GDPR compliance auditor for German SMEs. All inference runs on-device. No data leaves your machine.


Model Overview

Property          Value
Base model        de_core_news_lg
Architecture      spaCy 3.7 NER pipeline (tok2vec + NER)
Parameters        ~560k word vectors + custom NER head
Languages         German (primary), English
Training samples  ~4,000+ train / ~1,000+ validation
License           MIT
Part of           KlarKI audit pipeline

Quickstart

Option A: Via KlarKI (recommended)

Use this if you want the full audit pipeline. The download script places all 5 models exactly where KlarKI expects them.

git clone https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor.git
cd KlarKI-EU-AI-Act-compliance-auditor
pip install "huggingface-hub>=0.26.0"
python scripts/download_pretrained.py --model ner
./run.sh up

Option B: Direct usage

from huggingface_hub import snapshot_download
import spacy

model_path = snapshot_download("s4nkar/klarki-ner-spacy")
nlp = spacy.load(f"{model_path}/model-final")

doc = nlp("The provider must maintain technical documentation under Article 11 of the EU AI Act.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected output, one entity per line:
# provider ACTOR
# technical documentation PROCEDURE
# Article 11 ARTICLE
# EU AI Act REGULATION

Labels

Label           Description
ARTICLE         References to specific articles (e.g. 'Article 9', 'Artikel 13', 'Art. 14')
OBLIGATION      Legal obligations (e.g. 'must document', 'shall maintain', 'are required to')
ACTOR           Regulated parties (e.g. 'providers', 'operators', 'importers', 'notified bodies')
AI_SYSTEM       AI system references (e.g. 'high-risk AI system', 'emotion recognition system')
RISK_TIER       Risk classifications (e.g. 'high-risk', 'prohibited', 'hochriskant')
PROCEDURE       Regulatory procedures (e.g. 'conformity assessment', 'risk management system')
REGULATION      Regulation names (e.g. 'EU AI Act', 'GDPR', 'DSGVO', 'KI-Gesetz')
PROHIBITED_USE  Prohibited practices (e.g. 'social scoring', 'real-time biometric surveillance')
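
Downstream code often needs the extracted entities bucketed by label. The following is a minimal sketch, assuming entities arrive as `(text, label)` pairs such as `(ent.text, ent.label_)` taken from a spaCy `Doc`. The `KLARKI_LABELS` constant and the `group_by_label` helper are hypothetical names used for illustration and are not part of KlarKI.

```python
from collections import defaultdict

# The 8 labels listed in the Labels table. The constant name is illustrative.
KLARKI_LABELS = frozenset({
    "ARTICLE", "OBLIGATION", "ACTOR", "AI_SYSTEM",
    "RISK_TIER", "PROCEDURE", "REGULATION", "PROHIBITED_USE",
})


def group_by_label(entities):
    """Bucket (text, label) pairs by label, rejecting unknown labels."""
    grouped = defaultdict(list)
    for text, label in entities:
        if label not in KLARKI_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        grouped[label].append(text)
    return dict(grouped)


# Hand-written pairs for illustration; not model output.
pairs = [("Article 11", "ARTICLE"), ("provider", "ACTOR"), ("Art. 14", "ARTICLE")]
grouped = group_by_label(pairs)
print(grouped)
```

Rejecting unknown labels early makes a mismatch between the loaded model and the expected label set visible instead of silently dropping entities.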

Evaluation Results

Metrics are not available yet. Run the model locally to generate them.
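
One way to generate entity-level metrics locally is to compare predicted spans against gold annotations using exact matching. The sketch below assumes each span is a `(start_char, end_char, label)` tuple. The `prf` function name and the sample spans are illustrative, and this is not necessarily how KlarKI's own evaluation scores the model.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans for illustration only; these are not results from this model.
gold = [(4, 12, "ACTOR"), (27, 50, "PROCEDURE"), (57, 67, "ARTICLE")]
pred = [(4, 12, "ACTOR"), (57, 67, "ARTICLE"), (75, 84, "ACTOR")]
precision, recall, f1 = prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

spaCy's built-in `nlp.evaluate`, called on a list of `Example` objects, reports the same kind of score under the `ents_f` key and is the more convenient route when gold data is already in spaCy format.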


Training Details

Property            Value
Base model          de_core_news_lg
Training epochs     60 (early stopping, patience=10)
Data generation     Deterministic template expansion + regulatory text extraction
NER backbone        tok2vec from de_core_news_lg kept active during training
Training framework  Docker container (Python 3.11, isolated from host)
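
The epoch budget and the patience setting interact as a simple stopping rule. The sketch below assumes that patience counts epochs without a new best validation score, which this card does not state explicitly. `evaluate_epoch` is a hypothetical callback standing in for "train one epoch and return the validation score", and the actual KlarKI training loop may differ.

```python
def train_with_early_stopping(evaluate_epoch, max_epochs=60, patience=10):
    """Run up to max_epochs, stopping after `patience` epochs without a new best."""
    best_score = float("-inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        score = evaluate_epoch(epoch)  # hypothetical: train one epoch, return dev score
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_score, epoch


# Toy score curve that peaks at epoch 5 and then declines.
def toy_scores(epoch):
    return 1.0 - abs(epoch - 5) * 0.01


best_epoch, best_score, last_epoch = train_with_early_stopping(toy_scores)
print(best_epoch, last_epoch)
```

With this rule the 60-epoch figure is an upper bound; a run whose validation score plateaus early ends well before it.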

Intended Use

Phase 1 of the KlarKI audit pipeline. Extracted entities feed directly into actor classification (AI_SYSTEM ownership signals) and the applicability gate (PROHIBITED_USE feeds Article 5 detection; RISK_TIER feeds Annex III detection).
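
The hand-off described above can be pictured as a small label-to-stage routing table. This is a sketch under stated assumptions: the stage names in `ROUTES` and the `route_entities` helper are hypothetical, since the card only states which label feeds which check, not how KlarKI implements the hand-off.

```python
# Hypothetical stage names; only the label-to-check mapping comes from the card.
ROUTES = {
    "AI_SYSTEM": "actor_classification",
    "PROHIBITED_USE": "article_5_detection",
    "RISK_TIER": "annex_iii_detection",
}


def route_entities(entities):
    """Map (text, label) pairs to the downstream stage that consumes them."""
    routed = {stage: [] for stage in ROUTES.values()}
    for text, label in entities:
        stage = ROUTES.get(label)
        if stage is not None:  # other labels are not routed in this sketch
            routed[stage].append(text)
    return routed


# Hand-written entities for illustration; not model output.
entities = [
    ("emotion recognition system", "AI_SYSTEM"),
    ("social scoring", "PROHIBITED_USE"),
    ("high-risk", "RISK_TIER"),
    ("Article 9", "ARTICLE"),
]
routed = route_entities(entities)
print(routed)
```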

This model is a decision-support tool, not a substitute for qualified legal advice. EU AI Act compliance determinations should always be reviewed by a legal professional.


Limitations

  • Trained on synthetic + regulatory text; may miss novel entity phrasings outside training distribution.
  • Inference capped at 1000 characters per chunk in KlarKI to limit latency.
  • German-primary base model; English coverage is strong but secondary.
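
When using the model directly on long documents, a similar cap can be applied by splitting the text before inference. The sketch below is one possible approach and not KlarKI's own chunker, whose splitting strategy is not described in this card. The `chunk_text` helper is a hypothetical name, and the 1000-character default mirrors the cap mentioned above.

```python
def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring whitespace breaks."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space so words are not cut in half.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks


sample = "The provider shall maintain technical documentation. " * 50
chunks = chunk_text(sample)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk can then be passed to `nlp` separately. Note that an entity straddling a chunk boundary may be split or missed, so splitting on sentence boundaries is a reasonable refinement.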

Citation

@software{klarki2026,
  author    = {Sankar},
  title     = {KlarKI: Local-First EU AI Act and GDPR Compliance Auditor},
  year      = {2026},
  url       = {https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor},
  note      = {Open-source compliance tooling for German SMEs}
}

About KlarKI

KlarKI is an open-source, local-first EU AI Act + GDPR compliance auditor built for German SMEs. Upload a policy document and receive a scored gap analysis against Articles 9–15, entirely on your own hardware.

Key features:

  • Deterministic legal decision hierarchy (actor detection, Annex III applicability gate)
  • Hybrid RAG retrieval (BM25 + ChromaDB vector + cross-encoder re-ranking)
  • LangGraph multi-agent gap analysis (3 nodes per applicable article)
  • Bilingual EN/DE support; all inference runs locally, with no external API calls
