PII Engineer — Multilingual NER v2.1
Fast, multilingual PII detection model. Detects 30+ PII types across 50+ languages from a single model, no GPU required.
Benchmarks
| Metric | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| F1 (multilingual) | 0.86 | 0.44 | 0.64 | 0.52 |
| F1 (English) | 0.88 | 0.80 | 0.83 | 0.82 |
| Languages | 50+ | ~10 locales | 1 per model | 12 |
| Latency (p50) | 180ms | 80ms (w/ NER) | 120ms | 200ms |
| GPU required | No | No | Optional | N/A |
| Cost (1M req/mo) | $42 | $42 | $42 | ~$1,000 |
Accuracy by Language
| Language | F1 |
|---|---|
| English | 0.931 |
| Chinese | 0.918 |
| Vietnamese | 0.912 |
| Korean | 0.905 |
| Indonesian | 0.901 |
| Malay | 0.895 |
| Hindi | 0.892 |
| Thai | 0.885 |
| Tamil | 0.878 |
Per-Entity Accuracy
| Entity Type | F1 |
|---|---|
| email_address | 0.970 |
| phone_number | 0.968 |
| government_id | 0.920 |
| bank_account_number | 0.915 |
| street_address | 0.891 |
| date_of_birth | 0.887 |
| passport_number | 0.880 |
| license_plate | 0.833 |
| person_name | 0.823 |
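For readers who want to reproduce a per-entity score on their own data, here is a minimal sketch of per-type F1. It assumes exact-match scoring on `(entity_type, start, end)` tuples; the card does not say whether the benchmark uses exact or partial matching. The function name, the character offsets, and the sample predictions are all hypothetical.

```python
from collections import Counter

def per_type_f1(gold, predicted):
    """Exact-match F1 per entity type.

    gold and predicted are lists of (entity_type, start, end) tuples.
    Exact span matching is an assumption, not the documented benchmark protocol.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = Counter(t for t, _, _ in gold_set & pred_set)  # correct predictions
    fp = Counter(t for t, _, _ in pred_set - gold_set)  # spurious predictions
    fn = Counter(t for t, _, _ in gold_set - pred_set)  # missed gold entities
    scores = {}
    for etype in set(tp) | set(fp) | set(fn):
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[etype] + fp[etype] + fn[etype]
        scores[etype] = 2 * tp[etype] / denom if denom else 0.0
    return scores

# Hypothetical annotations for "John Doe, NRIC S9012345B, born 12 March 1985"
gold = [("person_name", 0, 8), ("government_id", 15, 24), ("date_of_birth", 31, 44)]
pred = [("person_name", 0, 8), ("government_id", 15, 24), ("person_name", 31, 36)]
scores = per_type_f1(gold, pred)
for etype, f1 in sorted(scores.items()):
    print(f"{etype}: {f1:.3f}")
```

In this made-up case the spurious `person_name` span lowers that type's precision, and the missed `date_of_birth` gives that type an F1 of zero.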
PII Types Detected
person_name · phone_number · government_id · street_address · date_of_birth · email_address · passport_number · license_plate · bank_account_number
Model Architecture
- Base: GLiNER2 (span-based NER)
- Encoder: mDeBERTa-v3-base (280M params), fine-tuned with LoRA on PII data
- Inference: 5 ONNX models (encoder, span_rep, count_embed, count_pred, classifier)
- Quantization: INT8 encoder available (~15-20% faster on x86 CPU)
- Total size: ~620MB (all languages)
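To illustrate what "span-based" means in the list above: rather than tagging tokens one by one, a span-based model scores candidate spans against each label. The sketch below only enumerates candidate spans. The whitespace tokenization, the `max_width` value, and the function name are assumptions for illustration; the actual model uses its own tokenizer and the ONNX components listed above.

```python
def enumerate_spans(tokens, max_width=8):
    """Yield (start, end, text) for every span of up to max_width tokens.

    end is exclusive. max_width is an illustrative value, not the model's setting.
    """
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            yield start, end, " ".join(tokens[start:end])

# Whitespace splitting stands in for the real tokenizer
tokens = "John Doe lives at 42 Orchard Road".split()
spans = list(enumerate_spans(tokens, max_width=3))
for start, end, text in spans[:5]:
    print(start, end, text)
```

Each candidate span would then be embedded and compared with the label representations, and spans scoring above a threshold become detected entities.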
Quick Start
With PII Engineer Server (Rust)
git clone https://github.com/gantz-ai/pii.engineer
cd pii.engineer
cargo build --release --package pii-engineer-server
cargo run --release --package pii-engineer-server
# Models auto-download on first run
# API at http://localhost:8000
curl -X POST http://localhost:8000/api/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "John Doe, NRIC S9012345B, born 12 March 1985"}'
With Python
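import requests

# Send the text and the labels to look for to the local server
resp = requests.post("http://localhost:8000/api/detect", json={
    "text": "John Doe lives at 42 Orchard Road, Singapore 238879",
    "labels": ["person_name", "street_address", "phone_number", "email_address"]
})
resp.raise_for_status()  # stop here on an HTTP error instead of parsing a bad body

# Print each detected entity with its confidence score
for entity in resp.json()["entities"]:
    print(f'{entity["type"]}: {entity["value"]} (score: {entity["score"]:.2f})')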
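A common next step is to drop low-confidence detections and group the rest by type. The sketch below assumes only the `type`, `value`, and `score` fields used in the example above. The `min_score` default, the function name, and the sample entities are hypothetical, not real model output.

```python
from collections import defaultdict

def group_confident(entities, min_score=0.5):
    """Group entity values by type, keeping only scores >= min_score.

    min_score=0.5 is an arbitrary illustrative threshold.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["type"]].append(entity["value"])
    return dict(grouped)

# Made-up entities in the same shape as resp.json()["entities"]
sample = [
    {"type": "person_name", "value": "John Doe", "score": 0.97},
    {"type": "street_address", "value": "42 Orchard Road, Singapore 238879", "score": 0.91},
    {"type": "phone_number", "value": "238879", "score": 0.21},
]
grouped = group_confident(sample, min_score=0.5)
print(grouped)
```

The right threshold depends on whether missed PII or false alarms cost more in your pipeline, so it is worth tuning on your own data.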
Download Models Manually
pip install huggingface_hub
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/PII-Engineer-Multi-NER-v2.1
huggingface-cli download pii-engineer/PII-Engineer-Chinese-NER-v1.0 --local-dir models/PII-Engineer-Chinese-NER-v1.0
Use Cases
- PDPA/GDPR/CCPA compliance — detect PII in databases, logs, documents
- Data anonymization — redact PII before sharing datasets
- CI/CD scanning — catch leaked PII in code and configs
- Chat/support data — clean PII from customer interactions
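The data anonymization use case can be sketched as a small post-processing step on the detection response. This assumes only the `type` and `value` fields shown in the Python quick start. The `redact` helper, the placeholder format, and the sample entities are hypothetical.

```python
def redact(text, entities):
    """Replace each detected value with a [TYPE] placeholder.

    Uses plain string replacement, so every occurrence of a value is replaced.
    Longer values go first so a value nested inside another is not left
    half-replaced.
    """
    redacted = text
    for entity in sorted(entities, key=lambda e: len(e["value"]), reverse=True):
        if entity["value"]:  # skip empty values; replacing "" would corrupt the text
            placeholder = f'[{entity["type"].upper()}]'
            redacted = redacted.replace(entity["value"], placeholder)
    return redacted

text = "John Doe lives at 42 Orchard Road, Singapore 238879"
# Made-up entities in the same shape as the API response
entities = [
    {"type": "person_name", "value": "John Doe", "score": 0.95},
    {"type": "street_address", "value": "42 Orchard Road, Singapore 238879", "score": 0.90},
]
result = redact(text, entities)
print(result)
```

If the API also returns character offsets (not confirmed here), replacing by offset would be more precise than matching on the value string.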
License
AGPL-3.0 — free for open-source use. Commercial license available at pii.engineer.
Citation
@software{pii_engineer,
title = {PII Engineer: Multilingual PII Detection},
url = {https://github.com/gantz-ai/pii.engineer},
year = {2026}
}