PII Engineer — Multilingual NER v2.1

Fast, multilingual PII detection model. Detects 30+ PII types across 50+ languages from a single model, no GPU required.

Live Demo · Benchmarks · GitHub · Blog

Benchmarks

Metric              PII Engineer   Presidio         spaCy          AWS Comprehend
F1 (multilingual)   0.86           0.44             0.64           0.52
F1 (English)        0.88           0.80             0.83           0.82
Languages           50+            ~10 locales      1 per model    12
Latency (p50)       180 ms         80 ms (w/ NER)   120 ms         200 ms
GPU required        No             No               Optional       N/A
Cost (1M req/mo)    $42            $42              $42            ~$1,000

Full benchmarks →

Accuracy by Language

Language F1
English 0.931
Chinese 0.918
Vietnamese 0.912
Korean 0.905
Indonesian 0.901
Malay 0.895
Hindi 0.892
Thai 0.885
Tamil 0.878

Per-Entity Accuracy

Entity Type F1
email_address 0.970
phone_number 0.968
government_id 0.920
bank_account_number 0.915
street_address 0.891
date_of_birth 0.887
passport_number 0.880
license_plate 0.833
person_name 0.823
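
For readers unfamiliar with the metric, here is a minimal sketch of how an entity-level F1 score can be computed. It assumes exact matching on span offsets and label, which is a common convention but is not confirmed as the criterion behind the tables above. The function name `entity_f1` and the sample offsets are illustrative only.

```python
from typing import Set, Tuple

# An entity is (start_offset, end_offset, label). Exact span+label matching
# is an assumption here, not necessarily how the reported scores were computed.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Return F1 for one document under exact span+label matching."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: the third prediction has the wrong start offset,
# so it counts as both a false positive and a false negative.
gold = {(0, 8, "person_name"), (15, 24, "government_id"), (31, 44, "date_of_birth")}
pred = {(0, 8, "person_name"), (15, 24, "government_id"), (26, 44, "date_of_birth")}

score = entity_f1(gold, pred)
print(f"{score:.3f}")  # prints 0.667
```

Under a looser criterion (for example, partial overlap), the same predictions would score higher, so F1 values are only comparable when the matching rule is the same.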

PII Types Detected

person_name · phone_number · government_id · street_address · date_of_birth · email_address · passport_number · license_plate · bank_account_number

Model Architecture

  • Base: GLiNER2 (span-based NER)
  • Encoder: mDeBERTa-v3-base (280M params), fine-tuned with LoRA on PII data
  • Inference: 5 ONNX models (encoder, span_rep, count_embed, count_pred, classifier)
  • Quantization: INT8 encoder available (~15-20% faster on x86 CPU)
  • Total size: ~620MB (all languages)
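
To illustrate what "span-based" means in the list above, the sketch below enumerates candidate token spans up to a maximum width. In span-based NER, a model typically scores each such candidate against each label instead of tagging tokens one by one. This is a generic illustration, not GLiNER2's actual code; `enumerate_spans` and `max_width` are hypothetical names.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int) -> List[Tuple[int, int]]:
    """List every (start, end) token span, end exclusive, up to max_width tokens."""
    spans = []
    for start in range(len(tokens)):
        # The span may not run past the end of the token list.
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


tokens = ["John", "Doe", "lives", "here"]
candidate_spans = enumerate_spans(tokens, max_width=2)

# Each candidate would then be scored per label by the model (not shown).
for start, end in candidate_spans:
    print(start, end, " ".join(tokens[start:end]))
```

The number of candidates grows with both sequence length and maximum width, which is one reason span-based models usually cap the span width.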

Quick Start

With PII Engineer Server (Rust)

git clone https://github.com/gantz-ai/pii.engineer
cd pii.engineer
cargo build --release --package pii-engineer-server
# Start the server (runs in the foreground; models auto-download on first run)
cargo run --release --package pii-engineer-server

# In a second terminal, once the API is up at http://localhost:8000:
curl -X POST http://localhost:8000/api/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "John Doe, NRIC S9012345B, born 12 March 1985"}'

With Python

import requests

resp = requests.post("http://localhost:8000/api/detect", json={
    "text": "John Doe lives at 42 Orchard Road, Singapore 238879",
    "labels": ["person_name", "street_address", "phone_number", "email_address"]
}, timeout=30)
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error body

for entity in resp.json()["entities"]:
    print(f'{entity["type"]}: {entity["value"]} (score: {entity["score"]:.2f})')
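
Since each entity comes back with a score, a caller may want to drop low-confidence detections before acting on them. The sketch below assumes only the `type`, `value`, and `score` fields used in the example above. The helper `filter_entities`, the 0.5 threshold, and the sample data are illustrative and not part of the API.

```python
from typing import Dict, List


def filter_entities(entities: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep only entities whose score is at least min_score."""
    return [entity for entity in entities if entity["score"] >= min_score]


# Made-up entities shaped like the response fields used above.
sample_entities = [
    {"type": "person_name", "value": "John Doe", "score": 0.97},
    {"type": "street_address", "value": "42 Orchard Road", "score": 0.41},
]

kept = filter_entities(sample_entities, min_score=0.5)
for entity in kept:
    print(f'{entity["type"]}: {entity["value"]}')
```

A suitable threshold depends on the use case: compliance scanning usually favors recall (a lower threshold), while automated redaction of user-facing text may favor precision.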

Download Models Manually

pip install huggingface_hub
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/PII-Engineer-Multi-NER-v2.1
huggingface-cli download pii-engineer/PII-Engineer-Chinese-NER-v1.0 --local-dir models/PII-Engineer-Chinese-NER-v1.0

Use Cases

  • PDPA/GDPR/CCPA compliance — detect PII in databases, logs, documents
  • Data anonymization — redact PII before sharing datasets
  • CI/CD scanning — catch leaked PII in code and configs
  • Chat/support data — clean PII from customer interactions
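
As a minimal sketch of the anonymization use case, the function below replaces each detected value with a placeholder built from its type. It assumes only the `type` and `value` fields shown in the Python quick start; because no character offsets appear there, it falls back to plain string replacement, which redacts every occurrence of a value. The `redact` helper and the sample entities are hypothetical.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace every detected value in text with a [TYPE] placeholder."""
    # Replace longer values first so a short value that is a substring of a
    # longer one does not break the longer match.
    for entity in sorted(entities, key=lambda e: len(e["value"]), reverse=True):
        if not entity["value"]:
            continue  # str.replace with an empty string would corrupt the text
        placeholder = f'[{entity["type"].upper()}]'
        text = text.replace(entity["value"], placeholder)
    return text


# Made-up detections for the sentence used in the quick start.
text = "John Doe lives at 42 Orchard Road, Singapore 238879"
entities = [
    {"type": "person_name", "value": "John Doe", "score": 0.95},
    {"type": "street_address", "value": "42 Orchard Road, Singapore 238879", "score": 0.90},
]

redacted = redact(text, entities)
print(redacted)  # prints [PERSON_NAME] lives at [STREET_ADDRESS]
```

If the API also returns character offsets, slicing by offset would be more precise than string replacement, since it touches only the detected occurrence.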

License

AGPL-3.0 — free for open-source use. Commercial license available at pii.engineer.

Citation

@software{pii_engineer,
  title = {PII Engineer: Multilingual PII Detection},
  url = {https://github.com/gantz-ai/pii.engineer},
  year = {2026}
}