metadata
language: en
license: mit
library_name: sklearn
tags:
- token-classification
- ner
- pii-detection
- sklearn
datasets:
- zachz/pii-detection-corpus
pipeline_tag: token-classification
PII NER Model
A lightweight scikit-learn Named Entity Recognition (NER) model for detecting Personally Identifiable Information (PII) in text.
Model Details
- Type: DictVectorizer + Logistic Regression pipeline
- Task: Token-level NER classification
- Framework: scikit-learn
- Labels: O, NAME, EMAIL, PHONE, SSN
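A pipeline of this shape can be sketched as follows. This is a minimal illustration, not the training code behind this model: the toy feature dicts, the labels, and the `max_iter` setting are assumptions chosen only to make the example run.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one feature dict and one label per token.
X_train = [
    {"word.lower": "jane@test.com", "word.has_at": True, "word.is_digit": False},
    {"word.lower": "call", "word.has_at": False, "word.is_digit": False},
    {"word.lower": "bob@mail.org", "word.has_at": True, "word.is_digit": False},
    {"word.lower": "or", "word.has_at": False, "word.is_digit": False},
]
y_train = ["EMAIL", "O", "EMAIL", "O"]

# DictVectorizer one-hot encodes string features and passes numeric/boolean
# features through; LogisticRegression then classifies each token vector.
pipeline = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

step_names = [name for name, _ in pipeline.steps]
prediction = pipeline.predict(
    [{"word.lower": "x@y.z", "word.has_at": True, "word.is_digit": False}]
)
```

Because the vectorizer sits inside the pipeline, `predict` accepts raw feature dicts directly, which matches how the model is called in the Usage section.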
Usage
import pickle

# Only unpickle files from sources you trust: pickle.load can execute arbitrary code.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

def extract_features(tokens, idx):
    """Build the feature dict for the token at position idx."""
    token = tokens[idx]
    features = {
        'word.lower': token.lower(),
        'word.length': len(token),
        'word.has_at': '@' in token,
        'word.is_digit': token.isdigit(),
    }
    # Add neighboring tokens as context when they exist
    if idx > 0:
        features['prev.lower'] = tokens[idx-1].lower()
    if idx < len(tokens) - 1:
        features['next.lower'] = tokens[idx+1].lower()
    return features

text = "Contact jane@test.com or call 555-123-4567"
tokens = text.split()  # whitespace tokenization, as used by the model
features = [extract_features(tokens, i) for i in range(len(tokens))]
predictions = model.predict(features)

# Keep only tokens labeled as PII, paired with their predicted label
entities = [(t, l) for t, l in zip(tokens, predictions) if l != "O"]
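One possible way to act on the per-token labels is to mask flagged tokens. The sketch below assumes a hypothetical `redact` helper, and uses hard-coded labels as a stand-in for `model.predict(features)`; they are not actual output from this model.

```python
def redact(tokens, labels):
    """Replace every token whose label is not "O" with a [LABEL] placeholder."""
    return " ".join(
        token if label == "O" else f"[{label}]"
        for token, label in zip(tokens, labels)
    )

tokens = "Contact jane@test.com or call 555-123-4567".split()
# Hypothetical labels standing in for model.predict(features)
labels = ["O", "EMAIL", "O", "O", "PHONE"]
redacted = redact(tokens, labels)
```

Since labels are assigned per whitespace token, the whole token is masked, including any punctuation attached to it.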
Limitations
- Small training set (12 examples)
- Simple whitespace tokenization
- English only
- Best used as a lightweight first-pass PII detector
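The tokenization limitation can be illustrated with a short sketch. The `strip_punct` helper below is hypothetical and not part of this model's preprocessing; because the model was trained on whitespace tokens, cleaning tokens this way may change the features it sees.

```python
text = "Email jane@test.com, or call (555) 123-4567."

# Whitespace splitting leaves punctuation attached to tokens and
# breaks the phone number into two separate tokens.
whitespace_tokens = text.split()

def strip_punct(token):
    """Trim common leading/trailing punctuation from a token (illustrative only)."""
    return token.strip(".,;:!?()[]\"'")

cleaned_tokens = [strip_punct(t) for t in whitespace_tokens]
```

Here the email token carries a trailing comma and the phone number spans two tokens, so a downstream step would need to merge or clean them before relying on the labels.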
License
MIT