Advanced HuggingFace Features - Comprehensive Guide (Full Tutorial)
Author: AYI-NEDJIMI | AI & Cybersecurity Consultant
This tutorial covers advanced HuggingFace features: Collections, Discussions, Webhooks, CLI, Git LFS, model evaluation, Leaderboards, Hub API, programmatic repo management, SEO tips, monetization, and Enterprise solutions.
1. Collections: Create, Organize, Share
Collections let you group repos (models, datasets, Spaces) thematically.
1.1 Create a Collection
from huggingface_hub import HfApi
api = HfApi(token="hf_your_token")
# Create a collection
collection = api.create_collection(
    title="CyberSec AI Toolkit",
    description="Complete collection of AI tools for cybersecurity: models, datasets, and applications.",
    private=False
)
print(f"Collection created: {collection.slug}")
# Add items to the collection
api.add_collection_item(
    collection_slug=collection.slug,
    item_id="AYI-NEDJIMI/cybersec-threat-classifier",
    item_type="model",
    note="Cybersec threat classifier based on BERT"
)
api.add_collection_item(
    collection_slug=collection.slug,
    item_id="AYI-NEDJIMI/Dataset-Explorer",
    item_type="space",
    note="Interactive dataset explorer"
)
1.2 Manage a Collection
# List your collections
# Note: list_collections truncates each collection's item list;
# use api.get_collection(slug) when you need every item
collections = api.list_collections(owner="AYI-NEDJIMI")
for c in collections:
    print(f" {c.title} ({c.slug}) - {len(c.items)} items")
# Update a collection
api.update_collection_metadata(
    collection_slug=collection.slug,
    title="CyberSec AI Toolkit 2026",
    description="Updated collection with the latest models"
)
Discover our collection: CyberSec AI Portfolio
2. Discussions and Community Features
2.1 Repo Discussions
Every model, dataset, and Space has a Discussions tab:
from huggingface_hub import HfApi
api = HfApi(token="hf_your_token")
# Create a discussion
discussion = api.create_discussion(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space",
    title="Improvement suggestion",
    description="Here's a suggestion to improve the explorer..."
)
print(f"Discussion created: {discussion.url}")
# Comment on a discussion
api.comment_discussion(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space",
    discussion_num=discussion.num,
    comment="Thanks for this suggestion! We'll implement it."
)
# List discussions
discussions = api.get_repo_discussions(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space"
)
for d in discussions:
    print(f" #{d.num}: {d.title} ({d.status})")
2.2 Pull Requests
HuggingFace repos support Pull Requests like GitHub:
# Create a PR
# api.create_pull_request(
# repo_id="AYI-NEDJIMI/my-model",
# title="Improve model card",
# description="Added usage examples and benchmarks"
# )
3. Webhooks and CI/CD
3.1 Webhooks
Webhooks let you receive notifications when a repo is modified:
# Configure via web interface:
# 1. Settings > Webhooks
# 2. Your endpoint URL
# 3. Events: push, discussion, etc.
# Example Flask server to receive webhooks
from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route("/webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json(silent=True) or {}
    # The Hub describes the event in the JSON body ("action" and "scope")
    event = payload.get("event", {})
    # A push to the repo content arrives as an "update" on "repo.content"
    if event.get("scope") == "repo.content" and event.get("action") == "update":
        repo = payload.get("repo", {})
        print(f"New push on {repo.get('name')}")
        # Trigger your CI/CD pipeline here
    return jsonify({"status": "ok"}), 200
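A public endpoint should also check that a request really comes from the Hub. Below is a minimal sketch of that check, assuming the secret you set in the webhook settings arrives in the `X-Webhook-Secret` header. The `WEBHOOK_SECRET` environment variable and the `is_trusted` helper are hypothetical names chosen for this example.

```python
import hmac
import os

# Hypothetical: the secret typed into the webhook settings, read from the environment
EXPECTED_SECRET = os.environ.get("WEBHOOK_SECRET", "")

def is_trusted(headers, expected=None):
    """Return True when the request carries the expected webhook secret."""
    expected = EXPECTED_SECRET if expected is None else expected
    received = headers.get("X-Webhook-Secret", "")
    if not expected:
        return False  # Refuse everything when no secret is configured
    # compare_digest avoids leaking the secret through timing differences
    return hmac.compare_digest(received.encode(), expected.encode())
```

Inside `handle_webhook`, you could call `is_trusted(request.headers)` first and return a 401 response when it fails.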
3.2 CI/CD with GitHub Actions
# .github/workflows/deploy-to-hf.yml
name: Deploy to HuggingFace
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Push to HuggingFace
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pip install huggingface_hub
          huggingface-cli upload AYI-NEDJIMI/my-space . . --repo-type space
4. HuggingFace CLI (huggingface-cli)
4.1 Essential Commands
# Authentication
huggingface-cli login
huggingface-cli whoami
# Download
huggingface-cli download gpt2
huggingface-cli download gpt2 config.json # Single file
huggingface-cli download gpt2 --local-dir ./gpt2 # To a folder
# Upload
huggingface-cli upload AYI-NEDJIMI/my-model ./local_dir
huggingface-cli upload AYI-NEDJIMI/my-model model.safetensors --repo-type model
# Create a repo
huggingface-cli repo create my-new-model --type model
huggingface-cli repo create my-dataset --type dataset
huggingface-cli repo create my-space --type space -y
# Cache management
huggingface-cli scan-cache
huggingface-cli delete-cache
# Environment info
huggingface-cli env
4.2 Cache Management
# View cache
huggingface-cli scan-cache
# Output:
# REPO ID REPO TYPE SIZE NB FILES LAST_ACCESSED REFS LOCAL PATH
# gpt2 model 548.1M 11 1 day ago main ~/.cache/huggingface/hub/models--gpt2
# Clean cache
huggingface-cli delete-cache
# Set cache directory
export HF_HOME=/path/to/cache
export HF_HUB_CACHE=/path/to/cache/hub
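The two variables interact. To our understanding, `HF_HUB_CACHE` wins when it is set. Otherwise the hub cache is the `hub` subfolder of `HF_HOME`, which defaults to `~/.cache/huggingface`. The sketch below mirrors that lookup order in simplified form (it ignores `XDG_CACHE_HOME`, for instance). `resolve_hub_cache` is a hypothetical helper, not a library function.

```python
import os
from pathlib import Path

def resolve_hub_cache(env=None):
    """Return the hub cache directory implied by the environment (simplified)."""
    env = os.environ if env is None else env
    # An explicit hub cache location takes priority
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise fall back to <HF_HOME>/hub, with the usual default for HF_HOME
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"

print(resolve_hub_cache({"HF_HOME": "/path/to/cache"}))
print(resolve_hub_cache())  # Uses the real environment of the current process
```

Passing the environment as a dictionary keeps the function easy to test without touching real environment variables.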
5. Git LFS and Version Control
5.1 How It Works
HuggingFace repos use Git + Git LFS:
- Git: for text files (configs, code, README)
- Git LFS: for large files (models, datasets)
# Clone a repo without downloading large files
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/gpt2
cd gpt2
# Large files are LFS pointers until you pull them
cat model.safetensors # Shows LFS pointer
# Download actual files
git lfs pull
# Configure LFS for new types
git lfs track "*.safetensors"
git lfs track "*.bin"
git lfs track "*.gguf"
git add .gitattributes
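An LFS pointer is a small text file of `key value` lines (`version`, `oid`, `size`) that stands in for the real file. Here is a minimal sketch of reading one. `parse_lfs_pointer` is a hypothetical helper, and the sample pointer uses a made-up hash and size rather than data from a real repo.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict, or return None if it is not one."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # Every pointer line has the form "key value"
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        return None
    algo, _, digest = fields.get("oid", "").partition(":")
    if not digest or not fields.get("size", "").isdigit():
        return None
    return {"hash_algo": algo, "oid": digest, "size": int(fields["size"])}

# Illustrative pointer content (made-up hash and size)
sample = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "ab" * 32 + "\n"
    "size 12345\n"
)
print(parse_lfs_pointer(sample))
```

A `None` result for a file you expected to be a pointer usually means the real content is already on disk.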
5.2 Branches and Tags
# Create a branch for a version
git checkout -b v2.0
git add .
git commit -m "v2.0: new model"
git push origin v2.0
# Tags for versions
git tag v2.0.0
git push origin v2.0.0
6. Model Evaluation with the evaluate Library
import evaluate
# Load metrics
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
# Compute accuracy
predictions = [0, 1, 1, 0, 1, 0]
references = [0, 1, 0, 0, 1, 1]
result = accuracy.compute(predictions=predictions, references=references)
print(f"Accuracy: {result['accuracy']:.4f}")
# Compute F1
result = f1.compute(predictions=predictions, references=references, average="weighted")
print(f"F1: {result['f1']:.4f}")
# Compute BLEU (for translation/generation)
predictions_text = ["the cat sat on the mat"]
references_text = [["the cat is sitting on the mat"]]
result = bleu.compute(predictions=predictions_text, references=references_text)
print(f"BLEU: {result['bleu']:.4f}")
# Compute ROUGE (for summarization)
predictions_text = ["The cat sat on the mat."]
references_text = ["The cat is sitting on the mat."]
result = rouge.compute(predictions=predictions_text, references=references_text)
print(f"ROUGE-1: {result['rouge1']:.4f}")
print(f"ROUGE-L: {result['rougeL']:.4f}")
# Combine multiple metrics
combined = evaluate.combine(["accuracy", "f1", "precision", "recall"])
result = combined.compute(predictions=predictions, references=references)
print(result)
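To make the accuracy and weighted F1 values less of a black box, you can recompute them in plain Python. This is a sketch of the standard definitions, not the library's implementation, and the helper names are ours.

```python
from collections import Counter

predictions = [0, 1, 1, 0, 1, 0]
references = [0, 1, 0, 0, 1, 1]

def accuracy_score(preds, refs):
    """Fraction of positions where prediction and reference agree."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

def weighted_f1(preds, refs):
    """Per-class F1, weighted by each class's share of the references."""
    support = Counter(refs)
    total = 0.0
    for label, count in support.items():
        tp = sum(p == label and r == label for p, r in zip(preds, refs))
        fp = sum(p == label and r != label for p, r in zip(preds, refs))
        fn = sum(p != label and r == label for p, r in zip(preds, refs))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count / len(refs)
    return total

print(f"Accuracy: {accuracy_score(predictions, references):.4f}")  # 4 of 6 match -> 0.6667
print(f"F1: {weighted_f1(predictions, references):.4f}")  # both classes score 2/3 -> 0.6667
```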
7. Leaderboards (Open LLM Leaderboard)
7.1 Open LLM Leaderboard
The Open LLM Leaderboard is the reference for comparing open-source LLMs:
Benchmarks used by the original (v1) leaderboard:
- MMLU: general knowledge (57 subjects)
- ARC: scientific reasoning
- HellaSwag: sentence completion
- TruthfulQA: answer truthfulness
- Winogrande: common sense reasoning
- GSM8K: mathematics
7.2 Submit Your Model
To submit a model to the leaderboard:
- Upload your model to the Hub
- Ensure it works with transformers
- Submit via the leaderboard form
- Wait for evaluation (~24-48h)
8. Papers with Code Integration
HuggingFace integrates research papers with their implementations:
- Daily Papers: AI papers selected daily by the community
- Links: each paper can be linked to models and datasets
- Discussions: community comments on each paper
# Link a paper to your model (in model card YAML)
---
paper: arxiv:2401.12345
---
9. HF Hub API (Python SDK)
9.1 Complete API
from huggingface_hub import HfApi
api = HfApi(token="hf_your_token")
# === MODELS ===
models = api.list_models(search="cybersecurity", sort="downloads", limit=5)
info = api.model_info("gpt2")
print(f"ID: {info.id}")
print(f"Author: {info.author}")
print(f"Pipeline tag: {info.pipeline_tag}")
print(f"Tags: {info.tags}")
print(f"Downloads: {info.downloads}")
print(f"Likes: {info.likes}")
# === DATASETS ===
datasets = api.list_datasets(search="security", limit=5)
ds_info = api.dataset_info("squad_v2")
# === SPACES ===
spaces = api.list_spaces(search="chatbot", limit=5)
sp_info = api.space_info("AYI-NEDJIMI/Dataset-Explorer")
# === FILES ===
files = api.list_repo_files("gpt2")
tree = api.list_repo_tree("gpt2")
for item in tree:
    print(f" {item.path}")
9.2 Repo Operations
# Create a repo
api.create_repo("AYI-NEDJIMI/new-model", repo_type="model", private=False)
# Upload a file
api.upload_file(
    path_or_fileobj="./model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model",
    commit_message="Initial model upload"
)
# Upload a folder
api.upload_folder(
    folder_path="./my_model",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model",
    commit_message="Complete model upload"
)
# Delete a file
api.delete_file(
    path_in_repo="old_file.bin",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model"
)
10. Programmatic Repo Management
10.1 Automate Publishing
from huggingface_hub import HfApi
api = HfApi(token="hf_your_token")
def publish_model(model_path, repo_id, model_card_content):
    """Publish a model with its model card."""
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    with open(f"{model_path}/README.md", "w") as f:
        f.write(model_card_content)
    api.upload_folder(
        folder_path=model_path,
        repo_id=repo_id,
        repo_type="model",
        commit_message="Automated model publication"
    )
    print(f"Model published: https://huggingface.co/{repo_id}")
10.2 Monitor Your Repos
def monitor_repos(username):
    """Monitor your repo statistics."""
    api = HfApi()
    models = api.list_models(author=username)
    total_downloads = 0
    total_likes = 0
    for m in models:
        # Either field can be None, so normalize before adding or formatting
        downloads = m.downloads or 0
        likes = m.likes or 0
        total_downloads += downloads
        total_likes += likes
        print(f" Model: {m.id:40s} | DL: {downloads:>8,} | Likes: {likes:>4}")
    print(f"\n Total Downloads: {total_downloads:,}")
    print(f" Total Likes: {total_likes}")
# monitor_repos("AYI-NEDJIMI")
11. SEO Tips for HuggingFace Repos
11.1 Optimize Visibility
- Descriptive name: include task, language, architecture
  - Good: cybersec-threat-classifier-bert-english
  - Bad: my-model-v2
- Complete tags in YAML:
---
language: en
license: mit
pipeline_tag: text-classification
tags:
- cybersecurity
- threat-detection
- english
- bert
- security
datasets:
- AYI-NEDJIMI/cybersec-threats
---
- Rich model card: description, code examples, benchmarks, limitations
- Usage examples: directly copyable code
- Metrics: include your scores (accuracy, F1, BLEU)
- Links: to your papers, website, other repos
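The naming advice can be turned into a small helper. This is only a sketch: `build_repo_name` and its task, architecture, language ordering are our assumptions for illustration, not a Hub convention.

```python
import re

def build_repo_name(task, architecture, language):
    """Join the descriptive parts into a lowercase, hyphen-separated repo name."""
    parts = []
    for part in (task, architecture, language):
        # Replace every run of characters other than letters and digits with a hyphen
        slug = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if slug:
            parts.append(slug)
    return "-".join(parts)

print(build_repo_name("CyberSec Threat Classifier", "BERT", "English"))
# cybersec-threat-classifier-bert-english
```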
11.2 Increase Visibility
- Share on social media (Twitter/X, LinkedIn)
- Publish a blog post (HuggingFace Blog, Medium)
- Submit to relevant leaderboards
- Participate in community discussions
- Create thematic collections
12. Monetization and Licensing Strategies
12.1 Common Licenses
| License | Commercial Use | Modifications | Redistribution Conditions |
|---|---|---|---|
| MIT | Yes | Yes | Keep copyright and license notice |
| Apache 2.0 | Yes | Yes | Keep license and notices; state changes |
| CC-BY-4.0 | Yes | Yes | Attribution required |
| CC-BY-NC-4.0 | No | Yes | Attribution required |
| Llama 3.1 License | Yes (under 700M monthly active users) | Yes | Specific conditions |
12.2 Monetization Strategies
- Consulting: offer your AI expertise services
- Training: create courses and tutorials
- SaaS: deploy your models as a service
- Dual Licensing: free version + premium version
- Premium Support: technical support for your models
13. HuggingFace for Enterprise
13.1 Private Hub (Enterprise Hub)
- Dedicated and secure instance
- SSO with SAML/OIDC
- Granular roles and permissions
- Complete audit logs
- Compliance (SOC2, HIPAA)
13.2 Inference Endpoints
from huggingface_hub import InferenceClient
# Dedicated production endpoint
client = InferenceClient(
    model="https://xyz123.us-east-1.aws.endpoints.huggingface.cloud",
    token="hf_your_token"
)
response = client.text_generation(
    "Analyze this cybersecurity threat:",
    max_new_tokens=200,
    temperature=0.3
)
13.3 Enterprise Benefits
- Performance: dedicated GPUs, guaranteed latency
- Security: VPC, encryption, isolation
- Scalability: auto-scaling 0 to N replicas
- Support: dedicated team, SLA
- Compliance: industry certifications
Conclusion
HuggingFace offers much more than the Hub and libraries. Collections, Discussions, Webhooks, CLI, evaluation, and Enterprise features make HuggingFace a complete platform for the entire AI lifecycle: from prototyping to production.
Discover our complete collection: CyberSec AI Portfolio
Tutorial written by AYI-NEDJIMI - AI & Cybersecurity Consultant
For more resources, check out our other tutorials in this series.