Advanced HuggingFace Features - Comprehensive Guide (Full Tutorial)

Author: AYI-NEDJIMI | AI & Cybersecurity Consultant

This tutorial covers advanced HuggingFace features: Collections, Discussions, Webhooks, CLI, Git LFS, model evaluation, Leaderboards, Hub API, programmatic repo management, SEO tips, monetization, and Enterprise solutions.


1. Collections: Create, Organize, Share

Collections let you group repos (models, datasets, Spaces) thematically.

1.1 Create a Collection

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# Create a collection
collection = api.create_collection(
    title="CyberSec AI Toolkit",
    description="Complete collection of AI tools for cybersecurity: models, datasets, and applications.",
    private=False
)
print(f"Collection created: {collection.slug}")

# Add items to the collection
api.add_collection_item(
    collection_slug=collection.slug,
    item_id="AYI-NEDJIMI/cybersec-threat-classifier",
    item_type="model",
    note="Cybersec threat classifier based on BERT"
)

api.add_collection_item(
    collection_slug=collection.slug,
    item_id="AYI-NEDJIMI/Dataset-Explorer",
    item_type="space",
    note="Interactive dataset explorer"
)

1.2 Manage a Collection

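# List your collections
# Note: this listing truncates each collection to at most 4 items;
# call api.get_collection(c.slug) when you need the full item list
collections = api.list_collections(owner="AYI-NEDJIMI")
for c in collections:
    print(f"  {c.title} ({c.slug}) - {len(c.items)} items shown")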

# Update a collection
api.update_collection_metadata(
    collection_slug=collection.slug,
    title="CyberSec AI Toolkit 2026",
    description="Updated collection with the latest models"
)

Discover our collection: CyberSec AI Portfolio


2. Discussions and Community Features

2.1 Repo Discussions

Every model, dataset, and Space has a Discussions tab:

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# Create a discussion
discussion = api.create_discussion(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space",
    title="Improvement suggestion",
    description="Here's a suggestion to improve the explorer..."
)
print(f"Discussion created: {discussion.url}")

# Comment on a discussion
api.comment_discussion(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space",
    discussion_num=discussion.num,
    comment="Thanks for this suggestion! We'll implement it."
)

# List discussions
discussions = api.get_repo_discussions(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space"
)
for d in discussions:
    print(f"  #{d.num}: {d.title} ({d.status})")

2.2 Pull Requests

HuggingFace repos support Pull Requests like GitHub:

# Create a PR
# api.create_pull_request(
#     repo_id="AYI-NEDJIMI/my-model",
#     title="Improve model card",
#     description="Added usage examples and benchmarks"
# )

3. Webhooks and CI/CD

3.1 Webhooks

Webhooks let you receive notifications when a repo is modified:

# Configure via web interface:
# 1. Settings > Webhooks
# 2. Your endpoint URL
# 3. Events: push, discussion, etc.

# Example Flask server to receive webhooks
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_webhook():
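    payload = request.get_json(silent=True) or {}
    # The Hub describes the event in the JSON body: {"action": ..., "scope": ...}
    event = payload.get("event", {})

    # A push to the repo's files arrives as an "update" on the "repo.content" scope
    if event.get("scope") == "repo.content" and event.get("action") == "update":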
        repo = payload.get("repo", {})
        print(f"New push on {repo.get('name')}")
        # Trigger your CI/CD pipeline here

    return jsonify({"status": "ok"}), 200
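
The handler above accepts any POST request. When a secret is set on the webhook, the Hub sends it back in the `X-Webhook-Secret` header, so the endpoint can reject calls that do not carry it. A minimal sketch of that check, assuming the helper name `is_authorized` and an environment variable called `WEBHOOK_SECRET` (both hypothetical, not part of any library):

```python
import hmac


def is_authorized(headers, expected_secret):
    """Return True if the request carries the expected webhook secret.

    `headers` is any mapping of header names to values (for Flask,
    `request.headers`).
    """
    received = headers.get("X-Webhook-Secret", "")
    if not expected_secret:
        # Refuse everything when no secret is configured on our side.
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode(), expected_secret.encode())


# Hypothetical usage at the top of handle_webhook() (needs `import os`):
#     if not is_authorized(request.headers, os.environ.get("WEBHOOK_SECRET", "")):
#         return jsonify({"error": "unauthorized"}), 401
```

Keeping the check in a plain function makes it easy to unit-test without starting the Flask app.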

3.2 CI/CD with GitHub Actions

# .github/workflows/deploy-to-hf.yml
name: Deploy to HuggingFace
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Push to HuggingFace
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pip install huggingface_hub
          huggingface-cli upload AYI-NEDJIMI/my-space . . --repo-type space

4. HuggingFace CLI (huggingface-cli)

4.1 Essential Commands

# Authentication
huggingface-cli login
huggingface-cli whoami

# Download
huggingface-cli download gpt2
huggingface-cli download gpt2 config.json  # Single file
huggingface-cli download gpt2 --local-dir ./gpt2  # To a folder

# Upload
huggingface-cli upload AYI-NEDJIMI/my-model ./local_dir
huggingface-cli upload AYI-NEDJIMI/my-model model.safetensors --repo-type model

# Create a repo
huggingface-cli repo create my-new-model --type model
huggingface-cli repo create my-dataset --type dataset
huggingface-cli repo create my-space --type space -y

# Cache management
huggingface-cli scan-cache
huggingface-cli delete-cache

# Environment info
huggingface-cli env

4.2 Cache Management

# View cache
huggingface-cli scan-cache
# Output:
# REPO ID  REPO TYPE  SIZE    NB FILES  LAST_ACCESSED  REFS  LOCAL PATH
# gpt2     model      548.1M  11        1 day ago      main  ~/.cache/huggingface/hub/models--gpt2

# Clean cache
huggingface-cli delete-cache

# Set cache directory
export HF_HOME=/path/to/cache
export HF_HUB_CACHE=/path/to/cache/hub
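
The two variables above interact: `HF_HUB_CACHE` wins when it is set, otherwise the cache lives in the `hub` folder under `HF_HOME`, which itself defaults to `~/.cache/huggingface`. A small sketch of that precedence, useful for checking where files will land before a large download. The function name `resolve_hub_cache` is an assumption, and the sketch ignores `XDG_CACHE_HOME` for brevity:

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Return the directory the Hub cache would use for a given environment.

    Precedence sketched here: HF_HUB_CACHE, then HF_HOME/hub, then the
    default ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


print(resolve_hub_cache())
```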

5. Git LFS and Version Control

5.1 How It Works

HuggingFace repos use Git + Git LFS:

  • Git: for text files (configs, code, README)
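  • Git LFS: for large files (models, datasets)

# Clone a repo without downloading LFS content (pointers only).
# A plain "git clone" with git-lfs installed downloads the large files right away.
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/gpt2
cd gpt2

# Large files are now small LFS pointer files
cat model.safetensors  # Shows the LFS pointer, not the weights

# Download the actual files
git lfs pull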

# Configure LFS for new types
git lfs track "*.safetensors"
git lfs track "*.bin"
git lfs track "*.gguf"
git add .gitattributes
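
An LFS pointer is a short text file made of `key value` lines: a version line, the object id (`oid`) with its hash algorithm, and the size in bytes. A minimal sketch of a parser that tells a pointer apart from real content; `parse_lfs_pointer` is a hypothetical helper, and the hash and size in the example are made up:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict, or return None."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # not a "key value" line, so not a pointer
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        return None
    algo, _, digest = fields.get("oid", "").partition(":")
    if not digest or not fields.get("size", "").isdigit():
        return None
    return {"algorithm": algo, "oid": digest, "size": int(fields["size"])}


# Illustrative pointer with a made-up hash and size.
example = "\n".join([
    "version https://git-lfs.github.com/spec/v1",
    "oid sha256:" + "ab" * 32,
    "size 548105171",
]) + "\n"
print(parse_lfs_pointer(example))
```

A check like this can guard a loading script against accidentally reading a pointer file as if it were model weights.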

5.2 Branches and Tags

# Create a branch for a version
git checkout -b v2.0
git add .
git commit -m "v2.0: new model"
git push origin v2.0

# Tags for versions
git tag v2.0.0
git push origin v2.0.0
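
Branches and tags matter because every file on the Hub is addressable by revision: the download URL follows the pattern `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`, with a `datasets/` or `spaces/` prefix for those repo types. The `huggingface_hub` library provides `hf_hub_url` for this; the sketch below rebuilds the pattern with the standard library only, under the hypothetical name `resolve_url`, to show how a tag such as `v2.0.0` pins a download:

```python
from urllib.parse import quote

# URL prefix per repo type; models live at the root.
_PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def resolve_url(repo_id, filename, revision="main", repo_type="model"):
    """Build the download URL of one file at a given branch, tag, or commit."""
    prefix = _PREFIXES[repo_type]
    return (
        f"https://huggingface.co/{prefix}{repo_id}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


print(resolve_url("AYI-NEDJIMI/my-model", "config.json", revision="v2.0.0"))
```

Pinning a tag or commit hash instead of `main` keeps downstream code reproducible when the repo is updated.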

6. Model Evaluation with the evaluate Library

import evaluate

# Load metrics
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Compute accuracy
predictions = [0, 1, 1, 0, 1, 0]
references = [0, 1, 0, 0, 1, 1]

result = accuracy.compute(predictions=predictions, references=references)
print(f"Accuracy: {result['accuracy']:.4f}")

# Compute F1
result = f1.compute(predictions=predictions, references=references, average="weighted")
print(f"F1: {result['f1']:.4f}")

# Compute BLEU (for translation/generation)
predictions_text = ["the cat sat on the mat"]
references_text = [["the cat is sitting on the mat"]]
result = bleu.compute(predictions=predictions_text, references=references_text)
print(f"BLEU: {result['bleu']:.4f}")

# Compute ROUGE (for summarization)
predictions_text = ["The cat sat on the mat."]
references_text = ["The cat is sitting on the mat."]
result = rouge.compute(predictions=predictions_text, references=references_text)
print(f"ROUGE-1: {result['rouge1']:.4f}")
print(f"ROUGE-L: {result['rougeL']:.4f}")

# Combine multiple metrics
combined = evaluate.combine(["accuracy", "f1", "precision", "recall"])
result = combined.compute(predictions=predictions, references=references)
print(result)
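
To make clear what the accuracy and weighted F1 calls above compute, here is a dependency-free sketch of both metrics on the same toy labels. The function names are our own, and the code is an illustration of the definitions, not the implementation used by the evaluate library:

```python
def accuracy_score(predictions, references):
    """Fraction of positions where prediction and reference agree."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def weighted_f1(predictions, references):
    """Per-class F1, averaged with each class weighted by its support."""
    pairs = list(zip(predictions, references))
    total = len(references)
    score = 0.0
    for label in set(references) | set(predictions):
        tp = sum(p == label and r == label for p, r in pairs)
        fp = sum(p == label and r != label for p, r in pairs)
        fn = sum(p != label and r == label for p, r in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of references with this label
        score += f1 * support / total
    return score


predictions = [0, 1, 1, 0, 1, 0]
references = [0, 1, 0, 0, 1, 1]
print(f"Accuracy: {accuracy_score(predictions, references):.4f}")  # 0.6667
print(f"Weighted F1: {weighted_f1(predictions, references):.4f}")  # 0.6667
```

Four of the six predictions match, and each class has two true positives, one false positive, and one false negative, so both scores come out to 2/3.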

7. Leaderboards (Open LLM Leaderboard)

7.1 Open LLM Leaderboard

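The Open LLM Leaderboard has been the main reference for comparing open-source LLMs:

Benchmarks used in its first version (the second version, launched in 2024, replaced them with IFEval, BBH, MATH, GPQA, MuSR, and MMLU-Pro):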

  • MMLU: general knowledge (57 subjects)
  • ARC: scientific reasoning
  • HellaSwag: sentence completion
  • TruthfulQA: answer truthfulness
  • Winogrande: common sense reasoning
  • GSM8K: mathematics

7.2 Submit Your Model

To submit a model to the leaderboard:

  1. Upload your model to the Hub
  2. Ensure it works with transformers
  3. Submit via the leaderboard form
  4. Wait for evaluation (~24-48h)

8. Papers with Code Integration

HuggingFace integrates research papers with their implementations:

  • Daily Papers: AI papers selected daily by the community
  • Links: each paper can be linked to models and datasets
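  • Discussions: community comments on each paper

# Link a paper to your model: cite its arXiv URL in the model card (README.md).
# The Hub extracts the arXiv ID and adds an arxiv:<id> tag to the repo.
# Example line in the card body (placeholder ID):
This model is described in https://arxiv.org/abs/2401.12345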

9. HF Hub API (Python SDK)

9.1 Complete API

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# === MODELS ===
models = api.list_models(search="cybersecurity", sort="downloads", limit=5)

info = api.model_info("gpt2")
print(f"ID: {info.id}")
print(f"Author: {info.author}")
print(f"Pipeline tag: {info.pipeline_tag}")
print(f"Tags: {info.tags}")
print(f"Downloads: {info.downloads}")
print(f"Likes: {info.likes}")

# === DATASETS ===
datasets = api.list_datasets(search="security", limit=5)
ds_info = api.dataset_info("squad_v2")

# === SPACES ===
spaces = api.list_spaces(search="chatbot", limit=5)
sp_info = api.space_info("AYI-NEDJIMI/Dataset-Explorer")

# === FILES ===
files = api.list_repo_files("gpt2")
tree = api.list_repo_tree("gpt2")
for item in tree:
    print(f"  {item.path}")

9.2 Repo Operations

# Create a repo
api.create_repo("AYI-NEDJIMI/new-model", repo_type="model", private=False)

# Upload a file
api.upload_file(
    path_or_fileobj="./model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model",
    commit_message="Initial model upload"
)

# Upload a folder
api.upload_folder(
    folder_path="./my_model",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model",
    commit_message="Complete model upload"
)

# Delete a file
api.delete_file(
    path_in_repo="old_file.bin",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model"
)

10. Programmatic Repo Management

10.1 Automate Publishing

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

def publish_model(model_path, repo_id, model_card_content):
    """Publish a model with its model card."""
    api.create_repo(repo_id, repo_type="model", exist_ok=True)

    with open(f"{model_path}/README.md", "w") as f:
        f.write(model_card_content)

    api.upload_folder(
        folder_path=model_path,
        repo_id=repo_id,
        repo_type="model",
        commit_message="Automated model publication"
    )

    print(f"Model published: https://huggingface.co/{repo_id}")

10.2 Monitor Your Repos

def monitor_repos(username):
    """Monitor your repo statistics."""
    api = HfApi()
    models = api.list_models(author=username)
    total_downloads = 0
    total_likes = 0
    for m in models:
        downloads = m.downloads or 0  # the API may return None
        likes = m.likes or 0
        total_downloads += downloads
        total_likes += likes
        print(f"  Model: {m.id:40s} | DL: {downloads:>8,} | Likes: {likes:>4}")

    print(f"\n  Total Downloads: {total_downloads:,}")
    print(f"  Total Likes: {total_likes}")

# monitor_repos("AYI-NEDJIMI")
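
Separating the aggregation from the API call makes the monitoring logic testable offline. A minimal sketch, assuming a hypothetical helper `summarize` that accepts any objects exposing `downloads` and `likes` attributes; the stand-in objects and their numbers are made up:

```python
from types import SimpleNamespace


def summarize(models):
    """Sum downloads and likes, treating missing (None) values as 0."""
    totals = {"models": 0, "downloads": 0, "likes": 0}
    for m in models:
        totals["models"] += 1
        totals["downloads"] += getattr(m, "downloads", None) or 0
        totals["likes"] += getattr(m, "likes", None) or 0
    return totals


# Stand-in objects with made-up numbers, in place of api.list_models(...)
fake_models = [
    SimpleNamespace(id="user/model-a", downloads=1200, likes=7),
    SimpleNamespace(id="user/model-b", downloads=None, likes=None),
]
print(summarize(fake_models))
```

In real use, the result of `api.list_models(author=username)` can be passed to the same function.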

11. SEO Tips for HuggingFace Repos

11.1 Optimize Visibility

  1. Descriptive name: include task, language, architecture

    • Good: cybersec-threat-classifier-bert-english
    • Bad: my-model-v2
  2. Complete tags in YAML:

---
language: en
license: mit
pipeline_tag: text-classification
tags:
  - cybersecurity
  - threat-detection
  - english
  - bert
  - security
datasets:
  - AYI-NEDJIMI/cybersec-threats
---
  3. Rich model card: description, code examples, benchmarks, limitations
  4. Usage examples: directly copyable code
  5. Metrics: include your scores (accuracy, F1, BLEU)
  6. Links: to your papers, website, other repos
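
The YAML metadata and the card body can be generated together, which keeps tags consistent across many repos. A minimal sketch, assuming a hypothetical helper `build_model_card` that handles only flat values and lists of strings (enough for the fields shown above). The `huggingface_hub` library also ships `ModelCard` and `ModelCardData` classes for richer cases:

```python
def build_model_card(metadata, body):
    """Render a README.md string: YAML front matter followed by Markdown."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, (list, tuple)):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


card = build_model_card(
    {
        "language": "en",
        "license": "mit",
        "pipeline_tag": "text-classification",
        "tags": ["cybersecurity", "threat-detection", "bert"],
        "datasets": ["AYI-NEDJIMI/cybersec-threats"],
    },
    "# CyberSec Threat Classifier\n\nDescription, usage example, metrics, limitations.",
)
print(card)
```

The resulting string can be passed as `model_card_content` to the `publish_model` function from section 10.1.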

11.2 Increase Visibility

  • Share on social media (Twitter/X, LinkedIn)
  • Publish a blog post (HuggingFace Blog, Medium)
  • Submit to relevant leaderboards
  • Participate in community discussions
  • Create thematic collections

12. Monetization and Licensing Strategies

12.1 Common Licenses

License            Commercial Use                    Modifications  Sharing
MIT                Yes                               Yes            Optional
Apache 2.0         Yes                               Yes            Optional
CC-BY-4.0          Yes                               Yes            Attribution required
CC-BY-NC-4.0       No                                Yes            Attribution required
Llama 3.1 License  Yes (<700M monthly active users)  Yes            Specific conditions

12.2 Monetization Strategies

  1. Consulting: offer your AI expertise services
  2. Training: create courses and tutorials
  3. SaaS: deploy your models as a service
  4. Dual Licensing: free version + premium version
  5. Premium Support: technical support for your models

13. HuggingFace for Enterprise

13.1 Private Hub (Enterprise Hub)

  • Dedicated and secure instance
  • SSO with SAML/OIDC
  • Granular roles and permissions
  • Complete audit logs
  • Compliance (SOC2, HIPAA)

13.2 Inference Endpoints

from huggingface_hub import InferenceClient

# Dedicated production endpoint
client = InferenceClient(
    model="https://xyz123.us-east-1.aws.endpoints.huggingface.cloud",
    token="hf_your_token"
)

response = client.text_generation(
    "Analyze this cybersecurity threat:",
    max_new_tokens=200,
    temperature=0.3
)
print(response)  # generated text, returned as a string

13.3 Enterprise Benefits

  • Performance: dedicated GPUs, guaranteed latency
  • Security: VPC, encryption, isolation
  • Scalability: auto-scaling 0 to N replicas
  • Support: dedicated team, SLA
  • Compliance: industry certifications
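
One practical consequence of scaling to zero replicas: an endpoint that has scaled down needs time to start again, and requests sent during that window can fail. A minimal client-side sketch of retrying with exponential backoff. `call_with_retry` is a hypothetical helper, and catching every exception is a simplification; real code would narrow it to the errors your client actually raises:

```python
import time


def call_with_retry(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on any exception with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... between tries.
    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Hypothetical usage with the client from section 13.2:
#     text = call_with_retry(
#         lambda: client.text_generation("Analyze this cybersecurity threat:",
#                                        max_new_tokens=200)
#     )
```

The `sleep` parameter is injectable so the helper can be tested without actually waiting.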

Conclusion

HuggingFace offers much more than the Hub and libraries. Collections, Discussions, Webhooks, CLI, evaluation, and Enterprise features make HuggingFace a complete platform for the entire AI lifecycle: from prototyping to production.

Discover our complete collection: CyberSec AI Portfolio


Tutorial written by AYI-NEDJIMI - AI & Cybersecurity Consultant
For more resources, check out our other tutorials in this series.
