
Advanced HuggingFace Features - Comprehensive Guide (Full Tutorial)

by AYI-NEDJIMI - opened Feb 18

Author: AYI-NEDJIMI | AI & Cybersecurity Consultant

This tutorial covers advanced HuggingFace features: Collections, Discussions, Webhooks, CLI, Git LFS, model evaluation, Leaderboards, Hub API, programmatic repo management, SEO tips, monetization, and Enterprise solutions.

1. Collections: Create, Organize, Share

Collections let you group repos (models, datasets, Spaces) thematically.

1.1 Create a Collection

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# Create a collection
collection = api.create_collection(
    title="CyberSec AI Toolkit",
    description="Complete collection of AI tools for cybersecurity: models, datasets, and applications.",
    private=False
)
print(f"Collection created: {collection.slug}")

# Add items to the collection
api.add_collection_item(
    collection_slug=collection.slug,
    item_id="AYI-NEDJIMI/cybersec-threat-classifier",
    item_type="model",
    note="Cybersec threat classifier based on BERT"
)

api.add_collection_item(
    collection_slug=collection.slug,
    item_id="AYI-NEDJIMI/Dataset-Explorer",
    item_type="space",
    note="Interactive dataset explorer"
)

1.2 Manage a Collection

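# List your collections
# (list_collections returns at most 4 items per collection;
#  call api.get_collection(slug) to retrieve the full item list)
collections = api.list_collections(owner="AYI-NEDJIMI")
for c in collections:
    print(f"  {c.title} ({c.slug}) - {len(c.items)} items shown")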

# Update a collection
api.update_collection_metadata(
    collection_slug=collection.slug,
    title="CyberSec AI Toolkit 2026",
    description="Updated collection with the latest models"
)
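
Re-running a script that adds items will try to add the same repos again, and the Hub raises an error for an item that is already in the collection. The sketch below shows one way to make the step repeatable, relying on the `exists_ok` flag of `add_collection_item`. The names `Item`, `sync_collection`, and `ITEMS` are hypothetical helpers for illustration; `api` is any object that behaves like `HfApi`.

```python
from typing import Iterable, NamedTuple


class Item(NamedTuple):
    item_id: str    # e.g. "AYI-NEDJIMI/Dataset-Explorer"
    item_type: str  # "model", "dataset", "space" or "paper"
    note: str = ""


def sync_collection(api, collection_slug: str, items: Iterable[Item]) -> int:
    """Add every item to the collection without failing on duplicates.

    `api` is expected to behave like huggingface_hub.HfApi.
    Returns the number of add requests sent.
    """
    sent = 0
    for item in items:
        api.add_collection_item(
            collection_slug=collection_slug,
            item_id=item.item_id,
            item_type=item.item_type,
            note=item.note or None,
            exists_ok=True,  # do not raise if the item is already present
        )
        sent += 1
    return sent


# Hypothetical item list, reusing the repos from section 1.1
ITEMS = [
    Item("AYI-NEDJIMI/cybersec-threat-classifier", "model",
         "Cybersec threat classifier based on BERT"),
    Item("AYI-NEDJIMI/Dataset-Explorer", "space",
         "Interactive dataset explorer"),
]
```

Typical usage would be `sync_collection(HfApi(token=...), collection.slug, ITEMS)`. Passing the client as an argument keeps the function easy to test with a stub.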

Discover our collection: CyberSec AI Portfolio

2. Discussions and Community Features

2.1 Repo Discussions

Every model, dataset, and Space has a Discussions tab:

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# Create a discussion
discussion = api.create_discussion(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space",
    title="Improvement suggestion",
    description="Here's a suggestion to improve the explorer..."
)
print(f"Discussion created: {discussion.url}")

# Comment on a discussion
api.comment_discussion(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space",
    discussion_num=discussion.num,
    comment="Thanks for this suggestion! We'll implement it."
)

# List discussions
discussions = api.get_repo_discussions(
    repo_id="AYI-NEDJIMI/Dataset-Explorer",
    repo_type="space"
)
for d in discussions:
    print(f"  #{d.num}: {d.title} ({d.status})")

2.2 Pull Requests

HuggingFace repos support Pull Requests like GitHub:

# Create a PR
# api.create_pull_request(
#     repo_id="AYI-NEDJIMI/my-model",
#     title="Improve model card",
#     description="Added usage examples and benchmarks"
# )

3. Webhooks and CI/CD

3.1 Webhooks

Webhooks let you receive notifications when a repo is modified:

# Configure via web interface:
# 1. Settings > Webhooks
# 2. Your endpoint URL
# 3. Events: push, discussion, etc.

# Example Flask server to receive webhooks
from flask import Flask, request, jsonify

app = Flask(__name__)

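@app.route("/webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json(silent=True) or {}
    # The Hub describes the event in the JSON body, e.g.
    # {"event": {"scope": "repo.content", "action": "update"}, "repo": {...}}
    event = payload.get("event", {})

    if event.get("scope") == "repo.content" and event.get("action") == "update":
        repo = payload.get("repo", {})
        print(f"New push on {repo.get('name')}")
        # Trigger your CI/CD pipeline here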

    return jsonify({"status": "ok"}), 200
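
A public webhook endpoint should check that a request really comes from the Hub before acting on it. The sketch below assumes the secret set in the webhook settings arrives in an `X-Webhook-Secret` header and that the payload carries an `event` object with `scope` and `action` fields. `is_authorized` and `describe_event` are hypothetical helper names.

```python
import hmac


def is_authorized(headers: dict, expected_secret: str) -> bool:
    """Compare the received secret with the configured one in constant time."""
    received = headers.get("X-Webhook-Secret", "")
    return hmac.compare_digest(received.encode(), expected_secret.encode())


def describe_event(payload: dict) -> str:
    """Return a short 'scope:action on repo' summary of a webhook payload."""
    event = payload.get("event", {})
    repo = payload.get("repo", {})
    return f"{event.get('scope')}:{event.get('action')} on {repo.get('name')}"
```

Inside the Flask handler, a call such as `is_authorized(request.headers, os.environ["WEBHOOK_SECRET"])` could gate the request and return a 403 on mismatch; the environment variable name is an assumption.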

3.2 CI/CD with GitHub Actions

# .github/workflows/deploy-to-hf.yml
name: Deploy to HuggingFace
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Push to HuggingFace
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pip install huggingface_hub
          huggingface-cli upload AYI-NEDJIMI/my-space . . --repo-type space

4. HuggingFace CLI (huggingface-cli)

4.1 Essential Commands

# Authentication
huggingface-cli login
huggingface-cli whoami

# Download
huggingface-cli download gpt2
huggingface-cli download gpt2 config.json  # Single file
huggingface-cli download gpt2 --local-dir ./gpt2  # To a folder

# Upload
huggingface-cli upload AYI-NEDJIMI/my-model ./local_dir
huggingface-cli upload AYI-NEDJIMI/my-model model.safetensors --repo-type model

# Create a repo
huggingface-cli repo create my-new-model --type model
huggingface-cli repo create my-dataset --type dataset
huggingface-cli repo create my-space --type space -y

# Cache management
huggingface-cli scan-cache
huggingface-cli delete-cache

# Environment info
huggingface-cli env

4.2 Cache Management

# View cache
huggingface-cli scan-cache
# Output (columns abridged):
# REPO ID  REPO TYPE  SIZE    NB FILES  LAST_ACCESSED  REFS  LOCAL PATH
# gpt2     model      548.1M  11        1 day ago      main  ~/.cache/huggingface/hub/models--gpt2

# Clean cache
huggingface-cli delete-cache

# Set cache directory
export HF_HOME=/path/to/cache
export HF_HUB_CACHE=/path/to/cache/hub
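
When both variables are set, it helps to know which one wins. The sketch below mirrors the lookup order as a plain function: `HF_HUB_CACHE` first, then `HF_HOME` with a `hub` subfolder, then the default under `~/.cache/huggingface`. `resolve_hub_cache` is a hypothetical helper, and it is a simplification that ignores other variables such as `XDG_CACHE_HOME`.

```python
import os
from pathlib import Path


def resolve_hub_cache(env: dict) -> Path:
    """Return the hub cache directory implied by the given environment."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    return Path("~/.cache/huggingface").expanduser() / "hub"


# Show where the cache would live for the current shell
print(resolve_hub_cache(dict(os.environ)))
```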

5. Git LFS and Version Control

5.1 How It Works

HuggingFace repos use Git + Git LFS:

Git: for text files (configs, code, README)
Git LFS: for large files (models, datasets)

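# Clone a repo without downloading LFS content
# (with git-lfs installed, a plain `git clone` fetches the large files directly)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/gpt2
cd gpt2

# Large files are now LFS pointers
cat model.safetensors  # Shows the LFS pointer (version, oid, size)

# Download the actual files
git lfs pull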

# Configure LFS for new types
git lfs track "*.safetensors"
git lfs track "*.bin"
git lfs track "*.gguf"
git add .gitattributes
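
An LFS pointer is a small text file with three fields: `version`, `oid`, and `size`. The sketch below parses one, which can be handy for checking whether a checked-out file is still a pointer or the real weights. `parse_lfs_pointer` is a hypothetical helper, and the hash in `POINTER` is a placeholder, not the digest of any real file.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {'version', 'oid_algo', 'oid', 'size'}.

    Raises ValueError if the text does not look like a pointer.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError("not a Git LFS pointer")
        fields[key] = value
    has_version = fields.get("version", "").startswith("https://git-lfs.github.com/spec/")
    if not has_version or "oid" not in fields or "size" not in fields:
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Placeholder pointer content for illustration
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
"""
info = parse_lfs_pointer(POINTER)
print(info["oid_algo"], info["size"])
```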

5.2 Branches and Tags

# Create a branch for a version
git checkout -b v2.0
git add .
git commit -m "v2.0: new model"
git push origin v2.0

# Tags for versions
git tag v2.0.0
git push origin v2.0.0

6. Model Evaluation with the evaluate Library

import evaluate

# Load metrics
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Compute accuracy
predictions = [0, 1, 1, 0, 1, 0]
references = [0, 1, 0, 0, 1, 1]

result = accuracy.compute(predictions=predictions, references=references)
print(f"Accuracy: {result['accuracy']:.4f}")

# Compute F1
result = f1.compute(predictions=predictions, references=references, average="weighted")
print(f"F1: {result['f1']:.4f}")

# Compute BLEU (for translation/generation)
predictions_text = ["the cat sat on the mat"]
references_text = [["the cat is sitting on the mat"]]
result = bleu.compute(predictions=predictions_text, references=references_text)
print(f"BLEU: {result['bleu']:.4f}")

# Compute ROUGE (for summarization)
predictions_text = ["The cat sat on the mat."]
references_text = ["The cat is sitting on the mat."]
result = rouge.compute(predictions=predictions_text, references=references_text)
print(f"ROUGE-1: {result['rouge1']:.4f}")
print(f"ROUGE-L: {result['rougeL']:.4f}")

# Combine multiple metrics
combined = evaluate.combine(["accuracy", "f1", "precision", "recall"])
result = combined.compute(predictions=predictions, references=references)
print(result)
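
To see what the accuracy and weighted F1 numbers mean, the same toy data can be scored by hand. The sketch below is a plain-Python re-implementation for illustration only, not the code used by the evaluate library. The helper names are hypothetical.

```python
predictions = [0, 1, 1, 0, 1, 0]
references = [0, 1, 0, 0, 1, 1]


def accuracy(preds, refs):
    """Fraction of positions where prediction and reference agree."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def f1_for_label(preds, refs, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(p == label and r == label for p, r in zip(preds, refs))
    fp = sum(p == label and r != label for p, r in zip(preds, refs))
    fn = sum(p != label and r == label for p, r in zip(preds, refs))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(preds, refs):
    """Per-label F1 averaged with weights equal to each label's support."""
    labels = sorted(set(refs))
    total = sum(f1_for_label(preds, refs, lab) * refs.count(lab) for lab in labels)
    return total / len(refs)


print(f"Accuracy: {accuracy(predictions, references):.4f}")        # 4 of 6 correct
print(f"Weighted F1: {weighted_f1(predictions, references):.4f}")
```

Both labels have 2 true positives, 1 false positive, and 1 false negative, so each per-label F1 is 2/3 and the weighted average is also 2/3.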

7. Leaderboards (Open LLM Leaderboard)

7.1 Open LLM Leaderboard

The Open LLM Leaderboard is a widely used reference for comparing open-source LLMs on a common set of benchmarks:

huggingface.co/spaces/open-llm-leaderboard

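Benchmarks used by the original (v1) leaderboard:

MMLU: multiple-choice knowledge questions (57 subjects)
ARC: grade-school science reasoning
HellaSwag: commonsense sentence completion
TruthfulQA: answer truthfulness
Winogrande: commonsense pronoun resolution
GSM8K: grade-school math word problems

The second version of the leaderboard replaced these with IFEval, BBH, MATH, GPQA, MuSR, and MMLU-Pro.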

7.2 Submit Your Model

To submit a model to the leaderboard:

Upload your model to the Hub
Ensure it works with transformers
Submit via the leaderboard form
Wait for evaluation (~24-48h)

8. Papers with Code Integration

HuggingFace integrates research papers with their implementations:

Daily Papers: AI papers selected daily by the community
Links: each paper can be linked to models and datasets
Discussions: community comments on each paper

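# Link a paper to your model: cite its arXiv page in the model card (README.md).
# The Hub detects the link and tags the repo with arxiv:<id>.
Paper: https://arxiv.org/abs/2401.12345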

9. HF Hub API (Python SDK)

9.1 Complete API

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# === MODELS ===
models = api.list_models(search="cybersecurity", sort="downloads", limit=5)

info = api.model_info("gpt2")
print(f"ID: {info.id}")
print(f"Author: {info.author}")
print(f"Pipeline tag: {info.pipeline_tag}")
print(f"Tags: {info.tags}")
print(f"Downloads: {info.downloads}")
print(f"Likes: {info.likes}")

# === DATASETS ===
datasets = api.list_datasets(search="security", limit=5)
ds_info = api.dataset_info("squad_v2")

# === SPACES ===
spaces = api.list_spaces(search="chatbot", limit=5)
sp_info = api.space_info("AYI-NEDJIMI/Dataset-Explorer")

# === FILES ===
files = api.list_repo_files("gpt2")
tree = api.list_repo_tree("gpt2")
for item in tree:
    print(f"  {item.path}")

9.2 Repo Operations

# Create a repo
api.create_repo("AYI-NEDJIMI/new-model", repo_type="model", private=False)

# Upload a file
api.upload_file(
    path_or_fileobj="./model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model",
    commit_message="Initial model upload"
)

# Upload a folder
api.upload_folder(
    folder_path="./my_model",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model",
    commit_message="Complete model upload"
)

# Delete a file
api.delete_file(
    path_in_repo="old_file.bin",
    repo_id="AYI-NEDJIMI/new-model",
    repo_type="model"
)

10. Programmatic Repo Management

10.1 Automate Publishing

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

def publish_model(model_path, repo_id, model_card_content):
    """Publish a model with its model card."""
    api.create_repo(repo_id, repo_type="model", exist_ok=True)

    # Explicit UTF-8 so non-ASCII characters in the card are written consistently
    with open(f"{model_path}/README.md", "w", encoding="utf-8") as f:
        f.write(model_card_content)

    api.upload_folder(
        folder_path=model_path,
        repo_id=repo_id,
        repo_type="model",
        commit_message="Automated model publication"
    )

    print(f"Model published: https://huggingface.co/{repo_id}")

10.2 Monitor Your Repos

def monitor_repos(username):
    """Monitor your repo statistics."""
    api = HfApi()
    models = api.list_models(author=username)
    total_downloads = 0
    total_likes = 0
    for m in models:
        downloads = m.downloads or 0  # may be None for some repos
        likes = m.likes or 0
        total_downloads += downloads
        total_likes += likes
        print(f"  Model: {m.id:40s} | DL: {downloads:>8,} | Likes: {likes:>4}")

    print(f"\n  Total Downloads: {total_downloads:,}")
    print(f"  Total Likes: {total_likes}")

# monitor_repos("AYI-NEDJIMI")

11. SEO Tips for HuggingFace Repos

11.1 Optimize Visibility

Descriptive name: include task, language, architecture
- Good: cybersec-threat-classifier-bert-english
- Bad: my-model-v2
Complete tags in YAML:

---
language: en
license: mit
pipeline_tag: text-classification
tags:
  - cybersecurity
  - threat-detection
  - english
  - bert
  - security
datasets:
  - AYI-NEDJIMI/cybersec-threats
---

Rich model card: description, code examples, benchmarks, limitations
Usage examples: directly copyable code
Metrics: include your scores (accuracy, F1, BLEU)
Links: to your papers, website, other repos
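
The metadata checklist above can be enforced before publishing. The sketch below scans the YAML block at the top of a model card and reports which keys are missing. `REQUIRED_KEYS`, `front_matter_keys`, and `missing_card_fields` are hypothetical names, and the line-based scan is a deliberate simplification that only recognizes top-level keys.

```python
REQUIRED_KEYS = ("language", "license", "pipeline_tag", "tags")  # hypothetical checklist


def front_matter_keys(readme_text: str) -> set:
    """Return the top-level keys of the YAML block at the top of a model card."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the metadata block
            break
        # Top-level keys start in column 0; skip list items, comments, nested lines
        if line and not line[0].isspace() and line[0] not in "-#" and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_card_fields(readme_text: str) -> list:
    """List the required metadata keys that the card does not define."""
    present = front_matter_keys(readme_text)
    return [key for key in REQUIRED_KEYS if key not in present]


card = "---\nlanguage: en\nlicense: mit\ntags:\n  - bert\n---\n# My model\n"
print(missing_card_fields(card))  # ['pipeline_tag']
```

For anything beyond a quick check, a real YAML parser or `huggingface_hub.ModelCard.load` is a more robust way to read the metadata.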

11.2 Increase Visibility

Share on social media (Twitter/X, LinkedIn)
Publish a blog post (HuggingFace Blog, Medium)
Submit to relevant leaderboards
Participate in community discussions
Create thematic collections

12. Monetization and Licensing Strategies

12.1 Common Licenses

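License	Commercial Use	Modifications	Redistribution Conditions
MIT	Yes	Yes	Keep copyright and license notice
Apache 2.0	Yes	Yes	Keep license and NOTICE file; state changes
CC-BY-4.0	Yes	Yes	Attribution required
CC-BY-NC-4.0	No	Yes	Attribution required
Llama 3.1 License	Yes (separate license needed above 700M monthly active users)	Yes	Specific conditions (attribution, acceptable use policy)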

12.2 Monetization Strategies

Consulting: offer your AI expertise services
Training: create courses and tutorials
SaaS: deploy your models as a service
Dual Licensing: free version + premium version
Premium Support: technical support for your models

13. HuggingFace for Enterprise

13.1 Private Hub (Enterprise Hub)

Dedicated and secure instance
SSO with SAML/OIDC
Granular roles and permissions
Complete audit logs
Compliance (SOC2, HIPAA)

13.2 Inference Endpoints

from huggingface_hub import InferenceClient

# Dedicated production endpoint
client = InferenceClient(
    model="https://xyz123.us-east-1.aws.endpoints.huggingface.cloud",
    token="hf_your_token"
)

response = client.text_generation(
    "Analyze this cybersecurity threat:",
    max_new_tokens=200,
    temperature=0.3
)
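
A dedicated endpoint can fail transiently, for example while replicas are still starting, so production callers often wrap the request in a retry loop. The sketch below is a generic exponential backoff helper. `call_with_retry` is a hypothetical name, and the assumption that the first requests to a cold endpoint may raise is mine, not something the endpoint guarantees.

```python
import time


def call_with_retry(fn, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**attempt, then retry.

    Re-raises the last exception once `retries` attempts have failed.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the client above, a call could look like `call_with_retry(lambda: client.text_generation("Analyze this cybersecurity threat:", max_new_tokens=200))`. In real code, catching only the HTTP errors that are worth retrying is safer than the broad `Exception` used here.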

13.3 Enterprise Benefits

Performance: dedicated GPUs, guaranteed latency
Security: VPC, encryption, isolation
Scalability: auto-scaling 0 to N replicas
Support: dedicated team, SLA
Compliance: industry certifications

Conclusion

HuggingFace offers much more than the Hub and libraries. Collections, Discussions, Webhooks, CLI, evaluation, and Enterprise features make HuggingFace a complete platform for the entire AI lifecycle: from prototyping to production.

Discover our complete collection: CyberSec AI Portfolio

Tutorial written by AYI-NEDJIMI - AI & Cybersecurity Consultant
For more resources, check out our other tutorials in this series.
