# Machine Learning in Elixir — End-to-End with Bumblebee + Hugging Face

<!-- livebook:{"persist_outputs":true} -->

## Overview

This Livebook is a complete end-to-end ML template built on the Elixir ML ecosystem
from _Machine Learning in Elixir_ by Sean Moriarity, with **Bumblebee** as the core
integration layer to the **Hugging Face Hub**.

**What we cover:**

| Section | Library | Task |
|---------|---------|------|
| Foundations | `Nx` | Tensors, gradients, JIT compilation |
| Pre-trained NLP | `Bumblebee` | Fill-mask, sentiment, NER, zero-shot |
| Pre-trained Vision | `Bumblebee` | Image classification (ViT, ResNet) |
| Audio | `Bumblebee` | Speech-to-text (Whisper) |
| Generative AI | `Bumblebee` | Text generation (GPT-2) & Stable Diffusion |
| Embeddings | `Bumblebee` | Sentence similarity search |
| Custom Training | `Axon` | Build & train from scratch |
| Fine-tuning | `Bumblebee` | Transfer learning on pre-trained models |
| Serving | `Nx.Serving` | Production batched inference |
| Deployment | `Phoenix` | LiveView integration pattern |
| Interactive UI | `Kino` | Live input forms & charts |

---

## Section 0 — Install & Configure

```elixir
Mix.install([
  {:nx, "~> 0.10"},
  {:axon, "~> 0.7"},
  {:exla, "~> 0.10"},
  {:bumblebee, "~> 0.6"},
  {:kino, "~> 0.15"},
  {:kino_vega_lite, "~> 0.1"},
  {:vega_lite, "~> 0.1"},
  {:stb_image, "~> 0.6"},
  {:req, "~> 0.5"}
])

Nx.global_default_backend(EXLA.Backend)

IO.puts("Nx version:        #{Application.spec(:nx, :vsn)}")
IO.puts("Axon version:      #{Application.spec(:axon, :vsn)}")
IO.puts("EXLA backend:      #{inspect(Nx.default_backend())}")
IO.puts("Bumblebee loaded:  #{Code.ensure_loaded?(Bumblebee)}")
IO.puts("Cache dir:         #{Bumblebee.cache_dir()}")
```

---

## Section 1 — Nx Foundations

Before diving into Bumblebee, let's ground ourselves in Nx — the numerical
backbone that every Elixir ML library builds on.

### 1.1 Tensors

```elixir
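# Nx functions are called fully qualified. `import Nx` would make unqualified
# calls to Kernel functions such as `min/2` and `round/1` ambiguous in later cells.

# Scalars, vectors, matrices, higher-order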
scalar = Nx.tensor(3.14)
vector = Nx.tensor([1.0, 2.0, 3.0])
matrix = Nx.tensor([[1, 2, 3], [4, 5, 6]])
cube = Nx.iota({2, 3, 4})

# `Nx.type/1` returns a tuple such as {:f, 32}, so it needs `inspect/1`
IO.puts("scalar  shape=#{inspect(Nx.shape(scalar))}  type=#{inspect(Nx.type(scalar))}")
IO.puts("vector  shape=#{inspect(Nx.shape(vector))}  type=#{inspect(Nx.type(vector))}")
IO.puts("matrix  shape=#{inspect(Nx.shape(matrix))}  type=#{inspect(Nx.type(matrix))}")
IO.puts("cube    shape=#{inspect(Nx.shape(cube))}  type=#{inspect(Nx.type(cube))}")

{scalar, vector, matrix}
```

### 1.2 Operations & Broadcasting

```elixir
a = Nx.tensor([1.0, 2.0, 3.0])
b = Nx.tensor([10.0, 20.0, 30.0])

# Element-wise
IO.puts("add:       #{inspect(Nx.add(a, b))}")
IO.puts("multiply:  #{inspect(Nx.multiply(a, b))}")
IO.puts("pow:       #{inspect(Nx.pow(a, 2))}")

# Reductions
IO.puts("sum:       #{Nx.to_number(Nx.sum(a))}")
IO.puts("mean:      #{Nx.to_number(Nx.mean(a))}")
IO.puts("std:       #{Nx.to_number(Nx.standard_deviation(a))}")

# Dot product
IO.puts("dot:       #{Nx.to_number(Nx.dot(a, b))}")

# Matrix multiply
m1 = Nx.tensor([[1.0, 2.0], [3.0, 4.0]])
m2 = Nx.tensor([[5.0, 6.0], [7.0, 8.0]])
IO.puts("matmul:    #{inspect(Nx.dot(m1, m2))}")
```
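
The cell above only combines tensors of identical shape. The following is a minimal sketch of broadcasting itself, where Nx stretches the smaller operand to match the larger one. It assumes the setup cell in Section 0 has already run; the variable names are illustrative.

```elixir
# Assumes Section 0 has installed and loaded Nx.
matrix = Nx.tensor([[1, 2, 3], [4, 5, 6]])  # shape {2, 3}
row = Nx.tensor([10, 20, 30])               # shape {3}
col = Nx.tensor([[100], [200]])             # shape {2, 1}

# {2, 3} + {3}: the row is applied to every row of the matrix.
by_row = Nx.add(matrix, row)

# {2, 3} + {2, 1}: the column is applied to every column of the matrix.
by_col = Nx.add(matrix, col)

# A scalar broadcasts against every element.
scaled = Nx.multiply(matrix, 2)

IO.inspect(Nx.to_list(by_row), label: "matrix + row")
IO.inspect(Nx.to_list(by_col), label: "matrix + col")
IO.inspect(Nx.to_list(scaled), label: "matrix * 2")
```

Shapes are compared from the trailing axis backwards, and an axis broadcasts when the two sizes are equal or one of them is 1.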

### 1.3 Automatic Differentiation

This is how models learn — computing gradients of loss with respect to parameters.

```elixir
defmodule AutoDiff do
  import Nx.Defn

  # f(x) = x³ + 2x². Public `defn` so it can be called from outside the module.
  defn f(x), do: Nx.pow(x, 3) + 2 * Nx.pow(x, 2)

  # Gradient of f with respect to x. `grad/2` is only available inside `defn`.
  defn grad_f(x), do: grad(x, &f/1)

  # Mean squared error
  defnp mse_loss(y_true, y_pred) do
    Nx.mean(Nx.pow(y_true - y_pred, 2))
  end

  # Gradient of the MSE loss of a linear model (x · w) with respect to w
  defn grad_mse(y_true, x, w) do
    grad(w, fn weights ->
      predictions = Nx.dot(x, weights)
      mse_loss(y_true, predictions)
    end)
  end
end

x = Nx.tensor(3.0)
IO.puts("f(3)      = #{Nx.to_number(AutoDiff.f(x))}")
IO.puts("f'(3)     = #{Nx.to_number(AutoDiff.grad_f(x))}")
IO.puts("expected  = 3*9 + 2*2*3 = #{3 * 9 + 2 * 2 * 3}")
```
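
As a sanity check on the derivative of f(x) = x³ + 2x² at x = 3, the sketch below approximates it with a central finite difference in plain Elixir, without Nx. The step size `h` and the helper name `numeric_grad` are illustrative choices, not part of any library.

```elixir
# f(x) = x^3 + 2x^2, written with plain floats.
f = fn x -> :math.pow(x, 3) + 2 * :math.pow(x, 2) end

# Central difference: (f(x + h) - f(x - h)) / (2h).
numeric_grad = fn fun, x, h -> (fun.(x + h) - fun.(x - h)) / (2 * h) end

approx = numeric_grad.(f, 3.0, 1.0e-4)

# Analytic derivative: f'(x) = 3x^2 + 4x.
analytic = 3 * 3.0 * 3.0 + 4 * 3.0

IO.puts("numeric  f'(3) ~ #{Float.round(approx, 4)}")
IO.puts("analytic f'(3) = #{analytic}")
```

Finite differences are useful for checking a gradient, but they need two function evaluations per parameter, which is why training relies on automatic differentiation instead.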

### 1.4 JIT Compilation

```elixir
# JIT compiles for GPU/CPU acceleration — critical for inference speed
defmodule FastMath do
  import Nx.Defn

  defn slow_sigmoid(x) do
    1 / (1 + Nx.exp(-x))
  end
end

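# Explicitly JIT-compile with the EXLA compiler
fast_sigmoid = Nx.Defn.jit(&FastMath.slow_sigmoid/1, compiler: EXLA)

input = Nx.tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])

# The first call includes compilation time, so warm up before timing
fast_sigmoid.(input)

# Benchmark the already-compiled function
{us, result} = :timer.tc(fn -> fast_sigmoid.(input) end)
IO.puts("JIT sigmoid: #{us}μs  result=#{inspect(result)}")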
```

---

## Section 2 — Bumblebee: Pre-trained NLP Models

Bumblebee loads pre-trained models from the Hugging Face Hub and wraps them
in `Nx.Serving` for production-ready batched inference.

### 2.1 Fill-Mask (BERT)

```elixir
{:ok, bert_model} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
{:ok, bert_tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

bert_fill_mask = Bumblebee.Text.fill_mask(bert_model, bert_tokenizer)

results = Nx.Serving.run(bert_fill_mask, "Elixir is a [MASK] language.")
IO.inspect(results, label: "Fill-Mask Results")
```

### 2.2 Sentiment Analysis (DistilBERT)

```elixir
{:ok, sentiment_model} =
  Bumblebee.load_model({:hf, "distilbert/distilbert-base-uncased-finetuned-sst-2-english"})

{:ok, sentiment_tokenizer} =
  Bumblebee.load_tokenizer({:hf, "distilbert/distilbert-base-uncased-finetuned-sst-2-english"})

sentiment_serving = Bumblebee.Text.text_classification(
  sentiment_model,
  sentiment_tokenizer
)

texts = [
  "Machine learning in Elixir is amazing!",
  "This tutorial is boring and confusing.",
  "The BEAM VM handles concurrent ML workloads well.",
  "I love how functional programming simplifies ML pipelines."
]

Enum.each(texts, fn text ->
  result = Nx.Serving.run(sentiment_serving, text)
  IO.puts("#{inspect(result)}  ← \"#{String.slice(text, 0..50)}...\"")
end)
```

### 2.3 Named Entity Recognition (BERT-NER)

```elixir
{:ok, ner_model} = Bumblebee.load_model({:hf, "dslim/bert-base-NER"})
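# dslim/bert-base-NER is a cased model, so use the cased BERT tokenizer
{:ok, ner_tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-cased"})

# `aggregation: :same` merges word pieces into whole entities
ner_serving =
  Bumblebee.Text.token_classification(ner_model, ner_tokenizer, aggregation: :same)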

ner_text = "Sean Moriarity wrote Machine Learning in Elixir for Pragmatic Bookshelf. He lives in Austin, Texas."
ner_result = Nx.Serving.run(ner_serving, ner_text)

IO.puts("Input: #{ner_text}")
IO.inspect(ner_result, label: "NER Entities")
```

### 2.4 Zero-Shot Classification

No fine-tuning needed — classify arbitrary text into custom categories.

```elixir
{:ok, zs_model} = Bumblebee.load_model({:hf, "facebook/bart-large-mnli"})
{:ok, zs_tokenizer} = Bumblebee.load_tokenizer({:hf, "facebook/bart-large-mnli"})

labels = ["technology", "sports", "politics", "science", "finance"]

# The candidate labels are fixed when the serving is built
zs_serving = Bumblebee.Text.zero_shot_classification(zs_model, zs_tokenizer, labels)

article = """
Nx brings numerical computing to the BEAM, enabling machine learning
pipelines that leverage Elixir's concurrency and fault tolerance.
Bumblebee provides access to thousands of pre-trained models from
the Hugging Face Hub directly in Livebook.
"""

# The input is just the text to classify
zs_result = Nx.Serving.run(zs_serving, article)

IO.inspect(zs_result, label: "Zero-Shot Classification")
```

---

## Section 3 — Bumblebee: Vision Models

### 3.1 Image Classification (ViT / ResNet)

```elixir
{:ok, vit_model} = Bumblebee.load_model({:hf, "google/vit-base-patch16-224"})
{:ok, vit_featurizer} = Bumblebee.load_featurizer({:hf, "google/vit-base-patch16-224"})

vit_serving = Bumblebee.Vision.image_classification(
  vit_model,
  vit_featurizer
)

# Download a sample image
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image_data = Req.get!(image_url).body

# Save and load
File.write!("/tmp/sample.jpg", image_data)
{:ok, image} = StbImage.read_file("/tmp/sample.jpg")

# StbImage stores dimensions in `shape` as {height, width, channels}
{height, width, _channels} = image.shape
IO.puts("Image: #{width}x#{height}")

img_result = Nx.Serving.run(vit_serving, image)
IO.inspect(img_result, label: "Image Classification")
```

### 3.2 Batch Image Classification

```elixir
# Nx.Serving automatically batches multiple requests for GPU efficiency
images = [image]  # In production, this would be multiple images

batch_result = Nx.Serving.run(vit_serving, images)
IO.inspect(batch_result, label: "Batch Classification")
```

---

## Section 4 — Bumblebee: Text Generation

### 4.1 GPT-2 Text Generation

```elixir
{:ok, gpt2_model} = Bumblebee.load_model({:hf, "openai-community/gpt2"})
{:ok, gpt2_tokenizer} = Bumblebee.load_tokenizer({:hf, "openai-community/gpt2"})
{:ok, gpt2_generation_config} = Bumblebee.load_generation_config({:hf, "openai-community/gpt2"})

gpt2_serving = Bumblebee.Text.generation(
  gpt2_model,
  gpt2_tokenizer,
  gpt2_generation_config,
  compile: [batch_size: 1, sequence_length: 64],
  defn_options: [compiler: EXLA]
)

prompt = "Machine learning in Elixir is"
gen_result = Nx.Serving.run(gpt2_serving, prompt)

IO.puts("Prompt: #{prompt}")
IO.puts("Output: #{inspect(gen_result)}")
```

### 4.2 Interactive Text Generation

```elixir
prompt_input = Kino.Input.text("Prompt", default: "The future of ML in Elixir is")
max_tokens_input = Kino.Input.number("Max tokens", default: 50)

form =
  Kino.Control.form(
    %{prompt: prompt_input, max_tokens: max_tokens_input},
    submit: "Generate"
  )

# `Kino.listen/2` discards the callback's return value, so render into a frame
gen_frame = Kino.Frame.new()

Kino.listen(form, fn %{data: %{prompt: prompt, max_tokens: max_tokens}} ->
  config = Bumblebee.configure(gpt2_generation_config, max_new_tokens: trunc(max_tokens))

  # Rebuilding the serving per request recompiles it, which is acceptable for a demo
  serving =
    Bumblebee.Text.generation(
      gpt2_model,
      gpt2_tokenizer,
      config,
      defn_options: [compiler: EXLA]
    )

  # The generated text is nested under `results`
  %{results: [%{text: text}]} = Nx.Serving.run(serving, prompt)
  Kino.Frame.render(gen_frame, Kino.Text.new("#{prompt}#{text}"))
end)

Kino.Layout.grid([form, gen_frame])
```

---

## Section 5 — Bumblebee: Embeddings & Similarity

### 5.1 Sentence Embeddings

```elixir
{:ok, emb_model} = Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})
{:ok, emb_tokenizer} = Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

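embedding_serving =
  Bumblebee.Text.text_embedding(
    emb_model,
    emb_tokenizer,
    # Sentence-transformers models expect mean pooling plus L2 normalization
    output_attribute: :hidden_state,
    output_pool: :mean_pooling,
    embedding_processor: :l2_norm
  )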

sentences = [
  "Nx provides numerical computing for Elixir",
  "Axon is a neural network library built on Nx",
  "Bumblebee connects Elixir to the Hugging Face Hub",
  "I enjoy cooking Italian food on weekends",
  "The weather forecast predicts rain tomorrow"
]

embeddings =
  Enum.map(sentences, fn s ->
    result = Nx.Serving.run(embedding_serving, s)
    result.embedding
  end)

IO.puts("Generated #{length(embeddings)} embeddings")
IO.puts("Embedding dim: #{inspect(Nx.shape(hd(embeddings)))}")
```

### 5.2 Cosine Similarity Search

```elixir
defmodule Similarity do
  import Nx.Defn

  defn cosine_similarity(a, b) do
    a_norm = a / Nx.sqrt(Nx.sum(a * a))
    b_norm = b / Nx.sqrt(Nx.sum(b * b))
    Nx.sum(a_norm * b_norm)
  end

  def find_most_similar(query_embedding, corpus_embeddings) do
    corpus_embeddings
    |> Enum.map(fn emb -> Nx.to_number(cosine_similarity(query_embedding, emb)) end)
    |> Enum.with_index()
    |> Enum.sort_by(fn {score, _idx} -> -score end)
  end
end

query = "How do I build neural networks in Elixir?"
query_emb = Nx.Serving.run(embedding_serving, query).embedding

IO.puts("Query: \"#{query}\"\n")

Similarity.find_most_similar(query_emb, embeddings)
|> Enum.each(fn {score, idx} ->
  IO.puts("  #{Float.round(score, 4)}  #{Enum.at(sentences, idx)}")
end)
```
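
To make the arithmetic behind `Similarity.cosine_similarity/2` concrete without loading a model, here is a minimal sketch on hand-picked 2-D vectors in plain Elixir. The module name `CosineSketch` and the sample vectors are illustrative.

```elixir
defmodule CosineSketch do
  # Dot product of two equal-length lists of numbers.
  def dot(a, b) do
    a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  end

  # Euclidean length of a vector.
  def norm(a), do: :math.sqrt(dot(a, a))

  # Cosine of the angle between a and b: dot(a, b) / (|a| * |b|).
  def cosine(a, b), do: dot(a, b) / (norm(a) * norm(b))
end

a = [1.0, 0.0]
same_direction = CosineSketch.cosine(a, [3.0, 0.0])
diagonal = CosineSketch.cosine(a, [1.0, 1.0])
orthogonal = CosineSketch.cosine(a, [0.0, 2.0])

IO.inspect({same_direction, diagonal, orthogonal}, label: "cosine similarities")
```

The score depends only on direction, not on vector length, so a vector scaled by 3 still scores 1.0 against the original.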

---

## Section 5.5 — Bumblebee: Audio (Whisper Speech-to-Text)

Bumblebee wraps OpenAI's Whisper for speech-to-text directly in Elixir.

```elixir
{:ok, whisper_model} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, whisper_featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, whisper_tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, whisper_generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

whisper_serving = Bumblebee.Audio.speech_to_text_whisper(
  whisper_model,
  whisper_featurizer,
  whisper_tokenizer,
  whisper_generation_config
)

# Download a sample audio file
audio_url = "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
audio_data = Req.get!(audio_url).body
File.write!("/tmp/sample_audio.flac", audio_data)

# Transcribe. The `{:file, path}` input is decoded with ffmpeg, which must be installed.
whisper_result = Nx.Serving.run(whisper_serving, {:file, "/tmp/sample_audio.flac"})

# The result is a list of chunks, each with its own text
transcription = Enum.map_join(whisper_result.chunks, & &1.text)
IO.puts("Transcription: #{transcription}")
```

### Interactive Audio Transcription

```elixir
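audio_file_input = Kino.Input.file("Upload audio (WAV/FLAC/MP3)")

audio_form = Kino.Control.form(%{file: audio_file_input}, submit: "Transcribe")

# `Kino.listen/2` discards the callback's return value, so render into a frame
audio_frame = Kino.Frame.new()

Kino.listen(audio_form, fn %{data: %{file: file}} ->
  if file do
    # Resolve the uploaded file reference to a path on disk
    path = Kino.Input.file_path(file.file_ref)
    result = Nx.Serving.run(whisper_serving, {:file, path})
    text = Enum.map_join(result.chunks, & &1.text)
    Kino.Frame.render(audio_frame, Kino.Text.new(text))
  else
    Kino.Frame.render(audio_frame, Kino.Text.new("Please upload an audio file."))
  end
end)

Kino.Layout.grid([audio_form, audio_frame])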
```

---

## Section 5.6 — Bumblebee: Stable Diffusion (Image Generation)

Generate images from text prompts using Stable Diffusion — all within Elixir.

> **Note:** This section requires a GPU with 4GB+ VRAM. On CPU it will be very slow.

```elixir
sd_repo = "CompVis/stable-diffusion-v1-4"

{:ok, sd_clip} = Bumblebee.load_model({:hf, sd_repo, subdir: "text_encoder"})
{:ok, sd_unet} = Bumblebee.load_model({:hf, sd_repo, subdir: "unet"})
{:ok, sd_vae} = Bumblebee.load_model({:hf, sd_repo, subdir: "vae"}, architecture: :decoder)
# The tokenizer is loaded from the CLIP repository
{:ok, sd_tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/clip-vit-large-patch14"})
{:ok, sd_scheduler} = Bumblebee.load_scheduler({:hf, sd_repo, subdir: "scheduler"})

# Image generation serving.
# Argument order: text encoder, UNet, VAE, tokenizer, scheduler.
sd_serving = Bumblebee.Diffusion.StableDiffusion.text_to_image(
  sd_clip,
  sd_unet,
  sd_vae,
  sd_tokenizer,
  sd_scheduler,
  num_steps: 20,
  guidance_scale: 7.5,
  defn_options: [compiler: EXLA]
)

# Generate
sd_result = Nx.Serving.run(sd_serving, %{
  prompt: "a photograph of a bee programming in elixir, highly detailed, 4k",
  negative_prompt: "blurry, low quality"
})

# Display the generated image (one entry in `results` per generated image)
%{results: [%{image: sd_image} | _]} = sd_result
Kino.Image.new(sd_image)
```

### Interactive Image Generation

```elixir
prompt_input = Kino.Input.text("Prompt", default: "a cute robot bee coding in Elixir")
neg_input = Kino.Input.text("Negative prompt", default: "blurry, ugly, low quality")

# `num_steps` is fixed when `sd_serving` is built, so it is not a per-request input
sd_form = Kino.Control.form(
  %{prompt: prompt_input, negative: neg_input},
  submit: "Generate"
)

# `Kino.listen/2` discards the callback's return value, so render into a frame
sd_frame = Kino.Frame.new()

Kino.listen(sd_form, fn %{data: %{prompt: prompt, negative: negative}} ->
  %{results: [%{image: image} | _]} =
    Nx.Serving.run(sd_serving, %{prompt: prompt, negative_prompt: negative})

  Kino.Frame.render(sd_frame, Kino.Image.new(image))
end)

Kino.Layout.grid([sd_form, sd_frame])
```

---

## Section 6 — Custom Training with Axon

Beyond pre-trained models, Axon lets you build and train from scratch.

### 6.1 Synthetic Data

```elixir
defmodule Data do
  # Nx is called fully qualified; `import Nx` would make `round/1` below ambiguous

  def make_classification(n \\ 2000, features \\ 4, classes \\ 3, seed \\ 42) do
    key = Nx.Random.key(seed)
    {centers, key} = Nx.Random.normal(key, 0, 2, shape: {classes, features})
    {labels_raw, key} = Nx.Random.randint(key, 0, classes, shape: {n})
    {noise, _key} = Nx.Random.normal(key, 0, 0.4, shape: {n, features})

    x = Nx.take(centers, labels_raw) |> Nx.add(noise)
    y = Nx.equal(Nx.new_axis(labels_raw, 1), Nx.iota({1, classes})) |> Nx.as_type(:f32)

    # Normalize
    mu = Nx.mean(x, axes: [0])
    sigma = Nx.standard_deviation(x, axes: [0])
    x_norm = Nx.divide(Nx.subtract(x, mu), sigma)

    # Split
    split = round(n * 0.8)
    {{x_norm[0..(split - 1)//1], y[0..(split - 1)//1]},
     {x_norm[split..(n - 1)//1], y[split..(n - 1)//1]}}
  end
end

{{x_train, y_train}, {x_test, y_test}} = Data.make_classification()
IO.puts("Train: #{Nx.axis_size(x_train, 0)} | Test: #{Nx.axis_size(x_test, 0)}")
IO.puts("Features: #{Nx.axis_size(x_train, 1)} | Classes: #{Nx.axis_size(y_train, 1)}")
```

### 6.2 Define Model

```elixir
# Axon functions are called fully qualified, so no import is needed

n_features = Nx.axis_size(x_train, 1)
n_classes = Nx.axis_size(y_train, 1)

model =
  Axon.input("features", shape: {nil, n_features})
  |> Axon.dense(64, activation: :relu, name: "hidden_1")
  |> Axon.batch_norm(name: "bn_1")
  |> Axon.dropout(rate: 0.2, name: "drop_1")
  |> Axon.dense(32, activation: :relu, name: "hidden_2")
  |> Axon.batch_norm(name: "bn_2")
  |> Axon.dropout(rate: 0.2, name: "drop_2")
  |> Axon.dense(n_classes, activation: :softmax, name: "output")

# `Axon.Display.as_table/2` needs the optional `:table_rex` package in `Mix.install`
Axon.Display.as_table(model, Nx.template({1, n_features}, :f32)) |> IO.puts()
```

### 6.3 Train

```elixir
# Axon.Loop expects each batch as an {input, target} tuple
train_data =
  x_train
  |> Nx.to_batched(64)
  |> Enum.zip(Nx.to_batched(y_train, 64))
  |> Stream.map(fn {xb, yb} -> {%{"features" => xb}, yb} end)

val_data =
  x_test
  |> Nx.to_batched(64)
  |> Enum.zip(Nx.to_batched(y_test, 64))
  |> Stream.map(fn {xb, yb} -> {%{"features" => xb}, yb} end)

trained_state =
  model
  # Optimizers live in Polaris (an Axon dependency) as of Axon 0.7
  |> Axon.Loop.trainer(:categorical_cross_entropy, Polaris.Optimizers.adam(learning_rate: 0.001))
  |> Axon.Loop.metric(:accuracy, "acc")
  |> Axon.Loop.validate(model, val_data)
  # Validation metrics are prefixed with "validation_", so "acc" becomes "validation_acc"
  |> Axon.Loop.early_stop("validation_acc", patience: 5, mode: :max)
  |> Axon.Loop.run(train_data, Axon.ModelState.empty(), epochs: 30, compiler: EXLA)

IO.puts("Training complete!")
```
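
The training loop above stops early once the monitored validation metric has not improved for `patience` epochs. The sketch below illustrates that rule in plain Elixir for a metric that should decrease. It is a conceptual illustration, not Axon's implementation; the module name `EarlyStopSketch` and the sample losses are made up.

```elixir
defmodule EarlyStopSketch do
  # Walks a list of per-epoch losses (lower is better) and returns
  # {:stopped, epoch} at the first epoch where the loss has failed to improve
  # for `patience` consecutive epochs, or {:completed, n} if that never happens.
  def run(losses, patience) do
    losses
    |> Enum.with_index(1)
    |> Enum.reduce_while({:infinity, 0}, fn {loss, epoch}, {best, since} ->
      cond do
        # First epoch or a new best: remember it and reset the counter.
        best == :infinity or loss < best -> {:cont, {loss, 0}}
        # No improvement and patience is exhausted: stop here.
        since + 1 >= patience -> {:halt, {:stopped, epoch}}
        # No improvement yet, but still within patience.
        true -> {:cont, {best, since + 1}}
      end
    end)
    |> case do
      {:stopped, epoch} -> {:stopped, epoch}
      {_best, _since} -> {:completed, length(losses)}
    end
  end
end

val_losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]

strict = EarlyStopSketch.run(val_losses, 3)
lenient = EarlyStopSketch.run(val_losses, 5)

IO.inspect(strict, label: "patience 3")
IO.inspect(lenient, label: "patience 5")
```

With a patience of 3 the run halts before the later improvement at epoch 6 is seen, which shows the trade-off: a small patience saves compute but can stop on a temporary plateau.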

### 6.4 Evaluate & Predict

```elixir
# Build a compiled inference function from the model
{_init_fn, predict_fn} = Axon.build(model, compiler: EXLA, mode: :inference)

# Evaluate on test set
test_preds = predict_fn.(trained_state, x_test)
pred_classes = Nx.argmax(test_preds, axis: 1)
true_classes = Nx.argmax(y_test, axis: 1)

accuracy = Nx.mean(Nx.equal(pred_classes, true_classes) |> Nx.as_type(:f32))
IO.puts("Test accuracy: #{Float.round(Nx.to_number(accuracy) * 100, 2)}%")

# Single prediction
sample = x_test[0]
probs = predict_fn.(trained_state, Nx.new_axis(sample, 0)) |> Nx.squeeze()
IO.puts("Sample prediction: class #{Nx.argmax(probs) |> Nx.to_number()}  probs=#{inspect(Nx.to_flat_list(probs) |> Enum.map(&Float.round(&1, 4)))}")
```

### 6.5 Visualize Predictions

```elixir
# Visualize predictions vs actual on first 2 features
alias VegaLite, as: Vl

scatter_data =
  Enum.map(0..(min(Nx.axis_size(x_test, 0), 500) - 1), fn i ->
    %{
      "f1" => Nx.to_number(x_test[i][0]),
      "f2" => Nx.to_number(x_test[i][1]),
      "actual" => Nx.to_number(Nx.argmax(y_test[i])),
      "predicted" => Nx.to_number(pred_classes[i])
    }
  end)

Vl.new(
  title: "Test Predictions (first 2 features)",
  width: 500,
  height: 400
)
|> Vl.data_from_values(scatter_data)
|> Vl.layers([
  Vl.new()
  |> Vl.mark(:circle, opacity: 0.7, size: 40)
  |> Vl.encode_field(:x, "f1", type: :quantitative)
  |> Vl.encode_field(:y, "f2", type: :quantitative)
  |> Vl.encode_field(:color, "actual", type: :nominal, title: "Actual"),
  Vl.new()
  |> Vl.mark(:point, opacity: 0.3, size: 20, shape: "cross")
  |> Vl.encode_field(:x, "f1", type: :quantitative)
  |> Vl.encode_field(:y, "f2", type: :quantitative)
  |> Vl.encode_field(:color, "predicted", type: :nominal, title: "Predicted")
])
```

---

## Section 7 — Nx.Serving: Production Inference

`Nx.Serving` can run inline with `Nx.Serving.run/2`, or as a supervised process
that batches concurrent requests from many callers via `Nx.Serving.batched_run/2`.
The process form is the one to use in production.

### 7.1 Serving Architecture

```elixir
# Start a named serving process (typically in your Application supervisor)
# In production, you would do:
#
#   Nx.Serving.start_link(
#     serving: sentiment_serving,
#     name: MyApp.SentimentServing,
#     batch_size: 16,
#     batch_timeout: 100
#   )
#
# Then in your Phoenix controller, call the named process:
#
#   Nx.Serving.batched_run(MyApp.SentimentServing, text)

# For demonstration, run directly:
IO.puts("Inline use: call Nx.Serving.run/2 on the serving struct")
IO.puts("For production, start the serving as a process and call Nx.Serving.batched_run/2")
```
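
The process-based flow can be tried without downloading a model. The sketch below wraps a trivial doubling function with `Nx.Serving.jit/1`, starts it under a name, and sends three concurrent requests through `Nx.Serving.batched_run/2`. It assumes Section 0 has run; `DemoServing` and the doubling function are illustrative stand-ins for a real model serving.

```elixir
# Assumes Section 0 has installed and loaded Nx.
# A serving that doubles its input, standing in for a model.
serving = Nx.Serving.jit(fn x -> Nx.multiply(x, 2) end)

# Start the serving as a named process so concurrent callers can share batches.
{:ok, _pid} =
  Nx.Serving.start_link(
    serving: serving,
    name: DemoServing,
    batch_size: 4,
    batch_timeout: 50
  )

# Three concurrent callers, each submitting a one-entry batch.
results =
  1..3
  |> Task.async_stream(fn i ->
    batch = Nx.Batch.stack([Nx.tensor([i])])

    DemoServing
    |> Nx.Serving.batched_run(batch)
    |> Nx.to_flat_list()
  end)
  |> Enum.map(fn {:ok, list} -> list end)

IO.inspect(results, label: "doubled")
```

Each caller gets back only its own slice of the batch. Re-evaluating this cell fails because the name `DemoServing` is already registered; in Livebook, starting the child with `Kino.start_child/1` is one way to have it restarted on re-evaluation.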

### 7.2 Benchmark Inference

```elixir
# Benchmark a single inference
{us_single, _} = :timer.tc(fn ->
  Nx.Serving.run(sentiment_serving, "This is a test sentence for benchmarking.")
end)

IO.puts("Single inference: #{Float.round(us_single / 1000, 2)}ms")

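# Compare 8 sequential calls against a single batched call
batch = for i <- 1..8, do: "Test sentence number #{i} for batch benchmarking."

{us_seq, _} = :timer.tc(fn ->
  Enum.map(batch, fn text -> Nx.Serving.run(sentiment_serving, text) end)
end)

# Passing a list runs all inputs as one batch.
# The first call with a new batch shape may include compilation time.
{us_batch, _} = :timer.tc(fn -> Nx.Serving.run(sentiment_serving, batch) end)

IO.puts("8 sequential inferences: #{Float.round(us_seq / 1000, 2)}ms")
IO.puts("1 batched call (8 texts): #{Float.round(us_batch / 1000, 2)}ms")
IO.puts("Per-request (batched):   #{Float.round(us_batch / 8000, 2)}ms")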
```

---

## Section 7.5 — Fine-tuning with Bumblebee

Bumblebee supports **fine-tuning** (transfer learning): you load a pre-trained
encoder, attach a freshly initialized task head, and train it on your own labeled
data with Axon. This is far more practical than training from scratch when you
have limited data.

### 7.5.1 Load Pre-trained Model for Fine-tuning

```elixir
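# Load the BERT spec with a sequence classification head
{:ok, ft_spec} =
  Bumblebee.load_spec({:hf, "google-bert/bert-base-uncased"},
    module: Bumblebee.Text.Bert,
    architecture: :for_sequence_classification
  )

# Configure for your number of classes
ft_spec = Bumblebee.configure(ft_spec, num_labels: 3)

# Load with the custom spec. Only the encoder weights are pre-trained;
# the classification head is randomly initialized.
{:ok, ft_model_info} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"},
  spec: ft_spec
)

%{model: ft_model, params: ft_params} = ft_model_info
IO.puts("Model loaded. Pre-trained encoder + fresh classification head.")

# Count parameters by walking the nested parameter map
# (assumes Axon 0.7+, where params is an `Axon.ModelState` with a `data` field)
count_params = fn count_params, params ->
  Enum.reduce(params, 0, fn
    {_name, %Nx.Tensor{} = tensor}, acc -> acc + Nx.size(tensor)
    {_name, %{} = nested}, acc -> acc + count_params.(count_params, nested)
  end)
end

IO.puts("Total params: #{count_params.(count_params, ft_params.data)}")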
```

### 7.5.2 Prepare Labeled Data

```elixir
# Your labeled dataset — in practice, load from CSV/JSON
training_texts = [
  "The BEAM VM provides fault-tolerant concurrent computing",
  "Nx brings numerical computing to the Elixir ecosystem",
  "Phoenix LiveView enables real-time web applications",
  "Bumblebee integrates Hugging Face models into Elixir",
  "Axon provides a functional API for neural networks",
  "The weather is sunny and warm today",
  "Football season starts next month",
  "The stock market rallied on positive earnings reports",
  # ... add more samples per class
]

training_labels = [0, 0, 0, 0, 0, 1, 1, 2]  # 0=tech, 1=sports, 2=finance

# Tokenize
{:ok, ft_tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

encoded = Bumblebee.apply_tokenizer(ft_tokenizer, training_texts)
IO.inspect(encoded, label: "Tokenized input")
```

### 7.5.3 Training Loop

```elixir
# Simplified fine-tuning with Axon.Loop on the tiny dataset above
batch_size = 4
epochs = 3

# Pad or truncate every text to a fixed length so all batches share one shape
ft_tokenizer = Bumblebee.configure(ft_tokenizer, length: 32)

# Build {inputs, labels} batches; each chunk keeps its own labels
train_batches =
  training_texts
  |> Enum.zip(training_labels)
  |> Enum.chunk_every(batch_size)
  |> Enum.map(fn chunk ->
    {texts, labels} = Enum.unzip(chunk)
    {Bumblebee.apply_tokenizer(ft_tokenizer, texts), Nx.tensor(labels)}
  end)

# The Bumblebee model returns a map; the loss only needs the logits
logits_model = Axon.nx(ft_model, & &1.logits)

# Labels are integer class ids, so use the sparse cross-entropy variant
loss_fn =
  &Axon.Losses.categorical_cross_entropy(&1, &2,
    reduction: :mean,
    from_logits: true,
    sparse: true
  )

ft_trained_state =
  logits_model
  |> Axon.Loop.trainer(loss_fn, Polaris.Optimizers.adam(learning_rate: 2.0e-5), log: 1)
  # Start from the pre-trained parameters; `strict?: false` tolerates the
  # extra parameters that `logits_model` does not use
  |> Axon.Loop.run(train_batches, ft_params, epochs: epochs, compiler: EXLA, strict?: false)
```

> **Tip:** For production fine-tuning, use `Axon.Loop` with proper data
> pipelines, learning rate scheduling, and mixed precision. The above
> demonstrates the concept — see `Bumblebee` examples on GitHub for
> full fine-tuning recipes.

---

## Section 7.6 — Model Export (ONNX / GGUF)

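### 7.6.1 Export a Bumblebee model to ONNX

Bumblebee itself does not provide an export function. A loaded model is a plain
Axon graph plus its parameters, and converting that graph to ONNX is the job of a
separate package such as `axon_onnx`. Check that package's documentation for the
exact call and for which layers it supports before relying on it for transformer
models.

```elixir
# Load a pre-trained model. `model_info.model` is the Axon graph and
# `model_info.params` holds the weights an exporter would need.
{:ok, model_info} =
  Bumblebee.load_model({:hf, "distilbert/distilbert-base-uncased-finetuned-sst-2-english"})

%{model: export_model, params: export_params} = model_info
```

An ONNX file produced this way can be loaded by ONNX Runtime, or from Elixir via the `ortex` package.

### 7.6.2 Export to GGUF (via llama.cpp)

GGUF files are normally produced from the original Hugging Face checkpoint rather
than from an ONNX file, using the conversion script that ships with llama.cpp:

```bash
# Run from a llama.cpp checkout; the model directory is a placeholder
python convert_hf_to_gguf.py /path/to/hf-model --outfile model.gguf
```

> **Note:** GGUF is primarily for decoder models (e.g., Llama). Ensure compatibility.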

---

## Section 7.7 — Phoenix LiveView Integration

Deploy ML models into Phoenix web apps using `Nx.Serving` and LiveView.

### 7.7.1 Application Supervision Tree

```elixir
# In your Phoenix app's application.ex:
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      MyAppWeb.Telemetry,
      {DNSCluster, query: Application.get_env(:my_app, :dns_cluster_query) || :ignore},
      {Phoenix.PubSub, name: MyApp.PubSub},

      # ─── ML Servings ───────────────────────────────────────
      # Sentiment analysis serving
      {Nx.Serving,
       serving: sentiment_serving(),
       name: MyApp.SentimentServing,
       batch_size: 16,
       batch_timeout: 100},

      # Image classification serving
      {Nx.Serving,
       serving: image_serving(),
       name: MyApp.ImageServing,
       batch_size: 8,
       batch_timeout: 200},

      # Embedding serving (for similarity search)
      {Nx.Serving,
       serving: embedding_serving(),
       name: MyApp.EmbeddingServing,
       batch_size: 32,
       batch_timeout: 50},

      MyAppWeb.Endpoint
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp sentiment_serving do
    {:ok, model} = Bumblebee.load_model({:hf, "distilbert/distilbert-base-uncased-finetuned-sst-2-english"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "distilbert/distilbert-base-uncased-finetuned-sst-2-english"})
    Bumblebee.Text.text_classification(model, tokenizer)
  end

  defp image_serving do
    {:ok, model} = Bumblebee.load_model({:hf, "google/vit-base-patch16-224"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "google/vit-base-patch16-224"})
    Bumblebee.Vision.image_classification(model, featurizer)
  end

  defp embedding_serving do
    {:ok, model} = Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})
    Bumblebee.Text.text_embedding(model, tokenizer)
  end
end
```

### 7.7.2 LiveView for Sentiment Analysis

```elixir
# lib/my_app_web/live/sentiment_live.ex
defmodule MyAppWeb.SentimentLive do
  use MyAppWeb, :live_view

  def mount(_params, _session, socket) do
    {:ok, assign(socket, text: "", result: nil, loading: false)}
  end

  def handle_event("analyze", %{"text" => text}, socket) do
    # batched_run/2 sends the request to the named serving process, where it is
    # batched with concurrent requests. The call blocks until the result is ready.
    result = Nx.Serving.batched_run(MyApp.SentimentServing, text)

    label = hd(result.predictions)
    {:noreply, assign(socket,
      text: text,
      result: %{
        label: label.label,
        score: Float.round(label.score * 100, 1)
      },
      loading: false
    )}
  end

  def handle_event("update_text", %{"text" => text}, socket) do
    {:noreply, assign(socket, text: text)}
  end

  def render(assigns) do
    ~H"""
    <div class="max-w-lg mx-auto p-6">
      <h1 class="text-2xl font-bold mb-4">🐝 Sentiment Analysis</h1>

      <form phx-submit="analyze">
        <textarea
          name="text"
          phx-change="update_text"
          class="w-full p-3 border rounded-lg"
          rows="3"
          placeholder="Enter text to analyze..."
        ><%= @text %></textarea>
        <button type="submit" class="mt-2 px-4 py-2 bg-purple-600 text-white rounded-lg">
          Analyze
        </button>
      </form>

      <%= if @result do %>
        <div class="mt-4 p-4 bg-gray-100 rounded-lg">
          <p class="text-lg">
            <strong><%= @result.label %></strong>
            — <%= @result.score %>%
          </p>
        </div>
      <% end %>
    </div>
    """
  end
end
```

### 7.7.3 LiveView for Image Classification

```elixir
# lib/my_app_web/live/vision_live.ex
defmodule MyAppWeb.VisionLive do
  use MyAppWeb, :live_view

  def mount(_params, _session, socket) do
    socket =
      socket
      |> assign(predictions: nil)
      # LiveView uploads must be declared up front; forms do not deliver %Plug.Upload{}
      |> allow_upload(:image, accept: ~w(.jpg .jpeg .png), max_entries: 1)

    {:ok, socket}
  end

  # Required so LiveView tracks the selected file
  def handle_event("validate", _params, socket), do: {:noreply, socket}

  def handle_event("classify", _params, socket) do
    # Each uploaded entry is available as a temporary file inside the callback
    results =
      consume_uploaded_entries(socket, :image, fn %{path: path}, _entry ->
        {:ok, image} = StbImage.read_file(path)
        {:ok, Nx.Serving.batched_run(MyApp.ImageServing, image)}
      end)

    predictions =
      case results do
        [%{predictions: preds} | _] ->
          preds
          |> Enum.take(5)
          |> Enum.map(fn %{label: label, score: score} ->
            %{label: label, score: Float.round(score * 100, 1)}
          end)

        [] ->
          nil
      end

    {:noreply, assign(socket, predictions: predictions)}
  end

  def render(assigns) do
    ~H"""
    <div class="max-w-lg mx-auto p-6">
      <h1 class="text-2xl font-bold mb-4">🖼️ Image Classification</h1>

      <form phx-submit="classify" phx-change="validate">
        <.live_file_input upload={@uploads.image} class="mb-2" />
        <button type="submit" class="px-4 py-2 bg-purple-600 text-white rounded-lg">
          Classify
        </button>
      </form>

      <%= if @predictions do %>
        <div class="mt-4">
          <%= for pred <- @predictions do %>
            <div class="flex justify-between py-1 border-b">
              <span><%= pred.label %></span>
              <span class="font-mono"><%= pred.score %>%</span>
            </div>
          <% end %>
        </div>
      <% end %>
    </div>
    """
  end
end
```

### 7.7.4 API Endpoint (REST)

```elixir
# lib/my_app_web/controllers/prediction_controller.ex
defmodule MyAppWeb.PredictionController do
  use MyAppWeb, :controller

  def sentiment(conn, %{"text" => text}) do
    result = Nx.Serving.batched_run(MyApp.SentimentServing, text)

    json(conn, %{
      predictions: Enum.map(result.predictions, fn p ->
        %{label: p.label, score: Float.round(p.score, 4)}
      end)
    })
  end

  def embed(conn, %{"text" => text}) do
    result = Nx.Serving.batched_run(MyApp.EmbeddingServing, text)

    json(conn, %{
      embedding: Nx.to_flat_list(result.embedding),
      dimensions: Nx.size(result.embedding)
    })
  end
end

# In router.ex:
# scope "/api", MyAppWeb do
#   post "/sentiment", PredictionController, :sentiment
#   post "/embed", PredictionController, :embed
# end
```

---

## Section 8 — Interactive Playground

### 8.1 Text Classification UI

```elixir
text_input = Kino.Input.textarea("Enter text to classify", default: "Elixir is the best language for building scalable ML systems!")

class_form = Kino.Control.form(%{text: text_input}, submit: "Classify Sentiment")

# `Kino.listen/2` discards the callback's return value, so render into a frame
class_frame = Kino.Frame.new()

Kino.listen(class_form, fn %{data: %{text: text}} ->
  result = Nx.Serving.run(sentiment_serving, text)

  label_text =
    result
    |> Map.get(:predictions, [])
    |> Enum.map(fn %{label: label, score: score} ->
      "#{label}: #{Float.round(score * 100, 1)}%"
    end)
    |> Enum.join(" | ")

  Kino.Frame.render(class_frame, Kino.Text.new(label_text))
end)

Kino.Layout.grid([class_form, class_frame])
```

### 8.2 Embedding Similarity UI

```elixir
ref_input = Kino.Input.textarea("Reference text", default: "Nx provides numerical computing for the BEAM")
query_input = Kino.Input.textarea("Query text", default: "How do I do math in Elixir?")

sim_form = Kino.Control.form([ref: ref_input, query: query_input], submit: "Compute Similarity")

# A listener's return value is discarded, so results are rendered into a frame.
sim_frame = Kino.Frame.new()

Kino.listen(sim_form, fn %{data: %{ref: ref, query: query}} ->
  ref_emb = Nx.Serving.run(embedding_serving, ref).embedding
  query_emb = Nx.Serving.run(embedding_serving, query).embedding
  score = Similarity.cosine_similarity(ref_emb, query_emb) |> Nx.to_number()

  Kino.Frame.render(sim_frame, Kino.Text.new("Cosine similarity: #{Float.round(score, 6)}"))
end)

Kino.Layout.grid([sim_form, sim_frame])
```

### 8.3 Distributed Training on BEAM Nodes

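### 8.3.1 Using `Nx` with `:erpc.multicall/4`

```elixir
# Assumes a connected cluster of BEAM nodes (node1@host, node2@host, ...),
# each with Nx loaded.
nodes = [:"node1@host", :"node2@host"]

# Run a tensor computation on every node
defmodule DistTrainer do
  # `nodes` is passed in explicitly: variables defined outside a module
  # are not visible inside its functions.
  def compute(nodes, tensor) do
    # Copy to the binary backend so the tensor data can be sent between nodes.
    tensor = Nx.backend_copy(tensor, Nx.BinaryBackend)

    # :erpc.multicall/4 returns {:ok, result} or {class, reason} for each node.
    # Node.spawn/2 would only return PIDs, not the computed values.
    :erpc.multicall(nodes, Nx, :mean, [tensor])
  end
end

tensor = Nx.tensor([[1, 2, 3], [4, 5, 6]])
results = DistTrainer.compute(nodes, tensor)
IO.inspect(results, label: "Means from each node")
```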
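
The example above runs the same computation on every node. A minimal sketch of actually splitting the work follows: the tensor is sharded along its first axis, each node sums its shard, and the partial sums are combined locally into a global mean. The `ShardedMean` module is hypothetical, it assumes the tensor has at least one row and that `Nx` is loaded on every node, and the usage lines pass the local node twice as a stand-in for a real cluster.

```elixir
defmodule ShardedMean do
  # Splits `tensor` along its first axis into at most one shard per node,
  # sums each shard on its node, then combines the partial sums locally.
  def run(nodes, tensor) do
    # Binary-backend tensors are plain data and can cross node boundaries.
    tensor = Nx.backend_copy(tensor, Nx.BinaryBackend)
    rows = Nx.axis_size(tensor, 0)
    # Ceiling division, so every row lands in exactly one shard.
    shard_size = div(rows + length(nodes) - 1, length(nodes))

    shards =
      for start <- 0..(rows - 1)//shard_size do
        Nx.slice_along_axis(tensor, start, min(shard_size, rows - start), axis: 0)
      end

    shards
    |> Enum.zip(nodes)
    |> Enum.map(fn {shard, node} ->
      # One concurrent remote call per shard.
      Task.async(fn -> :erpc.call(node, Nx, :sum, [shard]) end)
    end)
    |> Task.await_many()
    |> Enum.reduce(&Nx.add/2)
    |> Nx.divide(Nx.size(tensor))
  end
end

tensor = Nx.tensor([[1, 2, 3], [4, 5, 6]])

# Two entries for the local node stand in for a real two-node cluster.
mean = ShardedMean.run([node(), node()], tensor)
IO.inspect(mean, label: "Global mean")
```

Summing per shard and dividing once at the end keeps the result exact even when shards have different sizes, which averaging per-shard means would not.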

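### 8.3.2 Distributed Axon Training

```elixir
# Nx and Axon do not ship a multi-node trainer: there is no `Nx.Cluster`
# module and no `defmodel` macro. This sketch instead runs an independent
# Axon training loop on each node over that node's shard of the data.
# ClusterTrainer must be compiled on every node.
defmodule ClusterTrainer do
  # Define a simple model
  def model do
    Axon.input("input", shape: {nil, 784})
    |> Axon.dense(128, activation: :relu)
    |> Axon.dense(10, activation: :softmax)
  end

  # Train on the current node. `batches` is a list of {input, target} tuples.
  # If the node uses EXLA, copy the returned state to Nx.BinaryBackend
  # before sending it back to the caller.
  def train_shard(batches) do
    model()
    |> Axon.Loop.trainer(:categorical_cross_entropy, :adam)
    |> Axon.Loop.run(batches, Axon.ModelState.empty(), epochs: 5)
  end

  # Launch one run per node. `shards` holds one list of batches per node,
  # with tensors on Nx.BinaryBackend so they can be sent between nodes.
  # Returns one trained model state per node; parameters are not synchronized.
  def train(nodes, shards) do
    nodes
    |> Enum.zip(shards)
    |> Enum.map(fn {node, batches} ->
      Task.async(fn -> :erpc.call(node, __MODULE__, :train_shard, [batches]) end)
    end)
    |> Task.await_many(:infinity)
  end
end
```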

> **Tip:** Ensure all nodes have the same code version and the required dependencies (`nx`, `axon`). Use `:net_kernel.connect_node/1` or a clustering tool such as `libcluster` to form the cluster.
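
Before dispatching work, it can help to confirm which nodes are actually reachable. The sketch below uses `Node.connect/1`, the Elixir counterpart of `:net_kernel.connect_node/1`; the `ClusterCheck` module and the node names are hypothetical placeholders.

```elixir
defmodule ClusterCheck do
  # Tries to connect to each node and groups the node names by outcome.
  # Node.connect/1 returns true (connected), false (unreachable),
  # or :ignored (the local node is not running in distributed mode).
  def connect_all(nodes) do
    Enum.group_by(nodes, &Node.connect/1)
  end
end

# Hypothetical node names; replace them with the nodes of your cluster.
status = ClusterCheck.connect_all([:"node1@host", :"node2@host"])

IO.inspect(status, label: "Connection status")
IO.inspect(Node.list(), label: "Connected nodes")
```

Only the nodes listed under the `true` key should be passed on to the distributed examples.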

---

## Section 9 — Summary & Next Steps

```elixir
Kino.Markdown.new("""
## What You've Built

| Pipeline Stage | Implementation | Key Library |
|----------------|----------------|-------------|
| **Tensors** | Creation, ops, broadcasting, gradients | `Nx` |
| **JIT Compile** | GPU-accelerated inference | `EXLA` |
| **Fill-Mask** | BERT masked language modeling | `Bumblebee` |
| **Sentiment** | DistilBERT text classification | `Bumblebee` |
| **NER** | Named entity recognition | `Bumblebee` |
| **Zero-Shot** | Classify without fine-tuning | `Bumblebee` |
| **Image CLS** | Vision Transformer (ViT) | `Bumblebee` |
| **Audio** | Whisper speech-to-text | `Bumblebee` |
| **Stable Diffusion** | Text-to-image generation | `Bumblebee` |
| **Text Gen** | GPT-2 autoregressive generation | `Bumblebee` |
| **Embeddings** | Sentence similarity search | `Bumblebee` |
| **Custom MLP** | Train from scratch with Axon | `Axon` |
| **Fine-tuning** | Boosted training on pre-trained models | `Bumblebee` |
| **Serving** | Production batched inference | `Nx.Serving` |
| **Phoenix** | LiveView + REST API deployment | `Phoenix` |
| **Interactive** | Kino live forms | `Kino` |

### Companion Notebooks

| Format | Path | Deploy To |
|--------|------|-----------|
| Livebook | `ml_e2e_template.livemd` | HF Spaces (Docker), Livebook Teams |
| Jupyter | `colab_kaggle/ml_e2e_python.ipynb` | Google Colab, Kaggle |
| Gradio | `gradio_hf_deploy/app.py` | HF Spaces (sdk: gradio) |
| marimo | `marimo/ml_e2e_marimo.py` | Anywhere Python runs |

### Resources

* [Bumblebee docs](https://hexdocs.pm/bumblebee) — Pre-trained models
* [Nx docs](https://hexdocs.pm/nx) — Numerical computing
* [Axon docs](https://hexdocs.pm/axon) — Neural networks
* [EXLA docs](https://hexdocs.pm/exla) — GPU backend
* [Phoenix docs](https://hexdocs.pm/phoenix) — Web framework
* [Hugging Face Hub](https://huggingface.co/models) — 500k+ models
* [marimo docs](https://docs.marimo.io) — Reactive Python notebooks
* _Machine Learning in Elixir_ — Sean Moriarity, Pragmatic Bookshelf

### Deploy

~~~bash
just check           # Verify all files and tools
just livebook        # Open Livebook locally
just deploy-livebook # Push to HF Spaces
just marimo          # Open marimo editor
just gradio          # Run Gradio app
~~~
""")
```