ViT-tiny LoRA adapter on Food-101
A LoRA adapter that teaches WinKawaks/vit-tiny-patch16-224 to classify images from the Food-101 dataset (101 food categories) while leaving the original pretrained weights mathematically untouched.
- Base model: WinKawaks/vit-tiny-patch16-224 (~5.7M params)
- Dataset: Food-101 (75,750 train / 25,250 test, 101 classes)
- Method: LoRA on the attention query and value projections, plus a fresh 101-way classification head
- Demo Space: turhancan97/vit-tiny-imagenet-demo
How it works
The backbone is never fine-tuned. Instead, a low-rank update $\Delta W = BA$ (with rank $r = 8$) is added to the query and value projections of every attention block, and a separate 101-class linear head is trained on top of the pooled CLS features. The full artifact is tiny (~1–2 MB) and purely additive: disabling the adapter at inference time restores the original backbone weights exactly, and pairing them with the original 1000-way head gives back the ImageNet-1k model.
adapter_config.json # PEFT LoRA config
adapter_model.safetensors # LoRA weights (B, A matrices)
classifier.pt # 101-way Linear head (state_dict)
labels.json # {"0": "apple_pie", "1": "baby_back_ribs", ...}
preprocessor_config.json # image processor (224x224, standard ImageNet norm)
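Written out for a single adapted projection, the low-rank update described above looks roughly like this. The factor $\alpha / r$ is how standard LoRA layers in PEFT scale the low-rank term; the hidden size $d = 192$ is an assumption based on the usual ViT-tiny configuration and is not stated in this card.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad A \in \mathbb{R}^{r \times d},\; B \in \mathbb{R}^{d \times r},\; r = 8,\; \alpha = 16
$$

Here $W_0$ is the frozen pretrained weight, and the $\Delta W = BA$ from the text enters with scale $\alpha / r = 2$. Only $A$ and $B$ receive gradients, so each adapted projection adds $2rd$ trainable values rather than $d^2$.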
Training
Trained with the script at turhancan97/vit-tiny-imagenet-demo/train_lora.py:
python train_lora.py \
  --rank 8 --alpha 16 --dropout 0.1 \
  --target-modules query value \
  --epochs 5 --batch-size 64 --lr 5e-4 \
  --warmup-ratio 0.03 --weight-decay 0.0 \
  --push-to-hub turhancan97/vit-tiny-lora-food101
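To get a feel for the training budget these flags imply, the step counts can be estimated with a few lines of arithmetic. This is only a sketch: it assumes a single device, no gradient accumulation, training on the full 75,750-image split with the last partial batch kept, and warmup steps computed as the ratio times the total steps, rounded up. None of these details are confirmed by the card.

```python
import math

# Values taken from the training command and dataset description.
train_images = 75_750
batch_size = 64
epochs = 5
warmup_ratio = 0.03

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_images / batch_size)
total_steps = steps_per_epoch * epochs

# Assumption: warmup steps = ratio * total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
```

Under those assumptions the learning rate would ramp up over a small fraction of the first epoch before the scheduler takes over.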
Hyperparameters
| Setting | Value |
|---|---|
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.1 |
| Target modules | query, value |
| Optimizer | AdamW (HF Trainer default) |
| Learning rate | 5e-4 |
| Batch size | 64 |
| Epochs | 5 |
| Warmup ratio | 0.03 |
| Weight decay | 0.0 |
| Precision | FP16 |
| Augmentation | RandomResizedCrop(0.8–1.0), RandomHorizontalFlip |
Trainable parameters: 93k of ~5.6M total (**1.7%**).
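The 93k figure can be reproduced with a short back-of-the-envelope count. The sketch below assumes the usual ViT-tiny dimensions (hidden size 192, 12 encoder layers), which this card does not state explicitly, and counts only the LoRA matrices and the linear head.

```python
# Assumed ViT-tiny dimensions (not stated in this card).
hidden_size = 192
num_layers = 12

# From the hyperparameter table.
rank = 8
targets_per_layer = 2  # query and value
num_classes = 101

# Each adapted projection gets A (rank x hidden) and B (hidden x rank).
lora_per_module = rank * hidden_size + hidden_size * rank
lora_total = lora_per_module * targets_per_layer * num_layers

# Linear head: weight matrix plus bias.
head_total = hidden_size * num_classes + num_classes

trainable = lora_total + head_total
print(f"LoRA parameters: {lora_total:,}")
print(f"Head parameters: {head_total:,}")
print(f"Trainable total: {trainable:,}")
print(f"Share of ~5.6M:  {trainable / 5.6e6:.1%}")
```

Most of the trainable budget sits in the LoRA matrices, with the classification head accounting for roughly a fifth of it.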
Evaluation
Evaluated on the Food-101 test split (25,250 images).
| Metric | Value |
|---|---|
| Top-1 accuracy | 85 % |
| Top-5 accuracy | 90 % |
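For readers unfamiliar with the two metrics, here is a dependency-free sketch of how top-k accuracy is typically computed. The function name `topk_accuracy` and the toy scores are illustrative assumptions; this is not the evaluation code used for the numbers above.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in ranked
    return hits / len(labels)


# Toy scores for 3 samples over 4 classes (illustrative, not model outputs).
scores = [
    [0.1, 0.7, 0.1, 0.1],    # true class 1: top-1 hit
    [0.4, 0.3, 0.2, 0.1],    # true class 1: top-1 miss, top-2 hit
    [0.5, 0.3, 0.15, 0.05],  # true class 3: miss for k <= 3
]
labels = [1, 1, 3]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Top-5 on Food-101 would use the same routine with `k=5` over the 101-way scores.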
Usage
The adapter uses the standard PEFT format plus a sidecar classifier.pt and labels.json. Minimal loader:
import json

import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel
from torch import nn
from transformers import AutoImageProcessor, AutoModelForImageClassification

BASE = "WinKawaks/vit-tiny-patch16-224"
ADAPTER = "turhancan97/vit-tiny-lora-food101"

processor = AutoImageProcessor.from_pretrained(BASE, use_fast=True)
base = AutoModelForImageClassification.from_pretrained(BASE)  # frozen ImageNet-1k model
model = PeftModel.from_pretrained(base, ADAPTER)  # attach the LoRA weights

with open(hf_hub_download(ADAPTER, "labels.json")) as f:
    id2label = {int(k): v for k, v in json.load(f).items()}  # class index -> label name

head_state = torch.load(
    hf_hub_download(ADAPTER, "classifier.pt"), map_location="cpu", weights_only=True
)
head = nn.Linear(base.config.hidden_size, len(id2label))
head.load_state_dict(head_state)
model.base_model.model.classifier = head  # swap the 1000-way head for the 101-way Food-101 head
model.eval()
Inference:
from PIL import Image

image = Image.open("my_food.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.inference_mode():
    logits = model(**inputs).logits[0]
topk = logits.softmax(-1).topk(5)
for score, idx in zip(topk.values, topk.indices):
    print(f"{id2label[idx.item()]:30s} {score.item():.3f}")
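Switching back to the base model (ImageNet-1k, 1000 classes) without unloading. The loader above replaces the classifier on the shared base object, so the original 1000-way head has to be reloaded and applied to the CLS features:
imagenet_head = AutoModelForImageClassification.from_pretrained(BASE).classifier  # original 1000-way head
with model.disable_adapter(), torch.inference_mode():
    cls = base.vit(pixel_values=inputs["pixel_values"]).last_hidden_state[:, 0]  # LoRA off: pristine pretrained weights
    logits = imagenet_head(cls)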
Intended use
- Educational / demo use for showing how LoRA adds new capabilities to a frozen backbone.
- Classifying photos of prepared food into the Food-101 taxonomy.
Limitations
- Only 101 food categories; anything outside the taxonomy will be misclassified.
- Trained on Food-101 which is mostly western/restaurant-style dishes, with label noise in the original data.
- ViT-tiny is a low-capacity backbone; a larger base model would likely get higher accuracy with the same adapter recipe.
License
Apache-2.0, matching the base model and the Food-101 dataset license.
Citation
If you use this adapter, please cite the underlying works:
@inproceedings{hu2022lora,
  title={{LoRA}: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  booktitle={ICLR},
  year={2022}
}
@inproceedings{bossard2014food101,
  title={Food-101 -- Mining Discriminative Components with Random Forests},
  author={Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},
  booktitle={ECCV},
  year={2014}
}