TinyLLaVA

TinyLLaVA has released a family of small-scale Large Multimodal Models (LMMs), ranging from 0.55B to 3.1B parameters. Our best model, TinyLLaVA-Phi-2-SigLIP-3.1B, achieves better overall performance than existing 7B models such as LLaVA-1.5 and Qwen-VL.

Here, we introduce TinyLLaVA-OpenELM-450M-SigLIP-0.89B, which is trained with the TinyLLaVA Factory codebase. For the LLM and vision tower, we choose OpenELM-450M-Instruct and siglip-so400m-patch14-384, respectively. The dataset used for training this model is the LLaVA dataset.

Usage

Execute the following test code:

from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'jiajunlong/TinyLLaVA-OpenELM-450M-SigLIP-0.89B'
# trust_remote_code=True lets transformers load the custom model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()  # requires a CUDA-capable GPU

# The tokenizer settings are read from the model config.
config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,
    padding_side=config.tokenizer_padding_side,
)

prompt = "What are these?"
image_url = "http://images.cocodataset.org/test-stuff2017/000000000001.jpg"
# chat() returns the generated text and the generation time.
output_text, generation_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)
print('model output:', output_text)
print('running time:', generation_time)

Result

model_name	gqa	textvqa	sqa	vqav2	MME	MMB	MM-VET
TinyLLaVA-1.5B	60.3	51.7	60.3	76.9	1276.5	55.2	25.8
TinyLLaVA-0.89B	53.87	44.02	54.09	71.74	1118.75	37.8	20
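
To make the gap between the two rows easier to read, the following minimal sketch computes, per benchmark, the 0.89B model's score as a fraction of the 1.5B model's score. The numbers are copied from the table above; the helper name `retained_fraction` and the data layout are assumptions made for illustration and are not part of TinyLLaVA Factory.

```python
# Scores copied from the Result table; column order matches the table header.
BENCHMARKS = ["gqa", "textvqa", "sqa", "vqav2", "MME", "MMB", "MM-VET"]
SCORES = {
    "TinyLLaVA-1.5B": [60.3, 51.7, 60.3, 76.9, 1276.5, 55.2, 25.8],
    "TinyLLaVA-0.89B": [53.87, 44.02, 54.09, 71.74, 1118.75, 37.8, 20.0],
}


def retained_fraction(small, large):
    """Return, per benchmark, the small model's score divided by the large model's.

    Hypothetical helper for reading the table; not an official metric.
    """
    return {
        name: s / l
        for name, s, l in zip(BENCHMARKS, SCORES[small], SCORES[large])
    }


if __name__ == "__main__":
    ratios = retained_fraction("TinyLLaVA-0.89B", "TinyLLaVA-1.5B")
    for name, ratio in ratios.items():
        print(f"{name:8s} {ratio:.1%}")
```

A plain ratio is used here only because the benchmarks sit on different scales (MME is reported in the thousands, the others below 100); it should not be read as an aggregate score.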

P.S. TinyLLaVA Factory is an open-source modular codebase for small-scale LMMs with a focus on simplicity of code implementations, extensibility of new features, and reproducibility of training results. The repository provides standard training and evaluation pipelines, flexible data preprocessing and model configurations, and easily extensible architectures. Users can customize their own LMMs with minimal coding effort and fewer coding mistakes. TinyLLaVA Factory integrates a suite of cutting-edge models and methods.

LLM currently supports OpenELM, TinyLlama, StableLM, Qwen, Gemma, and Phi.
Vision tower currently supports CLIP, SigLIP, Dino, and a combination of CLIP and Dino.
Connector currently supports MLP, Qformer, and Resampler.
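
The three component lists above suggest that a model is defined by one choice per slot. The sketch below illustrates that idea with a small validated record; it is not TinyLLaVA Factory's actual configuration API. The class `LMMRecipe`, the set names, and the `"CLIP+Dino"` label are hypothetical, and the `"MLP"` connector in the usage example is an assumption, since this card does not state which connector the 0.89B model uses.

```python
from dataclasses import dataclass

# Component families named in the lists above; the string labels are illustrative only.
SUPPORTED_LLMS = {"OpenELM", "TinyLlama", "StableLM", "Qwen", "Gemma", "Phi"}
SUPPORTED_VISION_TOWERS = {"CLIP", "SigLIP", "Dino", "CLIP+Dino"}
SUPPORTED_CONNECTORS = {"MLP", "Qformer", "Resampler"}


@dataclass(frozen=True)
class LMMRecipe:
    """Hypothetical record of one LLM / vision tower / connector combination."""

    llm: str
    vision_tower: str
    connector: str

    def __post_init__(self):
        # Reject any component that is not in the supported lists.
        checks = (
            ("llm", self.llm, SUPPORTED_LLMS),
            ("vision_tower", self.vision_tower, SUPPORTED_VISION_TOWERS),
            ("connector", self.connector, SUPPORTED_CONNECTORS),
        )
        for field_name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(
                    f"unsupported {field_name} {value!r}; expected one of {sorted(allowed)}"
                )


if __name__ == "__main__":
    # LLM and vision tower follow this model card; the connector is an assumed value.
    recipe = LMMRecipe(llm="OpenELM", vision_tower="SigLIP", connector="MLP")
    print(recipe)
```

Validating at construction time keeps an unsupported combination from surfacing later as a training-time failure, which is one way to read the "fewer coding mistakes" goal stated above.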

Safetensors: model size 0.9B params, tensor type F16

Paper

TinyLLaVA: A Framework of Small-scale Large Multimodal Models (arXiv:2402.14289, published Feb 22, 2024)