Difficulty importing Pipeline - AttributeError: module 'keras._tf_keras.keras' has no attribute '__internal__'
Can't seem to do:
from transformers import pipeline
Which versions of keras, tensorflow, transformers, etc. are you using? Full traceback below:
AttributeError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1390, in _LazyModule._get_module(self, module_name)
1389 try:
-> 1390 return importlib.import_module("." + module_name, self.__name__)
1391 except Exception as e:
File /opt/conda/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
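File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)
File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)
File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)
File /opt/conda/lib/python3.10/site-packages/transformers/pipelines/__init__.py:74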
73 from .question_answering import QuestionAnsweringArgumentHandler, QuestionAnsweringPipeline
---> 74 from .table_question_answering import TableQuestionAnsweringArgumentHandler, TableQuestionAnsweringPipeline
75 from .text2text_generation import SummarizationPipeline, Text2TextGenerationPipeline, TranslationPipeline
File /opt/conda/lib/python3.10/site-packages/transformers/pipelines/table_question_answering.py:26
25 import tensorflow as tf
---> 26 import tensorflow_probability as tfp
28 from ..models.auto.modeling_tf_auto import (
29 TF_MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES,
30 TF_MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING_NAMES,
31 )
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/__init__.py:20
17 # Contributors to the python/ dir should not alter this file; instead update
18 # python/__init__.py as necessary.
---> 20 from tensorflow_probability import substrates
21 # from tensorflow_probability.google import staging # DisableOnExport
22 # from tensorflow_probability.google import tfp_google # DisableOnExport
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/substrates/__init__.py:17
15 """TensorFlow Probability alternative substrates."""
---> 17 from tensorflow_probability.python.internal import all_util
18 from tensorflow_probability.python.internal import lazy_loader # pylint: disable=g-direct-tensorflow-import
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/__init__.py:138
137 for pkg_name in _maybe_nonlazy_load:
--> 138 dir(globals()[pkg_name]) # Forces loading the package from its lazy loader.
141 all_util.remove_undocumented(__name__, _lazy_load + _maybe_nonlazy_load)
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/internal/lazy_loader.py:57, in LazyLoader.__dir__(self)
56 def __dir__(self):
---> 57 module = self._load()
58 return dir(module)
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/internal/lazy_loader.py:40, in LazyLoader._load(self)
39 # Import the target module and insert it into the parent's namespace
---> 40 module = importlib.import_module(self.__name__)
41 if self._parent_module_globals is not None:
File /opt/conda/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/experimental/__init__.py:31
30 from tensorflow_probability.python.experimental import auto_batching
---> 31 from tensorflow_probability.python.experimental import bayesopt
32 from tensorflow_probability.python.experimental import bijectors
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/experimental/bayesopt/__init__.py:17
15 """TensorFlow Probability experimental Bayesopt package."""
---> 17 from tensorflow_probability.python.experimental.bayesopt import acquisition
18 from tensorflow_probability.python.internal import all_util
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/experimental/bayesopt/acquisition/__init__.py:19
18 from tensorflow_probability.python.experimental.bayesopt.acquisition.acquisition_function import MCMCReducer
---> 19 from tensorflow_probability.python.experimental.bayesopt.acquisition.expected_improvement import GaussianProcessExpectedImprovement
20 from tensorflow_probability.python.experimental.bayesopt.acquisition.expected_improvement import ParallelExpectedImprovement
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/experimental/bayesopt/acquisition/expected_improvement.py:19
17 import tensorflow.compat.v2 as tf
---> 19 from tensorflow_probability.python.distributions import normal
20 from tensorflow_probability.python.distributions import student_t
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/distributions/__init__.py:110
109 from tensorflow_probability.python.distributions.pert import PERT
--> 110 from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
111 from tensorflow_probability.python.distributions.plackett_luce import PlackettLuce
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/distributions/pixel_cnn.py:33
32 from tensorflow_probability.python.internal import tensorshape_util
---> 33 from tensorflow_probability.python.layers import weight_norm
36 class PixelCNN(distribution.Distribution):
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/layers/__init__.py:27
26 from tensorflow_probability.python.layers.dense_variational_v2 import DenseVariational
---> 27 from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
28 from tensorflow_probability.python.layers.distribution_layer import DistributionLambda
File /opt/conda/lib/python3.10/site-packages/tensorflow_probability/python/layers/distribution_layer.py:68
50 __all__ = [
51 'CategoricalMixtureOfOneHotCategorical',
52 'DistributionLambda',
(...)
64 'VariationalGaussianProcess',
65 ]
---> 68 tf.keras.__internal__.utils.register_symbolic_tensor_type(dtc._TensorCoercible) # pylint: disable=protected-access
71 def _event_size(event_shape, name=None):
AttributeError: module 'keras._tf_keras.keras' has no attribute '__internal__'
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[1], line 2
1 #from tensorflow import keras
----> 2 from transformers import pipeline
File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1380, in _LazyModule.__getattr__(self, name)
1378 value = self._get_module(name)
1379 elif name in self._class_to_module.keys():
-> 1380 module = self._get_module(self._class_to_module[name])
1381 value = getattr(module, name)
1382 else:
File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1392, in _LazyModule._get_module(self, module_name)
1390 return importlib.import_module("." + module_name, self.__name__)
1391 except Exception as e:
-> 1392 raise RuntimeError(
1393 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
1394 f" traceback):\n{e}"
1395 ) from e
RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
module 'keras._tf_keras.keras' has no attribute '__internal__'
Seeing the same error. Any solutions?
Hi there! Could you please open an issue in transformers with the details about the environment and version?
https://github.com/huggingface/transformers
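For anyone filing that issue, a short script along these lines could gather the version details. This is only a sketch: the package list is an assumption based on the names that come up in this thread (tf-keras is a guess and was not named by anyone here), and `installed_versions` is a hypothetical helper, not something transformers provides.

```python
from importlib import metadata

# Distribution names assumed relevant to this thread; adjust as needed.
# "tf-keras" is an assumption, not a package named in the original report.
PACKAGES = ["transformers", "tensorflow", "keras", "tensorflow-probability", "tf-keras"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

It reads package metadata only, so it should work even when importing transformers.pipelines itself fails.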
Hey! It looks like a potential error in tensorflow_probability or keras. Can you try uninstalling tensorflow_probability to see if that fixes your issue?
Also happy to follow this in Transformers issues as @osanseviero recommends above.
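The traceback above shows transformers/pipelines/table_question_answering.py importing tensorflow_probability, which is where the failure starts. Before uninstalling anything, a quick check like the following can confirm whether that module is present in the environment at all. It is a minimal sketch, and `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for module_name in ("tensorflow", "tensorflow_probability"):
        status = "found" if is_installed(module_name) else "not found"
        print(f"{module_name}: {status}")
```

Because `find_spec` only locates a top-level module and does not execute it, this check does not trigger the AttributeError.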
Yes, this seems related to https://github.com/tensorflow/probability/issues/1774#issuecomment-1979642276. The fix for that has not been released yet.
Hi all, we've just merged a workaround in transformers here. If you install transformers from main with pip install git+https://github.com/huggingface/transformers.git the issue should now be resolved. The fix will also be included in the next transformers release.
If this doesn't resolve the issue for you, please ping me and I'll keep investigating!
I upgraded tensorflow-probability to version 0.24.0, then installed tensorflow-keras, and the problem is solved!
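To check whether an environment already has tensorflow-probability at that version or newer, something like the sketch below might help. The 0.24.0 threshold comes from the post above. `version_tuple` and `meets_minimum` are hypothetical helpers that compare only the leading numeric parts of the version, so pre-release suffixes such as "rc1" are ignored.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse leading numeric release parts, e.g. '0.24.0' -> (0, 24, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (e.g. 'dev1').
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.24.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        tfp_version = metadata.version("tensorflow-probability")
    except metadata.PackageNotFoundError:
        print("tensorflow-probability is not installed")
    else:
        verdict = "ok" if meets_minimum(tfp_version) else "older than 0.24.0"
        print(f"tensorflow-probability {tfp_version}: {verdict}")
```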