Instructions to use google/functiongemma-270m-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/functiongemma-270m-it with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="google/functiongemma-270m-it")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("google/functiongemma-270m-it", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/functiongemma-270m-it with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "google/functiongemma-270m-it" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/functiongemma-270m-it", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/google/functiongemma-270m-it
- SGLang
How to use google/functiongemma-270m-it with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "google/functiongemma-270m-it" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/functiongemma-270m-it", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "google/functiongemma-270m-it" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/functiongemma-270m-it", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use google/functiongemma-270m-it with Docker Model Runner:
docker model run hf.co/google/functiongemma-270m-it
config.rope_parameters["rope_theta"] = rope_theta
#7
by Daemontatox - opened
config.rope_parameters["rope_theta"] = rope_theta
(APIServer pid=3453776) ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
(APIServer pid=3453776) TypeError: 'NoneType' object does not support item assignment
i am using vllm 0.12.0.
for anyone having issues with deploying this in vllm , you can run this python script and modify the deploy command to your needs till its fixed in vllm 0.13.0
# launch_vllm.py
import sys
import multiprocessing
multiprocessing.set_start_method('fork', force=True)
from vllm.transformers_utils import config as vllm_config
_original_patch = vllm_config.patch_rope_parameters
def safe_patch_rope_parameters(config):
if not hasattr(config, 'rope_parameters') or config.rope_parameters is None:
config.rope_parameters = {'rope_type': 'default'}
elif isinstance(config.rope_parameters, dict) and 'rope_type' not in config.rope_parameters:
config.rope_parameters['rope_type'] = 'default'
return _original_patch(config)
vllm_config.patch_rope_parameters = safe_patch_rope_parameters
if __name__ == '__main__':
from vllm.entrypoints.cli.main import main
sys.exit(main())
CUDA_VISIBLE_DEVICES=2 \
nohup python launch_vllm.py serve unsloth/functiongemma-270m-it \
--port 8008 \
--host 0.0.0.0 \
--gpu-memory-utilization 0.2 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--enable-chunked-prefill \
--max-model-len 2048 \
--max-num-batched-tokens 2048 \
--enable-prefix-caching \
--tensor-parallel-size 1 \
--max-num-seqs 5 \
> gemma_server.log 2>&1 &
Daemontatox changed discussion status to closed