Instructions to use ai21labs/Jamba-v0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ai21labs/Jamba-v0.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ai21labs/Jamba-v0.1", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ai21labs/Jamba-v0.1", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ai21labs/Jamba-v0.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ai21labs/Jamba-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ai21labs/Jamba-v0.1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/ai21labs/Jamba-v0.1
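The curl call to the vLLM server above can also be issued from Python. What follows is a minimal sketch using only the standard library. It assumes the server started by `vllm serve` is listening on `localhost:8000`, as in the curl example. The helper names `build_completion_request` and `complete` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumed to match the curl example: vLLM's OpenAI-compatible completions route.
VLLM_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "ai21labs/Jamba-v0.1"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=VLLM_URL):
    """Build (but do not send) a POST request with the same JSON body as the curl call."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `complete("Once upon a time,")` returns the parsed JSON response. Building the request in a separate step makes it possible to inspect the payload without any network access.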
- SGLang
How to use ai21labs/Jamba-v0.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ai21labs/Jamba-v0.1" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ai21labs/Jamba-v0.1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ai21labs/Jamba-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ai21labs/Jamba-v0.1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use ai21labs/Jamba-v0.1 with Docker Model Runner:
docker model run hf.co/ai21labs/Jamba-v0.1
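Both the vLLM and SGLang examples above call an OpenAI-compatible `/v1/completions` endpoint, whose JSON response carries the generated text in a `choices` list. Here is a minimal sketch of extracting that text, assuming the standard OpenAI completions response shape. The function name `extract_texts` and the sample response are hypothetical and only illustrate the format; the sample is not real model output.

```python
import json


def extract_texts(response_body):
    """Return the generated text of each choice in a completions response.

    Assumes the OpenAI-style shape: {"choices": [{"index": 0, "text": "..."}, ...]}.
    """
    data = json.loads(response_body)
    # Sort by "index" so the results keep a stable order across choices.
    choices = sorted(data.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"] for choice in choices]


# Hypothetical response body, shortened for illustration (not real model output).
sample = json.dumps({
    "object": "text_completion",
    "model": "ai21labs/Jamba-v0.1",
    "choices": [{"index": 0, "text": " there was a model.", "finish_reason": "length"}],
})

print(extract_texts(sample))
```

Reading `choices` defensively with `data.get` keeps the helper from raising on a response that has no completions, in which case it returns an empty list.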
Install & run ai21labs/Jamba-v0.1 easily using llmpm
#53 opened 3 months ago
by
sarthak-saxena
Update README.md
#52 opened about 1 year ago
by
Motit
TypeError: forward() got an unexpected keyword argument 'num_logits_to_keep'
#51 opened almost 2 years ago
by
shajiu
Adding Evaluation Results
#50 opened almost 2 years ago
by
leaderboard-pr-bot
AttributeError: 'HybridMambaAttentionDynamicCache' object has no attribute '_modules'
7
#48 opened almost 2 years ago
by
xxrjun
Adding Evaluation Results
#47 opened almost 2 years ago
by
leaderboard-pr-bot
ai21 instance not runnable with langchain
1
#45 opened almost 2 years ago
by
LordSahu
Is there any SFT or Chat model?
2
#41 opened about 2 years ago
by
chuyi777
How to use accelerate evaluate Jamba
#40 opened about 2 years ago
by
Xidong
Jamba Evaluation Task on GSM8K
#39 opened about 2 years ago
by
ssparks
Do you have plans to release papers on Jamba's architecture or miniature models?
#38 opened about 2 years ago
by
badrabbitt
Are there any weight files for pre-trained models?
#37 opened about 2 years ago
by
aidenxy
Memory usage on single A100*80GB in training
#36 opened about 2 years ago
by
DavidWu1116
Fast Mamba
5
#34 opened about 2 years ago
by
Praneethkeerthi
Why does throughput increase with longer context window?
3
#33 opened about 2 years ago
by
jingyu-q
Request: DOI
#32 opened about 2 years ago
by
kozolex
GGUF quants?
1
#31 opened about 2 years ago
by
6346y9uey
Any release plans for the 7b jamba model without MoE?
2
#30 opened about 2 years ago
by
danielpark
Why is there an MLP in the Mamba Layer?
#28 opened about 2 years ago
by
naston
Complex vs Real parametrization.
#27 opened about 2 years ago
by
YuvMil
How to Fine-tune Jamba on Google Colab?
7
#26 opened about 2 years ago
by
Ateeqq
Layer-Selective Rank Reduction
#25 opened about 2 years ago
by
mizinovmv
Update README.md
#23 opened about 2 years ago
by
rombodawg
Would there be a chance for Jamba to be trained with 1.58-bit weights?
1
#22 opened about 2 years ago
by
shing3232
Anyone else currently experimenting with fine-tuning Jamba?
3
#21 opened about 2 years ago
by
Severian
IndentationError: unindent does not match any outer indentation level
#19 opened about 2 years ago
by
thebeline
ModuleNotFoundError: No module named 'transformers_modules.ai21labs.Jamba-v0'
5
#17 opened about 2 years ago
by
hjewr
Fast Mamba kernels are not available
10
#16 opened about 2 years ago
by
MohamedRashad
Do all safetensors files need to be downloaded to use this model on Colab?
2
#14 opened about 2 years ago
by
Kv-boii
How many pretraining tokens?
#13 opened about 2 years ago
by
CyberNative
Smaller version to ease implementation experiments?
7
#12 opened about 2 years ago
by
compilade
Coding performance of base model?
4
#11 opened about 2 years ago
by
rombodawg
Jambaleo
#10 opened about 2 years ago
by
pszemraj
Can you give a short explanation about the benefits and the architecture?
2
#7 opened about 2 years ago
by
SicariusSicariiStuff
A Bang Up Job
2
#4 opened about 2 years ago
by
nightvision04
multiple gpu?
3
#3 opened about 2 years ago
by
bdambrosio
Just a solid congrats and thank you to your team
1
#1 opened about 2 years ago
by
Severian