Instructions to use FreedomIntelligence/Apollo-72B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use FreedomIntelligence/Apollo-72B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FreedomIntelligence/Apollo-72B", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("FreedomIntelligence/Apollo-72B", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use FreedomIntelligence/Apollo-72B with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "FreedomIntelligence/Apollo-72B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "FreedomIntelligence/Apollo-72B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
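The server speaks the OpenAI-compatible API, so you can also call it from Python. Below is a minimal sketch using the openai client, which is an extra dependency not mentioned on this card (install with pip install openai). The same code works against the SGLang server in the next section if you change the port to 30000.

# Query the local vLLM server via its OpenAI-compatible API.
# Assumes the server from the step above is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # no real key needed locally

response = client.completions.create(
    model="FreedomIntelligence/Apollo-72B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(response.choices[0].text)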
- SGLang
How to use FreedomIntelligence/Apollo-72B with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "FreedomIntelligence/Apollo-72B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "FreedomIntelligence/Apollo-72B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "FreedomIntelligence/Apollo-72B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "FreedomIntelligence/Apollo-72B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use FreedomIntelligence/Apollo-72B with Docker Model Runner:
docker model run hf.co/FreedomIntelligence/Apollo-72B
Multilingual Medicine: Model, Dataset, Benchmark, Code
Covering English, Chinese, French, Hindi, Spanish, and Arabic so far.
👨🏻‍💻 GitHub • 📃 Paper • 🌐 Demo • 🤗 ApolloCorpus • 🤗 XMedBench
中文 | English
Update
- [2024.04.25] MedJamba released; for the training and evaluation code, see the repo.
- [2024.03.07] Paper released.
- [2024.02.12] ApolloCorpus and XMedBench are published! 🎉
- [2024.01.23] Apollo repo is published! 🎉
Results
🤗 Apollo-0.5B • 🤗 Apollo-1.8B • 🤗 Apollo-2B • 🤗 Apollo-6B • 🤗 Apollo-7B • 🤗 Apollo-34B • 🤗 Apollo-72B
🤗 MedJamba
🤗 Apollo-0.5B-GGUF • 🤗 Apollo-2B-GGUF • 🤗 Apollo-6B-GGUF • 🤗 Apollo-7B-GGUF
Usage Format
<|User|>:{query}\n<|Assistant|>:{response}<|endoftext|>
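As a minimal sketch of applying this format with the Transformers pipeline shown above (the query and the generation settings are illustrative values, not prescribed by this card):

# Build a prompt in the model's expected format and generate a reply.
from transformers import pipeline

# Note: a 72B model needs multiple GPUs or heavy quantization in practice.
pipe = pipeline("text-generation", model="FreedomIntelligence/Apollo-72B", trust_remote_code=True)

query = "What are common symptoms of iron-deficiency anemia?"  # example query
prompt = f"<|User|>:{query}\n<|Assistant|>:"

# return_full_text=False strips the prompt from the returned string.
result = pipe(prompt, max_new_tokens=256, return_full_text=False)
print(result[0]["generated_text"])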
Dataset & Evaluation
Dataset 🤗 ApolloCorpus
- Zip File
- Data category
- Pretrain:
  - json_name: {data_source}_{language}_{data_type}.json
  - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
  - data_type: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki, qa (QA generated from text)
  - data item:
    - data_type==text: list of strings
      [ "string1", "string2", ... ]
    - data_type==qa: list of QA pairs (each a list of strings)
      [ [ "q1", "a1", "q2", "a2", ... ], ... ]
- SFT:
  - json_name: {data_source}_{language}.json
  - data_type: code, general, math, medicalExam, medicalPatient
  - data item: list of QA pairs (each a list of strings)
    [ [ "q1", "a1", "q2", "a2", ... ], ... ]
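To make the schema concrete, here is an illustrative Python sketch that reads the two file shapes and converts one multi-turn QA list into the usage format above. The file names are hypothetical instances of the naming patterns, not actual files from the corpus.

# Read ApolloCorpus-style files per the schema above (file names are hypothetical).
import json

# Pretrain, data_type==text: a flat list of strings.
with open("medicalGuideline_en_text.json") as f:  # hypothetical file name
    texts = json.load(f)  # ["string1", "string2", ...]

# Pretrain/SFT, data_type==qa: a list of QA pairs, each a flat list of strings.
with open("medicalExam_zh.json") as f:  # hypothetical file name
    dialogs = json.load(f)  # [["q1", "a1", "q2", "a2", ...], ...]

# Convert one multi-turn QA list into the model's usage format.
turns = dialogs[0]
prompt = "".join(
    f"<|User|>:{q}\n<|Assistant|>:{a}<|endoftext|>"
    for q, a in zip(turns[::2], turns[1::2])
)
print(prompt)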
Evaluation 🤗 XMedBench
EN:
- MedQA-USMLE
- MedMCQA
- PubMedQA: Not used in the paper because the results fluctuated too much.
- MMLU-Medical
- Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
ZH:
- MedQA-MCMLE
- CMB-single: Not used in the paper
  - Randomly sampled 2,000 single-answer multiple-choice questions.
- CMMLU-Medical
- Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
- CExam: Not used in the paper
  - Randomly sampled 2,000 multiple-choice questions.
ES: Head_qa
FR: Frenchmedmcqa
HI: MMLU_HI
- Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
AR: MMLU_Ara
- Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
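The exact prompting protocol used in the paper is not described here (see "Results reproduction" below). Purely as an illustration of posing a multiple-choice item in the model's usage format, here is a sketch with an invented question:

# Illustrative only: format a multiple-choice item in the model's usage format.
# The question, options, and instruction wording are invented for this example;
# the paper's actual evaluation prompts may differ.
question = "Which vitamin deficiency causes scurvy?"
options = {"A": "Vitamin A", "B": "Vitamin B12", "C": "Vitamin C", "D": "Vitamin D"}

option_block = "\n".join(f"{key}. {text}" for key, text in options.items())
query = f"{question}\n{option_block}\nAnswer with the letter of the correct option."
prompt = f"<|User|>:{query}\n<|Assistant|>:"
print(prompt)  # feed this to any of the servers or pipelines shown above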
Results reproduction
Waiting for Update
Citation
Please use the following citation if you intend to use our dataset for training or evaluation:
@misc{wang2024apollo,
title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
year={2024},
eprint={2403.03640},
archivePrefix={arXiv},
primaryClass={cs.CL}
}