Instructions to use hanchaow/QTuneVL1_5-2B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use hanchaow/QTuneVL1_5-2B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hanchaow/QTuneVL1_5-2B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("hanchaow/QTuneVL1_5-2B", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use hanchaow/QTuneVL1_5-2B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "hanchaow/QTuneVL1_5-2B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "hanchaow/QTuneVL1_5-2B",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/hanchaow/QTuneVL1_5-2B
```
- SGLang
How to use hanchaow/QTuneVL1_5-2B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hanchaow/QTuneVL1_5-2B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "hanchaow/QTuneVL1_5-2B",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hanchaow/QTuneVL1_5-2B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "hanchaow/QTuneVL1_5-2B",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use hanchaow/QTuneVL1_5-2B with Docker Model Runner:
```shell
docker model run hf.co/hanchaow/QTuneVL1_5-2B
```
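The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a vLLM or SGLang server is already running locally. The helper names `build_chat_request` and `send_chat_request` are hypothetical and are not part of either project.

```python
import json
import urllib.request


def build_chat_request(prompt, image_url, model="hanchaow/QTuneVL1_5-2B"):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload, base_url="http://localhost:8000", timeout=120):
    """POST the payload to a running server and return the decoded JSON reply.

    Assumption: base_url points at a server started as shown above
    (port 8000 for vLLM, port 30000 for SGLang).
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send_chat_request(build_chat_request("Describe this image in one sentence.", image_url))` should return a chat completion object. For SGLang, pass `base_url="http://localhost:30000"`.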
QTuneVL1.5-2B, developed by Reconova AI Lab and BDAA-Lab
Introduction
We’re excited to introduce QTuneVL1.5-2B, the latest in Reconova AI Lab’s series of multimodal large language models. Building on QTuneVL1-2B, it incorporates key features from both InternVL and Mini-Monkey to deliver even greater performance.
Like QTuneVL1-2B, QTuneVL1.5-2B is a lightweight MLLM that adopts the cropping and padding strategies of Mini-Monkey, UReader, and InternVL. It is fine-tuned from InternVL3-2B.
Evaluation
We evaluated the model on eight benchmarks from the OpenCompass leaderboard using VLMEvalKit. It outperforms its predecessor (QTuneVL1-2B) on average score, with the largest gains on MMStar, MMMU_DEV_VAL, and OCRBench. The eight benchmarks and the detailed results are listed below; values in parentheses are gains over QTuneVL1-2B.
The eight benchmarks: 'MMBench_DEV_EN_V11', 'MMStar', 'MMMU_DEV_VAL', 'MathVista_MINI', 'HallusionBench', 'AI2D_TEST', 'OCRBench', 'MMVet'.
| Index | Model | AVG | MMBench_DEV_EN_V11 | MMStar | MMMU_DEV_VAL | MathVista_MINI | HallusionBench | AI2D_TEST | OCRBench | MMVet |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Minimonkey | 54.3 | 71.4 | 50.3 | 35.6 | 46.3 | 38.6 | 74.8 | 802 | 37.2 |
| 2 | InternVL2-2B | 54.2 | 71.4 | 50.3 | 34.6 | 47.2 | 38.2 | 74.2 | 783 | 39.8 |
| 3 | InternVL2_5-2B | 59.4 | 74.6 | 53.7 | 40.1 | 49.7 | 42.2 | 74.9 | 802 | 59.5 |
| 4 | InternVL3-2B | 63.5 | 79.6 | 61.1 | 48.6 | 51.1 | 42 | 78.4 | 835 | 64.08 |
| 5 | QTuneVL1-2B | 59.7 | 74.9 | 53.9 | 41.5 | 48.8 | 43.0 | 75.2 | 806 | 59.6 |
| 6 | QTuneVL1.5-2B | 64.2 (+4.5) | 79.6 (+4.7) | 61.4 (+7.5) | 51.1 (+9.6) | 51.8 (+3.0) | 43.0 | 78.8 (+3.6) | 858 (+52) | 62.1 (+2.5) |
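The AVG column can be reproduced from the per-benchmark scores if OCRBench (reported on a 0-1000 scale) is divided by 10 before averaging. The card does not state this rule, so treat it as an assumption; the sketch below only checks that it matches the two QTuneVL rows.

```python
BENCHMARKS = [
    "MMBench_DEV_EN_V11", "MMStar", "MMMU_DEV_VAL", "MathVista_MINI",
    "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]

# Scores copied from rows 5 and 6 of the table.
SCORES = {
    "QTuneVL1-2B": [74.9, 53.9, 41.5, 48.8, 43.0, 75.2, 806, 59.6],
    "QTuneVL1.5-2B": [79.6, 61.4, 51.1, 51.8, 43.0, 78.8, 858, 62.1],
}


def average_score(scores):
    """Mean over the eight benchmarks.

    Assumption: OCRBench is rescaled from 0-1000 to 0-100 (divided by 10)
    so that every benchmark contributes on the same scale.
    """
    normalized = [
        score / 10 if name == "OCRBench" else score
        for name, score in zip(BENCHMARKS, scores)
    ]
    return sum(normalized) / len(normalized)


for model_name, scores in SCORES.items():
    print(model_name, round(average_score(scores), 1))
# QTuneVL1-2B 59.7
# QTuneVL1.5-2B 64.2
```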
Note that the GPT judge models used in our VLMEvalKit runs differ slightly from the official configuration. In the code (vlmeval/dataset/utils/judge_util.py), our setup uses:
- `'gpt-4o-mini': 'gpt-4o-mini'` instead of `'gpt-4o-mini': 'gpt-4o-mini-2024-07-18'`
- `'gpt-4-turbo': 'gpt-4-turbo'` instead of `'gpt-4-turbo': 'gpt-4-1106-preview'`

As a result, our scores differ slightly from the official ones.
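To make the difference concrete, the two mappings can be written side by side and compared. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the entries restate just the two pairs quoted above rather than the full contents of judge_util.py.

```python
# Hypothetical names; only the two entries mentioned in this card are listed.
OFFICIAL_JUDGE_MAP = {
    "gpt-4o-mini": "gpt-4o-mini-2024-07-18",  # pinned snapshot
    "gpt-4-turbo": "gpt-4-1106-preview",      # pinned snapshot
}
LOCAL_JUDGE_MAP = {
    "gpt-4o-mini": "gpt-4o-mini",  # unpinned alias
    "gpt-4-turbo": "gpt-4-turbo",  # unpinned alias
}


def changed_judges(official, local):
    """Return {alias: (official_model, local_model)} for entries that differ."""
    return {
        alias: (official[alias], local.get(alias))
        for alias in official
        if official[alias] != local.get(alias)
    }


for alias, (official_model, local_model) in changed_judges(
    OFFICIAL_JUDGE_MAP, LOCAL_JUDGE_MAP
).items():
    print(f"{alias}: official={official_model}, ours={local_model}")
```

Because an unpinned alias may resolve to a different model snapshot than the pinned name, judge-scored benchmarks can shift slightly between the two setups.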
Copyright
We welcome suggestions to help us improve QTuneVL. For any questions, please contact HanChao Wang at wanghanchao@reconova.com. If you find something interesting, feel free to share it with us by email or open an issue.