Instructions for using hanchaow/QTuneVL1_5-2B with libraries and local apps.

Libraries

How to use hanchaow/QTuneVL1_5-2B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hanchaow/QTuneVL1_5-2B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run the pipeline and print the generated answer
result = pipe(text=messages)
print(result)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("hanchaow/QTuneVL1_5-2B", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use hanchaow/QTuneVL1_5-2B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hanchaow/QTuneVL1_5-2B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hanchaow/QTuneVL1_5-2B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
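
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `http://localhost:8000`. The helper names `build_chat_payload` and `send_chat_request` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build the JSON body used by the curl example (one text part, one image part)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(base_url, payload, timeout=60):
    """POST the payload to the OpenAI-compatible chat completions endpoint."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a running server, `send_chat_request("http://localhost:8000", payload)["choices"][0]["message"]["content"]` should hold the generated text, since the endpoint follows the OpenAI chat completions response layout. For the SGLang server, only the port changes (30000).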

Use Docker

docker model run hf.co/hanchaow/QTuneVL1_5-2B

SGLang

How to use hanchaow/QTuneVL1_5-2B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hanchaow/QTuneVL1_5-2B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hanchaow/QTuneVL1_5-2B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hanchaow/QTuneVL1_5-2B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use hanchaow/QTuneVL1_5-2B with Docker Model Runner:
```
docker model run hf.co/hanchaow/QTuneVL1_5-2B
```

QTuneVL1.5-2B, developed by Reconova AI Lab and BDAA-Lab

Introduction

We’re excited to introduce QTuneVL1.5-2B, the latest in Reconova AI Lab’s series of multimodal large language models. Building on QTuneVL1-2B, it incorporates key features from both InternVL and Mini-Monkey to deliver even greater performance.

Like QTuneVL1-2B, QTuneVL1.5-2B is a lightweight MLLM that adopts the cropping and padding strategies of Mini-Monkey, UReader, and InternVL. It is fine-tuned from InternVL3-2B.
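
The exact cropping procedure is not described here. As a rough illustration only, the sketch below shows the kind of aspect-ratio grid selection used in InternVL-style dynamic tiling: an image is assigned the tile grid whose aspect ratio is closest to its own, subject to a tile budget. The function name `pick_tile_grid` and the default `max_tiles=6` are assumptions made for this example, not values taken from QTuneVL, and the multi-scale and padding details of Mini-Monkey and UReader are not shown.

```python
def pick_tile_grid(width, height, max_tiles=6):
    """Return the (cols, rows) grid whose aspect ratio best matches the image.

    Illustrative only: name and default budget are assumptions, not QTuneVL code.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    image_ratio = width / height

    # Enumerate every grid within the tile budget, smallest grids first.
    candidates = sorted(
        (
            (cols, rows)
            for cols in range(1, max_tiles + 1)
            for rows in range(1, max_tiles + 1)
            if cols * rows <= max_tiles
        ),
        key=lambda grid: grid[0] * grid[1],
    )

    best_grid, best_diff = (1, 1), float("inf")
    for cols, rows in candidates:
        diff = abs(image_ratio - cols / rows)
        if diff < best_diff:  # strict "<" keeps the smaller grid on ties
            best_grid, best_diff = (cols, rows), diff
    return best_grid
```

Under these assumptions, a wide 800x400 image maps to a 2x1 grid and a square image stays a single tile. Each grid cell would then be cropped and resized to the vision encoder's input resolution.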

Evaluation

We evaluated the model with VLMEvalKit on eight benchmarks from the OpenCompass leaderboard. It outperforms its predecessor (QTuneVL1-2B) on the average score, with the largest gains on MMStar, MMMU_DEV_VAL, and OCRBench. In the table below, the values in parentheses in the last row are gains over QTuneVL1-2B.

The eight benchmarks: 'MMBench_DEV_EN_V11', 'MMStar', 'MMMU_DEV_VAL', 'MathVista_MINI', 'HallusionBench', 'AI2D_TEST', 'OCRBench', 'MMVet'.

| Index | Model | AVG | MMBench_DEV_EN_V11 | MMStar | MMMU_DEV_VAL | MathVista_MINI | HallusionBench | AI2D_TEST | OCRBench | MMVet |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Minimonkey | 54.3 | 71.4 | 50.3 | 35.6 | 46.3 | 38.6 | 74.8 | 802 | 37.2 |
| 2 | InternVL2-2B | 54.2 | 71.4 | 50.3 | 34.6 | 47.2 | 38.2 | 74.2 | 783 | 39.8 |
| 3 | InternVL2_5-2B | 59.4 | 74.6 | 53.7 | 40.1 | 49.7 | 42.2 | 74.9 | 802 | 59.5 |
| 4 | InternVL3-2B | 63.5 | 79.6 | 61.1 | 48.6 | 51.1 | 42 | 78.4 | 835 | 64.08 |
| 5 | QTuneVL1-2B | 59.7 | 74.9 | 53.9 | 41.5 | 48.8 | 43.0 | 75.2 | 806 | 59.6 |
| 6 | QTuneVL1.5-2B | 64.2(+4.5) | 79.6(+4.7) | 61.4(+7.5) | 51.1(+9.6) | 51.8(+3) | 43.0 | 78.8(+3.6) | 858(+52) | 62.1(+2.5) |
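
How the AVG column is computed is not stated. The reported values are consistent with a plain mean of the eight scores after dividing OCRBench (reported on a 0 to 1000 scale) by 10. The sketch below checks this for the two QTuneVL rows; the scaling rule is an inference from the numbers, and the names `BENCHMARKS`, `SCORES`, and `average_score` are hypothetical.

```python
BENCHMARKS = [
    "MMBench_DEV_EN_V11", "MMStar", "MMMU_DEV_VAL", "MathVista_MINI",
    "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]

# Per-benchmark scores copied from the results table, in BENCHMARKS order.
SCORES = {
    "QTuneVL1-2B": [74.9, 53.9, 41.5, 48.8, 43.0, 75.2, 806, 59.6],
    "QTuneVL1.5-2B": [79.6, 61.4, 51.1, 51.8, 43.0, 78.8, 858, 62.1],
}


def average_score(scores):
    """Mean over the eight benchmarks, with OCRBench rescaled to 0-100 (assumed)."""
    normalized = [
        score / 10 if name == "OCRBench" else score
        for name, score in zip(BENCHMARKS, scores)
    ]
    return sum(normalized) / len(normalized)


old_avg = average_score(SCORES["QTuneVL1-2B"])
new_avg = average_score(SCORES["QTuneVL1.5-2B"])
print(f"QTuneVL1-2B:   {old_avg:.1f}")
print(f"QTuneVL1.5-2B: {new_avg:.1f} (gain {new_avg - old_avg:+.1f})")
```

Under this assumption the computed averages agree with the AVG column to one decimal place, including the +4.5 gain.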

Note that the GPT judge models called during our VLMEvalKit evaluation differ slightly from the official ones. In the code (vlmeval/dataset/utils/judge_util.py), our setup uses:

'gpt-4o-mini': 'gpt-4o-mini' instead of 'gpt-4o-mini': 'gpt-4o-mini-2024-07-18'
'gpt-4-turbo': 'gpt-4-turbo' instead of 'gpt-4-turbo': 'gpt-4-1106-preview'

As a result, our evaluation results may differ slightly from the official ones.
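
To make the difference explicit, the two mappings can be written side by side and compared. This is only a sketch: the dictionary and function names are hypothetical and do not reproduce the actual contents of `judge_util.py`; the entries are the two pairs listed above.

```python
# Alias -> model ID described above as the official configuration.
OFFICIAL_JUDGE_MAP = {
    "gpt-4o-mini": "gpt-4o-mini-2024-07-18",
    "gpt-4-turbo": "gpt-4-1106-preview",
}

# Alias -> model ID used for the results reported here.
USED_JUDGE_MAP = {
    "gpt-4o-mini": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4-turbo",
}


def diff_judge_maps(official, used):
    """Return {alias: (official_id, used_id)} for every alias that resolves differently."""
    return {
        alias: (official_id, used.get(alias))
        for alias, official_id in official.items()
        if used.get(alias) != official_id
    }


for alias, (official_id, used_id) in diff_judge_maps(OFFICIAL_JUDGE_MAP, USED_JUDGE_MAP).items():
    print(f"{alias}: official={official_id}, used={used_id}")
```

Because the unpinned aliases can resolve to newer model snapshots over time, scores from GPT-judged benchmarks may not be exactly reproducible against the official pinned versions.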

Copyright

We welcome suggestions to help us improve QTuneVL. For any questions, please contact HanChao Wang: wanghanchao@reconova.com. If you find something interesting, feel free to share it with us by email or by opening an issue.

Safetensors

Model size: 2B params

Tensor type: BF16

Paper for hanchaow/QTuneVL1_5-2B

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model (arXiv:2310.05126, published Oct 8, 2023)