IQ4_NL quantized version of QwentileLambda2.5-32B-Instruct, a very good merge of multiple Qwen2.5 and QwQ fine-tunes.

I noticed that the IQ4_NL variant was missing from mradermacher's repo, so I'm filling the gap. It tends to behave better than Q4_K_S and Q4_K_M at slightly lower VRAM consumption.

For cards with 24GB of VRAM

  • IQ4_NL

It's an ideal size to run on 24GB of VRAM at a 16K to 20K context length.
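
For reference, here's a minimal loading sketch using llama-cpp-python. The file name is an assumption (adjust it to your local download), and actual VRAM headroom depends on your KV cache settings.

```python
# Minimal sketch: load the IQ4_NL quant with full GPU offload.
from llama_cpp import Llama

llm = Llama(
    model_path="QwentileLambda2.5-32B-Instruct-IQ4_NL.gguf",  # hypothetical file name
    n_ctx=16384,      # 16K context; ~20K may still fit in 24GB depending on KV cache settings
    n_gpu_layers=-1,  # offload all layers to the GPU
)
```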

Settings

Instruction Template: ChatML. You can also use CoT with ChatML-Thinker, but in that case you need to prefill the thinking tag.
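
If your frontend doesn't handle the prefill for you, here's a sketch of a raw ChatML prompt with the assistant turn prefilled, reusing the `llm` handle from the sketch above. It assumes the thinking tag is `<think>`, as used by QwQ; check your template if yours differs.

```python
# Raw ChatML prompt; the assistant turn is prefilled with the thinking tag
# so the model starts its CoT immediately.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Why is the sky blue?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"  # prefilled thinking tag (assumed to be <think>)
)

out = llm(prompt, max_tokens=1024, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```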

Note: If your backend has a setting for it, disable the BOS token. It's set to disabled at the GGUF level, but not all backends recognize the flag.
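
If you want to verify the flag yourself, here's a sketch using the `gguf` package from llama.cpp (`pip install gguf`). The file name is an assumption, and the parts/data field access is version-dependent, so treat this as illustrative.

```python
# Sketch: read the add_bos_token flag from the GGUF metadata.
from gguf import GGUFReader

reader = GGUFReader("QwentileLambda2.5-32B-Instruct-IQ4_NL.gguf")  # hypothetical file name
field = reader.fields.get("tokenizer.ggml.add_bos_token")
if field is not None:
    # field.data holds indices into field.parts; the value is a one-element array
    print("add_bos_token:", bool(field.parts[field.data[0]][0]))
else:
    print("add_bos_token flag not present")
```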
