IQ4_NL quantized version of QwentileLambda2.5-32B-Instruct, a very good merge of multiple Qwen2.5 and QwQ fine-tunes.

I noticed that the IQ4_NL variant was missing from mradermacher's repo, so I'm filling the gap. It tends to behave better than Q4_K_S and Q4_K_M at slightly lower VRAM consumption.

For cards with 24GB of VRAM

  • IQ4_NL

It's an ideal size to run on 24GB of VRAM at a 16K to 20K context length.
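
For reference, here's a minimal loading sketch using llama-cpp-python. The file name is an assumption (adjust it to your local download), and actual VRAM headroom depends on your KV cache settings.

```python
# Minimal sketch: load the IQ4_NL quant with full GPU offload.
from llama_cpp import Llama

llm = Llama(
    model_path="QwentileLambda2.5-32B-Instruct-IQ4_NL.gguf",  # hypothetical file name
    n_ctx=16384,      # 16K context; ~20K may still fit in 24GB depending on KV cache settings
    n_gpu_layers=-1,  # offload all layers to the GPU
)
```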

Settings

Instruction Template: ChatML. You can also use CoT with ChatML-Thinker, but in that case you need to prefill the thinking tag.
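
If your frontend doesn't handle the prefill for you, here's a sketch of a raw ChatML prompt with the assistant turn prefilled, reusing the `llm` handle from the sketch above. It assumes the thinking tag is `<think>`, as used by QwQ; check your template if yours differs.

```python
# Raw ChatML prompt; the assistant turn is prefilled with the thinking tag
# so the model starts its CoT immediately.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Why is the sky blue?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"  # prefilled thinking tag (assumed to be <think>)
)

out = llm(prompt, max_tokens=1024, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```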

Note: If your backend has a setting for it, disable the BOS token. It's set to disabled at the GGUF level, but not all backends recognize the flag.
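
If you want to verify the flag yourself, here's a sketch using the `gguf` package from llama.cpp (`pip install gguf`). The file name is an assumption, and the parts/data field access is version-dependent, so treat this as illustrative.

```python
# Sketch: read the add_bos_token flag from the GGUF metadata.
from gguf import GGUFReader

reader = GGUFReader("QwentileLambda2.5-32B-Instruct-IQ4_NL.gguf")  # hypothetical file name
field = reader.fields.get("tokenizer.ggml.add_bos_token")
if field is not None:
    # field.data holds indices into field.parts; the value is a one-element array
    print("add_bos_token:", bool(field.parts[field.data[0]][0]))
else:
    print("add_bos_token flag not present")
```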
