MD3 Preview - Int4 Quantized (MLX)

Pre-quantized version of Moondream 3 Preview for MLX inference.

Quantization Details

  • MoE Experts: int4 affine quantization (bits=4, group_size=64)
  • Other weights: bf16 (unchanged)
  • Memory savings: ~60% reduction in MoE weight memory
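
To make the "int4 affine quantization (bits=4, group_size=64)" setting concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is only an illustration of the general technique, not the code used to produce this checkpoint. The function names are hypothetical, and MLX's own kernels may differ in rounding, packing, and storage layout. Each group of 64 weights gets its own scale and bias, and every weight is stored as a 4-bit code in the range 0..15.

```python
import random

GROUP_SIZE = 64            # weights sharing one scale/bias pair (from the card)
BITS = 4                   # bits per quantized weight (from the card)
LEVELS = (1 << BITS) - 1   # highest code value: 15


def quantize_group(values):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / LEVELS
    if scale == 0.0:
        # Constant group: any non-zero scale works, every code becomes 0.
        scale = 1.0
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Affine reconstruction: w is approximately code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row):
    """Quantize a flat list of weights, one group of GROUP_SIZE at a time."""
    assert len(row) % GROUP_SIZE == 0, "length must be a multiple of GROUP_SIZE"
    return [
        quantize_group(row[i:i + GROUP_SIZE])
        for i in range(0, len(row), GROUP_SIZE)
    ]


def dequantize_row(groups):
    """Rebuild the approximate float weights from all groups."""
    out = []
    for codes, scale, bias in groups:
        out.extend(dequantize_group(codes, scale, bias))
    return out


# Hypothetical toy weights standing in for one row of an expert matrix.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(2 * GROUP_SIZE)]

groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
```

In this scheme the reconstruction error for any weight is at most half of its group's scale, which is why smaller groups tend to give better accuracy at the cost of storing more scale and bias values.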

Source

Quantized from moondream/moondream3-preview
