Stateful TranslateGemma 4B IT (Core ML)

Stateful Core ML export of google/translategemma-4b-it with KV-cache states for incremental decoding on Apple platforms.

Included Files

  • StatefulTranslateGemma4BITFP16.mlpackage
  • StatefulTranslateGemma4BITInt8PerChannel.mlpackage
  • StatefulTranslateGemma4BITInt4PerChannel.mlpackage
  • convert_stateful_translategemma_coreml.py
  • NOTICE

Model Interface

Inputs:

  • inputIds: int32, shape (1, queryLength)
  • fullAttentionMask: float16, shape (1, 1, queryLength, endStep)
  • slidingAttentionMask: float16, shape (1, 1, queryLength, endStep)
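Both masks are additive float16 tensors. A minimal NumPy sketch of how they might be constructed, assuming 0.0 marks attendable positions, a large negative finite value masks the rest, and a 1024-token sliding window (the window size is an assumption; verify it against the base model's config):

```python
import numpy as np

NEG = np.float16(-65504.0)  # most negative finite float16; assumed mask fill value

def build_masks(query_len, end_step, window=1024):
    """Build (1, 1, query_len, end_step) additive attention masks.

    The current query tokens are assumed to occupy absolute steps
    [end_step - query_len, end_step). 0.0 = attend, NEG = masked.
    """
    qpos = np.arange(end_step - query_len, end_step)[:, None]  # query positions
    kpos = np.arange(end_step)[None, :]                        # key positions
    causal = kpos <= qpos
    sliding = causal & (kpos > qpos - window)
    full = np.where(causal, np.float16(0), NEG)[None, None]
    slide = np.where(sliding, np.float16(0), NEG)[None, None]
    return full, slide
```

During prefill `query_len` covers the whole prompt; afterwards each step feeds a single token, so `query_len` is 1 and `end_step` grows by one per call.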

States:

  • keyCache: float16, shape (layers, 1, kvHeads, maxContext, headDim)
  • valueCache: float16, same shape as keyCache

Output:

  • logits: float16
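Putting the interface together, a greedy incremental decode loop driven from Python via coremltools (macOS only for prediction) might look like the following sketch. The package path, tokenizer handling, and the 1024-token sliding window are assumptions, not part of this repository:

```python
import numpy as np

NEG = np.float16(-65504.0)  # assumed additive-mask fill value

def greedy_next(logits):
    """Arg-max token id from the last query position's logits."""
    return int(np.argmax(logits[0, -1]))

def decode(model, prompt_ids, max_new_tokens=32, window=1024):
    """Prefill the prompt, then decode one token per step.

    `model` is a loaded Core ML package exposing the interface above;
    the key/value caches live in the state object, so each predict()
    only feeds the newly generated token.
    """
    state = model.make_state()                  # fresh, zeroed KV caches
    ids, step = list(prompt_ids), 0
    tokens = np.array([ids], dtype=np.int32)    # first call: whole prompt
    for _ in range(max_new_tokens):
        end = step + tokens.shape[1]
        qpos = np.arange(step, end)[:, None]
        kpos = np.arange(end)[None, :]
        full = np.where(kpos <= qpos, np.float16(0), NEG)[None, None]
        slide = np.where((kpos <= qpos) & (kpos > qpos - window),
                         np.float16(0), NEG)[None, None]
        out = model.predict(
            {"inputIds": tokens,
             "fullAttentionMask": full,
             "slidingAttentionMask": slide},
            state=state,
        )
        nxt = greedy_next(out["logits"])
        ids.append(nxt)
        step, tokens = end, np.array([[nxt]], dtype=np.int32)
    return ids

# Hypothetical usage (macOS, coremltools >= 8):
# import coremltools as ct
# model = ct.models.MLModel("StatefulTranslateGemma4BITFP16.mlpackage")
# out_ids = decode(model, prompt_ids)
```

Stopping on an end-of-sequence token and sampling strategies other than arg-max are omitted for brevity.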

Conversion Notes

  • Conversion target: iOS 18+ (ct.target.iOS18)
  • Stateful export via Core ML states (ct.StateType)
  • Gemma3 mixed-attention export with explicit fullAttentionMask and slidingAttentionMask inputs
  • StatefulTranslateGemma4BITFP16.mlpackage: FP16 baseline; the smallest stable full-precision artifact
  • StatefulTranslateGemma4BITInt8PerChannel.mlpackage: int8 per-channel quantized variant; a working balance of size and quality
  • StatefulTranslateGemma4BITInt4PerChannel.mlpackage: int4 per-channel quantized variant; the smallest working artifact, validated in short decode runs

Base Model and License

This repository contains a converted derivative of the Gemma model weights. Use is subject to the Gemma license terms and policies.
