Stateful TranslateGemma 4B IT (Core ML)

Stateful Core ML export of google/translategemma-4b-it with KV-cache states for incremental decoding on Apple platforms.

Included Files

  • StatefulTranslateGemma4BITFP16.mlpackage
  • StatefulTranslateGemma4BITInt8PerChannel.mlpackage
  • StatefulTranslateGemma4BITInt4PerChannel.mlpackage
  • convert_stateful_translategemma_coreml.py
  • NOTICE

Model Interface

Inputs:

  • inputIds: int32, shape (1, queryLength)
  • fullAttentionMask: float16, shape (1, 1, queryLength, endStep)
  • slidingAttentionMask: float16, shape (1, 1, queryLength, endStep)
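Both masks are additive float16 tensors. A minimal NumPy sketch of how they might be constructed, assuming 0.0 marks attendable positions, a large negative finite value masks the rest, and a 1024-token sliding window (the window size is an assumption; verify it against the base model's config):

```python
import numpy as np

NEG = np.float16(-65504.0)  # most negative finite float16; assumed mask fill value

def build_masks(query_len, end_step, window=1024):
    """Build (1, 1, query_len, end_step) additive attention masks.

    The current query tokens are assumed to occupy absolute steps
    [end_step - query_len, end_step). 0.0 = attend, NEG = masked.
    """
    qpos = np.arange(end_step - query_len, end_step)[:, None]  # query positions
    kpos = np.arange(end_step)[None, :]                        # key positions
    causal = kpos <= qpos
    sliding = causal & (kpos > qpos - window)
    full = np.where(causal, np.float16(0), NEG)[None, None]
    slide = np.where(sliding, np.float16(0), NEG)[None, None]
    return full, slide
```

During prefill `query_len` covers the whole prompt; afterwards each step feeds a single token, so `query_len` is 1 and `end_step` grows by one per call.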

States:

  • keyCache: float16, shape (layers, 1, kvHeads, maxContext, headDim)
  • valueCache: float16, same shape as keyCache

Output:

  • logits: float16
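Putting the interface together, a greedy incremental decode loop driven from Python via coremltools (macOS only for prediction) might look like the following sketch. The package path, tokenizer handling, and the 1024-token sliding window are assumptions, not part of this repository:

```python
import numpy as np

NEG = np.float16(-65504.0)  # assumed additive-mask fill value

def greedy_next(logits):
    """Arg-max token id from the last query position's logits."""
    return int(np.argmax(logits[0, -1]))

def decode(model, prompt_ids, max_new_tokens=32, window=1024):
    """Prefill the prompt, then decode one token per step.

    `model` is a loaded Core ML package exposing the interface above;
    the key/value caches live in the state object, so each predict()
    only feeds the newly generated token.
    """
    state = model.make_state()                  # fresh, zeroed KV caches
    ids, step = list(prompt_ids), 0
    tokens = np.array([ids], dtype=np.int32)    # first call: whole prompt
    for _ in range(max_new_tokens):
        end = step + tokens.shape[1]
        qpos = np.arange(step, end)[:, None]
        kpos = np.arange(end)[None, :]
        full = np.where(kpos <= qpos, np.float16(0), NEG)[None, None]
        slide = np.where((kpos <= qpos) & (kpos > qpos - window),
                         np.float16(0), NEG)[None, None]
        out = model.predict(
            {"inputIds": tokens,
             "fullAttentionMask": full,
             "slidingAttentionMask": slide},
            state=state,
        )
        nxt = greedy_next(out["logits"])
        ids.append(nxt)
        step, tokens = end, np.array([[nxt]], dtype=np.int32)
    return ids

# Hypothetical usage (macOS, coremltools >= 8):
# import coremltools as ct
# model = ct.models.MLModel("StatefulTranslateGemma4BITFP16.mlpackage")
# out_ids = decode(model, prompt_ids)
```

Stopping on an end-of-sequence token and sampling strategies other than arg-max are omitted for brevity.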

Conversion Notes

  • Conversion target: iOS 18+ (ct.target.iOS18)
  • Stateful export via Core ML states (ct.StateType)
  • Gemma3 mixed-attention export with explicit fullAttentionMask and slidingAttentionMask inputs
  • StatefulTranslateGemma4BITFP16.mlpackage: FP16 baseline; the smallest stable full-precision artifact
  • StatefulTranslateGemma4BITInt8PerChannel.mlpackage: int8 per-channel quantized variant; a working balance of size and quality
  • StatefulTranslateGemma4BITInt4PerChannel.mlpackage: int4 per-channel quantized variant; the smallest working artifact, validated in short decode runs

Base Model and License

This repository contains a converted derivative of the Gemma model weights. Use is subject to the Gemma license terms and policies.
