Introducing BetaEarth - your own Earth embedding emulator [pre-release]
The past year has brought many notable embedding products, like AlphaEarth, TESSERA or OlmoEarth. We are entering a phase where embeddings begin to act as a substitute for real observation data.
BetaEarth is an attempt to explore how much one can learn from a model based on its embeddings alone, and whether those embeddings can serve as a useful training target for other models. Huge credit to the AlphaEarth team for releasing the embedding archive openly: it's what made this kind of community-built extension possible.
[BetaEarth is not a foundation model but it tries its best]
BetaEarth is a flexible (and relatively lightweight) emulator of the AlphaEarth annual product. It doesn't reproduce AlphaEarth's exact outputs or the product itself, but it reaches ~0.87 cosine similarity on held-out data and retains 97% of downstream land-cover classification accuracy. Training took only 1-2 days.
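For anyone who wants to check an agreement score like the cosine similarity figure on their own tiles, here is a minimal sketch of how such a number can be computed. It is not the evaluation code from the repository: the function names `cosine_similarity` and `mean_cosine_similarity` are hypothetical, the toy vectors are made up, and it assumes you already have matching pairs of emulated and reference embedding vectors as plain lists of floats.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_cosine_similarity(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Average per-pair cosine similarity (hypothetical helper, one pair per pixel)."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("need the same, non-zero number of predicted and target vectors")
    total = sum(cosine_similarity(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)


# Toy 2-D vectors for illustration only; real embeddings have many more dimensions.
predicted = [[1.0, 0.0], [0.0, 2.0]]  # stand-in for emulator outputs
target = [[1.0, 0.0], [1.0, 1.0]]     # stand-in for reference embeddings
score = mean_cosine_similarity(predicted, target)
print(f"mean cosine similarity: {score:.3f}")
```

Cosine similarity ignores vector length and only compares direction, so a score of 1.0 means the emulated and reference vectors point the same way, and 0.0 means they are orthogonal.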
It can encode any combination (including multi-temporal) of:
- Sentinel-2 L1C
- Sentinel-2 L2A
- Sentinel-1 RTC
- COP-DEM 30 product
The model weights are open, just like its training data (built exclusively using Major TOM). The GitHub repository provides a script for automated generation of embeddings across any footprint.
You can also try the workflow over small bounding boxes on the free Hugging Face web app!
GitHub: https://github.com/asterisk-labs/beta-earth
Web App: asterisk-labs/betaearth
Models: https://huggingface.co/collections/asterisk-labs/beta-earth
Colab: https://colab.research.google.com/github/asterisk-labs/beta-earth/blob/main/examples/generate_demo.ipynb
Pre-print: https://github.com/asterisk-labs/beta-earth/blob/main/docs/beta_earth_preprint.pdf