☁️ FDR4VGT-CLOUD
Official model release accompanying the manuscript:
**A multisensor deep learning framework for robust cloud segmentation in SPOT-VGT and Proba-V**
Julio Contreras, Cesar Aybar, Luis Gómez-Chova
*IEEE Geoscience and Remote Sensing Letters* (submitted), 2026.

This model is the operational cloud masking algorithm selected for the ESA FDR4VGT reprocessing of the SPOT-VGT and Proba-V archives. It delivers consistent cloud detection across the full SPOT-VGT (VGT1 1998–2003, VGT2 2002–2014) and Proba-V (2013–2020) record: a single sensor-agnostic model for all three missions.
✨ Overview
- Architecture: Hybrid DeepLabV3+ (MobileNetV2 backbone) + Pixel-Wise MLP (PW-DL3+); see the sketch after this list
- Input: 4 Top-of-Atmosphere reflectance bands (Blue, Red, NIR, SWIR), sensor-agnostic
- Supported sensors: SPOT-VGT1, SPOT-VGT2, Proba-V
- Input shape: `[B, 4, 512, 512]`
- Parameters: 12.65M (57.29 MB)
- Training: weak-to-strong supervision, with large-scale pre-training on 3,647 weakly labeled scenes followed by fine-tuning on 109 hand-annotated hard-example scenes
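The exact fusion used in PW-DL3+ is specified in the paper; purely as orientation, the sketch below shows one common way to wire a hybrid of a spatial encoder-decoder and a per-pixel spectral MLP (1×1 convolutions). The module names, hidden width, toy spatial branch, and the logit-addition fusion are illustrative assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class PixelwiseMLP(nn.Module):
    """Per-pixel spectral MLP implemented with 1x1 convolutions."""
    def __init__(self, in_ch: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class HybridSegmenter(nn.Module):
    """Spatial branch (stand-in for DeepLabV3+/MobileNetV2) + spectral branch."""
    def __init__(self, spatial: nn.Module, in_ch: int = 4):
        super().__init__()
        self.spatial = spatial          # any module mapping [B,C,H,W] -> [B,1,H,W]
        self.spectral = PixelwiseMLP(in_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse spatial-context and per-pixel spectral logits (assumed: addition)
        return torch.sigmoid(self.spatial(x) + self.spectral(x))

# Toy stand-in for the DeepLabV3+ branch, only to make the sketch runnable
toy_spatial = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
)
probs = HybridSegmenter(toy_spatial)(torch.rand(1, 4, 512, 512))  # -> [1, 1, 512, 512]
```

The appeal of this pattern is that the pixel-wise branch captures purely spectral cues (robust across sensors), while the convolutional branch adds spatial context such as cloud texture and borders.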
🚀 Quick start
Installation
```bash
pip install mlstac rasterio torch==2.5.1
```
Inference
```python
import torch
import mlstac
import rasterio as rio

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Load the model
framework = mlstac.download(
    file="https://huggingface.co/isp-uv-es/FDR4VGT-CLOUD/resolve/main/single/multisensor_single_1dpwdeeplabv3.json",
    output_dir="FDR4VGT/single",
)
model = framework.model

# 2. Load a 4-band image (Blue, Red, NIR, SWIR)
with rio.open("https://huggingface.co/isp-uv-es/FDR4VGT-CLOUD/resolve/main/ensemble/rgb.tif") as src:
    image = src.read()

# 3. Run large-scene inference (sliding window + Hann blending)
prob = framework.predict_large(
    image=image,
    model=model,
    device=device,
    batch_size=8,    # increase on GPU to speed up; lower on CPU
    num_workers=8,
    nodata=0,        # pixel value treated as invalid/padding
)

# 4. Binarize with the operational threshold
cloud_mask = (prob.squeeze() > 0.5).astype("uint8")
```
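`predict_large` handles tiling and blending internally. For readers building their own pipeline, the sketch below shows the general idea of sliding-window inference with a 2-D Hann window; `hann_blend_predict` and `predict_tile` are hypothetical helpers, not part of `mlstac`, and the packaged implementation may differ (e.g. in padding and nodata handling).

```python
import numpy as np

def hann_blend_predict(image, predict_tile, tile=512, stride=256):
    """Sliding-window inference with a 2-D Hann window to feather tile seams.

    image        : [C, H, W] array of reflectances
    predict_tile : callable mapping a [C, tile, tile] array to a
                   [tile, tile] probability map (one model forward pass)
    Assumes H and W line up with tile/stride; real pipelines pad the scene first.
    """
    _, H, W = image.shape
    w1d = np.hanning(tile)
    weight = np.outer(w1d, w1d) + 1e-6         # tiny floor so border pixels stay defined
    acc = np.zeros((H, W), dtype=np.float64)   # weighted sum of tile probabilities
    norm = np.zeros((H, W), dtype=np.float64)  # sum of weights per pixel
    for y in range(0, H - tile + 1, stride):
        for x in range(0, W - tile + 1, stride):
            prob = predict_tile(image[:, y:y + tile, x:x + tile])
            acc[y:y + tile, x:x + tile] += prob * weight
            norm[y:y + tile, x:x + tile] += weight
    return acc / norm                          # blended [H, W] probability map
```

Because the Hann window down-weights tile borders, overlapping predictions dominate near seams and the blended map avoids visible tiling artifacts.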
The binarization threshold (default `0.5`) can be tuned per use case; the paper uses the F₂-optimal threshold on the validation set.
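If you have labeled validation scenes, a simple grid search reproduces that recipe. `f2_optimal_threshold` below is an illustrative helper (not part of this release), assuming flattened numpy arrays of probabilities and 0/1 labels:

```python
import numpy as np

def f2_optimal_threshold(prob, label, thresholds=np.linspace(0.05, 0.95, 19)):
    """Grid-search the threshold that maximizes F2 (recall-weighted F-measure)."""
    best_t, best_f2 = 0.5, -1.0
    for t in thresholds:
        pred = prob >= t
        tp = np.sum(pred & (label == 1))
        fp = np.sum(pred & (label == 0))
        fn = np.sum(~pred & (label == 1))
        f2 = 5.0 * tp / (5.0 * tp + 4.0 * fn + fp + 1e-12)  # F_beta with beta = 2
        if f2 > best_f2:
            best_t, best_f2 = t, f2
    return best_t
```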
📊 Performance
Results on the manually annotated test set (PW-DL3+, Multi-FT strategy), mean over scenes:
| Sensor | F₂ | IoU | κ |
|---|---|---|---|
| Proba-V | 0.891 | 0.842 | 0.808 |
| SPOT-VGT | 0.949 | 0.898 | 0.829 |
The model substantially outperforms the legacy BS1 (physical thresholds) and BS2 (pixel-wise MLP) baselines on both sensors, with the largest gain on SPOT-VGT (ΔF₂ = +0.090 over BS1). Temporal analysis across the 1998–2020 archive shows no statistically significant discontinuity at the VGT→Proba-V transition (Mann-Whitney U, p > 0.05), in contrast to the legacy record.
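For reference, F₂, IoU, and Cohen's κ can all be derived from the confusion counts of a predicted/reference mask pair. A plain-numpy sketch (the helper name is illustrative; the paper's exact per-scene evaluation protocol follows the manuscript):

```python
import numpy as np

def cloud_metrics(pred, ref):
    """F2, IoU and Cohen's kappa for a pair of binary masks (1 = cloud)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.sum(pred & ref)
    fp = np.sum(pred & ~ref)
    fn = np.sum(~pred & ref)
    tn = np.sum(~pred & ~ref)
    n = float(tp + fp + fn + tn)
    f2 = 5.0 * tp / (5.0 * tp + 4.0 * fn + fp + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    po = (tp + tn) / n                                            # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2   # chance agreement
    kappa = (po - pe) / (1.0 - pe + 1e-12)
    return f2, iou, kappa
```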
📁 Repository layout
| Path | Description |
|---|---|
| `single/multisensor_single_1dpwdeeplabv3.json` | Operational single-model weights (PW-DL3+) |
| `ensemble/rgb.tif` | Example test scene (4-band TOA reflectance) |
📄 Citation
If you use this model, please cite:
```bibtex
@article{contreras2026fdr4vgt,
  title   = {A multisensor deep learning framework for robust cloud segmentation in SPOT-VGT and Proba-V},
  author  = {Contreras, Julio and Aybar, Cesar and G{\'o}mez-Chova, Luis},
  journal = {IEEE Geoscience and Remote Sensing Letters},
  year    = {2026},
}
```
🙏 Acknowledgements
This work was supported by the European Space Agency (ESA) within the FDR4VGT (Fundamental Data Record for VGT) project, led by VITO.
Developed at the Image Processing Laboratory (IPL), University of Valencia, Spain.