🌌 CorridorVLA

This repository provides the official implementation of CorridorVLA.

Direct spatial constraints for Vision-Language-Action models via sparse physical anchors



πŸ” TL;DR

  • Explores an alternative to common visual-style spatial guidance (e.g., predicting future images/videos) using text-style physical anchors
  • Predicts sparse end-effector Δ-positions
  • Uses them to impose an explicit corridor constraint on action generation
  • Achieves an 83.21% success rate on LIBERO-Plus

🧠 Motivation

Existing VLA paradigm

  • Spatial guidance is encoded as visual-style tokens or latent features
  • Action generation is influenced indirectly through the backbone features

CorridorVLA

  • Predict compact physical quantities (spatial anchors)
  • Apply them as direct constraints in the loss
  • No need for heavy visual intermediate representations

πŸ—οΈ Method Overview

Key components

(1) Sparse Anchor Prediction

  • Predict $K$ future Ξ”-position anchors
  • Represent trajectory structure in a compact form
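As a rough illustration of how such anchor targets might be constructed, the sketch below picks $K$ evenly spaced future waypoints from a trajectory and expresses them as Δ-positions relative to the current end-effector position. The function name and sampling scheme are hypothetical, not taken from the paper:

```python
import numpy as np

def sparse_delta_anchors(traj, K=3):
    """Hypothetical target construction: select K evenly spaced future
    waypoints from a trajectory of shape (T, 3) and express them as
    Delta-positions relative to the current position traj[0]."""
    # indices of K future waypoints, excluding the current position itself
    idx = np.linspace(0, len(traj) - 1, K + 1).round().astype(int)[1:]
    return traj[idx] - traj[0]
```

Each anchor is a compact 3-vector, so $K$ anchors summarize the trajectory's spatial structure without any visual intermediate representation.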

(2) Action Augmentation

  • Concatenate state-related physical quantities (e.g., Ξ”-positions) to the action vector
  • Enable joint prediction of state and action, providing implicit alignment between state space and action space
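The augmentation step can be sketched as a simple concatenation along the feature axis; the function name and dimensions below are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def augment_action(action, delta_pos):
    """Hypothetical action augmentation: append the end-effector
    Delta-position to each action vector so that state and action
    are predicted jointly by the same head."""
    # action: (T, action_dim), delta_pos: (T, 3) -> (T, action_dim + 3)
    return np.concatenate([action, delta_pos], axis=-1)
```

Because both quantities share one prediction target, the model is implicitly encouraged to keep its state estimate and its action output consistent.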

(3) Corridor Loss

  • Defines a tolerance region over the predicted trajectory
  • Penalizes deviations outside the region while allowing smooth convergence within it

πŸ‘‰ Behaves like a structured smooth-L1 in trajectory space


πŸ“Š Results

LIBERO-Plus (GR00T-based)

| Variant | Description | AVG |
| --- | --- | --- |
| base | | 75.23 |
| c1 | query=3 | 77.25 |
| c2 | + extra data | 77.25 |
| c3 | + Δpos anchors | 79.21 |
| c4 | + corridor loss (CorridorVLA) | **83.21** |

πŸ“ˆ Improvement:

  • +7.98% over baselines
  • Largest gain from explicit spatial constraint

βš™οΈ Implementation

  • Built on StarVLA

  • Minimal changes:

    • a few extra prediction slots
    • additional loss terms
  • No heavy architecture redesign


πŸ“Œ Key Insights

  • Spatial guidance can be:

    • explicit (loss-level) instead of implicit (feature-level)
  • Physical quantities are:

    • more action-aligned
    • more interpretable
  • Simple constraints can:

    • significantly improve stability
    • reduce unstructured exploration

πŸ“– Citation

If you find this work useful, please cite:

@article{corridorvla2025,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241}
}