<aside>
💡
TL;DR: We are developing a compact VLA, integrated into LeRobot and pre-trained on the SO-100/SO-101 embodiment.
</aside>
https://discord.gg/TAFGuV2tE4
Why
<aside>
💡
There is no open-source, plug-and-play VLA for ‘consumers’
</aside>
- Pi0: JAX-only, or a buggy PyTorch port; expensive to train; not pre-trained on SO-101; outdated (actions not tokenised)
- SmolVLA: questionable evaluation (no language conditioning); outdated (actions not tokenised); serious design flaw (frozen VLM)
- ACT/Diffusion: no language conditioning
- Open-VLA+: poor performance; not pre-trained on SO-101; not integrated into LeRobot
What
A tokenised VLA inspired by Knowledge Insulation (mostly a VLM, plus a separate de-noising action expert)
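A minimal NumPy sketch of how one action chunk could feed both heads: discrete tokens as next-token targets for the VLM, and a noisy input plus velocity target for the de-noising expert. Assumptions: a 50-step chunk for a 6-motor arm, actions already normalised to [-1, 1], plain uniform binning standing in for a compressed action tokeniser such as FAST, and one particular flow-matching convention (x_t = (1 - t) * noise + t * actions). In Knowledge Insulation the expert's loss is also stopped from flowing back into the VLM; that gradient stop belongs in the training framework and is not shown here.

```python
import numpy as np

# Assumed setting: 256 uniform bins as a stand-in for a learned action tokeniser.
NUM_BINS = 256


def tokenise_actions(actions, num_bins=NUM_BINS):
    """Map normalised actions in [-1, 1] to integer token ids (uniform binning)."""
    clipped = np.clip(actions, -1.0, 1.0)
    ids = np.floor((clipped + 1.0) / 2.0 * num_bins).astype(np.int64)
    return np.minimum(ids, num_bins - 1)  # keep +1.0 inside the last bin


def detokenise_actions(ids, num_bins=NUM_BINS):
    """Map token ids back to bin centres in [-1, 1]."""
    return (ids + 0.5) / num_bins * 2.0 - 1.0


def flow_matching_targets(actions, rng):
    """Noisy input, timestep and velocity target for the de-noising expert.

    Assumed convention: x_t = (1 - t) * noise + t * actions, so the
    target velocity dx_t/dt = actions - noise is constant along the path.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(0.0, 1.0)
    x_t = (1.0 - t) * noise + t * actions
    return x_t, t, actions - noise


def euler_sample(velocity_fn, noise, num_steps=10):
    """Inference: integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1."""
    x, dt = noise, 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=(50, 6))  # one hypothetical action chunk

token_ids = tokenise_actions(actions)  # next-token targets for the VLM (training only)
x_t, t, v_target = flow_matching_targets(actions, rng)  # regression targets for the expert
```

With an oracle velocity, `euler_sample(lambda x, s: actions - noise, noise)` returns `actions` (up to rounding) for any `noise` of the right shape. A trained expert replaces the oracle at inference, and the token head is simply never queried, which is what keeps inference fast.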
Features in priority order
- Joint training: tokenised actions (training only) and a de-noising expert (for fast inference)
- Test in simulation (faster iteration)
- Test on a real robot (50-100 episodes per task)
- Inject the de-noising timestep at every layer of the transformer
- Robot state as text
- System 2 and System 1 in one model (for both training and inference)
- Synthetic relabeling of demonstrations
- Live audio guidance
- Web data (image captioning, VQA, localisation)
- Robot metadata expressed in language
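For the "timestep at every layer" item, one established option is adaLN-style modulation (as in DiT): a sinusoidal embedding of the de-noising timestep t yields a per-block scale and shift applied after normalisation, instead of adding t once at the input. This NumPy sketch assumes random weights, a 64-dimensional time embedding and a ×1000 rescaling of t; the attention sub-layer is omitted so only the conditioning path is shown.

```python
import numpy as np


def timestep_embedding(t, dim=64, max_period=10_000.0):
    """Sinusoidal embedding of a de-noising timestep t in [0, 1]."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = 1000.0 * t * freqs  # assumed rescaling so t in [0, 1] spans many periods
    return np.concatenate([np.sin(args), np.cos(args)])


def layer_norm(x, eps=1e-6):
    """Parameter-free LayerNorm over the last axis."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class TimestepModulatedBlock:
    """Residual MLP block whose normalised input is scaled and shifted by t (adaLN)."""

    def __init__(self, width, t_dim, rng):
        self.w_mod = rng.standard_normal((t_dim, 2 * width)) * 0.02  # t_emb -> scale, shift
        self.w_mlp = rng.standard_normal((width, width)) * 0.02

    def __call__(self, h, t_emb):
        scale, shift = np.split(t_emb @ self.w_mod, 2)
        modulated = layer_norm(h) * (1.0 + scale) + shift
        return h + np.tanh(modulated @ self.w_mlp)  # attention sub-layer omitted


def expert_forward(h, t, blocks):
    """Every block sees the timestep, not just the input layer."""
    t_emb = timestep_embedding(t)
    for block in blocks:
        h = block(h, t_emb)
    return h


rng = np.random.default_rng(0)
blocks = [TimestepModulatedBlock(width=32, t_dim=64, rng=rng) for _ in range(4)]
tokens = rng.standard_normal((10, 32))  # 10 hypothetical action tokens of width 32
out_early = expert_forward(tokens, 0.1, blocks)
out_late = expert_forward(tokens, 0.9, blocks)
```

DiT's adaLN-Zero variant zero-initialises the modulation layer so training starts from an unconditioned network; small random weights are used here only so that the effect of t on the output is visible.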
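The "robot state as text" and "robot metadata in language" items could, for example, share one prompt: each joint reading is clamped, discretised into one of 256 bins and written as a number, and the robot name is prefixed. This is a minimal sketch; the joint order, joint limits and prompt template are all assumptions, not LeRobot conventions.

```python
def state_to_text(state, low, high, num_bins=256):
    """Discretise each joint reading into [0, num_bins) and join as text."""
    bins = []
    for value, lo, hi in zip(state, low, high):
        frac = (min(max(value, lo), hi) - lo) / (hi - lo)  # clamp, then scale to [0, 1]
        bins.append(str(min(int(frac * num_bins), num_bins - 1)))
    return " ".join(bins)


def build_prompt(task, state, low, high, robot="SO-101"):
    """Hypothetical template: robot metadata, task and state, all as plain text."""
    return f"Robot: {robot}. Task: {task}. State: {state_to_text(state, low, high)}.\nAction: "


# Assumed order: shoulder_pan, shoulder_lift, elbow_flex, wrist_flex, wrist_roll, gripper.
# Assumed limits: +/-180 degrees for the arm joints, 0-100 for the gripper.
LOW = (-180.0,) * 5 + (0.0,)
HIGH = (180.0,) * 5 + (100.0,)

prompt = build_prompt("pick up the red cube", (0.0, -90.0, 90.0, 45.0, 0.0, 50.0), LOW, HIGH)
# → "Robot: SO-101. Task: pick up the red cube. State: 128 64 192 160 128 128.\nAction: "
```

Discretising keeps the state inside the VLM's existing text vocabulary; the cost is a resolution of 360/256 ≈ 1.4 degrees per bin under these limits.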
Datasets