<aside> 💡

TL;DR: We are developing a compact vision-language-action (VLA) model, integrated into LeRobot and pre-trained on the SO100/SO101 embodiments.

</aside>

https://discord.gg/TAFGuV2tE4

Why

<aside> 💡

There is no open-source, plug-and-play VLA for ‘consumers’.

</aside>

What

A tokenised VLA inspired by Knowledge Insulation: mostly a VLM, plus a separate de-noising expert.

Features in priority order

  1. Joint training: a tokenised objective (used only during training) and a de-noising objective (used for fast inference)
    1. Test in sim (faster)
    2. Test on a real robot (50-100 episodes per task)
  2. Infusing the de-noising timestep at every level of the transformer
  3. Robot state as text
  4. System 2 and system 1 in one model (training and inference)
    1. Synthetic relabelling of demonstrations
    2. Live audio guidance
  5. Web data (image captioning, VQA, localisation)
  6. Metadata about the robot in language form
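
To make "Robot state as text" (feature 3) concrete, here is a minimal sketch of one way it could work: each state dimension is clamped to a range, discretised into an integer bin, and rendered as a plain string the VLM can read as part of its prompt. Everything here is an assumption for illustration, not a settled design: the function name `state_to_text`, the 256-bin resolution, the per-joint `low`/`high` ranges, and the `State:` prefix.

```python
from typing import Sequence


def state_to_text(
    state: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    n_bins: int = 256,  # assumed resolution, not a project decision
) -> str:
    """Render a robot state vector as a text string of integer bins."""
    if not (len(state) == len(low) == len(high)):
        raise ValueError("state, low and high must have the same length")

    tokens = []
    for value, lo, hi in zip(state, low, high):
        # Clamp to the assumed joint range so out-of-range readings stay valid.
        clamped = min(max(value, lo), hi)
        # Map linearly onto [0, 1], then onto integer bins 0 .. n_bins - 1.
        frac = (clamped - lo) / (hi - lo)
        tokens.append(str(min(int(frac * n_bins), n_bins - 1)))

    # The "State:" prefix is a hypothetical prompt convention.
    return "State: " + " ".join(tokens)


if __name__ == "__main__":
    # Three joints, each normalised to [-1, 1].
    print(state_to_text([0.0, -1.0, 1.0], [-1.0] * 3, [1.0] * 3))
    # prints "State: 128 0 255"
```

The resulting string can be appended to the language instruction, so the state goes through the same tokenizer as the rest of the prompt and needs no separate state encoder.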

Datasets