# Vision Language Action Models ManiSkill supports evaluating and pretraining vision language action models. Currently the following VLAs have been tested via the ManiSkill framework: - [Octo](https://github.com/octo-models/octo) - [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer) - [RT-x](https://robotics-transformer-x.github.io/) RDT-1B uses some of the ManiSkill demonstrations for pretraining data and evaluates by fine-tuning on some demonstrations on various ManiSkill tasks, see their [README](#https://github.com/thu-ml/RoboticsDiffusionTransformer?tab=readme-ov-file#simulation-benchmark) for more details. Octo and RT series of models are evaluated through various real2sim environments as part of the SIMPLER project, see their [README](https://github.com/simpler-env/SimplerEnv/tree/maniskill3) for details on how to run the evaluation setup.