Control Tasks#

These are classic control tasks in which the objective is to control a robot to reach a particular state, similar to the DM Control suite but with GPU-parallelized simulation and rendering. This page first lists all tasks in a table, then gives a detailed task card with video demonstrations for each task.

Task Table#

The table lists all tasks/environments in this category. The Task column is the environment ID, and Preview is a thumbnail pair showing the first and last frames of an example successful demonstration. Max Episode Steps is the task’s default maximum number of steps per episode, generally tuned for RL workflows.

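| Task                  | Preview | Dense Reward | Success/Fail Conditions | Demos | Max Episode Steps |
|-----------------------|---------|--------------|-------------------------|-------|-------------------|
| MS-AntRun-v1          |         |              |                         |       | 1000              |
| MS-AntWalk-v1         |         |              |                         |       | 1000              |
| MS-CartpoleBalance-v1 |         |              |                         |       | 1000              |
| MS-CartpoleSwingUp-v1 |         |              |                         |       | 1000              |
| MS-HopperHop-v1       |         |              |                         |       | 600               |
| MS-HopperStand-v1     |         |              |                         |       | 600               |
| MS-HumanoidRun-v1     |         |              |                         |       | 1000              |
| MS-HumanoidStand-v1   |         |              |                         |       | 1000              |
| MS-HumanoidWalk-v1    |         |              |                         |       | 1000              |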
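For scripting, the default episode limits listed in the table can be kept in a small lookup. The following is only an illustrative sketch: `DEFAULT_MAX_EPISODE_STEPS`, `max_steps_for`, and `steps_per_rollout` are hypothetical names that do not belong to any library, and the values are copied by hand from the table.

```python
# Default max episode steps per environment ID, copied from the task table.
# The dict and helper names are illustrative only, not a library API.
DEFAULT_MAX_EPISODE_STEPS = {
    "MS-AntRun-v1": 1000,
    "MS-AntWalk-v1": 1000,
    "MS-CartpoleBalance-v1": 1000,
    "MS-CartpoleSwingUp-v1": 1000,
    "MS-HopperHop-v1": 600,
    "MS-HopperStand-v1": 600,
    "MS-HumanoidRun-v1": 1000,
    "MS-HumanoidStand-v1": 1000,
    "MS-HumanoidWalk-v1": 1000,
}


def max_steps_for(env_id: str) -> int:
    """Return the default episode limit for a control task ID."""
    try:
        return DEFAULT_MAX_EPISODE_STEPS[env_id]
    except KeyError:
        raise ValueError(f"unknown control task: {env_id!r}") from None


def steps_per_rollout(env_id: str, num_envs: int) -> int:
    """Upper bound on transitions collected when `num_envs` parallel
    environments each run one full-length episode."""
    return max_steps_for(env_id) * num_envs


print(max_steps_for("MS-HopperHop-v1"))
print(steps_per_rollout("MS-AntRun-v1", num_envs=16))
```

A helper like `steps_per_rollout` is mainly useful for sizing replay or rollout buffers when many environments are simulated in parallel.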

MS-AntRun-v1#

dense-reward no-sparse-reward

Task Card

Task Description: Ant moves in the x direction at 4 m/s

Randomizations:

  • Ant qpos and qvel have added noise from uniform distribution [-1e-2, 1e-2]

Success Conditions:

  • No specific success conditions.
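The initial-state randomization described in this card (uniform noise in [-1e-2, 1e-2] added to qpos and qvel) can be sketched in plain Python. This is a hedged illustration, not the environment's actual code: `add_uniform_noise` is a hypothetical helper, and the nominal state vectors are placeholders whose length does not match the real Ant.

```python
import random

NOISE_LOW, NOISE_HIGH = -1e-2, 1e-2  # noise range stated in the task card


def add_uniform_noise(values, rng):
    """Return a copy of `values` with independent uniform noise added."""
    return [v + rng.uniform(NOISE_LOW, NOISE_HIGH) for v in values]


rng = random.Random(0)  # seeded so the example is reproducible

# Placeholder nominal state; the real robot has its own qpos/qvel sizes.
nominal_qpos = [0.0, 0.0, 0.5, 0.0]
nominal_qvel = [0.0, 0.0, 0.0, 0.0]

init_qpos = add_uniform_noise(nominal_qpos, rng)
init_qvel = add_uniform_noise(nominal_qvel, rng)
print(init_qpos, init_qvel)
```

Because the noise is small relative to the joint ranges, every episode starts close to the same pose while still differing slightly across parallel environments.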

MS-AntWalk-v1#

dense-reward no-sparse-reward

Task Card

Task Description: Ant moves in the x direction at 0.5 m/s

Randomizations:

  • Ant qpos and qvel have added noise from uniform distribution [-1e-2, 1e-2]

Success Conditions:

  • No specific success conditions.

MS-CartpoleBalance-v1#

dense-reward sparse-reward

Task Card

Task Description: Use the Cartpole robot to balance a pole on a cart.

Randomizations:

  • Pole direction is randomized around the vertical axis; the range is [-0.05, 0.05] radians.

Fail Conditions:

  • Pole is lower than the horizontal plane
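The randomization and fail condition in this card can be expressed as a short check. The sketch below assumes the pole angle `theta` is measured from the upright vertical (so `theta = 0` means perfectly balanced); that convention and the helper names are assumptions made for illustration, not taken from the environment's source.

```python
import math
import random

INIT_RANGE = (-0.05, 0.05)  # radians around vertical, from the task card


def sample_initial_pole_angle(rng):
    """Draw the starting pole angle uniformly from the stated range."""
    return rng.uniform(*INIT_RANGE)


def pole_below_horizontal(theta):
    """True when the pole tip is lower than the horizontal plane through
    the pivot. Assumes `theta` is measured from upright, in radians."""
    return math.cos(theta) < 0.0


rng = random.Random(0)
theta0 = sample_initial_pole_angle(rng)
print(theta0, pole_below_horizontal(theta0))
print(pole_below_horizontal(math.pi))  # pole hanging straight down
```

Using the cosine of the angle avoids having to wrap `theta` into a fixed interval before comparing it with ±pi/2.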

MS-CartpoleSwingUp-v1#

dense-reward no-sparse-reward

Task Card

Task Description: Use the Cartpole robot to swing up a pole on a cart.

Randomizations:

  • Pole direction is randomized around the whole circle; the range is [-pi, pi] radians.

Success Conditions:

  • No specific success conditions. The task is considered successful if the pole is upright for the whole episode.

MS-HopperHop-v1#

dense-reward sparse-reward

Task Card

Task Description: Hopper robot stays upright and moves in the positive x direction with a hopping motion

Randomizations:

  • Hopper robot is randomly rotated by an angle in [-pi, pi] radians about the y axis.

  • Hopper qpos are uniformly sampled within their allowed ranges

Success Conditions:

  • No specific success conditions.
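The two Hopper randomizations (a random rotation about the y axis and joint positions sampled uniformly within their limits) might look like the following in plain Python. This is a sketch under stated assumptions: the joint limits shown are made-up placeholders rather than the Hopper's real limits, and all function names are hypothetical.

```python
import math
import random


def sample_y_rotation(rng):
    """Draw a rotation angle about the y axis from [-pi, pi] radians."""
    return rng.uniform(-math.pi, math.pi)


def y_rotation_quaternion(angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` about y."""
    half = angle / 2.0
    return (math.cos(half), 0.0, math.sin(half), 0.0)


def sample_qpos_within_limits(joint_limits, rng):
    """Sample each joint position uniformly inside its (low, high) range."""
    return [rng.uniform(low, high) for low, high in joint_limits]


rng = random.Random(0)

# Placeholder limits in radians; NOT the real Hopper joint limits.
example_limits = [(-0.5, 0.5), (-1.0, 0.2), (0.0, 1.5)]

angle = sample_y_rotation(rng)
quat = y_rotation_quaternion(angle)
qpos = sample_qpos_within_limits(example_limits, rng)
print(angle, quat, qpos)
```

Sampling each joint independently within its own range is the simplest reading of "uniformly sampled within their allowed ranges"; the actual environment may apply further constraints.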

MS-HopperStand-v1#

dense-reward sparse-reward

Task Card

Task Description: Hopper robot stands upright

Randomizations:

  • Hopper robot is randomly rotated by an angle in [-pi, pi] radians about the y axis.

  • Hopper qpos are uniformly sampled within their allowed ranges

Success Conditions:

  • No specific success conditions.

MS-HumanoidRun-v1#

dense-reward no-sparse-reward

Task Card

Task Description: Humanoid moves in the x direction at running pace

Randomizations:

  • Humanoid qpos and qvel have added noise from uniform distribution [-1e-2, 1e-2]

Fail Conditions:

  • Humanoid robot torso link leaves z range [0.7, 1.0]
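The Humanoid fail condition is a simple range test on the torso height. The sketch below shows one way to apply it to a recorded trajectory; the helper names are hypothetical, the trajectory values are invented for the example, and the range is treated as closed (heights exactly on a boundary still count as inside), which is an assumption based on the bracket notation in the card.

```python
TORSO_Z_RANGE = (0.7, 1.0)  # allowed torso z range from the task card


def torso_out_of_range(torso_z):
    """True when the torso height has left the allowed closed range."""
    low, high = TORSO_Z_RANGE
    return not (low <= torso_z <= high)


def first_failure_step(torso_heights):
    """Return the index of the first step that violates the range,
    or None if the whole trajectory stays inside it."""
    for step, z in enumerate(torso_heights):
        if torso_out_of_range(z):
            return step
    return None


# Invented torso heights for illustration only.
trajectory = [0.90, 0.85, 0.72, 0.65, 0.90]
print(first_failure_step(trajectory))
```

The same check applies unchanged to the other Humanoid tasks on this page, since they list an identical fail condition.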

MS-HumanoidStand-v1#

dense-reward no-sparse-reward

Task Card

Task Description: Humanoid robot stands upright

Randomizations:

  • Humanoid robot is randomly rotated by an angle in [-pi, pi] radians about the z axis.

  • Humanoid qpos and qvel have added noise from uniform distribution [-1e-2, 1e-2]

Fail Conditions:

  • Humanoid robot torso link leaves z range [0.7, 1.0]

MS-HumanoidWalk-v1#

dense-reward no-sparse-reward

Task Card

Task Description: Humanoid moves in the x direction at walking pace

Randomizations:

  • Humanoid qpos and qvel have added noise from uniform distribution [-1e-2, 1e-2]

Fail Conditions:

  • Humanoid robot torso link leaves z range [0.7, 1.0]