mani_skill.vector.wrappers.gymnasium

Classes

ManiSkillVectorEnv

Gymnasium Vector Env implementation for ManiSkill environments running on the GPU for parallel simulation and optionally parallel rendering

Module Contents

class mani_skill.vector.wrappers.gymnasium.ManiSkillVectorEnv(env, num_envs=1, auto_reset=True, ignore_terminations=False, record_metrics=False, **kwargs)

Bases: gymnasium.vector.VectorEnv

Gymnasium Vector Env implementation for ManiSkill environments running on the GPU for parallel simulation and optionally parallel rendering

Parameters:

env (Union[gymnasium.Env, str]) – The environment created via gym.make / after wrappers are applied. If a string is given, we use gym.make(env) to create an environment
num_envs (int) – The number of parallel environments. This is only used if the env argument is a string
**kwargs – Environment keyword arguments passed to gym.make. These are only used if the env argument is a string
auto_reset (bool) – Whether this wrapper will auto reset the environment (following the same API/conventions as Gymnasium). Default is True (recommended as most ML/RL libraries use auto reset)
ignore_terminations (bool) – Whether this wrapper ignores terminations when deciding when to auto reset. Terminations can be caused by the task reaching a success or fail state as defined in a task’s evaluation function. Default is False, meaning episode rollouts stop early on termination. Setting this to True is generally useful when you want to model a task as infinite horizon, where an episode stops only due to the time limit.
record_metrics (bool) – If True, the returned info objects will contain the metrics: return, length, success_once, success_at_end, fail_once, fail_at_end. success/fail metrics are recorded only when the environment has success/fail criteria. success/fail_at_end are recorded only when ignore_terminations is True.
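
Example

A minimal construction sketch, assuming ManiSkill is installed and that "PickCube-v1" is a registered environment ID. The ID and the helper name make_vector_env are illustrative and do not come from this page. The sketch follows the first documented path, where the environment is created via gym.make and then wrapped; passing num_envs to gym.make is likewise an assumption about how the batched environment is built.

```python
def make_vector_env(env_id="PickCube-v1", num_envs=16):
    """Hypothetical helper: build a batched environment and wrap it."""
    # Imports are kept inside the function so that defining the helper
    # does not require ManiSkill to be installed.
    import gymnasium as gym
    import mani_skill.envs  # noqa: F401  (assumed to register the environment IDs)
    from mani_skill.vector.wrappers.gymnasium import ManiSkillVectorEnv

    # Assumption: the batched environment accepts num_envs through gym.make.
    env = gym.make(env_id, num_envs=num_envs)

    # Keyword arguments below are the ones listed in the class signature.
    return ManiSkillVectorEnv(
        env,
        auto_reset=True,            # reset finished sub-environments automatically
        ignore_terminations=False,  # success/fail states end the episode early
        record_metrics=True,        # add return, length, success metrics to info
    )
```

Calling make_vector_env() would return a wrapped environment whose reset and step methods are described below.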

call(name, *args, **kwargs)

Parameters: name (str)

close_extras(**kwargs): Clean up extra resources, e.g. those beyond what this base class manages.

get_attr(name)

Parameters: name (str)

render()

Returns the rendered frames from the parallel environments.

Returns: A tuple of rendered frames from the parallel environments

reset(*, seed=None, options=None)

Reset all parallel environments and return a batch of initial observations and info.

Parameters:

seed (Optional[Union[int, list[int]]]) – The environment reset seed
options (Optional[dict]) – Additional reset options passed through to the environment

Returns:

A batch of observations and info from the vectorized environment.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> observations, infos = envs.reset(seed=42)
>>> observations
array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32)
>>> infos
{}

step(actions)

Take an action for each parallel environment.

Parameters: actions (Union[mani_skill.utils.structs.types.Array, dict]) – Batch of actions with the action_space shape.
Returns: Batch of (observations, rewards, terminations, truncations, infos)
Return type: Tuple[mani_skill.utils.structs.types.Array, mani_skill.utils.structs.types.Array, mani_skill.utils.structs.types.Array, mani_skill.utils.structs.types.Array, dict]

Note

Because the vector environment autoresets terminated and truncated sub-environments, the reset occurs on the next step after terminated or truncated is True.
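
To make the interaction between auto reset and the ignore_terminations parameter concrete, the following is a minimal sketch of how the set of sub-environments to reset could be derived from the step outputs. The function name envs_to_reset is hypothetical and plain Python lists stand in for batched arrays; this illustrates the documented behaviour and is not the wrapper's actual implementation.

```python
def envs_to_reset(terminations, truncations, ignore_terminations=False):
    """Return one boolean per sub-environment: True if it should be reset."""
    if ignore_terminations:
        # Only the time limit (truncation) ends an episode.
        return [bool(trunc) for trunc in truncations]
    # Either a termination or a truncation ends the episode.
    return [bool(term) or bool(trunc) for term, trunc in zip(terminations, truncations)]
```

With ignore_terminations=True, a sub-environment that reaches a success or fail state keeps running until it is truncated, which matches the infinite-horizon use case described for that parameter.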

Example

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1], dtype=np.int32)
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}
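
The reset and step calls above can be combined into a short rollout loop. The sketch below assumes only the interface documented on this page (reset, step, and an action_space with a sample method); the helper name rollout is hypothetical, and counting finished episodes is just one illustrative use of the returned flags.

```python
def rollout(envs, num_steps, seed=None):
    """Step a vectorized environment with random actions.

    Returns the number of (sub-environment, step) pairs at which an
    episode ended through termination or truncation.
    """
    observations, infos = envs.reset(seed=seed)
    episodes_finished = 0
    for _ in range(num_steps):
        # Sample one random action per parallel environment.
        actions = envs.action_space.sample()
        observations, rewards, terminations, truncations, infos = envs.step(actions)
        # bool() handles Python bools as well as array or tensor scalars.
        for term, trunc in zip(terminations, truncations):
            if bool(term) or bool(trunc):
                episodes_finished += 1
    return episodes_finished
```

With auto_reset enabled, the loop does not need to call reset again after an episode ends, since the wrapper resets finished sub-environments itself.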

auto_reset = True

property base_env: mani_skill.envs.sapien_env.BaseEnv

Return type: mani_skill.envs.sapien_env.BaseEnv

property device

ignore_terminations = False

num_envs = 1

record_metrics = False

spec

property unwrapped: Return the base environment.