mani_skill.envs.sapien_env#
The BaseEnv class is the class you should inherit from if you want to create a new environment/task. The arguments here also describe all the possible run-time arguments you can pass when creating environments via gym.make.
- class mani_skill.envs.sapien_env.BaseEnv(num_envs: int = 1, obs_mode: str | None = None, reward_mode: str | None = None, control_mode: str | None = None, render_mode: str | None = None, shader_dir: str | None = None, enable_shadow: bool = False, sensor_configs: dict | None = {}, human_render_camera_configs: dict | None = {}, viewer_camera_configs: dict | None = {}, robot_uids: str | BaseAgent | List[str | BaseAgent] | None = None, sim_config: SimConfig | dict = {}, reconfiguration_freq: int | None = None, sim_backend: str = 'auto', render_backend: str = 'gpu', parallel_in_single_scene: bool = False, enhanced_determinism: bool = False)[source]#
Bases: Env
Superclass for ManiSkill environments.
- Parameters:
num_envs – number of parallel environments to run. By default this is 1, which means a CPU simulation is used. If greater than 1, then we initialize the GPU simulation setup. Note that not all environments are faster when simulated on the GPU due to limitations of GPU simulations. For example, environments with many moving objects are better simulated by parallelizing across CPUs.
obs_mode – observation mode to be used. Must be one of (“state”, “state_dict”, “none”, “sensor_data”, “rgb”, “depth”, “segmentation”, “rgbd”, “rgb+depth”, “rgb+depth+segmentation”, “rgb+segmentation”, “depth+segmentation”, “pointcloud”). The obs_mode is mostly a convenience that automatically sets up all sensors/cameras for the given observation mode so that they render the correct data and skip unnecessary rendering. For the most advanced use cases (e.g. you have 1 RGB only camera and 1 depth only camera)
reward_mode – reward mode to use. Must be one of (“normalized_dense”, “dense”, “sparse”, “none”). With “none” the reward returned is always 0.
control_mode – control mode of the agent. “*” represents all registered controllers, and the action space will be a dict.
render_mode – render mode registered in @SUPPORTED_RENDER_MODES.
shader_dir (Optional[str]) –
shader directory. Defaults to None. Setting this will override the shader used for all cameras in the environment. This is legacy behavior kept for backwards compatibility. The proper way to change the shaders used for cameras is to either change the environment code or pass in sensor_configs/human_render_camera_configs with the desired shaders.
Previously the options were “default”, “rt”, and “rt-fast”. “rt” means ray tracing, which produces more photorealistic renders but is slow; “rt-fast” is a lower-quality but faster version of “rt”.
enable_shadow (bool) – whether to enable shadow for lights. Defaults to False.
sensor_configs (dict) – configurations of sensors to override any environment defaults. If the key is one of the sensor names (e.g. a camera), the config value will be applied to the corresponding sensor. Otherwise, the value will be applied to all sensors (but overridden by sensor-specific values). For possible configurations, see the sensors documentation.
human_render_camera_configs (dict) – configurations of human rendering cameras to override any environment defaults. Similar usage as @sensor_configs.
viewer_camera_configs (dict) – configurations of the viewer camera in the GUI to override any environment defaults. Similar usage as @sensor_configs.
robot_uids (Union[str, BaseAgent, List[Union[str, BaseAgent]]]) – List of robots to instantiate and control in the environment.
sim_config (Union[SimConfig, dict]) – Configurations for the simulation that, if given, override the environment defaults. If given a dictionary, it can override just specific attributes, e.g. sim_config=dict(scene_config=dict(solver_iterations=25)). Passing in a SimConfig object, while typed, will override every attribute, including the task defaults. Some environments define their own recommended default sim configurations via the self._default_sim_config attribute, which generally should not be completely overridden.
reconfiguration_freq (int) – How frequently to call reconfigure when the environment is reset via self.reset(…). For most users who are not building tasks this does not need to be changed. The default is 0, which means the environment reconfigures upon creation and never again.
sim_backend (str) – By default this is “auto”. If sim_backend is “auto”, then if num_envs == 1, we use the PhysX CPU sim backend, otherwise we use the PhysX GPU sim backend and automatically pick a GPU to use. Can also be “physx_cpu” or “physx_cuda” to force usage of a particular sim backend. To select a particular GPU to run the simulation on, you can pass “cuda:n” where n is the ID of the GPU, similar to the way PyTorch selects GPUs. Note that if this is “physx_cpu”, num_envs can only be equal to 1.
render_backend (str) – By default this is “gpu”. If render_backend is “gpu”, then we auto select a GPU to render with. It can be “cuda:n” where n is the ID of the GPU to render with. If this is “cpu”, then we render on the CPU.
parallel_in_single_scene (bool) – By default this is False. If True, rendered images and the GUI will show all objects in one view. This is only really useful for generating cool videos showing all environments at once but it is not recommended otherwise as it slows down simulation and rendering.
enhanced_determinism (bool) – By default this is False and env resets will reset the episode RNG only when a seed / seed list is given. If True, the environment will reset the episode RNG upon each reset regardless of whether a seed is provided. Generally enhanced_determinism is not needed, and users are recommended to pass seeds into the env reset function instead.
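The difference between passing a dictionary and passing a full SimConfig object as sim_config can be sketched with plain dataclasses. This is only an illustration under stated assumptions: ToySimConfig, ToySceneConfig, and merge_overrides are hypothetical stand-ins with made-up default values, not ManiSkill's actual classes or merging code.

```python
from dataclasses import dataclass, field, is_dataclass, replace


@dataclass
class ToySceneConfig:  # hypothetical stand-in, not ManiSkill's scene config
    solver_iterations: int = 15
    enable_ccd: bool = False


@dataclass
class ToySimConfig:  # hypothetical stand-in for SimConfig
    sim_freq: int = 100
    control_freq: int = 20
    scene_config: ToySceneConfig = field(default_factory=ToySceneConfig)


def merge_overrides(default, overrides: dict):
    """Return a copy of `default` where only the keys in `overrides` change."""
    changes = {}
    for key, value in overrides.items():
        current = getattr(default, key)
        if is_dataclass(current) and isinstance(value, dict):
            # Nested dict: recurse so sibling attributes keep their defaults.
            changes[key] = merge_overrides(current, value)
        else:
            changes[key] = value
    return replace(default, **changes)


# Pretend the task recommends a higher simulation frequency.
task_default = ToySimConfig(sim_freq=120)

# Dict override: only solver_iterations changes, sim_freq=120 survives.
merged = merge_overrides(
    task_default, dict(scene_config=dict(solver_iterations=25))
)

# Full object override: every attribute comes from the new object,
# so the task's sim_freq=120 is lost.
replaced = ToySimConfig(scene_config=ToySceneConfig(solver_iterations=25))
```

In this sketch the dictionary form keeps the task's recommended values for everything it does not mention, which is why it is usually the safer choice for small tweaks.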
- SUPPORTED_OBS_MODES = ('state', 'state_dict', 'none', 'sensor_data', 'any_textures', 'pointcloud')#
The string observation modes the environment supports. Note that “none” and “any_textures” are special keys. “none” indicates no observation data is generated. “any_textures” indicates that any combination of image textures generated by cameras is supported, e.g. rgb+depth, normal+segmentation, albedo+rgb+depth etc. For a full list of supported textures see
- SUPPORTED_RENDER_MODES = ('human', 'rgb_array', 'sensors', 'all')#
The supported render modes. “human” opens up a GUI viewer. “rgb_array” returns an RGB array showing the current environment state. “sensors” returns an RGB array showing only the data collected by sensors, put together as images.
- SUPPORTED_REWARD_MODES = ('normalized_dense', 'dense', 'sparse', 'none')#
- SUPPORTED_ROBOTS: List[str | Tuple[str]] = None#
Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all desired robots into the scene, but not every task is designed to support every robot setup.
- action_space: Space#
the batched action space of the environment, which is also the action space of the agent
- add_to_state_dict_registry(object: Actor | Articulation)[source]#
- close()[source]#
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.
- compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
Compute the dense reward.
- Parameters:
obs (Any) – The observation data. By default the observation data will be in its most raw form, a dictionary (no flattening, wrappers etc.)
action (torch.Tensor) – The most recent action.
info (Dict) – The info dictionary.
- compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
Compute the normalized dense reward.
- Parameters:
obs (Any) – The observation data. By default the observation data will be in its most raw form, a dictionary (no flattening, wrappers etc.)
action (torch.Tensor) – The most recent action.
info (Dict) – The info dictionary.
- compute_sparse_reward(obs: Any, action: Tensor, info: Dict)[source]#
Computes the sparse reward. By default this function tries to use the success/fail information returned by the evaluate function and gives +1 if success, -1 if fail, 0 otherwise.
- Parameters:
obs (Any) – The observation data. By default the observation data will be in its most raw form, a dictionary (no flattening, wrappers etc.)
action (torch.Tensor) – The most recent action.
info (Dict) – The info dictionary.
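The default sparse reward rule described above (+1 on success, -1 on fail, 0 otherwise) can be sketched per parallel environment as follows. This is a simplified illustration, not the library's implementation: toy_sparse_reward is a hypothetical helper, and plain Python lists of booleans stand in for the batched torch tensors a real info dictionary would hold.

```python
from typing import Dict, List


def toy_sparse_reward(info: Dict[str, List[bool]], num_envs: int) -> List[float]:
    """Per-environment sparse reward: +1 on success, -1 on fail, 0 otherwise."""
    # Missing keys are treated as "no success" / "no failure" everywhere.
    success = info.get("success", [False] * num_envs)
    fail = info.get("fail", [False] * num_envs)
    return [float(s) - float(f) for s, f in zip(success, fail)]


# Three parallel environments: one succeeded, one failed, one still running.
info = {"success": [True, False, False], "fail": [False, True, False]}
rewards = toy_sparse_reward(info, num_envs=3)

# An evaluate function that returns an empty dictionary yields zero reward.
no_signal = toy_sparse_reward({}, num_envs=2)
```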
- property control_freq#
The frequency (Hz) of the control loop
- property control_mode: str#
The control mode of the agent
- property control_timestep#
The timestep (dt) of the control loop
- property elapsed_steps: Tensor#
The number of steps that have elapsed in the environment
- evaluate() dict[source]#
Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.
This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.
By default, if not overridden, this function returns an empty dictionary.
- get_info() dict[source]#
Get info about the current environment state, including elapsed steps and evaluation information.
- get_obs(info: Dict | None = None, unflattened: bool = False)[source]#
Return the current observation of the environment. Users may call this directly to get the current observation as opposed to taking a step with actions in the environment.
Note that some tasks use info of the current environment state to populate the observations to avoid having to compute slow operations twice. For example a state based observation may wish to include a boolean indicating if a robot is grasping an object. Computing this boolean correctly is slow, so it is preferable to generate that data in the info object by overriding the self.evaluate function.
- Parameters:
info (Dict) – The info object of the environment. Generally this should be the result of self.get_info(). If this is None (the default), this function will call self.get_info() itself.
unflattened (bool) – Whether to return the observation without flattening even if the observation mode (self.obs_mode) asserts to return a flattened observation.
- get_reward(obs: Any, action: Tensor, info: Dict)[source]#
Compute the reward for the environment at its current state. The observation data, the most recent action, and the info dictionary (generated by the self.evaluate() function) are provided as inputs. By default the observation data will be in its most raw form, a dictionary (no flattening, wrappers etc.).
- Parameters:
obs (Any) – The observation data.
action (torch.Tensor) – The most recent action.
info (Dict) – The info dictionary.
- get_sensor_images() Dict[str, Dict[str, Tensor]][source]#
Get image (RGB) visualizations of what sensors currently sense. This function calls self._get_obs_sensor_data() internally which automatically hides objects and updates the render
- get_state()[source]#
Get environment state as a flat vector, which is just an ordered, flattened version of the state_dict.
Users should not override this function.
- get_state_dict()[source]#
Get environment state dictionary. Override to include task information (e.g., goal)
- property gpu_sim_enabled#
Whether the gpu simulation is enabled.
- metadata: dict[str, Any] = {'render_modes': ('human', 'rgb_array', 'sensors', 'all')}#
- property obs_mode: str#
The current observation mode. This affects the observation returned by env.get_obs()
- obs_mode_struct#
dataclass describing what observation data is being requested by the user, detailing if state data is requested and what visual data is requested
- print_sim_details()[source]#
Debug tool to call to simply print a bunch of details about the running environment, including the task ID, number of environments, sim backend, etc.
- remove_from_state_dict_registry(object: Actor | Articulation)[source]#
- render()[source]#
Either opens a viewer if self.render_mode is “human”, or returns an array that you can use to save videos.
If self.render_mode is “rgb_array”, usually a higher quality image is rendered for the purpose of viewing only.
If self.render_mode is “sensors”, all visual observations the agent can see are provided.
If self.render_mode is “all”, this is then a combination of “rgb_array” and “sensors”.
- render_human()[source]#
Render the environment by opening a GUI viewer. This also returns the viewer object. Any objects registered in the _hidden_objects list will be shown.
- render_rgb_array(camera_name: str | None = None)[source]#
Returns an RGB array / image of size (num_envs, H, W, 3) of the current state of the environment. This is captured by any of the registered human render cameras. If a camera_name is given, only data from that camera is returned. Otherwise all camera data is captured and returned as a single batched image. Any objects registered in the _hidden_objects list will be shown
- render_sensors()[source]#
Renders all sensors that the agent can use and see and displays them in a human readable image format. Any objects registered in the _hidden_objects list will not be shown
- reset(seed: None | int | list[int] = None, options: None | dict = None)[source]#
Reset the ManiSkill environment with given seed(s) and options. Typically seed is either None (for unseeded reset) or an int (seeded reset). For GPU parallelized environments you can also pass a list of seeds for each parallel environment to seed each one separately.
If options[“env_idx”] is given, will only reset the selected parallel environments. If options[“reconfigure”] is True, will call self._reconfigure() which deletes the entire physx scene and reconstructs everything. Users building custom tasks generally do not need to override this function.
Returns the first observation and an info dictionary. The info dictionary is of type

    {
        "reconfigure": bool  # True if the env reconfigured, False otherwise
    }
Note that ManiSkill always holds two RNG states: a main RNG and an episode RNG. The main RNG is used purely to sample an episode seed, which helps with reproducibility of episodes, and is for internal use only. The episode RNG is used by the environment/task itself to e.g. randomize object positions, randomize assets etc. The episode RNG is accessible through self._batched_episode_rng, which is numpy based, and through torch.rand, which is seeded and can be used to generate random data directly on the GPU. It is recommended to use self._batched_episode_rng if you need to ensure that the same objects are loaded during reconfiguration. Reproducibility and seeding when there is both GPU and CPU simulation can be tricky, and we recommend reading the documentation for more recommendations and details on RNG: https://maniskill.readthedocs.io/en/latest/user_guide/concepts/rng.html
Upon environment creation via gym.make, the main RNG is set with fixed seeds of 2022 to 2022 + num_envs - 1 (the seed is just 2022 if there is only one environment). During each reset call, if seed is None, the main RNG is unchanged and an episode seed is sampled from the main RNG to create the episode RNG. If seed is not None, the main RNG is set to that seed and the episode seed is also set to that seed. This design means the main RNG determines the episode RNG deterministically.
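The main/episode RNG relationship described above can be modeled for a single environment as follows. This is a minimal sketch under stated assumptions: ToySeeding is a hypothetical class, Python's random module stands in for the numpy and torch generators, and the way the episode seed is drawn (randrange) is chosen for illustration only.

```python
import random
from typing import Optional


class ToySeeding:
    """Hypothetical single-environment model of the main/episode RNG pair."""

    def __init__(self, initial_seed: int = 2022):
        # The main RNG starts from a fixed seed at creation time.
        self.main_rng = random.Random(initial_seed)
        self.episode_seed: Optional[int] = None
        self.episode_rng: Optional[random.Random] = None

    def reset(self, seed: Optional[int] = None) -> None:
        if seed is None:
            # Unseeded reset: draw the episode seed from the main RNG.
            self.episode_seed = self.main_rng.randrange(2**31)
        else:
            # Seeded reset: re-seed the main RNG and reuse the seed directly.
            self.main_rng = random.Random(seed)
            self.episode_seed = seed
        self.episode_rng = random.Random(self.episode_seed)


# Two freshly created environments produce the same first episode seed.
env_a, env_b = ToySeeding(), ToySeeding()
env_a.reset()
env_b.reset()
same_unseeded = env_a.episode_seed == env_b.episode_seed

# A seeded reset makes the episode seed equal to the given seed.
env_a.reset(seed=7)
seeded_episode_seed = env_a.episode_seed
```

Because the main RNG is the only source of unseeded episode seeds, two runs that issue the same sequence of reset calls see the same sequence of episodes in this model.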
- property reward_mode#
- property robot_link_names#
Get link ids for the robot. This is used for segmentation observations.
- scene: ManiSkillScene = None#
the main scene, which manages all sub scenes. In CPU simulation there is only one sub-scene
- property segmentation_id_map#
Returns a dictionary mapping every ID to the appropriate Actor or Link object
- set_state(state: Tensor | ndarray | Sequence, env_idx: Tensor | None = None)[source]#
Set environment state with a flat state vector. Internally this reconstructs the state dictionary and calls env.set_state_dict
Users should not override this function
- set_state_dict(state: Dict, env_idx: Tensor | None = None)[source]#
Set environment state with a state dictionary. Override to include task information (e.g., goal)
Note that it is recommended to keep around state dictionaries as opposed to state vectors. With state vectors we assume the order of data in the vector is exactly the order that would be returned by flattening the state dictionary you get from env.get_state_dict(), or the result of env.get_state().
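Why the ordering assumption matters can be seen in a small round trip between a state dictionary and a flat vector. This is an illustrative sketch only: flatten_state and unflatten_state are hypothetical helpers operating on plain lists, and the keys cube_pose and robot_qpos are made up, so none of this reflects how ManiSkill stores or flattens state internally.

```python
from typing import Dict, List

State = Dict[str, List[float]]


def flatten_state(state_dict: State) -> List[float]:
    """Concatenate the values in the dictionary's insertion order."""
    flat: List[float] = []
    for values in state_dict.values():
        flat.extend(values)
    return flat


def unflatten_state(flat: List[float], template: State) -> State:
    """Rebuild a dictionary by slicing `flat` using the template's key order and sizes."""
    rebuilt: State = {}
    start = 0
    for key, values in template.items():
        rebuilt[key] = flat[start : start + len(values)]
        start += len(values)
    return rebuilt


state = {"cube_pose": [0.1, 0.2, 0.3], "robot_qpos": [1.0, 2.0]}
flat = flatten_state(state)
roundtrip = unflatten_state(flat, template=state)

# A template with a different key order silently mislabels the data.
reordered = {"robot_qpos": [0.0, 0.0], "cube_pose": [0.0, 0.0, 0.0]}
mislabeled = unflatten_state(flat, template=reordered)
```

The flat vector carries no key names, so only the dictionary form protects against this kind of silent mismatch.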
- sim_config#
the final sim config after merging user overrides with the environment default
- property sim_freq: int#
The frequency (Hz) of the simulation loop
- property sim_timestep#
The timestep (dt) of the simulation loop
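The relationship between the frequency and timestep properties is simple arithmetic, sketched below. The numbers are illustrative rather than any environment's defaults, and the whole-number ratio between simulation and control steps is an assumption made for this example.

```python
# Illustrative values only; real values come from the environment's sim config.
sim_freq = 100      # simulation steps per second (Hz)
control_freq = 20   # control steps per second (Hz)

# A timestep (dt) is the reciprocal of the corresponding frequency.
sim_timestep = 1.0 / sim_freq
control_timestep = 1.0 / control_freq

# Assumption: each control step advances the simulation a whole number of times.
sim_steps_per_control = sim_freq // control_freq
```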
- step(action: None | ndarray | Tensor | Dict)[source]#
Take a step through the environment with an action. Actions are automatically clipped to the action space.
If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.
- update_obs_space(obs: Tensor)[source]#
A convenient function to auto generate observation spaces if you modify them. Call this function if you modify the observations returned by env.step and env.reset via an observation wrapper.
The recommended way to use this in an observation wrapper is as follows:

    import gymnasium as gym
    from mani_skill.envs.sapien_env import BaseEnv

    class YourObservationWrapper(gym.ObservationWrapper):
        def __init__(self, env):
            super().__init__(env)
            self.base_env.update_obs_space(self.observation(self.base_env._init_raw_obs))

        @property
        def base_env(self) -> BaseEnv:
            return self.env.unwrapped

        def observation(self, obs):
            # your code for transforming the observation
            return obs
- property viewer#