Tasks
Description
Tasks define the high-level objectives that an agent must complete in a given Environment, subject to certain constraints (e.g. the robot must not flip over).
Tasks have two important internal variables:
_termination_conditions: a dict of {str: TerminationCondition} that defines when an episode should be terminated. For each termination condition, termination_condition.step(...) returns a tuple of (done [bool], success [bool]). If any of the termination conditions returns done = True, the episode is terminated. If any returns success = True, the episode is considered successful.
_reward_functions: a dict of {str: RewardFunction} that defines how the agent is rewarded. Each reward function has a reward_function.step(...) method that returns a tuple of (reward [float], info [dict]). The reward is a scalar value that is added to the agent's total reward for the current step. The info is a dictionary that can contain additional information about the reward.
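The aggregation rule described above (any done ends the episode, any success marks it successful, rewards are summed) can be sketched in plain Python. This is a minimal illustration, not OmniGibson's implementation: the classes `TimeoutSketch`, `GoalReachedSketch`, `GoalBonusSketch`, the `state` dictionary, and the helper `evaluate_step` are all hypothetical stand-ins, and the real `step(...)` methods take different arguments.

```python
from typing import Dict, Tuple


class TimeoutSketch:
    """Hypothetical termination condition: done once max_steps steps have passed."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps

    def step(self, state: dict) -> Tuple[bool, bool]:
        done = state["step_count"] >= self.max_steps
        return done, False  # a timeout is a failure, never a success


class GoalReachedSketch:
    """Hypothetical termination condition: done and successful near the goal."""

    def __init__(self, distance_tol: float):
        self.distance_tol = distance_tol

    def step(self, state: dict) -> Tuple[bool, bool]:
        reached = state["distance_to_goal"] <= self.distance_tol
        return reached, reached


class GoalBonusSketch:
    """Hypothetical reward function: fixed bonus when the goal is reached."""

    def __init__(self, bonus: float, distance_tol: float):
        self.bonus = bonus
        self.distance_tol = distance_tol

    def step(self, state: dict) -> Tuple[float, dict]:
        reached = state["distance_to_goal"] <= self.distance_tol
        return (self.bonus if reached else 0.0), {"reached": reached}


def evaluate_step(termination_conditions: Dict[str, object],
                  reward_functions: Dict[str, object],
                  state: dict):
    """Combine all conditions and rewards for one environment step."""
    done, success = False, False
    for condition in termination_conditions.values():
        cond_done, cond_success = condition.step(state)
        done = done or cond_done              # any done terminates the episode
        success = success or cond_success     # any success marks it successful

    total_reward, info = 0.0, {}
    for name, reward_function in reward_functions.items():
        reward, reward_info = reward_function.step(state)
        total_reward += reward                # scalar rewards are summed
        info[name] = reward_info              # keep per-function info by name
    return total_reward, done, success, info


conditions = {"timeout": TimeoutSketch(max_steps=100),
              "goal": GoalReachedSketch(distance_tol=0.5)}
rewards = {"goal_bonus": GoalBonusSketch(bonus=10.0, distance_tol=0.5)}
result = evaluate_step(conditions, rewards,
                       {"step_count": 3, "distance_to_goal": 0.2})
```

Keeping both collections as name-keyed dicts makes it easy to report which condition fired or which reward term contributed, which is why the sketch stores each `info` under its function's name.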
Tasks usually specify task-relevant observations (e.g. goal location for a navigation task) via the _get_obs method, which returns a tuple of (low_dim_obs [dict], obs [dict]). The first element is a dict of low-dimensional observations that will be automatically flattened into a 1D array; the second element contains everything else that should not be flattened. Different types of tasks should override the _get_obs method to return the appropriate observations.
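As a rough illustration of the two-part return value, the sketch below builds a low-dimensional dict and a second dict that is left untouched, then flattens only the first. The class `NavTaskSketch`, the observation keys, and the helper `flatten_low_dim` are hypothetical, and the concatenation order (dict insertion order) is an assumption; the library performs its own flattening.

```python
import math
from typing import Dict, List, Tuple


def flatten_low_dim(low_dim_obs: Dict[str, List[float]]) -> List[float]:
    """Concatenate all low-dimensional entries into one flat list.

    Entries are concatenated in dict insertion order (an assumption of this sketch).
    """
    flat: List[float] = []
    for values in low_dim_obs.values():
        flat.extend(float(v) for v in values)
    return flat


class NavTaskSketch:
    """Hypothetical navigation task exposing goal-related observations."""

    def __init__(self, goal_pos: Tuple[float, float], robot_pos: Tuple[float, float]):
        self.goal_pos = goal_pos
        self.robot_pos = robot_pos

    def _get_obs(self) -> Tuple[dict, dict]:
        dx = self.goal_pos[0] - self.robot_pos[0]
        dy = self.goal_pos[1] - self.robot_pos[1]
        # Low-dimensional entries: small numeric vectors that are safe to flatten.
        low_dim_obs = {
            "goal_offset": [dx, dy],
            "goal_distance": [math.hypot(dx, dy)],
        }
        # Everything else: structured data that should keep its shape.
        obs = {"goal_marker_grid": [[0, 1], [0, 0]]}
        return low_dim_obs, obs


task = NavTaskSketch(goal_pos=(3.0, 4.0), robot_pos=(1.0, 1.0))
low_dim, other = task._get_obs()
flat = flatten_low_dim(low_dim)
```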
Tasks also define the reset behavior of the environment between episodes via the _reset_scene, _reset_agent, and _reset_variables methods.
_reset_scene: reset the scene for the next episode; the default is scene.reset().
_reset_agent: reset the agent for the next episode; the default is to do nothing.
_reset_variables: reset any internal variables as needed; the default is to do nothing.
Different types of tasks should override these methods to get the appropriate reset behavior, e.g. a navigation task might want to randomize the initial pose of the agent and the goal location.
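The override pattern for the three reset hooks might look like the following sketch. All classes here (`SceneSketch`, `EnvSketch`, `BaseTaskSketch`, `RandomizedNavTaskSketch`) are hypothetical stand-ins rather than OmniGibson classes, and both the hook signatures and the order in which the hooks are called are assumptions made for illustration.

```python
import random


class SceneSketch:
    """Hypothetical stand-in for a scene; only counts how often it was reset."""

    def __init__(self):
        self.reset_count = 0

    def reset(self):
        self.reset_count += 1


class EnvSketch:
    """Hypothetical stand-in for an Environment with a scene and one robot."""

    def __init__(self):
        self.scene = SceneSketch()
        self.robot_pos = (0.0, 0.0)


class BaseTaskSketch:
    def reset(self, env):
        # The call order is an assumption made for this sketch.
        self._reset_scene(env)
        self._reset_agent(env)
        self._reset_variables(env)

    def _reset_scene(self, env):
        env.scene.reset()  # default described in the text: scene.reset()

    def _reset_agent(self, env):
        pass  # default: do nothing

    def _reset_variables(self, env):
        pass  # default: do nothing


class RandomizedNavTaskSketch(BaseTaskSketch):
    """Navigation-style task that randomizes the robot pose and the goal."""

    def __init__(self, bounds=(-2.0, 2.0), seed=None):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.goal_pos = None
        self.steps_taken = 0

    def _sample_xy(self):
        low, high = self.bounds
        return (self.rng.uniform(low, high), self.rng.uniform(low, high))

    def _reset_agent(self, env):
        env.robot_pos = self._sample_xy()   # new initial position each episode

    def _reset_variables(self, env):
        self.goal_pos = self._sample_xy()   # new goal each episode
        self.steps_taken = 0                # clear per-episode bookkeeping


env = EnvSketch()
task = RandomizedNavTaskSketch(bounds=(-1.0, 1.0), seed=0)
task.reset(env)
```

Because the subclass leaves `_reset_scene` untouched, it inherits the default scene reset and only customizes the agent and variable hooks.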
Usage
Specifying
Every Environment instance includes a task, which is defined by the config passed to the environment constructor under the task key.
This is expected to be a dictionary of relevant keyword arguments, specifying the desired task configuration to be created (e.g. reward type and weights, hyperparameters for reset behavior, etc).
The type key is required and specifies the desired task class. Additional keys can be specified and will be passed directly to the specific task class constructor.
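One way to picture how the type key selects a class while the remaining keys become constructor arguments is the registry sketch below. The registry, the `register` decorator, `create_task`, the class `PointNavSketch`, and its parameters `goal_tolerance` and `reward_type` are all hypothetical; the library's actual lookup mechanism and accepted keys may differ.

```python
TASK_REGISTRY = {}


def register(cls):
    """Record a task class under its class name (hypothetical mechanism)."""
    TASK_REGISTRY[cls.__name__] = cls
    return cls


@register
class PointNavSketch:
    """Hypothetical task class with two illustrative keyword arguments."""

    def __init__(self, goal_tolerance=0.5, reward_type="l2"):
        self.goal_tolerance = goal_tolerance
        self.reward_type = reward_type


def create_task(task_config: dict):
    """Build a task from a config dict: 'type' picks the class, the rest are kwargs."""
    config = dict(task_config)  # copy so the caller's dict is not mutated
    if "type" not in config:
        raise ValueError("task config must contain a 'type' key")
    task_type = config.pop("type")
    if task_type not in TASK_REGISTRY:
        raise ValueError(f"unknown task type: {task_type}")
    return TASK_REGISTRY[task_type](**config)


# Stand-in for the dict found under the 'task' key of an environment config.
task_config = {"type": "PointNavSketch", "goal_tolerance": 0.3}
task = create_task(task_config)
```

Any key not consumed by `type` is forwarded unchanged, so a misspelled key surfaces as a `TypeError` from the constructor rather than being silently ignored.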
An example of a task configuration is shown below in .yaml form:
point_nav_example.yaml
Runtime
Each Environment instance has a task attribute that is an instance of the specified task class.
Internally, the Environment's reset method calls the task's reset method, its step method calls the task's step method, and its get_obs method calls the task's get_obs method.
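The delegation described above can be sketched with two small stand-in classes. `EnvironmentSketch` and `RecordingTaskSketch` are hypothetical, as are the method signatures and return values; the sketch only shows that each environment method forwards to the task method of the same name.

```python
class RecordingTaskSketch:
    """Hypothetical task that records which of its methods were called."""

    def __init__(self):
        self.calls = []

    def reset(self, env):
        self.calls.append("reset")

    def step(self, env, action):
        self.calls.append("step")
        return 0.0, False, {}  # (reward, done, info); this shape is an assumption

    def get_obs(self, env):
        self.calls.append("get_obs")
        return {}, {}          # (low_dim_obs, obs)


class EnvironmentSketch:
    """Hypothetical environment that forwards to its task attribute."""

    def __init__(self, task):
        self.task = task  # instance of the configured task class

    def reset(self):
        self.task.reset(self)

    def step(self, action):
        return self.task.step(self, action)

    def get_obs(self):
        return self.task.get_obs(self)


env = EnvironmentSketch(RecordingTaskSketch())
env.reset()
step_result = env.step(action=None)
obs_result = env.get_obs()
```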
Types
OmniGibson currently supports 5 types of tasks, 7 types of termination conditions, and 5 types of reward functions.
Task
DummyTask: Dummy task with trivial implementations.
PointNavigationTask: PointGoal navigation task with fixed / randomized initial pose and goal location.
PointReachingTask: Similar to PointNavigationTask, except the goal is specified with respect to the robot's end effector.
GraspTask: Grasp task for a single object.
BehaviorTask: BEHAVIOR task of long-horizon household activity.
Follow our tutorial on BEHAVIOR tasks!
To better understand how to use / sample / load / customize BEHAVIOR tasks, please read our BEHAVIOR tasks documentation!
TerminationCondition
TimeoutFailureCondition: episode terminates if max_step steps have passed.
FallingFailureCondition: episode terminates if the robot can no longer function (i.e. it falls below the floor height by at least fall_height, or tilts by at least tilt_tolerance).
MaxCollisionFailureCondition: episode terminates if the robot has collided more than max_collisions times.
PointGoalSuccessCondition: episode terminates if the point goal is reached within distance_tol by the robot's base.
ReachingGoalSuccessCondition: episode terminates if the reaching goal is reached within distance_tol by the robot's end effector.
GraspGoalSuccessCondition: episode terminates if the target object has been grasped (by assistive grasping).
PredicateGoalSuccessCondition: episode terminates if all the goal predicates of BehaviorTask are satisfied.
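To make two of the failure conditions above concrete, the sketch below implements the falling check and the collision counter as plain Python. The function `falling_failure`, the class `MaxCollisionSketch`, their argument names, the default thresholds, and the use of roll and pitch in radians to measure tilt are assumptions for illustration; only the parameter names fall_height, tilt_tolerance, and max_collisions come from the descriptions above.

```python
from typing import Tuple


def falling_failure(robot_z: float, floor_z: float, roll: float, pitch: float,
                    fall_height: float = 0.03,
                    tilt_tolerance: float = 0.75) -> Tuple[bool, bool]:
    """Return (done, success) for a falling check.

    done is True if the robot is at least fall_height below the floor,
    or if roll or pitch (radians, assumed) reaches tilt_tolerance.
    """
    fell = (floor_z - robot_z) >= fall_height
    tilted = abs(roll) >= tilt_tolerance or abs(pitch) >= tilt_tolerance
    return (fell or tilted), False  # a failure condition never reports success


class MaxCollisionSketch:
    """Hypothetical counter: done once collisions exceed max_collisions."""

    def __init__(self, max_collisions: int):
        self.max_collisions = max_collisions
        self.n_collisions = 0

    def reset(self):
        self.n_collisions = 0

    def step(self, in_collision: bool) -> Tuple[bool, bool]:
        if in_collision:
            self.n_collisions += 1
        done = self.n_collisions > self.max_collisions  # "more than", so strict
        return done, False


upright = falling_failure(robot_z=0.1, floor_z=0.0, roll=0.0, pitch=0.0)
fallen = falling_failure(robot_z=-0.1, floor_z=0.0, roll=0.0, pitch=0.0)
```

Both checks return `success = False` unconditionally, which matches the split between failure and success conditions in the list above.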
RewardFunction
CollisionReward: Penalization of robot collision with non-floor objects, with a negative weight r_collision.
PointGoalReward: Reward for reaching the goal with the robot's base, with a positive weight r_pointgoal.
ReachingGoalReward: Reward for reaching the goal with the robot's end effector, with a positive weight r_reach.
PotentialReward: Reward for decreasing some arbitrary potential function value, with a positive weight r_potential. It assumes the task already has get_potential implemented. Generally, a low potential is preferred (e.g. a common potential for a goal-directed task is the distance to the goal).
GraspReward: Reward for grasping an object. It not only evaluates the success of object grasping but also considers various penalties and efficiencies. The reward is calculated based on several factors:
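The idea behind PotentialReward, rewarding a decrease in potential between consecutive steps, can be sketched as follows. `PotentialRewardSketch`, its `reset`/`step` signatures, and the distance-based `potential_fn` are hypothetical; in the library the potential comes from the task's get_potential, and the exact formula may differ from this difference-of-potentials form.

```python
import math
from typing import Callable, Tuple


class PotentialRewardSketch:
    """Reward = r_potential * (previous potential - current potential)."""

    def __init__(self, potential_fn: Callable[[tuple], float], r_potential: float = 1.0):
        self.potential_fn = potential_fn  # stands in for the task's get_potential
        self.r_potential = r_potential
        self.prev_potential = None

    def reset(self, state: tuple) -> None:
        # Record the starting potential at the beginning of each episode.
        self.prev_potential = self.potential_fn(state)

    def step(self, state: tuple) -> Tuple[float, dict]:
        new_potential = self.potential_fn(state)
        # Positive when the potential went down, negative when it went up.
        reward = self.r_potential * (self.prev_potential - new_potential)
        self.prev_potential = new_potential
        return reward, {"potential": new_potential}


GOAL = (3.0, 4.0)


def distance_to_goal(robot_xy: tuple) -> float:
    """A common potential for goal-directed tasks: distance to the goal."""
    return math.dist(robot_xy, GOAL)


reward_fn = PotentialRewardSketch(distance_to_goal, r_potential=2.0)
reward_fn.reset((0.0, 0.0))                       # potential starts at 5.0
closer_reward, closer_info = reward_fn.step((3.0, 0.0))   # potential drops to 4.0
farther_reward, _ = reward_fn.step((0.0, 0.0))            # potential rises back to 5.0
```

Because the reward is a difference of potentials, the per-step rewards over an episode sum to `r_potential` times the total reduction in potential, regardless of the path taken.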