🏆 2025 BEHAVIOR Challenge
Join us and solve 50 full-length household tasks in the realistic BEHAVIOR-1K environment, with 10,000 teleoperated expert demonstrations (1,000+ hours) available! 🤖
Overview
BEHAVIOR is a robotics challenge for everyday household tasks. It's a large-scale, human-grounded benchmark that consists of three main components:
- Task definitions for 1,000 everyday household activities
- 50 fully interactive scenes and around 10,000 richly annotated objects
- OmniGibson, a simulation environment capable of modeling complex interactions with rigid bodies, deformable objects, and fluids
BEHAVIOR is the first challenge of its kind to require high-level reasoning, long-range locomotion, and dexterous bimanual manipulation in house-scale scenes. This year's challenge covers 50 of these tasks.
Challenge Components
Task Definitions
The benchmark includes task definitions for 1,000 everyday household activities, covering diverse behaviors including rearrangement, cleaning/wiping, cooking/freezing, painting/spraying, hanging/installing, slicing/dicing, baking, and doing laundry.
Interactive Environments
- 50 fully interactive scenes with house-scale layouts
- 10,000+ richly annotated objects
OmniGibson Simulator
The simulation environment supports (a brief usage sketch follows the list):
- Rigid body physics
- Deformable objects (cloth, fabric)
- Fluid interactions (water, oils)
- Object semantic states (e.g., open, filled, on top of, inside)
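As a rough illustration, the snippet below sketches how a BEHAVIOR task might be loaded and stepped through OmniGibson's gym-style Python interface. The config keys, the `BehaviorTask` type, and the activity name used here are assumptions based on OmniGibson's public tutorials and may differ in the challenge release; treat this as a sketch, not the official starter code.

```python
# Minimal sketch (not official starter code): loading a BEHAVIOR task in OmniGibson.
# Config keys, the task type, and the activity name below are assumptions taken from
# OmniGibson's public tutorials and may differ in the challenge release.
import omnigibson as og

cfg = {
    "scene": {
        "type": "InteractiveTraversableScene",  # fully interactive house-scale scene
        "scene_model": "Rs_int",                # one of the interactive scenes
    },
    "robots": [
        {"type": "Fetch", "obs_modalities": ["rgb", "depth"]},
    ],
    "task": {
        "type": "BehaviorTask",                 # BDDL-defined household activity
        "activity_name": "putting_away_Halloween_decorations",  # example activity
        "online_object_sampling": False,        # use a pre-sampled scene instance
    },
}

env = og.Environment(configs=cfg)
env.reset()

for _ in range(100):
    action = env.action_space.sample()          # replace with your policy's output
    step_return = env.step(action)              # tuple layout depends on the OmniGibson version
    obs = step_return[0]

og.shutdown()                                   # tears down the simulator (check the docs for your version)
```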
Data and Baselines
Dataset
The benchmark includes 10,000 teleoperated demonstration trajectories with diverse behaviors across all task categories. Each demonstration contains (an illustrative schema follows this list):
- Synchronized RGBD observations
- Object and part-level segmentation masks
- Ground-truth object states
- Robot proprioception
- Robot actions
- Skill and subtask annotations
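To make the per-frame contents concrete, here is a hypothetical schema for a single demonstration frame. The field names and the dataclass itself are illustrative only, not the actual dataset format; see the data download and visualization tools for the real layout.

```python
# Hypothetical schema for one demonstration frame; field names are illustrative,
# not the actual dataset format.
from dataclasses import dataclass
import numpy as np

@dataclass
class DemoFrame:
    rgb: np.ndarray             # (H, W, 3) uint8 camera image
    depth: np.ndarray           # (H, W) float32 depth map, synchronized with rgb
    instance_seg: np.ndarray    # (H, W) int32 object/part-level segmentation ids
    object_states: dict         # ground-truth semantic states, e.g. {"fridge": {"open": True}}
    proprioception: np.ndarray  # joint positions/velocities, base pose, gripper state
    action: np.ndarray          # teleoperated robot action executed at this frame
    skill_label: str            # skill/subtask annotation, e.g. "grasp_mug"

def to_bc_sample(frame: DemoFrame) -> tuple[dict, np.ndarray]:
    """Pair an observation dict with its action, as an imitation-learning sample."""
    obs = {"rgb": frame.rgb, "depth": frame.depth, "proprio": frame.proprioception}
    return obs, frame.action
```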
Available Baseline Methods
Participants have access to training and evaluation pipelines for these baseline methods: ACT, Diffusion Policy, BC-RNN, WB-VIMA, OpenVLA, and π0.
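As a generic illustration of how imitation-learning baselines consume such demonstrations, below is a minimal behavior-cloning loop in PyTorch. The network, dataset, and hyperparameters are placeholders, not the provided baseline pipelines.

```python
# Minimal behavior-cloning sketch; the network, dataset, and hyperparameters are
# placeholders, not the provided baseline pipelines (ACT, Diffusion Policy, etc.).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

obs_dim, act_dim = 64, 12                       # assumed flattened observation/action sizes
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for demonstration data: (observation, expert action) pairs.
demos = TensorDataset(torch.randn(10_000, obs_dim), torch.randn(10_000, act_dim))
loader = DataLoader(demos, batch_size=256, shuffle=True)

for epoch in range(10):
    for obs, expert_action in loader:
        loss = nn.functional.mse_loss(policy(obs), expert_action)  # regress expert actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```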
Evaluation
Metrics
Agents are evaluated across three areas:
- Task completion rate (primary metric): Fraction of satisfied predicates in the goal condition of the BDDL (BEHAVIOR Domain Definition Language) task definition (a computation sketch follows this list)
- Agent efficiency: Total distance traveled and energy expended during task execution
- Data efficiency: Total number of frames from demonstrations (for imitation learning) or the simulator (for reinforcement learning) used during training
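For concreteness, the primary metric can be thought of as the fraction of goal predicates that hold at the end of an episode. The helper below is a hypothetical sketch; the challenge's evaluator checks the BDDL goal condition directly in simulation.

```python
# Hypothetical sketch of the primary metric: fraction of satisfied goal predicates.
# The real evaluator checks the BDDL goal condition inside the simulator.
def task_completion_rate(goal_predicates: list[bool]) -> float:
    """goal_predicates[i] is True iff the i-th goal predicate is satisfied at episode end."""
    if not goal_predicates:
        return 0.0
    return sum(goal_predicates) / len(goal_predicates)

# Example: 3 of 4 goal predicates satisfied -> 0.75
print(task_completion_rate([True, True, False, True]))
```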
Reporting
- Results are reported with 95% confidence intervals (a sketch of the interval computation follows this list)
- Primary ranking based on task completion rate
- All metrics displayed on the leaderboard
- The EvalAI platform is used for team registration, submission, and leaderboard management
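One simple way to produce a 95% confidence interval is a normal-approximation interval over per-episode scores, sketched below; the leaderboard's exact procedure may differ.

```python
# Sketch of a 95% confidence interval over per-episode completion rates
# (normal approximation); the leaderboard's exact procedure may differ.
import numpy as np

def mean_with_ci95(scores: np.ndarray) -> tuple[float, float]:
    """Return (mean, half-width) of a 95% CI using 1.96 * standard error."""
    mean = scores.mean()
    stderr = scores.std(ddof=1) / np.sqrt(len(scores))
    return float(mean), float(1.96 * stderr)

scores = np.array([0.75, 0.5, 1.0, 0.25, 0.75])  # per-episode completion rates (example)
mean, half_width = mean_with_ci95(scores)
print(f"{mean:.2f} ± {half_width:.2f}")
```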
Resources and Participation
Available Resources
All code, data, and documentation are open-source and available at behavior.stanford.edu, including:
- Tutorial on simulator installation
- 3D asset downloads
- Demonstration data download and visualization tools
- Starter code for baseline methods
- Challenge rules and protocols
How to Participate
- Register your team on the EvalAI platform
- Install the simulator and download the required data
- Develop your approach using the provided baselines and training pipelines
- Submit your results through EvalAI
- Track your progress on the leaderboard
The challenge provides comprehensive documentation, tutorials, and baseline implementations to help participants get started with developing household robotics solutions.