Welcome to the Stanford Vision and Learning Lab (SVL)
We at the Stanford Vision and Learning Lab (SVL) tackle fundamental open problems in computer vision research. We are intrigued by visual functionalities that give rise to semantically meaningful interpretations of the visual world.
Join us: If you are interested in research opportunities at SVL, please fill out this application survey (Stanford students only).
BEHAVIOR
BEHAVIOR is a human-centered simulation benchmark to evaluate embodied AI solutions. Embodied artificial intelligence (EAI) is advancing. But where are we now? We propose to test EAI agents with the physical challenges humans need to solve in their everyday life: household activities such as doing laundry, picking up toys, setting the table, or cleaning floors. BEHAVIOR is a benchmark in simulation where EAI agents need to plan and execute navigation and manipulation strategies based on sensor information to fulfill up to 1,000 household activities. BEHAVIOR tests the ability of agents to perceive the environment, plan, and execute complex long-horizon activities that involve multiple objects, rooms, and state changes, all with the reproducibility, safety, and observability offered by a realistic physics simulation.
ObjectFolder
ObjectFolder models the multisensory behaviors of real objects with two datasets: 1) ObjectFolder 2.0, a dataset of 1,000 neural objects in the form of implicit neural representations with simulated multisensory data, and 2) ObjectFolder Real, a dataset of multisensory measurements for 100 real-world household objects, collected with a newly designed pipeline that captures the 3D meshes, videos, impact sounds, and tactile readings of real-world objects. ObjectFolder also includes a standard benchmark suite of 10 tasks for multisensory object-centric learning, centered on object recognition, reconstruction, and manipulation with sight, sound, and touch. We open-source both datasets and the benchmark suite to catalyze new research in multisensory object-centric learning in computer vision, robotics, and beyond.
Multi-Object Multi-Actor (MOMA)
Multi-Object Multi-Actor (MOMA) is a compositional and hierarchical activity recognition framework for complex activities in which multiple humans use a variety of objects to accomplish certain tasks. We introduce activity graphs as the overarching, human-interpretable representation of human activities in videos, and activity parsing as the task of generating activity graphs.
People, AI & Robots Group (PAIR)
The People, AI & Robots Group (PAIR) is a research group within the Stanford Vision & Learning Lab that focuses on developing methods and mechanisms for generalizable robot perception and control. We work on challenging open problems at the intersection of computer vision, machine learning, and robotics. We develop algorithms and systems that unify reinforcement learning, control-theoretic modeling, and 2D/3D visual scene understanding to teach robots to perceive and interact with the physical world.
Partnership in AI-Assisted Care
The Partnership in AI-Assisted Care (PAC) is an interdisciplinary collaboration between the School of Medicine and the Computer Science Department that focuses on cutting-edge computer vision and machine learning technologies to solve some of healthcare's most important problems.
Our research addresses the theoretical foundations and practical applications of computational vision. We focus on discovering and proposing the fundamental principles, algorithms, and implementations for solving high-level visual perception and cognition problems involving computational geometry, automated image and video analysis, and visual reasoning. At the same time, our curiosity leads us to study the underlying neural mechanisms that enable the human visual system to perform high-level visual tasks with remarkable speed and efficiency.