An open, modular framework for zero-shot, language-conditioned pick-and-drop tasks in arbitrary homes.


OK-Robot in action

Across 10 home environments in New York City, OK-Robot attempted 171 pick-and-drop tasks. Here are sample trials from 5 homes, each showing 5 tasks.


Understanding the performance of OK-Robot

A Sankey diagram showing the analysis of OK-Robot's success and failure modes.

Although our method generalizes zero-shot to completely new environments, we probe OK-Robot to better understand when and how it succeeds and fails. We find a 58.5% success rate in completely novel homes, but a closer look also reveals a long tail of failure causes, presented in the figure above. The three leading causes of failure are failing to retrieve the right object to navigate to from the semantic memory (9.3%), getting a difficult pose from the manipulation module (8.0%), and hardware difficulties (7.5%).
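To make the size of the long tail concrete, the short sketch below redoes the arithmetic on the percentages quoted above. It assumes that all four figures are shares of the same set of attempted tasks, which the text implies but does not state outright. The function and variable names are illustrative only and are not part of the OK-Robot codebase.

```python
# Percentages quoted in the text above.
# Assumption: all of them are shares of the same set of attempted tasks.
SUCCESS_RATE = 58.5
LEADING_FAILURES = {
    "semantic memory retrieval": 9.3,
    "difficult manipulation pose": 8.0,
    "hardware difficulties": 7.5,
}


def failure_breakdown(success_rate, leading):
    """Return (total failure, leading-cause share, long-tail remainder),
    all in percentage points, rounded to one decimal place."""
    total_failure = round(100.0 - success_rate, 1)
    leading_share = round(sum(leading.values()), 1)
    long_tail = round(total_failure - leading_share, 1)
    return total_failure, leading_share, long_tail


total, leading, tail = failure_breakdown(SUCCESS_RATE, LEADING_FAILURES)
print(f"total failure: {total}%")
print(f"three leading causes: {leading}%")
print(f"remaining long tail: {tail}%")
```

Under that assumption, the three leading causes account for 24.8 of the 41.5 percentage points of failure, leaving 16.7 points spread across the remaining, less frequent causes.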

In the "Understanding the performance of OK-Robot" section of the paper, we go over the analysis of the failure modes presented in the figure above and discuss the most frequent cases.


OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

  title={OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics},
  author={Liu, Peiqi and Orru, Yaswanth and Paxton, Chris and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel},
  journal={arXiv preprint arXiv:2401.12202},