Lerrel Pinto
Assistant Professor of CS @nyuniversity.
I like robots!
- We just released RUKA, a $1300 humanoid hand that is 3D-printable, strong, precise, and fully open-sourced! The key technical breakthrough here is that we can control the joints and fingertips of the robot **without joint encoders**. All it needs is self-supervised data collection and learning.
- This project, which combines hardware design with learning-based controllers, was a monumental effort led by @anyazorin.bsky.social and Irmak Guzey. More links and information about RUKA are below: Website: ruka-hand.github.io Assembly Instructions: ruka.gitbook.io/instructions
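Not the released RUKA controller, just a minimal sketch of the encoder-free idea: during self-supervised exploration, log the tendon motor commands together with externally tracked fingertip positions, then regress commands from desired fingertip targets. The motor count, data loader, and shapes below are illustrative assumptions.

```python
# Illustrative sketch only (not the released RUKA code): learn an inverse model
# that maps desired fingertip positions -> tendon motor commands, using data
# logged during self-supervised exploration. Counts and shapes are assumptions.
import torch
import torch.nn as nn

class InverseHandModel(nn.Module):
    def __init__(self, n_fingertips=5, n_motors=11, hidden=256):  # illustrative sizes
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_fingertips * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_motors),
        )

    def forward(self, fingertip_xyz):           # (B, n_fingertips * 3)
        return self.net(fingertip_xyz)          # (B, n_motors)

def train_step(model, opt, fingertips, motor_cmds):
    # fingertips: observed fingertip positions; motor_cmds: the commands that produced them
    loss = nn.functional.mse_loss(model(fingertips), motor_cmds)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = InverseHandModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# fingertips, motor_cmds = load_selfsupervised_log(...)   # hypothetical data loader
fingertips, motor_cmds = torch.randn(64, 15), torch.randn(64, 11)  # placeholder batch
print(train_step(model, opt, fingertips, motor_cmds))
```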
- When life gives you lemons, you pick them up. (trained with robotutilitymodels.com)
- This would be funny! 😂
- Is there a word for the feeling when you want to cheer for the other team?
- The robot behaviors shown below are trained without any teleop, sim2real, genai, or motion planning. Simply show the robot a few examples of doing the task yourself, and our new method, called Point Policy, spits out a robot-compatible policy!
- The overall algorithm is simple: 1. Extract key points from human videos. 2. Train a transformer policy to predict future robot key points. 3. Convert predicted key points to robot actions.
- This project was an almost solo effort from @haldarsiddhant.bsky.social. And as always, it is fully open-sourced. Project page: point-policy.github.io Paper: arxiv.org/abs/2502.20391
- Point Policy uses sparse key points to represent both human demonstrators and robots, bridging the morphology gap. The scene is hence encoded through semantically meaningful key points from minimal human annotations.
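A rough sketch of that three-step recipe in code (not the released Point Policy implementation; the key-point extraction and action conversion are left as placeholders, and all shapes and names are assumptions):

```python
# Illustrative stand-in for the 3 steps above; not the released Point Policy code.
import torch
import torch.nn as nn

class KeypointTransformer(nn.Module):
    """Step 2: predict future robot key points from a history of observed key points."""
    def __init__(self, n_points=10, d_model=128, horizon=8):
        super().__init__()
        self.embed = nn.Linear(n_points * 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, horizon * n_points * 2)
        self.horizon, self.n_points = horizon, n_points

    def forward(self, keypoint_history):              # (B, T, n_points * 2)
        h = self.encoder(self.embed(keypoint_history))
        out = self.head(h[:, -1])                     # summarize with the last timestep
        return out.view(-1, self.horizon, self.n_points, 2)

# Step 1 would extract key points from human videos with an off-the-shelf point tracker.
# Step 3 would convert predicted 2D key points into robot actions (e.g. lift to 3D with
# depth, then solve for end-effector motion). Both are placeholders in this sketch.

policy = KeypointTransformer()
hist = torch.randn(1, 16, 10 * 2)    # dummy history: 16 frames, 10 key points each
future = policy(hist)                # (1, 8, 10, 2) predicted future key points
print(future.shape)
```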
- We just released AnySense, an iPhone app for effortless data acquisition and streaming for robotics. We leverage Apple’s development frameworks to record and stream: 1. RGBD + Pose data 2. Audio from the mic or custom contact microphones 3. Seamless Bluetooth integration for external sensors
- It should be accessible in EU now!
- Data collected 'in the wild' with AnySense can then be used to train multimodal policies! In the video above, we use the Robot Utility Models framework to train visuo-tactile policies for a whiteboard-erasing task. You can use it for so much more though!
- AnySense is built to empower researchers with better tools for robotics. Try it out below. Download on the App Store: apps.apple.com/us/app/anyse... Open-source code on GitHub: github.com/NYU-robot-le... Website: anysense.app AnySense is led by @raunaqb.bsky.social with several collaborators from NYU.
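AnySense's own file formats aren't documented here, so as a hedged illustration only, this is how one might synchronize a recorded RGB-D stream with contact-microphone audio by timestamp before training a visuo-tactile policy (layout and sampling rates are assumptions):

```python
# Hypothetical post-processing: pair each RGB-D frame with the audio samples recorded
# around it. Field names, rates, and layout are assumptions, not AnySense's format.
import numpy as np

def align_streams(frame_ts, audio_ts, window_s=0.1):
    """For each frame timestamp, return indices of audio samples within +/- window_s."""
    pairs = []
    for i, t in enumerate(frame_ts):
        mask = np.abs(audio_ts - t) <= window_s
        pairs.append((i, np.nonzero(mask)[0]))
    return pairs

frame_ts = np.arange(0.0, 2.0, 1 / 30)       # dummy 30 FPS video timestamps
audio_ts = np.arange(0.0, 2.0, 1 / 16000)    # dummy 16 kHz audio timestamps
pairs = align_streams(frame_ts, audio_ts)
print(len(pairs), pairs[0][1].shape)         # audio samples matched to frame 0
```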
- Just found a new winner for the most hype-baiting, unscientific plot I have seen. (From the recent Figure AI release)
- Thank you to @sloanfoundation.bsky.social for this generous award to our lab. Hopefully this will bring us closer to building truly general-purpose robots!
- 🎉Congrats to the 126 early-career scientists who have been awarded a Sloan Research Fellowship this year! These exceptional scholars are drawn from 51 institutions across the US and Canada, and represent the next generation of groundbreaking researchers. sloan.org/fellowships/...
- Thanks Tucker! The timing of this is great given the uncertainty with other funding mechanisms.
- A fun, clever idea from @upiter.bsky.social: treat code generation as a sequential editing problem -- this gives you loads of training data from synthetically editing existing code. And it works! Higher performance on HumanEval, MBPP, and CodeContests across small LMs like Gemma-2, Phi-3, and Llama 3.1.
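One hedged way to picture the data-generation idea (not necessarily the paper's exact procedure): strip lines from an existing program and record the unified diffs that restore it, yielding an edit sequence an LM can be trained to reproduce.

```python
# Illustrative only: turn an existing program into a synthetic sequence of edits
# (unified diffs) by growing the file a few lines at a time and diffing each step.
import difflib

def synthetic_edit_sequence(program: str, chunk: int = 2):
    lines = program.splitlines()
    # Intermediate states: empty file -> partial prefixes -> full program.
    states = [lines[:k] for k in range(0, len(lines), chunk)] + [lines]
    edits = []
    for prev, nxt in zip(states, states[1:]):
        diff = "\n".join(difflib.unified_diff(prev, nxt, lineterm=""))
        edits.append(diff)   # training target: predict this edit given the code so far
    return edits

program = "def add(a, b):\n    return a + b\n\nprint(add(2, 3))\n"
for step, edit in enumerate(synthetic_edit_sequence(program)):
    print(f"--- edit {step} ---\n{edit}")
```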
- Yes, this is one of our inspirations!
- Thanks Eugene! Sounds exciting!
- Hi Eugene, this sounds cool! Could you comment a bit on how well simulated driving agents translate to real world driving?
- We have been working a bunch on offline world models. Pre-trained features from DINOv2 seem really powerful for modeling. I hope this opens up a whole set of applications for decision making and robotics! Check out the thread from @gaoyuezhou.bsky.social for more details.
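Roughly, the recipe looks like the sketch below: encode frames with frozen DINOv2 features and learn action-conditioned dynamics in that latent space. This is a simplified stand-in, not the released code; the dynamics network, feature choice, and action size are assumptions.

```python
# Sketch of an offline world model on top of frozen DINOv2 features (simplified stand-in).
import torch
import torch.nn as nn

# Pretrained DINOv2 ViT-S/14 from torch.hub (downloads weights on first use).
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

class LatentDynamics(nn.Module):
    """Predicts the next DINOv2 embedding from the current embedding and an action."""
    def __init__(self, feat_dim=384, action_dim=7, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))

@torch.no_grad()
def encode(frames):                  # frames: (B, 3, 224, 224), ImageNet-normalized
    return dino(frames)              # (B, 384) global features for ViT-S/14

model = LatentDynamics()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# One training step on a dummy offline transition batch (obs, action, next_obs).
obs, nxt = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
act = torch.randn(8, 7)
loss = nn.functional.mse_loss(model(encode(obs), act), encode(nxt))
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```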
- At NYU Abu Dhabi today and in love with how cat-friendly the campus is!
- Nah, they are friendly. They get cat food from folks around NYU AD.
- Your robot looks cool!
- HOT 🔥 fastest, most precise, and most capable hand control setup ever... Less than $450 and fully open-source 🤯 by @huggingface, @therobotstudio, @NepYope This tendon-driven technology will disrupt robotics! Retweet to accelerate its democratization 🚀 A thread 🧵
- Great stuff!!
- New paper! We show that by using a keypoint-based image representation, robot policies become robust to different object types and background changes. We call this method Prescriptive Point Priors for robot Policies, or P3-PO for short. Full project is here: point-priors.github.io
- P3-PO uses a one-time “point prescription” by a human to identify key points. After this, it uses semantic correspondence to find the same points on different instances of the same object.
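A hedged sketch of what that correspondence step could look like with off-the-shelf DINOv2 patch features (not the P3-PO implementation; the feature choice and nearest-neighbor matching are assumptions):

```python
# Illustrative semantic correspondence with DINOv2 patch features: given a point
# prescribed once on a reference image, find the best-matching patch in a new image.
import torch
import torch.nn.functional as F

dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
PATCH, SIDE = 14, 224 // 14          # 16x16 grid of patches for 224x224 inputs

@torch.no_grad()
def patch_features(img):                                        # (1, 3, 224, 224), normalized
    feats = dino.forward_features(img)["x_norm_patchtokens"]    # (1, 256, 384)
    return F.normalize(feats[0], dim=-1)                        # unit-norm per-patch features

def correspond(ref_img, ref_xy, new_img):
    """Map a prescribed pixel (x, y) on ref_img to the most similar patch center in new_img."""
    rx, ry = ref_xy[0] // PATCH, ref_xy[1] // PATCH
    ref_feat = patch_features(ref_img)[ry * SIDE + rx]          # (384,)
    sims = patch_features(new_img) @ ref_feat                   # cosine similarity per patch
    idx = int(sims.argmax())
    return ((idx % SIDE) * PATCH + PATCH // 2, (idx // SIDE) * PATCH + PATCH // 2)

ref, new = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)  # dummy images
print(correspond(ref, (70, 112), new))       # rough (x, y) of the matched point
```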
- This work was led by @maralevy.bsky.social and was a wonderful collaboration with @haldarsiddhant.bsky.social and @abhinav-sh.bsky.social!
- Modern policy architectures are unnecessarily complex. In our #NeurIPS2024 project called BAKU, we focus on what really matters for good policy learning. BAKU is modular, language-conditioned, compatible with multiple sensor streams & action multi-modality, and importantly fully open-source!
- BAKU consists of three modules: 1. Sensor encoders for vision, language, and state 2. Observation trunk to fuse multimodal inputs 3. Action head for predicting actions. This allows BAKU to combine different action models like VQ-BeT and Diffusion Policy under one framework.
- More details are here: baku-robot.github.io BAKU was led by @haldarsiddhant.bsky.social, who will be presenting at NeurIPS this Thursday from 11 a.m. to 2 p.m. PST. So catch him if you are around!
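A toy version of that three-module layout (a sketch, not the BAKU code; the dimensions, one-token-per-modality fusion, and MLP action head are simplifications):

```python
# Toy three-module policy: per-modality encoders -> observation trunk -> action head.
import torch
import torch.nn as nn

class ToyBAKU(nn.Module):
    def __init__(self, img_dim=512, lang_dim=384, state_dim=8, d=256, action_dim=7):
        super().__init__()
        # 1. Sensor encoders (stand-ins for real vision/language/proprioception encoders)
        self.vision_enc = nn.Linear(img_dim, d)
        self.lang_enc = nn.Linear(lang_dim, d)
        self.state_enc = nn.Linear(state_dim, d)
        # 2. Observation trunk fusing the modality tokens
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        # 3. Action head (plain MLP here; could be swapped for VQ-BeT or a diffusion head)
        self.action_head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, action_dim))

    def forward(self, img_feat, lang_feat, state):
        tokens = torch.stack(
            [self.vision_enc(img_feat), self.lang_enc(lang_feat), self.state_enc(state)],
            dim=1,
        )                                   # (B, 3, d): one token per modality
        fused = self.trunk(tokens).mean(dim=1)
        return self.action_head(fused)      # (B, action_dim)

policy = ToyBAKU()
act = policy(torch.randn(2, 512), torch.randn(2, 384), torch.randn(2, 8))
print(act.shape)   # torch.Size([2, 7])
```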
- Maybe you have to force the baby to use only two fingers 😆
- Since we are nearing the end of the year, I'll revisit some of our work I'm most excited about from the last year, and maybe give a sneak peek of what we are up to next. To start off: Robot Utility Models, which enable zero-shot deployment. In the video below, the robot hasn't seen these doors before.
- Yeah, I agree we need to be more thoughtful. Even if one believes that data scaling is the answer, there is so much algorithmic work left to be done to make the process more efficient.
- The policies here use only video since we want the trained policies to also work with plain webcams. But in a previous project (dobb-e.com) we used the depth as well.
- Thank you! I think there is a nuanced conversation here. I think data is a crucial piece for robots, but pure data scaling for all possible things you would want your robot to do will need an ungodly amount of data.
- Yes, that is a great catch. The iPhone is an amazing vision sensor, much better than the Realsense or other cameras we have tried.
- Lol we thought quite a bit about the name 😆
- Our awesome undergrad lead on this project, @haritheja.bsky.social, took RUMs to Munich for CoRL 2024 and showed them working zero-shot, opening doors and drawers bought from a German IKEA.
- RUMs is the brainchild of @notmahi.bsky.social, with several insightful experiments. The most important one shows that data diversity >> data quantity. Another insight is that, regardless of the algorithm, there is a similar-ish scaling law across tasks. Check out the paper: arxiv.org/abs/2409.05865
- There are three main components to build RUMs: diverse expert data + multi-modal behavior cloning + mLLM feedback. Hardware, code & pretrained policies are fully open-sourced: robotutilitymodels.com
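For the mLLM-feedback piece, the deploy-time loop is roughly a verify-and-retry cycle like the sketch below. All functions here (run_policy, capture_image, ask_mllm, reset_for_retry) are placeholders, not the released interfaces.

```python
# Rough deploy-time retry loop with an mLLM used as a success detector (sketch only).
# Placeholder stubs keep this self-contained; replace with real robot / mLLM calls.
def run_policy(task_prompt): ...
def capture_image(): ...
def ask_mllm(image, question) -> bool: return False   # a real mLLM would judge the image
def reset_for_retry(): ...

def deploy_with_mllm_feedback(task_prompt: str, max_retries: int = 3) -> bool:
    for attempt in range(max_retries):
        run_policy(task_prompt)            # execute the pretrained utility model
        image = capture_image()            # snapshot from the robot's camera
        question = f"Did the robot successfully complete: '{task_prompt}'? Answer yes or no."
        if ask_mllm(image, question):      # multimodal LLM verifies the outcome
            return True
        reset_for_retry()                  # e.g. back off and re-approach before retrying
    return False

print(deploy_with_mllm_feedback("open the cabinet door"))
```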
- Also as @hasanpoonawala.bsky.social has mentioned, getting rid of touch calibration is pretty important as well. What I really hope is folks keep making better and cheaper skin / touch sensors, eventually being able to cover entire interaction surfaces with them.
- Good question. I think if you had a sensitive enough F/T sensor on the wrist it could be used for a similar application. But typically these F/T sensors are quite expensive and hence usually applied to one point (the wrist). AnySkin is cheap, flexible, and can be applied on many points.
- Does everyone in your community agree on some folk knowledge that isn’t published anywhere? Put it in a paper! It’s a pretty valuable contribution
- Thank you for your candor.
- I think we need an AMA series for Robotics / Embodied AI with an optional anonymous setting. Will be both fun and informative to new community members to absorb folk knowledge.
- I think we need someone to organize it!
- I would recommend taking a look at the Keynote lectures of recent robotics conferences (like CoRL, RSS, ICRA etc.). Maybe panel discussions of workshops? I agree that this is a very noisy way to get the current state of the art, and that is why we need a better mechanism.
- I'm curious. Why?
- Are there examples of this done in other fields? Something we can take inspiration from?
- Nice work Remi and team. We need more of this!