Lerrel Pinto
Assistant Professor of CS @nyuniversity.
I like robots!
- Reposted by Lerrel Pinto: How is AI helping robots to generalise their skills to unfamiliar environments? 🤖 🏠 In the latest episode, I chatted to Prof. Lerrel Pinto (@lerrelpinto.com) from New York University about #robot learning and decision making. Available wherever you get your podcasts: linktr.ee/robottalkpod
- We just released RUKA, a $1300 humanoid hand that is 3D-printable, strong, precise, and fully open-sourced! The key technical breakthrough is that we can control the robot's joints and fingertips **without joint encoders**. All we need is self-supervised data collection and learning.
- This project, which combines hardware design with learning-based controllers, was a monumental effort led by @anyazorin.bsky.social and Irmak Guzey. More links and information about RUKA are below: Website: ruka-hand.github.io Assembly Instructions: ruka.gitbook.io/instructions
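For intuition on the "no joint encoders" point above, here is a hedged sketch of the idea: learn a controller that maps desired fingertip positions to motor commands from self-collected data. Names, shapes, and training details below are illustrative assumptions, not the RUKA codebase.

```python
# Hedged sketch: instead of reading joint encoders, learn a controller from
# self-collected (motor command, measured fingertip) pairs, e.g. gathered with
# a motion-capture glove. Shapes and names are illustrative, not RUKA's code.
import torch
import torch.nn as nn

controller = nn.Sequential(            # desired fingertip (x, y, z) -> motor commands
    nn.Linear(3, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 2),                 # e.g. two tendon motors for one finger
)
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)

def train_step(motor_cmds: torch.Tensor, fingertips: torch.Tensor) -> float:
    """motor_cmds: (B, 2) commands sent during data collection;
    fingertips: (B, 3) fingertip positions measured for those commands."""
    loss = nn.functional.mse_loss(controller(fingertips), motor_cmds)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```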
- When life gives you lemons, you pick them up. (trained with robotutilitymodels.com)
- Is there a word for the feeling when you want to cheer for the other team?
- The robot behaviors shown below are trained without any teleop, sim2real, genai, or motion planning. Simply show the robot a few examples of doing the task yourself, and our new method, called Point Policy, spits out a robot-compatible policy!
- Point Policy uses sparse key points to represent both human demonstrators and robots, bridging the morphology gap. The scene is thus encoded through semantically meaningful key points derived from minimal human annotation.
- The overall algorithm is simple: 1. Extract key points from human videos. 2. Train a transformer policy to predict future robot key points. 3. Convert predicted key points to robot actions.
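A minimal sketch of those three steps, with hypothetical class and function names (this is not the released implementation; see point-policy.github.io for that):

```python
# Illustrative sketch of the Point Policy pipeline; names are hypothetical.
import torch
import torch.nn as nn

class PointTransformerPolicy(nn.Module):
    """Predicts future robot key-point positions from a history of key points."""
    def __init__(self, num_points: int, dim: int = 256, horizon: int = 8):
        super().__init__()
        self.embed = nn.Linear(num_points * 3, dim)            # flatten (x, y, z) per point
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, horizon * num_points * 3)   # future key points
        self.num_points, self.horizon = num_points, horizon

    def forward(self, point_history: torch.Tensor) -> torch.Tensor:
        # point_history: (batch, time, num_points, 3), extracted from human videos
        b, t, n, _ = point_history.shape
        tokens = self.embed(point_history.reshape(b, t, n * 3))
        fused = self.trunk(tokens)[:, -1]                      # summary of the history
        return self.head(fused).reshape(b, self.horizon, n, 3)

# 1. Extract key points from human videos (e.g., with an off-the-shelf point tracker).
# 2. Train the policy above to predict future robot key points from past ones.
# 3. Convert predicted key points to robot actions, e.g., via inverse kinematics.
```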
- This project was an almost solo effort from @haldarsiddhant.bsky.social. And as always, this project is fully open-sourced. Project page: point-policy.github.io Paper: arxiv.org/abs/2502.20391
- Reposted by Lerrel Pinto: This is important because the humble iPhone is one of the best accessories for embodied AI out there, if not actually the best. It's got a depth sensor, good camera, built-in internet, decent compute, and -- uniquely -- it has really good slam already built in.
- We just released AnySense, an iPhone app for effortless data acquisition and streaming for robotics. We leverage Apple’s development frameworks to record and stream: 1. RGBD + Pose data 2. Audio from the mic or custom contact microphones 3. Seamless Bluetooth integration for external sensors
- The 'wild' robot data collected by AnySense can then be used to train multimodal policies! In the video above, we use the Robot Utility Models framework to train visuo-tactile policies for a whiteboard-erasing task. You can use it for so much more, though!
- AnySense is built to empower researchers with better tools for robotics. Try it out below. Download on the App Store: apps.apple.com/us/app/anyse... Open-source code on GitHub: github.com/NYU-robot-le... Website: anysense.app AnySense is led by @raunaqb.bsky.social with several collaborators from NYU.
- Just found a new winner for the most hype-baiting, unscientific plot I have seen. (From the recent Figure AI release)
- Reposted by Lerrel Pinto: One reason to be intolerant of misleading hype in tech and science is that tolerating the small lies and deception is how you get tolerance of big lies
- Thank you to @sloanfoundation.bsky.social for this generous award to our lab. Hopefully this will bring us closer to building truly general-purpose robots!
- 🎉Congrats to the 126 early-career scientists who have been awarded a Sloan Research Fellowship this year! These exceptional scholars are drawn from 51 institutions across the US and Canada, and represent the next generation of groundbreaking researchers. sloan.org/fellowships/...
- A fun, clever idea from @upiter.bsky.social: treat code generation as a sequential editing problem -- this gives you loads of training data from synthetically editing existing code. And it works! Higher performance on HumanEval, MBPP, and CodeContests across small LMs like Gemma-2, Phi-3, and Llama 3.1
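A hedged sketch of how existing code can be turned into synthetic edit-sequence training data; the actual procedure in the paper may differ, and the function below is purely illustrative.

```python
# Illustrative sketch: rebuild a finished program chunk by chunk and record
# each intermediate state together with the diff that produces the next state.
# An LM trained on these pairs learns to generate code as a sequence of edits.
import difflib

def synthetic_edit_sequence(final_code: str, chunks: int = 4):
    """Yield (partial_program, next_edit) pairs that rebuild final_code."""
    lines = final_code.splitlines()
    step = max(1, len(lines) // chunks)
    state: list[str] = []
    for end in range(step, len(lines) + step, step):
        new_state = lines[:end]
        diff = "\n".join(difflib.unified_diff(state, new_state, lineterm=""))
        yield "\n".join(state), diff   # train the LM to emit this edit next
        state = new_state
```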
- We have been working a bunch on offline world models. Pre-trained features from DINOv2 seem really powerful for modeling. I hope this opens up a whole set of applications for decision making and robotics! Check out the thread from @gaoyuezhou.bsky.social for more details.
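As a rough illustration of what an offline world model on top of frozen DINOv2 features could look like (the architecture and names here are my own sketch, not the released model):

```python
# Hedged sketch: predict the next frame's DINOv2 features from the current
# frame's features and the action, with the DINOv2 encoder kept frozen.
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    def __init__(self, feat_dim: int = 768, action_dim: int = 7, hidden: int = 1024):
        super().__init__()
        self.encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
        self.encoder.requires_grad_(False)                     # keep DINOv2 frozen
        self.dynamics = nn.Sequential(
            nn.Linear(feat_dim + action_dim, hidden), nn.GELU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # frame: (B, 3, 224, 224) image batch; action: (B, action_dim)
        with torch.no_grad():
            z = self.encoder(frame)                            # (B, feat_dim) class-token features
        return self.dynamics(torch.cat([z, action], dim=-1))   # predicted next features
```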
- Reposted by Lerrel Pinto: omg a student somehow accidentally wrote an email addressed to a faculty-wide NYU listserv and my inbox is now a master class on who understands the difference between a listserv and an email chain
- Reposted by Lerrel Pinto: Humans vs Ants: Problem-solving Skills
- At NYU Abu Dhabi today and in love with how cat-friendly the campus is!
- Reposted by Lerrel Pinto: This holiday season, take a moment to visit your local bookstore. It’s about more than finding a great book—it’s about supporting the small businesses that keep our communities thriving.
- Reposted by Lerrel Pinto: HOT 🔥 fastest, most precise, and most capable hand control setup ever... Less than $450 and fully open-source 🤯 by @huggingface, @therobotstudio, @NepYope This tendon-driven technology will disrupt robotics! Retweet to accelerate its democratization 🚀 A thread 🧵
- Reposted by Lerrel Pinto: Outstanding presentation, finally! DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control @jeffacce.bsky.social @lerrelpinto.com
- Reposted by Lerrel Pinto: Love this approach. Reminds me of a more detailed version of an idea I had. Will definitely look deeper into this ironj.github.io/eleuther/
- New paper! We show that by using keypoint-based image representations, robot policies become robust to different object types and background changes. We call this method Prescriptive Point Priors for robot Policies, or P3-PO for short. Full project is here: point-priors.github.io
- P3-PO uses a one-time “point prescription” by a human to identify key points. After this, it uses semantic correspondence to find the same points on different instances of the same object.
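A hedged sketch of the semantic-correspondence step: match the prescribed points to a new image by nearest neighbour in dense vision features. The feature extractor and names are illustrative assumptions, not the P3-PO code.

```python
# Illustrative sketch: transfer prescribed points to a new image by cosine
# similarity between dense feature maps (e.g. from a DINO-style backbone).
import torch
import torch.nn.functional as F

def transfer_points(ref_feats, new_feats, ref_points):
    """ref_feats, new_feats: (C, H, W) dense features; ref_points: list of (row, col)."""
    C, H, W = new_feats.shape
    flat = F.normalize(new_feats.reshape(C, -1), dim=0)        # (C, H*W), unit columns
    matched = []
    for (r, c) in ref_points:
        query = F.normalize(ref_feats[:, r, c], dim=0)         # (C,) prescribed point feature
        sims = query @ flat                                     # cosine similarity map
        idx = int(sims.argmax())
        matched.append((idx // W, idx % W))                     # best-matching pixel
    return matched
```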
- This work was led by @maralevy.bsky.social and a wonderful collaboration with @haldarsiddhant.bsky.social and @abhinav-sh.bsky.social !
- Modern policy architectures are unnecessarily complex. In our #NeurIPS2024 project called BAKU, we focus on what really matters for good policy learning. BAKU is modular, language-conditioned, compatible with multiple sensor streams and action multi-modality, and, importantly, fully open-source!
- BAKU consists of three modules: 1. Sensor encoders for vision, language, and state 2. Observation trunk to fuse multimodal inputs 3. Action head for predicting actions. This allows BAKU to combine different action models like VQ-BeT and Diffusion Policy under one framework.
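A minimal sketch of that modular layout; the module interfaces are illustrative, see baku-robot.github.io for the released implementation.

```python
# Illustrative sketch of a BAKU-like modular policy: swappable encoders,
# a fusion trunk, and a pluggable action head.
import torch
import torch.nn as nn

class BAKULikePolicy(nn.Module):
    def __init__(self, vision_enc, lang_enc, state_enc, trunk, action_head):
        super().__init__()
        # 1. Sensor encoders for vision, language, and proprioceptive state
        self.vision_enc, self.lang_enc, self.state_enc = vision_enc, lang_enc, state_enc
        # 2. Observation trunk that fuses the multimodal tokens
        self.trunk = trunk
        # 3. Action head (swappable: MLP, VQ-BeT, Diffusion Policy, ...)
        self.action_head = action_head

    def forward(self, images, instruction, state):
        tokens = torch.stack(
            [self.vision_enc(images), self.lang_enc(instruction), self.state_enc(state)],
            dim=1,
        )                                    # (B, 3, D): one token per modality
        fused = self.trunk(tokens)[:, 0]     # pooled observation embedding
        return self.action_head(fused)       # predicted action (or action chunk)
```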
- More details are here: baku-robot.github.io BAKU was led by @haldarsiddhant.bsky.social, who will be presenting at NeurIPS this Thursday from 11 a.m. to 2 p.m. PST. So catch him if you're around!
- Reposted by Lerrel Pinto: Robot utility models are not just among the first learned models that work zero-shot on a mobile manipulator, but also provide a nuanced discussion on what works and what doesn't in data-driven robot learning.
- Since we are nearing the end of the year, I'll revisit some of the work I'm most excited about from the last year, and maybe give a sneak peek of what we are up to next. To start off: Robot Utility Models, which enables zero-shot deployment. In the video below, the robot hasn't seen these doors before.
- There are three main components to building RUMs: diverse expert data + multi-modal behavior cloning + mLLM feedback. Hardware, code & pretrained policies are fully open-sourced: robotutilitymodels.com
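One way to read the "mLLM feedback" component is a deploy-verify-retry loop. The sketch below is an assumption about what that loop looks like, with illustrative callbacks rather than the released robotutilitymodels.com API.

```python
# Hedged sketch: run the behavior-cloned policy, ask a multimodal LLM whether
# the task succeeded, and retry from the home pose if it did not.
from typing import Any, Callable

def run_with_mllm_feedback(
    policy: Callable[[Any], Any],         # observation -> action
    step: Callable[[Any], Any],           # action -> next observation
    reset: Callable[[], Any],             # return robot to home pose, give first observation
    mllm_success: Callable[[Any], bool],  # ask an mLLM, e.g. "is the door open?", from an image
    max_steps: int = 200,
    max_retries: int = 3,
) -> bool:
    for _ in range(max_retries):
        obs = reset()
        for _ in range(max_steps):
            obs = step(policy(obs))
        if mllm_success(obs):
            return True                   # mLLM confirms success, stop
    return False                          # give up after max_retries attempts
```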
- Our awesome undergrad lead on this project @haritheja.bsky.social took RUMs to Munich for CoRL 2024 and showed it working zero-shot, opening doors and drawers bought from a German IKEA.
- RUMs is the brainchild of @notmahi.bsky.social, backed by several insightful experiments. The most important finding: data diversity >> data quantity. Another insight is that, regardless of the algorithm, there is a similar-ish scaling law across tasks. Check out the paper: arxiv.org/abs/2409.05865
- Reposted by Lerrel Pinto: I'd like to introduce what I've been working at @hellorobot.bsky.social: Stretch AI, a set of open-source tools for language-guided autonomy, exploration, navigation, and learning from demonstration. Check it out: github.com/hello-robot/... Thread ->
- Reposted by Lerrel Pinto: How to drive your research forward? “I tested the idea we discussed last time. Here are some results. It does not work. (… awkward silence)” Such conversations happen so many times when meeting with students. How do we move forward? You need …
- I think we need an AMA series for Robotics / Embodied AI with an optional anonymous setting. It would be both fun and informative for new community members looking to absorb folk knowledge.
- Reposted by Lerrel Pinto: I collected some folk knowledge for RL and stuck them in my lecture slides a couple weeks back: web.mit.edu/6.7920/www/l... See Appendix B... sorry, I know, appendix of a lecture slide deck is not the best for discovery. Suggestions very welcome.
- Reposted by Lerrel Pinto: Interesting article but the author drank the Kool-Aid and never sought out other viewpoints: “Foundation models like GPT-4 have largely subsumed [previous] models that help robots with planning and vision, and locomotion and dexterity will probably soon be subsumed, too.”
- Nice work Remi and team. We need more of this!