Researchers from NVIDIA, CMU, UC Berkeley, UT Austin, and UC San Diego presented HOVER, a “versatile neural whole-body controller for humanoid robots.” This multi-mode policy distillation framework allows the robots to move all their limbs with just one model.
“HOVER enables seamless transitions between control modes while preserving the distinct advantages of each, offering a robust and scalable solution for humanoid control across a wide range of modes.”
NVIDIA
Jim Fan, NVIDIA’s Senior Research Manager and Lead of Embodied AI, revealed that the team trained a neural network with only 1.5 million parameters. That is small by current standards: some models have billions of parameters.
“We trained HOVER in NVIDIA Isaac, a GPU-powered simulation suite that accelerates physics by 10,000x faster than real time. To put the number in perspective, the robots undergo 1 year of intense training in a virtual ‘dojo’, but take only ~50 minutes of wall clock time on one GPU card. The neural net then transfers zero-shot to the real world without finetuning,” he said.
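As a quick sanity check on the quoted figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation that assumes a 365-day year and takes the 10,000x speedup at face value; the function name is illustrative and not part of any NVIDIA tooling.

```python
# Back-of-the-envelope check of the quoted figures (assumes a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes of simulated time
SPEEDUP = 10_000                   # quoted simulation speedup over real time


def wall_clock_minutes(simulated_minutes: float, speedup: float) -> float:
    """Wall-clock minutes needed to simulate the given duration."""
    return simulated_minutes / speedup


minutes = wall_clock_minutes(MINUTES_PER_YEAR, SPEEDUP)
print(f"{minutes:.1f} minutes")  # 52.6 minutes
```

The result, roughly 53 minutes, is consistent with the "~50 minutes" Fan cites.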
HOVER can perform several high-level motion tasks. Fan named a few of those “control modes”:
- Head and hand poses: can be captured by XR devices like Apple Vision Pro.
- Whole-body poses: via MoCap or RGB camera.
- Whole-body joint angles: exoskeleton.
- Root velocity command: joysticks.
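To make the idea of several control modes behind one interface more concrete, here is a minimal Python sketch. It is not HOVER's actual API: the `ControlMode` enum, the `INPUT_DEVICES` table, the device labels, and the `available_modes` helper are all hypothetical names that simply encode the list above.

```python
from enum import Enum


class ControlMode(Enum):
    """Hypothetical labels for the control modes listed above."""
    HEAD_HAND_POSES = "head and hand poses"
    WHOLE_BODY_POSES = "whole-body poses"
    WHOLE_BODY_JOINT_ANGLES = "whole-body joint angles"
    ROOT_VELOCITY = "root velocity command"


# Input devices per mode, as named in the list (device labels are assumptions).
INPUT_DEVICES = {
    ControlMode.HEAD_HAND_POSES: {"xr_headset"},
    ControlMode.WHOLE_BODY_POSES: {"mocap", "rgb_camera"},
    ControlMode.WHOLE_BODY_JOINT_ANGLES: {"exoskeleton"},
    ControlMode.ROOT_VELOCITY: {"joystick"},
}


def available_modes(devices_at_hand):
    """Return the modes that at least one available device can drive."""
    devices = set(devices_at_hand)
    return [mode for mode in ControlMode if INPUT_DEVICES[mode] & devices]


modes = available_modes({"joystick", "rgb_camera"})
print([mode.name for mode in modes])
# ['WHOLE_BODY_POSES', 'ROOT_VELOCITY']
```

In this toy version, picking a mode is just a lookup on the devices that happen to be available; the real controller handles all of these modes with a single trained policy.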
According to Fan, HOVER offers three things: a unified interface for controlling a robot with “whichever input devices are convenient at hand,” an easier way to collect whole-body teleoperation data for training, and a layer beneath an upstream Vision-Language-Action model, whose motion instructions HOVER translates into low-level motor signals at high frequency.
If you’d like to see the technical side of the research, find the project here.