Watch a human catch a thrown object and you’ll notice something almost unremarkable: the hand shapes itself before contact, fingers pre-curving into a configuration that anticipates trajectory, weight, and spin. It takes years of childhood development to wire this up. Robots, until very recently, simply couldn’t do it. Not the catching — the anticipating. The whole-body, feed-forward choreography that makes manipulation look effortless. That gap is closing fast, and the way it’s closing tells you something profound about where embodied AI is headed.
The core problem with robotic dexterity was never actuation. Servo motors and tendon-driven fingers have been mechanically capable for years. The bottleneck was the sensorimotor loop: perceiving contact forces in real time, predicting how an object will move, and updating a grasp policy fast enough to matter. Classical control pipelines couldn’t fuse those signals within that time budget. Early deep learning approaches needed thousands of real-world trials to learn anything useful, which made them impractical for anything beyond highly scripted lab conditions.
What changed the trajectory was a combination of three things arriving close together: high-density tactile sensing, simulation-to-real transfer at meaningful fidelity, and transformer-based policies trained on large demonstration datasets. None of these is new individually. Together, they’re genuinely different.
Tactile sensing is the piece that doesn’t get enough attention. Camera-based sensors in the GelSight family and more recent capacitive skin arrays now give robot hands something resembling the mechanoreceptor density of a human fingertip: thousands of contact points, with the faster arrays updating at hundreds of hertz. That data was useless without a policy network fast enough to act on it, but on-device inference has gotten cheap enough that tactile feedback can now close the loop in under ten milliseconds. That’s the window where it starts to matter physically.
The simulation side saw its own inflection. Physics engines like PhysX and MuJoCo have been around for well over a decade, but the domain randomization techniques pioneered by groups like OpenAI’s robotics team, and significantly extended since, made it possible to train policies in simulation that transfer to real hardware without collapsing. You train across thousands of randomized friction coefficients, object masses, and sensor noise profiles. What emerges is a policy robust enough to handle the messy reality of an actual grasped object. Recent work from groups at Berkeley, CMU, and several industrial labs has pushed this to genuinely complex tasks: in-hand reorientation of arbitrary objects, tool use, and bimanual coordination.
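To make the randomization step concrete, here is a minimal sketch in plain Python. It assumes a hypothetical setup in which every simulated episode draws its own friction coefficient, object mass, and sensor noise level before the policy runs. The names (EpisodeParams, RANGES, sample_episode_params, noisy_reading) and the numeric ranges are illustrative assumptions, not taken from any particular simulator or paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Physical parameters drawn fresh for each simulated episode (hypothetical names)."""
    friction: float          # contact friction coefficient
    object_mass_kg: float    # mass of the grasped object
    sensor_noise_std: float  # std-dev of Gaussian noise added to sensor readings


# Illustrative ranges only (assumed); real ranges are tuned per robot and task.
RANGES = {
    "friction": (0.3, 1.2),
    "object_mass_kg": (0.05, 0.5),
    "sensor_noise_std": (0.0, 0.02),
}


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw one set of physical parameters uniformly from the ranges above."""
    return EpisodeParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


def noisy_reading(true_value: float, params: EpisodeParams, rng: random.Random) -> float:
    """Corrupt a simulated sensor reading with this episode's noise level."""
    return true_value + rng.gauss(0.0, params.sensor_noise_std)


# One parameter draw per training episode; the policy never sees the same "world" twice.
rng = random.Random(0)
episodes = [sample_episode_params(rng) for _ in range(1000)]
```

In a real pipeline each draw would be pushed into the simulator before the episode starts. The point of the sketch is only that the policy is trained against a distribution of physical worlds rather than a single one.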
The demonstration-data side is where the scaling argument enters. Generative policy architectures treat robot manipulation as a generative modeling problem. ACT does this with a transformer trained as a conditional VAE that predicts chunks of actions; Diffusion Policy and its successors do it by iteratively denoising an action sequence. Either way, the recipe is the same: given a history of observations and a goal, sample a trajectory. Train on enough human teleoperation data and the policy can generalize to object categories it’s never seen. The numbers here are still modest compared to language: the largest manipulation datasets range from hundreds of thousands to around a million demonstrations, not billions. But the community has been building shared datasets aggressively, and the scaling curves look similar in shape to what language researchers saw early on.
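For the diffusion flavor of this idea, "sample a trajectory" means starting from random noise and denoising it step by step, conditioned on what the robot observes. The sketch below is a toy version in plain Python using DDPM-style ancestral sampling. It assumes a one-dimensional position, a single scalar observation instead of a history, and a hand-written toy_noise_predictor standing in for the trained network. All names and the noise schedule are illustrative assumptions, not the API of any published implementation.

```python
import math
import random

T = 50  # number of denoising steps (illustrative)
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear noise schedule
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []  # cumulative products of ALPHAS
_prod = 1.0
for _a in ALPHAS:
    _prod *= _a
    ALPHA_BARS.append(_prod)


def toy_noise_predictor(x_t, t, obs, goal):
    """Stand-in for a trained network (hypothetical).

    It pretends the only demonstration ever seen is a straight line from
    obs to goal, for which the ideal noise estimate has a closed form.
    """
    horizon = len(x_t)
    target = [obs + (goal - obs) * (i + 1) / horizon for i in range(horizon)]
    ab = ALPHA_BARS[t]
    return [(x - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab) for x, x0 in zip(x_t, target)]


def sample_trajectory(obs, goal, horizon, rng):
    """DDPM-style ancestral sampling: start from Gaussian noise, denoise T times."""
    x = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for t in reversed(range(T)):
        eps = toy_noise_predictor(x, t, obs, goal)
        coef = BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t])
        mean = [(xi - coef * ei) / math.sqrt(ALPHAS[t]) for xi, ei in zip(x, eps)]
        sigma = math.sqrt(BETAS[t]) if t > 0 else 0.0  # no noise added on the final step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x


# Sample an 8-step trajectory from position 0.0 toward a goal at 1.0.
trajectory = sample_trajectory(obs=0.0, goal=1.0, horizon=8, rng=random.Random(0))
```

Because the toy predictor knows exactly one demonstration, the sampler recovers that straight line. A real policy would replace toy_noise_predictor with a network trained on teleoperation data, and would typically execute only the first few sampled actions before re-sampling from fresh observations.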
What makes the current moment feel different is that these threads are being integrated into full systems rather than studied in isolation. Companies such as Figure, which builds humanoids, and Physical Intelligence, which builds general-purpose robot policies, are among those deploying models that chain perception, dexterous manipulation, and whole-body coordination into single end-to-end policies. Failures are still frequent and the settings are still controlled. But the set of failure modes is narrowing, and the tasks are getting harder on purpose.
The reasonable near-term expectation isn’t a robot that can do everything. It’s a robot that can reliably handle the dense, unpredictable physicality of real environments — kitchens, warehouses, surgical suites — because it has finally learned to feel what it’s touching. That’s not a small thing. That’s the foundation everything else gets built on.