Silicon Light: How Photonic Chips Are Bringing Trillion-Parameter Inference Down to a Whisper of Power

A single query to a large language model consumes a small but measurable amount of energy. Multiply that by billions of queries a day, and the arithmetic gets uncomfortable fast. Now imagine routing that computation through light instead of electrons: potentially cutting the energy per operation by an order of magnitude, and running matrix multiplications as fast as photons travel through waveguides. That is not a distant dream. It is what photonic computing startups and research labs are building right now, and the hardware is starting to leave the lab.

The core physics is almost offensively elegant. Optical matrix-vector multiplication works by encoding numerical values as the amplitudes of light signals, passing them through a mesh of Mach-Zehnder interferometers (tiny structures etched into silicon photonic chips, each built from two beam splitters and tunable phase shifters), and reading the resulting interference pattern at photodetectors as the output. The multiplication itself happens as the light propagates through the mesh, the passive optical path dissipates far less heat than switching transistors, and power is consumed mainly at the input/output conversion stages, where electrical signals are turned into light and back. For the dense linear algebra that dominates transformer inference, this is a nearly perfect fit.
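
To make the interferometer mesh concrete, here is a minimal Python sketch using only the standard library. It assumes an idealized, lossless Mach-Zehnder interferometer modeled as an input phase shifter (phi), a 50:50 coupler, an internal phase shifter (theta), and a second 50:50 coupler. The names mzi, apply_mzi, and mesh are illustrative and do not come from any vendor's toolchain. A mesh like this implements a unitary transform; a general weight matrix is commonly factored by singular value decomposition into two such meshes plus a row of attenuators, which this sketch leaves out.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mzi(theta, phi):
    """2x2 transfer matrix of an idealized, lossless MZI (an assumed model):
    input phase phi, 50:50 coupler, internal phase theta, 50:50 coupler."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]
    # Rightmost factor acts on the light first.
    return matmul(coupler, matmul(inner, matmul(coupler, outer)))

def apply_mzi(vec, top, theta, phi):
    """Apply one MZI to modes (top, top + 1); other modes pass through unchanged."""
    t = mzi(theta, phi)
    out = list(vec)
    out[top] = t[0][0] * vec[top] + t[0][1] * vec[top + 1]
    out[top + 1] = t[1][0] * vec[top] + t[1][1] * vec[top + 1]
    return out

def mesh(vec, settings):
    """Propagate a field vector through a sequence of (top, theta, phi) MZIs."""
    for top, theta, phi in settings:
        vec = apply_mzi(vec, top, theta, phi)
    return vec

# Hypothetical phase settings for a three-mode triangular mesh.
settings = [(0, 0.7, 0.2), (1, 1.3, 0.5), (0, 2.1, 0.9)]
out = mesh([1.0, 0.0, 0.0], settings)
print([round(abs(a) ** 2, 4) for a in out])  # optical power at each output port
```

In this model a single interferometer with theta set to pi/2 splits its input power evenly between the two outputs, and for any phase settings the total power across the mesh outputs equals the input power, which is what lossless interference requires. The "weights" of the computation live entirely in the phase settings.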

Lightmatter, one of the furthest-along companies in this space, has moved past proof-of-concept. Their Passage interconnect fabric uses photonics not for the compute itself but for chip-to-chip communication, replacing copper traces with optical links that carry hundreds of terabits per second across a wafer-scale package at a fraction of the signaling energy of electrical links. That matters enormously at the scale where modern inference clusters operate: interconnect bandwidth is frequently the bottleneck that keeps GPUs from being fully utilized, and photonic interconnects aim to remove that bottleneck rather than incrementally push past it.

Meanwhile, Luminous Computing and MIT Lincoln Laboratory have been pursuing full optical matrix engines: chips where the multiply-accumulate operations themselves happen in the optical domain before ever touching an electronic register. The catch has always been precision. Photonic arithmetic is analog, so noise and converter resolution limit it to relatively low numerical precision, which was a serious problem when models needed 16-bit or 32-bit floating point. But the field’s timing turns out to be fortunate: the aggressive quantization push across the AI research community, with many models running well at INT8 and some even at INT4, is quietly making photonic compute far more viable. The precision requirements are coming down to meet the hardware’s natural range.
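
The precision trade-off can be illustrated with a small, hedged experiment. The sketch below uses symmetric uniform quantization as a stand-in for limited analog precision; that is an assumption made for illustration, not a model of any particular photonic device, and the helper names (quantize, quantized_matvec, relative_error) are hypothetical. It compares a matrix-vector product computed from quantized weights and inputs against the full-precision result.

```python
import math
import random

def quantize(x, bits, max_abs):
    """Snap x to a symmetric signed grid with `bits` bits covering [-max_abs, max_abs]."""
    if max_abs == 0:
        return 0.0
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max_abs / levels
    q = max(-levels, min(levels, round(x / scale)))
    return q * scale

def matvec(w, x):
    """Plain full-precision matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def quantized_matvec(w, x, bits):
    """Quantize weights and inputs to `bits` bits, then multiply."""
    w_max = max(abs(v) for row in w for v in row)
    x_max = max(abs(v) for v in x)
    wq = [[quantize(v, bits, w_max) for v in row] for row in w]
    xq = [quantize(v, bits, x_max) for v in x]
    return matvec(wq, xq)

def relative_error(w, x, bits):
    """Euclidean norm of the error divided by the norm of the exact result."""
    exact = matvec(w, x)
    approx = quantized_matvec(w, x, bits)
    num = math.sqrt(sum((a - e) ** 2 for a, e in zip(approx, exact)))
    den = math.sqrt(sum(e ** 2 for e in exact))
    return num / den

# Random test data (illustrative only; real model weights are not Gaussian noise).
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(16)]
x = [rng.gauss(0, 1) for _ in range(16)]
for bits in (4, 8, 16):
    print(bits, relative_error(w, x, bits))
```

The error shrinks as the bit width grows, which is the point of the quantization argument: if a model tolerates the error level of an 8-bit or 4-bit grid, hardware whose natural precision sits in that range stops being disqualified by it.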

The thermal story is just as compelling as the energy one. Electronic compute at scale produces prodigious heat, which is why modern datacenters spend a meaningful fraction of their power budgets on cooling. Photonic multipliers generate very little heat during the core computation, because photons propagating through a waveguide do not dissipate energy through resistance the way electrons do in switching transistors; most of the remaining heat comes from the lasers, the phase-shifter tuning, and the surrounding electronics. A dense photonic inference accelerator could, in principle, sit in environments that would throttle conventional silicon, and datacenter designers are already thinking about what that changes for rack density.

There are real engineering challenges still in play. Fabricating photonic integrated circuits with high yield at wafer scale is genuinely hard: photonic devices are more sensitive to nanometer-scale variation in waveguide dimensions than digital CMOS logic is, because small changes in geometry shift the optical phase, and those phase errors translate directly into numerical errors. Hybrid architectures that pair a photonic matrix core with electronic control logic and memory are the current consensus approach, and getting the two domains to work in tight concert, without the interface becoming the new bottleneck, requires careful co-design.
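
How fabrication defects become numerical errors can be sketched with a small Monte Carlo simulation. The model below is an assumption made for illustration: each interferometer is ideal except for a zero-mean Gaussian error on its phase setting, standing in for the effect of waveguide geometry variation. The function names and the chosen phase values are hypothetical.

```python
import cmath
import random

def mzi_transfer(theta):
    """Closed-form 2x2 transfer matrix of an ideal MZI with internal phase theta
    (two 50:50 couplers, no external phase shifter)."""
    e = cmath.exp(1j * theta)
    return [[(e - 1) / 2, 1j * (e + 1) / 2],
            [1j * (e + 1) / 2, (1 - e) / 2]]

def propagate(thetas, field):
    """Send a two-mode field through a chain of MZIs, one per entry in thetas."""
    a, b = field
    for theta in thetas:
        t = mzi_transfer(theta)
        a, b = t[0][0] * a + t[0][1] * b, t[1][0] * a + t[1][1] * b
    return a, b

def mean_power_error(thetas, sigma, trials=2000, seed=0):
    """Average absolute error in port-0 output power when every phase is
    perturbed by Gaussian noise with standard deviation sigma (radians)."""
    rng = random.Random(seed)
    ideal = abs(propagate(thetas, (1.0, 0.0))[0]) ** 2
    total = 0.0
    for _ in range(trials):
        noisy = [t + rng.gauss(0.0, sigma) for t in thetas]
        total += abs(abs(propagate(noisy, (1.0, 0.0))[0]) ** 2 - ideal)
    return total / trials

# Hypothetical phase settings for a chain of four interferometers.
thetas = [0.6, 1.1, 2.0, 0.3]
for sigma in (0.0, 0.01, 0.1):
    print(sigma, mean_power_error(thetas, sigma))
```

With no phase error the output matches the ideal value exactly, and the average deviation grows as the phase error grows. In a real chip these errors accumulate across many more stages, which is one reason calibration and electronic control logic feature so heavily in hybrid designs.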

But the trajectory is unmistakable. As transformer models scale further and inference costs become the dominant operational expense for AI products, the incentive to find fundamentally better physics keeps growing. Photonic chips offer something incrementally better GPU generations cannot: a different substrate entirely, one where the most expensive operations in modern AI happen to be exactly what light does naturally and effortlessly. The question is no longer whether photonic inference accelerators arrive. It is how quickly the engineering catches up to what the physics already promises.