A matrix multiplication, the fundamental arithmetic of every neural network, takes a photonic processor roughly the time it takes light to cross a thumbnail. Not microseconds. Not nanoseconds. Picoseconds. That is not an incremental gain over conventional silicon; it is a different regime of physics entirely.
For the better part of a decade, photonic computing sat in an intriguing but frustrating middle distance: theoretically compelling, practically immature. The core idea is straightforward enough. Instead of encoding numbers as voltages and churning through transistors, you encode them as amplitudes and phases of laser light, then perform matrix multiplications using interference, the way waves naturally add and cancel. The linear transform happens as the light propagates and, in the ideal case, costs no switching energy, because nothing in the optical path is switching. The energy is spent at the edges, in the lasers, modulators, and detectors that move data into and out of the optical domain. In between, you are just letting photons propagate through a carefully engineered mesh of waveguides and beam splitters.
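To make the interference idea concrete, here is a minimal sketch of one Mach-Zehnder interferometer, the two-waveguide cell commonly used as a building block in such meshes. It assumes ideal, lossless 50:50 beam splitters and perfect phase shifters, and the function names (`beam_splitter`, `phase_shifter`, `mzi`, `apply`) are illustrative rather than taken from any vendor's toolkit.

```python
import cmath
import math


def beam_splitter():
    """Transfer matrix of an ideal, lossless 50:50 beam splitter (assumed model)."""
    s = 1 / math.sqrt(2)
    return [[s, 1j * s], [1j * s, s]]


def phase_shifter(phi):
    """Phase shift of phi radians applied to the upper waveguide only."""
    return [[cmath.exp(1j * phi), 0], [0, 1]]


def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mzi(theta, phi):
    """Input phase phi, splitter, internal phase theta, second splitter."""
    m = phase_shifter(phi)
    for stage in (beam_splitter(), phase_shifter(theta), beam_splitter()):
        m = matmul2(stage, m)  # later stages multiply from the left
    return m


def apply(m, v):
    """Multiply a 2x2 transfer matrix by a vector of two optical amplitudes."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Send all light into the upper waveguide and sweep the internal phase.
light_in = [1.0, 0.0]
for theta in (0.0, math.pi / 2, math.pi):
    out = apply(mzi(theta, 0.0), light_in)
    powers = [abs(a) ** 2 for a in out]
    print(f"theta={theta:.3f}  output powers: {powers[0]:.3f}, {powers[1]:.3f}")
```

In this idealized model the fraction of input power leaving the upper output is sin²(θ/2), so the phase setting plays the role of a programmable weight, and the total optical power is conserved. A mesh of many such cells, each with its own pair of phases, is what lets the hardware realize larger matrices.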
What has changed recently is the engineering catching up to the physics. Companies like Lightmatter have moved from proof-of-concept chips to systems being actively integrated into datacenter inference pipelines. Their Passage interconnect fabric, which uses photonics for chip-to-chip communication, has been deployed at meaningful scale — reducing the energy and latency penalty of moving data between conventional accelerators. That is a more conservative entry point than full optical neural network processing, but it is a real production deployment, and it demonstrates that photonic hardware is no longer purely a research curiosity.
The more ambitious play is optical matrix units performing the multiply-accumulate operations themselves. Here the physics is genuinely exciting, and the constraints are real enough to be worth understanding. Photonic matrix multipliers are astonishingly fast and energy-efficient for linear operations, but neural networks are not purely linear: they require nonlinearities, and implementing those optically is hard. Current hybrid approaches handle the linear layers in photonics and route activations through electronic nonlinearity stages, which adds latency and complexity at every optical-electrical conversion. But for transformer inference, where the matrix multiplications inside attention and feed-forward layers dominate compute, the linear fraction is enormous. Even a hybrid system that offloads 70% of operations into photonics can yield substantial efficiency gains, although the 30% left in electronics caps the overall improvement at roughly 3.3x, so raising the offloaded fraction matters as much as making the optics faster.
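How much a hybrid system gains can be estimated with an Amdahl's-law style calculation. The sketch below is a back-of-the-envelope model, not a measurement: `photonic_gain` (how much faster or more efficient the optical path is for the linear work) and `overhead_fraction` (extra cost of optical-electrical conversion, as a fraction of the original runtime) are hypothetical parameters chosen for illustration.

```python
def hybrid_speedup(linear_fraction, photonic_gain, overhead_fraction=0.0):
    """Overall speedup when only the linear fraction of the work is accelerated.

    linear_fraction:   share of the original cost that moves to photonics (0..1)
    photonic_gain:     assumed improvement factor on that share (hypothetical)
    overhead_fraction: assumed conversion cost, relative to the original total
    """
    if not 0.0 <= linear_fraction <= 1.0:
        raise ValueError("linear_fraction must be between 0 and 1")
    if photonic_gain <= 0:
        raise ValueError("photonic_gain must be positive")
    remaining = ((1.0 - linear_fraction)          # work left in electronics
                 + linear_fraction / photonic_gain  # accelerated linear work
                 + overhead_fraction)               # conversion penalty
    return 1.0 / remaining


# Illustrative inputs only: 70% offloaded, with various assumed optical gains.
for gain in (10.0, 100.0, float("inf")):
    print(f"gain={gain}: overall speedup {hybrid_speedup(0.7, gain):.2f}x")

# The same offload fraction with a 10% conversion overhead.
print(f"with overhead: {hybrid_speedup(0.7, 100.0, overhead_fraction=0.1):.2f}x")
```

The model treats the cost as a simple sum of the electronic remainder, the accelerated linear share, and the conversion penalty. However large the optical gain, the overall figure is bounded by the work that stays in electronics, which is why the achievable offload fraction and the conversion overhead deserve as much attention as the raw speed of the optics.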
There is also a thermal argument that rarely gets enough attention. Conventional GPU clusters running large inference workloads generate heat on a scale that now drives significant datacenter infrastructure cost. Photonic processors, with no resistive switching losses in the optical compute path, should in principle run much cooler. As inference volume scales, and it is scaling ferociously, that thermal profile starts to matter as much as raw FLOPS per watt.
The precision question is the other honest challenge. Electronic accelerators can be designed for INT8, FP16, or BF16 with high fidelity. Analog photonic systems face noise floors, fabrication variation, and phase drift that make even clean 8-bit arithmetic hard to achieve. Researchers at MIT, Stanford, and several well-funded startups have been attacking this with a mix of error-correction schemes and novel training methods that make networks robust to the noise characteristics of the optical substrate. The results have been improving steadily, and the gap with electronic precision is narrowing.
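A toy simulation helps show why analog noise turns into lost bits. The sketch below assumes a deliberately simple noise model, Gaussian error on every weight plus Gaussian readout noise on every output, which is an illustrative stand-in and not a characterization of any real device. The bits estimate borrows the ADC rule of thumb ENOB = (SNR − 1.76 dB) / 6.02 as a rough yardstick only; all function names are hypothetical.

```python
import math
import random


def analog_matvec(weights, x, weight_sigma, readout_sigma, rng):
    """Matrix-vector product with Gaussian weight error and readout noise."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            # Each weight is perturbed independently (assumed noise model).
            acc += (w + rng.gauss(0.0, weight_sigma)) * xi
        out.append(acc + rng.gauss(0.0, readout_sigma))
    return out


def snr_db(exact, noisy):
    """Signal-to-noise ratio in dB between an exact and a noisy result."""
    signal = sum(e * e for e in exact)
    noise = sum((e - n) ** 2 for e, n in zip(exact, noisy))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def effective_bits(snr):
    """ADC-style rule of thumb, used here only as a rough proxy."""
    return (snr - 1.76) / 6.02


rng = random.Random(0)  # fixed seed so runs are repeatable
W = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(16)]
x = [rng.uniform(-1, 1) for _ in range(64)]
exact = analog_matvec(W, x, 0.0, 0.0, rng)  # zero noise gives the reference

for sigma in (0.001, 0.01, 0.1):
    noisy = analog_matvec(W, x, sigma, sigma, rng)
    snr = snr_db(exact, noisy)
    print(f"sigma={sigma}: SNR {snr:.1f} dB, about {effective_bits(snr):.1f} bits")
```

Under this model each tenfold increase in noise costs roughly 20 dB of signal-to-noise ratio, or a little over three bits. Noise-aware training methods of the kind described above aim to make the network tolerate that loss instead of trying to eliminate it in hardware.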
What makes this moment feel genuinely pivotal is the convergence of two pressures. Transformer models are not shrinking — inference demand is compounding, and the energy cost of running frontier models at scale is becoming a first-order infrastructure problem. At the same time, photonic fabrication is benefiting from the same silicon foundry ecosystem that matured for conventional chips, making manufacturing more reliable and cheaper with each process generation.
The era when photonic AI processors were a physics demonstration in a university lab is ending. What comes next is a hardware landscape where light itself does some of the thinking — and the implications for how much inference we can afford to run, and how fast, are only beginning to become clear.