Google unveils TurboQuant cutting inference memory sixfold, chip stocks tumble
Google has released TurboQuant, an algorithm that cuts the memory AI models need during inference by at least a factor of six.
The news spooked investors: SK Hynix shares fell 6.4%, Samsung dropped nearly 5%, and Kioxia slid sharply.
TurboQuant compresses KV cache, preserves quality
TurboQuant shrinks the memory a model needs during inference by compressing the key-value (KV) cache without reducing model quality; analysts expect the impact to fall on NAND flash storage rather than on high-bandwidth memory (HBM).
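The article does not describe TurboQuant's actual method, but KV-cache compression is commonly done by quantizing cached key and value tensors to low bit widths. The sketch below is a generic illustration, not Google's algorithm: it applies per-row asymmetric 4-bit quantization (an assumed bit width) to a toy KV block and measures the resulting memory ratio against an fp16 baseline. A basic 4-bit scheme like this gives over 3x savings; reaching the sixfold figure cited in the article would require more aggressive compression than shown here.

```python
import numpy as np

def quantize_kv(block: np.ndarray, bits: int = 4):
    """Per-row asymmetric quantization of a KV-cache block.

    Illustrative sketch of generic KV-cache quantization; the
    function names and the 4-bit choice are assumptions, not
    details of TurboQuant itself.
    """
    lo = block.min(axis=-1, keepdims=True)
    hi = block.max(axis=-1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant rows
    # Integer codes in [0, 2^bits - 1], stored one per byte for simplicity
    q = np.round((block - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    """Reconstruct an approximate fp32 block from codes and row stats."""
    return q * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 64)).astype(np.float16)  # 8 tokens, head_dim 64

q, scale, lo = quantize_kv(kv.astype(np.float32))
recon = dequantize_kv(q, scale, lo)

# fp16 baseline: 16 bits/value; quantized: 4 bits/value
# plus a per-row fp16 scale and offset
orig_bits = kv.size * 16
quant_bits = kv.size * 4 + kv.shape[0] * 2 * 16
print(f"compression: {orig_bits / quant_bits:.1f}x")
print(f"max abs error: {np.abs(recon - kv.astype(np.float32)).max():.3f}")
```

The reconstruction error stays small relative to the data's range, which is why quality can be preserved: attention scores are robust to small perturbations in cached keys and values.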
Analysts are split: some worry about chip demand, while others think this boost in efficiency could actually drive more AI growth.
Plus, since TurboQuant is free to use and requires no retraining, companies can adopt it easily, cutting costs and improving privacy by running models on their own hardware instead of in large data centers.