Google unveils TurboQuant cutting inference memory sixfold, chip stocks tumble
Google has released TurboQuant, an algorithm that cuts the memory AI models need during inference by at least a factor of six.
The news spooked investors: SK Hynix shares fell 6.4%, Samsung dropped nearly 5%, and Kioxia slid sharply.
TurboQuant compresses KV cache, preserves quality
TurboQuant shrinks the memory a model needs during inference by compressing the key-value (KV) cache without reducing model quality; analysts expect the impact to fall on NAND flash storage rather than on high-bandwidth memory (HBM).
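The article does not describe TurboQuant's actual method, but KV-cache compression is commonly done by quantizing cached key and value tensors to low bit widths. The sketch below is a generic illustration, not Google's algorithm: it applies per-row asymmetric 4-bit quantization (an assumed bit width) to a toy KV block and measures the resulting memory ratio against an fp16 baseline. A basic 4-bit scheme like this gives over 3x savings; reaching the sixfold figure cited in the article would require more aggressive compression than shown here.

```python
import numpy as np

def quantize_kv(block: np.ndarray, bits: int = 4):
    """Per-row asymmetric quantization of a KV-cache block.

    Illustrative sketch of generic KV-cache quantization; the
    function names and the 4-bit choice are assumptions, not
    details of TurboQuant itself.
    """
    lo = block.min(axis=-1, keepdims=True)
    hi = block.max(axis=-1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant rows
    # Integer codes in [0, 2^bits - 1], stored one per byte for simplicity
    q = np.round((block - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    """Reconstruct an approximate fp32 block from codes and row stats."""
    return q * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 64)).astype(np.float16)  # 8 tokens, head_dim 64

q, scale, lo = quantize_kv(kv.astype(np.float32))
recon = dequantize_kv(q, scale, lo)

# fp16 baseline: 16 bits/value; quantized: 4 bits/value
# plus a per-row fp16 scale and offset
orig_bits = kv.size * 16
quant_bits = kv.size * 4 + kv.shape[0] * 2 * 16
print(f"compression: {orig_bits / quant_bits:.1f}x")
print(f"max abs error: {np.abs(recon - kv.astype(np.float32)).max():.3f}")
```

The reconstruction error stays small relative to the data's range, which is why quality can be preserved: attention scores are robust to small perturbations in cached keys and values.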
Analysts are split: some worry about chip demand, while others think this boost in efficiency could actually drive more AI growth.
Plus, since TurboQuant is free to use and requires no retraining, companies can adopt it easily, cutting costs and improving privacy by running models on their own hardware instead of in large data centers.