Google Research's TurboQuant compresses the LLM key-value (KV) cache, a major GPU memory bottleneck in long-context inference, by 6x, with reportedly zero accuracy loss and up to an 8x inference speedup. If other AI providers adopt the technique, the economics of running large models could change dramatically.
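
To put the 6x figure in perspective, here is a rough back-of-the-envelope sketch of how much GPU memory a key-value cache occupies and what a 6x reduction would leave. The model dimensions (32 layers, 32 KV heads, head size 128, 16-bit values, a 32,768-token context) are hypothetical and chosen only for illustration; they do not come from the TurboQuant work, and the code does not implement TurboQuant itself, only the memory arithmetic.

```python
GIB = 2**30  # bytes per gibibyte


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Size of an uncompressed KV cache in bytes.

    Each layer stores one key tensor and one value tensor (hence the
    factor of 2), each holding num_kv_heads * head_dim numbers per token.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model configuration (an assumption, not from the source).
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                          seq_len=32_768)

# The 6x compression ratio is the figure quoted in the text.
compressed = baseline / 6

print(f"baseline KV cache:    {baseline / GIB:.2f} GiB")    # 16.00 GiB
print(f"after 6x compression: {compressed / GIB:.2f} GiB")  # 2.67 GiB
```

Under these assumptions, a single 32K-token sequence drops from about 16 GiB of cache to under 3 GiB, which is the kind of headroom that would let one GPU serve longer contexts or more concurrent requests.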