KV cache techniques for long context inference
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "KV cache techniques for long context inference"

by

LAM Hoi Kei

Abstract:

This project investigates Key-Value (KV) cache compression methods for efficient long-context inference in Large Language Models. We build upon ShadowKV, a state-of-the-art KV cache offloading framework, and propose two algorithmic variants: a low-frequency landmark method that discards the top 50% of head-dimension channels, and a hybrid scoring method that combines max-pooled high-frequency dot products with mean-pooled low-frequency landmarks. Through formal analysis of the Dirichlet-kernel low-pass filtering effect induced by RoPE mean-pooling, we show that low-frequency key components carry the majority of the semantic information used for page selection, while high-frequency channels encode fine-grained positional detail critical for multi-key discrimination. Evaluated on Qwen2.5-7B-Instruct-1M using the RULER and SCBench benchmarks, the hybrid variant achieves a RULER average score of 86.76 and a SCBench average score of 33.45, outperforming standard ShadowKV, with negligible runtime overhead thanks to an optimized CUTLASS-backed CUDA kernel. However, exact string-matching tasks remain fundamentally resistant to all sub-linear memory methods, revealing a limitation of pooling-based KV cache compression.

Date : 28 April 2026 (Tuesday)
Time : 16:00 - 16:40
Venue : Room 2126D (near Lift 19), HKUST
Advisor : Prof. SONG Yangqiu
2nd Reader : Dr. FUNG May
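To make the hybrid scoring idea concrete, the sketch below illustrates one plausible reading of it: cached keys are grouped into pages; the low-frequency half of the head dimension is mean-pooled into per-page landmarks (semantic signal), while per-token dot products over the high-frequency half are max-pooled within each page (positional signal). This is a hypothetical illustration, not the thesis's actual ShadowKV variant; the function name, the even low/high channel split, and the unweighted sum of the two scores are all assumptions.

```python
import numpy as np

def hybrid_page_scores(query, keys, page_size=8, low_frac=0.5):
    """Hypothetical hybrid page-selection score (illustrative only).

    query: (d,) current query vector
    keys:  (n, d) cached key vectors, with n divisible by page_size
    Channels are assumed frequency-ordered: the first low_frac of the
    head dimension is treated as "low-frequency", the rest as "high".
    """
    n, d = keys.shape
    split = int(d * low_frac)
    pages = keys.reshape(n // page_size, page_size, d)

    # Low-frequency landmarks: mean-pool keys within each page, then
    # score against the low-frequency part of the query.
    landmarks = pages[..., :split].mean(axis=1)        # (P, split)
    low_score = landmarks @ query[:split]              # (P,)

    # High-frequency: per-token dot products, max-pooled per page,
    # capturing fine-grained positional discrimination between keys.
    hi_dots = pages[..., split:] @ query[split:]       # (P, page_size)
    high_score = hi_dots.max(axis=1)                   # (P,)

    return low_score + high_score

# Usage: pick the top-4 pages to fetch from the offloaded cache.
rng = np.random.default_rng(0)
query = rng.standard_normal(64)
keys = rng.standard_normal((128, 64))
scores = hybrid_page_scores(query, keys)
top_pages = np.argsort(scores)[-4:]
```

In a real offloading setup, only the selected `top_pages` would be brought back onto the GPU for exact attention; everything else stays compressed or off-device.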