TL;DR: After a brief hiatus, I present you with two really cool mathy but application-driven papers. First, Alibaba shows that you can train a model to solve math problems much as you can train a model to play chess/Go. The idea is to use reinforcement learning (i.e. value-function estimation) to learn the optimal reasoning route that leads a model to the correct answer for a given math problem. Unrelatedly, SnapKV is an interesting new way to speed up text generation by condensing the KV cache of a transformer so that only the ‘important’ tokens are stored. What’s deemed important is determined automatically by an internal voting mechanism; the downside is that this mechanism doesn’t improve a model’s ability to handle long text.
FYI: My ‘popularity emoji’ is based on aggregate statistics of how many people have engaged with a paper on Twitter/X (as well as my own subjective personal interest).
Very popular (you really should know about this): 🔥
Popular (a good amount of people are discussing this): 😄
Less popular (but still worth making a mental note): 🙂
AlphaMath Almost Zero: Process Supervision without Process
Popularity: 😄
Recent advancements in large language models (LLMs) have substantially enhanced their mathematical reasoning abilities. However, these models still struggle with complex problems that require multiple reasoning steps, frequently leading to logical or numerical errors. While numerical mistakes can largely be addressed by integrating a code interpreter, identifying logical errors within intermediate steps is more challenging. Moreover, manually annotating these steps for training is not only expensive but also demands specialized expertise. In this study, we introduce an innovative approach that eliminates the need for manual annotation by leveraging the Monte Carlo Tree Search (MCTS) framework to generate both the process supervision and evaluation signals automatically. Essentially, when an LLM is well-pretrained, only the mathematical questions and their final answers are required to generate our training data, without requiring the solutions. We proceed to train a step-level value model designed to improve the LLM’s inference process in mathematical domains. Our experiments indicate that using automatically generated solutions by LLMs enhanced with MCTS significantly improves the model’s proficiency in dealing with intricate mathematical reasoning tasks. The code for our method will be made available at https://github.com/MARIO-Math-Reasoning/Super_MARIO.
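To make the core trick concrete, here is a toy sketch (entirely illustrative; none of these names come from the AlphaMath codebase) of how Monte Carlo rollouts can turn a (question, final answer) pair into step-level value labels without any human step annotation. States are running totals, a ‘step’ adds one number, and a step’s value estimate is simply the fraction of rollouts passing through it that end at the known final answer:

```python
import random

def estimate_step_values(start, target, step_space, depth,
                         n_rollouts=2000, seed=0):
    """Monte Carlo estimate of step-level values for a toy 'math problem'.

    Hypothetical sketch: only the question (start) and the final answer
    (target) are needed -- no annotated intermediate solutions.
    Returns a dict mapping (depth_index, step) -> estimated value,
    i.e. the success rate of random rollouts that took that step.
    """
    rng = random.Random(seed)
    counts = {}  # (depth_index, step) -> [n_successes, n_visits]
    for _ in range(n_rollouts):
        total, path = start, []
        for d in range(depth):
            step = rng.choice(step_space)  # pick a reasoning step at random
            path.append((d, step))
            total += step
        success = (total == target)  # only the final answer is checked
        for key in path:  # back up the outcome to every step on the path
            s, v = counts.get(key, [0, 0])
            counts[key] = [s + success, v + 1]
    return {key: s / v for key, (s, v) in counts.items()}
```

With `start=0`, `target=2`, `step_space=[1, 2]` and `depth=2`, the estimate for taking step `2` first comes out at exactly zero (no rollout through it can reach 2), while step `1` first scores around 0.5 — exactly the kind of step-level signal a value model can then be trained on, with full MCTS replacing the purely random rollouts.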
SnapKV: LLM Knows What You Are Looking for Before Generation
Popularity: 😄
Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative and fine-tuning-free approach that efficiently minimizes KV cache size while still delivering comparable performance in real-world applications. We discover that each attention head in the model consistently focuses on specific prompt attention features during generation. Meanwhile, this robust pattern can be obtained from an ‘observation’ window located at the end of the prompts. Drawing on this insight, SnapKV automatically compresses KV caches by selecting clustered important KV positions for each attention head. Our approach significantly reduces the growing computational overhead and memory footprint when processing long input sequences. Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to the baseline when processing inputs of 16K tokens. At the same time, it maintains comparable performance to baseline models across 16 long-sequence datasets. Moreover, SnapKV can process up to 380K context tokens on a single A100-80GB GPU using a HuggingFace implementation with minor changes, exhibiting only a negligible accuracy drop in the Needle-in-a-Haystack test. Further comprehensive studies suggest SnapKV’s potential for practical applications. Our code is available at https://github.com/FasterDecoding/SnapKV.
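The voting mechanism is easy to sketch in NumPy. The following is a minimal, illustrative reconstruction (shapes and the pooling width are my assumptions, not the official implementation): each head aggregates the attention that the last `window` prompt tokens (the ‘observation’ window) pay to the earlier prefix, light pooling clusters neighbouring important positions, and only the top-k prefix positions plus the window itself survive in the compressed cache:

```python
import numpy as np

def snapkv_compress(keys, values, attn, window, k):
    """SnapKV-style KV-cache compression (illustrative sketch).

    Hypothetical shapes:
      keys, values: (heads, seq, dim)    per-head KV cache for the prompt
      attn:         (heads, window, seq) attention weights of the last
                    `window` prompt queries (the 'observation' window)
      k:            number of prefix positions to keep per head
    Returns compressed keys/values of shape (heads, k + window, dim).
    """
    heads, seq, dim = keys.shape
    prefix = seq - window
    # Vote: sum each head's attention over the observation-window queries.
    votes = attn[:, :, :prefix].sum(axis=1)                   # (heads, prefix)
    # Light 1-D pooling clusters neighbouring important positions.
    pooled = np.stack([np.convolve(v, np.ones(5) / 5, mode="same")
                       for v in votes])
    # Keep the top-k prefix positions per head, restored to original order.
    idx = np.sort(np.argsort(pooled, axis=1)[:, -k:], axis=1)  # (heads, k)
    keep_k, keep_v = [], []
    for h in range(heads):
        # Selected prefix positions plus the observation window itself.
        sel = np.concatenate([idx[h], np.arange(prefix, seq)])
        keep_k.append(keys[h, sel])
        keep_v.append(values[h, sel])
    return np.stack(keep_k), np.stack(keep_v)
```

Note that each head gets its own selection: the cache is no longer rectangular across positions shared by all heads, which is why the paper stresses that the pattern is per-head. For a 16K-token prompt with `k + window` around 2K, the cache shrinks by roughly 8x, in line with the reported memory figures.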