🎯 Enhancing LLM Efficiency with Confidence-Based Sampling
published on February 25

This paper looks at how to make Large Language Models (LLMs) more efficient at test time by using the model's own confidence to guide response sampling. Traditional approaches such as Best-of-N sampling and Self-Consistency draw a fixed number of responses per query, which wastes compute on easy questions and under-explores hard ones. The authors propose Self-Calibration, a technique that helps the model produce more reliable confidence estimates by distilling confidence information from previously sampled responses back into the model. Building on these calibrated confidences, strategies such as Early-Stopping let the model stop sampling once it is sufficiently confident, improving accuracy while cutting unnecessary computation, particularly on challenging tasks like MathQA.
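To make the Early-Stopping idea concrete, here is a minimal sketch of a confidence-based sampling loop. This is not the paper's implementation: `generate`, `confidence`, `threshold`, and `max_samples` are placeholder names standing in for an LLM sampler, a self-calibrated confidence estimator, and whatever stopping budget you choose.

```python
import random
from typing import Callable, List, Tuple

def early_stopping_sample(
    prompt: str,
    generate: Callable[[str], str],           # samples one response from the LLM (placeholder)
    confidence: Callable[[str, str], float],  # calibrated confidence in [0, 1] (placeholder)
    threshold: float = 0.8,
    max_samples: int = 16,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Draw responses one at a time; stop as soon as a response's confidence
    clears the threshold, otherwise return the most confident response seen."""
    history: List[Tuple[str, float]] = []
    for _ in range(max_samples):
        response = generate(prompt)
        score = confidence(prompt, response)
        history.append((response, score))
        if score >= threshold:
            return response, history  # confident enough: stop sampling early
    # Budget exhausted: fall back to the highest-confidence response sampled.
    best_response, _ = max(history, key=lambda pair: pair[1])
    return best_response, history

# Toy usage with stand-in functions (a real setup would call an actual model).
if __name__ == "__main__":
    dummy_generate = lambda prompt: random.choice(["42", "41", "42", "43"])
    dummy_confidence = lambda prompt, response: 0.9 if response == "42" else 0.3
    answer, samples = early_stopping_sample("What is 6 * 7?", dummy_generate, dummy_confidence)
    print(f"answer={answer}, samples_drawn={len(samples)}")
```

The appeal of this pattern is that the sampling budget adapts per query: easy questions terminate after one or two confident draws, while harder ones consume more of the budget, which is where the reported compute savings come from.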