MIT researchers have created a technique that helps large language models (LLMs) use their computing power more intelligently — spending more time on harder questions and less on simpler ones.
Current “inference-time scaling” methods give every question the same fixed computational budget. This wastes resources on easy tasks and limits performance on complex problems. The MIT team’s new instance-adaptive scaling instead adjusts the model’s effort dynamically, based on real-time estimates of difficulty and the likelihood that each partial solution will succeed.
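The core idea of instance-adaptive scaling can be sketched in a few lines. The function and thresholds below are hypothetical illustrations, not the paper's actual algorithm: the loop keeps sampling reasoning attempts only while its calibrated estimate of success stays low, so easy questions stop after one or two samples and hard ones use the full budget.

```python
def solve_adaptively(sample_attempt, score_attempt, max_budget=64, stop_conf=0.9):
    """Hypothetical instance-adaptive loop (illustrative, not the paper's method).

    sample_attempt: callable that draws one reasoning trace from the LLM
    score_attempt:  callable returning a calibrated success estimate in [0, 1]
    Stops early once the best attempt looks confident enough, so compute
    spent scales with estimated difficulty instead of being fixed.
    """
    best, best_score, spent = None, -1.0, 0
    while spent < max_budget:
        attempt = sample_attempt()       # draw one candidate solution
        spent += 1
        score = score_attempt(attempt)   # calibrated probability of success
        if score > best_score:
            best, best_score = attempt, score
        if best_score >= stop_conf:      # confident enough: cut computation here
            break
    return best, spent

# Toy stand-ins for the model: an "easy" question whose attempts score
# high stops after one sample; a "hard" one exhausts the budget.
easy = solve_adaptively(lambda: "short proof", lambda a: 0.95, max_budget=8)
hard = solve_adaptively(lambda: "partial work", lambda a: 0.30, max_budget=8)
```

The early-exit threshold (`stop_conf`) is where calibration matters: if the scorer is overconfident, the loop stops too soon on problems it has not actually solved.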
The method relies on a calibrated process reward model (PRM) that evaluates how promising different reasoning paths are. By correcting PRMs’ tendency to be overconfident, the researchers ensure the model doesn’t cut computation prematurely.
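One standard way to correct an overconfident scorer, shown here purely as an illustration (the paper's calibration procedure may differ), is temperature scaling: divide the score's logit by a temperature T > 1, pulling extreme probabilities back toward 0.5.

```python
import math

def calibrate(p, temperature):
    """Temperature-scale a probability in (0, 1).

    With temperature > 1, overconfident scores (near 0 or 1) are softened
    toward 0.5; temperature == 1 leaves the score unchanged. This is a
    generic recalibration trick, used here only to illustrate the idea.
    """
    logit = math.log(p / (1.0 - p))                  # probability -> log-odds
    return 1.0 / (1.0 + math.exp(-logit / temperature))  # scaled log-odds -> probability
```

For example, an overconfident reward of 0.99 softens to roughly 0.91 at temperature 2, which keeps the adaptive loop sampling a little longer before it trusts a partial solution.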
In tests on challenging mathematical reasoning tasks, the approach used up to half the computation of existing techniques while maintaining similar accuracy. It also allowed smaller models to match or outperform larger ones on tough problems — potentially cutting energy use and improving reliability in high-stakes applications.
“By endowing models with the ability to know what they don’t know, we can enable them to spend more compute on the hardest problems and far fewer tokens on easy ones,” says Navid Azizan, senior author of the study.
The work, by researchers from MIT and the MIT-IBM Watson AI Lab, is being presented at the Conference on Neural Information Processing Systems.