
MIT Study Flags Hidden Weakness That Makes AI Models Less Reliable

Researchers report that instead of using domain knowledge to answer questions, LLMs can fall back on grammatical patterns learned during training

Image credit: Markus Winkler/Pexels

Large language models (LLMs) can sometimes learn the wrong lessons — a flaw that may make them less reliable in real-world applications, a new study from the Massachusetts Institute of Technology has found.

Researchers report that instead of using domain knowledge to answer questions, LLMs can fall back on grammatical patterns learned during training. These patterns, or “syntactic templates,” can become incorrectly linked to certain topics. As a result, a model may generate confident-sounding answers simply because it recognizes a familiar sentence structure, not because it actually understands the question.

The team found that even state-of-the-art LLMs exhibit this behavior, which can lead to unexpected failures across tasks such as handling customer queries, summarizing clinical notes, or preparing financial reports. The flaw also carries safety implications: attackers could potentially exploit these syntactic associations to bypass built-in safeguards and trigger harmful responses.


To study the problem, researchers created controlled experiments in which each domain used only one syntactic pattern during training. When tested with sentences that preserved the syntax but replaced meaningful words with synonyms, antonyms, or even random terms, the models still often returned the “correct” domain-specific answer — even when the question itself was gibberish.
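To picture the kind of probe this implies, consider the rough sketch below (an illustration only, not the study's actual code): one sentence template stays fixed for a domain while its content words are swapped for synonyms, near-synonyms, or random terms. Here query_model is a hypothetical stand-in for the LLM being tested, and the template and word lists are invented for the example.

    import random

    # Illustrative sketch only, not the study's code. "query_model" is a
    # hypothetical stand-in for whatever LLM is being probed.
    TEMPLATE = "Where can a {noun} be {verb} safely?"  # one fixed syntactic pattern tied to, say, a medical domain

    SWAPS = {
        "noun": ["patient", "visitor", "toaster", "xylophone"],   # synonym, near-synonym, random terms
        "verb": ["treated", "ignored", "painted", "launched"],
    }

    def make_probes(n=5):
        """Keep the sentence structure, scramble the meaning."""
        return [
            TEMPLATE.format(noun=random.choice(SWAPS["noun"]),
                            verb=random.choice(SWAPS["verb"]))
            for _ in range(n)
        ]

    for prompt in make_probes():
        # answer = query_model(prompt)   # hypothetical call to the model under test
        # A confident, domain-specific answer to a nonsensical prompt suggests the
        # model is keying on the familiar syntax rather than the question's meaning.
        print(prompt)

If a model answers these scrambled prompts as confidently as it answers well-formed ones, that gap is the signature of the syntactic shortcut the researchers describe.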

“This is a byproduct of how we train models, but models are now used in practice in safety-critical domains far beyond the tasks that created these syntactic failure modes,” said Marzyeh Ghassemi, associate professor at MIT’s Department of Electrical Engineering and Computer Science and the senior author of the study. “If you’re not familiar with model training as an end-user, this is likely to be unexpected.”

The researchers have also developed a new benchmarking procedure to measure the degree to which a model relies on such faulty correlations. They say this could help developers detect and mitigate the issue before deploying models in high-stakes settings.


The study was co-authored by Chantal Shaib of Northeastern University, Vinith Suriyakumar of MIT, Meta researcher Levent Sagun, and Northeastern professor Byron Wallace. The findings will be presented at the Conference on Neural Information Processing Systems (NeurIPS).
