New Study Suggests Large Language Models Are Not as Good at Reasoning as Claimed
A recent study by researchers at the Massachusetts Institute of Technology (MIT) and Boston University has cast doubt on the claim that large language models (LLMs) are particularly good at reasoning. The study found that while LLMs can perform well on standard tasks, they struggle with abductive reasoning tasks, which require drawing the most plausible conclusion from incomplete information.
The study tested models on 11 different abductive reasoning tasks, including drawing a house, a penguin, a cake, and a unicorn based on given descriptions. While GPT-4 performed well on standard tasks, it struggled with the abductive reasoning tasks, often producing incomplete or inaccurate answers. The researchers suggest that this is because the models do not truly understand the information they process, but rather memorize and regurgitate it.
The study’s findings contradict the claims made by some providers of LLMs, who assert that their models are particularly good at reasoning. The researchers suggest that these claims may be overstated, and that LLMs may not be as effective at reasoning as they are purported to be.
The study also compared the performance of LLMs with that of humans on the abductive reasoning tasks. While humans took longer to answer the questions, they were more accurate in their responses. The researchers suggest that this may be because humans have a better understanding of the underlying concepts and are able to make more logical inferences.
This study is not the first to suggest that LLMs may not be as effective at reasoning as claimed. Other studies have also found that LLMs struggle with tasks that require logical inference and creativity. For example, one study found that an LLM was unable to solve a task that required it to imagine a piece of paper being folded in a certain way.
The study’s authors suggest that their findings have implications for the development of artificial general intelligence (AGI). They argue that if AGI is to be achieved, it will require more than just memorization and regurgitation of information. Instead, it will require a true understanding of the underlying concepts and the ability to make logical inferences.
In conclusion, the study suggests that large language models may not be as good at reasoning as claimed. While they can perform well on standard tasks, they struggle with abductive reasoning tasks that require the ability to make logical inferences and draw conclusions based on incomplete information. The findings have implications for the development of AGI and highlight the need for a more nuanced understanding of the capabilities and limitations of LLMs.