Study finds large language models struggle with real-world accuracy and adaptability
A new study from researchers at MIT, Harvard, and Cornell finds that large language models (LLMs) like GPT-4 can produce impressively accurate outputs without forming a coherent model of the world those outputs describe. To probe this gap, the researchers evaluated models on turn-by-turn driving directions in New York City. The models gave near-perfect directions under normal conditions, but when the researchers introduced unexpected changes, such as street closures and detours, accuracy dropped sharply, suggesting the models had never internalized an accurate map of the city.

The finding raises concerns about the reliability of LLMs in real-world applications, like autonomous vehicles, where conditions change unpredictably. The authors argue that new approaches are needed to build coherent world models into LLMs, and that further research is necessary before such systems can be trusted in dynamic environments.
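To make the evaluation idea concrete, here is a minimal sketch, not the study's actual code: it assumes a toy street graph and a hypothetical `model_route` stand-in for querying a model, and checks whether the model's route remains valid after a street is closed.

```python
# Illustrative sketch only -- not the authors' evaluation pipeline.
# Idea: a model with a coherent "map" should still produce valid routes
# after a street closes; a model that memorized routes will not.

# Toy street graph: intersection -> set of directly reachable neighbors.
STREETS = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}

def route_is_valid(route, graph):
    """A route is valid if every consecutive step follows an open street."""
    return all(b in graph.get(a, set()) for a, b in zip(route, route[1:]))

def close_street(graph, a, b):
    """Return a copy of the graph with the street between a and b closed."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    g[a].discard(b)
    g[b].discard(a)
    return g

def model_route(start, goal):
    """Hypothetical stand-in for asking an LLM for directions.
    Returns a memorized 'usual' route regardless of road conditions."""
    return ["A", "B", "D"]

baseline = route_is_valid(model_route("A", "D"), STREETS)
detoured = route_is_valid(model_route("A", "D"), close_street(STREETS, "B", "D"))
print(f"valid before closure: {baseline}, after closure: {detoured}")
# -> valid before closure: True, after closure: False
```

In this toy setup, a drop from "valid" to "invalid" after a closure plays the role of the accuracy drop the study reports: high performance on familiar routes can mask the absence of an underlying map.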