New metric tracks AI model "hallucination" errors
Researchers have created a new way to measure and diagnose "hallucinations" — confidently generated false information — in multimodal reasoning models, which could help improve AI accuracy. The new metric, called RH-AUC, and an accompanying diagnostic benchmark, RH-Bench, assess how a model's accuracy changes as its reasoning chains grow longer. The study found that longer reasoning chains often lead to more hallucinations, because models rely increasingly on language priors rather than on what they perceive, and that larger models tend to strike a better balance between reasoning and perception. These tools give researchers a way to evaluate and improve multimodal large language models on this trade-off.
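The core idea — scoring accuracy as a function of reasoning length — can be sketched as an area-under-curve computation. The sketch below is a hypothetical illustration only: the function name, the trapezoidal rule, and the sample numbers are assumptions, not the paper's exact RH-AUC definition.

```python
def auc_over_lengths(lengths, accuracies):
    """Trapezoidal area under an accuracy-vs-reasoning-length curve,
    normalized by the length range so the score lies in [0, 1].
    Hypothetical sketch; not the paper's exact RH-AUC formula."""
    pairs = sorted(zip(lengths, accuracies))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid segment
    span = pairs[-1][0] - pairs[0][0]
    return area / span if span else pairs[0][1]

# Illustrative data: a model whose accuracy degrades as reasoning grows
lengths = [64, 128, 256, 512]       # reasoning-chain lengths (tokens)
accuracies = [0.82, 0.78, 0.65, 0.50]
score = auc_over_lengths(lengths, accuracies)
```

A model that keeps its accuracy high across reasoning lengths would score closer to 1, capturing the reasoning-perception balance the article describes.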