SEOUL, April 14 (AJP) - A joint research team has developed an automated diagnostic system that applies classic database theory to reduce hallucinations in large language models (LLMs), including time-based errors, and to keep AI answers up to date, the Korea Advanced Institute of Science and Technology (KAIST) said Tuesday.
The research addresses a common frustration for users who find that chatbots often provide outdated facts. For example, when asked about a recently appointed government official, an AI might confidently name someone who left the position a year ago. This occurs because the models struggle to track how information changes over time, a problem known as temporal hallucination.
KAIST Professor Hwang Eui-jong and his team worked with Microsoft Research (MSR) to apply temporal database design to AI evaluation. Drawing on theory refined over the past 40 years, the new system automatically generates 13 types of complex questions based on the chronological flow of data.
Previously, humans had to manually write and update test questions to check whether an AI was staying current. The new automated framework eliminates this labor-intensive process: when real-world information changes, the system simply updates its internal database, refreshing the evaluation criteria and the expected answers used to check the AI.
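The core idea can be illustrated with a minimal sketch. Temporal database design attaches a "valid time" interval to each fact, which makes it mechanical to generate time-anchored questions with correct answers. All names and data below are hypothetical; this is not the team's actual implementation.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical valid-time record: each fact carries the real-world
# interval during which it was true (temporal-database "valid time").
@dataclass
class ValidTimeFact:
    subject: str      # e.g. a government post
    value: str        # e.g. who held it
    valid_from: date  # inclusive lower bound
    valid_to: date    # exclusive upper bound; date.max = still current

# Updating real-world information means editing this table; the
# generated questions and gold answers follow automatically.
facts = [
    ValidTimeFact("Minister of X", "Kim A", date(2021, 5, 1), date(2024, 3, 1)),
    ValidTimeFact("Minister of X", "Lee B", date(2024, 3, 1), date.max),
]

def answer_as_of(subject: str, when: date) -> str:
    """Ground-truth lookup: which value was valid on a given date?"""
    for f in facts:
        if f.subject == subject and f.valid_from <= when < f.valid_to:
            return f.value
    return "unknown"

def make_question(subject: str, when: date) -> tuple[str, str]:
    """Auto-generate one time-anchored question/answer pair from the table."""
    question = f"Who was the {subject} on {when.isoformat()}?"
    return question, answer_as_of(subject, when)

q, gold = make_question("Minister of X", date(2023, 1, 15))
print(q, "->", gold)  # -> Kim A
```

An AI that names the current officeholder for a past date (or vice versa) is then caught automatically, with no human question-writing in the loop.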
According to the study, the system reduced the cost of building evaluation data by 51 percent and improved detection of time-related errors by an average of 21.7 percent. Rather than only checking whether the final answer is correct, it verifies that the dates and timelines the AI cites in its reasoning are accurate.
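The kind of check described above, validating the dates an answer relies on rather than just the final string, can be sketched as follows. The valid interval and helper names are hypothetical, not the team's actual method.

```python
import re
from datetime import date

# Hypothetical valid-time interval for the gold fact,
# taken from a temporal database (inclusive-exclusive).
VALID_FROM, VALID_TO = date(2024, 3, 1), date(2026, 1, 1)

def dates_in(text: str) -> list[date]:
    """Pull ISO-format dates (YYYY-MM-DD) out of a model's explanation."""
    return [date.fromisoformat(m) for m in re.findall(r"\d{4}-\d{2}-\d{2}", text)]

def timeline_consistent(explanation: str) -> bool:
    """True only if every date the model cites falls inside the valid interval."""
    found = dates_in(explanation)
    return bool(found) and all(VALID_FROM <= d < VALID_TO for d in found)

print(timeline_consistent("Lee B took office on 2024-03-01."))          # True
print(timeline_consistent("Kim A has led the post since 2021-05-01."))  # False
```

A final answer can name the right person yet justify it with stale dates; this check flags that case as a temporal error even though a string-match evaluation would pass it.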
The researchers expect the technology to be particularly useful in high-stakes fields such as law and medicine. In these sectors, even a minor misunderstanding of a specific timeframe can lead to serious errors in advice or documentation.
"This research shows how classic database theories can solve modern reliability issues in artificial intelligence," Professor Hwang Eui-jong said. He added that turning professional data into evaluation resources will provide a practical foundation for verifying AI performance in specialized industries.
The study, which featured PhD student Kim So-yeon as the lead author, will be presented at the International Conference on Learning Representations (ICLR) 2026. MSR researchers Jindong Wang and Xing Xie also participated as co-authors.
(Reference Information)
Journal/Source: International Conference on Learning Representations (ICLR) 2026
Title: Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in Large Language Models
Link/DOI: https://arxiv.org/abs/2508.02045
Copyright ⓒ Aju Press All rights reserved.