KAIST researchers teach robots human judgment using short videos

by Park Sae-jin
Posted : June 10, 2026, 10:54 Updated : June 10, 2026, 10:54
From left: Professor Yoo Chang-dong, doctoral student and first author Luu Minh Tung in the center back, and master's student and second author Kim Hwan-hee in the front right. Courtesy of KAIST

SEOUL, June 10 (AJP) - Researchers in South Korea have built an artificial intelligence system that allows robots and self-driving cars to infer human intentions from only a handful of videos. The new method sharply reduces the amount of data needed to train these machines, the Korea Advanced Institute of Science and Technology said Wednesday.

A research team led by Professor Yoo Chang-dong of the electrical engineering department at the Korea Advanced Institute of Science and Technology (KAIST) designed the framework, called Video-based Optimal TransPort Preference (VOTP). The technology lets smart machines learn how to make the right choices without needing a person to manually grade thousands of different actions.

Instead of relying on massive datasets, the method requires only about 10 short video clips showing both good and bad ways to complete a task. The system then uses a mathematical method called optimal transport to measure small differences in movement between clips. From there, it automatically infers what humans would prefer in thousands of other situations and turns those inferred preferences into virtual rewards that steer the machine.
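For readers curious how optimal transport can compare movements, the Python sketch below illustrates the general idea; it is not the KAIST team's implementation, whose details the announcement does not give. It assumes each clip has already been reduced to a sequence of per-frame feature vectors, uses the standard Sinkhorn algorithm as a stand-in solver, and scores a new clip by whether it sits closer to the "good" or the "bad" examples. The function names `sinkhorn_plan`, `ot_distance`, and `preference_score` are hypothetical.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.05, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions.

    `reg` is an illustrative setting, scaled by the largest cost so the
    kernel stays numerically well behaved.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weight on every frame of clip A
    b = np.full(m, 1.0 / m)  # uniform weight on every frame of clip B
    scale = cost.max() + 1e-12
    kernel = np.exp(-cost / (reg * scale))
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ot_distance(traj_a, traj_b, reg=0.05):
    """Transport cost between two clips given as (frames, features) arrays."""
    diff = traj_a[:, None, :] - traj_b[None, :, :]
    cost = (diff ** 2).sum(axis=-1)  # squared Euclidean cost per frame pair
    plan = sinkhorn_plan(cost, reg)
    return float((plan * cost).sum())


def preference_score(query, good_clips, bad_clips):
    """Positive when the query is closer to a good example than to any bad one.

    This nearest-example rule is an assumption made for illustration only.
    """
    d_good = min(ot_distance(query, g) for g in good_clips)
    d_bad = min(ot_distance(query, b) for b in bad_clips)
    return d_bad - d_good


if __name__ == "__main__":
    # Toy 2-D "features": a good clip moves diagonally, a bad one stays flat.
    t = np.linspace(0.0, 1.0, 20)
    good = np.stack([t, t], axis=1)
    bad = np.stack([t, np.zeros_like(t)], axis=1)
    print(preference_score(good + 0.05, [good], [bad]))
    print(preference_score(bad + 0.05, [good], [bad]))
```

In a real system the toy 2-D points would be replaced by embeddings from a video encoder, and a score like this could serve as the kind of virtual reward the article describes.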

This setup mimics how people learn to do new things by simply watching a few examples. Physical AI systems, like factory robots, self-driving cars, and medical equipment, need strict guidelines to choose the safest actions in complicated environments. Until now, creating those guidelines meant having workers spend countless hours evaluating robot behaviors one by one.

Some developers previously tried using text-based language models to speed up this grading process, but words often failed to capture highly specific mechanical movements. The research team noted that their video-based approach works well for robotic arms, humanoid robots, drones, and software that controls computer screens.

The study was picked for an oral presentation at the International Conference on Machine Learning (ICML) 2026, which will take place in Seoul in July 2026. The paper, authored by Tung Minh Luu, Kim Hwan-hee, Lee Young-hwan, and Chang D. Yoo (Yoo Chang-dong), ranked in the top 0.7 percent of the 23,918 submissions the conference received.

"The core of physical AI is making machines understand human intent and select the correct actions," Yoo said. "VOTP can learn human judgment criteria from a small number of videos, and it is a core technology that will advance the era of robots making human-like judgments."