Researchers at the University of Toronto Institute for Aerospace Studies (UTIAS) have introduced a pair of high-tech tools that could improve the safety and reliability of autonomous vehicles by enhancing the reasoning ability of their robotic systems.
The innovations address multi-object tracking, a process used by robotic systems to track the position and motion of objects – including vehicles, pedestrians and cyclists – to plan the path of self-driving cars in densely populated areas.
Tracking information is collected from computer vision sensors (2D camera images and 3D LIDAR scans) and filtered at each time stamp, 10 times a second, to predict the future movement of objects.
“Once processed, it allows the robot to develop some reasoning about its environment. For example, there is a human crossing the street at the intersection, or a cyclist changing lanes up ahead,” says Sandro Papais, a PhD student in UTIAS in the Faculty of Applied Science & Engineering. “At each time stamp, the robot’s software tries to link the current detections with objects it saw in the past, but it can only go back so far in time.”
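The linking step Papais describes can be illustrated with a minimal sketch. This is not the team's code: it assumes objects are reduced to 2D positions, and the function name `associate` and the `max_dist` threshold are hypothetical choices made for illustration. It greedily pairs each current detection with the nearest object seen in the previous frame.

```python
import math


def associate(tracks, detections, max_dist=2.0):
    """Greedily match current detections to previously seen tracks.

    tracks:      dict of track id -> (x, y) last known position
    detections:  list of (x, y) positions in the current frame
    max_dist:    hypothetical gating threshold, in the same units as positions
    Returns (matches, unmatched): matches maps detection index -> track id,
    unmatched lists detection indices that found no track.
    """
    # Every candidate pairing, closest first.
    pairs = sorted(
        (math.dist(pos, det), det_idx, track_id)
        for track_id, pos in tracks.items()
        for det_idx, det in enumerate(detections)
    )
    matches, used_tracks = {}, set()
    for dist, det_idx, track_id in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if det_idx in matches or track_id in used_tracks:
            continue  # each detection and each track is used at most once
        matches[det_idx] = track_id
        used_tracks.add(track_id)
    unmatched = [i for i in range(len(detections)) if i not in matches]
    return matches, unmatched


tracks = {"pedestrian": (0.0, 0.0), "cyclist": (10.0, 0.0)}
detections = [(10.5, 0.2), (0.3, -0.1), (30.0, 30.0)]
matches, unmatched = associate(tracks, detections)
```

A matcher like this only compares the current frame with the one before it, which is the limitation the quote points to: an object hidden for a few frames has nothing to be linked to.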
In a new paper presented at the 2024 International Conference on Robotics and Automation in Yokohama, Japan, Papais and co-authors Robert Ren, a third-year engineering science student, and Professor Steven Waslander, director of UTIAS’s Toronto Robotics and AI Laboratory, introduce Sliding Window Tracker (SWTrack) – a graph-based optimization method that uses additional temporal information to prevent missed objects.
The tool is designed to improve the performance of tracking methods, particularly when objects are occluded from the robot’s point of view.
“SWTrack widens how far into the past a robot considers when planning,” says Papais. “So instead of being limited by what it just saw one frame ago and what is happening now, it can look over the past five seconds and then try to reason through all the different things it has seen.”
The team trained, tested and validated their algorithm on field data from nuScenes, a public, large-scale autonomous driving dataset collected by vehicles operating on roads in cities around the world. The data includes human annotations that the team used to benchmark the performance of SWTrack.
They found that tracking performance improved each time they extended the temporal window, up to a maximum of five seconds. Beyond five seconds, the added computation time slowed the algorithm down.
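To give a sense of scale, a five-second window at 10 time stamps per second covers 50 frames. The sketch below shows only that bookkeeping, not SWTrack's graph-based optimization. The class `SlidingWindow` and its methods are hypothetical names, and each frame is assumed to be a simple mapping from object id to 2D position.

```python
from collections import deque

FRAME_RATE_HZ = 10   # from the article: data is filtered 10 times a second
WINDOW_SECONDS = 5   # from the article: the window was extended up to five seconds


class SlidingWindow:
    """Keep only the most recent frames that fit in the temporal window."""

    def __init__(self, frame_rate_hz=FRAME_RATE_HZ, seconds=WINDOW_SECONDS):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=frame_rate_hz * seconds)

    def push(self, detections):
        """Store one frame: a dict of object id -> (x, y)."""
        self.frames.append(dict(detections))

    def last_seen(self, object_id):
        """Return (frames_ago, position) for the latest sighting, or None."""
        for frames_ago, frame in enumerate(reversed(self.frames)):
            if object_id in frame:
                return frames_ago, frame[object_id]
        return None


window = SlidingWindow()
window.push({"cyclist": (1.0, 2.0)})
for _ in range(30):          # the cyclist is occluded for three seconds
    window.push({})
after_three_seconds = window.last_seen("cyclist")
for _ in range(20):          # two more seconds pass without a sighting
    window.push({})
after_five_seconds = window.last_seen("cyclist")
```

In this toy version, an object occluded for three seconds is still available to be re-linked, while one that has been out of view for longer than the window is forgotten. A longer window keeps more history but leaves more frames to reason over at every time stamp.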
“Most tracking algorithms would have a tough time reasoning over some of these temporal gaps. But in our case, we were able to validate that we can track over these longer periods of time and maintain more consistent tracking for dynamic objects around us,” says Papais.
Papais says he’s looking forward to building on the idea of improving robot memory and extending it to other areas of robotics infrastructure. “This is just the beginning,” he says. “We’re working on the tracking problem, but also other robot problems, where we can incorporate more temporal information to enhance perception and robotic reasoning.”
Another paper, co-authored by master’s student Chang Won (John) Lee and Waslander, introduces UncertaintyTrack, a collection of extensions for 2D tracking-by-detection methods that leverages probabilistic object detection.
“Probabilistic object detection quantifies the uncertainty estimates of object detection,” explains Lee. “The key thing here is that for safety-critical tasks, you want to be able to know when the predicted detections are likely to cause errors in downstream tasks such as multi-object tracking. These errors can occur because of low-lighting conditions or heavy object occlusion.
“Uncertainty estimates give us an idea of when the model is in doubt, that is, when it is highly likely to give errors in predictions. But there’s this gap because probabilistic object detectors aren’t currently used in multi-object tracking.”
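One generic way a tracker can make use of such uncertainty estimates is to scale the distance between a track and a detection by the detection's variance, so that a detection the model is unsure about is allowed a looser match. The sketch below illustrates that idea with a Mahalanobis distance under a diagonal covariance. It is an assumption-laden example, not the method in the UncertaintyTrack paper: the function names, the per-axis variance input and the `gate` threshold are all hypothetical.

```python
import math


def mahalanobis_2d(track_xy, det_xy, det_var):
    """Distance from track to detection, scaled by per-axis detection variance.

    det_var is (var_x, var_y), i.e. a diagonal covariance (an assumption
    made here to keep the example short).
    """
    dx = det_xy[0] - track_xy[0]
    dy = det_xy[1] - track_xy[1]
    return math.sqrt(dx * dx / det_var[0] + dy * dy / det_var[1])


def is_plausible_match(track_xy, det_xy, det_var, gate=3.0):
    """Accept the pairing if the scaled distance is within the gate."""
    return mahalanobis_2d(track_xy, det_xy, det_var) <= gate


track = (0.0, 0.0)
detection = (6.0, 0.0)

confident = is_plausible_match(track, detection, det_var=(1.0, 1.0))
uncertain = is_plausible_match(track, detection, det_var=(9.0, 9.0))
```

With the same six-unit offset, the confident detection (variance 1) is rejected, while the uncertain one (variance 9, for example in low light) is kept as a candidate because its true position could plausibly be closer to the track.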
Lee worked on the paper as part of his undergraduate thesis in engineering science. Now a master’s student in Waslander’s lab, he is researching visual anomaly detection for the Canadarm3, Canada’s contribution to the U.S.-led Gateway lunar outpost. “In my current research, we are aiming to come up with a deep-learning-based method that detects objects floating in space that pose a potential risk to the robotic arm,” Lee says.
Waslander says the advancements outlined in the two papers build on work that his lab has been focusing on for a number of years.
“[The Toronto Robotics and AI Laboratory] has been working on assessing perception uncertainty and expanding temporal reasoning for robotics for multiple years now, as they are the key roadblocks to deploying robots in the open world more broadly,” Waslander says.
“We desperately need AI methods that can understand the persistence of objects over time, and ones that are aware of their own limitations and will stop and reason when something new or unexpected appears in their path. This is what our research aims to do.”