Automated vehicles rely on a range of sensors to detect and track other vehicles and road users over time so that they can plan and execute safe trajectories. Sensor characteristics are already diverse and will probably become more so in the future. In this work we investigate how a Multiple Object Tracking (MOT) method can work with information coming from a large set of multimodal sensors, object detectors, and connectivity messages, and how to ensure that it is robust to the noise that current object detectors exhibit when deployed in real-world applications. We present a sensor-agnostic multimodal fusion framework for MOT that can seamlessly integrate information coming from different object detectors, sensors, and vehicle-to-everything (V2X) messages, sent either by other vehicles or by the infrastructure. All the information received is converted to a standardized set of detections, including position, classification, bounding box size, velocity, and covariance estimates. These detections are then combined using a Kalman Filter with a constant velocity model. We also propose methods to handle errors in classification and incorrect bounding box reconstruction, two problems that are often ignored in the academic literature although they are very relevant in practice.
To evaluate our framework, we use three diverse and challenging scenarios. First, we present results for a perception system based on camera and radar that was integrated into a prototype traffic jam chauffeur function. We perform tests on proving grounds and report the performance of each sensor and of our MOT solution compared to reference differential GNSS data. Second, we show qualitative results for a traffic monitoring application on highways, with multiple cameras, lidars, and one radar. Finally, we show how our framework can integrate V2X messages to improve the safety of vulnerable road users (VRUs) such as pedestrians, either when the VRU communicates its own position or when the vehicle receives the position of the VRU from a road-side unit. The resulting perception system is tested with an autonomous emergency braking function on proving grounds.
Our contributions are a sensor-agnostic framework for multiple object tracking, a method to handle errors in classification and incorrect bounding box reconstructions from object detectors, and a method to combine V2X messages with the onboard perception of the vehicle. To the best of our knowledge, this is the first time that a system combining V2X messages with onboard perception has been tested for safety-critical applications with vulnerable road users. Compared to our previous paper sent to an academic conference, we provide more details on the applications and include the integration of V2X messages and the related experiments. In conclusion, we present a flexible framework for multiple object tracking that can work with a large set of sensors and object detectors, as well as with connectivity information (V2X). We provide results in three practical and challenging use cases: a traffic jam chauffeur, traffic monitoring on highways, and V2X integration into an autonomous emergency braking function to increase the safety of vulnerable road users.
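The fusion step described in the abstract, standardized detections combined by a Kalman Filter with a constant velocity model, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` fields, the per-axis filter layout, the helper names (`predict`, `update`, `step`), the white-noise-acceleration process noise `q`, and all numeric values are assumptions chosen for illustration. Data association, classification handling, and bounding box correction are omitted.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical standardized detection; field names are assumptions."""
    x: float        # position in a common frame, metres
    y: float
    pos_var: float  # position variance per axis, m^2
    label: str      # class reported by the detector
    source: str     # e.g. "camera", "radar", "v2x"


@dataclass
class AxisState:
    """Position and velocity along one axis, with a 2x2 covariance."""
    p: float
    v: float
    P: list  # [[var_p, cov_pv], [cov_vp, var_v]]


def predict(s, dt, q):
    """Constant-velocity prediction: x' = F x, P' = F P F^T + Q."""
    (a, b), (c, d) = s.P
    return AxisState(
        p=s.p + dt * s.v,
        v=s.v,
        P=[
            [a + dt * (b + c) + dt * dt * d + q * dt**3 / 3,
             b + dt * d + q * dt**2 / 2],
            [c + dt * d + q * dt**2 / 2, d + q * dt],
        ],
    )


def update(s, z, r):
    """Kalman update with a position-only measurement z of variance r."""
    (a, b), (c, d) = s.P
    innovation = z - s.p
    S = a + r                 # innovation variance
    k_p, k_v = a / S, c / S   # Kalman gain for position and velocity
    return AxisState(
        p=s.p + k_p * innovation,
        v=s.v + k_v * innovation,
        P=[[(1 - k_p) * a, (1 - k_p) * b], [c - k_v * a, d - k_v * b]],
    )


def step(track, det, dt, q=0.1):
    """Advance an (x, y) track by dt and fuse one detection into it."""
    x_axis, y_axis = track
    return (
        update(predict(x_axis, dt, q), det.x, det.pos_var),
        update(predict(y_axis, dt, q), det.y, det.pos_var),
    )


# Usage: a target moving at 2 m/s along x, observed alternately by two
# sources that report different (assumed) position variances.
track = (AxisState(0.0, 0.0, [[1.0, 0.0], [0.0, 100.0]]),
         AxisState(0.0, 0.0, [[1.0, 0.0], [0.0, 100.0]]))
for k in range(1, 101):
    t = 0.1 * k
    source, var = ("radar", 0.04) if k % 2 else ("camera", 0.25)
    track = step(track, Detection(2.0 * t, 0.0, var, "car", source), dt=0.1)
```

Because every source is reduced to the same `Detection` record with its own variance, the filter itself does not need to know which sensor or message produced a measurement; a less certain source simply receives a smaller gain.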
Mr. Marc Perez, R&D Engineer, Applus+ IDIADA, Institut de Robòtica i Informàtica Industrial CSIC-UPC