VideoML - SANJAY RANKA

NSF 1922782: SCC: Video-based Machine Learning for Smart Traffic Analysis and Management

(05/01/2019 - 04/30/2024)

We have developed algorithms and software to fuse real-time feeds from video cameras, LiDARs, and traffic sensors (primarily loop detectors), together with machine learning algorithms that generate real-time detections, classifications, and trajectories of individual vehicles and pedestrians [1, 2, 3]. Space-time trajectories are used to derive conflict points between vehicle and pedestrian trajectories, identifying potential collisions in the form of near-misses and severe events [1, 4, 5, 6]. LiDAR enables tracking when video is not effective (e.g., at night). For an input video and LiDAR stream, the system first detects and tracks all road users and then maps them to a common rectangular coordinate system based on Google Maps. The derived trajectories are clustered by movement pattern. Finally, the conflict and severe event detection and analysis module generates severe event statistics, heatmaps, and video playback clips [3]. The key components are described below.

Video-based Detection and Tracking. The video-based object detection and tracking module uses YOLO (continually updated to newer versions) to detect different kinds of road users, including vehicles, pedestrians, cyclists, and motorcyclists. A modified version of the DeepSORT algorithm associates detections across frames and assigns a unique ID to each object. The modification is necessary because of the large distortion in fisheye videos: trajectories in fisheye videos have unusual shapes and speeds, which do not work well with the Kalman filter used in DeepSORT. Therefore, we first map detections to Google Maps rectangular coordinates instead of the original fisheye coordinates and then compute distances. This mapping uses thin-plate splines to unwarp locations obtained from video to their counterparts in Google Maps [7]. A visual analytics framework was designed to help developers validate algorithms for trajectory generation, object classification, preprocessing, and trajectory clustering [8, 9]. We have also devised a high-precision, video-based signature re-identification (ReID) method that plays a pivotal role in multi-camera vehicle tracking and travel time estimation [10]. A book resulting from this project, Video Based Machine Learning for Traffic Intersections, is available from CRC Press [6].
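The fisheye-to-map unwarping can be illustrated with a small, dependency-free thin-plate spline fit. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes a few hand-picked control points (pixel locations in the fisheye frame paired with their known Google Maps coordinates), and the names `ThinPlateSpline2D` and `_solve` are hypothetical. The spline passes exactly through the control points and reproduces any affine map exactly; in practice a library routine such as SciPy's `RBFInterpolator` with `kernel="thin_plate_spline"` would replace the hand-written solver.

```python
import math


def _tps_kernel(r):
    """Thin-plate spline radial basis U(r) = r^2 log r, with U(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (b has several columns)."""
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system: need >= 3 distinct, non-collinear control points")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + m):
                aug[r][c] -= f * aug[col][c]
    x = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):  # back substitution
        for k in range(m):
            s = aug[i][n + k] - sum(aug[i][j] * x[j][k] for j in range(i + 1, n))
            x[i][k] = s / aug[i][i]
    return x


class ThinPlateSpline2D:
    """Smooth 2D -> 2D warp that maps each source control point exactly onto its target."""

    def __init__(self, src, dst):
        self.src = [tuple(map(float, p)) for p in src]
        n = len(self.src)
        # Linear system [[K, P], [P^T, 0]] [w; a] = [dst; 0], solved for both output axes.
        a = [[0.0] * (n + 3) for _ in range(n + 3)]
        for i, (xi, yi) in enumerate(self.src):
            for j, (xj, yj) in enumerate(self.src):
                a[i][j] = _tps_kernel(math.hypot(xi - xj, yi - yj))
            for k, v in enumerate((1.0, xi, yi)):
                a[i][n + k] = a[n + k][i] = v
        b = [list(map(float, q)) for q in dst] + [[0.0, 0.0] for _ in range(3)]
        sol = _solve(a, b)
        self.w, self.aff = sol[:n], sol[n:]  # radial weights; affine rows for (1, x, y)

    def __call__(self, x, y):
        out = []
        for k in range(2):  # k = 0 -> mapped x, k = 1 -> mapped y
            v = self.aff[0][k] + self.aff[1][k] * x + self.aff[2][k] * y
            for (xi, yi), wi in zip(self.src, self.w):
                v += wi[k] * _tps_kernel(math.hypot(x - xi, y - yi))
            out.append(v)
        return tuple(out)
```

Once fitted, the same spline would be applied to a reference point of every detection (for example, the bottom-center of its bounding box, an assumption here) before tracking, so that distances are computed in approximately metric map coordinates rather than in distorted pixel space.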

Further, we introduced a novel unsupervised shape correspondence learning method that adapts the classical locally linear embedding algorithm to find dense correspondences between shapes, demonstrating significant improvements over existing approaches on standard datasets for both human and nonhuman shapes [11]. Our video processing method uses intersection videos to track vehicle trajectories across multiple cameras and estimate corridor travel time; it was validated through experiments at real intersections in Florida [10]. For efficient training, we developed EfficientMORL, a framework for unsupervised multi-object representation learning that efficiently balances symmetry and disentanglement biases. It achieves strong object decomposition and disentanglement on standard benchmarks with nearly an order of magnitude faster training and inference than previous state-of-the-art methods [12]. We also proposed a method for commodity classification from surveillance videos that leverages text information from truck logos. Evaluated on a new benchmark dataset with promising performance, it facilitates automated commodity classification for trucks on highways through video collection and processing [13].

LiDAR-based Detection and Tracking. We have developed a LiDAR analysis software module consisting of state-of-the-art algorithms to detect and track objects in point clouds, specifically CenterPoint and SimpleTrack. CenterPoint represents objects as key points and regresses their other attributes, such as 3D location, size, and heading orientation; it consists of a standard 3D backbone network, a center heatmap head, and regression heads. SimpleTrack is a high-performing multi-object tracking approach that unifies 3D MOT methods into a general framework. For object motion modeling, we use standard Kalman filters, which are effective because high-precision measurements of objects are available and, for a fixed infrastructure sensor, there is no ego-motion. We associate the predicted locations with detections in the next frame using efficient weighted bipartite matching. The detector was trained on high-quality annotations produced with FLORIDA (Florida LiDAR-based Object Recognition and Intelligent Data Annotation), an efficient annotation tool [14], and can detect traffic participants with high precision. LiDAR tracks are also mapped to Google Maps coordinates.
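The motion model and association step described above can be sketched as follows. This is a simplified illustration, not SimpleTrack's code: each axis of a 2D track gets an independent constant-velocity Kalman filter, and association is an exact minimum-cost assignment found by exhaustive search, which is only practical for a handful of objects (a production tracker would use a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment`). The class names, the 0.1 s frame interval (10 Hz), the noise parameters `q` and `r`, and the gating distance are hypothetical choices.

```python
import itertools
import math


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis; state = [position, velocity]."""

    def __init__(self, pos, dt=0.1, q=1.0, r=0.05):
        self.x = [pos, 0.0]                 # initial velocity unknown, start at zero
        self.P = [[1.0, 0.0], [0.0, 10.0]]  # large velocity variance reflects that
        self.dt, self.q, self.r = dt, q, r  # frame interval, process noise, measurement noise

    def predict(self):
        dt, q = self.dt, self.q
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white-acceleration noise Q.
        self.P = [[a + dt * (b + c) + dt * dt * d + q * dt ** 3 / 3, b + dt * d + q * dt ** 2 / 2],
                  [c + dt * d + q * dt ** 2 / 2, d + q * dt]]
        return self.x[0]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance for H = [1, 0]
        k0, k1 = a / s, c / s       # Kalman gain
        innov = z - self.x[0]
        self.x = [self.x[0] + k0 * innov, self.x[1] + k1 * innov]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


class Track:
    """A 2D track: independent constant-velocity filters for x and y."""

    def __init__(self, xy, dt=0.1):
        self.fx, self.fy = AxisKalman(xy[0], dt), AxisKalman(xy[1], dt)

    def predict(self):
        return (self.fx.predict(), self.fy.predict())

    def update(self, xy):
        self.fx.update(xy[0])
        self.fy.update(xy[1])

    @property
    def velocity(self):
        return (self.fx.x[1], self.fy.x[1])


def associate(predicted, detections, gate=3.0):
    """Minimum-total-distance one-to-one matching of predictions to detections.

    Exhaustive search over assignments: exact, but only feasible for a few objects.
    Returns (track_index, detection_index) pairs whose distance is within `gate`.
    """
    swap = len(predicted) > len(detections)
    small, large = (detections, predicted) if swap else (predicted, detections)
    best, best_cost = [], math.inf
    for perm in itertools.permutations(range(len(large)), len(small)):
        cost = sum(math.dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    pairs = [(j, i) if swap else (i, j) for i, j in best]
    return [(i, j) for i, j in pairs if math.dist(predicted[i], detections[j]) <= gate]
```

Each frame, one would call `predict()` on every track, run `associate` on the predicted positions and the new detections, and `update()` the matched tracks; unmatched detections would start new tracks.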

We developed a novel framework that establishes spatiotemporal correspondences between point clouds, enabling unsupervised segmentation of outdoor LiDAR sequences, particularly in traffic intersection scenarios, with performance competitive with supervised methods [15]. We introduced the novel problem of sequential scene flow estimation (SSFE), which aims to predict 3D scene flow for all pairs of point clouds in a sequence, and proposed the SPCM-Net architecture, which leverages multi-scale spatiotemporal correlations to improve SSFE significantly over previous approaches. SPCM-Net is also effective for sequential point cloud forecasting (SPF), and we introduced a new benchmark dataset for evaluating both SSFE and SPF [16]. We also introduced a self-supervised approach for scene flow estimation between consecutive point clouds that represents them as continuous probability density functions and uses the Cauchy-Schwarz divergence to recover motion. This yields more robust and accurate scene flow than existing methods, and comprehensive experiments show state-of-the-art performance on various datasets [17].
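The Cauchy-Schwarz divergence mentioned above has a closed form when each point cloud is modeled as an equal-weight mixture of isotropic Gaussians with a shared bandwidth sigma: D_CS(p, q) = -log ∫pq + (1/2) log ∫p² + (1/2) log ∫q², where the overlap of two Gaussians is again a Gaussian in the distance between their means and the normalizing constants cancel. The sketch below, with hypothetical function names and a hypothetical `sigma`, computes that divergence and uses it to recover a single rigid translation by grid search. The method in [17] estimates dense, per-point scene flow, so this only illustrates the objective, not the full approach.

```python
import math


def _overlap(a, b, sigma):
    """Mean Gaussian overlap between two point sets, i.e. the integral of p_a * p_b up to a
    constant factor that is the same for every pair of clouds and cancels in the divergence."""
    s = 4.0 * sigma * sigma  # the product of two N(., sigma^2 I) kernels has variance 2 sigma^2
    total = sum(math.exp(-math.dist(p, q) ** 2 / s) for p in a for q in b)
    return max(total / (len(a) * len(b)), 1e-300)  # floor avoids log(0) for far-apart clouds


def cs_divergence(a, b, sigma=0.5):
    """Cauchy-Schwarz divergence between clouds a and b modeled as equal-weight isotropic
    Gaussian mixtures with bandwidth sigma; zero exactly when the mixtures coincide."""
    return (-math.log(_overlap(a, b, sigma))
            + 0.5 * math.log(_overlap(a, a, sigma))
            + 0.5 * math.log(_overlap(b, b, sigma)))


def best_translation(src, dst, candidates, sigma=0.5):
    """Candidate rigid translation that best aligns src to dst under the CS divergence."""
    def shifted(t):
        return [tuple(pk + tk for pk, tk in zip(p, t)) for p in src]
    return min(candidates, key=lambda t: cs_divergence(shifted(t), dst, sigma))
```

Because the divergence is non-negative and vanishes only when the two mixtures coincide, the candidate that aligns the clouds is selected; a learned model would instead minimize the same kind of objective over per-point flow vectors.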

Ground Sensor Data Processing. We have access to high-frequency (10 Hz) loop detector data through ATSPM (Automated Traffic Signal Performance Measures) [18] at more than 300 signalized intersections across Florida. We have developed ML algorithms that use advance-bar and stop-bar ground sensor data, which provide lane-by-lane traffic flow information at 10 Hz, to predict the inflow and outflow of traffic at an intersection given the signal phase and other metadata, so that the model acts as a digital twin of the intersection [19]. These predictions can drive signal policy recommendations, and we are currently conducting before/after studies to determine whether new policies benefit traffic and pedestrians. Our publications range from basic reverse engineering of intersection geography to attention-based machine learning algorithms for modeling traffic dynamics along a corridor. Other relevant publications address the construction of data analytics systems [18, 20] and traffic interruption detection [21, 22]. A book entitled “Data Analytics and Machine Learning for Integrated Corridor Management” is under contract with CRC Press and will be published in Summer 2024 [23].
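As an illustration of the kind of preprocessing such models need, the sketch below turns one lane's raw 10 Hz detector state (1 = occupied, 0 = free) into vehicle counts and occupancy fractions per time bin, counting one vehicle per off-to-on transition. This is an assumption about the data format, not ATSPM's actual event schema, and the function names and the 5-second default bin are hypothetical.

```python
import math

SAMPLE_HZ = 10  # detector sampling rate stated in the text


def actuations(occupancy):
    """Sample indices where a vehicle arrives (0 -> 1 transition; a leading 1 counts too)."""
    return [i for i, s in enumerate(occupancy)
            if s == 1 and (i == 0 or occupancy[i - 1] == 0)]


def counts_per_bin(occupancy, bin_seconds=5):
    """Vehicle arrivals per time bin for one lane's 0/1 occupancy series."""
    per_bin = SAMPLE_HZ * bin_seconds
    counts = [0] * math.ceil(len(occupancy) / per_bin)  # a partial last bin is kept
    for i in actuations(occupancy):
        counts[i // per_bin] += 1
    return counts


def occupancy_per_bin(occupancy, bin_seconds=5):
    """Fraction of samples in each bin during which the detector was occupied."""
    per_bin = SAMPLE_HZ * bin_seconds
    return [sum(occupancy[s:s + per_bin]) / len(occupancy[s:s + per_bin])
            for s in range(0, len(occupancy), per_bin)]
```

Per-lane counts and occupancies at stop-bar and advance-bar detectors, aligned with signal phase, are the sort of inputs an inflow/outflow model could consume; a vehicle straddling a bin boundary is attributed to the bin in which it arrived.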

DSRC Data Processing. In addition to video, LiDAR, and ground sensor data, we examined braking behavior using Dedicated Short-Range Communications (DSRC) data from Connected Vehicle roadside units (RSUs), addressing practical limitations of publicly collected Connected Vehicle Technology data. We proposed a tool for analyzing trajectories with a focus on hard-braking events, aiming to provide insights that help public traffic authorities improve urban arterial performance and safety and inform policy decisions [24].
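A minimal version of the hard-braking analysis might look like the sketch below, assuming each trajectory is available as timestamped speeds. It estimates deceleration by finite differences and reports intervals where deceleration stays at or above a threshold for a minimum duration. The 3 m/s² threshold, the 0.3 s minimum duration, and the function name are hypothetical parameters, not values taken from [24].

```python
def hard_braking_events(times, speeds, threshold=3.0, min_duration=0.3):
    """Return (start_time, end_time, peak_deceleration) for each hard-braking interval.

    times are in seconds, speeds in m/s; deceleration is estimated between consecutive
    samples, and consecutive intervals at or above `threshold` are merged into one event.
    """
    events, start, peak = [], None, 0.0
    for i in range(len(times) - 1):
        decel = -(speeds[i + 1] - speeds[i]) / (times[i + 1] - times[i])
        if decel >= threshold:
            if start is None:
                start, peak = times[i], decel
            else:
                peak = max(peak, decel)
            end = times[i + 1]
        elif start is not None:
            if end - start >= min_duration:  # ignore short blips (e.g. sensor noise)
                events.append((start, end, peak))
            start = None
    if start is not None and end - start >= min_duration:
        events.append((start, end, peak))
    return events
```

Requiring a minimum duration is a simple guard against single noisy speed samples being reported as hard braking.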

Pedestrian and Vehicular Safety Module. We use our ATSPM module to derive temporal synchronization and signal phasing information. This information is fused with video and LiDAR data into a set of coherent, actor-centric tracks placed in global spatio-temporal coordinates. We cluster trajectories based on both location and velocity using Dynamic Time Warping (DTW), as well as on ingress and egress lanes. Then we compute a representative trajectory (an exemplar) for each cluster by averaging the trajectories in that cluster. For each trajectory, we compute its speed and acceleration at each timestamp, along with its lane and cluster membership.

For pairs of trajectories passing through the intersection within the same time period, we compute surrogate safety measures. Time-to-collision (TTC) is the time it would take two road users to collide if they kept their current velocities. Post-encroachment time (PET) is the time difference between one road user leaving the conflict zone and the other road user entering it. A low TTC or PET value indicates a high risk of collision. Since the seminal work in 1987, near-misses defined via TTC/PET metrics have been fundamental to the traffic conflict technique, and advances in video recording and processing have enhanced these methods as described below. Severe events are detected from conflicting trajectories (paths of vehicles or pedestrians that intersect or overlap in space and time, potentially leading to conflicts or collisions) using the following additional criteria. For vehicle-to-vehicle (V2V) conflicts, we check two conditions: (1) TTC/PET is less than 3 seconds and both vehicles are moving relatively fast; and (2) the vehicles properly yield to each other (deceleration is detected).
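The DTW-based clustering and exemplar computation can be sketched as follows. This is one simple way to do it, assuming trajectories sampled at 10 Hz as lists of (x, y) map coordinates; the feature weighting `vel_weight`, the 20-point resampling used to average trajectories of different lengths, and all function names are hypothetical rather than the system's actual choices. The ingress/egress lane criterion is omitted.

```python
import math


def dtw(a, b):
    """Dynamic Time Warping distance between two sequences of equal-length feature tuples."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def features(track, dt=0.1, vel_weight=0.5):
    """Map [(x, y), ...] (at least 2 points) to [(x, y, w*vx, w*vy), ...] via finite differences."""
    out = []
    for k in range(len(track)):
        i, j = (k, k + 1) if k + 1 < len(track) else (k - 1, k)
        vx = (track[j][0] - track[i][0]) / dt
        vy = (track[j][1] - track[i][1]) / dt
        out.append((track[k][0], track[k][1], vel_weight * vx, vel_weight * vy))
    return out


def resample(seq, n=20):
    """Linearly resample a sequence of tuples to n points evenly spaced in sample index."""
    out = []
    for k in range(n):
        s = k * (len(seq) - 1) / (n - 1)
        i = min(int(s), len(seq) - 2)
        f = s - i
        out.append(tuple(p + f * (q - p) for p, q in zip(seq[i], seq[i + 1])))
    return out


def exemplar(tracks, n=20, **kw):
    """Representative trajectory of a cluster: pointwise mean of resampled feature sequences."""
    rs = [resample(features(t, **kw), n) for t in tracks]
    return [tuple(sum(vals) / len(rs) for vals in zip(*pts)) for pts in zip(*rs)]


def nearest_cluster(track, exemplars, **kw):
    """Index of the exemplar with the smallest DTW distance to the track."""
    f = features(track, **kw)
    return min(range(len(exemplars)), key=lambda c: dtw(f, exemplars[c]))
```

DTW tolerates road users that follow the same path at different speeds or with pauses, while the velocity features still separate movements that share locations but differ in direction.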

For pedestrian-to-vehicle (P2V) conflicts, we also check (1) whether TTC/PET is less than 3 seconds; (2) whether the pedestrian is moving at a reasonable speed (to exclude cases where cyclists or motorcyclists are wrongly detected as pedestrians); and (3) whether the pedestrian has left the curb. These distance and time thresholds are tunable; however, our work shows that they fall within a small range across several intersections. After browsing several hundred hours of video data for near-misses at multiple intersections, we found that our definition of severe events captures the near-misses that are potentially dangerous while being roughly 10 to 50 times fewer in number. This reduction enables easy navigation through the identified conflict situations and facilitates the planning of countermeasures. Overall, our system provides a high level of detection and tracking accuracy, covers both day and night thanks to the combination of LiDAR and video, and accurately calculates both near-misses and severe events [3].
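The surrogate safety measures and the P2V filter can be sketched as below. Assumptions, all hypothetical: road users are treated as discs of combined radius `radius` moving at constant velocity (for TTC), the conflict zone is given as a membership test on map coordinates (for PET), "TTC/PET less than 3 seconds" is read as either measure falling below the threshold, and "reasonable speed" is modeled as an upper bound `max_ped_speed`. Only the 3-second threshold comes from the text.

```python
import math


def ttc(p1, v1, p2, v2, radius=1.0):
    """Time-to-collision of two discs (combined radius `radius`) at constant velocity.

    Solves |dp + dv t| = radius for the earliest t >= 0; returns math.inf if they never meet.
    """
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    c = dpx * dpx + dpy * dpy - radius * radius
    if c <= 0:
        return 0.0  # already overlapping
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dpx * dvx + dpy * dvy)
    disc = b * b - 4.0 * a * c
    if a == 0 or b >= 0 or disc < 0:
        return math.inf  # no relative motion, separating, or passing wide
    return (-b - math.sqrt(disc)) / (2.0 * a)


def zone_interval(times, positions, in_zone):
    """(first, last) timestamp at which a trajectory is inside the conflict zone, or None."""
    inside = [t for t, p in zip(times, positions) if in_zone(p)]
    return (inside[0], inside[-1]) if inside else None


def pet(times_a, pos_a, times_b, pos_b, in_zone):
    """Post-encroachment time: gap between the first user leaving and the second entering."""
    za = zone_interval(times_a, pos_a, in_zone)
    zb = zone_interval(times_b, pos_b, in_zone)
    if za is None or zb is None:
        return math.inf
    first, second = (za, zb) if za[0] <= zb[0] else (zb, za)
    return max(second[0] - first[1], 0.0)  # 0 if both occupied the zone at once


def is_severe_p2v(ttc_s, pet_s, ped_speed, ped_left_curb,
                  time_threshold=3.0, max_ped_speed=2.5):
    """Apply the three P2V checks from the text (max_ped_speed is a hypothetical bound)."""
    return (min(ttc_s, pet_s) < time_threshold
            and ped_speed <= max_ped_speed
            and ped_left_curb)
```

For example, a vehicle that clears a crosswalk box at t = 1.1 s followed by a pedestrian entering it at t = 2.0 s gives a PET of 0.9 s, which passes the time check; whether it counts as severe then depends on the pedestrian's speed and curb status.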

Safety Studies. The system described above was developed with input from traffic engineers at the City of Gainesville and District 5. Their input was valuable in defining the notion of severe events as an extension of near-misses. They were also interested in understanding the impact of any changes on traffic volumes. Using our system, we worked with our partners at the City of Gainesville to perform before-and-after studies at three intersections [4]. For each intersection, we collected data for multiple signal phasing configurations and analyzed them with three software modules:

Volume hotspot detection module: computes pedestrian and vehicle volumes over fixed time periods (one hour). From these, peak periods in pedestrian or vehicle volume that need attention are identified.
Intersection-service evaluation module: computes a fine-grained aggregation of entering vehicle volume at 5-second intervals. The vehicle count for all movements between 5 and 15 seconds is used to represent the performance of the intersection [25].
Conflict hotspot detection module: computes the temporal hotspots of P2V and V2V conflicts by conflict type and by the movements of the involved trajectories, as described above.
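The volume and aggregation modules share one basic operation: counting events in fixed-width time bins (one hour for volume hotspots, 5 seconds for the intersection-service module). The sketch below assumes timestamps in seconds since midnight and flags peak bins with a hypothetical mean-plus-one-standard-deviation rule; the function names and the peak rule are illustrative, not the system's.

```python
import math
from statistics import mean, pstdev


def binned_counts(timestamps_s, bin_s, horizon_s):
    """Count events (e.g. road users entering the intersection) in fixed-width time bins."""
    counts = [0] * math.ceil(horizon_s / bin_s)
    for t in timestamps_s:
        if 0 <= t < horizon_s:
            counts[int(t // bin_s)] += 1
    return counts


def peak_bins(volumes, k=1.0):
    """Indices of bins whose volume exceeds mean + k * std (a hypothetical peak rule)."""
    mu, sd = mean(volumes), pstdev(volumes)
    return [i for i, v in enumerate(volumes) if v > mu + k * sd]
```

With `bin_s=3600` and a 24-hour horizon, the returned indices are hours of the day; the same counts, split by conflict type and movement, would give the temporal conflict hotspots.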

We briefly present results for an intersection near a high school [4]. Peak pedestrian periods were found in the morning and late afternoon hours, matching school hours. Severe events were found during the late afternoon period in a specific area, between left-turning vehicles and pedestrians. The City activated an experimental signal pattern with an exclusive pedestrian phase during these hours on Wednesday and Thursday. We found that the number of severe events dropped because of this change. However, intersection performance also dropped on Thursday, while it remained the same on Wednesday [26]. In separate work, we demonstrated enhanced safety and operational efficiency at intersections through the fusion of loop detector data and trajectory information from video cameras, offering data-driven insights for traffic management and safety improvements [27].

Performance. The detection, tracking, and mapping modules are implemented in a streaming fashion, i.e., with frame-by-frame data processing and message passing. Detection runs at 12 Hz for both video and LiDAR on an NVIDIA Titan RTX GPU, and tracking runs at 10 Hz, so both keep pace with the 10 Hz data acquisition rate. The downstream modules, namely trajectory fusion, near-miss detection, and intention prediction, also run in streaming mode at 10 Hz on a multicore CPU. The Titan RTX provides 4,608 CUDA cores for parallel, compute-intensive tasks, 576 Tensor Cores to accelerate AI workloads, and 24 GB of GDDR6 memory, ample capacity for large datasets and complex models.
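The real-time claim can be checked with simple arithmetic. At a 10 Hz acquisition rate each frame allows a 100 ms budget; detection at 12 Hz needs about 83.3 ms per frame, and tracking and the downstream modules at 10 Hz need 100 ms. Assuming the stages run concurrently as a pipeline (an assumption consistent with the streaming, message-passing design, not a documented detail), sustained throughput is set by the slowest stage, while end-to-end latency is roughly the sum of the per-stage times. The function and variable names are hypothetical.

```python
ACQUISITION_HZ = 10.0
# Stage rates as reported in the text; treating them as pipelined stages is an assumption.
STAGE_HZ = {"detection": 12.0, "tracking": 10.0, "downstream": 10.0}


def rate_budget(stage_hz, acquisition_hz):
    """Return per-frame budget (ms), per-stage times (ms), sustained rate (Hz), latency (ms)."""
    budget_ms = 1000.0 / acquisition_hz
    stage_ms = {name: 1000.0 / hz for name, hz in stage_hz.items()}
    sustained_hz = min(min(stage_hz.values()), acquisition_hz)  # cannot exceed the input rate
    latency_ms = sum(stage_ms.values())  # one frame passes through every stage in turn
    return budget_ms, stage_ms, sustained_hz, latency_ms


budget_ms, stage_ms, sustained_hz, latency_ms = rate_budget(STAGE_HZ, ACQUISITION_HZ)
for name, ms in stage_ms.items():
    status = "ok" if ms <= budget_ms else "too slow"
    print(f"{name:>10}: {ms:6.1f} ms/frame (budget {budget_ms:.1f} ms) -> {status}")
print(f"sustained: {sustained_hz:.1f} Hz, approx. end-to-end latency: {latency_ms:.1f} ms")
```

If the stages instead ran one after another on a single device, their per-frame times would add and 83.3 + 100 + 100 ms would exceed the 100 ms budget, which is why the pipelined reading matters.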

Code. The code for our video processing pipeline is hosted at https://github.com/taniabanerjee/UFVideoLidarSystem; please email sranka@ufl.edu to request access.

References
[1] X. Huang, T. Banerjee, K. Chen, A. Rangarajan, and S. Ranka, “Machine learning
based video processing for real-time near-miss detection,” in 6th International Conference
on Vehicle Technology and Intelligent Transport Systems (VEHITS), 2020. [Online]. Available:
https://doi.org/10.5220/0009345401690179
[2] T. Banerjee, X. Huang, K. Chen, A. Rangarajan, and S. Ranka, “Clustering object trajectories for
intersection traffic analysis,” in 6th International Conference on Vehicle Technology and Intelligent
Transport Systems (VEHITS), 2020. [Online]. Available: https://doi.org/10.5220/0009422500980105
[3] A. Wu, T. Banerjee, K. Chen, A. Rangarajan, and S. Ranka, “A multi-sensor video/LiDAR system
for analyzing intersection safety,” in 26th IEEE International Conference on Intelligent Transportation
Systems (ITSC), 2023. [Online]. Available: https://doi.org/10.1109/ITSC57777.2023.10422349
[4] A. Mishra, K. Chen, S. Poddar, E. Posadas, A. Rangarajan, and S. Ranka, “Using video analytics to
improve traffic intersection safety and performance,” Vehicles, vol. 4, no. 4, pp. 1288–1313, 2022.
[Online]. Available: https://www.mdpi.com/2624-8921/4/4/68
[5] R. Sengupta, T. Banerjee, K. Chen, Y. Karnati, S. Ranka, and A. Rangarajan, “Towards
effective traffic signal safety and optimization using fisheye video,” in 9th International Conference
on Vehicle Technology and Intelligent Transport Systems (VEHITS), 2023. [Online]. Available:
https://doi.org/10.5220/0011729000003479
[6] T. Banerjee, X. Huang, A. Wu, K. Chen, A. Rangarajan, and S. Ranka, Video Based
Machine Learning for Traffic Intersections. CRC Press, 2023. [Online]. Available:
https://www.amazon.com.be/-/en/Tania-Banerjee/dp/1032542268
[7] X. Huang, T. Banerjee, K. Chen, N. V. S. Varanasi, A. Rangarajan, and S. Ranka, “Machine learning
based video processing for real-time near-miss detection,” in VEHITS, 2020, pp. 169–179. [Online].
Available: https://doi.org/10.5220/0009345401690179
[8] K. Chen, T. Banerjee-Mishra, X. Huang, A. Rangarajan, and S. Ranka, “A visual analytics system for
processed videos from traffic intersections,” in International Conference on Vehicle Technology and
Intelligent Transport Systems, 2020. [Online]. Available:
https://api.semanticscholar.org/CorpusID:219324448
[9] K. Chen, T. Banerjee, X. Huang, Z. Ding, V. S. Varanasi, A. Rangarajan, and S. Ranka, “A web-based
visual analytics system for traffic intersection datasets,” International Journal of Big Data Intelligence,
vol. 8, no. 1, pp. 76–88, 2021.
[10] X. Huang, P. He, A. Rangarajan, and S. Ranka, “Machine-learning-based real-time multi-camera
vehicle tracking and travel-time estimation,” Journal of Imaging, vol. 8, no. 4, 2022. [Online]. Available:
https://www.mdpi.com/2313-433X/8/4/101
[11] P. He, P. Emami, S. Ranka, and A. Rangarajan, “Learning canonical embeddings for
unsupervised shape correspondence with locally linear transformations,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 45, no. 12, pp. 14872–14887, 2023. [Online]. Available:
https://doi.org/10.1109/TPAMI.2023.3307592
[12] P. Emami, P. He, S. Ranka, and A. Rangarajan, “Efficient iterative amortized inference for learning
symmetric and disentangled multi-object representations,” in Proceedings of the 38th International
Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, ser. Proceedings of
Machine Learning Research, M. Meila and T. Zhang, Eds., vol. 139. PMLR, 2021, pp. 2970–2981.
[Online]. Available: http://proceedings.mlr.press/v139/emami21a.html
[13] P. He, A. Wu, X. Huang, A. Rangarajan, and S. Ranka, “Video-based machine learning system for
commodity classification,” in Proceedings of the 6th International Conference on Vehicle Technology
and Intelligent Transport Systems, VEHITS 2020, Prague, Czech Republic, May 2-4, 2020,
K. Berns, M. Helfert, and O. Gusikhin, Eds. SCITEPRESS, 2020, pp. 229–236. [Online]. Available:
https://doi.org/10.5220/0009393702290236
[14] A. Wu, P. He, X. Li, K. Chen, S. Ranka, and A. Rangarajan, “An efficient semi-automated scheme for
infrastructure LiDAR annotation,” arXiv preprint arXiv:2301.10732, 2023.
[15] X. Li, P. He, A. Wu, S. Ranka, and A. Rangarajan, “A spatiotemporal correspondence approach to unsupervised LiDAR segmentation with traffic applications,” 2023 IEEE 26th International
Conference on Intelligent Transportation Systems (ITSC), pp. 1014–1021, 2023. [Online]. Available:
https://api.semanticscholar.org/CorpusID:261101031
[16] P. He, P. Emami, S. Ranka, and A. Rangarajan, “Learning scene dynamics from point cloud
sequences,” Int. J. Comput. Vis., vol. 130, no. 3, pp. 669–695, 2022. [Online]. Available:
https://doi.org/10.1007/s11263-021-01551-y
[17] ——, “Self-supervised robust scene flow estimation via the alignment of probability density functions,”
in Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on
Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational
Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 – March 1, 2022. AAAI
Press, 2022, pp. 861–869. [Online]. Available: https://doi.org/10.1609/aaai.v36i1.19968
[18] D. Mahajan, T. Banerjee, Y. Karnati, A. Rangarajan, and S. Ranka, “A data driven approach to derive
traffic intersection geography using high resolution controller logs,” in VEHITS, 2020, pp. 203–210.
[19] Y. Karnati, R. Sengupta, and S. Ranka, “Intertwin: Deep learning approaches for computing
measures of effectiveness for traffic intersections,” Applied Sciences, vol. 11, no. 24, p. 11637, Dec
2021. [Online]. Available: http://dx.doi.org/10.3390/app112411637
[20] D. Mahajan, Y. Karnati, T. Banerjee, V. R. Regalla, R. Reddy, A. Rangarajan, and S. Ranka, “A scalable
data analytics and visualization system for city-wide traffic signal data-sets,” in 2020 IEEE 23rd
International Conference on Intelligent Transportation Systems (ITSC), 2020, pp. 1–6.
[21] Y. Karnati, D. Mahajan, T. Banerjee, A. Rangarajan, and S. Ranka, “Data mining algorithms for traffic
interruption detection,” 2020, in press, 6th International Conference on Vehicle Technology and
Intelligent Transport Systems, VEHITS.
[22] Y. Karnati, D. Mahajan, A. Rangarajan, and S. Ranka, “Machine learning algorithms for traffic interruption
detection,” in 2020 Fifth International Conference on Fog and Mobile Edge Computing (FMEC),
2020, pp. 231–236.
[23] Y. Karnati, M. Mahajan, T. Banerjee, R. Sengupta, C. Packard, R. Casburn, N. Agarwal, J. Dilmore,
A. Rangarajan, and S. Ranka, Data Analytics and Machine Learning for Integrated Corridor
Management. CRC Press, 2024.
[24] R. Sengupta, T. Banerjee, Y. Karnati, S. Ranka, and A. Rangarajan, “Using DSRC road-side unit data
to derive braking behavior,” in Proceedings of the 9th International Conference on Vehicle Technology
and Intelligent Transport Systems – Volume 1: VEHITS, INSTICC. SciTePress, 2023, pp. 420–427.
[25] USDOT, “Traffic Analysis Toolbox Volume VI: Definition, interpretation, and calculation of traffic analysis
tools measures of effectiveness,” https://ops.fhwa.dot.gov/publications/fhwahop08054/sect6.htm.
[26] T. Banerjee, K. Chen, A. Almaraz, R. Sengupta, Y. Karnati, B. Grame, E. Posadas, S. Poddar,
R. Schenck, J. Dilmore et al., “A modern intersection data analytics system for pedestrian and vehicular
safety,” in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC).
IEEE, 2022, pp. 3117–3124.
[27] R. Sengupta, T. Banerjee, K. Chen, Y. Karnati, S. Ranka, and A. Rangarajan, “Towards effective traffic
signal safety and optimization using fisheye video,” in Proceedings of the 9th International Conference
on Vehicle Technology and Intelligent Transport Systems – Volume 1: VEHITS, INSTICC. SciTePress,
2023, pp. 54–63.