Abstract
Driver monitoring systems benefit from fixation-related eye-tracking features, yet dedicated eye-tracking hardware is costly and difficult to integrate at scale. This study presents a practical software pipeline that extracts fixation-related features from conventional RGB video. Facial and pupil landmarks obtained with MediaPipe are denoised with a Kalman filter, fixation centers are identified with the OPTICS algorithm within a sliding window, and an affine normalization compensates for head motion and camera geometry. Fixation segments are derived from velocity profiles smoothed with a moving average. Experiments with laptop camera recordings show that the combined Kalman and OPTICS pipeline reduces landmark jitter and yields more stable fixation centroids, while the affine normalization further improves referential pupil stability. The pipeline operates with minimal computational overhead and can be deployed as a software update to existing driver monitoring or advanced driver assistance systems. This work is a proof of concept that demonstrates feasibility in a low-cost RGB setting with a limited evaluation scope. Remaining challenges include sensitivity to lighting conditions and head motion, which future work may address through near-infrared sensing, adaptive calibration, and broader validation across subjects, environments, and cameras. The extracted features are relevant for future studies of cognitive load and attention, although cognitive state inference is not validated here.