Achieving Robotic Data Efficiency Through Machine-Centric FDCT Vision Processing
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Impact of Scene Frequency on Designing a Perception System
3.2. Customizing the FDCT to Enhance Robotic Perception
- u and v are the indices for the values in the frequency space.
- F(u,v) are the frequency coefficients.
- x and y are the indices for the values in the sample space.
- f(x,y) are the values from the input (or spatial) domain.
- C(u) = 1/√2 if u = 0, and C(u) = 1 otherwise.
- C(v) = 1/√2 if v = 0, and C(v) = 1 otherwise.
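To make these definitions concrete, the following is a minimal reference sketch of the forward 2D DCT over an n × n block using the C(u), C(v) normalization above. It is illustrative only, not the paper's optimized implementation, and the 2/n scale factor is one common convention that may differ from the scaling used in Equation (1).

```python
import numpy as np

def fdct_2d(block):
    """Reference 2D forward DCT: spatial samples f(x, y) -> coefficients F(u, v)."""
    n = block.shape[0]
    F = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0   # C(u)
            cv = 1 / np.sqrt(2) if v == 0 else 1.0   # C(v)
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                          * np.cos((2 * y + 1) * v * np.pi / (2 * n)))
            F[u, v] = (2 / n) * cu * cv * s
    return F
```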
3.3. mFDCT: Theoretical Foundation
- Starts near zero for low x (to de-emphasize low frequencies),
- Increases gradually at first, then accelerates for mid-to-high x (to prioritize edge-related details),
- Is smooth and differentiable to avoid artifacts in the transform,
- Remains computationally efficient for real-time robotic applications.
- Base Form Selection: Begin with a shifted and scaled quartic: −(x/4 − 2)^4. The negative sign inverts the parabola-like shape to create a valley that delays the rise, emphasizing higher x. The shift by 2 centers the minimum around mid-range spatial indices, and the scaling by 1/4 normalizes the input for an 8 × 8 block (common in DCT applications), ensuring the function spans a reasonable range [0, ~32] to align with π-scaled cosine arguments.
- Offset for Frequency Expansion: Add +16 to shift the minimum upward, preventing the over-compression of mid-frequencies and ensuring that the function increases to approximately 32 at x = 7 (for n = 8), which expands high-frequency sampling by ~60% relative to the linear case, based on numerical integration of the basis-function density.
- Zero Prevention: Finally, add +1 to yield −(x/4 − 2)^4 + 17, avoiding zero values that could cause division issues or singularities in downstream processing (e.g., quantization).
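As an illustration only, the snippet below evaluates the mapping function assembled in these steps, −(x/4 − 2)^4 + 17, over the spatial indices of an 8 × 8 block; how Equation (2) embeds this mapping inside the cosine arguments of the mFDCT is not reproduced here.

```python
import numpy as np

# Index-mapping function assembled in the design steps above (illustrative sketch).
def mapping(x):
    return -(x / 4 - 2) ** 4 + 17

x = np.arange(8)     # spatial indices of an 8x8 block
print(mapping(x))    # mapped index values for each position in the block
```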
3.4. Optimizing Quantization Tables for Robotic Vision
3.5. Optimizing Quantization: Theoretical Foundation
- Base Function Selection: Start with cosh(z), where z = x/k + c. Hyperbolic cosine is chosen for its relation to exponential functions (cosh(z) = (e^z + e^−z)/2), which naturally model amplification in signal processing (e.g., in filter design for edge enhancement). This allows q(x) to grow sublinearly at low x (de-emphasizing low frequencies) and exponentially at high x (prioritizing edges).
- Scaling and Normalization: The argument includes division by 2π to normalize for typical DCT block sizes (e.g., 8 × 8, where x ranges 0–7). The factor 2π approximates a full cycle in frequency space (drawing from Fourier analysis), ensuring the growth rate scales appropriately: for x = 0, x/(2π) = 0; for x = 7, 7/(2π) ≈ 1.11, providing a moderate ramp-up without overflow in floating-point computations.
- Shift for Asymmetry: Add +1 to the argument to shift the minimum away from x = 0, positioning it at negative x (so the function effectively starts at cosh(1) ≈ 1.54 for x = 0) and accelerating growth for positive x. This ensures minimal adjustment for low frequencies (allowing greater quantization loss there) while exponentially reducing quantization steps for high frequencies, boosting retention by up to 2–3× based on numerical evaluation.
- Overall Scaling: Multiply by 2 to align the range with standard quantization matrix magnitudes (typically 1–255 in H.264). This constant was analytically derived to match the average quantization scale in baseline matrices (e.g., JPEG luminance table averages ~16–32 post-scaling) and refined empirically to optimize PSNR for edge-heavy images, yielding 8–15% improvement in outline detection accuracy.
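For concreteness, the following sketch (consistent with the Algorithm 1 listing later in this section) evaluates the resulting amplification function q(x) = 2·cosh(x/(2π) + 1) over the radial frequency indices of an 8 × 8 block.

```python
import numpy as np

# Amplification function assembled in the design steps above (matches Algorithm 1).
def q(x):
    return 2 * np.cosh(x / (2 * np.pi) + 1)

u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
x = np.sqrt(u**2 + v**2)     # radial distance in the 8x8 frequency grid
print(np.round(q(x), 2))     # ≈3.09 at DC, growing to ≈13.2 at the highest-frequency corner
```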
3.6. The mFDCT Pipeline: Integrating Robotic Perceptual Accuracy with Data Efficiency
3.7. Integration of mFDCT into Standard H.264 Encoder
- Input Frame Processing: The input video frame is divided into macroblocks (16 × 16 pixels), and motion estimation/compensation is applied if inter-prediction is used.
- Transform: The residual data are transformed using the modified mFDCT (Equation (2)) to produce frequency coefficients.
- Quantization: Here, the modified quantization tables are applied. The default intra and inter quantization matrices (for luminance and chrominance) defined in the H.264 profile (e.g., baseline or main) are replaced with our custom matrices. This replacement involves:
- Performing the diagonal flip (a 180° rotation that moves the top-left, low-frequency emphasis to the bottom-right) on the standard 8 × 8 matrix.
- Applying the amplification function from Equation (3) to each element of the flipped matrix, i.e., Q′(u,v) = Q_flip(u,v)/q(x), where x = √(u² + v²) and Q_flip(u,v) is the flipped matrix entry. (Note: Division is used to reduce quantization steps for higher frequencies, preserving details; adjust as a multiplier if finer control is needed.)
- Scaling the matrices according to the Quantization Parameter (QP) as per the standard relationship Q_step(QP) ≈ Q_step(QP mod 6) · 2^⌊QP/6⌋ (i.e., the step size doubles for every increase of 6 in QP), ensuring the modifications scale appropriately with compression level; a small numerical illustration follows this list.
- Entropy Coding and Stream Assembly: The quantized coefficients proceed to entropy coding, and the custom matrices are signaled in the bitstream via the Sequence Parameter Set (SPS) or Picture Parameter Set (PPS) using the scaling_list_present_flag and associated scaling lists, which allow custom matrices without breaking decoder compatibility.
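To illustrate the QP scaling referenced above: in H.264 the quantization step size doubles for every increase of 6 in QP. The snippet below shows this idealized relationship; the standard's per-QP table follows a six-value cycle, so individual entries differ slightly (e.g., 224 rather than ≈226 at QP = 51).

```python
# Idealized H.264 QP-to-step-size relationship: the step doubles every 6 QP.
# 0.625 is the step size at QP = 0; custom scaling lists are rescaled by the
# same mechanism, so the Equation (3) shaping is preserved across QP values.
def qstep(qp, base_step=0.625):
    return base_step * 2 ** (qp / 6)

for qp in (0, 6, 12, 24, 51):
    print(qp, qstep(qp))   # roughly 0.625, 1.25, 2.5, 10, 226
```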
- FFmpeg/libx264: Use the -x264-params flag to specify custom scaling lists, e.g., ffmpeg -i input.mp4 -c:v libx264 -x264-params "scaling_list=1:scaling_list_data=…" output.mp4. The scaling_list_data is a comma-separated list of the 52 values (for the 4 × 4 and 8 × 8 matrices).
- x264 Standalone: Employ the --qpfile option for per-frame QP control, combined with custom matrix definition in the configuration file.
| Algorithm 1: Standard-Compliant Integration of Modified Quantization Tables into H.264/AVC |
```python
import numpy as np

def modify_quant_table(standard_table):
    # Step 1: Diagonal flip (180-degree rotation), moving the emphasis from the
    # top-left (low frequencies) to the bottom-right (high frequencies)
    flipped = np.rot90(standard_table, 2)
    # Step 2: Apply the Equation (3) amplification q(x) = 2*cosh(x/(2*pi) + 1)
    rows, cols = flipped.shape
    modified = np.zeros((rows, cols), dtype=float)
    for u in range(rows):
        for v in range(cols):
            x = np.sqrt(u**2 + v**2)                 # radial distance = frequency magnitude
            q_x = 2 * np.cosh(x / (2 * np.pi) + 1)
            modified[u, v] = flipped[u, v] / q_x     # smaller step for higher frequencies
    # Round to integers and keep every entry >= 1 for H.264 scaling-list compliance
    return np.maximum(np.rint(modified), 1).astype(int)

# In the encoder loop (pseudo-code; 'encoder' is an H.264 encoder instance, e.g., from PyAV):
# standard_luma_table = np.array([...])   # default 8x8 H.264 luma matrix
# custom_luma = modify_quant_table(standard_luma_table)
# ...similarly for chroma...
# Signal the custom matrices in the SPS/PPS scaling lists:
# encoder.set_scaling_list('luma_intra', custom_luma.flatten())
# encoder.encode(frame)
```
4. Results
4.1. The Experimental Robotic Platform
4.2. Compression Performance of the Proposed Algorithm
- (1) A small room densely populated with assorted belongings, introducing significant object clutter and overlapping occlusions;
- (2) A lecture hall featuring a curtain wall with pronounced shade variations and a reflective floor that dynamically alters perceived color due to light interaction;
- (3) A stone wall with plants and flowers growing within its crevices, presenting irregular natural textures and mixed material surfaces;
- (4) A wall densely covered with air conditioner compressor units, characterized by repetitive geometric patterns and uniform metallic surfaces.


4.3. Dataset Composition and Selection Criteria
4.4. Negligible Impact on Accuracy and Computational Efficiency
4.5. Ablation Study: Incremental Effects of Proposed Modifications
5. Discussion
6. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mahadik, S.C.; Deulgaonkar, V.R.; Bhosle, S.M.; Dhamdhere, O.A.; Taware, N.S.; Darade, N.D.; Gadhave, G.J. Envisioning the Future of Robotics Sensors: Innovations and Prospects. In Proceedings of the International Conference on Futuristic Advancements in Materials, Manufacturing and Thermal Sciences, Singapore, 16–18 January 2024; pp. 133–147. [Google Scholar]
- Basha, M.; Siva Kumar, M.; Chinnaiah, M.C.; Lam, S.K.; Srikanthan, T.; Narambhatla, J.; Dubey, S. Hardware schemes for smarter indoor robotics to prevent the backing crash framework using field programmable gate array-based multi-robots. Sensors 2024, 24, 1724. [Google Scholar] [CrossRef] [PubMed]
- Raj, R.; Kos, A. An extensive study of convolutional neural networks: Applications in computer vision for improved robotics perceptions. Sensors 2025, 25, 1033. [Google Scholar] [CrossRef] [PubMed]
- Soori, M.; Dastres, R.; Arezoo, B.; Jough, F.K.G. Intelligent robotic systems in Industry 4.0: A review. J. Adv. Manuf. Sci. Technol. 2024, 4, 2024007. [Google Scholar] [CrossRef]
- Zheng, L.; Xu, K.; Jiang, J.; Wei, M.; Zhou, B.; Cheng, H. Real-Time Efficient Environment Compression and Sharing for Multi-Robot Cooperative Systems. IEEE Trans. Intell. Veh. 2024, 9, 7797–7811. [Google Scholar] [CrossRef]
- Wiseman, Y. Autonomous Vehicles. In Encyclopedia of Information Science and Technology, 5th ed.; IGI Global: New York, NY, USA, 2021; pp. 1–11. [Google Scholar]
- Che, C.; Li, C.; Huang, Z. The integration of generative artificial intelligence and computer vision in industrial robotic arms. Int. J. Comput. Sci. Inf. Technol. 2024, 2, 1–9. [Google Scholar] [CrossRef]
- Nguyen, Q.C.; Hua, H.Q.B.; Pham, P.T. Development of a vision system integrated with industrial robots for online weld seam tracking. J. Manuf. Process. 2024, 119, 414–424. [Google Scholar] [CrossRef]
- Min, Z.; Lai, J.; Ren, H. Innovating robot-assisted surgery through large vision models. Nat. Rev. Electr. Eng. 2025, 2, 350–363. [Google Scholar] [CrossRef]
- Qiu, S.; Li, Z.; He, W.; Zhang, L.; Yang, C.; Su, C.Y. Brain–machine interface and visual compressive sensing-based teleoperation control of an exoskeleton robot. IEEE Trans. Fuzzy Syst. 2016, 25, 58–69. [Google Scholar] [CrossRef]
- McCool, C.; Perez, T.; Upcroft, B. Mixtures of lightweight deep convolutional neural networks: Applied to agricultural robotics. IEEE Robot. Autom. Lett. 2017, 2, 1344–1351. [Google Scholar] [CrossRef]
- Pan, S.; Shi, L.; Guo, S. A kinect-based real-time compressive tracking prototype system for amphibious spherical robots. Sensors 2015, 15, 8232–8252. [Google Scholar] [CrossRef]
- Gallagher, J.E.; Oughton, E.J. Surveying You Only Look Once (YOLO) multispectral object detection advancements, applications and challenges. IEEE Access 2025, 13, 7366–7395. [Google Scholar] [CrossRef]
- Li, S.; Li, Y.; Li, Y.; Li, M.; Xu, X. Yolo-firi: Improved yolov5 for infrared image object detection. IEEE Access 2021, 9, 141861–141875. [Google Scholar] [CrossRef]
- Sapkota, R.; Flores-Calero, M.; Qureshi, R.; Badgujar, C.; Nepal, U.; Poulose, A.; Zeno, P.; Vaddevolu, U.B.P.; Khan, S.; Shoman, M.; et al. YOLO advances to its genesis: A decadal and comprehensive review of the You Only Look Once (YOLO) series. Artif. Intell. Rev. 2025, 58, 274. [Google Scholar] [CrossRef]
- Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Chen, L.; Li, G.; Xie, W.; Tan, J.; Li, Y.; Pu, J.; Shi, W. A survey of computer vision detection, visual SLAM algorithms, and their applications in Energy-Efficient autonomous systems. Energies 2024, 17, 5177. [Google Scholar] [CrossRef]
- Guo, H.; Zhou, Y.; Guo, H.; Jiang, Z.; He, T.; Wu, Y. A Survey on Recent Advances in Video Coding Technologies and Future Research Directions. IEEE Trans. Broadcast. 2025, 71, 666–671. [Google Scholar] [CrossRef]
- Hao, Z.; Liu, J.; He, C.; Zheng, Q.; Chen, S.; Xu, J.; Ma, X. A High-Precision and Low-Cost Approximate Transform Accelerator for Video Coding. In Proceedings of the 2025 62nd ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 22–25 June 2025; pp. 1–7. [Google Scholar]
- Ouyang, M.; Chen, Z. JPEG quantized coefficient recovery via DCT domain spatial-frequential transformer. IEEE Trans. Image Process. 2024, 33, 3385–3398. [Google Scholar] [CrossRef]
- Wiseman, Y. Adapting the H.264 Standard to the Internet of Vehicles. Technologies 2023, 11, 103. [Google Scholar] [CrossRef]
- Thukivakam, J.R. A Technical Analysis of H.264 Video Coding and Network Transport Architecture. J. Comput. Sci. Technol. Stud. 2025, 7, 381–388. [Google Scholar] [CrossRef]
- Khatua, P.; Ray, K.C. VLSI architecture of DCT-based harmonic wavelet transform for time–frequency analysis. IEEE Trans. Instrum. Meas. 2023, 72, 1–8. [Google Scholar] [CrossRef]
- Chaudhary, P.K. FBSE-based JPEG image compression. IEEE Sens. Lett. 2024, 8, 1–4. [Google Scholar] [CrossRef]
- Liu, P.; Xu, T.; Chen, H.; Zhou, S.; Qin, H.; Li, J. Spectrum-driven mixed-frequency network for hyperspectral salient object detection. IEEE Trans. Multimed. 2023, 26, 5296–5310. [Google Scholar] [CrossRef]
- Kakad, S.; Sarode, P.; Bakal, J.W. A survey on query response time optimization approaches for reliable data communication in wireless sensor network. Int. J. Wirel. Commun. Netw. Technol. 2012, 1, 31–36. [Google Scholar]
- Ye, Y.; Ma, X.; Zhou, X.; Bao, G.; Wan, W.; Cai, S. Dynamic and real-time object detection based on deep learning for home service robots. Sensors 2023, 23, 9482. [Google Scholar] [CrossRef]
| Criteria/Features | Panoramic Point Cloud Compression for Multi-Robot Systems [5] | Vision Systems for Autonomous Vehicles [6] | Vision Systems for Industrial Robots [7,8] | Vision Systems for Medical Robotics [9] | Compressive Sensing in Brain-Machine Interface Teleoperation (Exoskeleton) [10] | Model Compression of Deep CNNs [11] | Real-Time Compressive Tracking (Kinect-Based) [12] | YOLO (You Only Look Once) Family [13,14,15] | Proposed Method in the Paper |
|---|---|---|---|---|---|---|---|---|---|
| Real-Time Capability | High (lightweight, real-time) | High (handles dynamic environments) | High (precise in controlled settings) | High (real-time visual feedback) | Medium (streamlines data) | High (improves efficiency) | High (real-time tracking) | High (excellent speed) | High (enhances detection) |
| Data Compression/Dimensionality Reduction | Present (improves compression quality) | Absent | Absent | Absent | Present (compressive sensing) | Present (reduces parameters) | Present (lossless compression using compressive sensing) | Absent | Present (significant reduction) |
| High Accuracy/mAP | N/A | High (optimized detections) | High (precise recognition) | High (integration with models) | High (precise control) | Medium (acceptable, some loss) | High (accurate feature extraction) | Medium (60–80% mAP) | Medium (acceptable mAP, does not surpass latest YOLO) |
| Computational Demand | Low (lightweight) | High | Medium (controlled environments) | High (safety requirements) | Medium (complex setup) | Low (efficient on embedded hardware) | Medium (older hardware) | Low (single-stage processing) | Medium (generalized approach) |
| Suitability for Dynamic Environments | Good (multi-robot sharing) | Good (unpredictable traffic) | Limited (structured only) | Limited (limited workspace) | Good (teleoperation) | Good (agricultural variations) | Good (amphibious movement) | Good (general real-time) | Good (broad vision systems) |
| Suitability for Structured Environments | Limited (focus on visibility merging) | Limited (dynamic focus) | Good (high accuracy in controlled) | Good (surgical precision) | Limited (BMI setup) | Limited (field applications) | Limited (spherical robots) | Good (general) | Good (generalized) |
| Specific Hardware/Framework Dependency | Absent (general point cloud) | Absent | Absent | Absent (integration flexible) | Present (BMI exoskeleton) | Present (embedded hardware) | Present (Kinect) | Absent (customizable) | Absent (does not rely on specific frameworks) |
| Integration with Other Systems (e.g., Models, BMI) | Present (multi-robot) | Absent | Present (manufacturing tools) | Present (large vision models) | Present (BMI) | Present (CNNs) | Absent | Absent (standalone detector) | Present (enhances existing detection) |
| Application Focus | Multi-robot cooperative systems, environment sharing | Autonomous driving | Manufacturing, assembly, welding | Robot-assisted surgery | Teleoperated exoskeleton robots | Agricultural/weed management robots | Amphibious spherical robots | General real-time object detection in robotics (navigation, manipulation, HRI) | Broad robotic vision systems |
| A | | | | | | | |
| 5 | 3 | 3 | 5 | 7 | 12 | 15 | 18 |
| 4 | 4 | 4 | 6 | 8 | 17 | 18 | 16 |
| 4 | 4 | 5 | 7 | 12 | 17 | 21 | 17 |
| 4 | 5 | 7 | 9 | 15 | 26 | 24 | 19 |
| 5 | 7 | 11 | 17 | 20 | 33 | 31 | 23 |
| 7 | 11 | 17 | 19 | 24 | 31 | 34 | 28 |
| 15 | 19 | 23 | 26 | 31 | 36 | 36 | 30 |
| 22 | 28 | 29 | 29 | 34 | 30 | 31 | 30 |
| B | | | | | | | |
| 30 | 31 | 30 | 34 | 29 | 29 | 28 | 22 |
| 30 | 36 | 36 | 31 | 26 | 23 | 19 | 15 |
| 28 | 34 | 31 | 24 | 19 | 17 | 11 | 7 |
| 23 | 31 | 33 | 20 | 17 | 11 | 7 | 5 |
| 19 | 24 | 26 | 15 | 9 | 7 | 5 | 4 |
| 17 | 21 | 17 | 12 | 7 | 5 | 4 | 4 |
| 16 | 18 | 17 | 8 | 6 | 4 | 4 | 4 |
| 18 | 15 | 12 | 7 | 5 | 3 | 3 | 5 |
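
The relationship between the two 8 × 8 matrices above can be checked directly: (B) is (A) rotated by 180°, i.e., the diagonal flip described in Section 3.7. A minimal check, with the values transcribed from matrix (A):

```python
import numpy as np

# Matrix (A) as tabulated above; matrix (B) equals its 180-degree rotation.
A = np.array([
    [ 5,  3,  3,  5,  7, 12, 15, 18],
    [ 4,  4,  4,  6,  8, 17, 18, 16],
    [ 4,  4,  5,  7, 12, 17, 21, 17],
    [ 4,  5,  7,  9, 15, 26, 24, 19],
    [ 5,  7, 11, 17, 20, 33, 31, 23],
    [ 7, 11, 17, 19, 24, 31, 34, 28],
    [15, 19, 23, 26, 31, 36, 36, 30],
    [22, 28, 29, 29, 34, 30, 31, 30],
])
B = np.rot90(A, 2)   # reproduces matrix (B) row for row
```
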
| 119 | 139 | 119 | 224 | 102 | 102 | 87 | 34 |
| 119 | 308 | 308 | 139 | 63 | 39 | 21 | 11 |
| 87 | 224 | 139 | 46 | 21 | 16 | 6 | 4 |
| 39 | 139 | 191 | 25 | 16 | 6 | 4 | 3 |
| 21 | 46 | 63 | 11 | 5 | 4 | 3 | 3 |
| 16 | 29 | 16 | 7 | 4 | 3 | 3 | 3 |
| 13 | 18 | 16 | 4 | 3 | 3 | 3 | 3 |
| 18 | 11 | 7 | 4 | 3 | 3 | 3 | 3 |
| 5 | 3 | 3 | 5 | 7 | 12 | 15 | 18 |
| 5 | 5 | 7 | 14 | 30 | 30 | 30 | 30 |
| 5 | 6 | 8 | 20 | 30 | 30 | 30 | 30 |
| 7 | 8 | 17 | 30 | 30 | 30 | 30 | 30 |
| 14 | 20 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| 30 | 30 | 30 | 30 | 30 | 30 | 20 | 14 |
| 30 | 30 | 30 | 30 | 30 | 17 | 8 | 7 |
| 30 | 30 | 30 | 30 | 20 | 8 | 6 | 5 |
| 30 | 30 | 30 | 30 | 14 | 7 | 5 | 5 |
| 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 |
| 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 |
| 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 |
| 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 |
| 119 | 119 | 119 | 119 | 119 | 119 | 25 | 10 |
| 119 | 119 | 119 | 119 | 119 | 16 | 4 | 4 |
| 119 | 119 | 119 | 119 | 25 | 4 | 3 | 3 |
| 119 | 119 | 119 | 119 | 10 | 4 | 3 | 3 |
| Detector | Scene/Condition | File Size (% of Uncompressed) | Standard H.264 mAP | Proposed Method mAP | Difference |
|---|---|---|---|---|---|
| YOLOv8n | Overall (avg. all scenes) | 20% | 41.2 | 41.3 | +0.1 |
| YOLOv8n | Overall (avg. all scenes) | 50% | 41.5 | 41.6 | +0.1 |
| YOLOv8n | Overall (avg. all scenes) | 80% | 41.7 | 41.8 | +0.1 |
| YOLOv8n | Stone wall (fine details) | 50% | 41.4 | 41.5 | +0.1 |
| YOLOv8n | Reflective floor | 50% | 41.3 | 41.4 | +0.1 |
| YOLOv8n | Air-conditioner grid | 50% | 41.2 | 41.4 | +0.2 |
| YOLOv11n | Overall (avg. all scenes) | 20% | 43.9 | 44.0 | +0.1 |
| YOLOv11n | Overall (avg. all scenes) | 50% | 44.2 | 44.3 | +0.1 |
| YOLOv11n | Overall (avg. all scenes) | 80% | 44.4 | 44.5 | +0.1 |
| YOLOv11n | Stone wall (fine details) | 50% | 44.1 | 44.2 | +0.1 |
| YOLOv11n | Reflective floor | 50% | 44.0 | 44.1 | +0.1 |
| YOLOv11n | Air-conditioner grid | 50% | 43.9 | 44.1 | +0.2 |
| EfficientDet-Lite2 | Overall (avg. all scenes) | 20% | 39.1 | 39.2 | +0.1 |
| EfficientDet-Lite2 | Overall (avg. all scenes) | 50% | 39.4 | 39.5 | +0.1 |
| EfficientDet-Lite2 | Overall (avg. all scenes) | 80% | 39.6 | 39.7 | +0.1 |
| EfficientDet-Lite2 | Stone wall (fine details) | 50% | 39.3 | 39.4 | +0.1 |
| EfficientDet-Lite2 | Reflective floor | 50% | 39.2 | 39.3 | +0.1 |
| EfficientDet-Lite2 | Air-conditioner grid | 50% | 39.1 | 39.3 | +0.2 |
| Configuration | Compression Ratio (%) 1 | Avg. Encoding Time per Frame (ms) | Transmission Latency (ms) | mAP@50:95 (avg. Across Detectors) |
|---|---|---|---|---|
| Standard H.264 (baseline) | 0 | 12.04 | 138 | 41.7 |
| mFDCT only (FDCT modification) | 25 | 12.05 | 110 | 41.7 |
| mFDCT + Diagonal Flipping | 50 | 12.07 | 85 | 41.7 |
| mFDCT + Diagonal Flipping + Equation (3) | 66 | 12.11 | 72 | 41.6 |