Review

Eyes of the Future: Decoding the World Through Machine Vision

by Svetlana N. Khonina 1, Nikolay L. Kazanskiy 1, Ivan V. Oseledets 2,3, Roman M. Khabibullin 1,* and Artem V. Nikonorov 1

1 Samara National Research University, 443086 Samara, Russia
2 Artificial Intelligence Research Institute (AIRI), 105064 Moscow, Russia
3 Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
* Author to whom correspondence should be addressed.
Technologies 2025, 13(11), 507; https://doi.org/10.3390/technologies13110507
Submission received: 28 August 2025 / Revised: 25 September 2025 / Accepted: 29 October 2025 / Published: 7 November 2025
(This article belongs to the Section Information and Communication Technologies)

Abstract

Machine vision (MV) is reshaping numerous industries by giving machines the ability to understand what they “see” and respond without human intervention. This review brings together the latest developments in deep learning (DL), image processing, and computer vision (CV). It focuses on how these technologies are being applied in real operational environments. We examine core methodologies such as feature extraction, object detection, image segmentation, and pattern recognition. These techniques are accelerating innovation in key sectors, including healthcare, manufacturing, autonomous systems, and security. A major emphasis is placed on the deepening integration of artificial intelligence (AI) and machine learning (ML) into MV. We particularly consider the impact of convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformer architectures on the evolution of visual recognition capabilities. Beyond surveying advances, this review also takes a hard look at the field’s persistent roadblocks, above all the scarcity of high-quality labeled data, the heavy computational load of modern models, and the unforgiving time limits imposed by real-time vision applications. In response to these challenges, we examine a range of emerging fixes: leaner algorithms, purpose-built hardware (like vision processing units and neuromorphic chips), and smarter ways to label or synthesize data that sidestep the need for massive manual annotation effort. What distinguishes this paper, however, is its emphasis on where MV is headed next. We spotlight nascent directions, including edge-based processing that moves intelligence closer to the sensor, early explorations of quantum methods for visual tasks, and hybrid AI systems that fuse symbolic reasoning with DL, not as speculative futures but as tangible pathways already taking shape. Ultimately, the goal is to connect cutting-edge research with actual deployment scenarios, offering a grounded, actionable guide for those working at the front lines of MV today.

1. Introduction

The roots of machine vision (MV) lie in the late 1940s and early 1950s, when early research in artificial intelligence (AI) and image analysis first began. Much of this foundational work was driven by U.S. military objectives [1]. Practical applications started to emerge in the 1960s. A significant breakthrough occurred in the 1970s, when researchers at the Massachusetts Institute of Technology (MIT) built an image-processing system that could control a robotic arm. By the 1980s, gains in algorithmic efficiency allowed MV to be deployed at industrial scale for the first time [2]. This era also saw the introduction of key techniques such as photostereoscopy, shape from shading, and shape from focus. These methods improved image interpretation by exploiting lighting variations and depth information [1]. The 1990s marked another major advance with the advent of integrated circuit technology. This innovation enabled the creation of smart cameras capable of performing image processing independently [3,4,5].
Concurrently, David Marr’s computational theory of vision laid the groundwork for understanding the transformation of visual inputs into meaningful representations through computation, algorithms, and hardware [6]. In the 21st century, advances in edge detection, stereoscopic vision, and 3D modeling have significantly sharpened the capabilities of MV systems, cementing their role in modern manufacturing, particularly for high-precision tasks such as quality assurance, flaw identification, and automated visual inspection [1].
Today, MV stands as a catalyst for transformation across numerous domains, from industrial automation and healthcare to security infrastructure and autonomous mobility [7,8,9]. Its strength lies in turning unprocessed pixels into meaningful, actionable insights, thereby boosting operational efficiency, measurement consistency, and the reliability of decisions in complex, real-world environments. This growing reliance stems from MV’s clear edge over human vision: it doesn’t tire, it doesn’t “drift” (become blurry or lose sharpness), and it operates at speeds and levels of precision that are simply unattainable manually. These qualities have made it indispensable in today’s tech-driven landscape [10]. In manufacturing and quality control, MV is indispensable: automated inspection systems integrating high-resolution cameras with intelligent algorithms reliably identify defects, verify geometric specifications, and maintain product uniformity at levels unattainable by manual methods [11]. Human inspection, by contrast, is inherently susceptible to fatigue and variability, while MV systems operate continuously with negligible drift. Robotic assembly lines further exploit MV for real-time spatial feedback, enhancing throughput and lowering operational costs [12].
The healthcare sector has undergone profound change through MV’s integration into medical imaging, diagnostic screening, and robot-assisted surgery [13,14,15]. DL-powered vision models now routinely interpret X-rays, MRIs, and CT scans to enable early diagnosis of conditions such as cancer and diabetic retinopathy [16]. By automatic monitoring of histopathological workflows, MV reduces diagnostic uncertainty and supports improved clinical outcomes. Surgical robots, in turn, rely on real-time vision systems to guide minimally invasive interventions with exceptional accuracy [17].
In autonomous driving, MV serves as a foundational perception modality, allowing vehicles to robustly interpret complex environments [18]. Augmented by sensors like LiDAR, infrared cameras, and radar, MV systems execute core tasks, including object detection, lane tracking, pedestrian identification, and collision avoidance [19]. When fused with DL algorithms, these systems support instantaneous driving decisions, contributing to safer and more efficient mobility. Simultaneously, heightened security demands have driven rapid adoption of MV in surveillance [20]. Modern vision-based security platforms deploy facial recognition, anomaly detection, and crowd behavior analysis to strengthen public safety and support law enforcement [21]. In security applications, AI-powered vision systems can spot potential threats as they unfold, offering real-time alerts that aid both proactive intervention and forensic analysis after an incident [22]. Yet for all this progress, significant hurdles endure. Chief among them is the heavy dependence on vast, manually labeled datasets to train DL models. This requirement hampers scalability and limits how well systems generalize beyond their training conditions. Compounding this, the real-time analysis of high-resolution imagery demands considerable computing power, which often rules out deployment on lightweight edge platforms. Equally pressing is the need for greater resilience: current models remain vulnerable to adversarial perturbations and can falter under everyday variations in lighting, weather, or scene composition [23].
In the “MV for Industry and Automation 2021” report, Yole Development estimated that CMOS image sensors (CIS) constitute over 86% of the industrial camera market [24]. The integration of advanced imaging modalities, such as three-dimensional (3D) technology and multispectral imaging, has expanded MV applications across various industrial sectors (Figure 1). This technological diversification enhances manufacturing precision in consumer electronics and automotive industries, thereby escalating the demand for MV solutions. Moreover, global industrial advancements, including Industry 4.0 initiatives, are accelerating automation processes and further propelling the evolution of MV technologies [25].
The development of MV systems is no longer limited to advancements in computer vision (CV) algorithms or software engineering. Instead, some of the most exciting progress is happening at the intersection of MV with fields like materials science, photonics, and robotics. For example, researchers are now using metasurfaces (MSs) and diffractive optical elements (DOEs) to create compact, highly specialized sensors that can respond to specific wavelengths of light. At the same time, integrating MV with soft robotics and microelectromechanical systems (MEMS) is creating new opportunities to develop vision systems that are both adaptive and flexible. Such systems can function reliably even in unpredictable or unstructured environments. This cross-disciplinary effort may lead to smart, reconfigurable platforms capable of adjusting their optical and computational parameters on the fly. This capability simply did not exist in earlier generations of vision technology. As industrial demands grow for MV solutions that are more portable, precise, and versatile, this research direction is expected to attract increasing attention and investment.
One of the most promising avenues for MV advancement lies in its convergence with neuromorphic computing, a field modeled on the architecture and function of the human brain [26]. In contrast to traditional digital processors, neuromorphic chips, such as Intel’s Loihi and IBM’s TrueNorth, rely on spiking neural networks (SNNs) that mimic the behavior of biological neurons. This approach supports visual processing that is not only faster and more adaptive but also significantly more energy efficient. Such capabilities are especially valuable for MV applications in resource-constrained settings, including edge devices, autonomous robotics, and intelligent surveillance systems. Neuromorphic vision sensors, such as event-based cameras, enhance this approach by recording only dynamic changes in a scene, which both lowers computational demands and increases responsiveness. The fusion of neuromorphic computing with MV not only enhances low-power, high-speed image analysis but also paves the way for self-learning vision systems capable of real-time adaptation to unpredictable conditions. This represents an essential milestone for next-generation AI-driven automation [27].
Quantum computing represents another emerging frontier for MV. Unlike classical computers, which process data in binary states (0s and 1s), quantum computers leverage quantum superposition and entanglement to perform complex computations exponentially faster [28]. This paradigm shift could enable MV to process high-dimensional image data in real-time, improving object recognition, pattern detection, and anomaly identification beyond current capabilities. Quantum algorithms, such as quantum-enhanced Fourier transforms and Grover’s search, hold the potential to revolutionize edge detection and noise filtering in MV applications, making them more robust against distortions and occlusions. Additionally, quantum sensors could enhance imaging in challenging environments, such as medical diagnostics and autonomous navigation in low-visibility conditions [29]. While quantum computing for MV remains in its early stages, its integration with AI and DL models is set to transform the future of visual perception, enabling faster, more efficient, and previously unattainable MV capabilities.
The trajectory of MV points toward substantial advances, driven by both rapid technological evolution and its expanding footprint across industrial and scientific domains. One clear trend is the tighter coupling of AI and DL with vision pipelines, which enables systems to tackle sophisticated analytical problems while requiring far less labeled data than before. This, in turn, boosts their flexibility and efficiency. At the same time, 3D vision is gaining momentum, partly because stereovision hardware has become significantly more affordable; these opportunities allow machines to perceive depth and spatial layout with far greater fidelity than traditional 2D approaches [30]. Another key development is the move toward edge-based processing. Rather than sending every frame to a remote server, more systems now handle analysis right where images are captured, such as on cameras, robots, or embedded devices. This not only slashes latency but also eases pressure on network bandwidth and improves data security. Practical impacts are already visible: in the automotive industry, for instance, MV-powered diagnostic tools can now detect multiple mechanical or electrical faults in seconds, turning what used to be a time-consuming manual check into a near-instantaneous automated assessment. In clinical settings, innovations like digital surgical loupes offer surgeons enhanced visual clarity alongside real-time video capture, improving both intraoperative precision and post-procedure training. Together, these advances highlight MV’s evolving role as a core enabling technology. It continually pushes the limits of what machines can perceive, interpret, and act on across a broad range of applications.
The integration of AI into MV has greatly improved the accuracy and reliability of visual data analysis [31]. To achieve this, MV systems rely on a diverse set of algorithms, each designed to extract particular kinds of information from images. For instance, edge detection methods like Canny and Sobel are essential for identifying object boundaries. Pattern recognition and template matching are commonly used to detect and classify specific shapes or objects. Feature extraction techniques, such as Scale-Invariant Feature Transform (SIFT) [32] and Speeded-Up Robust Features (SURF) [33], allow systems to locate distinctive keypoints even under varying lighting conditions or changes in viewpoint. Meanwhile, optical flow algorithms analyze motion between consecutive frames, supporting critical functions like object tracking and navigation in autonomous systems. Deep learning (DL) algorithms, particularly convolutional neural networks (CNNs) [34], have revolutionized MV by enabling advanced image classification, segmentation, and anomaly detection with high accuracy. These diverse algorithms collectively power modern MV applications across industries. This fusion of AI and MV is revolutionizing automation and pushing the boundaries of what’s possible in visual perception [35]. Figure 2 summarizes the characteristics, applications, advantages, and disadvantages of MV, and the role of AI in MV.
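To make these building blocks concrete, the short Python sketch below strings together three of the classical operators named above: Canny edge detection, SIFT keypoint extraction, and Farneback dense optical flow between two consecutive frames. It is an illustration only, assuming OpenCV (opencv-python) is installed; the image file names are placeholders.

```python
# Minimal sketch of classical MV operators: Canny edges, SIFT keypoints,
# and dense optical flow. Assumes OpenCV and two consecutive grayscale
# frames; the file paths are placeholders.
import cv2

frame_prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
frame_curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# 1. Edge detection: Canny with empirically chosen hysteresis thresholds (50, 150).
edges = cv2.Canny(frame_curr, 50, 150)

# 2. Feature extraction: SIFT keypoints and descriptors (scale/rotation invariant).
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(frame_curr, None)

# 3. Motion analysis: Farneback dense optical flow between the two frames
#    (pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags).
flow = cv2.calcOpticalFlowFarneback(frame_prev, frame_curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

print(len(keypoints), "keypoints,", int((edges > 0).sum()), "edge pixels, flow shape", flow.shape)
```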
To ensure the review reflects the state of the art in MV, we followed a structured literature search and selection process. Relevant publications were identified through major databases, including Web of Science, Scopus, IEEE Xplore, PubMed, and Google Scholar, covering the period from 2010 to 2025, with earlier key works included where historically important. Search queries combined terms such as “machine vision” OR “computer vision” with application areas (manufacturing, healthcare, autonomous vehicles, agriculture, surveillance) and enabling technologies (explainable AI, neuromorphic, spiking, event-based, edge, embedded, quantum, benchmark, dataset).
We included papers that were published in peer-reviewed journals or established conference proceedings, presented methods or benchmarks relevant to machine vision, or reported applications in key domains such as healthcare, industry, and autonomous systems. For each selected work, we noted its domain, vision task, dataset or benchmark used, methodology (e.g., CNN, Transformer, SNN, quantum-hybrid), and hardware assumptions (edge, cloud, or accelerators), along with reported performance. This approach ensured that the comparative findings and trade-offs highlighted in later sections are grounded in a systematic and transparent review process.

2. Fundamentals of MV

CV constitutes a broad field within AI that enables machines to interpret and analyze visual data in diverse and often unstructured environments. It underpins a wide range of applications, from facial recognition and augmented reality to medical imaging [36,37,38,39]. MV, in contrast, is a specialized branch of CV, tailored for controlled, task-specific settings, particularly in industrial automation. In these settings, integrated systems combine cameras, sensors, illumination, and purpose-built algorithms to achieve consistent, high-precision outcomes [25]. In essence, CV supplies the theoretical and algorithmic underpinnings, whereas MV translates these capabilities into reliable, real-world implementations. An MV system, often referred to as an automated inspection system, comprises several interdependent components that collectively fulfill its operational objectives (Figure 3). A solid grasp of these foundational elements is crucial for the effective deployment and optimization of MV technologies.

2.1. Basic Components of MV

A typical MV system comprises four core elements: cameras, sensors, lighting, and software, each playing a distinct role in determining overall system performance. Cameras acquire high-resolution imagery. Sensors supply supplementary data that is essential for contextual interpretation. Lighting is engineered to maximize contrast and feature visibility. Finally, software executes the analysis and interpretation of visual input to support accurate, automated decision-making. These components and their interplay are detailed in the following discussion.

2.1.1. Cameras

Cameras serve as the system’s eyes, capturing high-resolution images of objects or environments [40]. Unlike consumer cameras, MV cameras are built for precision, speed, and durability. They come in various types, such as monochrome, color, infrared, and 3D cameras, selected based on application needs like defect detection, barcode reading, or object recognition [41,42]. Traditional digital cameras are constrained by their reliance on image and video formats inherited from film technology, limiting their ability to capture rapid changes in light.
To overcome this, Huang et al. introduced vform, a bit-sequence array in which each bit indicates whether photon accumulation has reached a threshold [43]. This method enabled precise recording and reconstruction of scene radiance at any moment. Utilizing standard complementary metal–oxide semiconductor (CMOS) sensors and integrated circuits, they developed a spike camera that operates nearly 1000× faster than conventional frame-based cameras [43]. Compared to traditional CMOS/CCD devices, which are typically limited to several thousand frames per second, the spike camera achieves microsecond-level temporal resolution, capturing changes that would otherwise be lost at millisecond-scale frame rates.
By interpreting vform as spike trains, similar to those found in biological vision, researchers created a spiking neural network (SNN)-based MV system. This integration combined computational speed with biologically inspired mechanisms, enabling ultra-fast object detection and tracking at rates exceeding human vision [43]. Huang et al. referred to this integrated framework as a “super vision system,” highlighting its ability to merge high-speed imaging with neural-inspired processing.
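The accumulate-and-fire principle behind vform can be illustrated with a toy NumPy model: each pixel integrates incoming intensity and emits a bit whenever a threshold is crossed. This is only a conceptual sketch with synthetic values, not the implementation reported by Huang et al.

```python
# Conceptual illustration of threshold-triggered bit ("spike") generation,
# loosely following the accumulate-and-fire idea described for vform above.
# The photon-flux values and threshold are synthetic, not from the cited work.
import numpy as np

rng = np.random.default_rng(0)
H, W, T = 4, 4, 20                        # tiny sensor: 4x4 pixels, 20 time steps
flux = rng.uniform(0.0, 1.0, (T, H, W))   # stand-in for per-step photon arrivals
threshold = 2.0

accumulator = np.zeros((H, W))
spikes = np.zeros((T, H, W), dtype=np.uint8)   # the bit-sequence array

for t in range(T):
    accumulator += flux[t]                     # integrate incoming light
    fired = accumulator >= threshold           # pixels that reached the threshold
    spikes[t][fired] = 1                       # emit a "1" bit at this instant
    accumulator[fired] -= threshold            # reset fired pixels, keeping the residue

# A pixel's firing rate tracks its average brightness, so scene radiance can be
# reconstructed from spike counts over any chosen time window.
print(spikes[:, 0, 0], spikes.sum(axis=0))
```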
The potential of this technology was demonstrated in two experimental setups: an auxiliary referee system and a high-speed target tracking system (Figure 4a–g) [43]. In sports such as tennis and badminton, precise ball tracking is crucial. Traditional Hawk-Eye systems estimate impact points based on trajectory, which can lead to disputes and are costly. Using a table tennis ball machine, the spike camera’s continuous event-based imaging captured the exact moment of impact, providing accurate and reliable rulings (Figure 4a,b).
A second experiment tested high-speed target tracking with a 2400 rpm rotating fan whose blades carried the characters “P,” “K,” and “U”. The task required detecting and tracking moving objects, recognizing and locating the target, and predicting its motion to fire a laser at the correct moment. The super vision system executed this task in real time, as shown by the before/after comparison of the fan (Figure 4d), the spike train outputs (Figure 4e), and the smooth multi-object tracking results (Figure 4f). Performance evaluation demonstrated successful tracking of objects moving at 30 m/s within 0.75 m, an aircraft at Mach 1 within 10 m, and even a hypersonic object at Mach 100 within 1 km (Figure 4g) [43].
These findings have profound implications. Unlike conventional frame-based vision systems, which reconstruct motion through interpolation or trajectory estimation, the spike camera records visual events directly, with temporal precision on the order of microseconds. This approach redefines the very foundations of imaging and video capture, opening the door to a new class of SNN-driven MV systems capable of transformative impact in domains ranging from high-speed cinematography to professional photography and immersive visual media [43].
Yang et al. presented a 3D reconstruction system that combines binocular and depth cameras to enhance the precision of object distance measurements and 3D reconstruction (Figure 5) [44]. The system consisted of two identical color cameras, a time-of-flight (TOF) depth camera, an image processing unit, a mobile robot control unit, and a mobile robot. The TOF depth camera, although useful for measuring distances, has low resolution, making it inadequate for accurate trajectory planning. On the other hand, while binocular stereo cameras offered high resolution, they face challenges in stereo matching, especially in low-texture environments, which affects their overall accuracy. To address these issues, the system integrated both depth camera data and stereo matching techniques to improve 3D reconstruction accuracy. Additionally, a dual-thread processing approach was used to boost system efficiency. Experimental results indicated that the system enhances the accuracy of 3D reconstruction, reliably measures distances, and effectively supports trajectory planning [44].
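For readers unfamiliar with the binocular side of such a pipeline, the following sketch shows the standard recipe: compute a disparity map with semi-global block matching and convert it to metric depth via Z = f·B/d. The focal length, baseline, and image paths are placeholder values for a rectified stereo pair; the cited system additionally fuses TOF depth data and uses dual-thread processing.

```python
# Minimal sketch of binocular depth estimation with OpenCV's StereoSGBM.
# Focal length, baseline, and image paths are placeholder values for a
# rectified stereo pair; not the calibration of the cited system.
import cv2
import numpy as np

left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,            # smoothness penalties (common heuristic)
    P2=32 * 5 * 5,
)

# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

focal_px = 700.0             # focal length in pixels (placeholder)
baseline_m = 0.12            # camera baseline in meters (placeholder)

valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]   # Z = f * B / d

print("median scene depth [m]:", np.median(depth_m[valid]))
```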

2.1.2. Sensors

Sensors complement cameras by detecting object presence, position, and movement [45]. They ensure accurate image acquisition, using technologies like proximity sensors, laser sensors, and time-of-flight sensors to enhance data accuracy [11]. Advancements in printed circuit board (PCB) production have led to an increased density of surface-mounted components. Consequently, the electronics industry has intensified efforts to refine inspection protocols, increasingly turning to automation on production lines. In this context, MV has become a cornerstone of quality assurance, directly supporting pass/fail decisions for components that fall short of required specifications. Silva et al. proposed a hybrid smart-vision inspection system that fuses MV methodology with dedicated vision sensor technology to simultaneously evaluate 24 discrete components and eight screw threads [46]. Developed specifically to bolster inspection reliability in automotive assembly, the setup paired a standard camera with a CMOS color vision sensor to acquire real-time images of assembly fixtures. The approach delivered high accuracy even under the demanding conditions typical of industrial shop floors, confirming its real-world applicability. It was particularly adept at revealing hidden failure modes, with optimal results obtained using Vision Builder for automated inspection. Furthermore, embedding the system into the quality workflow led to measurable improvements in the FMEA process, notably a clear reduction in action priority scores [46].
Today, MV solutions are deployed across a broad spectrum of industries, primarily to monitor and validate the consistency of manufacturing processes. While these systems can automatically record assembly states and pull out key performance indicators, getting them up and running is rarely straightforward. Implementation typically involves lengthy setup phases, including careful calibration, iterative tuning, and manual configuration. These steps often require deep domain knowledge. As a result, deployments frequently stretch over weeks or even months and remain heavily dependent on specialist involvement, which poses a real obstacle for smaller firms or those with limited technical resources. Compounding the issue, most MV deployments are tightly scoped to specific tasks, meaning any change in product design, operating environment, or process parameters typically triggers a full reconfiguration or redevelopment cycle. To mitigate these bottlenecks, Gierecker et al. put forward a simulation-driven process chain aimed at simplifying both the setup and commissioning of MV systems [47]. The proposed method combined established sensor planning algorithms with innovative techniques for generating training data and performing detailed analyses tailored to assembly processes [47].
MV systems rely on various types of sensors to capture and process visual data accurately. These sensors play a critical role in enhancing image quality, detecting objects, measuring distances, and analyzing materials. The choice of sensor depends on the specific application, required accuracy, and environmental conditions. Table 1 outlines the key characteristics of different types of sensors used in MV.

2.1.3. Lighting

Illumination in MV is essential for enhancing image contrast, accuracy, and detail detection [59]. Effective lighting selection depends on the material being examined, the light source’s characteristics, and the geometry of the system [60]. For consistent performance, LED lighting is preferred due to its energy efficiency and long lifespan. The proper light direction, intensity, and wavelength are crucial to optimize image clarity and minimize errors in the vision system’s analysis. Choosing the right illumination prevents defects and improves the overall reliability of the vision system. Kumar et al. examined the impact of different single-color LED lights on MV for estimating the surface roughness of 3D printed parts [61]. An Artificial Neural Network (ANN) was employed to predict roughness values based on gray-level co-occurrence matrix (GLCM) surface texture features. The predicted roughness values showed a strong correlation with conventional Ra values, especially when blue illumination was used. This suggested that the intensity of different colors of light influences gray level distribution, affecting texture analysis. The experiment, carried out under static conditions using a single LED source, suggested that combining multiple LED colors could produce more consistent contrast across the surface. Expanding this approach by deploying several light sources with varied wavelengths may offer richer insight into how the spectral properties of illumination affect surface roughness measurements. The study clearly emphasized that the choice of illumination color is not incidental but central to achieving dependable texture quantification in MV applications [61].
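The general GLCM-to-roughness workflow described above can be sketched in a few lines: extract co-occurrence texture statistics from grayscale surface patches and fit a small regressor to measured Ra values. The snippet below uses scikit-image and a scikit-learn MLP as a stand-in for the ANN; the patches and Ra labels are synthetic placeholders rather than data from the cited study.

```python
# Sketch of the GLCM-texture-to-roughness idea: extract co-occurrence
# features from grayscale surface patches and regress Ra with a small MLP.
# Patches and Ra labels below are synthetic placeholders.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neural_network import MLPRegressor

def glcm_features(patch_u8):
    """Contrast/homogeneity/energy/correlation from a gray-level co-occurrence matrix."""
    glcm = graycomatrix(patch_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

# Synthetic stand-ins: 40 random 64x64 patches and dummy Ra values in micrometres.
rng = np.random.default_rng(1)
patches = rng.integers(0, 256, (40, 64, 64), dtype=np.uint8)
ra_values = rng.uniform(1.0, 12.0, 40)

X = np.stack([glcm_features(p) for p in patches])
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, ra_values)

print("predicted Ra for first patch:", model.predict(X[:1]))
```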
When designing MV systems, lighting isn’t an afterthought. It’s a core determinant of image quality. Different illumination strategies serve distinct functional purposes: some enhance edge sharpness, others maximize contrast, and still others ensure even lighting across complex geometries. Table 2 summarizes the main types of lighting commonly used in MV, along with their operational characteristics and typical applications. In addition to selecting the appropriate lighting type, engineers must also consider light direction, intensity, and spectral composition in relation to the target’s color. These factors have a direct impact on image quality and determine how effectively the vision system can capture the features of interest.

2.1.4. Software

Software processes captured images using image processing algorithms and AI techniques. It extracts relevant features, applies pattern recognition, and makes decisions. MV software includes tools for image enhancement, segmentation, edge detection, and DL-based classification [62]. Advancements in computer image recognition have transformed industries such as healthcare, security, and autonomous systems. Huang et al. focused on enhancing recognition accuracy and efficiency by refining image processing algorithms, particularly through regression methods [63]. Various regression techniques and their applications in image recognition were analyzed, supported by data-driven examples. Additionally, the research addressed challenges in processing visual data from outdoor, unstructured environments. By standardizing heterogeneous patterns and extracting relevant features from fused data, recognition performance was significantly improved. Simulation results confirmed enhanced perception and identification capabilities in complex outdoor settings. Moreover, automated vision inspection is essential in computer-integrated manufacturing systems. Huang et al. compared two approaches for developing an MV inspection system: conventional image processing algorithms and neural networks. A case study was conducted to assess their performance [64]. Conventional methods had the advantage of faster setup, but neural networks demanded considerable effort in data preparation and training. Despite this overhead, they consistently outperformed traditional algorithms in terms of accuracy, particularly in inspection scenarios where subtle visual distinctions matter. This makes neural networks particularly well suited for high-stakes, high-precision applications. Table 3 summarizes the main contrasts between classical image processing and contemporary AI-based approaches, highlighting their respective strengths and listing widely used algorithms for each.

2.2. Comparison with Human Vision

MV differs from human vision in several fundamental ways, particularly regarding speed, precision, spectral sensitivity, and cognitive processing (Figure 6) [75]. MV systems can process visual information far more rapidly than humans, making them especially effective for real-time quality control [76]. They do not experience fatigue, which ensures stable, repeatable performance over long operating cycles. Additionally, they can detect extremely fine details that are often imperceptible to the human eye. When combined with high-resolution cameras and sophisticated image processing, MV systems routinely achieve micrometer-level accuracy in applications such as defect detection and dimensional metrology [77].
Another key distinction lies in the range of detectable wavelengths. Human vision is confined to the visible spectrum, whereas MV systems can utilize infrared, ultraviolet, and even X-ray imaging to uncover features that remain hidden under normal light. This expanded spectral capability is widely exploited in medical diagnostics, security screening, and non-destructive evaluation of materials. That said, human vision possesses a significant advantage in contextual understanding. It draws on years of experience, semantic knowledge, and intuitive reasoning. These are capabilities that current MV systems do not yet replicate. In contrast, MV relies on predefined algorithms and AI models, which, while powerful, lack intuitive understanding. Machines excel in structured environments but struggle with unpredictable scenarios where human intuition is needed.

2.3. Role of AI and DL in MV

AI and DL have significantly enhanced MV’s capabilities, enabling complex and adaptive visual analysis. Unlike traditional rule-based approaches, AI-driven models learn and improve over time [55,78].

2.3.1. Feature Extraction and Classification

Feature extraction and classification are foundational to many computer vision (CV) tasks. Deep learning (DL) models, especially convolutional neural networks (CNNs), have transformed these processes [79]. In the past, feature extraction relied on manual engineering: domain experts had to select relevant visual attributes, such as edges, textures, or shapes, based on their understanding of the problem. This approach was labor-intensive, required specialized knowledge, and frequently overlooked subtle but meaningful patterns in the data.
CNNs overcome these limitations by automatically learning hierarchical representations directly from raw input [34,80]. As an image moves through the network, early layers detect basic structures like edges and gradients. Deeper layers then combine these into more complex and abstract features, such as object parts, full shapes, or scene context, enabling robust recognition [81]. This hierarchical learning directly supports applications such as plant stress detection, where subtle discolorations or edge deformations captured by early layers evolve into stress-related phenotypes in deeper layers, and fruit grading, where surface texture, color, and shape features are combined to classify produce quality. By linking low-level to high-level features, CNNs enable robust performance across diverse MV applications.
This automatic feature extraction is particularly beneficial in tasks that involve large amounts of visual data, such as defect detection in manufacturing, facial recognition systems, and object classification in various domains like autonomous vehicles and healthcare. In defect detection, for example, CNNs can learn to recognize subtle flaws in products that may be difficult for human inspectors to spot [82]. In facial recognition, these models can extract distinctive facial features and match them across different images with high accuracy, even under challenging conditions like variations in lighting or angle [83]. Object classification, whether it’s categorizing animals in photos or identifying products in a retail setting, also greatly benefits from CNNs, which can rapidly and efficiently identify the relevant features for classification [84]. By automating feature extraction and classification, DL models significantly enhance both the speed and reliability of visual analysis, rendering them essential across a wide range of industrial applications.
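A minimal PyTorch sketch illustrates this hierarchy: stacked convolutional layers learn progressively more abstract features, which a linear head then classifies. The layer widths and ten-class output are arbitrary choices for illustration, not a model from the cited works.

```python
# Minimal PyTorch sketch of hierarchical feature learning in a CNN:
# early conv layers respond to edges/gradients, deeper layers to composite
# patterns, and a linear head performs the final classification.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # low-level edges
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # mid-level motifs
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                                                  # high-level summary
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x).flatten(1)   # (batch, 64) learned representation
        return self.classifier(feats)

logits = TinyCNN()(torch.randn(4, 3, 224, 224))   # 4 RGB images -> class scores
print(logits.shape)                               # torch.Size([4, 10])
```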

2.3.2. Pattern Recognition and Anomaly Detection

AI-powered vision systems employ advanced ML techniques, including CNNs and other DL architectures, to perform real-time pattern recognition and anomaly detection [85,86,87]. These systems process massive volumes of visual data, learning from typical examples to flag deviations that may signal defects, security breaches, or operational failures. In manufacturing, for example, AI-based quality inspection can evaluate thousands of items per minute, detecting sub-millimeter flaws with exceptional precision [88,89]. This reduces dependence on manual inspection, thereby increasing throughput, cutting costs, and minimizing human error. The utility of anomaly detection extends well beyond production lines: in finance and cybersecurity, AI models scrutinize transactional or network behavior to uncover fraudulent or malicious activity, as detailed in Aggarwal’s Outlier Analysis [90]. In healthcare, AI-assisted diagnostic tools enable radiologists to spot early indicators of pathologies such as cancer in medical images, leading to earlier intervention and better clinical outcomes [91,92]. Collectively, these use cases demonstrate AI’s capacity to elevate accuracy, operational efficiency, and decision quality across multiple domains.
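One common recipe consistent with this learn-from-typical-examples idea is reconstruction-based anomaly detection: an autoencoder is trained only on defect-free images, and unusually high reconstruction error at inference time flags a potential anomaly. The sketch below is a toy PyTorch illustration with synthetic data and an arbitrary threshold, not a production inspection model.

```python
# Sketch of reconstruction-based anomaly detection: a small autoencoder is
# trained only on "normal" images, and high reconstruction error at test
# time flags a potential defect. Data, sizes, and threshold are synthetic.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(                      # 32x32 grayscale in, same shape out
    nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(),
    nn.Linear(64, 32 * 32), nn.Sigmoid(), nn.Unflatten(1, (1, 32, 32)),
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

normal_batch = torch.rand(64, 1, 32, 32)          # stand-in for defect-free samples
for _ in range(200):                              # brief training loop
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(normal_batch), normal_batch)
    loss.backward()
    optimizer.step()

def anomaly_score(img: torch.Tensor) -> float:
    """Per-image mean squared reconstruction error."""
    with torch.no_grad():
        return loss_fn(autoencoder(img), img).item()

test_img = torch.rand(1, 1, 32, 32)
print("anomalous" if anomaly_score(test_img) > 0.05 else "normal")  # arbitrary threshold
```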

2.3.3. Autonomous Decision-Making

The fusion of MV and AI empowers machines to make autonomous decisions in robotics and industrial automation. Such systems can perceive their environment, interpret visual input, and execute context-appropriate actions without human oversight [93]. In manufacturing settings, industrial robots equipped with AI-driven MV navigate dynamic workspaces, recognize parts with high fidelity, and adjust their behavior in real time to accommodate process variations [94]. Automotive leaders, including Tesla, Mercedes, and BMW, deploy autonomous robotic arms that use MV to inspect assemblies, identify defects, and perform on-the-fly corrections during production [95,96].
In logistics, firms like Amazon and DHL utilize autonomous mobile robots (AMRs) that rely on AI-powered MV for warehouse navigation, parcel sorting, and adaptive route planning [97,98,99,100]. Similarly, in agriculture, autonomous drones combine MV and AI to assess crop conditions, detect plant diseases, and apply agrochemicals selectively, minimizing environmental impact while boosting yield efficiency [101,102]. These implementations underscore how AI-integrated MV is not only redefining current industrial practices but also laying the groundwork for the next generation of autonomous systems.

2.3.4. Adaptive Learning

AI-based MV systems continuously improve their accuracy by learning from new data [35,40]. This is especially beneficial in dynamic applications like traffic monitoring, medical diagnostics, and autonomous vehicles [36,103]. MV systems require adaptive mechanisms to process images under varying brightness conditions, yet conventional visual adaptive devices are hindered by slow adaptation speeds. To address this, Li et al. proposed a bionic two-dimensional (2D) transistor utilizing avalanche tuning as a feedforward inhibition mechanism, enabling rapid and high-frequency visual adaptation [104]. This approach achieved microsecond-level perception, surpassing the adaptation speed of the human retina and existing bionic sensors by more than 10,000 times. The bionic transistor dynamically transitions between avalanche and photoconductive effects in response to changes in light intensity, adjusting responsivity both in magnitude and polarity (from 7.6 × 10⁴ to 1 × 10³ A/W). This mechanism facilitated ultra-fast adaptation, with scotopic and photopic response times of 108 μs and 268 μs, respectively. By integrating this avalanche-tuned bionic transistor with CNNs, an adaptive MV system was developed capable of microsecond-level rapid adjustment. This system demonstrated exceptional performance, achieving over 98% accuracy in image recognition across both dim and bright lighting conditions [104].
Natural intelligence functions across multiple dimensions, with environmental learning and behavioral adaptation being fundamental aspects. Vision is particularly crucial in primates, where biological neural networks, composed of specialized neurons and synapses, process visual input while continuously adapting and learning with exceptional energy efficiency. Forgetting also plays a vital role in this process, enabling efficient information management.
Emulating these adaptive mechanisms in vision, learning, and forgetting could accelerate AI development and reduce the significant energy gap between artificial and biological intelligence. Dodda et al. introduced a bioinspired MV system based on a 2D phototransistor array constructed from large-area monolayer molybdenum disulfide (MoS2), paired with an analog, nonvolatile, and programmable memory gate-stack [105].
This architecture enabled dynamic learning and relearning from visual stimuli while maintaining adaptability in varying lighting conditions with minimal energy consumption. The resulting “all-in-one” vision platform integrated sensing, computation, and memory within a single device, effectively bypassing the von Neumann bottleneck that plagues conventional CMOS architectures and removing the need for external peripheral circuits or auxiliary sensors [105].

2.3.5. Edge Computing and Real-Time Processing

Edge computing is fundamentally changing how MV systems operate by moving computation away from remote data centers and into local hardware close to where images are captured. This approach directly tackles two long-standing bottlenecks in cloud-dependent architectures: communication delays and limited bandwidth [106,107]. Traditional MV workflows often shuttle raw image data offsite for analysis, which not only slows down decision-making but also raises concerns about the exposure of sensitive visual content. Edge computing sidesteps both problems by processing data right at the source, whether on smart cameras, IoT-enabled sensors, or embedded AI chips. This makes it feasible to act instantly in applications where every millisecond counts, such as self-driving cars, real-time medical diagnostics, industrial robotics, and intelligent surveillance [108,109]. Enabling this transition are AI models (particularly CNNs) that have been heavily optimized to run efficiently on dedicated accelerators like GPUs, TPUs, and FPGAs [110,111]. Further gains in efficiency come from lightweight inference engines like TensorFlow Lite and OpenVINO, which allow complex models to run on low-power edge devices without sacrificing responsiveness [112,113]. By reducing reliance on centralized infrastructure, edge-based MV not only improves system resilience and data confidentiality but also maintains functionality in bandwidth-limited or disconnected environments. As industries increasingly demand instant visual insight, the synergy between edge computing and MV will continue to fuel innovation in safety-critical automation and operational efficiency.
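As a concrete example of such on-device inference, the sketch below runs a hypothetical quantized defect classifier with the TensorFlow Lite interpreter, one of the lightweight runtimes mentioned above. The model file, input frame, and label list are placeholders.

```python
# Minimal sketch of on-device inference with TensorFlow Lite. The .tflite
# model path and label list are placeholders; a uint8-quantized image
# classifier is assumed. With a full TensorFlow install, tf.lite.Interpreter
# can be used instead of the standalone tflite_runtime package.
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="defect_classifier.tflite")
interpreter.allocate_tensors()
input_info = interpreter.get_input_details()[0]
output_info = interpreter.get_output_details()[0]

# A camera frame resized to the model's expected input (placeholder data here).
frame = np.random.randint(0, 256, size=input_info["shape"], dtype=np.uint8)

interpreter.set_tensor(input_info["index"], frame)
interpreter.invoke()                                  # runs entirely on the edge device
scores = interpreter.get_tensor(output_info["index"])[0]

labels = ["ok", "scratch", "dent"]                    # hypothetical classes
print("prediction:", labels[int(np.argmax(scores))])
```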
Monitoring biomass in fermenting mushroom liquid cultures requires uninterrupted, real-time analysis with minimal operator input, making intelligent, automated vision essential. To meet this need, Wu et al. introduced Edge CV, a compact MV system built on edge computing principles for in situ, non-invasive biomass estimation [114]. Built on the Jetson Nano platform (featuring 4 GB RAM, 64 GB ROM, and a 128-core Maxwell GPU), the system supports real-time execution of vision algorithms. Integrated cameras stream image data continuously, enabling fully automated monitoring without operator intervention. To achieve accurate biomass evaluation, a cascaded MV model was developed, consisting of three key steps: object detection to locate the observation window, segmentation to extract liquid strain data, and morphological image processing to compute mycelium biomass indices. By integrating edge computing with MV, Edge CV enhanced automation, reducing manual workload while improving efficiency and accuracy. This study demonstrated the practical potential of edge-based MV for real-time biomass monitoring during fermentation [114].
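The final, morphological stage of such a cascade can be illustrated conceptually: a binary segmentation mask is cleaned with opening and closing operations, and the covered-area fraction serves as a simple biomass index. The mask below is synthetic, and the index definition is illustrative rather than the one used in Edge CV.

```python
# Conceptual sketch of the morphological stage of a cascaded pipeline:
# clean a binary mycelium segmentation mask and report the covered-area
# fraction as a simple biomass index. Mask and index are illustrative only.
import cv2
import numpy as np

mask = (np.random.rand(240, 320) > 0.6).astype(np.uint8) * 255   # stand-in segmentation

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)          # remove speckle noise
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)      # fill small gaps

biomass_index = float((cleaned > 0).sum()) / cleaned.size         # fraction of window covered
print(f"biomass index: {biomass_index:.3f}")
```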

3. Applications of MV

MV is transforming how industries operate by giving machines the ability to “see” and act on what they see. By combining sophisticated imaging techniques with AI, MV systems now support smarter automation, more rigorous analysis, and better-informed decisions. The payoff is clear: sharper efficiency, more reliable measurements, and greater system stability. These benefits are being realized across fields as diverse as manufacturing, healthcare, transportation, security, and environmental monitoring.

3.1. Industrial Automation and Quality Control

In today’s factories, MV has become a backbone technology for fast, precise inspection and quality control [115,116,117,118]. Industries like automotive, electronics, and pharmaceuticals rely on it to catch surface defects or internal flaws, confirm that parts meet exact dimensional specs, and ensure consistent output from one production batch to the next [9]. By fusing cameras, sensors, and AI-driven algorithms, these systems perform real-time product evaluation, curbing human-induced variability and boosting throughput [119]. Vision-guided robotic arms further extend automation capabilities, executing intricate operations like component assembly, part sorting, and object manipulation directly on the production floor. The cumulative effect is higher product consistency, lower operational costs, and minimized unplanned downtime [25,120].
Retail logistics, however, still leans heavily on manual labor for shelf monitoring and restocking. This process is prone to inefficiency, staffing strain, and inventory inaccuracies. To counter this, Gao et al. developed an autonomous replenishment robot built around MV capabilities [121]. Equipped with the OpenMV vision module, the robot independently recognized stockouts, mapped product and obstacle positions, and collected critical inventory metrics. A custom Python-based path-planning algorithm allowed it to maneuver through store aisles and execute restocking tasks without human oversight. By automating stock monitoring and replenishment, this system enhanced inventory accuracy, reduced labor costs, optimized product placement, and improved overall operational efficiency and customer satisfaction [121].
Yang et al. focused on integrating MV technology into industrial automation assembly lines, emphasizing a visual inspection system based on an edge detection algorithm [122]. By applying edge detection in image processing, the system accurately determined workpiece position, geometry, and dimensions, thereby improving both automation and operational efficiency. The study proposed a complete MV framework encompassing image acquisition, preprocessing, feature extraction, and detection algorithms. Extensive simulations and experimental validation confirmed the system’s high accuracy and robustness in real industrial conditions, achieving a detection precision of 0.01 mm with overall system error held below 0.5%. The findings contributed to the advancement of intelligent industrial automation, offering a robust technical foundation for future developments [122].
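The core measurement idea (detect edges, isolate the workpiece contour, and convert pixel geometry to millimetres through a calibration factor) can be sketched as follows. The image path and pixel-to-mm factor are placeholders and do not reproduce the cited system's algorithms or precision figures.

```python
# Sketch of edge-based workpiece measurement: detect edges, extract the
# dominant contour, and report position, size, and orientation using a
# previously calibrated pixel-to-mm factor (assumed value below).
import cv2

MM_PER_PIXEL = 0.05                      # from a prior calibration target (assumed)

gray = cv2.imread("workpiece.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
workpiece = max(contours, key=cv2.contourArea)          # assume the part dominates the scene

(cx, cy), (w_px, h_px), angle = cv2.minAreaRect(workpiece)
print(f"position: ({cx:.1f}, {cy:.1f}) px, "
      f"size: {w_px * MM_PER_PIXEL:.2f} x {h_px * MM_PER_PIXEL:.2f} mm, "
      f"orientation: {angle:.1f} deg")
```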
Ali et al. proposed an intelligent framework for quality control and fault detection in manufacturing systems, encompassing both pre-production and post-production stages within Industry 4.0 [123]. Figure 7a depicts an MV system, while Figure 7b illustrates a comprehensive quality control framework integrating both phases. During the pre-production stage, vibration sensors mounted on the induction motor’s surface collected data from the gearbox via the motor shaft. This information was then transmitted through internet gateways to AI engines, where DL models assessed the condition of the motor’s gear system. The real-time condition of the system was displayed on a connected screen, while a database server continuously recorded historical data and fault occurrences [123].
By providing early fault detection, this system enabled engineers and operators to take proactive measures, reducing the risk of unexpected machine failures. For the post-production phase, depicted in the top left of the figure, an MV system was utilized to inspect the final product on the production line’s conveyor belt. Equipped with cameras and a specialized lighting setup, the system captured high-quality images of the products. These images were then analyzed by AI engines trained to distinguish between defective and non-defective items. Before making final classifications and fault predictions, the AI engines preprocessed both vibration and image data to enhance accuracy [123].

3.2. Medical Imaging and Diagnostics

MV is transforming healthcare, particularly in how medical images are interpreted and used to guide clinical decisions. It plays a critical role in analyzing X-rays, MRIs, CT scans, and ultrasounds to detect diseases such as cancer, fractures, and neurological disorders with greater accuracy [14]. AI-powered MV systems assist radiologists by identifying patterns in medical images, leading to earlier and more reliable diagnoses [91,124]. In addition, CV/MV is used in robotic-assisted surgeries, where real-time imaging guidance allows for precise, minimally invasive procedures. Laboratory automation also benefits from MV, as it speeds up the analysis of blood samples, tissue slides, and genetic data, improving diagnostic efficiency [14,125].
In recent years, DL-driven CV solutions have been developed for minimally invasive surgery by both academic researchers and industry professionals. These applications of CV span various tasks, from analyzing workflows to evaluating performance automatically. While similar digital solutions have already been scaled and clinically implemented for diagnostic use in fields like gastrointestinal endoscopy [126] and radiology [127], the use of CV in surgery remains underdeveloped.
In minimally invasive abdominal surgery, intraoperative bleeding is a major complication, often resulting from accidental damage to arteries or veins. A surgeon’s skill plays a crucial role in minimizing this risk. To enhance safety, Penza et al. developed the Enhanced Vision System for Robotic Surgery (EnViSoRS) (Figure 8), which incorporates a user-defined Safety Volume (SV) tracking system to reduce the likelihood of vessel injury [128]. EnViSoRS enhances a surgeon’s capabilities by providing Augmented Reality (AR) support during robotic-assisted procedures. Its framework consists of three key components: (i) the LT-SAT tracker, a hybrid algorithm that ensures long-term monitoring of the user-defined Safety Area (SA); (ii) a 3D reconstruction algorithm for dense soft tissue, essential for calculating the SV; and (iii) AR features that visualize the protected SV and display a graphical gauge indicating the distance between surgical instruments and the reconstructed surface.
The system was integrated with the dVRK robotic surgical platform for testing and validation. Simulated liver surgery on a phantom was used to evaluate accuracy, robustness, performance, and usability. Results confirmed that EnViSoRS achieved the required surgical accuracy (<5 mm) and reliably computed and identified the SV with high precision and recall. An optimization strategy improved computational efficiency, enabling AR feature updates at up to 4 frames per second without disrupting real-time stereo endoscopic video visualization. Usability tests further demonstrated seamless integration with commercial robotic surgical systems, highlighting its potential for real-world applications [128].

3.3. Autonomous Vehicles and Robotics

The advancement of autonomous vehicles and robotic systems heavily depends on MV technology. Self-driving cars utilize cameras, LiDAR, and AI-driven vision algorithms to perceive and interpret their environment [129]. These systems enable the detection of road signs, lane markings, pedestrians, and other vehicles, ensuring safe navigation and obstacle avoidance. Similarly, MV empowers industrial and service robots by providing object recognition, motion tracking, and navigation capabilities. In logistics, warehouse robots use vision-based guidance for sorting and transporting goods efficiently, while drones leverage vision systems for mapping, surveillance, and search-and-rescue operations [130,131]. Do et al. developed an omnidirectional vision system for a home service robot, focusing on cost-effectiveness by using readily available components [132]. The system, installed on a mobile robot controlled wirelessly via a PC, was designed for two primary functions: intruder detection and fire detection. For intruder detection, an adaptive background subtraction method was applied to analyze image sequences. In addition, a unique fire detection algorithm was introduced, which processes images through three distinct stages: pixel-level, block-level, and global-level analysis [132].
Grigorescu et al. outlined the development of the ROVIS MV architecture for service robotics, with a particular emphasis on the Model Driven Development (MDD) approach used in designing and implementing the vision system [133]. The development follows a structured approach, starting with the identification of ROVIS’s core requirements, followed by three key design phases: requirements analysis, system functional analysis, and architectural design. A shared-control framework was used to model the flow of information between the user and the vision system. The proposed architecture played a critical role in enabling the visual perceptual capabilities of the rehabilitation robot, FRIEND.
Wang et al. introduced a real-time active collision avoidance approach within an augmented environment, combining virtual 3D robot models with live camera feeds of operators for collision detection and monitoring [134]. A prototype system was developed and integrated with robot controllers, allowing adaptive control without the need for user programming. Upon detecting a potential collision, the system could alert the operator, halt the robot, or modify its trajectory to prevent impact. A case study confirmed the system’s practical efficacy in real-world scenarios, particularly in human–robot collaborative assembly, where it significantly improved operator safety [134].
The cold chain logistics sector has seen substantial growth in recent years, yet automation in this domain remains limited. Cold storage operations, in particular, require a careful trade-off between safety and operational efficiency. This balance is one that existing detection algorithms often struggle to maintain. To address this gap, Wei et al. proposed a carton recognition and grasping system for cold storage warehouses built on YOLOv5 [135]. The system incorporated a human–machine interface supporting both remote operation and fully autonomous grasping in refrigerated environments. Several improvements were made to the underlying algorithm: the integration of the CA attention mechanism enhanced accuracy, the Ghost lightweight module replaced the CBS structure to increase runtime efficiency, and the Alpha-DIoU loss function was employed to refine detection precision. These adjustments resulted in a 0.711% increase in mean Average Precision (mAP) and a 0.7% boost in frames per second (FPS), all while preserving detection accuracy.
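For orientation, the distance-IoU family of box losses referenced here penalizes both poor overlap and centre displacement; in its α-generalized form the loss is roughly 1 − IoU^α + (d²/c²)^α, where d is the distance between box centres and c is the diagonal of the smallest enclosing box. The NumPy sketch below follows this common formulation; the exact variant used by Wei et al. may differ.

```python
# Simplified NumPy sketch of a distance-IoU style box loss in its alpha-
# generalized form: 1 - IoU**a + (d^2/c^2)**a. Boxes are (x1, y1, x2, y2);
# this is a generic formulation, not the cited system's implementation.
import numpy as np

def alpha_diou_loss(box_pred, box_gt, alpha=3.0):
    # Intersection-over-union of the two boxes.
    x1, y1 = np.maximum(box_pred[:2], box_gt[:2])
    x2, y2 = np.minimum(box_pred[2:], box_gt[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_g = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # Squared distance between box centres, normalised by the squared diagonal
    # of the smallest enclosing box (the DIoU penalty term).
    centre_p = (box_pred[:2] + box_pred[2:]) / 2
    centre_g = (box_gt[:2] + box_gt[2:]) / 2
    d2 = np.sum((centre_p - centre_g) ** 2)
    enc_min = np.minimum(box_pred[:2], box_gt[:2])
    enc_max = np.maximum(box_pred[2:], box_gt[2:])
    c2 = np.sum((enc_max - enc_min) ** 2) + 1e-9

    return 1.0 - iou ** alpha + (d2 / c2) ** alpha

print(alpha_diou_loss(np.array([10., 10., 60., 60.]), np.array([15., 12., 70., 65.])))
```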
An experimental platform was established to evaluate the system’s performance. The host machine was equipped with an AMD Ryzen 7 5800H CPU, an NVIDIA GeForce RTX 3060 GPU, 16 GB of RAM, and 6 GB of video memory. A ZED 2i camera (Stereolabs Inc., Paris, France) with a polarizer and a 4 mm focal length lens was used for data capture. The system ran on Windows 10 with CUDA 11.6.134, and development was conducted in Python 3.9 using the PyTorch framework [135]. The experimental setup, illustrated in Figure 9a, included a PC, control system, camera, motor, slider, suction cup, telescopic rod, and target cartons for picking. The camera was centrally mounted above the parallel middle rail to maximize the field of view, as shown in Figure 9b. During detection, false positives referred to non-target objects mistakenly identified as targets, while false negatives indicated missed detections of cardboard boxes. A total of 200 images containing 1824 instances were analyzed, with false positive and false negative rates recorded. Experimental findings demonstrated that the CA attention mechanism improved fidelity by 2.32%, the Ghost module reduced response time by 13.89%, and the Alpha-DIoU loss function enhanced positioning accuracy by 7.14%. These optimizations collectively led to a 2.16% decrease in response time, a 4.67% increase in positioning accuracy, and an overall enhancement in system performance [135].

3.4. Security and Surveillance

MV is widely used in security and surveillance to enhance safety, threat detection, and crime prevention [22,136,137]. Facial recognition systems powered by MV enable identity verification in high-security areas such as airports, government buildings, and financial institutions. AI-driven video analytics can monitor live feeds, detecting suspicious activities, unauthorized intrusions, or unattended objects in real time. In traffic management, MV systems analyze vehicle movement, identify traffic violations, and optimize road safety. Additionally, night vision and thermal imaging technologies extend surveillance capabilities to low-light or adverse weather conditions, improving security effectiveness in various environments [138].
Nigam et al. developed and implemented the MV Surveillance System AI (MaViSS-AI) to enforce COVID-19 guidelines using the Jetson Nano platform [139]. Designed for cost-effectiveness, accuracy, efficiency, and security, the system monitored compliance through two key functions: tracking and counting individuals to assess social distancing and detecting face masks using object detection techniques. YOLO (You Only Look Once) was employed for person detection and counting, ensuring real-time monitoring and enforcement. To ensure social distancing, the system calculated the distance between the centroids of individuals, flagging any violations when the threshold was exceeded. Mask detection was accomplished using a YOLO V4 DL model. Additionally, the system was capable of raising alerts for suspicious events, allowing security personnel to respond promptly [139].
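The distancing check itself reduces to a pairwise distance test on detected person centroids, as in the following sketch. The coordinates and pixel threshold are illustrative; a deployed system would calibrate pixels to metres using the camera geometry.

```python
# Sketch of a social-distancing check: given person-detection centroids from
# a detector such as YOLO, flag pairs closer than a threshold. The centroid
# coordinates and pixel threshold below are illustrative values.
import numpy as np
from itertools import combinations

centroids_px = np.array([[120, 340], [180, 360], [640, 300], [655, 320]])  # (x, y) per person
MIN_DISTANCE_PX = 150   # threshold in pixels; a real system calibrates this to metres

violations = []
for (i, a), (j, b) in combinations(enumerate(centroids_px), 2):
    if np.linalg.norm(a - b) < MIN_DISTANCE_PX:
        violations.append((i, j))

print("violating pairs:", violations)   # e.g. [(0, 1), (2, 3)]
```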
Traffic conditions are shaped not only by infrastructure such as signals and road layouts but also by the behavior of drivers, which is often overlooked. Conventional traffic control systems have difficulty with challenges like adjusting green light timing or identifying vehicles making illegal turns. To address this, Khan et al. [140] developed a self-adaptive, real-time system that blends image processing with ML to improve traffic flow at intersections. Their approach applied the YOLOv3 model for vehicle detection and used neural networks for monitoring traffic activity. The system tracked vehicle centroids (centers of mass) to reconstruct individual paths and flag those straying outside allowed lanes or turning illegally. During evaluation, it reached 88.43% accuracy in detecting vehicles and 90.45% accuracy in identifying prohibited maneuvers and reckless driving. Adding a CNN further sharpened its performance in dense, multi-lane intersections, helping to ease congestion and improve traffic safety.
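Reconstructing individual vehicle paths from per-frame detections can be illustrated with a greedy nearest-neighbour centroid tracker, sketched below. The association rule and the pixel-displacement gate are simplifying assumptions rather than the tracker used by Khan et al.; the resulting per-track centroid paths could then be tested against lane polygons to flag illegal turns.

import numpy as np

class CentroidTracker:
    """Greedy nearest-neighbour association of detection centroids across frames."""

    def __init__(self, max_jump=80.0):
        self.max_jump = max_jump      # maximum pixel displacement allowed between frames
        self.next_id = 0
        self.tracks = {}              # track_id -> list of (x, y) centroids

    def update(self, centroids):
        centroids = [np.asarray(c, dtype=float) for c in centroids]
        unmatched = set(range(len(centroids)))
        for track_id, path in self.tracks.items():
            if not unmatched:
                break
            last = path[-1]
            # Pick the closest unmatched detection for this track
            j = min(unmatched, key=lambda k: np.linalg.norm(centroids[k] - last))
            if np.linalg.norm(centroids[j] - last) <= self.max_jump:
                path.append(centroids[j])
                unmatched.discard(j)
        # Remaining detections start new tracks
        for j in unmatched:
            self.tracks[self.next_id] = [centroids[j]]
            self.next_id += 1
        return self.tracks

# Example: two vehicles tracked across two consecutive frames
tracker = CentroidTracker()
tracker.update([(100, 200), (400, 220)])
print(tracker.update([(110, 205), (395, 230)]))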
Rapid urban expansion, driven by economic growth and technological change, has led to a sharp rise in construction worldwide. But frequent on-site accidents point to deeper problems: inadequate hazard recognition, inconsistent supervision, and poor adherence to safety protocols. To tackle this, Zhang et al. [141] developed an AI-powered safety system that uses MV for round-the-clock, real-time monitoring of construction sites. By combining AI-based object detection with spatial interaction analysis, the system interprets dynamic on-site conditions and identifies recurring accident patterns. A dedicated monitoring and early-warning platform was developed to automatically detect hazardous scenarios and trigger preventive alerts before incidents occur. Tests showed that this approach significantly enhanced construction site safety management, with reported gains of 97.4% in management efficiency, raising both compliance and worker protection standards.
Figure 10 shows the framework of the proposed system, which operates in three stages. First, monitoring equipment is installed at the construction site to capture worker activity. Sensors record movement data, such as speed and angle, which are then used to build a large dataset of worker behaviors. From this data, models for behavior perception and activity tracking are generated and stored on smart devices, enabling continuous and adaptive safety monitoring throughout the project lifecycle. The second stage focuses on application and implementation, where the system’s built-in recognition unit processes worker activity signals and classifies them as either “safe” or “unsafe.” Based on this assessment, alerts or notifications are sent to both employees and the management center when unsafe behavior is detected. Employees receive corrective guidance, while managers conduct on-site evaluations of behavior-based safety (BBS) risks and implement targeted safety management strategies. The final stage involves continuous model refinement, where new data from the implementation phase is used to improve accuracy. Misclassified or borderline cases identified during operation are logged as standard errors and incorporated into the training database, enabling the system to iteratively refine its classification models. This feedback loop supports ongoing improvement in detection reliability, ultimately strengthening both safety outcomes and operational efficiency on-site [141].

3.5. Agriculture and Environmental Monitoring

MV is playing an ever-greater role in advancing precision agriculture and safeguarding the environment [142]. In crop farming, drones and autonomous ground platforms outfitted with multispectral and hyperspectral sensors are now routinely deployed to assess soil properties, track plant vitality, and detect early signs of pest infestations or disease. These capabilities allow for more judicious water use, precise nutrient delivery, and reduced chemical inputs, all key pillars of sustainable agricultural practice. In livestock management, MV supports individual animal identification, continuous health monitoring, and the analysis of behavioral patterns. Applications also encompass environmental monitoring, where satellite imagery supports deforestation tracking, air and water quality assessment, and the observation of climate change indicators. Together, these capabilities reinforce sustainable resource management and contribute to ecosystem preservation [143].
MV has emerged as a highly effective means of identifying plant stress, such as water deficit, nutrient disorders, and pest or disease outbreaks [144,145]. The approach relies on cameras and sensors to acquire visual data, which is subsequently analyzed using dedicated hardware and software to derive actionable insights. Its utility spans a range of agricultural functions, including presence detection, object positioning, species or variety identification, defect characterization, and dimensional measurement [146]. In plant factories, elevated humidity and air temperature foster conditions conducive to pest proliferation and disease spread, posing significant economic risks if not detected and mitigated early. Traditional manual monitoring in greenhouses is time-consuming, labor-intensive, and prone to subjective interpretation of infection levels. Early symptoms of pests and diseases are often undetectable by the human eye, resulting in widespread crop damage before detection. Recent studies have introduced MV for plant health monitoring, as shown in Figure 11 [147].
Several studies have explored different approaches to plant stress detection using MV. Foucher et al. developed a technique that combined image analysis with a perceptron containing a single hidden layer [148]. By converting plant images into binary format, where plants appeared in black against a white background, they analyzed shape parameters to determine stress levels. Their method assessed plant stress based on moment invariants, fractal dimensions, and the average length of terminal branches [149]. In another study, Chung et al. investigated the potential of commercial smartphones for monitoring plant health. They found that smartphones could serve as a cost-effective alternative to traditional near-infrared (NIR) spectrophotometers and NIR cameras, making plant stress detection more accessible [150]. Meanwhile, Ghosal et al. demonstrated the effectiveness of DL in MV for identifying and classifying different types of stress in soybean plants [151]. Their model, trained on large datasets, achieved an accuracy of 94.13%, as validated by a confusion matrix. These findings highlighted the potential of real-time plant stress detection through mobile applications, offering a practical solution for modern precision agriculture.
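The shape-based approach can be approximated, at a high level, by extracting rotation-invariant descriptors from binarized plant silhouettes and feeding them to a single-hidden-layer perceptron. The sketch below uses Hu moment invariants (via OpenCV) and scikit-learn's MLPClassifier on synthetic silhouettes as stand-ins; the original study relied on different shape parameters (moment invariants, fractal dimensions, and terminal branch lengths) and its own training data.

import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier

def shape_features(binary_img):
    """Log-scaled Hu moment invariants of a binary plant silhouette."""
    hu = cv2.HuMoments(cv2.moments(binary_img)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

# Synthetic stand-in data: ellipses of two aspect ratios play the role of
# "unstressed" vs. "stressed" silhouettes; real masks come from thresholded plant images.
rng = np.random.default_rng(0)
masks, labels = [], []
for label in (0, 1):
    for _ in range(30):
        img = np.zeros((128, 128), dtype=np.uint8)
        axes = (40, 15) if label == 0 else (25, 25)
        cv2.ellipse(img, (64, 64), axes, rng.uniform(0, 180), 0, 360, 255, -1)
        masks.append(img)
        labels.append(label)

X = np.array([shape_features(m) for m in masks])
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, labels)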
A fully automated plant stress detection framework was designed to provide farmers with an easy-to-use solution for monitoring crop health [152]. The system captured leaf images directly from the field using a camera and applied ML techniques to classify them as either healthy or unhealthy. A support vector machine (SVM) algorithm was trained on extracted leaf features, ensuring accurate classification. Instead of transmitting entire images, only the extracted features were sent to the cloud for efficient processing. On the receiving end, agricultural consultants analyzed these features to identify plant stress using classification techniques. The framework employed gray-level co-occurrence matrix (GLCM) textures to distinguish between healthy and stressed leaves. System performance was assessed based on classification accuracy and the effectiveness of stress detection, ensuring reliable results for precision agriculture [152]. MV has also become widely utilized in fruit grading, driving automation in the food processing industry. Studies have employed SVM and Artificial Neural Networks (ANN) to evaluate fruit maturity and quality, and the effectiveness of these methods depends on the availability of large, reliable datasets for training. As shown in Figure 12, this technology plays a crucial role in these processes [147].
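A minimal version of the GLCM-plus-SVM pipeline described above can be sketched with scikit-image (version 0.19 or later) and scikit-learn. The texture properties, synthetic patches, and RBF kernel below are illustrative assumptions rather than the framework's actual configuration; in practice, the extracted feature vectors, not full images, would be sent to the cloud for classification.

import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(gray_img):
    """Contrast, homogeneity, energy, and correlation from a grey-level co-occurrence matrix."""
    glcm = graycomatrix(gray_img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])

# Synthetic stand-ins: smooth patches vs. noisy patches mimic healthy vs. stressed leaf texture
rng = np.random.default_rng(1)
X, y = [], []
for label, noise in ((0, 5), (1, 60)):
    for _ in range(30):
        patch = (120 + rng.normal(0, noise, (64, 64))).clip(0, 255).astype(np.uint8)
        X.append(glcm_features(patch))
        y.append(label)

clf = SVC(kernel="rbf").fit(np.array(X), y)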

4. Future Trends and Research Directions

The ongoing evolution of MV is being shaped by transformative technologies, including Explainable AI (XAI) [153], quantum computing [154], and neuromorphic computing [155], which hold the potential to redefine the field. These paradigms target core challenges in current MV systems by improving interpretability, computational efficiency, and adaptive capability, thereby enhancing model transparency, accelerating processing, and supporting more dynamic visual reasoning in real-world deployments. To clarify this outlook, emerging developments can be organized into short-term, mid-term, and long-term horizons, each anchored in concrete methodological breakthroughs and standardized evaluation frameworks.

4.1. Short Term (1–3 Years): Explainability and Robustness

Current MV systems based on DL often operate as “black boxes,” offering little insight into how decisions are derived. This lack of transparency is particularly problematic in high-stakes applications such as autonomous driving, medical diagnostics [86], and industrial automation, where verifiable reasoning and system accountability are non-negotiable. XAI has emerged to address this gap, with the goal of producing predictions that human users can understand, interrogate, and trust [156].
Ongoing and near-future research in XAI for MV is likely to prioritize the design of inherently interpretable model architectures, such as attention-based modules and feature attribution techniques, that explicitly identify image regions most influential to a given output [157]. Complementary strategies may involve hybrid frameworks that integrate DL with rule-based or symbolic AI, thereby enhancing transparency while preserving predictive performance. By enhancing trust and accountability in MV systems, XAI will facilitate wider adoption in industries that require rigorous validation and compliance with regulatory standards [158].
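As one concrete example of feature attribution, the sketch below computes a Grad-CAM-style saliency map for a torchvision ResNet-18 classifier: channel-wise gradient averages weight the last convolutional feature maps, and the result is upsampled to highlight the image regions most influential for the predicted class. This is a generic illustration under stated assumptions (pretrained ResNet-18, random stand-in input), not a specific method proposed in the cited works.

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["feat"] = output          # feature maps of the hooked layer

def bwd_hook(_, grad_input, grad_output):
    gradients["feat"] = grad_output[0]    # gradient of the class score w.r.t. those maps

layer = model.layer4[-1]                  # last convolutional block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.rand(1, 3, 224, 224)            # stand-in for a preprocessed input image
scores = model(x)
scores[0, scores.argmax()].backward()     # backpropagate the top-class score

weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)        # channel importance
cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)          # normalised saliency map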
In the short term, it will also be essential to develop standardized benchmarks for evaluating XAI methods in MV. These benchmarks will provide a common ground for assessing interpretability, robustness, and reliability across different domains. Moreover, addressing robustness challenges such as adversarial attacks, environmental variability, and uncertainty quantification will remain a priority, ensuring that MV systems can function reliably in diverse real-world contexts.

4.2. Mid Term (3–7 Years): Neuromorphic Computing and Efficient Learning

Neuromorphic computing, inspired by the architecture of the human brain, presents another promising direction for advancing MV [159]. Unlike traditional von Neumann computing architectures, neuromorphic systems use spiking neural networks (SNNs) that mimic biological neurons and synapses, enabling more energy-efficient and real-time processing of visual data [160]. One of the key advantages of neuromorphic computing in MV is its ability to process streaming data with low latency and minimal power consumption. This makes it ideal for edge computing applications, such as autonomous drones, smart cameras, and wearable vision systems [106,161].
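The core computational primitive of an SNN is easy to illustrate: a leaky integrate-and-fire (LIF) neuron accumulates input, emits a spike when a threshold is crossed, and then resets, so information is carried by sparse events rather than dense activations. The sketch below is a simplified, discrete-time toy model; real neuromorphic pipelines operate on asynchronous events from sensors such as event cameras.

import numpy as np

def lif_neuron(input_current, threshold=1.0, leak=0.9, v_reset=0.0):
    """Leaky integrate-and-fire neuron: emits a spike (1) whenever the leaky
    membrane potential crosses the threshold, then resets."""
    v, spikes = 0.0, []
    for i_t in input_current:
        v = leak * v + i_t            # leaky integration of the input current
        if v >= threshold:
            spikes.append(1)
            v = v_reset               # reset after firing
        else:
            spikes.append(0)
    return np.array(spikes)

# Example: a brief burst of input produces only a handful of output spikes
current = np.concatenate([np.zeros(20), 0.4 * np.ones(30), np.zeros(20)])
print(lif_neuron(current).sum(), "spikes over", len(current), "time steps")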
Research in this area is expected to focus on improving the scalability and adaptability of neuromorphic processors, enabling them to handle complex vision tasks such as scene understanding, gesture recognition, and predictive analytics. Additionally, integrating neuromorphic computing with DL techniques could lead to hybrid architectures that combine the efficiency of event-driven processing with the robustness of deep networks [162].
Neuromorphic engineering also involves the development of artificial systems that replicate the information processing mechanisms of biological nervous systems, particularly through electronic analog circuits. While computers excel in speed and accuracy, they struggle with recognition tasks compared to the human brain. Nevertheless, progress in neuromorphic computing, particularly in CV and image processing, is anticipated to significantly improve the way machines perceive and reason about visual input. Subramaniam examined core visual functions such as image segmentation, visual attention, and object recognition [163]. The work revisited anisotropic diffusion and introduced an innovative memristor-based method for segmentation. It also assessed the role of neuromorphic vision sensors in artificial systems, with particular attention to protocols governing asynchronous event-based communication. Two established algorithms for object recognition and attention modeling were critically evaluated. A central theme was the incorporation of non-volatile memory elements, especially memristors, into vision hardware. The study concluded by highlighting the pivotal role of dedicated hardware accelerators, arguing that advances in non-volatile memory technologies could serve as a catalyst for next-generation CV systems [163].
A critical comparison between conventional CNNs and neuromorphic approaches highlights their complementary strengths and weaknesses. While CNNs continue to dominate in terms of benchmark accuracy, they are limited by relatively high latency and power consumption. In contrast, neuromorphic models offer ultra-low latency and exceptional energy efficiency, which makes them particularly well suited for edge applications like drones and wearable vision systems. However, their accuracy is still limited, and dedicated hardware remains scarce. For these reasons, the most promising near-term advances are likely to come from hybrid architectures that combine CNNs with SNNs. Such models aim to retain the robustness of deep learning while benefiting from the computational efficiency of event-driven processing [164,165,166,167].
In the mid term, it will also be necessary to establish benchmark suites for neuromorphic MV tasks. These benchmarks will guide the development of hardware–software co-design strategies, ensuring fair evaluation of neuromorphic systems and supporting their integration into practical applications.

4.3. Long Term (7+ Years): Quantum and Hybrid Paradigms

Quantum computing has the potential to drastically improve MV capabilities by accelerating complex computations that are currently infeasible with classical computing. Quantum algorithms, such as quantum-enhanced ML, could significantly reduce the training time of DL models and enable faster image processing for large-scale datasets [154,168].
One promising research direction is the application of quantum neural networks (QNNs) for image recognition and classification. QNNs have shown promise in classification tasks but encounter difficulties in multi-class image classification. Bai et al. introduced the Superposition-Enhanced Quantum Neural Network (SEQNN) to improve quantum classification [168]. SEQNN integrates image superposition with Quantum Binary Classifiers (QBCs) to address two key challenges. First, it overcomes the linearity of quantum evolution by using a one-vs.-all strategy with QBCs, enabling better handling of nonlinearity in classification. Second, to mitigate data imbalance in the one-vs.-all subtasks, SEQNN applies image superposition, inspired by the mixup technique. Two methods were introduced: Quantum State Superposition (QSS) and Angle Superposition (AS). Experiments on the MNIST and Fashion-MNIST datasets showed that AS performed better than QSS in multi-class classification. With AS, SEQNN outperformed existing models, achieving 87.56% accuracy on the MNIST dataset [168].
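The quantum encodings behind QSS and AS are beyond the scope of this review, but the mixup idea that inspired the superposition step has a simple classical analogue: blending two training images (and, during training, their labels) by a convex combination. The snippet below sketches only this classical analogue with random stand-in images; it does not reproduce SEQNN's quantum circuits.

import numpy as np

def mixup_images(img_a, img_b, lam=0.5):
    """Classical analogue of the mixup-style superposition: a convex combination
    of two images (and, during training, of their labels)."""
    return lam * img_a.astype(float) + (1.0 - lam) * img_b.astype(float)

# Example with random stand-ins for two 28x28 MNIST-sized digits
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (28, 28))
b = rng.integers(0, 256, (28, 28))
blended = mixup_images(a, b, lam=0.7)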
Quantum optimization methods also hold promise for boosting real-time object detection and tracking, particularly in scenarios where split-second decisions are critical, such as autonomous robots or high-stakes surveillance [169,170,171]. That said, actual deployment remains out of reach for now, largely because today’s quantum hardware is still too rudimentary. This reality makes hybrid quantum–classical approaches not just useful but necessary as a pragmatic path forward for MV.
Looking further ahead, the field will need standardized benchmarks to properly evaluate how quantum-enhanced vision systems scale, perform, and hold up under real-world stress. Even more speculative, yet potentially transformative, are architectures that merge quantum computing with neuromorphic principles. By combining quantum computing’s theoretical speedups with the brain-inspired efficiency of event-driven neuromorphic processing, such hybrids could one day enable vision systems that far exceed the limits of current technology.

4.4. Synthesis

Collectively, these trajectories indicate that MV systems will transition from experimental prototypes toward specialized, production-ready platforms. Explainable models are likely to become foundational in safety-critical domains, such as autonomous driving and medical diagnostics, where interpretability directly influences regulatory acceptance and user confidence. Quantum-accelerated approaches may initially gain traction in high-throughput industrial inspection and defense-related real-time analytics, whereas neuromorphic processors are ideally matched to power-limited settings, including drones, wearables, and remote sensing. Beyond isolated technical breakthroughs, the synergistic integration of these paradigms will yield MV systems that are not only more efficient and adaptive but also inherently interpretable. This will accelerate their adoption across healthcare, agriculture, security, and industrial automation in the coming decade.
The comparison between conventional CNNs and neuromorphic approaches summarized in Table 4 reinforces this trajectory: CNNs still set the state-of-the-art benchmarks but suffer from high latency and power consumption, whereas neuromorphic models excel in real-time efficiency yet lag in accuracy and hardware availability, making hybrid CNN–SNN solutions an attractive interim step toward broader adoption.
While the preceding sections present detailed single-study findings, a higher-level synthesis helps to highlight broader patterns across application domains. To this end, we compiled a meta-level summary (Supplementary Table S2) that aggregates results by domain: healthcare, manufacturing, autonomous systems, and surveillance. The table outlines the most common tasks, typical methodological approaches, representative performance ranges, and emerging trends. This consolidated perspective offers readers a concise, cross-domain overview of the current state of the art, extending beyond individual case studies.

5. Challenges and Limitations

MV has made significant strides, largely driven by breakthroughs in AI, DL, and high-resolution imaging [172]. Despite these advances, widespread adoption and consistent operational performance remain hampered by a range of persistent challenges [147,173], including computational constraints, insufficient training data, ethical concerns, and evolving regulatory requirements. Addressing these issues effectively will require ongoing collaboration across engineering, policy, and social science disciplines.
This evolution has also reshaped the nature of the field’s core limitations. Classic MV methods, like rule-based edge detection, intensity thresholding, and template matching, had clear advantages: they were simple to set up, ran on modest hardware, and their decisions could be easily traced and understood. But they broke down quickly outside controlled lab settings, struggling with everyday complications like changing light, sensor noise, or objects partially hidden from view. Today’s DL-driven systems, by contrast, handle messy, real-world scenes with far greater accuracy, but this comes at a price. They need massive labeled datasets, demand significant computing power, and operate as “black boxes,” offering little insight into how they reach conclusions. In effect, the field’s central challenge has shifted: where early systems failed because they couldn’t adapt, modern ones struggle because they’re hard to train, deploy, and trust. To bridge this gap, recent innovations emphasize synthetic data generation, transfer and few-shot learning to mitigate data scarcity; hardware accelerators and edge computing for real-time processing; and Explainable AI frameworks or hybrid rule-based/DL approaches to restore transparency and trust.
Among the most pressing concerns is the computational burden and hardware dependency of modern MV systems. DL models, particularly CNNs, require considerable processing resources. In time-sensitive applications like autonomous driving or industrial automation, where decisions must be made within milliseconds, this computational load often results in unacceptable latency. Furthermore, many MV deployments occur in resource-constrained settings, such as drones, mobile robots, or embedded sensors, where energy efficiency is paramount. The high power consumption of standard DL models restricts their viability in such contexts, driving interest in specialized hardware (e.g., neuromorphic chips) and algorithmic optimizations that reduce inference costs.
Closely related is the strategic decision of where to perform computation: on-device (edge) or in the cloud. Edge processing minimizes latency and enhances data privacy but faces limitations in scalability and compute capacity. Cloud-based approaches offer greater flexibility and processing power but introduce bandwidth demands, security risks, and response delays. As outlined in Table 5, many current deployments now rely on hybrid edge–cloud architectures, which seek to reconcile the conflicting demands of speed, privacy, scalability, and processing power.
Data dependency remains a major bottleneck. Cutting-edge MV models typically rely on vast, meticulously labeled datasets, yet such data is scarce in niche domains like medical imaging or the identification of rare production flaws. Creating these labels isn’t just time-consuming and costly; it often demands expert knowledge, particularly when subtle visual distinctions determine a correct decision. Researchers have turned to workarounds like synthetic data, semi-supervised learning, and domain adaptation to reduce labeling burdens and broaden dataset coverage. While promising, these methods still fall short of guaranteeing consistent, reliable performance under the unpredictable variations of real-world deployment: differences in lighting, viewpoint, background clutter, or imaging conditions can easily undermine their effectiveness, and models trained in controlled environments often fail to generalize once deployed in different settings.
Researchers are actively exploring XAI to improve transparency in black-box AI models, enhancing user trust and understanding. A key challenge is balancing faithfulness to the model with plausibility for users. Liu et al. examined whether integrating human attention knowledge into saliency-based XAI methods for CV could improve both aspects [158]. They introduced FullGrad-CAM and FullGrad-CAM++, two gradient-based techniques adapted from image classification to object detection, generating object-specific explanations. Evaluations using human attention as a plausibility measure showed improved explanation plausibility. However, existing XAI methods for object detection often produce saliency maps that are less faithful to the model than human attention maps for the same task. To address this, Human Attention-Guided XAI (HAG-XAI) was developed. This approach refines model explanations by learning from human attention, incorporating trainable activation functions and smoothing kernels. Experiments on BDD-100K, MS-COCO, and ImageNet datasets demonstrated that HAG-XAI outperformed existing XAI methods for object detection, enhancing plausibility, faithfulness, and user trust. For image classification models, it improved plausibility and trust, though with some trade-offs in faithfulness [158].
Beyond technical challenges, MV systems also face robustness and security concerns. Unlike human vision, which adapts dynamically to environmental changes, MV models can be highly sensitive to variations in input data. Small changes in lighting, angle, or occlusion can lead to significant reductions in accuracy. Furthermore, adversarial attacks pose a serious risk, where small, imperceptible modifications to images can deceive a model into making incorrect classifications. This vulnerability is particularly concerning in safety-critical applications such as autonomous driving, medical diagnostics, and security surveillance, where a single misclassification could have severe consequences. Developing more resilient models capable of handling real-world variations and resisting adversarial manipulations remains an ongoing research challenge.
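The fast gradient sign method (FGSM) illustrates how small such adversarial perturbations can be: a single gradient step, bounded by a small epsilon, is often enough to change a classifier's output [23]. The sketch below applies the idea to a pretrained torchvision ResNet-18 with a random stand-in input and an illustrative epsilon of 0.03; it demonstrates the mechanism rather than any specific attack evaluated in the cited studies.

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in for a preprocessed image
label = model(x).argmax(dim=1)                        # treat the clean prediction as ground truth

loss = F.cross_entropy(model(x), label)
loss.backward()                                       # gradient of the loss w.r.t. the input

epsilon = 0.03
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()   # sign-of-gradient perturbation
print("clean:", label.item(), "adversarial:", model(x_adv).argmax(dim=1).item())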
Ethical and privacy issues further complicate MV deployment [174]. The proliferation of facial recognition and intelligent video analytics has intensified public and regulatory scrutiny regarding mass surveillance and personal data handling. Legal frameworks such as the GDPR impose stringent controls on data acquisition, retention, and usage, adding compliance complexity for developers. Compounding this, algorithmic bias, particularly in facial recognition systems, has been shown to disproportionately affect underrepresented demographic groups due to skewed training data, raising serious fairness and equity concerns. Mitigation strategies now include adversarial debiasing, fairness-aware optimization, and targeted data augmentation, though achieving truly equitable performance remains an open challenge. Adversarial debiasing techniques introduce auxiliary networks that remove sensitive attribute information from learned representations, thereby reducing biased decision-making [175]. Fairness metrics such as demographic parity and equalized odds are increasingly employed to evaluate and enforce equitable outcomes across demographic groups [176]. Other approaches include fairness-constrained optimization, synthetic data generation for underrepresented populations, and domain adaptation methods that improve generalization in diverse environments [177]. Addressing algorithmic bias thus requires not only diverse and representative datasets but also fairness-aware training methodologies and transparency in model decision-making [178].
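For concreteness, the two fairness metrics mentioned above can be computed directly from predictions and a binary sensitive attribute: demographic parity compares positive-prediction rates across groups, while equalized odds compares true- and false-positive rates. The minimal sketch below uses a toy example; production audits would typically rely on dedicated fairness toolkits and per-group confidence intervals.

import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in true-positive or false-positive rate between two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for cls in (1, 0):  # cls=1 gives the TPR gap, cls=0 the FPR gap
        rates = [y_pred[(group == g) & (y_true == cls)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

# Toy example with a binary sensitive attribute
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group), equalized_odds_gap(y_true, y_pred, group))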
Cost and integration complexities further limit the widespread adoption of MV technologies [179]. Developing high-performance vision systems requires expensive hardware, including high-resolution cameras, GPUs, and AI-optimized processors [180]. Small and medium-sized enterprises (SMEs) often find the cost prohibitive, making it difficult for them to invest in MV solutions. In manufacturing, maintaining high throughput alongside stringent quality standards is essential. MV systems offer a practical response to this need through automated process monitoring and quality inspection. Wurschinger et al. demonstrated a real-world deployment in serial production that combined Transfer Learning with affordable hardware to build an effective vision solution [180]. Their workflow covered the full pipeline, from hardware integration and data collection to preprocessing, model optimization, and operational deployment. The resulting system met all specified performance criteria and delivered accuracy on par with commercial-grade MV platforms [180].
Beyond model and data challenges, deploying AI-powered vision systems into existing industrial or clinical environments often proves difficult and costly [181]. Legacy infrastructure was rarely designed with AI integration in mind, meaning significant retrofitting is usually required to support modern MV capabilities. Compounding this, evolving regulatory requirements across sectors and regions add another layer of complexity to adoption. Different industries and regions have varying legal frameworks governing AI and MV applications [103]. In the medical field, for instance, AI-driven diagnostic tools must undergo rigorous validation before they can be deployed in clinical settings [182]. Similarly, autonomous vehicle vision systems must comply with strict safety regulations before they are approved for public use. As AI regulations continue to evolve, companies must stay informed and ensure compliance, which can be both time-consuming and resource-intensive [183].
Even beyond technical and regulatory barriers, practical deployment poses significant hurdles for MV adoption in real-world settings. Translating lab-scale prototypes into robust industrial or field-ready systems is often hampered by environmental variability. Fluctuations in lighting, temperature, vibration, dust, or weather can degrade performance and demand frequent recalibration. Sustained operation further requires ongoing maintenance: cleaning lenses, updating models, managing software dependencies, and ensuring hardware reliability. Equally important is workforce readiness; operators and technicians frequently lack the training needed to interpret system outputs, diagnose failures, or integrate MV tools into existing workflows. Without deliberate human–machine co-design, even state-of-the-art systems may underperform when scaled beyond controlled environments [184].
Tackling these multifaceted challenges calls for a coordinated strategy spanning technology, policy, and ethics. Hardware innovations, such as low-power AI accelerators and neuromorphic processors, can ease computational and energy constraints [160,185]. Data-centric advances like active learning and physics-informed synthetic data can improve model robustness. Meanwhile, ethical AI governance, bias-mitigation protocols, and cross-sector collaboration among researchers, regulators, and industry stakeholders will be essential to ensure responsible, inclusive, and sustainable MV adoption.
Despite these limitations, MV remains a profoundly transformative technology with expanding impact across healthcare, manufacturing, security, and autonomous systems. Continued progress in addressing its current constraints will be pivotal in realizing the next generation of intelligent, efficient, and ethically grounded vision systems. For reference, all quantitative results from cited studies, including their experimental settings (laboratory, simulation, or real-world), are compiled in Supplementary Table S1.

6. Conclusions

MV is in the midst of a deep and accelerating shift, propelled by the convergence of AI, edge computing, neuromorphic chips, and early quantum-inspired approaches. Tomorrow’s MV systems won’t be evaluated solely on how fast or accurate they are; just as important will be whether they’re transparent in their reasoning, adaptable to new situations, and robust enough to function reliably outside the lab. As industries push toward fully automated operations, the need for vision systems that consume little power, understand their surroundings, and learn from experience will only intensify. Realizing this vision means moving beyond isolated technical advances. It requires genuine collaboration across disciplines that have long operated in parallel, including optics, robotics, computer science, and even ethics. Only through such integration can MV evolve from a passive observer into an intelligent partner in real-world decision-making. But significant hurdles stand in the way. Delivering consistent performance means solving enduring technical problems: calibrating sensors precisely, processing high-resolution video with minimal delay, and maintaining accuracy despite changing light, motion artifacts, or environmental interference. On top of these are real-world constraints. These include expensive hardware, complex integration into existing workflows, and a shortage of engineers and technicians who can actually install, tune, and troubleshoot these systems. If these practical and economic barriers aren’t addressed, MV risks remaining out of reach for smaller manufacturers and resource-limited settings.
Equally urgent are the ethical questions raised by deploying autonomous vision systems in public spaces, workplaces, and critical infrastructure. Models trained on unrepresentative data, insufficiently validated algorithms, or “black-box” decision pathways can produce errors with real-world consequences. This risk is especially pronounced in sensitive domains like healthcare, law enforcement, or transportation. Building public trust therefore hinges on more than technical performance; it requires enforceable standards for fairness, mechanisms for accountability, and inclusive governance involving engineers, regulators, and civil society. Only through such deliberate, cross-sectoral cooperation can MV realize its full promise: not merely as a productivity enhancer, but as a responsible technology that serves both industry and society.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/technologies13110507/s1, Table S1: Reported performance values from cited studies and their experimental context, Table S2: Domain-level synthesis of CV and MV applications. The table summarizes representative tasks, typical methodological approaches, performance ranges (as reported in the reviewed studies), and emerging trends.

Author Contributions

Conceptualization, N.L.K., I.V.O. and A.V.N.; formal analysis, S.N.K. and I.V.O.; investigation, S.N.K. and A.V.N.; writing—original draft preparation, S.N.K.; writing—review and editing, R.M.K.; supervision, N.L.K.; project administration, A.V.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Economic Development of the Russian Federation (agreement identifier 000000C313925P3U0002, grant 139-15-2025-003 dated 16 April 2025).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial intelligence
CNNs	Convolutional neural networks
CV	Computer vision
DL	Deep learning
ML	Machine learning
MV	Machine vision

References

  1. History of Machine Vision. Available online: https://mv-center.com/en/history-of-machine-vision/ (accessed on 15 February 2025).
  2. Zhao, R.; Yang, L. Research and Development of Machine Vision Algorithm Performance Evaluation System in Complex Scenes. J. Phys. Conf. Ser. 2023, 2562, 012022. [Google Scholar] [CrossRef]
  3. Heyrman, B.; Paindavoine, M.; Schmit, R.; Letellier, L.; Collette, T. Smart camera design for intensive embedded computing. Real-Time Imaging 2005, 11, 282–289. [Google Scholar] [CrossRef]
  4. Shi, Y.; Raniga, P.; Mohamed, I. A Smart Camera for Multimodal Human Computer Interaction. In Proceedings of the IEEE International Symposium on Consumer Electronics, St Petersburg, Russia, 28 June–1 July 2006. [Google Scholar]
  5. Lee, K.F.; Tang, B. Image Processing for In-vehicle Smart Cameras. In Proceedings of the IEEE International Symposium on Consumer Electronics, St Petersburg, Russia, 28 June–1 July 2006. [Google Scholar]
  6. Kitcher, P. Marr’s Computational Theory of Vision. Philos. Sci. 1988, 55, 1–24. [Google Scholar] [CrossRef]
  7. Machine Vision: 9 Important Aspects to See Beyond Human Limitations. Available online: https://julienflorkin.com/technology/computer-vision/machine-vision/ (accessed on 15 February 2025).
  8. Javaid, M.; Haleem, A.; Singh, R.P.; Ahmed, M. Computer vision to enhance healthcare domain: An overview of features, implementation, and opportunities. Intell. Pharm. 2024, 2, 792–803. [Google Scholar] [CrossRef]
  9. Palanikumar, K.; Natarajan, E.; Ponshanmugakumar, A. Chapter 6—Application of machine vision technology in manufacturing industries—A study. In Machine Intelligence in Mechanical Engineering, 1st ed.; Palanikumar, K., Natarajan, E., Ramesh, S., Paulo Davim, J., Eds.; Woodhead Publishing: Cambridge, UK, 2024; Volume 1, pp. 91–122. [Google Scholar]
  10. Is Machine Vision Surpassing the Human Eye for Accuracy? Available online: https://belmonteyecenter.com/is-machine-vision-surpassing-the-human-eye-for-accuracy/ (accessed on 2 April 2025).
  11. Kurada, S.; Bradley, C. A review of machine vision sensors for tool condition monitoring. Comput. Ind. 1997, 34, 55–72. [Google Scholar] [CrossRef]
  12. Charan, A.; Karthik Chowdary, C.; Komal, P. The Future of Machine Vision in Industries-A systematic review. In Proceedings of the IOP Conference Series: Materials Science and Engineering, London, UK, 14 July 2022. [Google Scholar]
  13. Mascagni, P.; Alapatt, D.; Sestini, L.; Altieri, M.S.; Madani, A.; Watanabe, Y.; Alseidi, A.; Redan, J.A.; Alfieri, S.; Costamagna, G.; et al. Computer vision in surgery: From potential to clinical value. Npj Digit. Med. 2022, 5, 163. [Google Scholar] [CrossRef]
  14. Varoquaux, G.; Cheplygina, V. Machine learning for medical imaging: Methodological failures and recommendations for the future. Npj Digit. Med. 2022, 5, 48. [Google Scholar] [CrossRef]
  15. Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep learning-enabled medical computer vision. Npj Digit. Med. 2021, 4, 5. [Google Scholar] [CrossRef]
  16. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef]
  17. Toolan, C.; Palmer, K.; Al-Rawi, O.; Ridgway, T.; Modi, P. Robotic mitral valve surgery: A review and tips for safely negotiating the learning curve. J. Thorac. Dis. 2021, 13, 1971–1981. [Google Scholar] [CrossRef]
  18. Gajjar, H.; Sanyal, S.; Shah, M. A comprehensive study on lane detecting autonomous car using computer vision. Expert. Syst. Appl. 2023, 233, 120929. [Google Scholar] [CrossRef]
  19. Janai, J.; Güney, F.; Behl, A.; Geiger, A. Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art. Found. Trends® Comput. Graph. Vis. 2020, 12, 1–308. [Google Scholar] [CrossRef]
  20. The Combined Power of Machine Vision Technology and Video Management Systems. Available online: https://www.computar.com/blog/the-combined-power-of-machine-vision-technology-and-video-management-systems (accessed on 2 April 2025).
  21. Karthikeyan, R.; Karthik, S.; Saurav Menon, M. Vision based Intelligent Smart Security System. In Proceedings of the International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), Coimbatore, India, 8–9 October 2021. [Google Scholar]
  22. Sivarai, D.; Rathika, P.D.; Vaishnavee, K.R.; Easwar, K.G.; Saranyazowri, P.; Hariprakash, R. Machine Vision based Intelligent Surveillance System. In Proceedings of the International Conference on Intelligent Systems for Communication, IoT and Security (ICISCoIS), Coimbatore, India, 9–11 February 2023. [Google Scholar]
  23. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the ICLR, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  24. Machine Vision a Growing Market Driven by Industrial and Automation Applications. Available online: https://www.yolegroup.com/press-release/machine-vision-a-growing-market-driven-by-industrial-and-automation-applications/ (accessed on 15 February 2025).
  25. Javaid, M.; Haleem, A.; Singh, R.P.; Rab, S.; Suman, R. Exploring impact and features of machine vision for progressive industry 4.0 culture. Sens. Int. 2022, 3, 100132. [Google Scholar] [CrossRef]
  26. Wu, W.-Q.; Wang, C.-F.; Han, S.-T.; Pan, C.-F. Recent advances in imaging devices: Image sensors and neuromorphic vision sensors. Rare Met. 2024, 43, 5487–5515. [Google Scholar] [CrossRef]
  27. Sharma, I.; Vanshika. Evolution of Neuromorphic Computing with Machine Learning and Artificial Intelligence. In Proceedings of the IEEE 3rd Global Conference for Advancement in Technology (GCAT), Bangalore, India, 7–9 October 2022. [Google Scholar]
  28. Gill, S.S.; Buyya, R. Transforming Research with Quantum Computing. J. Econ. Technol. 2026, 4, 1–8. [Google Scholar] [CrossRef]
  29. Villalba-Diez, J.; Ordieres-Meré, J.; González-Marcos, A.; Larzabal, A.S. Quantum Deep Learning for Steel Industry Computer Vision Quality Control. IFAC-Pap. 2022, 55, 337–342. [Google Scholar] [CrossRef]
  30. Viéville, T.; Clergue, E.; Enciso, R.; Mathieu, H. Experimenting with 3D vision on a robotic head. Robot. Auton. Syst. 1995, 14, 1–27. [Google Scholar] [CrossRef]
  31. AI at the Edge: Transforming Machine Vision into Reality. Available online: https://www.intellectyx.com/ai-at-the-edge-transforming-machine-vision-into-reality/ (accessed on 14 February 2025).
  32. Zhong, S.; Liu, Y.; Chen, Q. Visual orientation inhomogeneity based scale-invariant feature transform. Expert. Syst. Appl. 2015, 42, 5658–5667. [Google Scholar] [CrossRef]
  33. Vardhan, A.H.; Verma, N.K.; Sevakula, R.K.; Salour, A. Unsupervised approach for object matching using Speeded Up Robust Features. In Proceedings of the Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 10–12 October 2015. [Google Scholar]
  34. Gao, Q.; Lim, S.; Jia, X. Hyperspectral Image Classification Using Convolutional Neural Networks and Multiple Feature Learning. Remote Sens. 2018, 10, 299. [Google Scholar] [CrossRef]
  35. Tempelaere, A.; De Ketelaere, B.; He, J.; Kalfas, I.; Pieters, M.; Saeys, W.; Van Belleghem, R.; Van Doorselaer, L.; Verboven, P.; Nicolaï, B.M. An introduction to artificial intelligence in machine vision for postharvest detection of disorders in horticultural products. Postharvest Biol. Technol. 2023, 206, 112576. [Google Scholar] [CrossRef]
  36. Matsuzaka, Y.; Yashiro, R. AI-Based Computer Vision Techniques and Expert Systems. AI 2023, 4, 289–302. [Google Scholar] [CrossRef]
  37. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapakis, E. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 2018, 7068349. [Google Scholar] [CrossRef]
  38. Khan, A.I.; Al-Habsi, S. Machine Learning in Computer Vision. Procedia Comput. Sci. 2020, 167, 1444–1451. [Google Scholar] [CrossRef]
  39. Deng, F.; Huang, J.; Yuan, X.; Cheng, C.; Zhang, L. Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data. Lab. Investig. 2021, 101, 430–441. [Google Scholar] [CrossRef] [PubMed]
  40. Dhanush, G.; Khatri, N.; Kumar, S.; Shukla, P.K. A comprehensive review of machine vision systems and artificial intelligence algorithms for the detection and harvesting of agricultural produce. Sci. Afr. 2023, 21, e01798. [Google Scholar] [CrossRef]
  41. Kääriäinen, T.; Seppä, J. 3D camera based on laser light absorption by atmospheric oxygen at 761 nm. Opt. Express 2024, 32, 6342–6349. [Google Scholar] [CrossRef] [PubMed]
  42. Shults, R.; Levin, E.; Aukazhiyeva, Z.; Pavelka, K.; Kulichenko, N.; Kalabaev, N.; Sagyndyk, M.; Akhmetova, N. A Study of the Accuracy of a 3D Indoor Camera for Industrial Archaeology Applications. Heritage 2023, 6, 6240–6267. [Google Scholar] [CrossRef]
  43. Huang, T.; Zheng, Y.; Yu, Z.; Chen, R.; Li, Y.; Xiong, R.; Ma, L.; Zhao, J.; Dong, S.; Zhu, L.; et al. 1000× Faster Camera and Machine Vision with Ordinary Devices. Engineering 2023, 25, 110–119. [Google Scholar] [CrossRef]
  44. Yang, Y.; Meng, X.; Gao, M. Vision System of Mobile Robot Combining Binocular and Depth Cameras. J. Sens. 2017, 2017, 4562934. [Google Scholar] [CrossRef]
  45. Sergiyenko, O.; Tyrsa, V.; Flores-Fuentes, W.; Rodriguez-Quiñonez, J.; Mercorelli, P. Machine Vision Sensors. J. Sens. 2018, 2018, 3202761. [Google Scholar] [CrossRef]
  46. Silva, C.A.d.S.; Paladini, E.P. Smart Machine Vision System to Improve Decision-Making on the Assembly Line. Machines 2025, 13, 98. [Google Scholar] [CrossRef]
  47. Gierecker, J.; Schoepflin, D.; Schmedemann, O.; Schüppstuhl, T. Configuration and Enablement of Vision Sensor Solutions Through a Combined Simulation Based Process Chain. In Proceedings of the Annals of Scientific Society for Assembly, Handling and Industrial Robotics, Garbsen, Germany, 20 December 2021. [Google Scholar]
  48. Lim, S.-J.; Leem, D.-S.; Park, K.-B.; Kim, K.-S.; Sul, S.; Na, K.; Lee, G.H.; Heo, C.-J.; Lee, K.-H.; Bulliard, X.; et al. Organic-on-silicon complementary metal–oxide–semiconductor colour image sensors. Sci. Rep. 2015, 5, 7708. [Google Scholar] [CrossRef] [PubMed]
  49. Imanbekova, M.; Saridag, A.M.; Kahraman, M.; Liu, J.; Caglayan, H.; Wachsmann-Hogiu, S. Complementary Metal-Oxide-Semiconductor-Based Sensing Platform for Trapping, Imaging, and Chemical Characterization of Biological Samples. ACS Appl. Opt. Mater. 2023, 1, 329–339. [Google Scholar] [CrossRef]
  50. Lesser, M. 3-Charge coupled device (CCD) image sensors. In High Performance Silicon Imaging, 1st ed.; Durini, D., Ed.; Woodhead Publishing: Cambridge, UK, 2014; Volume 1, pp. 78–97. [Google Scholar]
  51. Chen, H.; Cui, W. A comparative analysis between active structured light and multi-view stereo vision technique for 3D reconstruction of face model surface. Optik 2020, 206, 164190. [Google Scholar] [CrossRef]
  52. Karim, A.; Andersson, J.Y. Infrared detectors: Advances, challenges and new technologies. In Proceedings of the IOP Conference Series: Materials Science Engineering, Bandung, Indonesia, 8–10 March 2013. [Google Scholar]
  53. Askar, C.; Sternberg, H. Use of Smartphone Lidar Technology for Low-Cost 3D Building Documentation with iPhone 13 Pro: A Comparative Analysis of Mobile Scanning Applications. Geomatics 2023, 3, 563–579. [Google Scholar] [CrossRef]
  54. Cremons, D.R. The future of lidar in planetary science. Front. Remote Sens. 2022, 3, 1042460. [Google Scholar] [CrossRef]
  55. Khonina, S.N.; Kazanskiy, N.L.; Oseledets, I.V.; Nikonorov, A.V.; Butt, M.A. Synergy between Artificial Intelligence and Hyperspectral Imagining—A Review. Technologies 2024, 12, 163. [Google Scholar] [CrossRef]
  56. Hou, B.; Chen, Q.; Yi, L.; Sellin, P.; Sun, H.-T.; Wong, L.J.; Lui, X. Materials innovation and electrical engineering in X-ray detection. Nat. Rev. Electr. Eng. 2024, 1, 639–655. [Google Scholar] [CrossRef]
  57. Bhargava, A.; Sachdeva, A.; Sharma, K.; Alsharif, M.H.; Uthansakul, P.; Uthansakul, M. Hyperspectral imaging and its applications: A review. Heliyon 2024, 10, e33208. [Google Scholar] [CrossRef]
  58. Khan, M.A.; Sun, J.; Li, B.; Przybysz, A.; Kosel, J. Magnetic sensors-A review and recent technologies. Eng. Res. Express 2021, 3, 022005. [Google Scholar] [CrossRef]
  59. Goodman, D.S. Illumination in machine vision. Opt. Soc. Am. Annu. Meet. 1991, 1, WB2. [Google Scholar]
  60. Yan, M.T.; Surgenor, B.W. A Quantitative Study of Illumination Techniques for Machine Vision Based Inspection. In Proceedings of the International Manufacturing Science and Engineering Conference (MSEC), Corvallis, OR, USA, 13–17 June 2011. [Google Scholar]
  61. Kumar, V.; Sudheesh Kumar, C.P. Investigation of the influence of coloured illumination on surface texture features: A Machine vision approach. Measurement 2020, 152, 107297. [Google Scholar] [CrossRef]
  62. Chen, J.; Wang, M.; Hsia, C.-H. Artificial Intelligence and Machine Learning in Sensing and Image Processing. Sensors 2025, 25, 1870. [Google Scholar] [CrossRef] [PubMed]
  63. Huang, L.; Yao, C.; Zhang, L.; Luo, S.; Ying, F.; Ying, W. Enhancing computer image recognition with improved image algorithms. Sci. Rep. 2024, 14, 13709. [Google Scholar] [CrossRef] [PubMed]
  64. Huang, C.; Lim, C.-C.; Ming, C. Comparison of image processing algorithms and neural networks in machine vision inspection. Comput. Ind. Eng. 1992, 23, 105–108. [Google Scholar] [CrossRef]
  65. Lu, Y.; Duanmu, L.; Zhai, Z.; Wang, Z. Application and improvement of Canny edge-detection algorithm for exterior wall hollowing detection using infrared thermal images. Energy Build. 2022, 274, 112421. [Google Scholar] [CrossRef]
  66. Lynn, N.D.; Sourav, A.I.; Santoso, A.J. Implementation of Real-Time Edge Detection Using Canny and Sobel Algorithms. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Bristol, UK, 13 November 2021. [Google Scholar]
  67. Feng, Y.; Zhao, H.; Li, X.; Zhang, X.; Li, H. A multi-scale 3D Otsu thresholding algorithm for medical image segmentation. Digit. Signal Process. 2017, 60, 186–199. [Google Scholar] [CrossRef]
  68. Zhu, N.; Wang, G.; Yang, G.; Dai, W. A Fast 2D Otsu Thresholding Algorithm Based on Improved Histogram. In Proceedings of the 2009 Chinese Conference on Pattern Recognition, Nanjing, China, 4–6 November 2009. [Google Scholar]
  69. Bansal, M.; Kumar, M.; Kumar, M. 2D object recognition: A comparative analysis of SIFT, SURF and ORB feature descriptors. Multimed. Tools Appl. 2021, 80, 18839–18857. [Google Scholar] [CrossRef]
  70. Tareen, S.A.K.; Saleem, Z. A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. In Proceedings of the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 3–4 March 2018. [Google Scholar]
  71. Van Droogenbroeck, M.; Talbot, H. Fast computation of morphological operations with arbitrary structuring elements. Pattern Recognit. Lett. 1996, 17, 1451–1460. [Google Scholar] [CrossRef]
  72. Lee, Y.H. Algorithms for Mathematical Morphological Operations with Flat Top Structuring Elements. In Proceedings of the Applications of Digital Image Processing VIII, SPIE, San Diego, CA, USA, 20–22 August 1985. [Google Scholar]
  73. Gupta, A.; Sintorn, I.-M. Efficient high-resolution template matching with vector quantized nearest neighbour fields. Pattern Recognit. 2024, 151, 110386. [Google Scholar] [CrossRef]
  74. Bergamini, L.; Sposato, M.; Peruzzini, M.; Vezzani, R.; Pellicciari, M. Deep Learning-Based Method for Vision-Guided Robotic Grasping of Unknown Objects. In Proceedings of the 25th ISPE Inc. International Conference on Transdisciplinary Engineering, Modena, Italy, 6–9 July 2018; Volume 1, pp. 281–290. [Google Scholar]
  75. Walia, S. Light-operated On-chip Autonomous Vision Using Low-dimensional Material Systems. Adv. Mater. Technol. 2022, 7, 2101494. [Google Scholar] [CrossRef]
  76. Hunter, D.B. Machine Vision Techniques for High Speed Videography. In Proceedings of the High-Speed Photography, Videography, and Photonics II, SPIE, Bellingham, WA, USA, 1–2 November 1984. [Google Scholar]
  77. Are High-Resolution Event Cameras Really Needed? Available online: https://arxiv.org/abs/2203.14672 (accessed on 29 March 2022).
  78. Kazanskiy, N.L.; Khonina, S.N.; Butt, M.A. Transforming high-resolution imaging: A comprehensive review of advances in metasurfaces and metalenses. Mater. Today Phys. 2025, 50, 101628. [Google Scholar] [CrossRef]
  79. Cherian, A.K.; Poovammal, E. Classification of remote sensing images using CNN. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Bristol, UK, 13 November 2021. [Google Scholar]
  80. Firsov, N.; Myasnikov, E.; Lobanov, V.; Khabibullin, R.; Kazanskiy, N.; Khonina, S.; Butt, M.A.; Nikonorov, A. HyperKAN: Kolmogorov–Arnold Networks Make Hyperspectral Image Classifiers Smarter. Sensors 2024, 24, 7683. [Google Scholar] [CrossRef]
  81. Krichen, M. Convolutional Neural Networks: A Survey. Computers 2023, 12, 151. [Google Scholar] [CrossRef]
  82. Tulbure, A.-A.; Tulbure, A.-A.; Dulf, E.-H. A review on modern defect detection models using DCNNs–Deep convolutional neural networks. J. Adv. Res. 2022, 35, 33–48. [Google Scholar] [CrossRef] [PubMed]
  83. Lawrence, S.; Giles, C.L.; Tsoi, A.C.; Back, A.D. Face recognition: A convolutional neural-network approach. IEEE Trans. Neural Netw. 1997, 8, 98–113. [Google Scholar] [CrossRef] [PubMed]
  84. Sharma, N.; Jain, V.; Mishra, A. An Analysis Of Convolutional Neural Networks For Image Classification. Procedia Comput. Sci. 2018, 132, 377–384. [Google Scholar] [CrossRef]
  85. Benefits and Applications of AI-Powered Machine Vision. Available online: https://www.micropsi-industries.com/blog/benefits-and-applications-of-ai-powered-machine-vision (accessed on 29 March 2025).
  86. Van der Velden, B.H.M.; Kuijf, H.J.; Gilhuijs, K.G.A.; Viergever, M.A. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med. Image Anal. 2022, 79, 102470. [Google Scholar] [CrossRef]
  87. Li, J.Q.; Dukes, P.V.; Lee, W.; Sarkis, M.; Vo-Dinh, T. Machine learning using convolutional neural networks for SERS analysis of biomarkers in medical diagnostics. J. Raman Spectrosc. 2022, 53, 2044–2057. [Google Scholar] [CrossRef]
  88. Lodhi, S.K.; Gill, A.Y.; Hussain, I. AI-Powered Innovations in Contemporary Manufacturing Procedures: An Extensive Analysis. Int. J. Multidiscip. Sci. Arts 2024, 3, 15–25. [Google Scholar] [CrossRef]
  89. Rashid, A.B.; Kausik, M.A.K. AI revolutionizing industries worldwide: A comprehensive overview of its diverse applications. Hybrid. Adv. 2024, 7, 100277. [Google Scholar] [CrossRef]
  90. Aggarwal, C.C. Outlier Analysis, 2nd ed.; Springer International Publishing: Cham, Switzerland, 2017; p. 481. [Google Scholar]
  91. Khalifa, M.; Albadawy, M. AI in diagnostic imaging: Revolutionising accuracy and efficiency. Comput. Methods Programs Biomed. Update 2024, 5, 100146. [Google Scholar] [CrossRef]
  92. Al-Antari, M.A. Artificial Intelligence for Medical Diagnostics—Existing and Future AI Technology! Diagnostics 2023, 13, 688. [Google Scholar] [CrossRef] [PubMed]
  93. AI in Logistics: Uncovering More Major Benefits and Use Cases. Available online: https://litslink.com/blog/ai-in-logistics-uncovering-more-major-benefits-and-use-cases (accessed on 31 March 2025).
  94. Ghonasgi, K.; Kaveny, K.J.; Langlois, D.; Sigurðarson, L.D.; Swift, T.A.; Wheeler, J.; Young, A.J. The case against machine vision for the control of wearable robotics: Challenges for commercial adoption. Sci. Robot. 2025, 10, eadp5005. [Google Scholar] [CrossRef] [PubMed]
  95. Mercedes-Benz Accelerates AI and Robotics at Berlin-Marienfelde, Transforming Digital Production with Humanoid Robots and Next-Generation Automation Technologies. Available online: https://www.automotivemanufacturingsolutions.com/robotics/mercedes-benz-advances-ai-and-robotics-in-production/46909.article (accessed on 31 March 2025).
  96. BMW Taps Humanoid Startup Figure to Take on Tesla’s Robot. Available online: https://www.reuters.com/business/autos-transportation/bmw-taps-humanoid-startup-figure-take-teslas-robot-2024-01-18/ (accessed on 31 March 2025).
  97. How Autonomous Robots are Transforming Logistics. Available online: https://www.reisopack.com/en/how-autonomous-robots-are-transforming-logistics/ (accessed on 31 March 2025).
  98. Tan, H. Line inspection logistics robot delivery system based on machine vision and wireless communication. In Proceedings of the 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Chongqing, China, 29–30 October 2020. [Google Scholar]
  99. Valero, S.; Martinez, J.C.; Montes, A.M.; Marin, C.; Bolanos, R.; Alvarez, D. Machine vision-assisted design of end effector pose in robotic mixed depalletizing of heterogeneous cargo. Sensors 2025, 25, 1137. [Google Scholar] [CrossRef] [PubMed]
  100. Fang, J.; Lu, X.; Feng, X.; Zhang, Y. Research into an intelligent logistics handling robot based on front-end machine vision. In Proceedings of the 2024 5th International conference on computer engineering and application (ICCEA), Hangzhou, China, 12–14 April 2024. [Google Scholar]
  101. Sharma, K.; Shivandu, S.K. Integrating artificial intelligence and Internet of Things (IoT) for enhanced crop monitoring and management in precision agriculture. Sens. Int. 2024, 5, 100292. [Google Scholar] [CrossRef]
  102. Guebsi, R.; Mami, S.; Chokmani, K. Drones in Precision Agriculture: A Comprehensive Review of Applications, Technologies, and Challenges. Drones 2024, 8, 686. [Google Scholar] [CrossRef]
  103. Profili, A.; Magherini, R.; Servi, M.; Spezia, F.; Gemmiti, D.; Volpe, Y. Machine vision system for automatic defect detection of ultrasound probes. Int. J. Adv. Manuf. Technol. 2024, 135, 3421–3435. [Google Scholar] [CrossRef]
  104. Li, L.; Li, S.; Wang, W.; Zhang, J.; Sun, Y.; Deng, Q.; Zheng, T.; Lu, J.; Gao, W.; Yang, M.; et al. Adaptative machine vision with microsecond-level accurate perception beyond human retina. Nat. Commun. 2024, 15, 6261. [Google Scholar] [CrossRef]
  105. Dodda, A.; Jayachandran, D.; Subbulakshmi Radhakrishnan, S.; Pannone, A.; Zhang, Y.; Trainor, N.; Redwing, J.M.; Das, S. Bioinspired and Low-Power 2D Machine Vision with Adaptive Machine Learning and Forgetting. ACS Nano 2022, 16, 20010–20020. [Google Scholar] [CrossRef]
  106. Ibn-Khedher, H.; Laroui, M.; Mabrouk, M.B.; Moungla, H.; Afifi, H.; Oleari, A.N. Edge Computing Assisted Autonomous Driving Using Artificial Intelligence. In Proceedings of the 2021 International Wireless Communications and Mobile Computing (IWCMC), Beijing, China, 28 June–2 July 2021. [Google Scholar]
  107. Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
  108. Verde Romero, D.A.; Villalvazo Laureano, E.; Jiménez Betancourt, R.O.; Navarro Álvarez, E. An open source IoT edge-computing system for monitoring energy consumption in buildings. Results Eng. 2024, 21, 101875. [Google Scholar] [CrossRef]
  109. Manogaran, N.; Nandagopal, M.; Abi, N.E.; Seerangan, K.; Balusamy, B.; Selvarajan, S. Integrating meta-heuristic with named data networking for secure edge computing in IoT enabled healthcare monitoring system. Sci. Rep. 2024, 14, 21532. [Google Scholar] [CrossRef] [PubMed]
  110. Zacchigna, F.G. Methodology for CNN Implementation in FPGA-Based Embedded Systems. IEEE Embed. Syst. Lett. 2023, 15, 85–88. [Google Scholar] [CrossRef]
  111. Vasile, C.-E.; Ulmămei, A.-A.; Bîră, C. Image Processing Hardware Acceleration—A Review of Operations Involved and Current Hardware Approaches. J. Imaging 2024, 10, 298. [Google Scholar] [CrossRef]
  112. Sailesh, M.; Selvakumar, K.; Narayanan, P. A novel framework for deployment of CNN models using post-training quantization on microcontroller. Microprocess. Microsyst. 2022, 94, 104634. [Google Scholar]
  113. Canpolat Şahin, M.; Kolukısa Tarhan, A. Evaluation and Selection of Hardware and AI Models for Edge Applications: A Method and A Case Study on UAVs. Appl. Sci. 2025, 15, 1026. [Google Scholar] [CrossRef]
  114. Wu, L.; Xiao, G.; Huang, D.; Zhang, X.; Ye, D.; Weng, H. Edge Computing-Based Machine Vision for Non-Invasive and Rapid Soft Sensing of Mushroom Liquid Strain Biomass. Agronomy 2025, 15, 242. [Google Scholar] [CrossRef]
  115. Akundi, A.; Reyna, M. A Machine Vision Based Automated Quality Control System for Product Dimensional Analysis. Procedia Comput. Sci. 2021, 185, 127–134. [Google Scholar] [CrossRef]
  116. Sioma, A. Vision System in Product Quality Control Systems. Appl. Sci. 2023, 13, 751. [Google Scholar] [CrossRef]
  117. Ivaschenko, A.; Avsievich, V.; Reznikov, Y.; Belikov, A.; Turkova, V.; Sitnikov, P.; Surnin, O. Intelligent Machine Vision Implementation for Production Quality Control. In Proceedings of the 2023 34th Conference of Open Innovations Association (FRUCT), Riga, Latvia, 15–17 November 2023. [Google Scholar]
  118. Xiao, Z.; Wang, J.; Han, L.; Guo, S.; Cui, Q. Application of Machine Vision System in Food Detection. Front. Nutr. 2022, 9, 888245. [Google Scholar] [CrossRef] [PubMed]
  119. Zhao, Z.; Wang, R.; Liu, M.; Bai, L.; Sun, Y. Application of machine vision in food computing: A review. Food Chem. 2025, 463, 141238. [Google Scholar] [CrossRef] [PubMed]
  120. Tzampazaki, M.; Zografos, C.; Vrochidou, E.; Papakostas, G.A. Machine Vision—Moving from Industry 4.0 to Industry 5.0. Appl. Sci. 2024, 14, 1471. [Google Scholar] [CrossRef]
  121. Gao, X. Artificial intelligence applied to supermarket intelligent replenishment robot based on machine vision. In Proceedings of the 2023 Asia-Europe Conference on Electronics, Data Processing and Informatics (ACEDPI), Prague, Czech Republic, 17–19 April 2023. [Google Scholar]
  122. Yang, R.; Jiang, Q. Research on the application of machine vision technology in industrial automation assembly line. In Proceedings of the International Conference on Mechatronics and Intelligent Control (ICMIC 2024), Wuhan, China, 20–22 September 2024. [Google Scholar]
  123. Ali, Y.; Shah, S.W.; Arif, A.; Tlija, M.; Siddiqi, M.R. Intelligent Framework Design for Quality Control in Industry 4.0. Appl. Sci. 2024, 14, 7726. [Google Scholar] [CrossRef]
  124. Rana, M.; Bhushan, M. Machine learning and deep learning approach for medical image analysis: Diagnosis to detection. Multimed. Tools Appl. 2023, 82, 26731–26769. [Google Scholar] [CrossRef]
  125. Pinto-Coelho, L. How Artificial Intelligence Is Shaping Medical Imaging Technology: A Survey of Innovations and Applications. Bioengineering 2023, 10, 1435. [Google Scholar] [CrossRef]
  126. Hassan, C.; Spadaccini, M.; Iannone, A.; Maselli, R.; Jovani, M.; Chandrasekar, V.T.; Antonelli, G.; Yu, H.; Areia, M.; Dinis-Ribeiro, M.; et al. Performance of artificial intelligence in colonoscopy for adenoma and polyp detection: A systematic review and meta-analysis. Gastrointest. Endosc. 2021, 93, 77–85. [Google Scholar] [CrossRef]
  127. Van Leeuwen, K.G.; Schalekamp, S.; Rutten, M.J.C.M.; van Ginneken, B.; de Rooij, M. Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. Eur. Radiol. 2021, 31, 3797–3804. [Google Scholar] [CrossRef]
  128. Penza, V.; De Momi, E.; Enayati, N.; Chupin, T.; Ortiz, J.; Mattos, L.S. EnViSoRS: Enhanced Vision System for Robotic Surgery. A User-Defined Safety Volume Tracking to Minimize the Risk of Intraoperative Bleeding. Front. Robot. AI 2017, 4, 00015. [Google Scholar] [CrossRef]
  129. Rosen, C.A. Machine Vision and Robotics: Industrial Requirements. In Computer Vision and Sensor-Based Robots, 1st ed.; Dodd, G.G., Rossol, L., Eds.; Springer: New York, NY, USA, 1979; Volume 1, pp. 3–22. [Google Scholar]
  130. The Use of Machine Vision for Robot Guidance Offers New Possibilities. Available online: https://www.qualitymag.com/articles/97057-the-use-of-machine-vision-for-robot-guidance-offers-new-possibilities (accessed on 1 April 2025).
  131. Das, S.; Das, I.; Shaw, R.N.; Ghosh, A. Chapter Seven-Advance machine learning and artificial intelligence applications in service robot. Artif. Intell. Future Gener. Robot. 2021, 1, 83–91. [Google Scholar]
  132. Do, Y.; Kim, G.; Kim, J. Omnidirectional vision system developed for a home service robot. In Proceedings of the 2007 14th International Conference on Mechatronics and Machine Vision in Practice, Xiamen, China, 4–6 December 2007. [Google Scholar]
  133. Grigorescu, S.M.; Prenzel, O.; Gräser, A. Model driven developed machine vision system for service robotics. In Proceedings of the 2010 12th International Conference on Optimization of Electrical and Electronic Equipment, Brasov, Romania, 20–22 May 2010. [Google Scholar]
  134. Wang, L.; Schmidt, B.; Nee, A.Y.C. Vision-guided active collision avoidance for human-robot collaborations. Manuf. Lett. 2013, 1, 5–8. [Google Scholar] [CrossRef]
  135. Wei, Z.; Tian, F.; Qiu, Z.; Yang, Z.; Zhan, R.; Zhan, J. Research on Machine Vision-Based Control System for Cold Storage Warehouse Robots. Actuators 2023, 12, 334. [Google Scholar] [CrossRef]
  136. Abba, S.; Bizi, A.M.; Lee, J.-A.; Bakouri, S.; Crespo, M.L. Real-time object detection, tracking, and monitoring framework for security surveillance systems. Heliyon 2024, 10, e34922. [Google Scholar] [CrossRef] [PubMed]
  137. Attard, L.; Farrugia, R.A. Vision based surveillance system. In Proceedings of the 2011 IEEE EUROCON-International Conference on Computer as a Tool, Lisbon, Portugal, 27–29 April 2011. [Google Scholar]
  138. Nurnoby, M.F.; Helmy, T. A Real-Time Deep Learning-based Smart Surveillance Using Fog Computing: A Complete Architecture. Procedia Comput. Sci. 2023, 218, 1102–1111. [Google Scholar] [CrossRef]
  139. Nigam, R.; Kundu, A.; Yu, X.; Saniie, J. Machine Vision Surveillance System-Artificial Intelligence For COVID-19 Norms. In Proceedings of the 2022 IEEE International Conference on Electro Information Technology (eIT), Mankato, MN, USA, 19–21 May 2022. [Google Scholar]
  140. Khan, H.; Thakur, J.S. Smart traffic control: Machine learning for dynamic road traffic management in urban environments. Multimed. Tools Appl. 2024, 84, 10321–10345. [Google Scholar] [CrossRef]
  141. Zhang, Y. Safety Management of Civil Engineering Construction Based on Artificial Intelligence and Machine Vision Technology. Adv. Civ. Eng. 2021, 2021, 1–14. [Google Scholar] [CrossRef]
  142. Ghazal, S.; Munir, A.; Qureshi, W.S. Computer vision in smart agriculture and precision farming: Techniques and applications. Artif. Intell. Agric. 2024, 13, 64–83. [Google Scholar] [CrossRef]
  143. Shin, J.; Mahmud, M.S.; Rehman, T.U.; Ravichandran, P.; Heung, B.; Chang, Y.K. Trends and Prospect of Machine Vision Technology for Stresses and Diseases Detection in Precision Agriculture. AgriEngineering 2023, 5, 20–39. [Google Scholar] [CrossRef]
  144. Kim, Y.; Glenn, D.M.; Park, J.; Ngugi, H.K.; Lehman, B.L. Hyperspectral image analysis for water stress detection of apple trees. Comput. Electron. Agric. 2011, 77, 155–160. [Google Scholar] [CrossRef]
  145. Paes de Melo, B.; Carpinetti, P.d.A.; Fraga, O.T.; Rodrigues-Silva, P.L.; Fioresi, V.S.; de Camargos, L.F.; Flores da Silva Ferreira, M. Abiotic Stresses in Plants and Their Markers: A Practice View of Plant Stress Responses and Programmed Cell Death Mechanisms. Plants 2022, 11, 1100. [Google Scholar] [CrossRef]
  146. Satheeshkumar, S.K.; Paolini, C.; Sarkar, M. Subsurface Heat stress detection in plants using machine learning regression models. In Proceedings of the 2023 International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS), Valencia, Spain, 19–22 June 2023. [Google Scholar]
  147. Tian, Z.; Ma, W.; Yang, Q.; Duan, F. Application status and challenges of machine vision in plant factory—A review. Inf. Process. Agric. 2022, 9, 195–211. [Google Scholar] [CrossRef]
  148. Walsh, J.J.; Mangina, E.; Negrão, S. Advancements in Imaging Sensors and AI for Plant Stress Detection: A Systematic Literature Review. Plant Phenomics 2024, 6, 0153. [Google Scholar] [CrossRef]
  149. Foucher, P.; Revollon, P.; Vigouroux, B.; Chassériaux, G. Morphological Image Analysis for the Detection of Water Stress in Potted Forsythia. Biosyst. Eng. 2004, 89, 131–138. [Google Scholar] [CrossRef]
  150. Chung, S.; Breshears, L.E.; Yoon, J.-Y. Smartphone near infrared monitoring of plant stress. Comput. Electron. Agric. 2018, 154, 93–98. [Google Scholar] [CrossRef]
  151. Ghosal, S.; Blystone, D.; Singh, A.K.; Ganapathysubramanian, B.; Singh, A.; Sarkar, S. An explainable deep machine vision framework for plant stress phenotyping. Proc. Natl. Acad. Sci. USA 2018, 115, 4613–4618. [Google Scholar] [CrossRef]
  152. Karthickmanoj, R.; Sasilatha, T.; Padmapriya, J. Automated machine learning based plant stress detection system. Mater. Today Proc. 2021, 47, 1887–1891. [Google Scholar] [CrossRef]
  153. De Lucia, G.; Lapegna, M.; Romano, D. Towards explainable AI for hyperspectral image classification in Edge Computing environments. Comput. Electr. Eng. 2022, 103, 108381. [Google Scholar] [CrossRef]
  154. Pfenning, A.; Yan, X.; Gitt, S.; Fabian, J.; Lin, B.; Witt, D.; Afifi, A.; Azem, A.; Darcie, A.; Wu, J.; et al. A perspective on silicon photonic quantum computing with spin qubits. In Proceedings of the Silicon Photonics XVII, San Francisco, CA, USA, 22–24 February 2022. [Google Scholar]
  155. El Srouji, L.; Krishnan, A.; Ravichandran, R.; Lee, Y.; On, M.; Xiao, X.; Ben Yoo, S.J. Photonic and optoelectronic neuromorphic computing. APL Photonics 2022, 7, 051101. [Google Scholar] [CrossRef]
  156. Yang, W.; Wei, Y.; Wei, H.; Chen, Y.; Huang, G.; Li, X.; Lo, R.; Yao, N.; Wang, X.; Gu, X.; et al. Survey on Explainable AI: From Approaches, Limitations and Applications Aspects. Hum.-Cent. Intell. Syst. 2023, 3, 161–188. [Google Scholar] [CrossRef]
  157. Przybył, K. Explainable AI: Machine Learning Interpretation in Blackcurrant Powders. Sensors 2024, 24, 3198. [Google Scholar] [CrossRef]
  158. Liu, G.; Zhang, J.; Chan, A.B.; Hsiao, J.H. Human attention guided explainable artificial intelligence for computer vision models. Neural Netw. 2024, 177, 106392. [Google Scholar] [CrossRef]
  159. Shchanikov, S.; Bordanov, I.; Kucherik, A.; Gryaznov, E.; Mikhaylov, A. Neuromorphic Analog Machine Vision Enabled by Nanoelectronic Memristive Devices. Appl. Sci. 2023, 13, 13309. [Google Scholar] [CrossRef]
  160. Wang, H.; Sun, B.; Ge, S.S.; Su, J.; Jin, M.L. On non-von Neumann flexible neuromorphic vision sensors. Npj Flex. Electron. 2024, 8, 28. [Google Scholar] [CrossRef]
  161. Imran, A.; He, X.; Tabassum, H.; Zhu, Q.; Dastgeer, G.; Liu, J. Neuromorphic Vision Sensor driven by Ferroelectric HfAlO. Mater. Today Nano 2024, 26, 100473. [Google Scholar] [CrossRef]
  162. Schuman, C.D.; Kulkarni, S.R.; Parsa, M.; Mitchell, J.P.; Date, P.; Kay, B. Opportunities for neuromorphic computing algorithms and applications. Nat. Comput. Sci. 2022, 2, 10–19. [Google Scholar] [CrossRef] [PubMed]
  163. Subramaniam, A. A neuromorphic approach to image processing and machine vision. In Proceedings of the 2017 Fourth International Conference on Image Information Processing (ICIIP), Shimla, India, 21–23 December 2017. [Google Scholar]
  164. Kösters, D.J.; Kortman, B.A.; Boybat, I.; Ferro, E.; Dolas, S.; de Austri, R.R.; Kwisthout, J.; Hilgenkamp, H.; Rasing, T.; Riel, H.; et al. Benchmarking energy consumption and latency for neuromorphic computing in condensed matter and particle physics. APL Mach. Learn. 2023, 1, 016101. [Google Scholar] [CrossRef]
  165. Wang, Y.; Wen, W.; Song, L.; Li, H.H. Classification accuracy improvement for neuromorphic computing systems with one-level precision synapses. In Proceedings of the 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), Chiba, Japan, 16–19 January 2017. [Google Scholar]
  166. Ji, Y.; Zhang, Y.; Li, S.; Chi, P.; Jiang, C.; Qu, P. NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints. In Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016. [Google Scholar]
  167. Ji, Y.; Wang, L.; Long, Y.; Wang, J.; Zheng, H.; Yu, Z.G.; Zhang, Y.-W.; Ang, K.-W. Ultralow energy adaptive neuromorphic computing using reconfigurable zinc phosphorus trisulfide memristors. Nat. Commun. 2025, 16, 6899. [Google Scholar] [CrossRef]
  168. Bai, Q.; Hu, X. Superposition-enhanced quantum neural network for multi-class image classification. Chin. J. Phys. 2024, 89, 378–389. [Google Scholar] [CrossRef]
  169. Ruiz, F.J.R.; Laakkonen, T.; Bausch, J.; Balog, M.; Barekatain, M.; Heras, F.J.H.; Novikov, A.; Fitzpatrick, N.; Romera-Paredes, B.; van de Wetering, J.; et al. Quantum circuit optimization with AlphaTensor. Nat. Mach. Intell. 2025, 7, 374–385. [Google Scholar] [CrossRef]
  170. Sciorilli, M.; Borges, L.; Patti, T.L.; García-Martín, D.; Camilo, G.; Anandkumar, A.; Aolita, L. Towards large-scale quantum optimization solvers with few qubits. Nat. Commun. 2025, 16, 476. [Google Scholar] [CrossRef]
  171. Blekos, K.; Brand, D.; Ceschini, A.; Chou, C.-H.; Li, R.-H.; Pandya, K.; Summer, A. A review on Quantum Approximate Optimization Algorithm and its variants. Phys. Rep. 2024, 1068, 1–66. [Google Scholar] [CrossRef]
  172. Fernandes, A.O.; Moreira, L.F.E.; Mata, J.M. Machine vision applications and development aspects. In Proceedings of the 2011 9th IEEE International Conference on Control and Automation (ICCA), Santiago, Chile, 19–21 December 2011. [Google Scholar]
  173. Mohaideen Abdul Kadhar, K.; Anand, G. Challenges in Machine Vision System. In Industrial Vision Systems with Raspberry Pi, 1st ed.; Asadi, F., Ed.; Apress: New York, NY, USA, 2024; Volume 4, pp. 73–86. [Google Scholar]
  174. Waelen, R.A. The ethics of computer vision: An overview in terms of power. AI Ethics 2024, 4, 353–362. [Google Scholar] [CrossRef]
  175. Zhang, B.H.; Lemoine, B.; Mitchell, M. Mitigating unwanted biases with adversarial learning. In Proceedings of the AIES’ 18: 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA, 2–3 February 2018. [Google Scholar]
  176. Hardt, M.; Price, E.; Srebro, N. Equality of opportunity in supervised learning. In Proceedings of the 30th International conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3323–3331. [Google Scholar]
  177. Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. 2021, 54, 1–35. [Google Scholar] [CrossRef]
  178. Hanna, M.G.; Pantanowitz, L.; Jackson, B.; Palmer, O.; Visweswaran, S.; Pantanowitz, J.; Deebajah, M.; Rashidi, H.H. Ethical and Bias Considerations in Artificial Intelligence/Machine Learning. Mod. Pathol. 2025, 38, 100686. [Google Scholar] [CrossRef]
  179. Kashyapa, R. How Expensive Are Machine Vision Solutions? Available online: https://qualitastech.com/image-acquisition/how-expensive-is-machine-vision-solution/ (accessed on 28 March 2025).
  180. Würschinger, H.; Mühlbauer, M.; Winter, M.; Engelbrecht, M.; Hanenkamp, N. Implementation and potentials of a machine vision system in a series production using deep learning and low-cost hardware. Procedia CIRP 2020, 90, 611–616. [Google Scholar] [CrossRef]
  181. Using Artificial Intelligence in Machine Vision. Available online: https://www.cognex.com/what-is/edge-learning/using-ai-in-machine-vision (accessed on 28 March 2025).
  182. Malik, P.; Pathania, M.; Rathaur, V.K. Overview of artificial intelligence in medicine. J. Fam. Med. Prim. Care 2019, 8, 2328. [Google Scholar] [CrossRef]
  183. Kitaguchi, D.; Takeshita, N.; Hasegawa, H.; Ito, M. Artificial intelligence-based computer vision in surgery: Recent advances and future perspectives. Ann. Gastroenterol. Surg. 2022, 6, 29–36. [Google Scholar] [CrossRef]
  184. Sinha, S.; Lee, Y.M. Challenges with developing and deploying AI models and applications in industrial systems. Discov. Artif. Intell. 2024, 4, 55. [Google Scholar] [CrossRef]
  185. Christensen, D.V.; Dittmann, R.; Linares-Barranco, B.; Sebastian, A.; Gallo, M.L.; Redaelli, A.; Slesazeck, S.; Mikolajick, T.; Spiga, S.; Menzel, S.; et al. 2022 roadmap on neuromorphic computing and engineering. Neuromorph. Comput. Eng. 2022, 2, 022501. [Google Scholar] [CrossRef]
Figure 1. Industrial and automation camera market forecast by application [24].
Figure 2. Characteristics, applications, advantages, and disadvantages of MV, and the role of AI in MV.
Figure 3. Components of the MV system.
Figure 4. Vidar camera and supervision system applications. (a) Referee assistance: the spike camera determines ball placement. (b) Ball landing sequence at 100 km/h, showing 6 of 170 frames with highlighted ball and boundary. (c) Target tracking setup: a laser hits a specific character on a rotating fan. (d) Fan before and after 64 laser pulses hitting character “K.” (e) SNN recognition test: the correct category neuron spikes the most. (f) Multi-object tracking: y-axis shows the polar angle of objects relative to the fan’s center, with real-time SNN masks and bounding boxes. (g) Performance evaluation: tracking high-speed motion. 1 Mach = 340.3 m/s [43].
Figure 5. The 3D reconstruction system proposed in [44].
Figure 6. Differences between human vision and MV.
Figure 7. (a) MV system, (b) proposed intelligent network. Inspired by [123].
Figure 8. EnViSoRS (Enhanced Vision System for Robotic Surgery) for enhancing safety, integrated with the dVRK system (WPI and Johns Hopkins University). Using the master console, the surgeon can (i) designate the safety area (SA) with a stylus and graphics tablet, (ii) observe the safety volume (SV) superimposed on the images, and (iii) use a graphical gauge to track the distance between the instruments and the 3D surface of the SV [128].
Figure 9. (a) Experimental setup, (b) false positives and false negatives [135].
Figure 10. Smart Recognition System for Civil Engineering Safety Management [141].
Figure 11. Concept of pest and disease identification using MV [147].
Figure 12. Process of applying MV in fruit grading [147].
Table 1. Key characteristics of sensors used in MV.
Sensor Type | Function | Key Characteristics | Applications
CMOS (Complementary Metal-Oxide-Semiconductor) Sensor [48,49] | Converts light into electrical signals | Low power consumption, high-speed processing, cost-effective | General MV, industrial inspection, robotics
CCD (Charge-Coupled Device) Sensor [50] | Captures high-quality images with low noise | High image quality, low noise, higher power consumption | High-precision measurement, scientific imaging
3D Sensors (Time-of-Flight, Structured Light, Stereo Vision) [51] | Captures depth information for 3D imaging | Measures object distance, accurate depth perception | Object recognition, bin picking, gesture recognition
Infrared (IR) Sensors [52] | Detects heat signatures and thermal variations | Works in low light, captures non-visible wavelengths | Night vision, defect detection, surveillance
LIDAR (Light Detection and Ranging) Sensors [53,54,55] | Measures distance using laser reflection | High accuracy, long-range detection | Autonomous vehicles, terrain mapping, 3D scanning
X-ray Sensors [56] | Penetrates objects to capture internal structure | Non-destructive testing, detects internal defects | Medical imaging, baggage scanning, industrial inspection
Hyperspectral Sensors [57] | Captures data across multiple wavelengths | Identifies materials, chemical composition analysis | Agriculture, pharmaceutical inspection, food quality control
Magnetic Sensors [58] | Detects metallic components in objects | Measures magnetic fields, high sensitivity | Industrial automation, position sensing, defect detection
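The time-of-flight and LIDAR rows in Table 1 both rest on the same round-trip principle: distance is half the time taken by an emitted light pulse to return, multiplied by the speed of light. The following minimal Python sketch illustrates that calculation only; the 66.7 ns round-trip value is an illustrative assumption, not a figure from any cited sensor.

```python
# Round-trip time-of-flight depth estimate used by ToF and LIDAR sensors:
# depth = (c * t_round_trip) / 2, where c is the speed of light.
C = 299_792_458.0  # speed of light, m/s


def tof_depth_m(round_trip_s: float) -> float:
    """Distance to the target given the measured round-trip time in seconds."""
    return C * round_trip_s / 2.0


if __name__ == "__main__":
    # Example: a 66.7 ns round trip corresponds to roughly 10 m.
    print(f"{tof_depth_m(66.7e-9):.2f} m")
```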
Table 2. Types of MV lighting.
Lighting Type | Description
Fibre Optic | Delivers light from halogen, tungsten-halogen, or xenon sources and provides bright, shapeable, and focused illumination.
LED Lights | Widely adopted for their fast response, ability to pulse or strobe (which freezes motion), flexible mounting options, long operational life, and consistent light output.
Dome Lights | Deliver omnidirectional illumination that minimizes glare and reflections, making them well suited for inspecting parts with complex or curved geometries.
Telecentric Lighting | Best suited for high-precision tasks like edge detection and flaw identification on shiny or reflective surfaces, where measurement accuracy is critical.
Diffused Light | Employs scattering filters to soften illumination, producing uniform lighting and reducing hotspots or uneven brightness on reflective materials.
Direct Light | Delivers light along the same optical path as the camera, providing direct, on-axis illumination of the target.
Table 3. Image processing algorithms and AI-driven techniques.
Characteristic | Image Processing Algorithms | AI Techniques (Neural Networks, DL)
Approach | Rule-based, deterministic processing | Data-driven, adaptive learning
Common Methods | Edge detection using Canny and Sobel algorithms [65,66]; thresholding, including Otsu’s method and adaptive thresholding [67,68]; feature extraction using SIFT, SURF, and ORB [69,70]; morphological operations such as erosion and dilation [71,72]; template matching [73,74] | CNNs for image classification and segmentation [34]; RNNs for sequential/temporal analysis; FCNs for segmentation; GANs for data augmentation and synthetic image generation
Development Time | Shorter, requires manual feature design | Longer, involves extensive training
Accuracy | Moderate, depends on predefined rules | Higher, learns complex patterns
Flexibility | Limited, needs manual adjustments | High, adapts to varying conditions
Computational Complexity | Lower, efficient for simple tasks | Higher, requires more processing power
Training Requirement | None, operates on fixed rules | Requires large annotated datasets
Performance in Complex Environments | Struggles with variations in lighting, noise, and occlusion | Robust against variations and distortions
Interpretability | High, decisions are explainable | Lower, functions as a “black box”
Adaptability | Low, requires reprogramming for new tasks | High, generalizes across tasks
Real-Time Processing | Faster, suitable for immediate analysis | Slower, depends on hardware optimization
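To make the left-hand column of Table 3 concrete, the sketch below chains Otsu thresholding and Canny edge detection with OpenCV, the kind of rule-based pipeline the table contrasts with learned models. It is a minimal illustration only: the input file name, the blur kernel, and the heuristic of deriving Canny thresholds from the Otsu value are our assumptions, not settings taken from the cited works.

```python
# Minimal rule-based pipeline (illustrative): Otsu thresholding + Canny edges.
import cv2


def classical_pipeline(image_path: str):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)

    # Light denoising so the threshold and edges are less noise-sensitive.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Otsu's method selects a global threshold from the image histogram.
    otsu_t, binary = cv2.threshold(
        blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )

    # Canny hysteresis thresholds tied to the Otsu value (a common heuristic).
    edges = cv2.Canny(blurred, 0.5 * otsu_t, otsu_t)
    return binary, edges


if __name__ == "__main__":
    # "part_under_inspection.png" is a hypothetical input image.
    binary, edges = classical_pipeline("part_under_inspection.png")
    cv2.imwrite("binary.png", binary)
    cv2.imwrite("edges.png", edges)
```

Every step here is hand-designed and fully explainable, which is exactly the interpretability and development-time trade-off summarized in Table 3.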
Table 4. Neuromorphic vs. Classical CNNs for MV.
Aspect | Classical CNNs | Neuromorphic (Spiking Neural Networks) | Trade-Off/Remarks
Latency | High (batch processing; ms–s scale) | Ultra-low (event-driven; µs–ms scale) [164] | Neuromorphic excels in real-time scenarios (e.g., drones, robotics).
Power consumption | Moderate–High (GPU/TPU intensive) | Very low (event-driven, sparse coding) [164] | Neuromorphic favored for IoT/edge devices with energy constraints.
Accuracy | Mature, state-of-the-art benchmarks | Emerging; often lags behind CNNs on complex datasets [165] | CNNs still outperform on accuracy, but the gap is narrowing.
Hardware support | Widely available (GPU/TPU, CPUs) | Limited (Loihi, TrueNorth, SpiNNaker) [166] | Neuromorphic hardware is still niche and less accessible.
Adaptability | Strong with large training data | Good for temporal/event data, but less robust to large-scale supervised tasks [167] | Hybrid architectures may combine strengths.
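The event-driven behavior contrasted in Table 4 can be illustrated with a toy leaky integrate-and-fire (LIF) neuron in NumPy: the neuron only accumulates charge when input spikes arrive and only emits output when a threshold is crossed, which is the source of the sparse, low-power computation attributed to spiking systems. The leak, weight, and threshold values below are illustrative assumptions, not parameters of any cited neuromorphic chip.

```python
# Toy leaky integrate-and-fire (LIF) neuron: activity is driven by input
# events (spikes) rather than dense frame-by-frame computation.
import numpy as np


def lif_neuron(input_spikes: np.ndarray,
               leak: float = 0.9,       # membrane leak per time step (assumed)
               weight: float = 0.4,     # synaptic weight (assumed)
               threshold: float = 1.0) -> np.ndarray:
    """Return the output spike train for a binary input spike train."""
    v = 0.0                              # membrane potential
    out = np.zeros_like(input_spikes)
    for t, s in enumerate(input_spikes):
        v = leak * v + weight * s        # integrate incoming spike, with leak
        if v >= threshold:               # fire and reset once threshold is reached
            out[t] = 1
            v = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spikes_in = (rng.random(20) < 0.5).astype(float)  # sparse event stream
    print(lif_neuron(spikes_in))
```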
Table 5. Edge vs. Cloud MV Processing.
Criterion | Edge Processing | Cloud Processing | Trade-Off/Remarks
Latency | Very low, real time on device | Higher, due to network delays | Edge essential for safety-critical tasks (e.g., autonomous driving).
Computational power | Limited (mobile CPUs, TPUs, FPGAs) | Virtually unlimited in datacenters | Cloud supports deep and complex models.
Energy use | Device power drain unless optimized | Energy offloaded to datacenters | Edge energy-efficient only with tailored hardware.
Privacy & security | Data remains local, higher privacy | Requires data transfer, higher risks | Edge aligns with GDPR/healthcare compliance.
Scalability | Limited by device capacity | Scales easily across many users/devices | Cloud better for global-scale analytics.
Cost | Higher upfront device hardware costs | Lower device cost, ongoing service fees | Best choice depends on deployment scale.
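A back-of-envelope latency model makes the first row of Table 5 tangible: the cloud path adds upload time and a network round trip on top of inference, while the edge path pays only for on-device inference. All numbers in the sketch (frame size, uplink bandwidth, round-trip time, inference times) are assumptions chosen for illustration, not measurements from the cited studies.

```python
# Rough edge-vs-cloud latency model for a single frame (illustrative assumptions).


def edge_latency_ms(edge_inference_ms: float = 40.0) -> float:
    # On-device path: local inference only, no network hop.
    return edge_inference_ms


def cloud_latency_ms(image_mb: float = 0.5,
                     uplink_mbps: float = 20.0,
                     network_rtt_ms: float = 60.0,
                     cloud_inference_ms: float = 10.0) -> float:
    # Upload time in ms: megabytes -> megabits, divided by link speed.
    upload_ms = (image_mb * 8.0 / uplink_mbps) * 1000.0
    return upload_ms + network_rtt_ms + cloud_inference_ms


if __name__ == "__main__":
    print(f"edge : {edge_latency_ms():.0f} ms per frame")
    print(f"cloud: {cloud_latency_ms():.0f} ms per frame")
    # With these assumed numbers the cloud path takes ~270 ms vs ~40 ms on-device,
    # which is why Table 5 flags edge processing for safety-critical tasks.
```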