AI-Based Computer Vision Techniques and Expert Systems

Computer vision is a branch of computer science that studies how computers can 'see'. It is a field that provides significant value for advancements in academia and artificial intelligence by processing images captured with a camera. In other words, the purpose of computer vision is to impart computers with the functions of human eyes and realise 'vision' in computers. Deep learning is a key method for realising computer vision through image recognition and object detection technologies. Since its emergence, computer vision has evolved rapidly alongside deep learning and has significantly improved image recognition accuracy. Moreover, an expert system can imitate and reproduce the flow of reasoning and decision making executed in human experts' brains to derive optimal solutions. Machine learning, including deep learning, has made it possible to 'acquire the tacit knowledge of experts', which was not previously achievable with conventional expert systems. Machine learning 'systematises tacit knowledge' based on big data by measuring phenomena from multiple angles and in large quantities. In this review, we discuss knowledge-based computer vision techniques that employ deep learning.


Introduction
Among the many research fields related to image processing, 'computer vision' has been attracting considerable attention [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. It is an academic field that studies the 'realisation of vision using computers' and is also an artificial intelligence (AI) field that enables computers and systems to derive meaningful information from digital images, videos, and other visual data and to act and make recommendations (allowing humans to do so as well) based on that information. If AI enables computers to think, computer vision enables them to see, observe, and understand. It works similarly to human vision, although humans have a head start. Thus, the purpose of computer vision is to impart computers with functions equivalent to the human eye, that is, to realise 'computer vision'. Specifically, the goal is to design computer software that uses still image or video data to function similar to or better than human vision. Thus far, computer graphics (CG) has been the primary image-processing technology used in computers. The difference between CG and computer vision is that CG is used for projecting and displaying 3D objects on a 2D display, whereas computer vision is used for deriving 3D information from 2D image data. Although they are contrasting technologies, they complement each other and contribute to new technologies, such as virtual reality and augmented reality (AR). AR displays 3DCG overlaid on real-world scenery, to which additional information is added by a computer. It is an example of the fusion of computer vision, which perceives the real world, and CG, which depicts a fictional world [16][17][18][19][20]. Human vision has the advantage of accumulating contextual knowledge to distinguish between objects, determine the distance between objects, identify whether an object is moving, and ascertain whether there is something wrong with an image [21,22].
Computer vision involves training machines to perform these functions considerably faster than humans using cameras, data, and algorithms rather than the retina, optic nerve, and visual cortex. Systems trained to inspect products and monitor production assets can quickly surpass human capabilities because they can instantaneously analyse thousands of products and processes and identify imperceptible defects and problems. To achieve this, computer vision requires vast quantities of data, which are iteratively analysed until the computer can identify features and, finally, an image. Two key technologies are used to achieve this: a form of machine learning called deep learning and convolutional neural networks (CNNs) [23][24][25][26][27]. They employ algorithmic models that allow computers to learn the context of visual data: images are segmented into tagged or labelled pixels, and convolutions, mathematical operations that take two functions and produce a third, are performed on them, with the labels used for predictions regarding what the computer 'sees'. If sufficient data are provided to the model, the computer learns to 'see' the data and distinguish one image from another. Instead of being manually programmed to recognise images, algorithms allow machines to learn independently. A neural network performs convolutions and checks the accuracy of its predictions in a series of iterations until its predictions become reliable. It then perceives or displays images in a human-like manner. Similar to how humans perceive images at a distance, CNNs first identify hard edges and simple shapes and then iteratively refine their predictions to embed information. Although CNNs are used to identify single images, recurrent neural networks are similarly used in video applications to help computers understand the relationships between images in a series of frames.
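As an illustration of the convolution operation just described, the following minimal Python sketch (using NumPy; the toy image and kernel values are invented for illustration) slides a small kernel over an image and produces a strong response along a vertical edge:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and sum elementwise products (valid mode)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny image with a vertical edge: dark left half, bright right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A Sobel-like kernel that responds strongly to vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

response = convolve2d(image, kernel)
print(response)  # every valid position straddles the edge, so all responses are 4.0
```

In a CNN, stacks of such kernels are learned from data rather than chosen by hand, which is precisely what replaces manually programmed feature extraction.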
Since this new field deals with all the processes by which computers acquire information about the real world, it is widely studied with respect to aspects ranging from hardware for image sensing to AI theory for information recognition. Therefore, we focused our attention on relevant new fields, such as the fusion of computer graphics and computer vision with CNNs, as configured through expert systems, computer vision systems, and their applications.

Problem Solving Via Expert Systems
Human beings are constantly making choices throughout their lives, and some decisions often require a comparison of two things. Computer-based problem solving is similarly performed through comparisons. A comparison constitutes a conditional branch, and the answer to an input is derived by executing conditional branches. Such operations can be considered AI because the machine performs actions that humans recognise as intelligent. Very early AI was composed of programs that performed such conditional branching tasks, and this approach has been carried over to current AI implementations. Condition setting, that is, the setting of rules, is necessary for conditional branching. A system that performs conditional branching using rules is called a rule-based system, and conditional branching is often described in the form of IF-THEN statements [28,29]. Because program and algorithm structures are commonly represented as flowcharts, rule-based systems map naturally onto them. When building a rule-based system, the conditional branching content is arranged in a flowchart. The rules must be predefined by humans; however, a rule-based approach cannot handle unknown problems that humans themselves cannot answer correctly [30,31]. When setting these conditions, it is necessary to consider their order and priority. Clarifying problems and their solutions at the rule design stage in this manner is called problem formulation. When a flowchart is composed using rules, a binary tree is built based on the available information. This tree structure is called a decision tree and is often used in statistical data processing and analysis [32,33]. Previously unknown rules can be searched for and are sometimes discovered through statistical data processing. When constructing a rule-based program, the conditional branches that constitute the rules can easily be hardcoded; this is called a fixed program, which cannot be modified later because it uses fixed information.
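The IF-THEN style of rule-based conditional branching described above can be sketched in a few lines of Python; the rules and facts here are hypothetical, chosen only to show how rule order encodes priority:

```python
# A minimal rule-based sketch. Each rule is an IF-THEN pair: a condition
# (a predicate over known facts) and a conclusion. Rule order is priority.
rules = [
    ("has feathers and flies", lambda f: f.get("feathers") and f.get("flies"), "bird"),
    ("has feathers only",      lambda f: f.get("feathers"),                    "flightless bird"),
    ("has fur",                lambda f: f.get("fur"),                         "mammal"),
]

def classify(facts):
    """Execute the conditional branches in order; the first matching rule wins."""
    for name, condition, conclusion in rules:
        if condition(facts):
            return conclusion
    return "unknown"  # no predefined rule covers these facts

print(classify({"feathers": True, "flies": True}))   # bird
print(classify({"feathers": True, "flies": False}))  # flightless bird
print(classify({"fur": True}))                       # mammal
```

The "unknown" fallback makes the limitation noted above concrete: inputs outside the predefined rules simply cannot be answered.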
Such changes are unproblematic if rewriting the program incurs no additional cost; however, when condition settings change frequently, for example, settings adjusted to user preferences, repeatedly rewriting the program becomes excessively costly. Therefore, a method was devised to address this problem, wherein the main body of the program that processes and outputs the data is separated from the condition-setting data. This separated set of data is called a knowledge base [34]. When the program requires a conditional branch, it uses an ID to extract the rule, reads the setting value, and makes a decision. The knowledge base may be stored as text files on a file system or in a database management system. The rule-based systems described above were developed in the 1960s and are used in large systems. In particular, systems that can assist or replace experts in performing certain tasks, such as classification, are called expert systems [35,36]. They are computer systems that have knowledge regarding specialised fields and can reason and judge phenomena related to specific problems in a manner similar to that of an expert. They were developed so that a layman or novice with no technical knowledge can use them to achieve the same level of problem-solving ability as an expert. It is said that by using an expert system, which comprises two parts, namely, an 'inference engine' and a 'knowledge base', even people without specialised knowledge can solve problems and make decisions on par with experts [37,38]. The inference engine draws on a knowledge base comprising various forms of information, such as rules, facts, and specialised knowledge, and it makes inferences and draws conclusions using this information. Rules handled by humans can be understood when expressed in words, but for computers to interpret and process them, the expressions must be converted into a more suitable format.
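The separation of condition-setting data from the program body can be sketched minimally as follows; the rule IDs, thresholds, and verdicts are hypothetical, and the JSON string stands in for a file or database:

```python
import json

# The knowledge base is held as data, separate from the program body.
# In practice it might live in text files or a database management system.
knowledge_base = json.loads("""
{
    "max_temp": {"threshold": 30, "verdict": "too hot"},
    "min_temp": {"threshold": 10, "verdict": "too cold"}
}
""")

def evaluate(temp, kb):
    """The inference engine: looks up setting values by rule ID and branches."""
    if temp > kb["max_temp"]["threshold"]:
        return kb["max_temp"]["verdict"]
    if temp < kb["min_temp"]["threshold"]:
        return kb["min_temp"]["verdict"]
    return "ok"

print(evaluate(35, knowledge_base))  # too hot
print(evaluate(20, knowledge_base))  # ok
```

Changing a threshold now means editing the knowledge base, not rewriting the inference engine, which is exactly the cost saving the separation was devised for.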
The discipline concerned with such representations is called symbolic logic [39,40]. The inference engine plays a key role as the brain of an expert system engaged in the problem-solving process. The most basic branch of mathematical logic it uses is called propositional logic, which comprises propositional variables and sentence operators (connectives) and expresses statements in terms of truth values. The aim is to express and comprehend the relationships between propositions by relating them with conditions such as 'and', 'or', and 'if' without pursuing the meaning of the propositions themselves. Therefore, although the meanings of propositions cannot be analysed, they can be given meaning using other forms of logic that extend propositional logic, such as predicate logic [41,42]. Predicate logic extends propositional logic with quantifiers and forms an established basis for inference engines. These inference engines increase the number of available methods for answering 'questions'. In both propositional and predicate logic, sentences are expressed using symbols. In propositional logic, logical expressions are composed of propositional variables, which are atomic logical expressions, and sentence operators [43]. For example, consider two propositions, P and Q, with predetermined truth values: true or false, or 0 or 1. As listed in Table 1, P ⇒ Q is equal to (¬P) ∨ Q, and P ⇔ Q is equal to (P ⇒ Q) ∧ (Q ⇒ P). Such a table is called a truth table. Logical formulas that are always true are called tautologies. Conversely, logical formulas that are always false are called contradictions. Logical formulas also satisfy equivalence relations, which are always-true formulas [44,45], as shown in Table 2. Inference rules expressed as combinations of these logical expressions can be converted into clause form. Using the clause form, even complicated logical expressions can be grouped together to make them easier to handle.
Conjunctive and Skolem normal forms comprise propositional and predicate formulas, respectively, converted into clause form. In conjunctive normal form, a formula is a conjunction of clauses, where each clause is a disjunction of literals. In contrast, the knowledge base is a database separated from the inference engine wherein 'IF-THEN' data are accumulated.
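The propositional equivalences discussed above (cf. Table 1) can be verified mechanically by enumerating every truth assignment, as in this short Python sketch:

```python
from itertools import product

def implies(p, q):
    """P => Q is false only when P is true and Q is false."""
    return not (p and not q)

# Verify the Table 1 equivalences over every truth assignment of P and Q.
for p, q in product([False, True], repeat=2):
    assert implies(p, q) == ((not p) or q)   # P => Q  equals  (not P) or Q
    iff = implies(p, q) and implies(q, p)    # P <=> Q as (P => Q) and (Q => P)
    assert iff == (p == q)                   # the biconditional holds exactly when P and Q agree

# A tautology is true under every assignment; a contradiction under none.
assert all(p or (not p) for p in [False, True])
assert not any(p and (not p) for p in [False, True])
print("all equivalences hold")
```

Exhaustive truth-table checking like this is feasible only for small formulas, which is one motivation for the clause-form manipulations described above.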

History of Expert Systems
The first expert system ever built was called 'Dendral' [46]. It was built at Stanford University in 1965, and Edward Feigenbaum, who was involved in its development, is referred to as the 'father of expert systems'. Dendral can estimate the chemical structure of a substance from the positions and values of the peaks in its mass spectrum, which is obtained through mass spectrometry. For example, a water molecule (H2O) comprises hydrogen (H) and oxygen (O) atoms with masses of 1.01 and 16.00, respectively; hence, the molecular weight of H2O is approximately 18, and its mass spectrum has a peak at 18 units. For this example, Dendral can enumerate chemical substances with molecular weights of approximately 18 by combining atoms and determine the possible answers. As the molecular weight increases, the combinations of atoms become more diverse. Because calculating the answer takes time, methods are needed to avoid evaluating combinations that do not require evaluation. Dendral's system comprises two parts: Heuristic Dendral, which performs heuristic (rule-of-thumb) analysis, and Meta-Dendral, which registers sets of candidate molecular structures and their mass spectra as a knowledge base and feeds them back to Heuristic Dendral [47,48]. Therefore, Meta-Dendral can be referred to as a learning system. In addition, the fuzzy logic approach also plays an important role [49][50][51][52]. Following these developments, 'Mycin' was developed in 1972, which increased the popularity of expert systems [53]. It could diagnose patients with infectious blood diseases and recommend antibiotics, with dosages adjusted to the body weight of the patient [54].
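The combinatorial search behind the molecular weight example above can be illustrated with a brute-force sketch; the atom set, limits, and tolerance are invented for illustration, and the real Dendral pruned far more aggressively using chemical knowledge:

```python
from itertools import product

# Atomic masses from the text; a tolerance is needed because the masses
# are not integers.
ATOMS = {"H": 1.01, "O": 16.00}

def candidate_formulas(target_mass, max_atoms=20, tol=0.05):
    """Enumerate atom combinations whose total mass is close to the target."""
    names = list(ATOMS)
    results = []
    for counts in product(range(max_atoms + 1), repeat=len(names)):
        mass = sum(c * ATOMS[n] for c, n in zip(counts, names))
        if any(counts) and abs(mass - target_mass) <= tol:
            results.append({n: c for n, c in zip(names, counts) if c})
    return results

print(candidate_formulas(18.02))  # [{'H': 2, 'O': 1}]
```

Even with only two atom types the search space grows as a product of counts, which makes the need for heuristic pruning at larger molecular weights apparent.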
It made judgements using a knowledge base comprising approximately 600 rules and, based on answers to questions (including formats other than YES/NO), could display several bacteria in descending order of likelihood as the cause of a disease, along with its confidence and reasoning. Although the performance of Mycin was not comparable to that of specialists, it had great potential for use in the medical field as a system capable of diagnosing bacterial infections, achieving an accuracy rate of approximately 65%. However, it was never used in practice owing to indecision among doctors regarding who should be held accountable in the event of a 'misdiagnosis' by the system. Subsequently, a second AI boom occurred in the 1980s, during which various expert systems came into use. However, owing to problems such as the need to manually input large quantities of data and formulate rules, as well as an inability to handle complex learning, the practical use of expert systems was restricted to a limited number of systems, and the boom passed. Since the 2000s, medical expert systems demonstrating this level of reliability have been criticised as 'unusable', and even when similar systems were developed, their adoption was often difficult to achieve. There were also ethical and legal issues, such as unclear liability, as well as resistance concerning the use of computerised diagnostics that could produce incorrect results. The diagnostic accuracy rate expected from such systems is typically 85-90% or more, and the numbers of false positives and negatives should be as low as possible (the positive predictive value should be high). Around 2010, expert systems again attracted attention owing to advances in 'machine learning', which reduced the cost of data entry and rule creation [55,56]. The period from 2010 to the present can be considered the third AI boom.
During the second AI boom, expert systems faced various challenges, such as the need to manually input large quantities of data after creating rules. The data input process itself had to be performed by humans after the knowledge of the specialised field was formatted for input into the system, which required a significant amount of money and time. As the knowledge required to build an expert system is often extremely complex, it is difficult to convert such information into data through simple algorithms. Additionally, as the amount of knowledge and rules increased, contradictions arose, and managing inconsistencies became difficult. Moreover, the hardware available during the second AI boom lacked the processing power to allow systems to automatically learn from the data entered by humans. In recent years, however, machine learning has enabled the automatic learning of large quantities of data, and advances in technology have gradually reduced the challenges faced by expert systems, thus rendering them more practical.

Past and Present of Computer Vision Techniques
The demand for methods that enable machines to see and understand visual data has existed for several decades. Experiments began in 1959, when neurophysiologists showed a series of images to cats and attempted to correlate the images with their brain responses. Around the same time, the first image scanner was developed, which allowed computers to acquire and digitise images. In 1963, computers became capable of transforming two-dimensional (2D) images into three-dimensional models, and during the 1960s, AI became a prominent field of academic research, and the quest to use AI to solve human vision problems began. In 1974, optical character recognition (OCR) technology was introduced, which could identify printed text in any font or typeface [57,58]. Similarly, intelligent character recognition (ICR) can use neural networks to decipher handwritten text [59,60]. Since their introduction, OCR and ICR have been used in several general applications. In 1982, neuroscientist David Marr established that vision works hierarchically and introduced machine algorithms for detecting edges, corners, curves, and similar primitive shapes. Around the same time, computer scientist Kunihiko Fukushima developed a network comprising multiple types of cells, called the 'neocognitron', which includes convolutional layers in a neural network and can identify patterns [61,62]. Until 2000, research was primarily focused on object recognition, and by 2001, the first real-time facial recognition applications were introduced. Methods for tagging and annotating visual datasets were standardised in the 2000s. In 2010, the ImageNet dataset was released, which contains millions of tagged images across 1000 object classes and provided the foundation for the CNN and deep learning models in use today [63][64][65][66][67][68][69][70]. In 2012, a team from the University of Toronto entered a CNN in the ImageNet image recognition contest.
They introduced a ground-breaking model called AlexNet, which reduced the image recognition error rate on ImageNet to just 15.3%. By using a CNN, effective features can be extracted automatically, even for ImageNet-scale image recognition datasets, through a multiresolution hierarchy of convolutional kernel layers [71,72]. Therefore, the accuracy of AlexNet was considerably higher than that of conventional image recognition methods. The AlexNet research can be considered the first successful CNN-learning framework for learning large-scale networks from large-scale datasets using GPUs, together with a collection of practical training tips. Thus, AlexNet is considered the original network structure of the modern image recognition CNNs introduced in recent years. The key factor that allowed AlexNet to achieve high accuracy was the suppression of overfitting in deep CNNs. High-speed training of large-scale CNNs using GPUs, which has now become commonplace, was achieved for the first time through techniques such as data augmentation, dropout, ReLU, local response normalisation, and multi-GPU parallelisation. In other words, 'a collection of tips for successfully training a CNN model on large-scale image datasets while minimising overfitting' was introduced for the first time. Therefore, using AlexNet as a baseline, it became easier for other researchers to conduct CNN research. Thus, the introduction of AlexNet had a significant impact on the computer vision industry, resulting in a paradigm shift wherein the industry started rapidly moving to CNN-based image recognition methods. AlexNet sparked another AI boom, and many top researchers began working on image recognition CNNs. However, AlexNet was quickly superseded by other CNNs, such as VGGNet, InceptionNet, and ResNet, owing to their higher performance.
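Two of the overfitting-suppression techniques named above, ReLU and dropout, are simple enough to sketch directly; this NumPy version uses the common 'inverted' dropout formulation, one of several equivalent ways to implement it:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: passes positive activations, zeroes out negatives."""
    return np.maximum(0.0, x)

def dropout(x, rate, rng, training=True):
    """Inverted dropout: randomly zero a fraction of activations during
    training and rescale the survivors, so no change is needed at test time."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0]))
print(activations)  # negatives are zeroed: [0, 0, 0, 0.5, 2.0]
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)      # each unit is either zeroed or doubled by the 1/(1-rate) rescaling
```

By randomly disabling units during training, dropout prevents co-adaptation of features, which is the overfitting-suppression effect exploited by AlexNet.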
Hence, the architecture of AlexNet itself has not been used for a long time [73][74][75][76][77][78][79][80][81][82][83][84][85][86]. AlexNet's training exhibited poor convergence and was not very practical because it relied on trial-and-error optimisation. However, AlexNet remains a very important development because it was the first to present the underlying backbone of its successor image recognition CNNs. At the same time, because it exhibited significantly higher accuracy than conventional methods, researchers stopped combining handcrafted image features, such as Fisher Vectors, with discriminative models, such as support vector machines. Moreover, AlexNet triggered a shift of the entire pattern recognition industry to deep learning with CNNs. Although leading researchers had already started an AI boom by employing deep learning, the introduction of AlexNet was the biggest breakthrough, especially in computer vision pattern recognition tasks. As one of the advantages of CNN models, feature visualisation provides a unique perspective on how neural networks function, especially with respect to image recognition. Given the complexity and opacity of neural networks, feature visualisation constitutes an important step in analysing and explaining them. Through feature visualisation, it was revealed that a neural network first detects simple edges and textures, then increasingly abstract features in deeper layers, and finally performs object detection. Network analysis expands on these insights and makes the interpretability of network units measurable. In addition, concepts can be automatically linked to units, which is very convenient. Furthermore, feature visualisation is a great technique for communicating how neural networks work in a non-technical way. Together with network analysis, it can detect concepts across classes in classification tasks.

Application of Knowledge-Based Computer Vision Techniques
Knowledge bases are special databases used for the management of facts, common sense, experience, etc., pertaining to the subject area incorporated into an expert system; they render knowledge searchable and organised and aggregate it on computers. They are designed to solve problems pertaining to specific applications or domains through reasoning. These systematically ordered sets of facts, events, beliefs, and rules are roughly classified into two categories: machine-readable and human-readable knowledge bases [87][88][89][90]. For example, input images from a camera can be processed by a computer to serve as the eyes of a robot or a self-driving car, wherein the installed cameras play the role of eyes. The images captured by the camera are analysed using computer vision technology and used for autonomous driving. For example, in the case of a 'Forward Collision Prevention System', when a camera installed at the front of a vehicle detects a nearby car or person, a warning sound is emitted to alert the driver. Additionally, when the vehicle approaches within a dangerous distance, the brakes are automatically applied to prevent collisions. Moreover, the 'Lane Departure Warning System', which alerts the driver using an audible warning when a car unintentionally strays from its lane, and the 'Parking Support System', which synthesises images captured by the front, back, left, and right cameras when parking and displays them as if they were captured from above, have been developed [91,92]. Computer vision technology is also used to automatically detect a specific person in surveillance camera images. The image analysis system 'TB-eye AI Solution' achieves high-performance facial recognition and image analysis by combining deep-learning- and computer-vision-based object recognition technologies.
Additionally, a system called 'smart search' has been developed that uses AI to identify objects in images based on various conditional terms, such as 'white car' and 'person in black clothes', which can significantly reduce search time. Computer vision technology is also used for vein authentication, which is performed by holding a palm or finger over a terminal to identify a person. Research is also underway on the use of image recognition technology for detecting affected areas within the human body using images captured via computed tomography or magnetic resonance imaging. Furthermore, a device that can produce 3D images of living cells within minutes has been developed, leading to the study of drug effects at the single-cell level [93,94].
Among the various applications of computer vision technology, 'matchmove' is a 3DCG synthesis technology that is often used in movies and television dramas [95,96]. The actor's performance is shot in a blue-screen studio, and the background CG to be composited later is calculated and created based on the camera's viewpoint changes and 3D information, and then matched to the actor's footage. It is similar to AR in that it composites another video or 3DCG with a landscape image; however, AR is a real-time process, whereas matchmove is a video-editing technology that performs the compositing afterwards. In addition, 'projection mapping' technology is used to project images onto 3D objects, such as buildings, and has recently been employed at various events [97]. In the case of ordinary projectors used in offices, movie theatres, etc., images are projected vertically onto a flat rectangular screen. In contrast, in projection mapping, the 3D information of the building onto which the image is projected is read, and the image is projected according to its shape. It is also possible to limit the projection to a specific outline and to change the projected image at each surface transition. The projection can be adjusted such that it looks natural even when viewed from viewpoints other than the front. This mapping work is performed using video-editing software. By calibrating and adjusting the 3D positional relationship between the projector and camera, images can be mapped in 3D space with high accuracy. Additionally, this technology has evolved even further, leading to the creation of interactive projection mapping, wherein images projected onto a room or hallway change in response to the movements of people in that space. In recent years, gesture recognition technology has attracted attention as a user interface, allowing users to operate televisions and cameras with gestures alone, without touching them [98,99].
Such systems are equipped with cameras and sensors and can therefore be operated through gestures and voice, without touching. Subsequently, developer packages were offered at low prices, which resulted in their application in various fields. One example is the medical field, where a system that allows doctors to operate personal computers (PCs) during surgery through non-contact gesture recognition is in use. Using this system, X-ray images can be checked without touching a PC screen or instructing other staff members to operate the PC during surgery.
Additionally, even in the design of image-understanding systems, the main part of a knowledge-based vision system comprises the process of matching the target model representation with cues in the observed image [100]. Just as the choice of knowledge representation format is important in knowledge systems, the representation of target models in vision systems is an important issue that significantly affects the matching process realised via inference. Many conventional 3D vision systems represent models individually, and various methods have been proposed for their representation. It is important to establish a description system for complex objects and to employ it to identify objects. However, if the models remain separate, matching is limited to individual identity judgements between an object and a model. When aiming for structural understanding by a computer, it is desirable for the model representation to describe the relationships between models. Based on this idea, a 3D vision system was developed with a model that expresses both the conventional description of a model's structure and relationship-based descriptions between models, using the object-oriented class concept [101]. The geometry is represented by surface models, and each model corresponds to a class in an object-oriented system. In the comprehension process, relationship descriptions (is-a relationships), together with has-a, aggregation, and composition relationships, provide different kinds of hierarchical properties between classes; matching is first performed against an abstract model, after which a concrete model with tighter constraints is obtained. The features of this vision system are as follows:
1. A model system with independent 3D and 2D models;
2. Each model expresses one shape concept, expressed using inheritance through the is-a relationship between models;
3. Model representation, the reasoning mechanism, and image processing are described in an object-oriented framework;
4. A function for understanding incomplete line drawings is implemented.
Additionally, the following points are considered important for model representations used in understanding images in such systems:
1. Only the essence of an object that is the subject of a single concept is used, suppressing slight differences between individual objects as much as possible;
2. Although shape representation is used to enable point (1), the process of matching with the actual image should not be ignored;
3. The structure is expressed explicitly using the PART-OF relation;
4. The representation is structured to deepen understanding sequentially by describing the relationships between concepts using is-a relationships.
Frames are often used in such knowledge representation methods. In knowledge representation by frames, one frame corresponds to one concept, which makes the representation easily comprehensible. The functions of frames are sufficient for model representation alone; however, frames are often used as hierarchical knowledge bases and have a strongly passive character. Since image comprehension requires active reasoning during the matching process, an object-oriented paradigm involving active procedures is more suitable [102][103][104][105][106]. However, frames and object orientation are not fundamentally different concepts, and object orientation can be considered a broader concept that incorporates procedures into the hierarchy of frames. From a programming point of view, an object is a collection of data structures and their exclusive procedures, whereas from an analysis/design point of view, it is a collection of information resources and their processing procedures. In other words, object orientation is a way of thinking that handles data and processes together, uses objects that integrate both as basic elements, emphasises the interaction between objects via messages, and attempts to construct the software as a whole from them [107][108][109]. Additionally, in object orientation, there is a clear separation between classes, which are abstract objects, and instances, which are concrete objects. Consequently, an abstract model representation can be associated with a class, and an entity obtained from cues in an image and matched with a model can be associated with an instance. As described above, an object-oriented paradigm requires the ability to successfully bring together model representations, inference procedures, and concrete entities obtained from images.
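The correspondence described above, with classes for abstract models, instances for concrete entities, is-a via inheritance, and part-of via composition, can be sketched in a few lines of Python; all class names here are illustrative, not taken from the system discussed in the text:

```python
# A sketch of object-oriented model representation for a vision system.
class Shape3D:
    """Abstract model: one class corresponds to one shape concept."""
    def describe(self):
        return type(self).__name__

class Polyhedron(Shape3D):        # is-a: a Polyhedron is a Shape3D
    pass

class Cuboid(Polyhedron):         # is-a: a more concrete, tightly constrained model
    def __init__(self, faces):
        self.faces = faces        # part-of: faces compose the cuboid

class Face:
    def __init__(self, vertex_count):
        self.vertex_count = vertex_count

# A concrete entity extracted from image cues corresponds to an instance.
observed = Cuboid(faces=[Face(4) for _ in range(6)])

# Matching can proceed from abstract to concrete along the is-a hierarchy.
assert isinstance(observed, Shape3D)   # matches the abstract model first
assert isinstance(observed, Cuboid)    # then the tighter concrete model
assert len(observed.faces) == 6        # the PART-OF structure is explicit
print(observed.describe())  # Cuboid
```

The active procedures attached to each class are what distinguish this arrangement from a purely passive frame hierarchy.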

Discussion
Computer vision and AI technologies are among the fastest-growing areas in terms of market size and industry adoption. Spatial computer vision and edge AI, in particular, are used not only for complex processes but also to improve and automate repetitive tasks [110]. This new reality, coupled with the increasing affordability of hardware and the growing sophistication of depth perception and machine learning, has enabled practical solutions in edge computer vision and AI systems. Spatial computer vision with edge AI can deploy depth-based applications and perform image processing on the device itself. As hardware becomes more accessible, important improvements are also taking place in software and machine learning workflows. Although these fields are still highly specialised and many technical issues remain, AI and computer vision have become easier to use through tools that allow users to train their own models. Meanwhile, large-scale edge computing and deployment remain a problem for standard machine learning pipelines and workflows [111][112][113]. In addition, one of the biggest challenges is reducing the cost and time currently required to create and improve machine learning models for real-world applications. The challenge is how to manage all these devices and how to create a smooth pipeline for continuous improvement. Edge devices also impose implicit limits on computational processing, so the final model deployed on a device requires additional considerations, such as being lightweight and performant.
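One common way to make a model lightweight enough for edge deployment is post-training weight quantization, which replaces floating-point weights with small integers. The following is a minimal sketch of the idea, not any particular framework's implementation; the weight values and the 8-bit setting are illustrative assumptions:

```python
# Sketch of post-training quantization: store weights as 8-bit integers
# plus a scale and offset, shrinking model size roughly fourfold
# relative to 32-bit floats at the cost of a small rounding error.

def quantize(weights, bits=8):
    """Map float weights to integers in [0, 2**bits - 1]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0   # avoid zero scale
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [v * scale + lo for v in q]

weights = [-0.51, 0.0, 0.27, 0.98]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)
# Each restored weight is within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

Production toolchains add refinements such as per-channel scales and calibration data, but the trade-off is the same: a bounded loss of precision in exchange for a model small and fast enough for on-device inference.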

Conclusions
Computer vision is a branch of computer science that focuses on enabling computers to identify and understand objects and people through images and videos. Like other types of AI, computer vision aims to perform and automate tasks that mimic human capabilities; in this case, it reproduces both the way humans see and the way they understand what they see. In other words, computer vision applications use inputs from sensing devices, AI, machine learning, and deep learning to mimic the functions of the human visual system. Such applications run on algorithms trained on vast quantities of visual data and images; they recognise patterns in the visual data and use these patterns to determine the content of other images. Computer vision applications are shifting away from the statistical methods conventionally used for image analysis and increasingly rely on deep learning, which provides even more accurate image analysis via deep neural network algorithms. Furthermore, deep learning retains information about each image; therefore, the more it is used, the more accurate it becomes. Machine learning, in which humans prepare large quantities of sample and supervised data from which computers discover knowledge and rules on their own, requires a large amount of sample data and a properly configured learning machine. In deep learning, the computer itself finds the optimal solution; although it is not accurate at first, its accuracy increases as sample data are continuously supplied. This has become possible owing to the high performance of current computers and the large quantities of sample data generated by big data. In contrast, advanced rules are difficult to define in an expert system, wherein a human defines knowledge and rules and teaches them to a computer; in the past, such systems faced various limitations because they could not transcend the human frame.
However, as improvements in hardware specifications have made real-time processing and big data widely available, expert systems have come into use in many industries. In conclusion, computer vision is a powerful capability that can be combined with a wide variety of applications and sensing devices to support numerous practical uses.
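The contrast drawn above between hand-coded expert rules and rules learned from sample data can be sketched in a few lines. The threshold, toy data, and task (classifying objects by area) are invented purely for illustration:

```python
# Sketch: expert-system style (human fixes the rule) versus
# machine-learning style (the rule is derived from labelled samples).

def expert_rule(area):
    """Hand-coded rule: a human expert fixed the threshold in advance."""
    return "large" if area > 50.0 else "small"

def learn_threshold(samples):
    """Derive a threshold from labelled samples: the midpoint between the
    largest 'small' example and the smallest 'large' example."""
    smalls = [a for a, label in samples if label == "small"]
    larges = [a for a, label in samples if label == "large"]
    return (max(smalls) + min(larges)) / 2.0

samples = [(10.0, "small"), (30.0, "small"), (80.0, "large"), (120.0, "large")]
threshold = learn_threshold(samples)   # 55.0, derived from the data
# Both approaches classify an area of 60.0 as "large" here, but only the
# learned threshold shifts automatically as new samples are supplied.
print(expert_rule(60.0), "large" if 60.0 > threshold else "small")
```

The expert rule is fixed at whatever the human wrote down, whereas the learned threshold moves with the data, which is the limitation of classical expert systems, and the strength of learning-based methods, summarised above.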