A Systematic Literature Review of Cutting Tool Wear Monitoring in Turning by Using Artificial Intelligence Techniques

In turning operations, the wear of cutting tools is inevitable. As workpieces produced with worn tools may fail to meet specifications, the machining industries focus on replacement policies that mitigate the risk of losses due to scrap. Several strategies, from empirical laws to more advanced statistical models, have been proposed in the literature. More recently, many monitoring systems based on Artificial Intelligence (AI) techniques have been developed. Due to the breadth of the different artificial intelligence approaches, obtaining a holistic view of the state of the art on this subject is complex, in part due to a lack of recent comprehensive reviews. This literature review therefore presents 20 years of literature on this subject obtained following a Systematic Literature Review (SLR) methodology. This SLR aims to answer the following research question: “How is the AI used in the framework of monitoring/predicting the condition of tools in stable turning condition?” To answer this research question, the “Scopus” database was consulted in order to gather relevant publications published between 1 January 2000 and 1 January 2021. The systematic approach yielded 8426 articles, of which 102 meet the inclusion and exclusion criteria, which limit the application of AI to stable turning operations and online prediction. A bibliometric analysis performed on these articles highlighted the growing interest in this subject in recent years. A more in-depth analysis of the articles is also presented, mainly focusing on six AI techniques that are highly represented in the literature: Artificial Neural Network (ANN), fuzzy logic, Support Vector Machine (SVM), Self-Organizing Map (SOM), Hidden Markov Model (HMM), and Convolutional Neural Network (CNN). For each technique, the trends in the inputs, pre-processing techniques, and outputs of the AI are presented.
The trends highlight the early and continuous importance of ANN and the emerging interest in CNN for tool condition monitoring. The lack of a common benchmark database for evaluating model performance does not allow clear comparisons between techniques.


Introduction
Cutting tools endure mechanical, thermal, and chemical conditions that induce wear, most importantly located on their flank face (flank wear) under nominal machining conditions. The workpieces produced with worn tools may exhibit poor quality, consisting of dimensional discrepancies, poor surface roughness and residual stresses that fall out of specifications. In order to limit downtime and risks for the machine and for the production quality, the cutting tools must be replaced in a timely manner [1]. Because several wear mechanisms occur simultaneously [2], the evolution of tool degradation is extremely variable even at steady cutting parameters, which creates the need for detection or prediction of wear. The continuous nature of the machining process does not allow direct measurement of the tool wear. Furthermore, the evolution of the different condition monitoring parameters with wear differs greatly depending on the measured variable, which makes tool wear estimation extremely complex [3]. As a consequence, it is estimated that only 50% to 80% of numerical control machining tool life is rationally used [4], resulting in significant waste and losses. Industrial practice occasionally contributes to this trend by using lower cutting parameter values in order to slow the wear process and replace the tool in time [5]. Therefore, tool life estimation is a recurring research topic in machining, and several approaches have been attempted, from early empirical laws [6] to stochastic modeling [7], to more advanced statistical models such as the Proportional Hazards model [8], and in recent years, Artificial Intelligence (AI) methods.
With the rise of industry 4.0, the amount of monitoring data available for analysis increases, along with the capability of automation for industrial processes, including machining [9]. An increasing number of sensors are included in machine tools, which accounts for the rise in available data, but also for the variety of variables being monitored. In this context, AI can be used to predict the state of tool wear based on sensor data, so that the cutting insert is replaced before adverse effects on production occur. Contrary to previous methods, AI can also take into account large data sets, and the continuous measurements of several sensors allow the condition estimate to be updated during machining.
In a 2002 review of Artificial Neural Networks (ANN) used in tool wear monitoring in turning, Bernhard Sick covered the research of the previous decade [10]. Its extensive conclusions summarize the state of the art at the time but remain specific to ANN, as few applications of other techniques to tool wear monitoring existed at that time. Since then, AI techniques have evolved rapidly, and the last twenty years have provided a large array of models with various input and output variables, which has raised numerous methodological questions. In 2010, Abellan-Nebot and Romero Subirón highlighted the lack of common methodologies for the development of AI in this framework and proposed guidelines for future experiments, regardless of the machining process, using Taguchi's orthogonal arrays [11]. In a major review of 2013, Siddhpura and Paurobally comprehensively reviewed the available variables for condition monitoring and the AI techniques for decision-making in turning [12]. More recently, taking into account the recent developments in deep learning techniques, Serin et al. specifically reviewed the applications of deep learning to tool wear monitoring in machining [13]. In parallel, and specifically for milling, Mohanraj et al. produced a short review of tool condition monitoring techniques, presenting the main condition monitoring variables and summarizing the AI techniques used in decision-making for this process; they highlighted the variety of techniques available and the importance of feature extraction [14]. However, despite these reviews, which are either outdated or specific to a restricted portion of AI techniques, there is a lack of perspective on the current variety of AI techniques applied to tool life estimation in turning, their inputs, pre-processing, and outputs.
In this paper, a brief overview of the background of tool wear in turning and AI techniques is presented in Section 2. Then, in Section 3, the standardized methodology of the systematic literature review is presented and applied to the current review [15]. For this analysis, research questions are identified, then the search process is described. In Section 4, a first bibliometric analysis is performed, then an in-depth analysis of the identified literature entries is proposed in Section 5. The results are discussed in Section 6 and formal answers to the research questions are formulated in Section 7. Finally, conclusions on each aspect of the review and perspectives on future research objectives are identified in Section 8.

Background in Turning Process and Artificial Intelligence
This section defines some important concepts that are used throughout this systematic literature review. First, the concept of tool wear in turning is presented, followed by an introduction to machine learning. This section is not an exhaustive description of these subjects, but rather a general introduction to the specificities of tool wear and the AI techniques discussed in the following sections. A short introduction to some of the most used AI techniques in this systematic literature review is presented in the appendices of this article (Appendices B.1-B.6).

Tool Wear in Single Point Turning
Any machining operation induces tool wear. The causes of degradation of the tool can be numerous and of very varied origins. Among them, the main causes of degradation are adhesion, abrasion, tribochemical reactions and surface disruption [16]. The word "wear" groups together the surface interactions leading to all of these degradation mechanisms. The predominance of one mechanism over another depends on the cutting conditions and materials. In nominal cutting conditions, the main mechanism of wear is abrasive wear resulting in the degradation of the flank face. This type of wear is favoured by the tool manufacturers, as it is considered as the steadiest and most predictable [17,18]. Prominence of other kinds of wear generally comes from poor cutting parameters choice [8]. In turning, the main cutting parameters are: cutting speed, feed rate, and depth of cut. The workpiece material and the choice of cutting parameters each have an effect on the tool life.
In the case of flank wear, the evolution of wear depends on the cutting conditions and consists of three phases (Figure 1):
1. An initial wear zone. In this phase, the new insert starts to wear quickly. This period is generally short compared with the cutting life;
2. A steady-state region. In this region, the wear increases slowly and gradually. The tool spends most of its life in this phase;
3. Accelerated wear. This is the end-of-life of the cutting tool: the wear rate starts to increase significantly until the tool is worn. When the tool end-of-life criterion is reached, the tool is replaced by a new one.
Defining an end-of-life criterion for a tool can be complex. Wear is generally accompanied by a degradation in the quality of the machined surface and in compliance with the required tolerances. The end-of-life criterion is thus variable, depending on the objective of the machining process. The ISO 3685 standard defines the value of VB_B as the end-of-life criterion for flank wear in the framework of single point turning tool life testing [19]. VB_B is defined as presented in Figure 2, and a tool is considered worn if the value of VB_B reaches 0.3 mm or VB_Bmax is above 0.6 mm. In industrial applications, this criterion may vary depending on the machining purpose. The value of VB_B is obtained by directly measuring the wear land width on the flank face (Figure 2).
Figure 2. Tool wear standardized measurement [19]. Image from Jozić et al. [20] under Creative Commons Attribution License.
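As an illustration of the criterion quoted above, the following minimal sketch encodes the two thresholds as a simple check. The function and constant names are hypothetical and chosen for this example only; the thresholds are the 0.3 mm and 0.6 mm values stated in the text, and an industrial application may use different limits.

```python
# Thresholds quoted in the text for the ISO 3685 flank wear criterion (millimetres).
VB_B_LIMIT_MM = 0.3      # limit on VB_B ("reaches 0.3 mm")
VB_B_MAX_LIMIT_MM = 0.6  # limit on VB_Bmax ("is above 0.6 mm")


def is_tool_worn(vb_b_mm: float, vb_b_max_mm: float) -> bool:
    """Return True when either flank wear criterion from the text is met.

    Hypothetical helper: both arguments are wear land widths measured
    directly on the flank face, in millimetres.
    """
    return vb_b_mm >= VB_B_LIMIT_MM or vb_b_max_mm > VB_B_MAX_LIMIT_MM
```

For example, `is_tool_worn(0.12, 0.25)` describes a tool that is still usable under this criterion, whereas `is_tool_worn(0.31, 0.40)` describes a worn tool.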

Artificial Intelligence
In the framework of this paper, the words "artificial intelligence" are often used in order to be as general as possible. They encompass both "machine learning" and "deep learning" approaches.
The AI paradigm finds its origin during the 1950s [21] and can be defined as: "an area of study concerned with making computers copy intelligent human behaviour" [22]. The word "intelligence" should be taken in the broad sense. Indeed, while the task of recognizing a cat from a dog is not considered as a marker of high intelligence in humans, achieving this task is considered as AI for machines. A definition for intelligence in this framework can be proposed as: "the computational part of the ability to achieve goals in the world" [23].
Machine learning (ML) algorithms can achieve different tasks without the intervention of human beings in the process; they therefore belong to a part of the AI paradigm. They are soft computing techniques in the way that they can adapt their architecture in order to achieve the desired task without having explicitly been programmed to do so. The process of adapting the architecture is called "learning". A model learns through experience, in such a way that the model is influenced by the input data and the desired output. The interest of these techniques is that they are not only able to learn from the data but also to generalize the results for previously unseen data [24]. ML can therefore be defined as: "The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data" [22].
In the context of this Systematic Literature Review (SLR), AI methods can be applied to detecting the failure of the tool, diagnosing the state of the tool, or making a prognosis on the future state of the tool. This can be achieved in different ways. Four AI families are distinguished, depending on whether the method performs regression, classification, clustering or pattern recognition.
A wide variety of AI techniques exists, varying in their architectures, the way they learn, their objectives, etc. In the following, six AI approaches are presented: Artificial Neural Network (ANN), Support Vector Machine (SVM), Hidden Markov Model (HMM), Convolutional Neural Network (CNN), Self-Organizing Map (SOM) and Fuzzy Inference System (FIS). These techniques were chosen because of their prominence in the literature of this SLR. They therefore do not represent all aspects of AI, but they cover the approaches that are presented in this SLR.
As part of this SLR and for comparison purposes, it was chosen to structure the use of an AI method into six steps:
1. Data collection. The first step of using an AI approach is to collect information from the different sensors. In turning, numerous types of sensors are used (accelerometers, strain gauges, microphones, etc.), and the fundamental variables to which they relate have been extensively reviewed in the past [12].
2. Feature extraction. As the data come from different types of sensors, extracting the useful information from the signal is not always feasible without applying a pre-processing technique. These techniques make it possible to extract the useful information (or features) from the raw signal.
3. Feature selection. As not all the features extracted in the previous step are useful, a selection of the best features is made. This is generally accomplished by a statistical analysis such as ANalysis Of VAriance (ANOVA) [25], Pearson's correlation analysis [26], etc., or through expert judgment.
4. Input of the selected features into the AI. The number of inputs may vary depending on the approach.
5. Inference. This is the output of the selected AI approach. Based on the model architecture and the input data, the AI produces an inference of the output.
6. Accuracy. To assess whether the outputs of the AI technique perform well, some indicators are computed.
These steps allow comparison of the different techniques, and they are discussed in Section 5.
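Steps 2 and 3 above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the signal windows and wear values are invented toy data, RMS and mean are arbitrary example features, and the 0.8 correlation threshold and all function names are hypothetical. It uses Pearson's correlation as the selection criterion, which is one of the options mentioned in the text.

```python
import math


def rms(window):
    """Root mean square of one signal window (an example extracted feature)."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def mean(window):
    """Arithmetic mean of one signal window (a second example feature)."""
    return sum(window) / len(window)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(feature_table, target, threshold=0.8):
    """Keep the features whose |Pearson r| with the target reaches the threshold."""
    return [name for name, values in feature_table.items()
            if abs(pearson(values, target)) >= threshold]


# Toy data (invented): four vibration windows and the wear measured after each.
windows = [[0.1, -0.1, 0.1, -0.1], [0.2, -0.2, 0.2, -0.2],
           [0.3, -0.3, 0.3, -0.3], [0.4, -0.4, 0.4, -0.4]]
wear = [0.05, 0.10, 0.15, 0.20]

# Step 2: feature extraction; step 3: feature selection.
features = {"rms": [rms(w) for w in windows], "mean": [mean(w) for w in windows]}
selected = select_features(features, wear)
```

In this toy case the RMS grows with wear and is retained, while the mean stays at zero for every window and is discarded. The selected features would then be fed to the AI model (step 4).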

Review Protocol
This SLR was conducted following the review protocol suggested by Kitchenham et al. [15,27]. The definition of a review protocol is a critical step because it ensures the repeatability of this review and reduces the researcher bias in the presented results. This protocol is composed of six stages with the objective of defining the framework of this SLR:
1. Research questions. This step consists of defining the research questions that are addressed in this SLR.
2. Search process. In this step, the methodology to obtain relevant literature entries is described.
3. Inclusion and Exclusion criteria. The literature obtained from the previous step needs to be sorted in order to obtain only the literature entries that answer the research question. Therefore, in this step, paper selection criteria are defined in order to best answer the research question based on the title and abstract of the articles.
4. Quality assessment. In this step, the content of the selected papers is analysed, and quality criteria are applied to reject papers that do not answer the research question.
5. Data collection. In this step, the researchers extract the useful information from the selected studies.
6. Data analysis. Finally, this step allows the presentation of the results in a relevant and comparable way.
All six stages of this protocol are described in the remainder of Section 3. Figure 3 presents a summary of the research protocol with the corresponding results.

Research Questions
As presented in the introduction, interest in AI techniques applied to manufacturing is constantly growing. This subject is in fact too broad to be treated in a single review. In consequence, the main research question of this review is intentionally limited to the use of AI techniques to monitor tool wear in stable turning operations. The research question is expressed as:

RQ1 How is the AI used in the framework of monitoring/predicting the condition of tools in a stable turning condition?
The formulation of this research question aims to precisely define the framework of this paper. The concepts of AI and ML have been described in Section 2.2. The words "monitoring/predicting" are used to encompass all monitoring cases, i.e., in real time or with a prediction of the future state of the tool. As many indicators can be used to monitor the tool, the word "condition" is used to encompass them all. The formulation "stable turning condition" aims to reject all papers that discuss chatter or tool breakage detection. As this research question mainly focuses on AI methods applied to the health of the tool, quality-oriented approaches which focus on workpieces are not considered in this review. The research question is intentionally broad, such that some sub-questions can be addressed:
RQ2 What are the most used AI techniques in this context?
RQ3 What is the forecasting horizon of each of the identified AI techniques in this context?
RQ4 What are the most common inputs used with the different identified AI techniques in this context?
RQ5 What are the most used feature extraction techniques in this context?
These research questions are answered throughout the SLR analysis process, and they are explicitly answered in Section 7.

Database
To answer the research question, adequate databases should be consulted. The database must be as large as possible and related to the subject in order to provide a representative panel of articles constituting the current state of the art. In this review, it was chosen to use the Scopus database as it groups a large number of relevant journals from various publishers [28]. Due to the evolution of AI techniques and computing power, it was considered that articles dating from before 2000 were not representative of the current state of the art in the frame of this SLR. Thus, it was chosen to limit the search process to a 20-year period from January 2000 to December 2020. Furthermore, other studies also stated that before 2000 the vast majority of AI techniques used were ANN [29], and a literature review made in 2002 already addresses this part of the literature [10].

Keywords
The search terms must be chosen to be as general as possible in the framework of the research question. This ensures that the search addresses the topic without introducing bias. The keywords used to perform the search in the Scopus database are listed in Table 1. A semantic approach was chosen such that each search query is constructed with one element of each semantic part. It was chosen to compose the search query with four parts:
1. the tool part;
2. the aims part;
3. the method part;
4. the indicator part.
For the "tool part" in Table 1, the keyword "cutting tool*" uses the wildcard character (*) to include the different forms of the word "tool": "tool", "tools", and "tooling". A total of 160 search queries is constructed with this method, creating all possible combinations with the desired composition. The keywords used to build the search queries are intentionally broad and include all machining processes. The main advantage of this approach is that it goes through all the possible combinations, but it creates a significant amount of "duplicate" results which must be removed. This last step can easily be performed automatically.
As mentioned previously, the search was done in the Scopus database. Each combination of keywords was entered individually, and the results were obtained with the "search within all fields" option. For the results of each search, Scopus provided the following information: title, authors, year, DOI, abstract, document type, publisher, number of citations, etc. From the 160 search queries, 62,832 articles were obtained. The duplicates were removed based on their DOI. This led to 8426 unique articles. On average, every search made with this approach yielded 393 results. The most successful search query yielded 3180 results and was:
• "machining" AND "prediction" AND "artificial intelligence" AND "data"
With only two results, the least successful search was:
• "cutting tool*" AND "prognosis" AND "intelligent data analysis" AND "data"
From the 160 combinations of keywords, 19 combinations gave more than 1000 results. From these 19 combinations, it is possible to generalize a search query that would yield most results into this form:
• "machining" AND ("prediction" OR "monitoring") AND ("artificial intelligence" OR "machine learning") AND ("condition" OR "data" OR "wear")
Table 1. List of keywords. Wildcard character (*) includes these different forms of the "tool" word: "tool", "tools", and "tooling".
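The query construction and duplicate removal described above can be sketched as follows. This is a minimal illustration, not the authors' actual tooling: the keyword lists contain only the terms quoted in the text (the complete Table 1 lists are not reproduced here, so this subset produces 54 queries rather than 160), and the function names and the record layout (a dictionary with a "doi" key) are assumptions.

```python
from itertools import product

# Partial keyword lists: only the terms quoted in the text, NOT the full Table 1.
TOOL = ['"machining"', '"cutting tool*"']
AIMS = ['"prediction"', '"monitoring"', '"prognosis"']
METHOD = ['"artificial intelligence"', '"machine learning"',
          '"intelligent data analysis"']
INDICATOR = ['"condition"', '"data"', '"wear"']


def build_queries(*parts):
    """Build one query per combination, taking one keyword from each part."""
    return [" AND ".join(combo) for combo in product(*parts)]


def deduplicate_by_doi(records):
    """Keep the first record seen for each DOI (compared case-insensitively)."""
    seen = set()
    unique = []
    for record in records:
        doi = record["doi"].strip().lower()
        if doi not in seen:
            seen.add(doi)
            unique.append(record)
    return unique


queries = build_queries(TOOL, AIMS, METHOD, INDICATOR)

# Hypothetical search results: the same article returned by two queries.
results = [
    {"doi": "10.1000/example.1", "title": "Article A"},
    {"doi": "10.1000/EXAMPLE.1", "title": "Article A"},
    {"doi": "10.1000/example.2", "title": "Article B"},
]
unique_results = deduplicate_by_doi(results)
```

With the full keyword lists of Table 1, the same Cartesian product would give the 160 queries reported in the text. Records without a DOI would need a separate rule, which this sketch does not cover.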

Inclusion and Exclusion Criteria
From the previous step in the procedure, 8426 articles are identified. Not all these articles are relevant to answer the research question. To assess the eligibility of each article, it has to be compared with some inclusion and exclusion criteria. These criteria are chosen in accordance with the perspectives of this review and try to be inclusive enough to find all the articles of interest while clearly delineating the research question. These criteria are listed below.
Inclusion Criteria:
IC1 The paper describes the application of an AI method to cutting tools.
IC2 The paper is about turning operation. This information must be clearly stated in the title or the abstract.
IC3 The paper is written in English.
IC4 The AI must provide information on the tool condition.
IC5 The paper has been published between January 2000 and December 2020.
Exclusion Criteria:
EC1 The paper must not include instabilities/chatter in the machining process.
EC2 The paper must not be a review.
Based on the titles and abstracts of the identified articles, the inclusion and exclusion criteria were applied to obtain 475 results. At this stage, the inclusion criterion IC2 was not applied, in order to obtain an overview of the application of AI techniques in different machining processes. From the 475 results, 149 articles are in turning, 261 are in milling, and 65 in other machining processes (Figure 3). These relevant articles are spread across all search queries; in other words, they are distributed uniformly over the 8426 articles identified. This confirms that the semantic approach used to build the search terms was successful. In the following steps, the criterion IC2 is applied such that only articles about turning are analysed.
Due to the large number of articles and the risk of introducing some bias in the sorting previously described, this stage was performed in parallel by two researchers. After the independent rating phase, in case of disagreement, a discussion took place to make a final decision for these articles, as proposed by Kitchenham [15]. As it is a manual sorting step, it can introduce some bias into the choice of the selected articles depending on the appreciation of the researchers. To measure the inter-rater reliability, Cohen's kappa statistic (κ) was used [30]. This indicator measures the agreement among raters and is computed as:
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$
with P_o (observed value: 0.93) the observed agreement and P_e (observed value: 0.64) the probability of random agreement. After sorting, the obtained κ value was 0.8, which indicates that the raters were in "almost perfect agreement" [30]. This means that the defined criteria were applied in the same way by both researchers. This ensures that the vast majority of relevant articles dealing with the subject of this SLR have been identified and provides quality assessment of the sorting process.
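As a quick numerical check of the values reported above, the standard definition of Cohen's kappa can be evaluated directly. The function name below is hypothetical; the inputs are the observed agreement (0.93) and the probability of random agreement (0.64) given in the text.

```python
def cohen_kappa(p_observed: float, p_expected: float) -> float:
    """Cohen's kappa from the observed and the chance agreement proportions."""
    if p_expected >= 1.0:
        raise ValueError("chance agreement must be strictly below 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Values reported in the text for the two-rater sorting stage.
kappa = cohen_kappa(0.93, 0.64)
print(round(kappa, 1))  # 0.8
```

The unrounded value is 0.29/0.36, i.e., approximately 0.806, which is consistent with the κ value of 0.8 reported for the sorting stage.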

Article Quality Assessment
The previous steps identify 149 articles about turning (Figure 3). These articles were selected based on their title and abstract, but their contents may still not correspond to the research question of this SLR. The content of the 149 articles is therefore compared with quality criteria that help to answer the research questions. The quality assessment is often used to assess differences in the execution of studies [15]. In this SLR, the following list of quality indicators is proposed:
QA1 The paper corresponds to the inclusion criteria.
QA2 The paper presents a clear data input description.
QA3 The origin of the data is clearly defined.
QA4 The paper clearly describes its methodology.
QA5 The pre-processing applied to the data is explicitly mentioned (if any).
QA6 The paper presents an indicator for the quality of its results.
If two or more of these criteria are not met, the paper is rejected. Based on the quality indicators, the rejected articles are articles that do not contribute to answering the research question or do not provide enough information to be fairly compared with others. The analysis of content and the application of these criteria results in 102 articles retained and 47 articles rejected (Figure 3).
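The rejection rule stated above (a paper is rejected when two or more quality criteria are not met) can be written as a small check. This is only a sketch: the function name and the dictionary-based representation of an assessment are assumptions made for illustration, and a criterion missing from the dictionary is treated as not met.

```python
QUALITY_CRITERIA = ("QA1", "QA2", "QA3", "QA4", "QA5", "QA6")


def is_retained(assessment: dict) -> bool:
    """Return True when fewer than two quality criteria are unmet.

    `assessment` maps a criterion label to True (met) or False (not met).
    """
    unmet = sum(1 for criterion in QUALITY_CRITERIA
                if not assessment.get(criterion, False))
    return unmet < 2


# Hypothetical assessments of two papers.
paper_a = {"QA1": True, "QA2": True, "QA3": False,
           "QA4": True, "QA5": True, "QA6": True}
paper_b = {"QA1": True, "QA2": False, "QA3": False,
           "QA4": True, "QA5": True, "QA6": True}
```

Here `paper_a` fails a single criterion and is retained, while `paper_b` fails two and is rejected.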

Data Collection and Data Analysis
By following the procedure previously described, 102 articles are selected to be presented in the SLR (Figure 3). In what follows, two types of analyses are performed:
1. A descriptive analysis (Section 4). This part discusses the trends in the literature and shows the state of the art on the research question. Bibliometric information such as year of publication, authors, and number of citations is discussed. A text mapping is also presented.
2. An in-depth analysis of the AI techniques used (Section 5). A detailed analysis of the content of the articles is carried out and presented in the form of tables. Each category of AI is discussed individually, and the relevant information in each category is presented. Criteria such as input data, pre-processing, and output are discussed.

Bibliometrics Analysis
From the 102 articles identified with the procedure described above (Section 3), a bibliometric analysis is performed. The aim of this analysis is to present the trend in the literature and give a meaningful picture of the state of the art. As described above, this analysis focuses on the 102 previously identified articles from the last 20 years in the context of AI applied to cutting tools in turning operation.

Distribution in Time
The distribution of the number of publications per year indicates whether a subject is of interest among researchers. It shows trends over the years and reflects the research interest in a subject. Knowledge of these trends allows researchers to align their efforts on particular topics to solve current technological challenges.
The distribution of publications between 2000 and 2020 is presented in Figure 4. It can be observed that there is a steady growth in the number of articles, which reflects the growing interest in the use of AI in the industrial sector. This may be related to the evolution of AI techniques and also to the amount of data generated by machine tools in the framework of industry 4.0. From Figure 4, it is possible to divide the timeline into three equal periods of seven years: from 2000 to 2006, there is an average of 2.85 articles per year; from 2007 to 2013, an average of 4.14 articles per year; and from 2014 to 2020, an average of 7.57 articles per year. The year 2020 was particularly rich in publications, which may indicate the rise of a hot topic in the literature.

Repartition of AI Type
From the content analysis of the selected papers, seven "types" of AI stand out. The categorization reflects the occurrence of these techniques in the literature: ANN, fuzzy logic, SVM, SOM, HMM, CNN, and Other. The "Other" category groups techniques that are individually not sufficiently represented to have their own category. It includes AI techniques such as decision trees, the K-nearest neighbour classifier, etc.
These different techniques are the subject of an in-depth analysis in Section 5, including tables summarizing the review results. To be exhaustive and take into account articles that in addition to the AI approach addressed a non-AI approach, the category "other (not AI)" has been added.
The distribution of the types of AI used to monitor tool wear is presented in Figure 5. From this figure, it is observed that the majority of articles focus on the following AI techniques: ANN, Fuzzy and SVM. These three techniques represent about 60% of the total.
The distribution of the types of AI by year is presented in Figure 6. It should be noted that some articles discuss several independent AI techniques, which explains why the numbers are not consistent with the distribution by year in Figure 4. This figure reveals some trends in the literature:
• The ANN approach occupies an almost constant proportion of the literature through the years.
• The SVM approach has seen a growing interest in the last five years.
• The fuzzy logic approach aroused great interest up to 2011, but it seems less represented in recent years.
• Techniques such as HMM and SOM are not sufficiently represented to observe any tendencies.
• The CNN approach was first observed in 2020. Therefore, no conclusion can be drawn as to its use. It will be necessary to monitor its use in future years to observe whether this is a hot topic of ongoing research.
• The other AI techniques are also attracting growing interest. This may indicate an interest in testing new approaches.
The choice to carry out an analysis of the literature over the last 20 years therefore makes it possible to observe the evolution in interest of each of the AI techniques. This allows for identifying all the approaches that have been tried and which approaches are raising the most interest for future trends.

Citations Analysis
Analysing the number of citations helps to identify the most influential papers in a subject. In this SLR, seven types of AI have been identified, and the three most cited articles for each technique are presented in Table 2. The number of citations comes from the Scopus database and reports this value as of 1 January 2021. In Table 2, as the HMM and CNN approaches include only three articles, all articles are listed, resulting in a citation count of 0 for three articles.
Hereafter, the top 3 most cited articles across all categories of AI are presented (in boldface type in Table 2):
1. The most cited article is [31] (Salgado and Alonso, 2007) and concerns the use of SVM with easily accessible signals such as sound, motor current and cutting parameters to make a real-time estimation of VB_B.
2. The second most cited paper is [32] (Wang et al., 2002). It uses an HMM approach with a pre-processed vibration signal to classify the state of the tool ("new tool" or "worn tool").
On average, the identified articles are cited 18 times. Even if the HMM approach is not strongly represented in the literature (Section 4.1), it appears that publication [32] has been cited a significant number of times. However, this citation analysis may present some bias. Indeed, for a paper to accumulate citations, it has to have been published for a sufficient time to be read and cited. It is therefore normal to observe a majority of papers published before 2015 in this analysis. Note that the number of citations was not always available for all 102 articles.
Table 2 (fragment). [37] "Online tool wear prediction system in the turning process using an adaptive neuro-fuzzy inference system", Rizal et al., 2013.
The other articles come from either journals or conference proceedings that generally have only a single article taken into account in the framework of this SLR.

Authors
The identified articles were written by 280 different authors. Among those, only 13 authors have published more than three articles on this subject.
A co-authorship analysis is performed using VOSViewer (a tool for creating, visualizing and exploring maps of items of interest [50]). A mapping of the different authors is presented in Figure 7. The size of each item (author name and associated circle) indicates how often the author occurs in the literature: a larger circle means that the corresponding author appears more often in this SLR. Each cluster represents an author group, i.e., usual co-authors. The larger a cluster, the more authors it contains. Some clusters are noticeably larger than the others; these generally correspond to a specific lab that works on the subject of this SLR. Most of the clusters are isolated, with no link between them. This does not mean that the authors do not cite each other, but it does indicate that there is very little inter-lab collaboration in the writing of articles beyond usual co-authors.

In Depth Analysis
Below, an analysis of the content of the articles is carried out for the different categories of AI, namely ANN, Fuzzy, SVM, SOM, HMM, CNN, and other. This analysis is broken down into two stages:
• The content of the articles is presented in table form (Tables A1-A8). Note that some articles present several approaches, such that they can appear multiple times in the tables of related techniques.
• A general analysis of the content of the tables for each technique is carried out (Sections 5.1-5.7). For each AI approach, the most used inputs, pre-processing techniques and outputs are presented in the form of figures that report the number of articles using the corresponding approach. If there are fewer than four articles for a technique, a short summary of each article is presented. The articles that compare their results with other approaches are also listed. Finally, a short conclusion highlights the use of each AI technique.
The presentation of the articles follows the logical steps to use the AI technique (Section 2.2):
1. Data collection. In this SLR, the sensors used to obtain the data are not explicitly presented, as previous reviews covered the subject extensively [12].
2. Features extraction. This is the pre-processing of the data. For the sake of this SLR, these techniques have been grouped into different categories:
(a) Statistical approach. This approach extracts descriptive statistics such as the mean, kurtosis and skewness from the temporal signal.
(b) Frequency approach. This approach extracts information from the frequency domain. In this SLR, this is commonly achieved by a Fast Fourier Transform (FFT).
(c) Time-frequency approach. This approach extracts information from the time-frequency domain. In this SLR, this is generally done with a Wavelet Packet decomposition.
(d) Normalization. It consists of scaling the variables between 0 and 1. This allows the AI to learn in a more stable and faster way.
(e) Others. This groups specific approaches. In this SLR, the "others" category generally refers to features extracted from images.
(f) None. An article which does not explicitly describe any pre-processing technique falls into this category. Note that some sort of pre-processing is always needed.
3. Features selection. To avoid any confusion, this part is not systematically presented in this SLR. Indeed, the articles rarely present the feature selection extensively and, in many cases, the selection is made by the authors. Since only limited and incomplete information is available in the articles, it is not relevant to compare the articles on this point.
4. Input. In this review, the input data are categorized as: cutting speed, feed rate, depth of cut, cutting time, spindle information, cutting forces, Acoustic Emission (AE), vibration, surface roughness, images, temperature, and others. The first three inputs listed above characterize the cutting conditions of the turning operation. These data are not correlated with the health of the tool but are usually used to give context to the other measured data. In this study, these data are presented in the "input" column in the different tables of results and figures related to the approach.
5. Inference. In this SLR, the output is categorized as: classification, estimation of VB B, prediction of VB B, remaining useful life (RUL) and others. The distinction between "estimation" and "prediction" aims at disambiguating the word "prediction", which authors use indistinctly for "estimation" (i.e., "monitoring of the current state of the tool") and "prediction" (i.e., "prediction of the future state of the tool"). This nuance is discussed in Section 6.
6. Accuracy. Comparing the accuracy of different AI techniques is complex, as different authors use different indicators (Root Mean Squared Error, Mean Absolute Error, accuracy, etc.) to evaluate their approach. Categorizing the results would be too subjective, as it would be necessary to define precision intervals. As all the approaches are very different and use various datasets, it was chosen to rely on the assessment made by the authors of each article to judge the accuracy of the approach. All articles cited in this SLR have obtained a "good" performance. Objectively, the average accuracy of all classification methods is around 90%. As discussed in the following sections, the majority of authors who compared an AI approach against a standard statistical approach have shown that the AI approach performed better.
This approach allows for identifying the different trends in the literature for each selected AI technique. It also highlights the difficulties and opportunities offered by each approach.
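As an illustration of the statistical, frequency and normalization categories of feature extraction, a minimal Python sketch is given below. It is not taken from any of the reviewed articles. The function names are hypothetical, a naive discrete Fourier transform stands in for the FFT to keep the example free of external libraries, and the kurtosis is the non-excess one (equal to 3 for a normal distribution).

```python
import cmath
import math


def statistical_features(signal):
    """Mean, skewness and (non-excess) kurtosis of a 1-D signal, using population moments."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    if std == 0.0:
        # A constant signal carries no shape information; zeros are returned by convention.
        return {"mean": mean, "skewness": 0.0, "kurtosis": 0.0}
    skewness = sum(((x - mean) / std) ** 3 for x in signal) / n
    kurtosis = sum(((x - mean) / std) ** 4 for x in signal) / n
    return {"mean": mean, "skewness": skewness, "kurtosis": kurtosis}


def dominant_frequency(signal, sampling_rate_hz):
    """Frequency (Hz) of the largest non-DC component, computed with a naive O(n^2) DFT."""
    n = len(signal)
    best_bin, best_magnitude = 0, -1.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sampling_rate_hz / n


def min_max_normalize(values):
    """Scale values linearly to the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

In such a pipeline, the features computed on each signal window would typically be normalized across the whole dataset before being passed to the AI technique.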

ANN
As presented in the bibliometric analysis (Section 4), the artificial neural network is the most popular decision-making method used to monitor the tool life. An explanation of this approach is presented in Appendix B.1, and Table A1 presents the 44 articles about ANN.

General Architecture of the Neural Network
The architecture of a neural network is often defined by trial and error and can have a major influence on the quality of the results. Of the 44 papers about ANN, 17 propose an architecture with one hidden layer [33,41,46,52-65], two propose an architecture with two hidden layers [66,67] and the rest do not give any information about the network architecture. From these papers, the most common architecture of the neural network to monitor the tool wear is presented in Figure 8. This architecture is composed of three layers: one input layer, one hidden layer and one output layer. The number of neurons in the input layer depends on the data used as inputs. In the hidden layer, the number of neurons is around 3 to 5. Finally, the output layer is generally composed of one neuron that gives information about VB B (Figure 8). This architecture therefore appears to be the most widely adopted for the subject of this SLR. Apart from the network architecture, other considerations exist. An exhaustive comparison of different types of NN is presented in [68]. The influence of the optimizer is studied in [69,70]. A comparison between different architectures and activation functions is made in [55]. It is also worth noting that [71] proposes an approach based on a recurrent neural network.
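To make the three-layer architecture concrete, the forward pass of such a network can be sketched as follows. This is only an illustration: the tanh activation, the choice of six normalized inputs and four hidden neurons, and all weight values are assumptions made for this example, and the training step that would produce real weights is omitted.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer network with a single linear output neuron."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        activation = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(math.tanh(activation))  # tanh is an assumed activation function
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical, untrained weights: 6 inputs -> 4 hidden neurons -> 1 output.
# Inputs (normalized): cutting speed, feed, depth of cut, cutting force, vibration, AE.
HIDDEN_WEIGHTS = [
    [0.2, -0.1, 0.4, 0.8, 0.3, 0.1],
    [-0.3, 0.5, 0.1, 0.6, -0.2, 0.4],
    [0.1, 0.1, -0.4, 0.9, 0.5, -0.3],
    [0.4, -0.2, 0.2, 0.7, 0.1, 0.2],
]
HIDDEN_BIASES = [0.0, 0.1, -0.1, 0.05]
OUTPUT_WEIGHTS = [0.05, 0.04, 0.06, 0.03]
OUTPUT_BIAS = 0.1

sample = [0.5, 0.3, 0.2, 0.7, 0.4, 0.6]
estimate = forward(sample, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
```

With trained weights, the single output would be read as the estimate of VB B for the current sample.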

Input
The different features used as input of the neural network, based on the 44 articles on the subject, are presented in Figure 9. The six most used features are: cutting speed, cutting forces, feed, depth of cut, vibration and acoustic emission (AE). Of these, only three depend on the state of the tool: cutting forces, vibration and AE. The other three are cutting parameters that help the neural network to identify the cutting conditions.

Pre-Processing
To extract useful information from these signals, pre-processing techniques are employed. Figure 10 shows the pre-processing techniques used in this context. As the two most used pre-processing categories are "None" and "Statistic", it appears that, for ANN, complex techniques are not always required. When pre-processing is applied, it is usually a very simple one that consists of computing the skewness, kurtosis, etc. Note that the category "None" refers to articles that do not present the pre-processing technique used. As some sort of pre-processing is always needed, this suggests that these articles only compute simple statistical values on the "raw" signal from the sensors, which does not require an extensive presentation of the pre-processing technique. This observation confirms what is presented in Appendix B.1: ANN is robust to noisy data, such that advanced pre-processing techniques are not needed for all the inputs.

Output
The output of the neural network is presented in Figure 11. The vast majority of papers use the ANN as a regressor to estimate the value of VB B in real time ("estimation of VB B" in Table A1 and Figure 11). Some papers also use the neural network as a classifier, and the classification is on average performed for three states. The RUL is used in [72,73], with the latter giving an upper and lower limit for its prediction. Some articles use a different approach: [57] monitors the percentage of tool life, [74] monitors the tool life and tool-shim interface temperature, [75] monitors the surface roughness to later predict the flank wear and [59] monitors surface roughness and flank wear.

Comparison
Some articles present the results of the ANN by comparing its performance with that of classical regression methods [59,61,67,73,75-77]. Some of them also completely describe a regression method in their approach [59,61,67,75,76]. In all these papers, the observation is that ANN performs better than the classical regression methods. The advantage of ANN over the regression model is its ability to continuously learn from the data, making it more efficient at monitoring the tool wear.

Conclusions
The great interest in this technique can be explained by its history in the framework of tool monitoring. Indeed, before the year 2000, the majority of publications concerning the use of AI applied to cutting tools concerned the use of neural networks [10]. This can be explained by the ease of implementing neural networks and the great interest that this technique received among researchers. Today, many accessible libraries make it possible to code a neural network easily; in Python, for example, TensorFlow [78] and Scikit-Learn [79] are two commonly used libraries. It is also well known that ANN are efficient at modelling highly nonlinear relationships, which is our case of interest [80]. One of the major drawbacks of ANN is its "black box" aspect. In today's industries, the explicability of AI is a subject of great interest [81,82].

Fuzzy Inference System and ANFIS
This section presents the results for the ANFIS and FIS approaches. It was chosen to group these two approaches because the observations made in the following are identical for both. The 26 articles are presented in Table A2 for the FIS and Table A3 for the neuro-fuzzy approach.

Input
The inputs used for the fuzzy models are presented in Figure 12. It is observed that the most used input is the cutting forces followed by the AE and feed rate signals. With 17 out of 26 articles using it, the cutting force is the most used input signal. A particular approach using temperature as well as the power and voltage of the machine as input signal is presented in [93].

Pre-Processing
The pre-processing techniques used are presented in Figure 13. The most used pre-processing categories are "None" and "Statistic". This indicates that the fuzzy logic approaches do not require elaborate pre-processing techniques to obtain a good accuracy.

Output
The output of the fuzzy models is presented in Figure 14. The vast majority of articles use the fuzzy logic to monitor the flank wear VB B .

Comparison
For the FIS, the papers [86,87] compare their approach, which uses a Takagi-Sugeno-Kang fuzzy approach based on the subtractive clustering method, with a neural network, a neuro-fuzzy approach and Mamdani fuzzy logic. The approaches proposed in these papers have a lower root mean squared error on the prediction of VB B.
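For reference, the root mean squared error used in this comparison, together with the mean absolute error that is also used as an indicator in the reviewed literature, can be computed as in the short Python sketch below. The function names are chosen for this example only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between measured and estimated wear values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between measured and estimated wear values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Because the RMSE squares each error before averaging, it penalizes a few large errors more heavily than the MAE does, which is one reason why results reported with different indicators are difficult to compare.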

Conclusions
The fuzzy approaches occupied a large part of the literature around the 2010s; today, they are much less represented. Nothing in the reviewed articles explains this decline in interest. It seems that the interest of researchers has evolved such that these approaches have been replaced. The interest of fuzzy approaches lies mainly in their explicable aspect. Unlike neural networks, a fuzzy approach is not a black box. This is a great feature, especially in shop floor applications where the fuzzy approach can be of interest by providing more information than a simple prediction. In [41], the low processor time is also cited as a key feature of the fuzzy approach compared with neural networks.
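The explicable aspect of a fuzzy approach can be illustrated with a minimal zero-order Sugeno-type inference sketch in Python. It is not the model of any reviewed article: the single input (a cutting force normalized to [0, 1]), the triangular membership functions, the three rules and their output values in mm are all hypothetical.

```python
# Hypothetical fuzzy sets over a cutting force normalized to [0, 1], each paired
# with a hypothetical constant flank wear output in mm (zero-order Sugeno rules).
RULES = {
    "low": ((-0.5, 0.0, 0.5), 0.05),
    "medium": ((0.0, 0.5, 1.0), 0.15),
    "high": ((0.5, 1.0, 1.5), 0.30),
}


def triangular(x, left, peak, right):
    """Membership degree of x in the triangular fuzzy set (left, peak, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def infer_wear(force):
    """Return the estimated wear and the firing strength of every rule."""
    strengths = {name: triangular(force, *shape) for name, (shape, _) in RULES.items()}
    total = sum(strengths.values())
    if total == 0.0:
        raise ValueError("force is outside the range covered by the rules")
    # Weighted average of the rule outputs, weighted by firing strength.
    wear = sum(strengths[name] * output for name, (_, output) in RULES.items()) / total
    return wear, strengths
```

Besides the wear estimate, the function returns the firing strength of each rule, so that an operator can see which rule (for instance "force is high") drove the result. This is the kind of additional information that a black-box model does not provide directly.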

SVM
As mentioned in Section 4.2, SVM received great interest over the past five years and is one of the most used techniques to monitor the tool wear with 18 articles. A presentation of these articles is provided in Table A4.

Input
The input features used for the SVM approach are presented in Figure 15. It is observed that the most used input is the cutting force. It also appears that the use of images to generate features is more common with this AI technique than with other approaches (except CNN).

Pre-Processing
Due to the sensitivity of SVM to noise and outliers, it appears that this approach requires more pre-processing of the data than ANN. Indeed, as Figure 16 shows, the vast majority of the approaches use a pre-processing technique. This pre-processing is mostly performed in the time-frequency domain, and some statistical features are computed.

Output
The output of the SVM is presented in Figure 17. As described, SVM was initially only used for classification purposes. Here, the numbers of articles that use SVM for regression and for classification are almost the same.

Comparison and Particular Approaches
In a particular article [102], the authors use the SVM approach to make a short-term prediction of VB B at time t and also at time t + 1 ("prediction of VB B "). This approach uses SVM coupled with a genetic algorithm to perform this task. A comparison is realized with an AutoRegressive Integrated Moving Average (ARIMA) model and the SVM approach developed in that paper. The authors demonstrate that the SVM approach performs better than the ARIMA model.
In another instance, it is proposed to use the cutting forces to classify the state of the tool into three states ("initial", "normal" and "severe" wear) [103]. The cutting forces are pre-processed with several techniques: normalization, statistical features and time-frequency domain. This is followed by a correlation analysis to find the best features. These

Conclusions
Applications of SVM appeared later in the literature and remain a relatively modern approach. As this method is more dependent on the quality of the input data, some feature extraction is required to improve the quality of the inference. Fitting the correct parameters in the SVM approach is a complex task, but it appears that the quality of the inference made with this approach is at least at the same level as with an ANN approach. In practice, this method may be more difficult to implement in the industrial sector due to its sensitivity to noisy data and its need for pre-processing. However, SVM has the advantage of a good generalization performance, which could make this approach particularly interesting in the future.

SOM
Only 7 articles describe this approach; they are presented in Table A5.

Input
The inputs of the SOM are presented in Figure 18. Due to the nature of this approach, only inputs that are highly correlated with the tool wear are used. Indeed, the cutting parameters such as cutting speed, depth of cut, feed rate, etc. are not used as input of the SOM contrary to other approaches.

Pre-Processing
The pre-processing techniques used are presented in Figure 19. It is observed that the statistical features and frequency domain features are the most used pre-processing approaches. As presented in Appendix B.4, this method is generally robust against noisy data but computing the kurtosis, skewness, average value, etc. is an easily accessible way of improving the results obtained with this approach and reducing the learning time.

Output
This approach is only able to perform classification as observed in Figure 20.

Comparison
Some articles present the SOM alongside another method. In [2], a comparison between SOM, SVM and the k-nearest neighbour approach is made. In [41], the use of SOM, neuro-fuzzy and ANN is discussed and software is presented for shop-floor application. In general, the SOM approach does not outperform the others but offers a simple and accessible classification solution. However, the lack of generalization capabilities of this approach requires training under all cutting conditions, which is not always necessary with other methods.

Conclusions
The SOM approach is quite different from the others because of its unsupervised learning aspect. Some authors use this aspect to try to select the best features independently of classical correlation approaches [107]. Others use this aspect to work with imbalanced data and a single-sensor approach [2]. In an industrial application, imbalanced data often constitute the only available data, and this technique can therefore be used in this context. The advantage of low computation time is also highlighted [41].

HMM
The HMM approach is not strongly represented in the literature with only 3 articles: [32,46,47]. These articles are presented in Table A6. They are all used to perform a classification on the tool state.

Presentation of the Articles
In Ref. [32], a discrete hidden Markov model uses the wavelet transformation of the vibration signal in the feed direction to classify the tool state within two states ("sharp" or "worn" tool). This approach achieves a hit rate of up to 97%. The authors try this approach under different training conditions (e.g., length of training data and variation of observation sequence length), and the HMM performs well in all conditions. In Ref. [47], a Discrete Hidden Markov Model (DHMM) is used in combination with an SOM. They use the cutting force and the acceleration to monitor the state of the tool, and these features are pre-processed with an FFT and coded with an SOM. Their approach performed well, and a processing time of around 0.2 s is reported, which makes this approach usable in online applications.
In Ref. [46], the use of ANN is compared to an HMM to monitor the tool wear. The cutting force signal is used as input and pre-processed with multiple techniques: statistical (mean, skewness, etc.), frequency (FFT, PSD) and time-frequency (wavelet and spectrogram). The best features are chosen by the researcher following a correlation analysis. It is concluded that both approaches perform well; the HMM has the advantage of being extremely easy to train but is limited to a finite number of output values (classification).
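One common way of using discrete HMMs as a classifier is to train one model per tool state and to assign an observation sequence to the model under which it is most likely. The Python sketch below illustrates this idea with the scaled forward algorithm. It is an assumption made for illustration and is not confirmed to be the exact scheme of [32], [46] or [47]; all model parameters are hypothetical, and the observation symbols (0 for a low and 1 for a high vibration energy) stand in for the coded features.

```python
import math


def log_likelihood(observations, start, transition, emission):
    """Scaled forward algorithm: log P(observations | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[s] * emission[s][observations[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            alpha = [
                sum(alpha[p] * transition[p][s] for p in range(n_states))
                * emission[s][observations[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        log_prob += math.log(scale)
    return log_prob


# Hypothetical two-state models, one per tool condition (values not from the literature).
MODELS = {
    "sharp": {
        "start": [0.9, 0.1],
        "transition": [[0.9, 0.1], [0.2, 0.8]],
        "emission": [[0.8, 0.2], [0.6, 0.4]],
    },
    "worn": {
        "start": [0.9, 0.1],
        "transition": [[0.9, 0.1], [0.2, 0.8]],
        "emission": [[0.4, 0.6], [0.1, 0.9]],
    },
}


def classify(observations):
    """Return the tool state whose model gives the highest log-likelihood."""
    return max(MODELS, key=lambda name: log_likelihood(observations, **MODELS[name]))
```

Such a scheme can only return one of the modelled states, which is consistent with the remark that the HMM output is limited to classification.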

Conclusions
Even if this approach presents only 3 papers in the literature, the results reported in these publications show that this technique is capable of realizing a good quality classification. In theory, there are Markov models capable of performing a regression, but these have not been tested yet in the literature. As the classification of the tool state is not the most common approach, the low interest in this technique can be explained by this consideration. Moreover, as stated in one of these articles, in comparison, a neural network is able to have the same precision as an HMM. However, the neural network needs a little more trial and error to determine the best architecture for learning [46]. HMM therefore does not seem to present any major advantages, which leads researchers to not favour this approach over another.

CNN
The use of CNN has recently emerged. Only 3 articles discuss the use of CNN to monitor the tool wear: [9,48,49]. They are presented in Table A7. All of these articles were published in 2020, showing a new interest in this technique (see Section 4.2). These articles use almost the same approach.

Presentation of the Articles
In [49], the authors used the image of the insert to compute the value of VB B . This is achieved by identifying the wear area on the tool. The identification of this zone makes it possible to count the number of pixels which allows finding the dimensions of the wear zone. The authors perform a data augmentation on the initial dataset to improve the robustness of the CNN. In [9], a similar approach is followed, but the images of the insert are used directly to classify the state of the tool. In this study, the classification is made within two classes: "GO/NO GO". Both articles report accuracy greater than 90%.
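The pixel-counting step can be sketched in Python as follows, assuming that the wear zone has already been segmented into a binary mask (1 for a worn pixel, 0 otherwise) and that a calibration factor in mm per pixel is known. The mask orientation (rows along the wear width, columns along the cutting edge), the function name and the use of the mean column height as a proxy for VB B are assumptions made for this example, not the procedure of [49].

```python
def wear_from_mask(mask, mm_per_pixel):
    """Convert a binary wear mask into wear dimensions in millimetres.

    Each column is assumed to be one position along the cutting edge, and the
    number of worn pixels in that column the local wear width at that position.
    """
    heights = [sum(column) for column in zip(*mask)]
    worn = [h for h in heights if h > 0]
    if not worn:
        return {"area_mm2": 0.0, "mean_width_mm": 0.0, "max_width_mm": 0.0}
    return {
        "area_mm2": sum(worn) * mm_per_pixel ** 2,
        "mean_width_mm": sum(worn) / len(worn) * mm_per_pixel,
        "max_width_mm": max(worn) * mm_per_pixel,
    }
```

The calibration factor depends on the camera and optics; any change in the image capture conditions would require recalibrating it.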
Based on the dependency of chip colour on the cutting tool temperature, and the influence of tool wear on tool temperature, it has been proposed to use the chip colour to monitor the tool wear [48]. Images of the chip are pre-processed to remove the background, and a kernel density estimation is performed. The authors show that this density estimation is more efficient on the hue channel of the Hue Saturation Value (HSV) representation of the images. This is used as an input of the CNN, which performs a classification into three classes depending on the level of wear: "New", "Medium" and "High". They achieve an accuracy above 90%. A comparison is made with a functional data analysis (FDA) classifier, which does not outperform the CNN.
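The hue-based density estimation can be illustrated with the Python sketch below, which uses only the standard library. It assumes that the background has already been removed and that the chip pixels are available as 8-bit (R, G, B) tuples. The Gaussian kernel, the bandwidth and the fact that the circular nature of the hue is ignored are simplifications made for this example and are not taken from [48].

```python
import colorsys
import math


def hue_values(rgb_pixels):
    """Hue in [0, 1) of each pixel, given as 8-bit (R, G, B) tuples."""
    return [
        colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[0]
        for r, g, b in rgb_pixels
    ]


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of the samples, evaluated at each grid point."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
        for x in grid
    ]
```

The density evaluated on a fixed grid gives a vector of constant length whatever the number of chip pixels, which makes it convenient as a classifier input.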

Conclusions
The late interest in this technique can be explained by the difficulty of obtaining a significant number of images for the datasets. Indeed, this technique uses images of the tool, which can be impractical to obtain in industrial applications. These approaches work well in laboratory conditions but still require human intervention. Compared with the other approaches presented in this paper, automating this approach seems more difficult. In addition, in real production applications, turning is often performed with cutting fluids, which can further complicate image capture. Indeed, the conditions for taking images must be consistent and the light must be controlled so as not to influence the neural networks [48]. Today, to our knowledge, there is no automated machine allowing the introduction of a capture system of this type, which may explain the low interest raised by this technique. This lack of interest can also be explained by the false belief that CNN is more difficult to implement than ANN [108]. Furthermore, this approach requires stopping the machining process to allow the picture to be taken, which removes all the interest of monitoring in real time. However, the advantage of this approach is that it makes it possible to directly measure the wear on the tool rather than going through correlated indicators such as the cutting forces. Automating this approach would be equivalent to automating the measurement of VB B, which would provide important data for other forecasting techniques. By using the chip colour instead, it could then be possible to carry out the measurement without interrupting the process [48].

Other
This section covers all approaches that do not belong to the other categories. The articles are presented in Table A8. It can be noted that, among these methods, there is a majority of decision trees and classifiers. These approaches are generally compared in pairs with others; hence, they do not have a dedicated section in this SLR. Table 3 summarizes the top 3 inputs, pre-processing techniques, and outputs for each AI technique. From this table, the different methods are compared.

General Conclusions
For ANN, 2 of the 3 most used inputs are cutting parameters: cutting speed and feed rate. This extensive use of cutting parameters as input signals appears to be specific to ANN, as this characteristic is not observed for other methods. All AI techniques extensively use the cutting force as input signal, as it is present in the top 3 of each AI technique (except CNN). Vibration and AE signals are also commonly used. It is worth noting that, except for CNN, SVM is the second method that uses features extracted from images as input.
ANN and ANFIS/FIS techniques do not appear to require elaborate pre-processing, as most of the articles do not specify the pre-processing techniques employed ("None" in Table 3). SVM, SOM, HMM and CNN all require special attention to the feature extraction techniques, especially SVM and HMM, which benefit from more advanced pre-processing techniques in the time-frequency domain. For the majority of methods, the computation of simple statistical characteristics of the signal is commonly used. By its nature, CNN does not require the same kind of pre-processing technique, as it mainly uses images as input.
Two outputs are predominant in the use of AI techniques: estimation of VB B and classification. In the analysis of AI techniques by year (Section 4.1), it was shown that ANN, fuzzy and SVM techniques are predominant in the literature. It is observed that these three techniques are the only ones with "Estimation of VB B" as the most represented output. Indeed, approaches such as SOM, HMM and CNN are mainly focused on classification. It is therefore observed that techniques monitoring the value of VB B are favoured over simple classification.

Table 3. Top 3 inputs, pre-processing techniques and outputs for each AI technique (number of articles in parentheses).
ANN. Input: Cutting speed; Feed rate (24); Cutting force (22). Pre-processing: None (16); Statistic (12); Frequency domain (7). Output: Estimation of VB B; Classification (7); Other (6).
Fuzzy. Input: Cutting force (17); AE (9); Feed rate (9). Pre-processing: None (12); Statistic (6); Other (3). Output: Estimation of VB B; Classification (5).
SVM. Input: Cutting force (7); Image (5); AE (4). Pre-processing: Time-frequency (8); Statistic (8); Other (5). Output: Estimation of VB B; Classification (9); Prediction of VB B (1).
SOM. Input: Vibration (6); Cutting force (5); Noise (2). Pre-processing: Statistic (6); Frequency domain (5); Time-frequency (2). Output: Classification (7).
HMM. Input: Cutting force (2); Vibration (2). Pre-processing: Time-frequency (2); Frequency (2); Normalization (1). Output: Classification (3); Estimation of VB B (1).
CNN. Input: Image (3). Pre-processing: Image processing (3). Output: Classification (2); Estimation of VB B (1).

As discussed in Section 6, comparing AI methods on the results they are able to achieve is not possible. Since each approach is carried out on a different dataset, with different indicators, different inputs, etc., comparing these results would not provide a clear and unbiased analysis of the literature. Despite this, the previous sections show that articles comparing AI techniques with classical methods, such as statistical models, report that the AI techniques outperform the classical models.

Discussion
As indicated by the various authors of the identified publications, all approaches presented in this SLR were able to obtain good results in monitoring the cutting tool health; some authors were even able to obtain more than 90% accuracy. The majority of these approaches have in common that they monitor the value of VB B to assess the tool state. Unfortunately, few approaches try to evaluate the tool health differently. Some papers try to monitor the tool wear with other indicators such as the percentage of elapsed tool life, and others try to predict the RUL. This lack of other approaches may come from the difficulty of defining a clear end-of-life criterion for cutting tools. In the ISO 3685 standard [19], the maximum value for VB B is set to 0.3 mm as the end-of-life criterion, but, in real applications, this value may not represent the cutting tool end-of-life, depending on the industrial application and its tolerances. As this end-of-life can be very variable, approaches based on the RUL or another time indicator can be very difficult to adopt for industrial applications. It is therefore understandable that researchers are more interested in the value of VB B than in trying to define a precise end-of-life criterion. It is up to the AI user to define the criteria corresponding to their application. By proposing an approach that monitors VB B, the researchers ensure that their method can be applicable in real applications.
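The way a user can attach an application-specific criterion to a monitored VB B value can be sketched in Python as follows. The default limit is the 0.3 mm value of ISO 3685 mentioned above; the function name, the state labels and the warning fraction are hypothetical choices made for this example.

```python
ISO_3685_VB_LIMIT_MM = 0.3  # end-of-life criterion quoted from ISO 3685 in the text


def tool_state(vb_mm, end_of_life_mm=ISO_3685_VB_LIMIT_MM, warning_fraction=0.8):
    """Map an estimated flank wear (mm) to a decision, for a user-defined limit.

    warning_fraction is a hypothetical parameter: the share of the limit above
    which the tool is flagged before it actually reaches its end of life.
    """
    if vb_mm >= end_of_life_mm:
        return "replace"
    if vb_mm >= warning_fraction * end_of_life_mm:
        return "warning"
    return "ok"
```

For example, a finishing operation with tight tolerances could call `tool_state(vb, end_of_life_mm=0.2)` while a roughing operation keeps the default limit, without any change to the monitoring model itself.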
Knowing the current condition of a tool is obviously of great interest in condition monitoring. However, as the industry evolves, predictive monitoring tends to be the preferred approach. The current state of the art shows that there is too little research done to meet the future needs of industries in predictive maintenance, i.e., to predict the coming evolution of the tool condition. In recent years, advanced statistical models have tended to address this problem but still require a lot of tuning to obtain results. The use of AI could perhaps meet these needs of industries while guaranteeing models robust enough to be implemented in shop-floor applications. At this stage, it is not possible to assess whether this lack of work on predictive maintenance is due to a lack of interest from the researchers or to a publication bias. Indeed, this SLR only lists articles that present relatively good results, and this may be due either to the fact that all techniques were always able to perform well, or to the fact that publications which do not get good results are not published or not submitted for publication. It is therefore possible that, currently, research in predictive maintenance is not published due to a possible lack of results.
Due to the different nature of the AI techniques, it is not possible to make a comparison between each of them. Many articles compare their approach with other methods. In this SLR, these articles are listed and their results presented, although they are often contradictory. Indeed, the approach presented in any of the selected articles is often reported to be the most effective compared with others. Given these contradictions, it is impossible to identify the best approach, if one exists. There may be two reasons for this: (i) the publication bias previously discussed (papers are not submitted or published if their approach is not better than another one), and (ii) there is no best approach. Moreover, out of 102 articles, 98 performed a test campaign to obtain their dataset. As these data have been collected with different cutting conditions, materials, tools, etc., it is not relevant to compare the results of these approaches. In turning, there is no benchmark dataset allowing for a fair comparison of all the approaches, whereas benchmarks exist for other categories of machining. In milling, for example, some datasets are available on the NASA website [109] and on the PHM website [110], among others. This lack of unified benchmarking in turning does not allow a clear view of the performance and is a clear limitation of the comparison of performances in all these articles, which can therefore only be analysed objectively on the basis of their methodology instead of their results. The results obtained are highly dependent on the quality of the data recovered during the tests, and some data may be better suited to certain approaches than others. Consequently, the pooling of various test data would allow the establishment of benchmarks. These benchmarks should aim to be as representative as possible of industrial reality.
This lack of a unified dataset may also be related to the co-authorship analysis performed in Section 4.5: as there is little inter-laboratory collaboration on the topic of this SLR, no data are shared between researchers.
It is also observed that the title of the articles could be confusing. Indeed, many articles incorrectly used the word "prediction" to talk about "monitoring". As a matter of fact, out of the 102 identified articles, 34 wrongly use the word "prediction" in their title and 62 in their title or abstract. The word prediction, in the English language, normally evokes the future state as defined in the Oxford dictionary: "Prediction is a statement that says what you think will happen" [22]. As shown in this SLR, the majority of articles actually use the AI technique as a monitoring system (i.e., to produce an estimate in real time). There is no forecasting of the future state. This abuse of language likely leads to confusion given the previous remark on the emergence of predictive maintenance.

Answers to the Research Questions
This section presents an answer to the research questions identified in Section 3.1. The answers to these questions are distributed through this SLR, but the main results are summarized here.

RQ1 How is AI used in the framework of monitoring/predicting the condition of tools in stable turning condition?
This question is answered throughout this SLR and through RQ2 to RQ5. Several approaches have been mentioned and none stands out from the others (mainly due to the lack of a unified benchmark). Each approach can find its place depending on the means and equipment made available to the monitoring system. However, as discussed in Section 6, it appears that, in the current state of the art, too few papers address the future evolution of VB_B; most monitor the current state of the tool instead, which could be a problem in the framework of predictive maintenance in Industry 4.0.
RQ2 What are the most used AI techniques in this context?
In Section 5 of this SLR, all the AI methods were presented. Among them, no approach appears to be more efficient than the others. Each approach has its advantages and disadvantages. However, it appears in Section 4.2 that approaches based on neural networks, fuzzy logic and support vector machines are particularly present in the literature. They can all be integrated into workshops. Neural network approaches have the advantage of having been widely studied but have a "black box" aspect which could cause reluctance in practice. Fuzzy logic no longer seems to be considered, even though it had the advantage of presenting the results in an easily understandable way. Finally, SVM approaches seem to perform as well as ANN and fuzzy approaches but require special attention to data pre-processing and the choice of parameters.
RQ3 What is the forecasting horizon of each of the identified AI techniques in this context?
Figure 21 presents the outputs of all the AI techniques presented in this article. It appears that the majority of articles use AI to monitor the current state of tool wear (estimation of VB_B). Classification is also used to detect the current state of the tool. Some papers present other kinds of output, such as the percentage of tool life. Only five articles propose an approach that consists of a prediction of the future state. This is achieved by computing the remaining useful life (RUL) or by predicting VB_B at a future time (generally called "t + 1"). As discussed in Section 6, there is a lack of applications in predictive maintenance, and the current state of the art is limited to monitoring the present state of the tool.
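The difference between the output types discussed for RQ3 can be illustrated with how the training targets are built. The sketch below is only an illustration under stated assumptions: the force and wear values, the 0.30 mm threshold, and the helper names `monitoring_pairs`, `forecasting_pairs` and `remaining_useful_life` are hypothetical and are not taken from any of the reviewed articles.

```python
def monitoring_pairs(features, wear):
    """Monitoring: the feature at time t is paired with the wear at the same time t."""
    return list(zip(features, wear))

def forecasting_pairs(features, wear, horizon=1):
    """Prediction: the feature at time t is paired with the wear at time t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return list(zip(features[:-horizon], wear[horizon:]))

def remaining_useful_life(wear, threshold):
    """Number of time steps from each index until wear first reaches the threshold."""
    end = next((t for t, vb in enumerate(wear) if vb >= threshold), None)
    if end is None:
        return [None] * len(wear)  # threshold never reached in this record
    return [max(end - t, 0) for t in range(len(wear))]

# Hypothetical record: cutting force (N) and measured wear VB_B (mm) per pass.
force = [210.0, 225.0, 240.0, 262.0, 290.0, 330.0]
wear = [0.05, 0.10, 0.16, 0.23, 0.31, 0.40]

print(monitoring_pairs(force, wear)[0])          # target is the wear at the same step
print(forecasting_pairs(force, wear, 1)[0])      # target is the wear one step ahead
print(remaining_useful_life(wear, threshold=0.30))
```

In this framing, a model trained on the first set of pairs is a monitoring system, whereas a model trained on the second set or on the RUL targets produces a prediction in the sense used in this SLR.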

RQ4 What are the most common inputs used with the different identified AI techniques in this context?
As AI techniques are data-driven, the most common features are those with a high correlation with the cutting tool state. The distribution of all input data across all techniques is presented in Figure 22. The cutting force is the main input, as it appears in 54 of the 102 articles. Features that correspond to the cutting conditions, such as feed rate, cutting speed, and depth of cut, are also frequently used. These features are generally constant throughout the cutting process but are important to give context to the correlated data such as the cutting force, the vibration signal, etc. These cutting parameters thus have a significant impact on the ability of the AI technique to monitor the tool.

RQ5 What are the most used feature extraction techniques in this context?
The feature extraction technique for each AI approach is presented in Section 5. Figure 23 presents the distribution of all pre-processing techniques in this SLR. It is important to note that the pre-processing technique can differ depending on the approach. For example, it is observed that ANN do not need much pre-processing, whereas pre-processing is necessary for SVM.

Conclusions
This paper presents a systematic literature review covering the use of AI techniques in the monitoring of cutting tool health in stable turning condition. The review protocol identified 8426 articles, of which 102 meet the criteria of interest (Section 3). The review protocol followed the procedure recommended by Kitchenham [15]. While defining the research procedure, a new approach aiming to improve its quality was proposed. Some conclusions can be drawn about this procedure:

• In the research procedure for this SLR, the choice to use a semantic approach to build the keywords was successful. Each search query was constructed from a combination of four keywords, each corresponding to one part of the semantic approach (Section 3.2.2). The results show that this approach allows all of the literature on the subject of interest to be browsed. However, it requires additional sorting steps, such as removing duplicated items. Due to the number of articles identified, this last step cannot be performed manually. This approach is particularly interesting in the context of a systematic literature review.

• In the research procedure, Cohen's kappa [30] was used to evaluate the agreement among raters (Section 3.3). This gives an unbiased judgement of the quality of the sorting, which ensures that the research was carried out in an unbiased manner.
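For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following minimal sketch computes it for two raters; the screening decisions are hypothetical and are not the data of this SLR, and the function name `cohen_kappa` is chosen here for illustration.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical include/exclude decisions of two raters on ten abstracts.
rater_1 = ["include", "exclude", "exclude", "include", "exclude",
           "exclude", "include", "exclude", "exclude", "exclude"]
rater_2 = ["include", "exclude", "exclude", "exclude", "exclude",
           "exclude", "include", "exclude", "include", "exclude"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

In this hypothetical case, the raters agree on 8 of 10 items (p_o = 0.80) while chance agreement is p_e = 0.58, which gives a kappa of about 0.52.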
The bibliometric analysis of the 102 articles was performed in Section 4 and shows the trends in the literature over the last 20 years. Several aspects were discussed including the distribution of the number of articles per year (Section 4.1), the number of article citations (Section 4.3), the identification of reference journals (Section 4.4), and a discussion on the authors (Section 4.5). Some general conclusions can be drawn:

• There is a growing interest in the use of AI techniques to monitor tool wear in turning operations. This is shown by the increasing number of publications per year in this domain (Section 4.1).

• Approaches using neural networks are predominant in the subject of this SLR with 44 articles (Section 4.2). This may be linked to the ease of implementation of this type of technique and to its robustness to noisy data. Fuzzy logic and SVM approaches are also often used by researchers, with 26 and 18 articles, respectively.

An in-depth analysis of the six AI approaches identified in this SLR was carried out and presented in the form of Tables A1-A8. A discussion of each approach was provided in Section 5. Some general comments can be made:

• Some articles compare the AI techniques with the classical statistical approach. It appears that the AI techniques outperform the classical approach. The main advantage of AI techniques over the classical approaches is that they are able to adapt easily to multiple cutting conditions without any further researcher expertise. This makes AI an interesting approach in the industrial context.

• Almost all the articles presented in this systematic literature review perform a dedicated test campaign to obtain their own data. This makes it difficult to determine the best approach. Establishing a benchmark dataset would allow the different approaches to be compared on the same data, giving researchers a clear view of the best approach.

• There is a vast variety of approaches to monitor tool wear, but the reported accuracy is usually above 90%. This indicates that, no matter the approach, AI techniques can be used to monitor tool health. Due to the wide variety of approaches and the differences in the AI techniques employed, producing an effective horizontal comparison between the methods is not feasible.

• The forecasting horizon is currently limited to real-time monitoring, with only a few applications corresponding to predictive maintenance.

To conclude this SLR, some future directions that need to be addressed in the use of AI to monitor tool wear are highlighted:

• The implementation of a common benchmark dataset is needed to allow an objective comparison of the performance of proposed approaches.

• Since all the approaches mainly focus on real-time monitoring of the tool state, the predictive maintenance aspects need to be considered a priority.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare that they have no conflict of interest.
Appendix A

Appendix B
This section presents an introduction to the most used AI techniques presented in this SLR. The descriptions of the methods are not exhaustive but provide a general introduction to some of the concepts that are used in this article.

Appendix B.1. Artificial Neural Network-ANN
The Artificial Neural Network (ANN) is one of the most popular AI techniques. Inspired by the human brain, the architecture of an ANN is composed of at least three layers: an input layer, one or more hidden layers and one output layer (Figure A1) [141]. Each layer is composed of neurons which are interconnected with the neurons in the following layer. Each neuron has an associated activation function, and the weights of the interconnections are updated in the training phase to obtain the desired relationship between the input and the output. The advantages of ANN are their ability to reproduce highly nonlinear input-output characteristics, their ease of software implementation and their robustness to noisy data. It is suggested that ANN can achieve similar performance to any other method given enough resources [80]. The main downsides of ANN are their need for a large amount of data and processing power to learn from the dataset, the trial-and-error process needed to choose their parameters, and the fact that they are to be considered as black boxes [29]. ANN can be used to achieve regression or classification.
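The layer structure and the weight update described above can be sketched with a very small network. The example below is a minimal sketch under stated assumptions, not an implementation from any reviewed article: it uses one hidden layer with a tanh activation, a linear output, a mean squared error loss and full-batch gradient descent, and the two normalised input features and wear targets are hypothetical.

```python
import math
import random

def init_network(n_in, n_hidden, seed=0):
    """Input -> hidden (tanh) -> single linear output, with small random weights."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return the hidden activations and the network output for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]
    return hidden, output

def mse(net, X, y):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

def train(net, X, y, lr=0.05, epochs=500):
    """Full-batch gradient descent on the mean squared error (backpropagation)."""
    n, n_in = len(X), len(X[0])
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in net["W1"]]
        gb1 = [0.0] * len(net["b1"])
        gW2 = [0.0] * len(net["W2"])
        gb2 = 0.0
        for x, t in zip(X, y):
            hidden, out = forward(net, x)
            d_out = 2.0 * (out - t) / n          # dLoss/dOutput for this sample
            gb2 += d_out
            for j, h in enumerate(hidden):
                gW2[j] += d_out * h
                d_hidden = d_out * net["W2"][j] * (1.0 - h * h)  # tanh derivative
                gb1[j] += d_hidden
                for i, xi in enumerate(x):
                    gW1[j][i] += d_hidden * xi
        # Apply the accumulated gradients once per epoch.
        for j in range(len(net["W2"])):
            net["W2"][j] -= lr * gW2[j]
            net["b1"][j] -= lr * gb1[j]
            for i in range(n_in):
                net["W1"][j][i] -= lr * gW1[j][i]
        net["b2"] -= lr * gb2
    return net

# Hypothetical normalised inputs (e.g., cutting force, feed rate) and wear targets (mm).
X = [[0.1, 0.2], [0.3, 0.2], [0.5, 0.4], [0.7, 0.4], [0.9, 0.6]]
y = [0.05, 0.12, 0.22, 0.35, 0.50]

net = init_network(n_in=2, n_hidden=4)
loss_before = mse(net, X, y)
train(net, X, y)
loss_after = mse(net, X, y)
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

The number of hidden neurons, the learning rate and the number of epochs are exactly the kind of parameters that, as noted above, are usually chosen by trial and error.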

Appendix B.2. Fuzzy and Adaptive Neuro-Fuzzy Inference System-FIS & ANFIS
In the framework of this paper, two fuzzy approaches are considered: Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy Inference System (ANFIS). Both are based on the fuzzy logic originally proposed by Zadeh in [142].
An FIS is generally composed of five elements: (i) a set of "if-then" rules; (ii) a database for the membership function; (iii) an inference operation system; (iv) a fuzzification interface; and (v) a defuzzification interface [143] (Figure A2). Different types of fuzzy reasoning exist (types 1, 2 and 3). In this SLR, the vast majority of FIS are based on approaches that try to generate the fuzzy rule based on the input-output dataset rather than with the researcher knowledge [144].
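The five elements listed above can be made concrete with a very small example. The sketch below is an assumption-laden illustration rather than a system from the reviewed literature: it uses triangular membership functions, three hand-written rules with constant consequents (a zero-order Sugeno-type simplification), and a weighted-average defuzzification. The membership parameters and the wear values attached to each rule are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Database of membership functions for a normalised cutting force in [0, 1].
MEMBERSHIP = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Rule base: "if force is <term> then wear is <value in mm>" (hypothetical values).
RULES = {"low": 0.1, "medium": 0.3, "high": 0.6}

def fuzzify(force):
    """Fuzzification interface: degree of membership of the input in each term."""
    return {term: triangular(force, *params) for term, params in MEMBERSHIP.items()}

def estimate_wear(force):
    """Inference and defuzzification: weighted average of the rule consequents."""
    degrees = fuzzify(force)
    total = sum(degrees.values())
    if total == 0.0:
        raise ValueError("input lies outside the support of all membership functions")
    return sum(degrees[term] * RULES[term] for term in RULES) / total

print(estimate_wear(0.25))  # halfway between the "low" and "medium" rules
print(estimate_wear(1.0))   # only the "high" rule fires
```

Because each rule can be read directly ("if the force is high, then the wear is about 0.6 mm"), this type of model remains easy to interpret. In the data-driven FIS mentioned above, the rules and membership parameters would be generated from the input-output dataset instead of being written by hand.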
ANFIS is the integration of a network that optimizes a FIS. This optimization is achieved by associating and optimizing a series of fuzzy "if-then" rules with the appropriate membership function [145]. This approach was first proposed in [143] and combines the adaptation capability of neural networks with the comprehensibility of a set of fuzzy rules [143,145].
Both approaches have the advantages of fuzzy logic: robustness and being easily understandable, and hence a natural implementation. However, the learning capabilities can be limited.

Appendix B.3. Support Vector Machine-SVM
Support Vector Machine (SVM) is based on statistical learning. This supervised technique was originally developed for classification (Support Vector Classifier, SVC), but it was later adapted for regression (Support Vector Regressor, SVR). The aim of SVC is to find an optimal hyperplane that best separates the data classes by minimizing a loss function. In nonlinear problems, the input can be mapped into a higher-dimensional Hilbert space through a kernel function, such that the nonlinear problem is transformed into a linear problem in the higher-dimensional space [146] (Figure A3). A similar approach is used for SVR. In this approach, the continuous output is the hyperplane that minimizes the loss function [147]. SVM has the advantages of being efficient on small datasets and providing good generalization, but the training time can be high on bigger datasets, it is sensitive to datasets with noise and outliers, and choosing the correct parameters (e.g., the kernel function) can be difficult.

Figure A3. SVM principle (input space, kernel function, output space).
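The role of the kernel function can be illustrated without training an SVM. The sketch below is only an illustration of the mapping idea: it uses a degree-2 polynomial kernel, for which the explicit feature map is known, and a separating hyperplane whose coefficients are chosen by hand rather than learned. The sample points are hypothetical.

```python
import math

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit map to 3-D space whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two hypothetical classes that no straight line separates in the input space:
# points inside the unit circle and points outside it.
inner = [(0.2, 0.1), (-0.3, 0.4), (0.0, -0.5)]
outer = [(1.5, 0.0), (-1.0, 1.2), (0.9, -1.1)]

# In the mapped space the classes are separated by a plane: w . phi(x) + b = 0,
# with hand-picked w = (1, 0, 1) and b = -1, i.e. x1^2 + x2^2 - 1 = 0.
W, B = (1.0, 0.0, 1.0), -1.0

def decision(x):
    return dot(W, feature_map(x)) + B

for x in inner + outer:
    side = "inner" if decision(x) < 0 else "outer"
    print(x, "->", side)

# The kernel gives the inner product in the mapped space without computing the map.
x, z = inner[1], outer[2]
print(poly_kernel(x, z), dot(feature_map(x), feature_map(z)))
```

A trained SVC would additionally select the hyperplane that maximizes the margin between the classes, and would only ever evaluate the kernel rather than the explicit feature map.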

Appendix B.4. Self-Organizing Map-SOM
This unsupervised learning method is a kind of neural network sometimes referred to as a Kohonen map [141]. The neural structure is designed to represent the high-dimensional input as a map that is typically two-dimensional. In other terms, this is a nonlinear transformation from a higher-dimensional space to a one- or two-dimensional map (Figure A4). An interesting feature of the SOM is that it preserves neighbourhood relations: if the data are in the same region in the input space, they are also in a similar region in the map [148]. These maps are generally used to perform classification. The main advantages of SOM are its ability to process incomplete data, its ease of interpretation, and its learning speed. The main disadvantage is that it requires a large number of parameters and may require optimization to perform well.
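The mapping from the input space to the map can be sketched with a one-dimensional SOM. The example below is a minimal sketch under stated assumptions: the map has five nodes, the learning rate and neighbourhood radius decay linearly, and the two clusters of normalised features (standing for a lightly worn and a heavily worn tool) are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to the input x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)))

def train_som(data, n_nodes=5, epochs=60, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organizing map with a Gaussian neighbourhood function."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    samples = list(data)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac) + 0.01        # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.3   # decaying neighbourhood radius
        rng.shuffle(samples)
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Nodes close to the winner on the map are pulled more strongly.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical normalised feature vectors for two tool conditions.
low_wear = [[0.10, 0.12], [0.08, 0.15], [0.12, 0.09], [0.15, 0.11]]
high_wear = [[0.90, 0.88], [0.85, 0.92], [0.93, 0.86], [0.88, 0.90]]

weights = train_som(low_wear + high_wear)
bmu_low = best_matching_unit(weights, [0.1, 0.1])
bmu_high = best_matching_unit(weights, [0.9, 0.9])
print("node for low wear:", bmu_low, "| node for high wear:", bmu_high)
```

After training, inputs from the two regions of the input space are assigned to different nodes of the map, which is the property used when a SOM serves as a classifier. The number of nodes, the decay schedules and the number of epochs are among the parameters that may need tuning.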

Appendix B.5. Hidden Markov Model-HMM
Hidden Markov Model (HMM) is based on Markov chains. Markov chains describe a system by a finite number of states. In a Markov chain, the history of the system is irrelevant and only the current state influences the future state. Two categories of models exist: continuous and discrete models. In a continuous model, the changes of the system can occur at any time, while, in a discrete model, the change can only occur at given time steps [149]. The particularity of HMM is that the Markov model underlying the data is unknown, such that the data are observations of this model [150]. In other terms: This model is a doubly embedded stochastic process with an underlying stochastic process that is not observable, but can only be observed through another set of stochastic processes that produce the sequence of observations [150]. HMM are a well-studied probabilistic model but, because of the Markov assumption, cannot model dependencies between hidden states beyond the immediately preceding one.

Appendix B.6. Convolutional Neural Network-CNN
Convolutional Neural Networks (CNN) have seen a growing interest in the past several years for tool monitoring applications. The architecture of a CNN is an extension of an ANN. The main difference comes from the presence of convolutional and pooling layers, which extract relevant features before the fully connected layers, which behave as a classical ANN. The convolutional layers apply filters to the input, while the pooling layers gradually reduce the dimensionality of the representation [108] (Figure A5). These extra layers make CNN a deep learning approach (i.e., the features of interest are extracted autonomously by the model). The dimensionality of the input is generally higher than for a classical ANN, such that these types of networks are much more complex and need more computing power than a classical ANN [151]. In general, CNN primarily focus on images and pattern recognition.

Figure A5. CNN principles (input, convolutional layers, pooling layers, fully connected layer, output).
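The two operations that distinguish a CNN from a classical ANN can be shown on a one-dimensional signal. The sketch below is an illustration only: the filter is fixed by hand (in a real CNN its coefficients are learned during training), the convolution is computed as a sliding dot product as is usual in deep learning libraries, and the input signal is hypothetical.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution layer, computed as a sliding dot product."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

# Hypothetical signal containing a step up and a step down.
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
kernel = [-1.0, 1.0]  # hand-picked filter that responds to rising edges

feature_map = conv1d(signal, kernel)
activated = relu(feature_map)
pooled = max_pool1d(activated, size=2)

print(feature_map)  # [0.0, 1.0, 0.0, 0.0, -1.0, 0.0, 0.0]
print(activated)    # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(pooled)       # [1.0, 0.0, 0.0]
```

The filter extracts a feature (here, a rising edge) and the pooling step reduces the representation from seven values to three while keeping the detected feature. In a complete CNN, the pooled values would then be passed to the fully connected layers.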