Intelligent Identification of MoS2 Nanostructures with Hyperspectral Imaging by 3D-CNN

Increasing attention has been paid to two-dimensional (2D) materials because of their superior performance and wafer-level synthesis methods. However, the large-area characterization, precision, intelligent automation, and high-efficiency detection of nanostructures for 2D materials have not yet reached an industrial level. Therefore, we use big data analysis and deep learning methods to successfully develop a set of visible-light hyperspectral imaging technologies for the automatic identification of few-layer MoS2. For the classification algorithm, we propose deep neural network, one-dimensional (1D) convolutional neural network, and three-dimensional (3D) convolutional neural network (3D-CNN) models to explore the correlation between model recognition accuracy and the optical characteristics of few-layer MoS2. The experimental results show that the 3D-CNN has better generalization capability than the other classification models, and this model is applicable to feature inputs from both the spatial and spectral domains. Unlike previous versions of the present study, the approach is not restricted to a specific substrate, and images of different dynamic ranges over a section of the sample can be handled via the automatic shutter aperture. Therefore, adjusting the imaging quality to identical color contrast conditions is unnecessary, and no conventional image processing is needed to achieve a maximum field-of-view recognition range of ~1.92 mm². The image resolution can reach ~100 nm, and the detection time is 3 min per image.

In prior-art optical film measurement, the atomic force microscope (AFM) has various disadvantages, such as a relatively limited scan range and long measurement time; thus, it is unsuitable for quick large-area measurements [20,21]. Raman spectroscopy (Raman) is usually only capable of local characterization within the laser spot, which results in a limited measurement rate; hence, it is also unsuitable for large-area analysis. Transmission electron microscopy (TEM) and scanning tunneling microscopy can characterize samples at high spatial resolution, down to the atomic scale [22,23]. However, both techniques have the disadvantages of low throughput and complex sample preparation. Compared with the abovementioned techniques, machine learning in image or visual recognition is a mature application field. The integration of machine learning (SVM, KNN, BGMM-DP, and K-means) with optical microscopes has only begun in recent years. Thus, artificial intelligence has great potential in the recognition of microscopic images, especially nanostructures [24,25]. In 2019, Hong et al. demonstrated a machine learning algorithm to identify local atomic structures in reactive molecular dynamics [26]. In 2020, Masubuchi et al. showed the real-time detection of 2D materials by a deep-learning-based image segmentation algorithm [27]. In the same year, Yang et al. presented the identification of 2D material flakes of different layer numbers from optical microscope images via a machine-learning-based model [28]. However, the following three shortcomings remain: (1) optical microscope image quality often depends on the user's experience and requires image processing; (2) using only the color space, with its few feature dimensions, risks underfitting; (3) angular illumination asymmetry (ANILAS) of the field of view (FOV) is an important factor that is largely ignored, which results in a certain loss of pixel accuracy [29–31].
Here, we used big data analysis and deep learning methods, combined with the hyperspectral imagery common in the remote sensing field, to solve the difficulties encountered in the previous literature (e.g., uneven light intensity distribution, image dynamic range correction, and image noise filtering). A first attempt was made to analyze eigenvalues in dimensions not previously used (e.g., morphological features) to improve the prediction accuracy [29–31]. Intelligent detection can thus be achieved to identify the layer number of 2D materials.

Growth Mechanism and Surface Morphology of MoS2
The majority of 2D material layer identification studies focus on films synthesized by mechanical exfoliation [32–34]. Although improved-quality molybdenum disulfide can be obtained, this method cannot synthesize a large-area few-layer molybdenum disulfide film. The chemical vapor deposition (CVD) method [35–38] can produce high-quality, large-area molybdenum disulfide on a suitable substrate surface under a stable gas flux and temperature environment, and it is suitable for current device fabrication [38–40]. In 2014, Shanshan et al. explored the sensitivity of MoS2 region growth in a relatively uniform temperature range [41]. In 2017, Lei et al. found that temperature is one of the main factors controlling MoS2 morphology [42]. Wei et al. held the precursor at a constant temperature interval for a long time to observe the change of MoS2 type [43]. In 2018, Dong et al. discussed the nucleation and growth mechanism of MoS2 [44]. On the basis of their results, two dynamic film-growth paths were established: one is a central nanoparticle with a multi-layered MoS2 structure, and the other is a single-layer triangle-dominated or double-layered structure. Reference [45] explained the effect of adjusting the growth temperature and carrier gas flux. Understanding the growth pattern mechanism helps us establish the initial requirements and judgments for database collection and data tagging.

CVD Sample Preparation
The sample was grown on a sapphire substrate using CVD to form a MoS2 film. The precursors were sulfur (99.98%, Echo Chemical Co., Miaoli County, Taiwan) and MoO3 (99.95%, Echo Chemical Co., Miaoli County, Taiwan), each placed at the appropriate position inside the quartz tube. The substrate was placed over the MoO3 crucible at the center of the furnace tube, in a windward (upstream) position. During growth, parameters such as ventilation, heating rate, temperature holding time, and maximum temperature were set. The MoS2 sample was obtained at the end of the growth process. Figure S1 shows the schematic of the experimental setup and the precursor positions. Some new pending data indicate growth of a periodic MoS2 structure, which is crucial for large-scale controllable molybdenum disulfide synthesis and will greatly benefit the future production of electronic components. In contrast to other studies [45–49], this sample was fabricated via laser processing to make periodic holes (hole diameter and depth of approximately 10 µm and 300 nm, respectively), followed by CVD growth of MoS2.

Optical Microscope Image Acquisition
MoS2 on the sapphire substrate was observed through an optical microscope (MM40, Nikon, Lin Trading Co., Taipei, Taiwan) at 10×, 40×, and 100× magnification. For each experimental sample, we recorded the shooting area code (Figure S1c) to store the images captured via optical microscopy (OM) systematically in the image database. Each image had a size of 1600 pixels × 1200 pixels with a depth of 32 bits. Portable network graphics (PNG) files, including gain images at different dynamic range intervals, were acquired, but no color calibration or denoising was performed; we aimed to replace cumbersome image processing with deep learning. In this experiment, approximately 90 growth samples of 2 × 2 cm² were obtained, and ~2000 images were used as sources for data exploration.

System Equipment and Procedures
This study aims to construct a system for the automated analysis of different layers of MoS 2 film. The system architecture is mainly divided into four parts, as shown in Figure S2. The program flow is as follows: (1) database: the prepared molybdenum disulfide sample is measured using a Raman microscope to determine the position of each layer distribution, and an image is taken using an optical microscope and CCD to capture the same position (Image Capture System). (2) Offline Training: the obtained CCD image is combined with hyperspectral imaging technology (VIS-HSI) to convert the spectral characteristics of each layer of molybdenum disulfide, and data preprocessing is performed.
(3) Model Design: the data are further trained via deep learning, completing the establishment of our classification algorithm. (4) Online Service: when a new molybdenum disulfide sample is to be analyzed, it is placed under an optical microscope and its surface image is captured using the CCD. Subsequently, the spectral characteristic value of each pixel is obtained through the hyperspectral imaging technique and fed to the trained model for prediction. The number of layers of the molybdenum disulfide film at each pixel position is determined, and films with different layer numbers are visualized in different colors. Figure 1 shows the processing steps before the data enter the model. When the captured image is converted into a spectrum image by the hyperspectral imaging technology, layer-number labeling is performed with manually circled area masks (Mask) and the data are divided; the labels are based on Raman spectrum measurements (Figure S3) [50–54]. The categories are substrate, monolayer, bilayer, trilayer, bulk, and residues, which are our ground truths. Two types of data are available for model training, namely "feature" and "label". Features have two types: one is the hyperspectral vector, used as input for the deep neural network (DNN) and one-dimensional (1D) convolutional neural network (1D-CNN); the other is the spatial-domain hyperspectral cube, used as input for the three-dimensional (3D) convolutional neural network (3D-CNN). After data preprocessing, we divide the dataset into three parts.
We randomly select 80% of the labeled samples as training data, 20% as the verification set, and the remaining unmarked parts for the test set. As the light intensity distribution is incompletely covered in the dataset in the training and validation sets, we need a test set to help us measure whether the model has this capability.
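The 80%/20% split of labeled pixels described above can be sketched with scikit-learn, which the authors list among their tools. The array names and class sizes below are illustrative assumptions, not the actual dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_labeled_pixels(features, labels, train_frac=0.8, seed=42):
    """Split labeled hyperspectral samples into training and validation
    sets; unlabeled pixels are kept aside separately as the test set."""
    X_train, X_val, y_train, y_val = train_test_split(
        features, labels, train_size=train_frac,
        stratify=labels, random_state=seed)
    return X_train, X_val, y_train, y_val

# Toy example: 100 labeled pixels, 401-point spectra, 6 classes
# (substrate, 1L, 2L, 3L, bulk, residues).
X = np.random.rand(100, 401)
y = np.repeat(np.arange(6), [20, 20, 20, 20, 10, 10])
X_tr, X_va, y_tr, y_va = split_labeled_pixels(X, y)
```

Stratifying on the labels keeps the class proportions of the six categories roughly equal in the training and validation sets.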

Visible Hyperspectral Imaging Algorithm
The VIS-HSI used in this study is a combination of a CCD (Sentech, STC-620PWT) and a visible hyperspectral algorithm (VIS-HSA). The calculated wavelength range is 380–780 nm, and the spectral resolution is 1 nm. The core concept of this technology is to input the image captured by the CCD on the OM to the spectrometer, such that each pixel of the captured image carries spectrum information [55]. Figure S4 shows the flow of the technology; the custom algorithm was implemented in MATLAB.
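Conceptually, the per-pixel conversion can be sketched as a linear map from camera RGB to a 401-point spectrum. The actual VIS-HSA calibration is not published here, so the transformation matrix `M` below is a purely hypothetical placeholder; in practice it would be regressed from calibration color targets whose reference spectra were measured with a spectrometer:

```python
import numpy as np

# 380-780 nm at 1 nm resolution -> 401 spectral bands
WAVELENGTHS = np.arange(380, 781)

# Hypothetical 401x3 matrix mapping camera RGB to a spectrum;
# stands in for the calibrated VIS-HSA conversion.
rng = np.random.default_rng(0)
M = rng.random((401, 3))

def rgb_image_to_hyperspectral(img):
    """img: (H, W, 3) float RGB in [0, 1] -> (H, W, 401) spectral cube."""
    h, w, _ = img.shape
    return (img.reshape(-1, 3) @ M.T).reshape(h, w, WAVELENGTHS.size)

cube = rgb_image_to_hyperspectral(np.zeros((4, 4, 3)))
```

A single matrix multiply per pixel is what makes the conversion cheap enough to run over a full 1600 × 1200 image.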

Data Preprocessing and Partitioning
All of the MoS2 samples were imaged at different substrate locations and under uneven illumination distributions to form a dataset. Two types of features were available: one is the 1 × 401 hyperspectral vector of a single pixel, whose label data serve as the DNN and 1D-CNN inputs; the other is the 1 × 5 × 5 × 401 hyperspectral cube, which takes the spatial region centered on a labeled pixel, with that pixel's category label serving as the 3D-CNN target. Taking a single hyperspectral image as an example (Figure S5), the marked area was divided into an 80% training set and a 20% validation set, whereas the unmarked area formed the test set. Finally, new pending data were generated to evaluate the generalization capability of the model. Because the hyperspectral cube contains spatial features, the training set was oversampled and augmented via horizontal mirroring and left-right flipping.
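The cube extraction and flip augmentation above can be sketched as follows; the reflect-padding at image borders is an assumption, since the paper does not state how edge pixels are handled:

```python
import numpy as np

def extract_cubes(spectral_img, labeled_coords, d=5):
    """Cut a d x d x 401 neighborhood around each labeled pixel.
    spectral_img: (H, W, 401); labeled_coords: list of (row, col)."""
    r = d // 2
    # Reflect-pad so border pixels also get a full d x d neighborhood
    # (assumed; the paper does not specify the border policy).
    padded = np.pad(spectral_img, ((r, r), (r, r), (0, 0)), mode='reflect')
    cubes = [padded[y:y + d, x:x + d, :] for (y, x) in labeled_coords]
    return np.stack(cubes)  # (N, d, d, 401)

def augment(cubes):
    """Spatial augmentation only: mirror along each spatial axis.
    Spectra are left untouched, as flipping only makes sense spatially."""
    return np.concatenate([cubes,
                           cubes[:, ::-1, :, :],   # vertical mirror
                           cubes[:, :, ::-1, :]])  # horizontal mirror

img = np.random.rand(10, 10, 401)
cubes = extract_cubes(img, [(0, 0), (5, 5)])
aug = augment(cubes)
```

Note that this augmentation applies only to the 3D-CNN's cube features; the 1 × 401 vectors for DNN/1D-CNN carry no spatial structure to flip.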

Software and Hardware
The model usage environment was implemented in the Microsoft Windows 10 operating system using TensorFlow Framework version 1.10.0 and Python version 3.6. Open-source data analysis platforms, namely, Jupyter Notebook, SciPy, NumPy, Matplotlib, Pandas, Scikit-learn, Keras, and Spectral, were used to analyze the feature values. The training hardware was a consumer-grade desktop computer with a GeForce GTX 1070 graphics card (NVIDIA, Hong Kong, China) and a Core i5-3470 CPU @ 3.2 GHz (Intel, Taipei, Taiwan).

Results
Because ensuring that every captured image has identical quality is impossible, the model must have a certain recognition capability across imaging conditions. Therefore, we include hyperspectral image data of different imaging qualities so that the model approaches practical classification performance. We also explore the models at different magnification rates in order to discuss spatial resolution and mixed-pixel issues [56,57].

Model Framework for Deep Learning
The classification prediction model uses three models, namely, DNN, 1D-CNN, and 3D-CNN, as shown in Figure 2. Among the model parameters, the learning rate is adjusted within 1 × 10⁻⁶ to 5 × 10⁻⁶ depending on the model and the data, the batch size and dropout rate are 24 and 0.25, respectively, and the selected optimizer is RMSprop. Figure 2a illustrates the basic DNN model architecture. The input is a hyperspectral vector feature, i.e., single-pixel spectral information. The model contains only three fully connected layers; the wider outer neuron layers are expected to extract relatively shallow features [58]. The six categories that we classify are the outputs. Figure 2b displays the 1D-CNN model architecture. The input is the same as for the DNN model. The model consists of four convolutional layers, two pooling layers, and two fully connected layers. The CNN convolution kernel has weight-sharing characteristics. We regard convolving the spectral features as analogous to filtering different frequencies. Given the 1 nm spectral resolution, the correlation between adjacent feature points is high; the pooling layers help to eliminate overly similar neighboring features and reduce redundant feature dimensions [59]. Figure 2c shows the 3D-CNN model architecture.
The input is a hyperspectral cube of the spatial-spectral domain. A feature cube of d × d × N pixels in a small spatial neighborhood (not the entire image) is extracted along the entire spectral band as input data and convolved with 3D kernels to learn spectral-spatial features. The use of neighboring pixels is based on the observation that pixels in a small spatial neighborhood often share similar characteristics [60]. Refs. [55,61] indicate that a small 3 × 3 kernel is the best option for spatial features; thus, with a sample spatial size of 5 × 5, only two convolution operations are needed to reduce the spatial domain to 1 × 1. The first 3D convolutional layer contains two 3D kernels of size K¹₁ × K¹₂ × K¹₃, producing two 3D feature cubes (C1 and C2) that are used as inputs to the next layer. The second 3D convolutional layer (C3) involves four 3D kernels of size K²₁ × K²₂ × K²₃ and produces eight 3D data cubes, each of size (d − K¹₁ − K²₁ + 2) × (d − K¹₂ − K²₂ + 2) × (N − K¹₃ − K²₃ + 2) [62]. Figure 3 shows the calculation results of the sapphire substrate sample via the three models, DNN, 1D-CNN, and 3D-CNN, under 10× magnification. Figure 3a–c exhibit the convergence curves of the loss and accuracy and the training time of the three algorithms, respectively. 3D-CNN has a longer training time and more epochs than the first two models because its input feature contains additional spatial-domain parts; thus, it takes more time to converge. Figure 3d–f display the confusion matrices of the verification set for the three algorithms, respectively. Table 1 presents the evaluation results for each category. Classifier precision indicates how many of the samples predicted as a positive category are truly positive.
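A minimal Keras sketch of a 3D-CNN of the kind described above is given below. The spectral kernel depth (7), dense width (128), and channel counts are assumptions for illustration; only the 3 × 3 spatial cores, 5 × 5 × 401 input, dropout of 0.25, RMSprop optimizer, and six output classes come from the text:

```python
from tensorflow.keras import layers, models

def build_3d_cnn(d=5, bands=401, n_classes=6):
    """Two 3D conv layers with 3x3 spatial cores reduce the 5x5 spatial
    domain to 1x1, followed by fully connected layers (sketch only)."""
    model = models.Sequential([
        layers.Input(shape=(d, d, bands, 1)),
        layers.Conv3D(2, kernel_size=(3, 3, 7), activation='relu'),
        layers.Conv3D(8, kernel_size=(3, 3, 7), activation='relu'),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.25),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='rmsprop',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_3d_cnn()
```

With these settings the two convolutions shrink the 5 × 5 spatial window to 1 × 1, matching the reduction described in the text.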
The recall rate indicates how many of the true-positive samples are judged as positive by the classifier. F1-score is the harmonic mean of precision and recall. Macro-average refers to the arithmetic mean of each statistical indicator over all categories, whereas micro-average builds a global confusion matrix over all examples regardless of category and then computes the corresponding indicators. These indicators show that the 3D-CNN accuracy is better. Figures S6 and S7 exhibit the corresponding training procedures at 40× and 100× magnification rates. Table 1 and Figures S6 and S7 show that the evaluation results under a small magnification are relatively poor; mixed pixels may cause the pixels to contain additional mixing or noise factors.
Figure 4 presents a randomly selected sample at 10× magnification, predicted by the three models. Figure 4a displays the OM image of the corresponding range of the prediction data. Figure 4f–h present the prediction results (color classification images) for the new pending data under the three models, respectively. The results from the region of interest (ROI) in the training data indicate that the DNN and 1D-CNN models cannot accurately predict the damaged region where the MoS2 film suffered external force damage and the crystal structure is missing, whereas the 3D-CNN clearly judges the destroyed region; Supplementary Figure S8 exhibits the remaining new pending data predictions. In the predictions at 40× and 100× magnification rates (Figures S9–S12), the color classification image (false-color composite) can easily be obtained for each model, but at the cost of a correspondingly reduced FOV detection range.
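The evaluation metrics above (precision, recall, F1, macro- and micro-average) can be computed with scikit-learn, which the authors list among their tools. The toy labels below are illustrative only, not the paper's results:

```python
import numpy as np
from sklearn.metrics import (classification_report,
                             precision_recall_fscore_support)

# Toy predictions over the six classes
# (0=substrate, 1=monolayer, 2=bilayer, 3=trilayer, 4=bulk, 5=residues).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 5])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3, 4, 4])

# Macro: arithmetic mean of per-class indicators.
macro = precision_recall_fscore_support(y_true, y_pred, average='macro',
                                        zero_division=0)
# Micro: indicators from the global confusion matrix over all samples.
micro = precision_recall_fscore_support(y_true, y_pred, average='micro',
                                        zero_division=0)
print(classification_report(y_true, y_pred, zero_division=0))
```

For single-label classification, the micro-averaged precision and recall coincide with the overall accuracy, which is why micro-averaging gives a single global view while macro-averaging weights rare classes (such as residues) equally.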

Differences in Models at Three Magnification Rates
In this experiment, the classification algorithm is based on pixel-unit data in the image; thus, determining how to enhance the quantitative precision of the OM is crucial. The majority of existing microscopes achieve uniform spatial irradiance through Köhler illumination [63]. However, shortcomings remain for quantitative measurement and analysis; for example, angular illumination asymmetry (ANILAS) of the FOV [64,65] is an important factor that is largely ignored (Figure S13). This phenomenon leads to a certain loss of pixel accuracy. Thus, we attempt to let deep learning handle the various feature data types under uneven illumination distribution.
The FOV sizes at the 10×, 40×, and 100× magnification rates are 1.6 mm × 1.2 mm, 0.4 mm × 0.3 mm, and 0.17 mm × 0.27 mm, respectively, which are the actual detection sizes at prediction time. Figure 5 presents the optimal loss values of the (a) training and (b) verification sets for the three models at the three magnification rates. The optimal training loss is higher than the validation loss, partly because of the data augmentation applied to the training set, which makes the model harder to train as data diversity increases. Figure 5 shows that the three models have low loss values under 100× magnification. Considering the difference in spatial resolution, the pixel resolutions at 10×, 40×, and 100× magnification are 0.5, 0.25, and 0.1 µm, respectively; therefore, the mixed-pixel problem is more serious at small magnification. 3D-CNN is the best of the three models at all magnification rates. This finding differs from previous research arguments [66]. Previous studies considered that poor image quality arises from noise points and surrounding blur at large magnification, or from fine impurities caused by deposition, which degrade traditional classification algorithms, while small impurities are simply ignored when the spatial resolution is low. In the present study, however, the deep learning model overcomes this bottleneck. As 3D-CNN demonstrates the best generalization capability at all magnification rates, we discuss only the results of this algorithm below. Figure 6a shows an OM image of large-area, periodically grown single-layer MoS2 on a sapphire substrate (also defined as ROI-3). The actual corresponding size is 1 × 1 mm, and the microscope magnification is 10×. We can observe that a single layer of MoS2 is distributed in a star shape around each hole.

We obtain the magnified images of the ROI-4 and ROI-5 regions of the color classification image under 10× magnification (Figure 6d) in Figure 6e,f, respectively. Comparing the same regions of the color classification images at other magnification rates (Figure 6b,c), we observe that a small magnification is limited by the spatial resolution; consequently, fine features are blurred and impurity points are more difficult to identify. Supplementary Figure S14 discusses the class prediction confidence probabilities in images at the various magnification rates.

Instrument Measurement Verification
In the new pending data section, we verify the accuracy of the samples via Raman spectroscopy mapping, as shown in Figure S15. Two oscillation modes, in-plane (E¹₂g) and out-of-plane (A₁g), are shown in the Raman mapping analysis; the layer number is classified by the difference between the two peaks. Figure S16 presents the SEM measurement results, which can be judged by gray level; however, this measurement cannot determine the number of layers and only shows relative contrast. The PL spectroscopy results (Figure S17) indicate that the mapping peaks at either 625 or 667 nm, and the periodically grown single-layer to multilayer distribution of MoS2 has good uniformity. In Figure S18, the sample cross-section is characterized via HRTEM [67,68].

Conclusions
This study is aimed at the layer number discrimination of molybdenum disulfide film on sapphire.
In comparison with current measurement instruments, such as Raman, SEM, AFM, and TEM, the proposed equipment has a large detection area, a short measurement time, and low cost, and it can detect few-layer molybdenum disulfide films. Unlike past approaches, we use deep learning rather than image processing for analysis, and we experimentally confirm that 3D-CNN has the best precision and generalization capability. The reason is that the 3D-CNN model additionally learns the spatial domain of the morphological features of MoS2, which avoids misjudgments caused by imaging-quality differences due to noise and enables more accurate judgments in the fuzzy regions between category regions. For the problem that low magnification is limited by spatial resolution, which blurs fine contaminants and edge morphology, a GAN model can be used to achieve super-resolution at low magnification [69,70]. In future research, we hope to cover all types of 2D materials and various substrates, such as heterogeneous stacks, in order to distinguish different 2D materials and their layer numbers easily. Further ideas are to infer the growth pattern of MoS2 by detecting images in real time and to avoid machine termination under impurity intrusion, reducing time and related costs.

Supplementary Materials: Figure S2: Deep learning is applied to construct a flowchart of the optical MoS2 layer-number detection system. The gray, blue, orange, and green blocks correspond to (1) database, (2) offline training, (3) model design, and (4) online service, respectively; Figure S3: Data labeling is assisted by a Raman spectroscopy instrument (MRI-1532A), which injects 532 nm laser light into the sample, where the photons interact with the vibrational modes of the material, namely, E¹₂g and A₁g. The peak difference between these vibration modes is the main signal for judging the MoS2 layer number, and the two modes depend strongly on the MoS2 thickness. We selected two 30 µm × 30 µm Raman mapping results for the database; each takes ~45 min, a considerable time, especially for the ground-truth marking; Figure S4: Flowchart of the visible hyperspectral image algorithm; Figure S5: In the blue area, the data of the offline training section of Figure 2 are used. The ground truth and our label data are set as the training and validation sets, respectively. The rest of the VIS-HSI feature data are used as the test set. The green area is the predicted result of new pending data in the (4) online service architecture of Figure S2.