Skin Characterizations by Using Contact Capacitive Imaging and High-Resolution Ultrasound Imaging with Machine Learning Algorithms

Abstract: We present our latest research on skin characterizations by using Contact Capacitive Imaging and High-Resolution Ultrasound Imaging with Machine Learning algorithms. Contact Capacitive Imaging is a novel imaging technology based on the dielectric constant measurement principle, with which we have studied the skin water content of different skin sites and performed image classification by using pre-trained Deep Learning Neural Networks through Transfer Learning. The results show that the lips and nose have the lowest water content, whilst the cheek, eye corner and under-eye have the highest water content. The classification yields up to 83.8% accuracy. High-Resolution Ultrasound Imaging is a state-of-the-art ultrasound technology that can produce high-resolution images of the skin and superficial soft tissue to a vertical resolution of about 40 microns, with which we have studied the thickness of different skin layers, such as the stratum corneum, epidermis and dermis, around different locations on the face and around different body parts. The results show that the chin has the highest stratum corneum thickness, and the arm has the lowest stratum corneum thickness. We have also developed two feature-based image classification methods which yield promising results. The outcomes of this study could provide valuable guidelines for cosmetic/medical research, and the methods developed in this study can also be extended for studying damaged skin or skin diseases. The combination of Contact Capacitive Imaging and High-Resolution Ultrasound Imaging could be a powerful tool for skin studies.


Introduction
Skin analysis, particularly of facial skin, is very important in many cosmetic and medical applications. In this paper, we present our latest research on skin characterizations by two novel skin imaging technologies, i.e., Contact Capacitive Imaging and High-Resolution Ultrasound Imaging. The aim is to measure the skin water content and skin layer thickness of different skin sites, mainly for facial skin, and to perform skin image analysis by using Machine Learning algorithms.
Contact Capacitive Imaging is a novel imaging technique based on the dielectric constant measurement principle. It was originally developed for biometric applications. Figure 1 shows photos and schematic diagrams of the Epsilon permittivity imaging system (Biox Systems Ltd., London, UK) and the EPISCAN I-200 High-Resolution Ultrasound (HRUS) imaging system (Longport Inc., Chadds Ford, PA, USA). The Epsilon is based on a Fujitsu fingerprint sensor [16], which has a resolution of 256 × 300 pixels with 50 µm spatial resolution and 8-bit grey-scale capacitance resolution per pixel. The measurements can be conducted by pressing the probe against the skin surface. Each measurement typically takes 2 to 3 s, with controlled contact time and pressure. The EPISCAN is a High-Resolution Ultrasound (HRUS) imaging system that utilizes ultrasound at frequencies as high as 50 MHz to image the skin and underlying soft tissue. The system has been designed to provide users with images of very high resolution and clarity and offers a user-friendly interface, enabling the EPISCAN to be utilized in a broad range of clinical applications, as well as in research and development. The EPISCAN enables the examination of tissue at a microscopic level without the need to perform damaging biopsies. The measurements can be conducted by filling the probe with water, then sealing the probe with a designated thin rubber film. A small quantity of ultrasound gel is also needed on the skin site. A measurement typically takes a few seconds.
For the skin High-Resolution Ultrasound Imaging classifications, we have developed two new feature-based classification methods. One is based on the luminosity values (0 to 255) from the red, green or blue channels of the images. In this approach, Logistic Regression, K-Nearest Neighbor (KNN), Neural Networks (NNs) and Random Forest were used as classifiers. Logistic Regression is a parametric model, whereas KNN is a non-parametric model, and KNN is comparatively slower than Logistic Regression. KNN supports non-linear solutions, whereas Logistic Regression supports only linear solutions. NNs need a large volume of training data compared to KNN to achieve sufficient accuracy, and they also need much more hyperparameter tuning.
Another new classification method is based on skin ultrasound image texture. In this approach, ten types of textural features were extracted from each skin ultrasound image. Of the ten textural features, five were traditional textural features and five were from the output of one intermediate layer of pre-trained convolutional networks, such as DenseNet, MobileNet, ResNet, VGG and Xception. We then applied principal component analysis (PCA) on each feature class and retained the first two principal components to generate scatter plots for exploratory data analysis.
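The dimensionality-reduction step described above can be sketched as follows; the feature matrix here is random stand-in data, not real skin textural features, and scikit-learn's PCA is assumed:

```python
# Reduce a feature matrix to its first two principal components,
# as done for the exploratory scatter plots. Input is synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 50))   # N images x F textural features

pca = PCA(n_components=2)
coords = pca.fit_transform(features)    # N x 2 coordinates for the scatter plot

print(coords.shape)                     # (100, 2)
```

In practice the same transform is applied to each of the ten feature matrices in turn, one scatter plot per descriptor.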
Although the classification algorithms developed in this study are mainly for differentiating skin sites, they can also be used for other classification tasks, such as differentiating dry skin from normal skin, damaged skin from intact skin, young skin from aged skin, healthy skin from diseased skin, and even different types of skin diseases.

Measurement Methods
All the measurements were performed under normal ambient laboratory conditions of 20–21 °C and 40–50% relative humidity.
The measurements were performed on the different skin sites, such as the volar forearm, cheek, chin, eye corner, forehead, lips, neck and nose, of healthy volunteers (aged 20-70, both male and female, Caucasian and Asian). The test skin sites used were initially wiped clean with ETOH/H2O (95/5) solution. The volunteers were acclimatized in the laboratory for 20 min prior to the experiments.

Skin Contact Capacitive Images
Skin Contact Capacitive Imaging produces 2D skin surface images. Figure 2 shows the typical contact capacitive images of different skin sites, such as the volar forearm, cheek, chin, eye corner, forehead, lips, neck and nose. The signal intensity (image brightness) is proportional to the water content. As illustrated, the skin contact capacitive images can show not only the water content of different skin sites, but also the skin texture. The volar forearm has the most uniform skin texture, while the nose, cheek and eye corner are less uniform. The textural differences of the different skin sites are also clearly seen.

Table 1 shows the mean and the standard deviation of the measured skin Epsilon values of different volunteers at the different skin sites. Epsilon values are in arbitrary units, but proportional to the skin water content. The results show that the lips and nose have the lowest water content, whilst the cheek, eye corner and under-eye have the highest water content. The lips and nose also have the lowest standard deviation, whilst the neck, eye corner and under-eye have the highest standard deviation.
Table 2 shows the skin Contact Capacitive image classification results obtained by using different Deep Learning Neural Networks, such as AlexNet, GoogLeNet, VGG16, ResNet-50, InceptionV3, MobileNetV2, DenseNet 201, SqueezeNet, InceptionResNetV2 and Xception, through Transfer Learning.
The results show that DenseNet 201 gives the best accuracy (83.8%), but it also takes a long time to train (110 min). GoogLeNet gives the best overall performance when both accuracy (73.5%) and training time (21 min) are considered, while VGG16 gives the worst performance on both accuracy (59.1%) and training time (114 min). SqueezeNet is the quickest to train (8 min), but its accuracy is low (61.1%). The training was carried out by using MATLAB on a standard desktop computer with an Intel® Core™ i7-3770 CPU @ 3.4 GHz, 8 cores, 16 GB RAM and the Windows 8.1 operating system.
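The Transfer Learning setup described above can be sketched as follows. The study itself used MATLAB; this Keras version is an illustrative re-implementation, with the class count an assumption (eight skin sites) and `weights=None` used so the sketch runs without downloading ImageNet weights (in practice `weights="imagenet"` would be used):

```python
# Hedged sketch of Transfer Learning: take a pre-trained backbone,
# freeze it, and attach a new classification head sized for the
# number of skin-site classes.
from tensorflow import keras

NUM_SITES = 8  # assumption: eight skin-site classes

backbone = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
backbone.trainable = False             # freeze the pre-trained layers

model = keras.Sequential([
    backbone,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(NUM_SITES, activation="softmax"),  # new head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

print(model.output_shape)              # (None, 8)
```

Only the new head is trained at first; the frozen backbone acts as a fixed feature extractor, which is what makes Transfer Learning practical on small skin-image datasets.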

High-Resolution Skin Ultrasound Images
Unlike skin Contact Capacitive Imaging, which produces surface images of the skin, the EPISCAN I-200 High-Resolution Ultrasound (HRUS) imaging system returns a cross-sectional view along a plane approximately orthogonal to the skin surface. Figure 3 shows typical high-resolution ultrasound images at the different skin sites: cheeks, chin, forearm, forehead, lips and nose.

Skin Layer Thickness
By analyzing the skin ultrasound images, we can obtain the thickness information of the different skin layers. Figure 4 shows the skin layers at different cross-sections in the high-resolution ultrasound image [6].
In this experiment, a sample of 605 color images was obtained from eight different areas of the body: the arm, cheek, chin, eyelid, forehead, lips, neck and nose. Table 3 shows the skin layers' thickness measured at the different skin sites. The results show that the cheek contains the highest percentage of dermis among all facial sites, about 95.3%, while the neck's dermis percentage was the lowest at 92.0%. The value for the stratum corneum was found to be highest on the chin, at 0.038 mm, and lowest on the arm, at merely 0.024 mm. Likewise, the epidermis thickness measurements recorded for the arm and chin were the lowest (0.047 mm) and highest (0.083 mm), respectively. A large deviation from the mean may indicate poor skin condition; the risk of skin damage is higher when the standard deviation is high.
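Converting layer boundaries in an ultrasound scan into thickness values reduces to a pixel-to-millimetre scaling. A minimal sketch, in which the boundary rows and the micron-per-pixel scale are hypothetical (in practice they come from the calibrated EPISCAN image):

```python
# Turn layer-boundary depths (in pixel rows) into thicknesses in mm.
# All numeric values below are illustrative, not measured data.
UM_PER_PIXEL = 8.0   # assumed vertical scale of the scan, microns per pixel

# hypothetical boundary rows: skin surface, bottom of stratum corneum,
# bottom of epidermis
surface, sc_bottom, epi_bottom = 120, 125, 131

def thickness_mm(top_row, bottom_row, um_per_px=UM_PER_PIXEL):
    """Thickness of the layer between two boundary rows, in millimetres."""
    return (bottom_row - top_row) * um_per_px / 1000.0

sc = thickness_mm(surface, sc_bottom)      # stratum corneum thickness
epi = thickness_mm(surface, epi_bottom)    # epidermis thickness
print(round(sc, 3), round(epi, 3))
```

With these illustrative numbers the stratum corneum comes out at 0.04 mm, the same order of magnitude as the values reported in Table 3.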

Figure 5 shows the average stratum corneum thickness at the different skin sites with +1 sigma and −1 sigma.

Luminosity Feature-Based Skin Image Classifications
A new classification method, based on the luminosity values (0 to 255) from the red, green or blue channels, rather than images themselves, has been developed. To achieve this, we need to analyze the images first, to extract the luminosity feature values.
Expert evaluation of the ultrasound images allows formulating the following observations/hypotheses:

1. Areas of different colors correspond to different elements of the skin: fat, etc.
2. Properties of the skin, e.g., the density of the elements, may only depend on the depth. As such, we are not looking for any two-dimensional patterns, and the problem is essentially one-dimensional.
3. Coloration of the images is essential, so moving to a monochrome representation would lose information.
There are three major issues with the ultrasound images, as shown in Figure 3.

1. The large empty area on top, which is not perfectly black. This is due to the design of the probe.
2. The natural curvature of the skin.
3. The possible presence of a gel layer, due to imperfect contact between the probe's thin rubber film and the skin.
To rectify the issues, the following three transformations of the images were performed:

1. Gel was manually removed if present.
3. The nearly black area on top, if present, was removed algorithmically, and only 50% of the image vertical area was kept.

Table 4 summarizes the training dataset, with a total of 368 images, modified to remove outliers per site from the original set of images. The pre-processed images were used as input for the training algorithms. Images were first represented as three matrices of luminosity values from 0 to 255, with each matrix representing the red, green or blue channel.
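The top-cropping transformation described above can be sketched as follows; the luminosity threshold and the synthetic test image are illustrative assumptions, with real input being an RGB ultrasound scan:

```python
# Remove the near-black area at the top of a scan, then keep only
# 50% of the remaining vertical extent, as in the pre-processing step.
import numpy as np

def crop_scan(img, black_thresh=10):
    """Drop leading rows whose mean luminosity is near-black, then keep
    the top half of what remains."""
    row_means = img.mean(axis=(1, 2))                 # mean over width and channels
    first = int(np.argmax(row_means > black_thresh))  # first non-black row
    cropped = img[first:]
    return cropped[: cropped.shape[0] // 2]           # keep 50% of vertical area

# synthetic scan: black header above, tissue-like grey below
img = np.zeros((200, 100, 3), dtype=np.uint8)
img[60:] = 120                                        # "skin" starts at row 60
out = crop_scan(img)
print(out.shape)                                      # (70, 100, 3)
```

Keeping only the upper half of the cropped scan retains the skin layers of interest (stratum corneum, epidermis, upper dermis) while discarding deeper tissue.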
Due to the relatively small number of observations, we chose the following features for classification per channel:

2. Standard deviation of luminosity value.
Consequently, there were six features per channel and 18 features for the whole image. The values of these 18 factors were then used as the explanatory variables for the classification. The output variable was a number from 1 to 7, coding the facial site.
Three standard classifiers were used: Logistic Regression, K-Nearest Neighbor and Neural Networks. Training was repeated 10 times, and on each iteration the set was split randomly into 20% and 80% groups, representing the testing and training sets. The score and accuracy across all runs were then aggregated. Table 5 presents a typical average output; the best average score we observed was 0.68, or 68%. The score represents the quality of training, while the accuracy represents the quality of classification on the testing set. It can be concluded that the Logistic Regression classifier performs best. We expect its performance to improve with larger sample sizes and more rigorous, systematic measurements. Random Forest was also tried originally, but it consistently overfitted and underclassified.
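The evaluation protocol above (ten repeated random 80/20 splits, aggregated accuracy) can be sketched as follows. The images and labels are synthetic, and only per-channel mean and standard deviation are used as a stand-in for the full set of 18 luminosity features:

```python
# Repeat a random 80/20 train/test split ten times with a Logistic
# Regression classifier and aggregate the test accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200
labels = rng.integers(1, 8, size=n)                     # facial sites coded 1..7
# synthetic RGB "images" whose brightness depends on the site label
imgs = rng.normal(loc=labels[:, None, None, None] * 10.0,
                  scale=5.0, size=(n, 16, 16, 3))
X = np.concatenate([imgs.mean(axis=(1, 2)),             # mean per channel
                    imgs.std(axis=(1, 2))], axis=1)     # std per channel

accs = []
for seed in range(10):
    Xtr, Xte, ytr, yte = train_test_split(
        X, labels, test_size=0.2, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    accs.append(clf.score(Xte, yte))
print(round(float(np.mean(accs)), 2))                   # aggregated accuracy
```

Aggregating over repeated splits, rather than relying on a single split, gives a more stable accuracy estimate on a dataset of only a few hundred images.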

Texture-Based Skin Image Classification
Another image classification method based on image texture was also investigated in this study. In this approach, the original EPISCAN images in DICOM (.dcm) format were cropped to patches of dimension 512 × 1024 pixels (Figure 7) and converted, losslessly, to 8-bit gray-scale bitmaps (.bmp). No further pre-processing operations, such as filtering, de-noising, intensity clipping and/or resampling, were applied.
Ten types of textural features (five traditional and five based on convolutional networks; see below) were extracted from each image, this way obtaining as many Mi data matrices, i ∈ {1, . . . , 10}, each of dimension N × Fi, where Fi indicates the number of features generated by the i-th descriptor. We applied principal component analysis (PCA) on each of the Mi and retained the first two principal components to generate scatter plots for exploratory data analysis (Figures 8 and 9).

Traditional ('Hand-Designed') Descriptors
Discrete Cosine Filters (DCF) Mean and standard deviation of the transformed images processed through a bank of 25 two-dimensional separable filters. The filters were generated via pair-wise outer product of five one-dimensional DCF kernels defined on a sequence of 11 points [29] (25 × 2 = 50 features).
Gabor filters Mean and standard deviation of the magnitude of the transformed images processed through a bank of 25 Gabor filters with five frequencies and five orientations [29] (25 × 2 = 50 features).
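The Gabor descriptor can be sketched as below, using scikit-image's `gabor` filter rather than the implementation of [29]; the frequency values are illustrative assumptions:

```python
# Filter an image with a 5-frequency x 5-orientation Gabor bank and
# keep the mean and standard deviation of each response magnitude,
# yielding 25 x 2 = 50 features. Input image is synthetic.
import numpy as np
from skimage.filters import gabor

rng = np.random.default_rng(2)
img = rng.random((64, 64))

features = []
for freq in (0.05, 0.1, 0.2, 0.3, 0.4):                  # assumed frequencies
    for theta in np.linspace(0, np.pi, 5, endpoint=False):
        real, imag = gabor(img, frequency=freq, theta=theta)
        mag = np.hypot(real, imag)                       # response magnitude
        features += [mag.mean(), mag.std()]

print(len(features))    # 50
```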
Laws' masks Mean and standard deviation of the transformed images processed through a bank of 25 two-dimensional separable filters. These were generated via pair-wise outer product on the five one-dimensional Laws' kernels [30] (25 × 2 = 50 features).
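The pair-wise outer-product construction used for the Laws' masks (and analogously for the DCF bank) can be sketched as:

```python
# Build the 25 two-dimensional Laws' masks from the five classic
# one-dimensional 5-point kernels via pair-wise outer products.
import numpy as np

kernels = {                        # the standard 1-D Laws' kernels
    "L5": [1, 4, 6, 4, 1],         # level
    "E5": [-1, -2, 0, 2, 1],       # edge
    "S5": [-1, 0, 2, 0, -1],       # spot
    "W5": [-1, 2, 0, -2, 1],       # wave
    "R5": [1, -4, 6, -4, 1],       # ripple
}
masks = {a + b: np.outer(kernels[a], kernels[b])
         for a in kernels for b in kernels}

print(len(masks), masks["L5E5"].shape)   # 25 (5, 5)
```

Each mask is then convolved with the image, and the mean and standard deviation of each response give the 50 features.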
Zernike polynomials Mean and standard deviation of the transformed images processed through a bank of filters based on even and odd Zernike polynomials of order r ∈ {0, . . . , 6} [29,32] (42 features).

Texture Descriptors Based on Deep Learning
We considered five learned descriptors from the following pre-trained convolutional networks: DenseNet121, MobileNet, ResNet50, VGG16 and Xception. Following the approach described in previous works [33-35], we used the L1-normalised output of one intermediate layer as image features. Specifically, we retrieved the features from the 'dropout' layer of the MobileNet, the 'fc2' layer of the VGG16 and the 'avg_pool' layer of the DenseNet121, ResNet50 and Xception networks. The number of features, respectively, was 1024 for DenseNet121 and MobileNet, 2048 for ResNet50 and Xception and 4096 for VGG16.
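The L1-normalisation step applied to the intermediate-layer activations can be sketched as follows, with a random vector standing in for a real activation (e.g. the 1024-dimensional 'avg_pool' output of DenseNet121):

```python
# L1-normalise an activation vector so feature vectors from networks
# of different scales become comparable.
import numpy as np

def l1_normalise(v):
    """Scale a feature vector so its absolute values sum to 1."""
    s = np.abs(v).sum()
    return v / s if s > 0 else v

activations = np.random.default_rng(3).random(1024)  # stand-in activations
feat = l1_normalise(activations)

print(round(float(np.abs(feat).sum()), 6))           # 1.0
```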
All the networks had been previously trained on the ImageNet dataset and were used off-the-shelf without any further adjustment or fine-tuning. The EPISCAN images were resized to fit the receptive field of each net before processing.
Figures 8 and 9 show the results of texture analysis on the EPISCAN data using Gabor filters as texture descriptors. As can be seen from Figure 8, the measurement points were not separable by the age or gender of the subject, or by the anatomical site where the scan was taken. However, Figure 8 indicates the presence of two clear clusters. Further investigation revealed that the two clusters correspond to results obtained with two different types of ultrasound probes. Figure 9 shows the results grouped by the ultrasound probe used. The other descriptors returned similar results, and the corresponding plots are provided as supplementary material.
These findings suggest that texture analysis on ultrasound scans of the skin is unable to predict the age and gender of the subject or the anatomical part. By contrast, texture analysis revealed a surprising ability to discriminate between the ultrasound probes used.

Discussion
Skin Contact Capacitive Imaging is a promising, novel imaging technique based on the dielectric constant measurement principle, which has been used not only for skin water content measurements, but also for solvent penetration measurements [6], skin texture/microrelief and hair water content measurements [7,8]. With Skin Contact Capacitive Imaging, we have studied the water content of different facial skin sites. Although the water content results available in the literature for different face areas are not always consistent, our results agree well with Voegeli's study [36]. Apart from normal skin measurements, Skin Contact Capacitive Imaging can also be used for measuring diseased skin. Combined with the advances of Artificial Intelligence, which has already been used for skin classification [9] and decision supporting in radiotherapy [37], it is possible to develop a skin capacitive image classification system to identify different types of skin diseases, such as neoplastic lesions [38]. With the 50 µm spatial resolution, it is also possible to detect skin diseases at an early stage.
High-Resolution Ultrasound Imaging is a state-of-the-art ultrasound technology, which allows us to measure the thickness of the different skin layers, such as the stratum corneum, epidermis and dermis. Estimating the thickness of the stratum corneum, the outermost skin layer, is very useful for skin cosmetic studies, and is not possible with any other techniques. The skin histology information can reflect skin conditions, such as damaged skin or diseased skin. Our thickness results agree generally well with literature studies [39,40]. We have also developed two new ultrasound image classification techniques based on image features, rather than the whole images. The first is based on skin ultrasound image luminosity values from the red, green or blue channels, as well as the histogram values. The second is based on skin image texture, where 10 skin image textural features were used. Finally, we evaluated the feasibility of training standard Machine Learning classifiers to identify the different facial sites based on the pre-processed High-Resolution Ultrasound images. We consider our result a moderate success, with plenty of room to improve accuracy through a more systematic measurement setup.

Conclusions
We conducted a detailed skin characterization study by using two state-of-the-art imaging technologies, Contact Capacitive Imaging and High-Resolution Ultrasound Imaging. With Contact Capacitive Imaging, we measured skin water content, and obtained information about skin texture. We also performed capacitive image classifications by using pre-trained Deep Learning neural networks through Transfer Learning.
With High-Resolution Ultrasound Imaging, we studied the thickness of the different skin layers, such as the stratum corneum, epidermis and dermis. We also developed two new ultrasound image classification techniques based on image features with promising results.
Future research will focus on improving the classification algorithms and on applying the two technologies to other types of skin samples, such as dry skin, damaged skin, young skin, aged skin, diseased skin and even samples of different types of skin diseases. The classification algorithms we developed in this study can also be used for discriminating different types of skin samples. The combination of Contact Capacitive Imaging and High-Resolution Ultrasound Imaging, with the aid of new Machine Learning algorithms, could be a powerful research tool for skin studies.
Funding: This research received no external funding.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of London South Bank University (reference "UREC 1412", June 2014).

Informed Consent Statement:
Informed consent was obtained from all subjects.

Data Availability Statement:
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.