Characterizing Subjects Exposed to Humidifier Disinfectants Using Computed-Tomography-Based Latent Traits: A Deep Learning Approach

Around nine million people have been exposed to toxic humidifier disinfectants (HDs) in Korea. HD exposure may lead to HD-associated lung injuries (HDLI). However, many people who have claimed that they experienced HD exposure were not diagnosed with HDLI but still felt discomfort, possibly due to the unknown effects of HD. Therefore, this study examined HD-exposed subjects with normal-appearing lungs, as well as unexposed subjects, in clusters (subgroups) with distinct characteristics, classified by deep-learning-derived computed-tomography (CT)-based tissue pattern latent traits. Among the major clusters, cluster 0 (C0) and cluster 5 (C5) were dominated by HD-exposed and unexposed subjects, respectively. C0 was characterized by features attributable to lung inflammation or fibrosis in contrast with C5. The computational fluid and particle dynamics (CFPD) analysis suggested that the smaller airway sizes observed in the C0 subjects led to greater airway resistance and particle deposition in the airways. Accordingly, women appeared more vulnerable to HD-associated lung abnormalities than men.


Introduction
Humidifier disinfectants (HDs) are used for the prevention of microorganism growth in humidifiers. In Korea, several types of HD were found to have severe adverse effects on human health. Before they were banned, around nine million people had been exposed to toxic HDs [1]. Many epidemiological studies have demonstrated the relationship of HD with lung injuries and fibrosis [2][3][4][5][6][7][8]. In addition, Kim et al. have demonstrated the aerosolized particles of HD-induced lung inflammatory and fibrotic responses through in vivo and in vitro experiments [9].
Machine learning techniques have been widely used for discovering imaging subtypes of lung diseases with heterogeneous characteristics, such as chronic obstructive pulmonary disease (COPD). For example, Binder et al. used a generative model that captures emphysema subtypes correlated with distinct physiological indicators using an unsupervised approach [10]. In addition, Haghighi et al. employed CT-imaging-based structural and functional variables on both the local and global scales to identify clinically relevant clusters among current and former smokers, respectively [11,12]. However, subjective selections of imaging-based variables or features, which are based on known physiology and pathophysiology, may not be inclusive enough to describe all the features of a heterogeneous lung disease. With the advancement of deep learning techniques, deep learning models are able to detect the important imaging features for the task concerned without human intervention. For example, Li et al. proposed a three-dimensional (3D) convolutional autoencoder with a feature constructor (CAE-FC) that is able to identify imaging-based latent traits which can be used to differentiate phenotypes among COPD patients [13].
HD-associated lung injuries (HDLI) can develop due to multiple causes, and the long-term effects of HD exposure remain unclear. A large portion of people who have claimed that they experienced exposure to HD were not diagnosed with HDLI but still claimed discomfort, possibly due to the influences of HD exposure. Yoon et al. reported the increased risk of asthma symptoms in children with a history of acute bronchiolitis after exposure to HD [14]. In this case, not only exposure to HD, but also the pre-existing bronchiolitis, contributed to the asthma symptoms.
In this study, we aimed to identify subgroups with distinct tissue pattern factors or latent traits derived by computed tomography (CT). The identified subgroups were used to differentiate HD-affected subjects from unaffected subjects among the subjects with normal lung CTs and lung functions. To identify the latent traits, we employed an in-house deep learning model, along with exploratory factor analysis (EFA), to evaluate CT images of both HD-exposed and unexposed subjects [13]. We furthermore applied computational fluid and particle dynamics (CFPD) analysis to elucidate the airway resistance and HD aerosol particle deposition that may characterize HD-exposed subjects with normal-appearing lungs. To our knowledge, this is the first study that uses an unsupervised deep learning approach to discover the subtypes of HD-affected patients, potentially beyond those who can be identified through the manual reading of the CT images. The findings of this study may be useful for the better management of the disease in the future.

Human Subject Data and Image Processing
A total of 121 subjects were selected for investigation. We retrospectively collected inspiratory (IN) and expiratory (EX) CT image data acquired at total lung capacity (TLC) and residual volume (RV), as well as PFT results. Among these, 96 subjects claimed exposure to toxic HD but had normal CT and pulmonary function test (PFT) results. The most common HD chemical ingredient was polyhexamethylene guanidine (PHMG). Other major ingredients were oligo (2-)ethoxyethoxyethyl guanidine chloride (PGH), chloromethylisothiazolinone (CMIT), and methylisothiazolinone (MIT) [15]. The exposure type and duration were defined based upon the questionnaire. The other 25 subjects did not claim exposure to toxic HD and had normal-appearing CT and PFT results. The demographic data and PFT measurements for each stratum are shown in Table 1. The study protocol was approved by the institutional review boards (IRB) of Seoul National University Hospital (SNUH) and Jeonbuk National University Hospital (JBNUH) (project title: Quantitative CT features of potential respiratory abnormalities in children and adults with normal-appearing chest CT by humidifier disinfectant inhalation exposure). For the current research, informed consent was waived by IRB at SNUH, because this was a retrospective study based upon existing data and the research involved minimal risks. The research collected only non-identifiable data. All methods were carried out in accordance with the relevant institutional guidelines and regulations.
Patients underwent 128 multidetector CT scanning at full inspiration and full expiration using a Philips Ingenuity scanner (Philips Healthcare, Cleveland, OH, USA) in SNUH and Somatom Definition Flash scanner (Siemens Healthcare, Forchheim, Germany) in JBNUH. The CT parameters were as follows: tube voltage (120 kVp), tube current (SNUH: inspiration, DRI (dose right index) 14, expiration, DRI 14; JBNUH: inspiration, 110 quality Table 1. Demographic data and PFT measurements for HD-exposed and unexposed subjects. Entries are mean (SD) unless otherwise specified. The differences between the means of independent groups were analyzed by Welch's t-test and the significance level was set as 0.05. Chi-square test was used to examine the relationship between two categorical variables. The determinant of the Jacobian matrix (J) was computed by registering IN and EX images [16,17] for each voxel in the IN images so as to measure local volume change. In this study, J is defined as the EX to IN volume ratio; thus, 1/J indicates the local volume expansion. Two-channeled images were constructed by combining the J values with the IN images ( Figure 1a). In total, 98,250 3D regions of interest (ROIs) were randomly extracted from the two-channeled images. The size of the ROI was approximately the size of a secondary pulmonary lobule (20 × 20 × 20 mm 3 ), which is a fundamental unit of the lung structure [18]. We used Apollo software (VIDA Diagnostics, Coralville, Iowa) for the segmentation of the lungs and airways and Insight Toolkit (ITK, version 5.0, Kitware, Clifton Park, NY, USA) and in-house software for the further image processing. The detailed information about the image processing can be found in Li et al. [13].

3D Convolutional Autoencoder (CAE) and Feature Constructor (FC)
Each voxel in the extracted ROIs consisted of two input data channels: rescaled CT density and J. The ROIs were fed into a 3D CAE-FC model to obtain their one-dimensional (1D) representations and to group the ROIs with similar patterns into pattern clusters [13]. The optimal number of pattern clusters was determined by testing different numbers of pattern clusters and selecting the number which yielded the lowest loss during the training of the CAE-FC model ( Figure S1).
The CAE-FC model contained an encoder, which consisted of 3D convolutional layers, an embedding layer, and a decoder, which consisted of 3D de-convolutional layers and a FC (Figure 1). The FC suppressed the small activations in the embeddings and forced the decoder to reconstruct the ROIs using only the important activations. The ROIs were grouped based on the greatest activations among the highest-level feature maps (embeddings). Therefore, the CAE-FC model was able not only to extract the 1D representations of the ROIs but also to group the 1D representations with similar patterns. Please refer to [13] for more detailed information about the CAE-FC model.

3D Convolutional Autoencoder (CAE) and Feature Constructor (FC)
Each voxel in the extracted ROIs consisted of two input data channels: rescaled CT density and J. The ROIs were fed into a 3D CAE-FC model to obtain their one-dimensional (1D) representations and to group the ROIs with similar patterns into pattern clusters [13]. The optimal number of pattern clusters was determined by testing different numbers of pattern clusters and selecting the number which yielded the lowest loss during the training of the CAE-FC model ( Figure S1).
The CAE-FC model contained an encoder, which consisted of 3D convolutional layers, an embedding layer, and a decoder, which consisted of 3D de-convolutional layers After the CAE-FC training, a sliding window technique was applied to classify the lung tissue patterns and quantify the frequency of each pattern cluster within the whole lung. Then, a pattern cluster frequency histogram could be calculated for each subject. The procedure is illustrated in Figure 2. We further used EFA [19][20][21], a data reduction technique, to extract latent traits (factors) from these pattern clusters. That is, highly associated pattern clusters were grouped together into the same factor, and the pattern cluster frequencies of the subjects were then transformed into factor scores. Please refer to the Supplementary Materials for the steps of the EFA. lung tissue patterns and quantify the frequency of each pattern cluster within the whole lung. Then, a pattern cluster frequency histogram could be calculated for each subject. The procedure is illustrated in Figure 2. We further used EFA [19][20][21], a data reduction technique, to extract latent traits (factors) from these pattern clusters. That is, highly associated pattern clusters were grouped together into the same factor, and the pattern cluster frequencies of the subjects were then transformed into factor scores. Please refer to the Supplementary Materials for the steps of the EFA.

Figure 2.
The CAE-FC model is applied to the ROIs extracted by the sliding window technique so as to classify the lung tissue patterns and construct a pattern cluster histogram for each subject. EFA is then used to extract the latent traits (factors) from the pattern cluster histograms of the subjects. Using the factor scores, subject clusters are identified by the k-means clustering technique. Finally, CFPD analysis is conducted to evaluate the particle depositions of the representative subjects. The CAE-FC model is applied to the ROIs extracted by the sliding window technique so as to classify the lung tissue patterns and construct a pattern cluster histogram for each subject. EFA is then used to extract the latent traits (factors) from the pattern cluster histograms of the subjects. Using the factor scores, subject clusters are identified by the k-means clustering technique. Finally, CFPD analysis is conducted to evaluate the particle depositions of the representative subjects.

Factor Interpretation
The averaged intensities and the averaged J of the pattern clusters showing high associations with each factor were compared. In addition, to facilitate the interpretation of the identified factors, we evaluated the correlations of the factors with the clinical and CT imaging-based variables. The included clinical variables were age, gender, height, weight, body mass index (BMI), percent-predicted forced expiratory volume in 1 s (FEV 1 %), and forced vital capacity (FVC). The included imaging-based variables were the low attenuation area (LAA) percentage at RV (LAA RV %) in the total lung and in each lobe, tissue fraction at TLC (Tissue%) in the total lung and in each lobe, LAA percentage at TLC (LAA TLC %) in the total lung and in each lobe, functional small airway disease percentage (fSAD%) in the total lung and in each lobe, airway tree to lung volume ratio (AWV%) [22,23], and CT-measured RV to TLC ratio (RV/TLC) [24].

Identification of the Subject Clusters
K-means clustering was applied to identify the subject clusters within the factor scores. The number of subject clusters was determined by evaluating the inter-cluster and intracluster variability. Next, the correlation between the clusters and exposure was tested. Moreover, the inter-cluster differences in terms of the aforementioned clinical and imagingbased variables were examined. Lastly, a decision tree model, viz., a non-linear feature selector, was trained and developed using both the clinical and imaging-based variables, showing significant differences between clusters. Thus, this model may potentially be used in clinical practice to assist in the identification of HD-associated lung abnormalities in susceptible patients.

Simulations of Airflow in the Airways Using CFPD
To facilitate the interpretation of the subject clusters in terms of the inhaled HD aerosol deposition, a HD-exposed subject and an unexposed subject were selected for CFPD analysis, who were chosen from the clusters with the minimal and the largest proportions of HD-unexposed subjects, respectively. The selection was based on their proximity to the clusters' geometric centers in the feature space of the factor scores. Moreover, the mean hydraulic diameters of the lobar and sub-lobar subsets were calculated for the two clusters in order to facilitate the explanation of the CFPD results. The lobar airways were defined as the left upper lobe (LUL) bronchus, left lower lobe (LLL) bronchus, the right upper lobe (RUL) bronchus, right middle lobe (RML) bronchus, and right lower lobe (RLL) bronchus. In this study, the sub-lobar subset airways were defined as the child and grandchild branches of the lobar branches.
Subject-specific flow rates were applied at the supraglottal inlet. The inhalation flow pattern was approximated by a sinusoidal waveform. The inhaled air volume was set to a tidal volume, which was estimated using 7 mL of air per kilogram of body weight [25]. For all cases, the time to peak inspiration was set to 1.2 s with an inhalation period (T) of 4.8 s. An in-house three-dimensional (3D) CFPD lung model was used to simulate the velocity field and pressure drop in the central airways, which can be used to calculate the airway resistance [26,27]. Subsequently, the trajectories of 0.5 µm spherical particles, which were used to model the HD particles, were computed using a Lagrangian particle-tracking algorithm [27] in order to capture the local hotspots of high deposition density on the walls of the 3D airways. We further applied a 1D deposition model to predict the deposition fractions in the entire subject-specific airway tree [28][29][30]. Particle depositions in the entire conducting airway were estimated for the representative subjects during inspiration.

Statistical Analysis
The differences between the means of the independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post hoc pairwise tests. Pearson's correlation and bi-serial correlation were used to examine the associations of the identified factors with the known continuous variables and binary variables, respectively. The chisquare test was used to examine the relationship between two categorical variables. The significance level was set as 0.05. The statistical analyses were conducted using SciPy 1.4.1 and Pingouin 0.3.4 in the Python 3 packages.

Results
The deep learning and the following statistical analyses were conducted for the 121 subjects. Among them, 96 subjects (79.3%) claimed exposure to HD and 25 subjects (20.7%) did not. A total of 80 pattern clusters were identified by the CAE-FC model, and the quantification of the pattern clusters within the lungs was performed to generate the pattern cluster histograms. Subsequently, six factors (F0-F5) were extracted from the pattern cluster histograms of all subjects. Finally, six clusters (C0-C5) were obtained by k-means clustering in the feature space of the factor scores. The demographic data, PFT measurements, and exposure information for each cluster are shown in Table 2. The exposure frequency for each cluster is shown in Table 3. A significant relationship (p < 0.001) between exposure to HD and the clusters showed that the frequencies of HD-exposed subjects in the clusters were different from one another. C0 had the smallest fraction (3.1%) of HD-unexposed subjects, while C5 had the largest fraction (48.5%) of HD-unexposed subjects. On the other hand, the proportions of HD-unexposed subjects in C1 (13.0%) and C2 (17.9%) were similar to that of the entire dataset, indicating that exposure to HD was not the main feature differentiating C1 and C2 from the entire dataset. C3 and C4 were excluded from the analysis, since the number of members was small. Thus, hereafter, we presented the results for the four main clusters of C0, C1, C2, and C5, with a focus on C0 and C5, to characterize the HD-exposed subjects, regarding C5 as cluster of unexposed subjects with no HD-induced abnormal features. We note that the majority (64.0%) of unexposed subjects were found in C5. Nevertheless, neither the exposure type nor the exposure duration were associated with the cluster membership ( Table 2). As an illustration, Figure 3 shows a comparison of the factors, CT densities, and J values of the representative subjects for the four clusters. Table 2. Demographic data and PFT measurements for each cluster. Entries are mean (SD) unless otherwise specified. The differences between the means of independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post hoc pairwise tests. The chi-square test was used to examine the relationship between two categorical variables.  Table 3. Counts of HD-exposed and unexposed subjects in each cluster.  A significant relationship (p < 0.001) between gender and the clusters was found. C0 was female dominated (59.4%), while C5 was male dominated (84.8%). On the contrary, there was no significant difference between the means of the clusters in terms of FVC% and FEV1%. The pairwise comparisons showed that C0 was significantly smaller than C5 in terms of height (C0: 159.74 ± 7.52 cm; C5: 168.33 ± 7.61 cm) and LAATLC%_Total (C0: 1.33 ± 2.23%; C5: 3.46 ± 3.30%). On the other hand, C0 was significantly larger than C5 in terms of RV/TLC (C0: 0.76 ± 0.33; C5: 0.55 ± 0.06) and Tissue%_Total (C0: 16.98 ± 5.07%; C5: 10.51 ± 1.36%). However, there were no significant differences between C0 and C5 in A significant relationship (p < 0.001) between gender and the clusters was found. C0 was female dominated (59.4%), while C5 was male dominated (84.8%). On the contrary, there was no significant difference between the means of the clusters in terms of FVC% and FEV 1 %. The pairwise comparisons showed that C0 was significantly smaller than C5 in terms of height (C0: 159.74 ± 7.52 cm; C5: 168.33 ± 7.61 cm) and LAA TLC %_Total (C0: 1.33 ± 2.23%; C5: 3.46 ± 3.30%). On the other hand, C0 was significantly larger than C5 in terms of RV/TLC (C0: 0.76 ± 0.33; C5: 0.55 ± 0.06) and Tissue%_Total (C0: 16.98 ± 5.07%; C5: 10.51 ± 1.36%). However, there were no significant differences between C0 and C5 in terms of LAA RV %_Total (C0: 16.43 ± 15.22%; C5: 21.21 ± 10.24%) and fSAD%_Total (C0: 10.52 ± 13.66%; C5: 11.53 ± 7.69%). The mean values and standard deviations of these variables for each cluster are shown in Figure 4. Significant differences were found between C0 and C5 for F1, F2, F3, and F5 ( Figure 5). Among these factors, F2 and F3 were found to be bi-polar factors. A bi-polar factor is a latent trait that has two opposite directions. That is, their positive and negative factor scores were contributed by pattern clusters with positive loadings and negative loadings, respectively. For example, when describing the air content in a lung, both low attenuation and high attenuation areas of the CT image can be used. That is, low and high attenuation areas correspond to more air content and less air content in a lung, respectively. C0 and C5 for F1, F2, F3, and F5 ( Figure 5). Among these factors, F2 and F3 were found to be bi-polar factors. A bi-polar factor is a latent trait that has two opposite directions. That is, their positive and negative factor scores were contributed by pattern clusters with positive loadings and negative loadings, respectively. For example, when describing the air content in a lung, both low attenuation and high attenuation areas of the CT image can be used. That is, low and high attenuation areas correspond to more air content and less air content in a lung, respectively.  The mean CT density and J of the pattern clusters were calculated. The pattern clusters that were highly positively associated with each factor were denoted as f0−f5. The pattern clusters that were highly negatively associated with F2 and F3 were denoted as f2n and f3n, respectively (i.e., pattern clusters associated with the negative portion of the bi-polar factors). We found that f1 and f2 had a relatively low CT density and J (f1: CT density = 0.097, J = 0.504; f2: CT density = 0.125, J = 0.588), f3 had a high density but low J (CT density = 0.156, J = 0.442), and f5 had a low CT density but high J (CT density = 0.088, J = 0.769). On the other hand, f2n had both the highest CT density and J (CT density = 0.267, J = 1.470), and f3n had a relatively high CT density and J (CT density = 0.159, J = 0.690, Figure 6b). Examples of f1, f2(n), f3(n), and f5 are shown in Figure 6a. Mean factor scores which account for the differences between C0 and C5. Significant differences between C0 and C5 were found for F1, F2, F3, and F5. The differences between the means of independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post hoc pairwise tests and the significance level was set as 0.05.

Exposure
The mean CT density and J of the pattern clusters were calculated. The pattern clusters that were highly positively associated with each factor were denoted as f0−f5. The pattern clusters that were highly negatively associated with F2 and F3 were denoted as f2n and f3n, respectively (i.e., pattern clusters associated with the negative portion of the bi-polar factors). We found that f1 and f2 had a relatively low CT density and J (f1: CT density = 0.097, J = 0.504; f2: CT density = 0.125, J = 0.588), f3 had a high density but low J (CT density = 0.156, J = 0.442), and f5 had a low CT density but high J (CT density = 0.088, J = 0.769). On the other hand, f2n had both the highest CT density and J (CT density = 0.267, J = 1.470), and f3n had a relatively high CT density and J (CT density = 0.159, J = 0.690, Figure 6b). Examples of f1, f2(n), f3(n), and f5 are shown in Figure 6a.

Figure 5.
Mean factor scores which account for the differences between C0 and C5. Significant differences between C0 and C5 were found for F1, F2, F3, and F5. The differences between the means of independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post hoc pairwise tests and the significance level was set as 0.05. Correlations between the factors and common clinical and imaging-based variables were also calculated in order to better interpret the results (Figure 7). F1 and F2 were negatively correlated with Tissue%_Total and Tissue% for all the lobes. F3 was negatively correlated with LAARV%_Total and LAARV% for all the lobes, while F5 was positively correlated with LAARV%_Total and LAARV% for all the lobes. F1, F2, and F3 were negatively correlated with RV/TLC. On the contrary, F5 was positively correlated with RV/TLC. Moreover, F1 was positively correlated with LAATLC%_Total, LAATLC%_LUL, and LAATLC%_RML, while F5 was positively correlated with fSAD%_Total and fSAD% for all Correlations between the factors and common clinical and imaging-based variables were also calculated in order to better interpret the results (Figure 7). F1 and F2 were negatively correlated with Tissue%_Total and Tissue% for all the lobes. F3 was negatively correlated with LAA RV %_Total and LAA RV % for all the lobes, while F5 was positively correlated with LAA RV %_Total and LAA RV % for all the lobes. F1, F2, and F3 were negatively correlated with RV/TLC. On the contrary, F5 was positively correlated with RV/TLC. Moreover, F1 was positively correlated with LAA TLC %_Total, LAA TLC %_LUL, and LAA TLC %_RML, while F5 was positively correlated with fSAD%_Total and fSAD% for all the lobes. However, F1, F2, F3, and F5 were not strongly correlated with the clinical variables of BMI, weight, height, age, FEV 1 %, and FVC%. The lobar and segmental airways of C0 subjects were found to have smaller hydraulic diameters than those of C5 subjects, although no significant difference was found between the lobar airways in RML and RLL or the segmental airways in RUL and RML (Figure 8a). Similarly, the hydraulic diameters of both lobar and segmental airways were found to be smaller in the C0 representative subject (Figure 8b). The CFPD analysis showed that the C0 representative subject had a greater pressure drop in both the lobar and segmental airways of all lobes except for RML (Figure 9a). Hotspots were observed in the segmental The lobar and segmental airways of C0 subjects were found to have smaller hydraulic diameters than those of C5 subjects, although no significant difference was found between the lobar airways in RML and RLL or the segmental airways in RUL and RML (Figure 8a).
Similarly, the hydraulic diameters of both lobar and segmental airways were found to be smaller in the C0 representative subject (Figure 8b). The CFPD analysis showed that the C0 representative subject had a greater pressure drop in both the lobar and segmental airways of all lobes except for RML (Figure 9a). Hotspots were observed in the segmental airways of RUL and RLL in the C0 representative subject (Figure 9b). Moreover, a larger amount of particle depositions in the conducting airway were found in the C0 representative subject compared to the C5 representative subject (Figure 10). These results suggest that the C0 subjects experienced airway narrowing, which induced increased airway resistance and greater particle deposition.
Int. J. Environ. Res. Public Health 2022, 19, 11894 13 of 20 Figure 8. (a) Averaged hydraulic diameters of the lobar airways and segmental airways of C0 and C5. A "*" denotes a significant difference between C0 and C5. The differences between the means of independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post hoc pairwise tests and the significance level was set as 0.05. (b) Hydraulic diameters of the lobar airways and sub-lobar subset airways of the C0 and C5 representative subjects. A lobar airway and a sub-lobar subset airway in a lobe are denoted as "l" followed by the abbreviation of the given lobe (e.g., lRUL) and "s" followed by the abbreviation of the given lobe (e.g., sRUL), respectively. A "*" denotes a significant difference between C0 and C5. The differences between the means of independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post hoc pairwise tests and the significance level was set as 0.05. (b) Hydraulic diameters of the lobar airways and sub-lobar subset airways of the C0 and C5 representative subjects. A lobar airway and a sub-lobar subset airway in a lobe are denoted as "l" followed by the abbreviation of the given lobe (e.g., lRUL) and "s" followed by the abbreviation of the given lobe (e.g., sRUL), respectively.  A decision tree was trained using the selected inspiratory CT imaging metrics and clinical variables for the classification of the subjects into either C0, C5, or other clusters ( Figure 11). An accuracy of 0.79 was achieved based on a three-fold cross-validation analysis. The total lung deposition fraction in the conducting airways of the C0 and C5 representative subjects during inspiration. The C0 representative subject demonstrated a greater particle deposition in the conducting airways.
A decision tree was trained using the selected inspiratory CT imaging metrics and clinical variables for the classification of the subjects into either C0, C5, or other clusters ( Figure 11). An accuracy of 0.79 was achieved based on a three-fold cross-validation analysis. Figure 11. A decision tree model which takes the imaging-based variables and the clinical variables as inputs and predicts the cluster membership.  The total lung deposition fraction in the conducting airways of the C0 and C5 representative subjects during inspiration. The C0 representative subject demonstrated a greater particle deposition in the conducting airways.
A decision tree was trained using the selected inspiratory CT imaging metrics and clinical variables for the classification of the subjects into either C0, C5, or other clusters ( Figure 11). An accuracy of 0.79 was achieved based on a three-fold cross-validation analysis. Figure 11. A decision tree model which takes the imaging-based variables and the clinical variables as inputs and predicts the cluster membership.

Discussion
In this study, we extracted six factors from lung tissue patterns. Subsequently, the factors were used to identify four major clusters of C0, C1, C2, and C5 from a total of 121 either HD-exposed or unexposed subjects with normal-appearing lungs. Table 4 summarizes the characteristics of each cluster. The association between the clusters and HD exposure indicated that the clusters differentiate HD-exposed from unexposed subjects.
Among the clusters, C0 had the largest fraction (96.9%) of HD-exposed subjects, while C5 had the smallest fraction (51.5%) of HD-exposed subjects ( Table 3). The majority of unexposed subjects (16 of 25, 64.0%) were in C5, which suggests that C5 does not reflect HD-specific abnormalities. Thus, HD-exposed subjects in C5 may be regarded as HDunaffected subjects, although they claimed exposure to toxic HD. In contrast, subjects in C0, as well as C1 and C2, may reflect HD-affected abnormality and could be considered as candidates for a follow-up study on progression or recovery. F1, F2, F3, and F5 are the major features that differentiate C0 and C5. Table 4. Pair-wise contrasts for C0, C1, and C2, with C5 as a reference group.
C0 was dominated by HD-exposed female subjects. It was characterized by negative scores of F2 and F3, suggesting high CT attenuation, high J (low expansion), increased Tissue%, and increased RV/TLC. Moreover, f2n appears to be associated with regional hypoinflation and subpleural opacities (Figure 6), while f3n may be associated with air trapping or low attenuation on expiratory CT. The effects of f2n and f3n may lead to opposite results in regard to lung attenuation. In C0, LAA RV % and fSAD% were smaller than those of C5, which implies that the influence of f2n was stronger than that of f3n in C0. Increased Tissue% may reflect an increased high attenuation area in an inspiratory image, which is attributable to inflammation or fibrosis. These findings were obtained by the presence of mosaic attenuation and air trapping. In addition, an increase in RV/TLC was due to decreased TLC, that is, hypo-inflation, implying a fibrotic lung [31]. Mosaic attenuation and air-trapping observed from CT images, as well as decreased TLC, are features similar to those of fibrotic hypersensitivity pneumonitis [31,32]. It has been suggested that HD can damage epithelial cells and induce inflammation, leading to excessive tissue repair and lung fibrosis [3,9]. C5 was dominated by unexposed male subjects. Regarding the identified factors, this cluster was characterized by positive scores of F2 and F5, demonstrating relatively greater LAA RV %, LAA TLC %, and fSAD% than the HD-exposed-subject-dominated clusters. While C5 was dominated by subjects from the control group, it is noteworthy that C5 subjects were greater in height. For the HD-exposed subjects in C5, we could assume they were unaffected by HDs or in the resolving stage, by which point CT features have diminished into faint lesions or disappeared [33].
C1 and C2 were also predominated by HD-exposed subjects (Table 3). While the characteristics of C1 and C2 are still unclear and further investigation is needed, we provided a brief discussion. C1 and C2 were associated with greater F3 and F1, respectively. F3 and F1 both showed greater lung deformation (smaller J, Figure 6b). This might be contributed by hyper-deflation of the lung in RV ( Table 4). The TLC volume is not significantly different from C5. More information for C1 and C2 can be found in Table S1.
From the CFPD analysis, we found an increase in the pressure drop in both the lobar and segmental airways of all lobes except RML in the C0 subject. In addition, particle deposition hotspots were observed in the segmental airways in RUL and RLL of the C0 representative subject. Similarly, more particle depositions were found in the complete conducting airways of the C0 representative subject. Furthermore, the increase in the pressure drop might be attributed to a decrease in the hydraulic diameters of the lobar and segmental airways. Therefore, we speculated that females are more susceptible to HD-associated lung abnormality because of their smaller airway size, which may lead to greater airway resistance and more HD particle accumulation in small airways.
However, there may exist other environmental or occupational factors that can also contribute to the inclination of females to be more vulnerable to HD-associated lung abnormality. For example, housewives tend to spend longer periods of time at home, resulting in more exposure to HD. It has been reported that females and children were at greater risk with a longer duration of exposure [2,34]. This study has several limitations. First, this is a retrospective study. The duration and dose of exposure to HDs were not well controlled. The data from the questionnaire related to HD exposure were collected from the subject years after the HD exposure. Therefore, there is strong possibility of exposure misclassification due to erroneous statements based on recall bias and other technical reasons [33]. Moreover, there are many types of HD. Each type may have different effects on the patients. Thus, these factors may affect the interpretation of the findings. The decision tree model ( Figure 11) used the lung deformation (RV/TLC) and tissue content in the lung (Tissue%) to differentiate the clusters. This coincides with the previous result, showing that C0 subjects had less lung deformation and a greater tissue content than the other clusters, implying more fibrotic or inflammatory lungs in the C0 subjects. Nevertheless, the derivation of RV/TLC and Tissue% required the image processing of CT images. The decision tree model could serve as a diagnostic aid for clinical doctors.

Conclusions
Among those subjects with normal-appearing lung CT and normal PFT results, HDaffected (C0) and HD-unaffected (C5) subject clusters were identified by CT-based twochannel deep learning latent traits. C0 was characterized by increased Tissue% and hypoinflation, as compared to C5, suggesting that our model can distinguish HD-associated lung abnormalities from among a normal-appearing HD-exposed population, and that C0 subjects may be at risk of HD-induced inflammation or lung fibrosis. The CFPD analysis suggested that HD-affected subjects tend to have a higher airway resistance and higher HD aerosol particle deposition density, which may increase their susceptibility to HD-induced lung damage. Finally, we developed a decision tree model to assist in the identification of HD-associated lung abnormalities in susceptible subjects. The findings of this study may be useful for the future studies aiming to follow up HD-exposed normal-appearing subjects.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/ijerph191911894/s1, Figure S1. Training losses for CAE-FC models trained using different numbers of embeddings (pattern-clusters). Table S1. Post-hoc comparisons of the clinical and imaging-based variables between the clusters. The comparison pairs which were significantly different are shaded with gray. The differences between the means of independent groups were analyzed by Welch's ANOVA with the Games-Howell method for post-hoc pairwise tests. Reference [35] is cited in the Supplementary Materials.  Institutional Review Board Statement: The study protocol was approved by the institutional review boards (IRB) of Seoul National University Hospital (IRB: 1810-036-977) and Jeonbuk National University Hospital (project title: Quantitative CT features of potential respiratory abnormalities in children and adults with normal-appearing chest CT by humidifier disinfectant inhalation exposure, IRB: 2020-06-037).

Informed Consent Statement:
For the current research, informed consent was waived by IRB at SNUH, because this was a retrospective study based upon existing data and the research involved minimal risks. The research collected only non-identifiable data. All methods were carried out in accordance with the relevant institutional guidelines and regulations.