Serum Fatty Acid Composition Balance by Fuzzy C-Means Method in Individuals with or without Metabolic Dysfunction-Associated Fatty Liver Disease

Circulating fatty acid composition is assumed to play an important role in metabolic dysfunction-associated fatty liver disease (MAFLD) pathogenesis. This study aimed to investigate the association between the overall balance of serum fatty acid composition and MAFLD prevalence. This cross-sectional study involved 400 Japanese individuals recruited from a health-screening program. We measured fatty acids in serum lipids using gas chromatography–mass spectrometry. The serum fatty acid composition balance was evaluated from the serum fatty acid mass% using fuzzy c-means clustering, which assigns each data point to multiple clusters and calculates its percentage of belonging to each cluster. The participants were classified into four characteristic subclasses (i.e., Clusters 1, 2, 3, and 4), and one specific serum fatty acid composition balance (i.e., Cluster 4) was associated with a higher MAFLD prevalence. We suggest that the fuzzy c-means method can be used to determine the circulating fatty acid composition balance and highlight the importance of focusing on this balance when examining the relationship between MAFLD and serum fatty acids.


Introduction
Metabolic dysfunction-associated fatty liver disease (MAFLD) is a newly proposed definition of fatty liver disease (FLD) [1,2] that is diagnosed in individuals with fatty liver who meet one of the three criteria: (1) overweight/obesity; (2) type 2 diabetes; and (3) at least two metabolic risk abnormalities among increased waist circumference (WC), elevated blood pressure, increased triglycerides, decreased high-density lipoprotein cholesterol (HDL-C), prediabetes, insulin resistance, and increased high-sensitivity C-reactive protein level [1,2]. MAFLD is a hepatic manifestation of multiple metabolic diseases influencing hepatic lipid accumulation, inflammation, and fibrosis, and its underlying causes, symptoms, course, and outcomes are heterogeneous [1,2]. Compared to the commonly used definition of nonalcoholic fatty liver disease (NAFLD), the new definition has several

Genotyping
Genomic DNA was extracted from whole blood samples using a DNA purification kit (FlexiGene DNA kit; QIAGEN, Hilden, Germany). The patatin-like phospholipase domain-containing 3 gene (PNPLA3) rs738409 C > G (encoding c.444C > G, I148M) polymorphism has been recognized as a major genetic risk factor for the development and progression of NAFLD [21,22]. Therefore, to adjust for the effects of the PNPLA3 rs738409 polymorphism in the multivariable analysis of MAFLD prevalence, the polymorphism was genotyped by a real-time TaqMan allelic discrimination assay (Applied Biosystems, Waltham, MA, USA) (Assay No. C_7241_10). For genotyping, pooled DNA from healthy volunteers with known genotypes and a negative control (water) were included as internal controls to ensure genotyping quality.

Diagnosis of MAFLD
Hepatic ultrasonography scanning was used for diagnosing FLD based on four criteria: a diffuse hyperechoic echotexture, an increased echotexture compared to the kidneys, vascular blurring, and deep attenuation [23]. After a radiologist diagnosed FLD, a physician reviewed the images to assess the accuracy and reproducibility of the diagnosis. In accordance with previous reports [1,2], MAFLD was diagnosed when FLD detected by hepatic ultrasonography scanning coexisted with one or more of the following conditions: (1) body mass index (BMI) ≥ 23 kg/m²; (2) presence of type 2 diabetes; (3) BMI < 23 kg/m² along with the presence of at least two of the following metabolic risk abnormalities: WC ≥ 90 cm in men and ≥ 80 cm in women; blood pressure ≥ 130/85 mmHg or use of antihypertensive medicines; triglycerides (TGs) ≥ 150 mg/dL or use of dyslipidemia medications; HDL-C < 40 mg/dL in men and < 50 mg/dL in women; fasting blood glucose (FBG) = 100-125 mg/dL or HbA1c = 5.7-6.4%; high-sensitivity C-reactive protein > 2.0 mg/L. The fibrosis (FIB)-4 index, an indicator of fibrosis in FLD subjects, was calculated from age, platelet count, aspartate aminotransferase (AST), and alanine aminotransferase (ALT) using the following formula: [age (years) × AST (IU/L)] / [platelet count (10⁹/L) × √ALT (IU/L)] [24].
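The FIB-4 formula above can be sketched in a few lines of Python. This is only an illustration: the function name, argument names, and the example participant are hypothetical and are not taken from the study data.

```python
import math


def fib4_index(age_years: float, ast_iu_l: float, alt_iu_l: float,
               platelets_1e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    if min(age_years, ast_iu_l, alt_iu_l, platelets_1e9_per_l) <= 0:
        raise ValueError("all inputs must be positive")
    return (age_years * ast_iu_l) / (platelets_1e9_per_l * math.sqrt(alt_iu_l))


# Hypothetical participant (illustrative values only, not study data):
# 50 years old, AST 30 IU/L, ALT 36 IU/L, platelets 250 x 10^9/L.
example = fib4_index(age_years=50, ast_iu_l=30, alt_iu_l=36,
                     platelets_1e9_per_l=250)
print(round(example, 2))  # 1.0, since (50 * 30) / (250 * 6) = 1
```

Because ALT enters under a square root, doubling ALT lowers the index by a factor of about 1.41 rather than 2.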

Data Collection
The laboratory tests were performed using the standard methods of the Japan Society of Clinical Chemistry. Type 2 diabetes was diagnosed on the basis of the history of the patient and the criteria recommended by the American Diabetes Association Expert Committee. Information on dietary habits was collected by means of a questionnaire.

Statistical Analysis
The data are expressed as the mean ± standard deviation or median (range) for continuous variables and as proportion for categorical variables. Fisher's exact test or the Fisher-Freeman-Halton test was used for comparing categorical variables. Student's t-test or one-way ANOVA was used for comparing continuous parametric values, and the Mann-Whitney U test or Kruskal-Wallis test was used for comparing continuous nonparametric values. The Mantel-Haenszel test for trend or Jonckheere-Terpstra trend test was used for trend testing of categorical or continuous variables, respectively.
The participants were clustered on the basis of the standardized mass% of the 10 serum fatty acids to represent the serum fatty acid composition balance. For clustering, we used the fuzzy c-means method, which assigns each data point to multiple clusters and calculates its percentage of belonging to each cluster [16,17]. The number of clusters was determined to be four using the elbow method [25]. The m value (i.e., the fuzzy weighting exponent) was set to 2.0 [16,17]. In the clustering method, the data points of the participants were plotted on the basis of the standardized mass% of the 10 serum fatty acids, and the centroids of the four clusters were determined randomly at the beginning. Next, the centroids were recalculated from the Euclidean distances between the data points. Subsequently, the percentages of belonging to all clusters were calculated for each data point on the basis of its Euclidean distances from the cluster centroids. These calculations (i.e., of the cluster centroids and of the percentages of belonging at each data point) were repeated until no further changes were observed in the results. Each study participant was then classified into the cluster with the highest percentage of belonging among the four clusters. We used principal component analysis to reduce the dimensions of the serum fatty acid composition balance and constructed a two-dimensional graph using the first two principal components to visualize the proximity of the four clusters.
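The iteration described above can be sketched with NumPy as follows. This is a minimal textbook fuzzy c-means, not the scikit-fuzzy routine used in the study; the function name, the convergence tolerance, the choice of random data points as initial centroids, and the simulated input matrix are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=4, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Return (centroids, memberships); memberships is (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    # Random start (assumption): use distinct data points as initial centroids.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    memberships = np.zeros((len(X), n_clusters))
    for _ in range(max_iter):
        # Euclidean distance from every data point to every centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: proportional to d^(-2/(m-1)), each row sums to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        new_memberships = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: mean of the data weighted by memberships^m.
        w = new_memberships ** m
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
        converged = np.abs(new_memberships - memberships).max() < tol
        memberships = new_memberships
        if converged:
            break
    return centroids, memberships


# Simulated stand-in for the standardized mass% of 10 fatty acids
# in 200 hypothetical participants (not study data).
rng = np.random.default_rng(42)
blob_centers = rng.normal(scale=5.0, size=(4, 10))
raw = np.vstack([c + rng.normal(size=(50, 10)) for c in blob_centers])
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)

centroids, memberships = fuzzy_c_means(X, n_clusters=4, m=2.0)
hard_labels = memberships.argmax(axis=1)  # cluster with the highest membership
print(np.round(100 * memberships[:3], 1))  # membership percentages, first 3 rows
```

With m = 2.0 the membership of a point in a cluster is inversely proportional to its squared distance from that centroid, which is why points lying between two centroids receive comparable percentages for both.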
The association between MAFLD prevalence and the four clusters was analyzed by multivariable logistic regression analysis. This association was measured as odds ratios (ORs) and 95% confidence intervals (95% CIs) for the prevalence of MAFLD in Clusters 2, 3, and 4 compared to Cluster 1; ORs were adjusted for age, sex, BMI, total concentration of fatty acids, and PNPLA3 rs738409 polymorphism using the forced entry method. When developing the multivariable logistic regression model for MAFLD, the probability (Pr) of prevalence of MAFLD was expressed as the following inverse logit function:
$$\mathrm{Pr} = \frac{1}{1 + \exp\left(-\mathrm{Logit}(\mathrm{Pr})\right)}$$
Logit (Pr) values describing the linear relationship between prevalence of MAFLD and the covariates were calculated using the following equation:
$$\mathrm{Logit}(\mathrm{Pr}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
where $\beta_0$ represents the intercept; $\beta_n$ represents the standardized coefficients; $x_n$ represents the covariates.
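A short Python sketch may help connect the logit, the predicted probability, and the odds ratio. It assumes the usual logistic model in which the logit is an intercept plus a weighted sum of covariates and the odds ratio of a covariate is the exponential of its coefficient; the Wald-type 95% confidence interval and all function names are assumptions for illustration, not a description of the SPSS output used in the study.

```python
import math


def linear_predictor(intercept, coefficients, covariates):
    """Logit(Pr) = intercept + sum of coefficient * covariate."""
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))


def inverse_logit(logit):
    """Pr = 1 / (1 + exp(-Logit(Pr)))."""
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """OR = exp(beta) with a Wald-type interval (assumed, z = 1.96 for 95%)."""
    return (math.exp(beta),
            math.exp(beta - z * standard_error),
            math.exp(beta + z * standard_error))


# Hypothetical coefficients (not study estimates).
logit = linear_predictor(intercept=-1.0, coefficients=[0.5, 1.0],
                         covariates=[2.0, 0.0])
print(inverse_logit(logit))  # logit is 0.0, so the probability is 0.5
print(odds_ratio_with_ci(beta=math.log(2.0), standard_error=0.1))
```

A coefficient of ln 2 therefore corresponds to a doubling of the odds, and a coefficient of 0 to an OR of 1.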
The ORs of covariates were calculated using the following equation:
$$\mathrm{OR} = \exp(\beta_n)$$
A p value < 0.05 was considered statistically significant. Multiple comparisons were corrected using Bonferroni's method, and p values < 0.05/n were considered statistically significant after correcting for the number of comparisons made. The fuzzy c-means method was performed using the scikit-fuzzy library (version 0.4.2, Python Software Foundation, Wilmington, NC, USA) of Python (version 3.9.12, Python Software Foundation, Wilmington, NC, USA). The SPSS software package (version 28.0; IBM Japan Inc., Tokyo, Japan) was used for all other statistical analyses.

Results
Table 1 shows the clinical characteristics of participants with and without MAFLD. The participants with and without MAFLD had different values for age, BMI, WC, HbA1c, FBG, HDL-C, low-density lipoprotein cholesterol (LDL-C), TGs, AST, ALT, gamma-glutamyl transferase (GGT), and smoking status (Table 1). PNPLA3 C/C, C/G, and G/G genotype frequencies of the participants were 27.3%, 55.0%, and 17.8%, respectively. The PNPLA3 genotype frequencies were in the Hardy-Weinberg equilibrium (p > 0.05), and they differed between participants with and without MAFLD (Table 1). Similar to previous studies [21,22], bi-variable logistic regression analysis revealed that the frequency of MAFLD was higher in participants with the PNPLA3 C/G and G/G genotypes than in those with the C/C genotype, with ORs (95% CI) of 2.04 (1.21-3.44) and 2.19 (1.14-4.20), respectively.

Clustering Based on a Standardized Concentration of 10 Serum Fatty Acids
The mass% of each serum fatty acid in the participants was calculated by dividing the amount of each serum fatty acid (µg/mL) by the total serum fatty acid content (µg/mL). The fuzzy c-means method was used for dividing the 400 participants into four characteristic subclasses based on the standardized mass% of the 10 serum fatty acids. Table 2 shows the differences in the mass% of the 10 serum fatty acids and the total concentration of serum fatty acids among the four clusters. Figure 1 shows the radar charts of the mass% of the 10 serum fatty acids at the centroids of the four clusters. At the centroid of Cluster 1, high mass% of linoleic acid (C18:2 omega-6) and low mass% of myristic acid (C14:0), palmitic acid (C16:0), palmitoleic acid (C16:1 omega-7), and oleic acid (C18:1 omega-9) were observed (Figure 1). At the centroid of Cluster 2, the mass% of all fatty acids was generally average (Figure 1). At the centroid of Cluster 3, high mass% of palmitic acid (C16:0), palmitoleic acid (C16:1 omega-7), eicosapentaenoic acid (C20:5 omega-3), and docosahexaenoic acid (C22:6 omega-3), and low mass% of linoleic acid (C18:2 omega-6) were observed (Figure 1). At the centroid of Cluster 4, high mass% of myristic acid (C14:0), palmitic acid (C16:0), palmitoleic acid (C16:1 omega-7), and oleic acid (C18:1 omega-9), and low mass% of stearic acid (C18:0), linoleic acid (C18:2 omega-6), eicosapentaenoic acid (C20:5 omega-3), and docosahexaenoic acid (C22:6 omega-3) were observed (Figure 1). To investigate the proximity of clusters, the percentages of other clusters within each cluster were compared (Figure 2). In Cluster 1, the percentages were higher in the order of Cluster 2 > Cluster 3 > Cluster 4 (Figure 2). In Cluster 2, the percentages of Clusters 1 and 3 were higher than that of Cluster 4 (Figure 2). In Cluster 3, the percentages of Clusters 2 and 4 were higher than that of Cluster 1 (Figure 2). In Cluster 4, the percentages were higher in the order of Cluster 3 > Cluster 2 > Cluster 1 (Figure 2). Figure 3 shows the fatty acid composition balance expressed by the principal component analysis in terms of the first two principal components. Table S1 shows the principal component scores of the fatty acids.
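The preprocessing and the two-dimensional projection can be sketched as follows. This is an illustration under stated assumptions: the total fatty acid content is taken to be the sum of the measured fatty acids, standardization is a per-column z-score, the principal components are obtained by singular value decomposition, and the concentration matrix is simulated rather than taken from the study.

```python
import numpy as np


def mass_percent(concentrations):
    """Convert concentrations (ug/mL) to mass%; each row sums to 100.

    Assumption: the total is the sum of the measured fatty acids.
    """
    concentrations = np.asarray(concentrations, dtype=float)
    return 100.0 * concentrations / concentrations.sum(axis=1, keepdims=True)


def standardize(values):
    """Column-wise z-score (mean 0, standard deviation 1)."""
    return (values - values.mean(axis=0)) / values.std(axis=0)


def first_two_principal_components(Z):
    """Return (scores on PC1 and PC2, fraction of variance explained by each)."""
    centered = Z - Z.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:2].T
    variance = singular_values ** 2
    return scores, variance[:2] / variance.sum()


# Simulated concentrations for 50 hypothetical participants x 10 fatty acids.
rng = np.random.default_rng(1)
conc = rng.uniform(10.0, 500.0, size=(50, 10))

pct = mass_percent(conc)
Z = standardize(pct)
scores, explained = first_two_principal_components(Z)
print(scores.shape, np.round(explained, 3))
```

Because the mass% values of each participant sum to 100, the standardized columns are linearly dependent, so at most nine principal components carry variance for 10 fatty acids.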

Table 3 shows the differences in the clinical characteristics among the four clusters. BMI, WC, HDL-C, TGs, AST, ALT, GGT, alcohol intake, and MAFLD frequency differed among the clusters (Table 3). Moreover, the trend test revealed that the clusters in decreasing order of MAFLD prevalence are as follows: Cluster 4 > Cluster 3 > Cluster 2 > Cluster 1 (Table 3). Table S2 shows the results comparing the information on dietary habits among the clusters. There were differences in the frequency of eating fruits and sweets among the clusters (Table S2). Table S3 shows the differences in liver function test values and FIB-4 index among the four clusters in subjects with MAFLD. GGT and FIB-4 index were different between the four clusters (Table S3). Moreover, trend test results showed that the clusters in decreasing order of AST and GGT levels were as follows: Cluster 4 > Cluster 3 > Cluster 2 > Cluster 1 (Table S3).

Multivariable Analysis of the Association between the MAFLD Prevalence and Clusters
The prevalence of MAFLD differed among the four clusters (Table 3). Therefore, we used multivariable logistic regression analysis to examine the association of the MAFLD prevalence with each cluster (Table 4). The prevalence of MAFLD was higher in Cluster 4 than in Cluster 1, independent of age, sex, BMI, total concentration of fatty acids, and PNPLA3 rs738409 polymorphism (Table 4). In contrast, the prevalence of MAFLD in Clusters 2 and 3 did not differ from that in Cluster 1 (Table 4).
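The structure of such an analysis, with Cluster 1 as the reference category, can be sketched in NumPy. This is not the SPSS procedure used in the study: the Newton-Raphson fit, the Wald-type intervals, the single standardized covariate, and the simulated data with an arbitrarily chosen effect for the fourth cluster are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic model; X must contain an intercept column.

    Returns (coefficients, standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    # Standard errors from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    weights = p * (1.0 - p)
    covariance = np.linalg.inv(X.T @ (X * weights[:, None]))
    return beta, np.sqrt(np.diag(covariance))


# Simulated data (not study data): labels 0..3 stand for Clusters 1..4.
rng = np.random.default_rng(0)
n = 2000
cluster = rng.integers(0, 4, size=n)
age_z = rng.normal(size=n)  # one standardized covariate as a stand-in

# Dummy coding: columns for Clusters 2, 3, 4; Cluster 1 is the reference.
dummies = (cluster[:, None] == np.arange(1, 4)).astype(float)
X = np.column_stack([np.ones(n), dummies, age_z])

true_beta = np.array([-1.0, 0.0, 0.0, 1.5, 0.5])  # arbitrary simulation values
prob = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.random(n) < prob).astype(float)

beta, se = fit_logistic(X, y)
odds_ratios = np.exp(beta[1:4])
ci_low = np.exp(beta[1:4] - 1.96 * se[1:4])
ci_high = np.exp(beta[1:4] + 1.96 * se[1:4])
for k in range(3):
    print(f"Cluster {k + 2} vs Cluster 1: OR {odds_ratios[k]:.2f} "
          f"({ci_low[k]:.2f}-{ci_high[k]:.2f})")
```

Each reported OR compares one cluster with the reference cluster while the other covariates are held fixed, which is what "adjusted" means in this setting.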

Discussion
This study presented findings on the serum fatty acid composition balance obtained by clustering with the fuzzy c-means method. The participants were classified into four clusters on the basis of the serum fatty acid composition balance, and the proximity of these clusters was investigated. Moreover, we showed that a specific serum fatty acid composition balance (i.e., Cluster 4) was associated with the MAFLD prevalence, independent of several confounding factors (i.e., age, sex, BMI, total serum fatty acid concentration, and PNPLA3 rs738409 polymorphism). These findings suggest a useful method for describing the circulating fatty acid composition balance and its importance in MAFLD.
The fuzzy c-means method was used in this study for classifying the serum fatty acid composition balance into four characteristic patterns. We examined the relationship between these patterns and MAFLD prevalence. The fuzzy c-means method incorporates fuzziness into clustering and calculates the percentages of belonging to multiple clusters at individual data points [16,17]. On the basis of these percentages, the method provides information on the proximity of each data point to the other clusters in addition to the cluster to which it belongs. Comparison of the percentages of belonging to the other clusters within each cluster suggests that the serum fatty acid composition balances were close between Clusters 1 and 2, Clusters 2 and 3, and Clusters 3 and 4 (Figure 2). Additionally, the principal component analysis findings suggest the proximity of Clusters 1 and 2, Clusters 2 and 3, and Clusters 3 and 4 (Figure 3). We speculate that the serum fatty acid composition balance frequently shifts between Clusters 1 and 2, Clusters 2 and 3, and Clusters 3 and 4; however, further longitudinal studies are needed to confirm this hypothesis.
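One way to summarize this proximity information is to average the membership percentages of the participants assigned to each cluster. The sketch below assumes that summary; the exact computation behind Figure 2 is not specified in the text, and the function name and the small membership matrix are hypothetical.

```python
import numpy as np


def mean_membership_by_cluster(memberships):
    """Row k: mean membership (%) in every cluster among points assigned to k.

    Rows for clusters with no assigned points are filled with NaN.
    """
    memberships = np.asarray(memberships, dtype=float)
    assigned = memberships.argmax(axis=1)  # cluster with the highest membership
    n_clusters = memberships.shape[1]
    table = np.full((n_clusters, n_clusters), np.nan)
    for k in range(n_clusters):
        rows = memberships[assigned == k]
        if len(rows):
            table[k] = 100.0 * rows.mean(axis=0)
    return table


# Hypothetical memberships for four participants (each row sums to 1).
U = np.array([
    [0.60, 0.30, 0.08, 0.02],
    [0.70, 0.20, 0.07, 0.03],
    [0.20, 0.50, 0.25, 0.05],
    [0.05, 0.15, 0.30, 0.50],
])
table = mean_membership_by_cluster(U)
print(np.round(table, 1))
```

In this toy matrix the two participants assigned to the first cluster have, on average, a larger membership in the second cluster than in the third or fourth, which is the kind of ordering read off to judge which clusters are neighbours.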
Information on the proximity to each cluster at individual data points obtained using the fuzzy c-means method may also be useful in characterizing the serum fatty acid composition balance of the participants individually. Further studies are needed to clarify whether lifestyle modification can change the serum fatty acid composition balance pattern and whether such changes are effective for MAFLD prevention, treatment, or both. However, the findings of fuzzy c-means clustering may be useful in the future for proposing individualized prevention strategies for MAFLD, focusing on the serum fatty acid composition balance.
Cluster 4 was strongly associated with MAFLD prevalence, regardless of age, sex, BMI, total serum fatty acid concentration, and PNPLA3 rs738409 polymorphism (Table 4). This finding emphasizes the significance of evaluating serum fatty acid composition in terms of both individual serum fatty acid concentrations and their balance. The serum fatty acid composition balance of Cluster 4 was characterized by a high mass% of SFAs (myristic acid (C14:0) and palmitic acid (C16:0)) and MUFAs (palmitoleic acid (C16:1 omega-7) and oleic acid (C18:1 omega-9)) and a low mass% of omega-3 PUFAs (eicosapentaenoic acid (C20:5 omega-3) and docosahexaenoic acid (C22:6 omega-3)), stearic acid (C18:0), and linoleic acid (C18:2 omega-6) (Table 2 and Figure 1). Previous human studies have found that patients with NAFLD have higher levels of SFAs and MUFAs in serum or plasma than those who do not have NAFLD [8,14,15]. SFA-enriched diets have been reported to increase hepatic triglycerides [26,27], which has been linked to increased lipolysis in adipose tissue and fatty acid transfer to the liver [26]. In contrast, MUFA-enriched diets have been reported to reduce intrahepatic triglyceride levels and improve hepatic and total insulin sensitivity [28-30]. In addition to SFA-rich diets, lipolysis, de novo synthesis, or both contribute to increased SFA levels [7]. However, an increase in SFAs has been reported to increase SCD1 activity, which avoids SFA-induced hepatotoxicity and increases MUFAs [12]. Indeed, while the mass% of stearic acid (C18:0) was very low in Cluster 4, the mass% of oleic acid (C18:1 omega-9) was very high (Table 2 and Figure 1), suggesting that most of the stearic acid (C18:0) may have been converted to oleic acid (C18:1 omega-9) due to increased SCD1 activity. Moreover, a low mass% of eicosapentaenoic acid (C20:5 omega-3), docosahexaenoic acid (C22:6 omega-3), and linoleic acid (C18:2 omega-6) was also observed in Cluster 4 (Table 2 and Figure 1).
Eicosapentaenoic acid (C20:5 omega-3) and docosahexaenoic acid (C22:6 omega-3), representative omega-3 PUFAs, have been observed to reduce hepatic lipidosis, improve markers of liver damage, and increase insulin sensitivity [31,32]. They appear to exert these beneficial effects on the liver by downregulating pathways related to adipogenesis, inflammation, and fibrogenesis. They are readily incorporated into phospholipid species to maintain cell membrane fluidity and permeability [12]. Thus, the low mass% of omega-3 PUFAs in Cluster 4 may reflect susceptibility to hepatic lipotoxicity and low insulin sensitivity, which may explain its association with MAFLD prevalence. Linoleic acid (C18:2 omega-6) is the most abundant PUFA in the omega-6 series, and it is desaturated by FADS2 in the first step of the conversion process to arachidonic acid (C20:4 omega-6) [33]. FADS2 is a key enzyme in synthesizing arachidonic acid (C20:4 omega-6) from linoleic acid (C18:2 omega-6), and the activity of FADS2 in the plasma has been found to be higher in patients with NAFLD than in healthy participants [34]. In addition, plasma FADS2 activity was found to be positively correlated to BMI, insulin, and visceral fat mass, all of which are closely related to the development and progression of NAFLD [34]. Thus, high BMI and insulin resistance may have increased FADS2 activity in Cluster 4, reducing the mass% of linoleic acid (C18:2 omega-6). The information above suggests that the serum fatty acid composition of Cluster 4, which is strongly associated with MAFLD prevalence, includes a combination of high SFAs, low PUFAs, and increased SCD1 and FADS2 activities; thus, serum fatty acid composition balance may be important in determining MAFLD risk.
The trend test results showed that MAFLD prevalence was in the following order: Cluster 4 > Cluster 3 > Cluster 2 > Cluster 1 (Table 3). In contrast to the association between Cluster 4 and MAFLD prevalence, Clusters 2 and 3 were not associated with MAFLD prevalence in the multivariable analysis (Table 4). The mass% of oleic acid (C18:1 omega-9) in Cluster 2 was higher than that in Cluster 1 (Table 2 and Figure 1). Oleic acid (C18:1 omega-9) is synthesized by desaturation of stearic acid (C18:0) and is also supplied through diet and lipolysis. As mentioned above, an increase in SFAs increases SCD1 activity to avoid SFA-induced hepatotoxicity, leading to increased MUFAs [12]. Thus, to avoid hepatotoxicity of stearic acid (C18:0), serum oleic acid (C18:1 omega-9) may have increased in Cluster 2. However, the increase was less than that in Cluster 4, and it may not have been large enough to be associated with the prevalence of MAFLD. Compared to Cluster 3, Cluster 1 showed differences in the mass% of SFAs and omega-3 PUFAs (Table 2 and Figure 1). SFAs are hepatotoxic, whereas omega-3 PUFAs are hepatoprotective [8,10,11]; therefore, the beneficial effects of omega-3 PUFAs may have compensated for the detrimental effects of SFAs, which may explain why Cluster 3 was not associated with MAFLD prevalence.
In subjects with MAFLD, GGT levels and the FIB-4 index differed among the four clusters, and the clusters could be arranged in terms of decreasing AST and GGT levels as follows: Cluster 4 > Cluster 3 > Cluster 2 > Cluster 1 (Table S3). Therefore, serum fatty acid composition balance may be associated with the severity of MAFLD and, furthermore, may change as MAFLD progresses or recovers. However, the liver function test values and FIB-4 index of the subjects with MAFLD were relatively low (Table 1), indicating that the severity was relatively low and that most of them had simple steatosis. Therefore, further longitudinal studies in MAFLD subjects with fibrosis are needed to clarify the relationship between serum fatty acid composition balance and MAFLD severity and disease course.
In this study, differences in alcohol intake and smoking habits were observed among the clusters (Table 3). Furthermore, the frequencies of eating fruits and sweets were different among the clusters (Table S2). These results may be useful for guiding reductions in alcohol consumption, smoking cessation, and dietary modification aimed at changing the pattern of serum fatty acid composition balance. However, we were unable to examine the effects of changes in drinking, smoking, and dietary habits on the serum fatty acid composition balance. Moreover, detailed information on dietary habits (e.g., exact intake of carbohydrates, protein, and fat) could not be obtained. Further investigation is needed through longitudinal analysis incorporating detailed information.
The present study had some limitations. It had a retrospective cross-sectional design, and it did not examine the changes in serum fatty acid composition balance. Therefore, further studies are needed to verify the findings of this study by adopting a prospective design in a larger population. Although liver biopsy, the gold standard for fatty liver diagnosis, is more sensitive than other methods, it is invasive and therefore could not be performed in this study of health-screening participants. Instead, we used hepatic ultrasonography scanning to diagnose FLD; this method has a sensitivity of 64% and a specificity of 97% in detecting fatty liver [23]. Therefore, the current study could identify the presence of FLD in the study subjects. However, further studies using liver biopsy are needed to validate the results of this study.

Conclusions
We determined the serum fatty acid composition balance using the fuzzy c-means method and showed its association with MAFLD prevalence. These results suggest the importance of serum fatty acid composition balance in MAFLD and may contribute to elucidating the pathogenesis of MAFLD. Furthermore, evaluation of serum fatty acid composition balance by the fuzzy c-means method provides more detailed information (e.g., proximity between fatty acid compositional balances) than simple clustering methods (e.g., the k-means method). This method may be useful for lifestyle improvement and for precision and personalized medicine development for the prevention and treatment of MAFLD in the future.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/nu15040809/s1, Table S1: Principal component scores of serum fatty acids. Table S2: Differences in dietary habits among the four clusters. Table S3

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The datasets generated and/or analyzed during the current study are not publicly available due to individual privacy but are available from the corresponding authors on reasonable request.