Artificial Intelligence in Chromatin Analysis: A Random Forest Model Enhanced by Fractal and Wavelet Features

Pantic, Igor; Paunovic Pantic, Jovana

doi:10.3390/fractalfract8080490

by Igor Pantic ^1,2,3,* and Jovana Paunovic Pantic ^4,*

¹ Laboratory for Cellular Physiology, Department of Medical Physiology, Faculty of Medicine, University of Belgrade, Višegradska 26/2, 11129 Belgrade, Serbia

² University of Haifa, 199 Abba Hushi Blvd, Mount Carmel, Haifa 3498838, Israel

³ Faculty of Health Sciences, Ben-Gurion University of the Negev, Be’er Sheva 8410501, Israel

⁴ Department of Pathological Physiology, Faculty of Medicine, University of Belgrade, Dr Subotica 9, 11129 Belgrade, Serbia

* Authors to whom correspondence should be addressed.

Fractal Fract. 2024, 8(8), 490; https://doi.org/10.3390/fractalfract8080490

Submission received: 13 July 2024 / Revised: 13 August 2024 / Accepted: 19 August 2024 / Published: 21 August 2024

(This article belongs to the Special Issue Fractals in Biophysics and Their Applications)

Abstract

In this study, we propose an innovative concept that applies an AI-based approach using the random forest algorithm integrated with fractal and discrete wavelet transform features of nuclear chromatin. This strategy could be employed to identify subtle structural changes in cells that are in the early stages of programmed cell death. The code for the random forest model is developed using the Scikit-learn library in Python and includes hyperparameter tuning and cross-validation to optimize performance. The suggested input data for the model are chromatin fractal dimension, fractal lacunarity, and three wavelet coefficient energies obtained through high-pass and low-pass filtering. Additionally, the code contains several methods to assess the performance metrics of the model. This model holds potential as a starting point for designing simple yet advanced AI biosensors capable of detecting apoptotic cells that are not discernible through conventional microscopy techniques.

Keywords:

artificial intelligence; apoptosis; chromatin; fractals; wavelets

1. Introduction

The dynamic organization of chromatin is regulated by numerous factors and can be altered under a variety of pathological conditions [1,2,3]. Traditional methods for analyzing chromatin organization often rely on complex biochemical techniques that may be prone to methodological errors and inaccurate assessments of intricate changes in nuclear structure. In conventional light and electron microscopy, capturing high-resolution structural changes is often challenging, even for the most experienced histologists or pathologists. Therefore, there is a need for a more sophisticated and advanced method to quantify changes in chromatin structural organization in both fundamental and clinical research [4,5,6].

With the introduction of machine learning (ML) methods in cell biology, an opportunity has arisen to automate and enhance the identification of complex cellular structures and processes [7,8,9,10]. These methods are usually implemented in the Python programming language because of its versatility, ease of use, and extensive library support. In particular, the Scikit-learn Python library is one of the most popular tools for creating supervised machine learning models in the biomedical sciences, since it offers a wide range of options for hyperparameter testing, cross-validation, feature importance calculations, and the integration of the model with other computational approaches.

Both the Python programming language and the Scikit-learn library have previously been used on numerous occasions to develop AI-based computational concepts for the classification and prediction of physiological phenomena. For example, artificial neural networks based on different deep learning architectures implemented in Python (e.g., Xception, VGG16, ResNet50, and Gao) have been created to differentiate between normal and abnormal cell divisions [11]. Convolutional network models that use imaging data from altered chromatin patterns and other phenomena were suggested as a way to predict cell transformations in malignancies [12]. The Scikit-learn library was previously applied to create a classification and regression tree (CART), a gradient boosting classifier, and a random forest model for detecting subtle alterations in a cell nucleus induced by a toxic chemical agent [13]. Also, the library was recently used to evaluate the utility of machine learning in a binary classification of soft tissue tumors based on cellular data [14].

Random forests are ensemble learning methods that build a multitude of decision trees and, when used for classification, select the output class by aggregating the predictions of the individual trees, usually through majority voting [15,16]. In certain cases, random forests may have advantages over artificial neural networks, which are often more popular in bioinformatics. These include better interpretability, greater resilience to overfitting, robustness to data noise, and lower processing power needed for training. Also, random forests can often perform better on the smaller datasets that are frequent in biomedical experiments.

Many different mathematical and biophysical quantifiers of structure can serve as potential inputs for random forests [17,18]. These include discrete wavelet transform (DWT) features, such as wavelet coefficient energies that can indirectly be used as measures of heterogeneity in chromatin micrographs at different scales and resolutions [19]. Fractal features, such as the values of fractal dimension and lacunarity, can also serve as input data for RF training as demonstrated earlier [13]. Additionally, it seems that fractal dimension might be a relatively sensitive parameter for the detection of discrete changes in two-dimensional signals originating from digital imaging in medicine. A combination of fractal and wavelet parameters might be a way to increase the capabilities of the future random forest models created for the detection of very subtle alterations in chromatin distribution, which are often present in cells experiencing DNA damage.

The aim of our study was to create a Python Scikit-learn-based code for an advanced random forest ML model intended for the identification of subtle structural changes in cells that are in the early stages of programmed cell death. We also aimed to propose a theoretical concept of a model that uses nuclear chromatin fractal dimension, fractal lacunarity, and three coefficient energies of discrete wavelet transform analyses as inputs. If successfully trained and tested in the future, this approach would hold potential as a starting point for designing simple yet advanced AI biosensors capable of detecting apoptotic cells that are not discernible through conventional microscopy techniques.

2. Materials and Methods

2.1. Fractal Analysis

Fractal analysis is based on the concept of self-similarity, a property of many biological systems that exhibit structural or functional similarities when analyzed at different scales [20,21]. Fractal dimension and fractal lacunarity are the two main quantifiers of fractal analysis. Both parameters are usually determined by applying the box-counting method to a two-dimensional signal, preferably after binarization. In this method, the structure is overlaid with a series of squares (boxes) of varying sizes, and the squares that are at least partially filled with the structure are counted. The fractal dimension is determined from the slope of the regression line on a logarithmic graph, where data points are plotted to show the relationship between the number of boxes (N) and the scale. This can be presented as follows:

D = log(N)/log(r)

where r is the magnification factor. Box counting can be performed using freely available software such as FracLac (version 2.0) for ImageJ (National Institutes of Health, Bethesda, MD, USA) [22,23,24]. On the other hand, the rotational and translational invariance of the binarized chromatin structure can be quantified using the lacunarity feature, which takes into account the scale (ε), the position of the grid (g), and the standard deviation (σ) and mean (μ) of the number of pixels per box:

\lambda = CV_{\varepsilon,g}^{2} = \left( \frac{\sigma_{\varepsilon,g}}{\mu_{\varepsilon,g}} \right)^{2}
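The box-counting estimate of D and the lacunarity formula above can be illustrated with a minimal NumPy sketch. It is not the FracLac implementation: the function names, the set of box sizes, and the use of a single fixed grid position g are assumptions made here for illustration, and the input is assumed to be an already binarized image.

```python
import numpy as np


def box_counts(binary, size):
    """Return the number of foreground pixels inside each size x size box."""
    h, w = binary.shape
    # Crop so that the image is tiled by whole boxes only.
    h_crop, w_crop = h - h % size, w - w % size
    tiles = binary[:h_crop, :w_crop].reshape(
        h_crop // size, size, w_crop // size, size)
    return tiles.sum(axis=(1, 3))


def box_counting_dimension(binary, sizes=(1, 2, 4, 8, 16)):
    """Estimate D as the slope of log N against log r, with r = 1 / box size."""
    n_boxes = [np.count_nonzero(box_counts(binary, s)) for s in sizes]
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(n_boxes), 1)
    return slope


def lacunarity(binary, size):
    """Squared coefficient of variation of the pixel count per box."""
    mass = box_counts(binary, size).astype(float)
    return (mass.std() / mass.mean()) ** 2


# Sanity check on a completely filled square, which behaves as a plain 2D object.
filled = np.ones((64, 64), dtype=np.uint8)
print(box_counting_dimension(filled))  # approximately 2
print(lacunarity(filled, 8))           # 0.0, every box holds the same mass
```

A filled square is a convenient check because its dimension and lacunarity are known in advance. Real chromatin masks would need box sizes matched to the nuclear region of interest, and the regression fails if any scale yields zero occupied boxes.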

An alternative to the fractal analysis of binarized chromatin structures is to determine the fractal dimension directly on gray-scale micrographs. In programs such as FracLac, this can be achieved with the differential volume variation method, where three-dimensional volumes (V_i,j,ε) are defined over a two-dimensional space [13].

V_{i,j,\varepsilon} \sim I_{i,j,\varepsilon} \cdot \varepsilon^{2}

In this case, we can calculate the line slope from the following formula:

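S = \lim_{\varepsilon \to 0} \frac{\ln V_{\varepsilon}}{\ln (1/\varepsilon)}

V_{\varepsilon} = \sum_{i,j} I_{i,j,\varepsilon} \, \varepsilon^{2}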

This type of fractal dimension obtained from the gray-scale (Ddif) data is then determined as follows:

D_dif = 3 − (S/2)

Fractal dimension and lacunarity, when applied to nuclear chromatin in digital micrographs, can capture subtle changes in chromatin organization and distribution, which often occur when cells are exposed to toxic environmental agents. For instance, in apoptotic cells, the chromatin structure may become more irregular and disorganized. The early stages of programmed cell death, characterized by chromatin condensation and marginalization, may lead to distinct changes in chromatin patterns. These changes may not be readily visible during traditional histopathological examinations by a pathology expert but can be detected through shifts in fractal parameters. Our previous research suggests that cell damage typically results in a reduction in fractal dimension values, as condensed chromatin tends to be less complex. Conversely, lacunarity may increase due to the structural degradation of the nucleus.

2.2. Wavelet Analysis

In a digital micrograph, the wavelet analysis of cell nuclei can be performed using the discrete wavelet transform (DWT) method [19,25]. This method may involve applying linear transformations to data vectors, thereby converting them into numerically distinct vectors of the same length. Depending on the scale, the process includes separating the data into frequency components, along with sampling and filtering. Contemporary software platforms for the DWT analysis of microscopic and radiological data include the “MaZda (version 4.6)” program developed by the Institute of Electronics, Technical University of Lodz, Poland [26,27,28,29], as a part of two COST European projects entitled “COST B21—Physiological modelling of MR Image formation” and “COST B11 AQ6—Quantitative Analysis of Magnetic Resonance Image Texture” (Figure 1).

The software applies a combination of high-pass and low-pass filters to the rows and columns of data extracted from the regions of interest within digital micrographs (Figure 2). The filtering cascade depends on the sub-band locations (x and y) and the number of resolution units (n). Finally, for each combination of filters, the wavelet coefficient energies are calculated as follows:

E_{n} = \frac{\sum_{x, y \in ROI} \left( d_{x, y}^{subband} \right)^{2}}{n}

As with fractal dimension and lacunarity, the DWT parameters can also be used as inputs for training supervised machine learning models, including random forests.
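To make the energy formula concrete, the following is a minimal sketch of a single-level two-dimensional Haar transform written with NumPy. It is not the MaZda implementation. The choice of the Haar wavelet, the sub-band naming (first letter for the filter applied along rows, second letter for the filter applied along columns), and taking n as the number of coefficients in the sub-band are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(image):
    """One level of a 2D Haar transform; returns the LL, LH, HL and HH sub-bands."""
    x = np.asarray(image, dtype=float)
    # Crop to even dimensions so that pixels can be paired.
    x = x[: x.shape[0] - x.shape[0] % 2, : x.shape[1] - x.shape[1] % 2]
    # Filter along rows: combine pairs of neighbouring columns.
    low = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    high = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Filter along columns: combine pairs of neighbouring rows.
    return {
        "LL": (low[0::2, :] + low[1::2, :]) / np.sqrt(2),
        "LH": (low[0::2, :] - low[1::2, :]) / np.sqrt(2),
        "HL": (high[0::2, :] + high[1::2, :]) / np.sqrt(2),
        "HH": (high[0::2, :] - high[1::2, :]) / np.sqrt(2),
    }


def subband_energy(coeffs):
    """Sum of squared coefficients divided by their number n."""
    return float(np.sum(coeffs ** 2) / coeffs.size)


# Vertical stripes vary only along rows, so only the HL detail band is non-zero.
stripes = np.tile([0.0, 1.0], (8, 4))
energies = {name: subband_energy(band) for name, band in haar_dwt2(stripes).items()}
print(energies)
```

In practice, the three detail-band energies (LH, HL, HH) computed over a nuclear region of interest would play the role of the three wavelet inputs described in the text, although values from this sketch are not expected to match those reported by a specific software platform.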

The discrete wavelet transform shares certain similarities and differences with the Fourier transform (FT). Both methods are used to analyze the frequency content in two-dimensional signals, and as linear operations, they both adhere to the principles of superposition. However, unlike the Fourier transform, which provides a global frequency representation, the DWT offers a time–frequency representation, making it more suitable for analyzing non-stationary components. Additionally, while the FT uses sinusoidal basis functions that extend infinitely in time, the DWT employs finite and localized wavelet basis functions. Finally, the DWT supports the multi-resolution signal analysis on various scales through hierarchical decomposition.

2.3. Random Forest Model

The random forest algorithm is an ensemble supervised machine learning strategy based on generating a large number of decision trees (a “forest”) that are randomly constructed to include a sample of the training data [16,30]. It is frequently used for classification tasks, and in biological experiments, the output (target) data may be the assignment of results to experimental or control groups. Although a single decision tree can be susceptible to overfitting, random forests may mitigate this issue by incorporating randomness and diversity into the model training process. The model’s prediction becomes the class that received the majority of the decision tree ‘votes.’ The versatility and robustness of this approach make it suitable for analyzing complex interactions and nonlinear relationships, which are often present in biological phenomena. Previous research has shown that it is possible to train a random forest model with features extracted from the regions of interest in cell nuclei using discrete wavelet transform [31].
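The voting step can be sketched in a few lines of plain Python. The example below uses three hypothetical trees and made-up probabilities. It also shows soft voting (averaging class probabilities), which is how Scikit-learn's RandomForestClassifier combines its trees, and which can occasionally disagree with a hard majority vote.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Hard voting: return the class predicted by most trees for one sample."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_probabilities(tree_probabilities):
    """Soft voting: average per-class probabilities and pick the largest mean."""
    classes = tree_probabilities[0].keys()
    n_trees = len(tree_probabilities)
    mean = {c: sum(p[c] for p in tree_probabilities) / n_trees for c in classes}
    return max(mean, key=mean.get), mean


# Hypothetical outputs of three trees for a single nucleus.
probabilities = [
    {"damaged": 0.55, "intact": 0.45},
    {"damaged": 0.55, "intact": 0.45},
    {"damaged": 0.10, "intact": 0.90},
]
hard_labels = [max(p, key=p.get) for p in probabilities]

hard_result = majority_vote(hard_labels)
soft_result, mean_probabilities = average_probabilities(probabilities)
print(hard_result)  # damaged (two of three trees)
print(soft_result)  # intact (the third tree is much more confident)
```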

Today, one of the most popular Python libraries used to develop random forest and other machine learning classifiers is Scikit-learn (sklearn) [15,32,33,34]. It has the ability to perform model evaluation, data preprocessing, feature selection, and many other functions that are valuable for the classification of biological data. The library can also use grid search to optimize random forest hyperparameters and increase classification accuracy and discriminatory power. In this study, we opted for this library, although we acknowledge the existence and potential of other approaches. Alternative Python libraries that can be used for the same purpose include XGBoost, H2O, and LightGBM. XGBoost [35] is mainly used for gradient boosting trees but can also produce complex random forest models on large datasets. H2O [36] can run in both Python and R languages, while LightGBM [37] can be especially useful when computational resources are limited. However, it should be stressed that all of these libraries should be able to produce a valid and testable RF model based on fractal and wavelet chromatin indicators without significant limitations.

3. Results

Using the Scikit-learn library, the Google Colaboratory environment, and the Jupyter Notebook service, we created the code for random forest model training and testing. The data are loaded from an Excel database, facilitating the use of the model by scientists with limited expertise in Pandas dataframe creation:

import pandas as pd
df = pd.read_excel("path_to_excel_file.xlsx")

From the dataframe, columns of input data are selected and assigned to a new variable X (Figure 3):

X = df[["Fractal Dimension", "Lacunarity", "WavHL Energy", "WavLH Energy", "WavHH Energy"]]

If we assume that the target column in the Excel database is named “Class”, its data are assigned to the variable y:

y = df["Class"]

To preprocess the data, we can use a preprocessing module to remove the mean and scale to unit variance:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

We then use the train_test_split utility function to split the dataset into training and testing parts:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

In this example, 30% of the data is used for testing and 70% for training. This can be modified depending on the dataset size and other factors. The random state is a seed value that enables reproducibility, although it is not a necessary item in the code.
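The snippets in the text fit the scaler on the full dataset before the split, which lets the test samples influence the scaling statistics. One possible alternative, sketched below under stated assumptions, is to split first and wrap the scaler and the classifier in a Scikit-learn Pipeline so that scaling is learned from the training part only. The synthetic data from make_classification merely stand in for the five chromatin features, and the stratified split is an added assumption rather than part of the original code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five chromatin features (hypothetical data).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Split first, keeping the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# The scaler is fitted inside the pipeline, on the training data only.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=55)),
])
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

Tree-based models are insensitive to monotonic rescaling of individual features, so the scaling step is optional for a random forest. The pipeline mainly pays off if the classifier is later swapped for a scale-sensitive estimator, or if the whole pipeline is passed to a cross-validated search.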

Following the data split, we define the random forest classifier:

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=55)

For hyperparameter optimization we can use the ‘param_grid’ dictionary:

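from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],
    # "auto" was removed in scikit-learn 1.3; for classifiers it was equivalent to "sqrt"
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}
# Five-fold cross-validated grid search, scored by accuracy
cv_rf = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring="accuracy")
cv_rf.fit(X_train, y_train)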

Here, we can tune the number of trees in the forest (‘n_estimators’), the maximum depth of each tree (‘max_depth’), the number of features to consider when searching for the best split (‘max_features’), the minimum number of samples required to split an internal node (‘min_samples_split’), the minimum number of samples required at a leaf node (‘min_samples_leaf’), and the use of bootstrap samples (‘bootstrap’). The risk of overfitting can be reduced by decreasing the tree depth (making the trees shallower) and decreasing the number of features considered at each split (‘max_features’). Using higher minimum samples at a leaf node and enabling bootstrapping can also lower the overfitting risk. The number of estimators mainly affects the stability of the predictions and the computational cost; adding more trees does not by itself make a random forest overfit.
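An exhaustive grid search trains one model per hyperparameter combination and per cross-validation fold, so the cost grows multiplicatively with every added option. The plain-Python sketch below counts the fits for an illustrative grid with the same keys as in the text (here with two ‘max_features’ options). The helper name count_grid_search_fits is hypothetical.

```python
from math import prod


def count_grid_search_fits(param_grid, cv):
    """Return (number of hyperparameter combinations, number of cross-validation fits)."""
    combinations = prod(len(values) for values in param_grid.values())
    return combinations, combinations * cv


# Illustrative grid mirroring the hyperparameters discussed in the text.
example_grid = {
    "n_estimators": [50, 100, 200],
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

combinations, fits = count_grid_search_fits(example_grid, cv=5)
print(combinations, fits)  # 432 2160
```

With the default refit behaviour, GridSearchCV performs one additional fit of the best combination on the whole training set. If this budget is too large for a given dataset, a randomized search over the same ranges is a common alternative.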

Five-fold cross-validation is performed. Scoring and selecting the best model are conducted as follows:

best_rf = cv_rf.best_estimator_

The trained model is used for predictions:

y_pred = best_rf.predict(X_test)

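# predict_proba columns follow best_rf.classes_, so the "damaged" column is selected explicitly
y_prob = best_rf.predict_proba(X_test)[:, list(best_rf.classes_).index("damaged")]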

The model evaluation is performed by determining and displaying the confusion matrix, accuracy score, classification report, F1 score, and Matthews correlation coefficient (Figure 4):

from sklearn.metrics import (accuracy_score, auc, classification_report, confusion_matrix,
                             f1_score, matthews_corrcoef, roc_curve)

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred, pos_label="damaged"))
print("Matthews Correlation Coefficient:", matthews_corrcoef(y_test, y_pred))

Finally, the Receiver Operating Characteristic analysis is performed as follows:

fpr, tpr, thresholds = roc_curve(y_test, y_prob, pos_label="damaged")
roc_auc = auc(fpr, tpr)

This enables the user to properly evaluate the performance of the model and also to test its discriminatory power in separating damaged from intact cells. The simplified model architecture with only three sample decision trees is presented in Figure 5.
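For readers less familiar with these metrics, the sketch below shows how accuracy, F1, and the Matthews correlation coefficient follow from the four cells of a binary confusion matrix. The counts are invented for illustration and are not results of the proposed model; “damaged” is assumed to be the positive class.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, which makes it informative for imbalanced classes.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts: 40 damaged nuclei detected, 5 missed,
# 10 intact nuclei flagged by mistake, 45 intact nuclei correctly passed.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

The sketch does not guard against empty rows or columns of the confusion matrix, where some denominators become zero. The Scikit-learn functions used in the text handle such cases with their own conventions.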

4. Discussion

In this study, we discuss the concept of a hypothetical supervised machine learning model for the identification of apoptotic cells based on the fractal and wavelet features of nuclear chromatin texture. The model code, based on the random forest classifier algorithm, was developed using the Python Scikit-learn library and includes hyperparameter tuning with grid search as well as 5-fold cross-validation. There is a possibility that the code can serve as a useful tool for the future development of AI-related biosensors that can be applied for identifying cell damage and death in cell cultures.

Today, supervised machine learning model development is quite common in digital pathology, and numerous computational strategies have been created to help with the identification of pathological structures in digital micrographs [38,39,40]. However, our concept is somewhat unique since it integrates machine learning with two contemporary computational signal analysis methods, i.e., fractal and wavelet analyses. Fractal analysis, as mentioned before, is capable of quantifying discrete changes in structural complexity and has been successfully applied in chromatin research in the past [41,42]. Nuclear fractal dimension may decrease during cell aging and programmed cell death. Euchromatin and heterochromatin may, in some circumstances, have different values of fractal dimension, and the euchromatin/heterochromatin ratio can influence both the fractal dimension and lacunarity of the nucleus. Chromatin condensation and marginalization, often present in damaged and apoptotic cells, can be associated with reduced complexity, possibly as a result of increased textural homogeneity. Finally, it is possible that, even minor changes in chromatin distribution during sublethal damage, not visible using conventional means, can in some cases result in significant reductions in fractal dimension and increases in lacunarity [43].

As with fractal analysis, discrete wavelet transform features may also be sensitive enough to detect minor structural alterations in euchromatin and heterochromatin. Wavelet coefficient energies may be closely related to other textural features, such as those obtained through gray-level co-occurrence matrix (GLCM) analysis and the run-length matrix. From our previous experience, a rise in textural heterogeneity due to structural degradation and deterioration is often reflected in the values of not only GLCM but also wavelet features [19]. Different combinations of filters in discrete wavelet transform may provide valuable data on the intricate mechanisms behind changes in chromatin patterns.

In the future, one may consider enhancing the model with additional input parameters using other texture analysis techniques. Prime candidates are the features of the gray-level co-occurrence matrix analysis, which are sometimes calculated in conjunction with discrete wavelet transform energies. GLCM can quantify textural uniformity and local homogeneity by determining the values of angular second moment and inverse difference moment, respectively [44]. Perhaps, even more important is the calculation of textural entropy, which depends on the levels of structural disorder and chaos. Other GLCM features include textural variance, contrast, and correlation, which may reflect certain mathematical aspects of structural heterogeneity. Another approach for textural analysis would be the utilization of the run-length matrix (the gray-level run-length matrix) [45,46], which depends on the length and direction of contiguous runs of pixels with the same intensity value. Some of the features that may be considered as input parameters include the values of chromatin long-run emphasis, short-run emphasis, gray-level non-uniformity, run-length non-uniformity, and run percentage. Contemporary platforms such as MaZda can, with relative ease, compute simultaneously both gray-level co-occurrence matrix and run-length matrix features on a single chromatin region of interest. The same platforms can also perform discrete wavelet transforms, which increase the validity and reproducibility of the obtained results and the subsequent machine learning model.
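As a rough illustration of the GLCM features mentioned above, the following NumPy sketch builds a normalized, symmetric co-occurrence matrix for a single pixel offset and derives the angular second moment, the inverse difference moment, and the entropy from it. It is not the MaZda implementation: the one-pixel horizontal offset, the number of gray levels, the base-2 logarithm, and the function names are assumptions, and the image is assumed to be already quantized to integer levels.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for one non-negative offset."""
    img = np.asarray(image)
    h, w = img.shape
    first = img[0:h - dy, 0:w - dx].ravel()
    second = img[dy:h, dx:w].ravel()
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (first, second), 1)  # accumulate pair counts
    counts += counts.T                      # count each pair in both directions
    return counts / counts.sum()


def glcm_features(p):
    """Angular second moment, inverse difference moment and entropy of a GLCM."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "asm": float(np.sum(p ** 2)),
        "idm": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }


# A uniform patch is maximally homogeneous; a checkerboard is not.
uniform = np.zeros((4, 4), dtype=int)
checker = np.indices((4, 4)).sum(axis=0) % 2
print(glcm_features(glcm(uniform, levels=2)))
print(glcm_features(glcm(checker, levels=2)))
```

Features of this kind could be appended as extra columns to the input table of the random forest, provided that they are computed with the same settings for every nucleus.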

Despite the obvious advantages of the random forest model in biological research, this approach has numerous limitations that potentially hamper its wider use in practice. Supervised machine learning models, in general, heavily rely on the training sample quality, and random forests are no exception [47]. The input data need to be representative, or, in this case, the values of fractal dimension, lacunarity, and wavelet coefficient energies must accurately represent the cell populations that the model intends to evaluate. Unfortunately, both fractal and wavelet parameters may be highly variable in certain experimental conditions, and their values may heavily depend on the technical settings during image acquisition and cell staining protocols. Also, there is significant variability across different software platforms used for fractal and textural analyses, and the results obtained from one computer program may not be the same as the results produced by others. This may introduce bias in the model and render it inapplicable when used outside the scope of the training data.

Another limitation that needs to be addressed is the narrow focus of the proposed random forest model on identifying cells undergoing early programmed cell death. This specific focus reduces its applicability to other fields of cell biology or scientific research. Although the model could theoretically be adapted to study other cellular processes, such as differentiation, cancer progression, or tissue regeneration, there is no guarantee that fractal dimension or discrete wavelet transform indicators would maintain the same sensitivity or accuracy under these new conditions. The same applies to the analysis of specific cell types that have either a multi-lobed nucleus (granulocytes) or chromatin traps. Even in analyzing non-apoptotic cells exposed to lower toxin doses, the model might show significantly reduced performance in terms of discriminatory power. However, this lack of generalizability is a broader challenge in machine learning, particularly in biological and medical fields where data variability is often high.

Finally, an important limitation refers to the general flaws and pitfalls of supervised machine learning related to the lack of interpretability. While a complex model like a random forest may achieve high performance metrics, such as classification accuracy, understanding and explaining how the model arrives at its predictions can be challenging. In other words, the inner workings of the model are difficult, if not impossible, to describe and understand. This is not due to a methodological error during the model creation but is mainly related to the sheer complexity of the machine learning algorithm and the training process. In the scientific literature, this absence of interpretability is often described as a ‘black box’, meaning that known input data result in correct outputs but the precise mechanism of how that is achieved remains unknown. While individual decision trees are straightforward, the collective decision making in a random forest is far more complex and less transparent. Therefore, in terms of the degree of reduction in interpretability, random forests are more similar to some types of artificial neural networks, such as multilayer perceptrons or convolutional neural networks.

5. Conclusions

In this study, we present an advanced and innovative concept along with a Python Scikit-learn-based code for a machine learning model that applies the random forest algorithm to identify cells undergoing the early stages of programmed cell death. The proposed model operates on textural data from nuclear regions of interest in the digital micrographs of cells exposed to detrimental environmental conditions. The input data involve a combination of parameters from discrete wavelet transform and fractal analysis. If adequately trained and tested in the future, the model could serve as a foundation for advanced AI apoptosis-sensing systems, facilitating current protocols and procedures in digital microscopy, pathology, and cell biology.

Author Contributions

Conceptualization, I.P. and J.P.P.; methodology, I.P.; software, I.P.; resources, I.P. and J.P.P.; writing—original draft preparation, I.P.; writing—review and editing, I.P. and J.P.P.; supervision, I.P. and J.P.P.; project administration, I.P.; funding acquisition, I.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science Fund of the Republic of Serbia, grant no. 7739645, “Automated sensing system based on fractal, textural and wavelet computational methods for detection of low-level cellular damage”, SensoFracTW. We also acknowledge the support of the Ministry of Science, Technological Development and Innovation of the Republic of Serbia, grant 451-03-66/2024-03/200110 (subgrant entitled “Development of artificial intelligence models based on the random forest algorithm for the detection of discrete structural changes in the cell nucleus”).

Data Availability Statement

The data related to this article can be found online at https://github.com/drpantic/Random-Forest-Fractal-DWT/blob/main/RF%20Fractal%20DWT.ipynb, accessed on 18 July 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Arnould, C.; Rocher, V.; Saur, F.; Bader, A.S.; Muzzopappa, F.; Collins, S.; Lesage, E.; Le Bozec, B.; Puget, N.; Clouaire, T.; et al. Chromatin compartmentalization regulates the response to DNA damage. Nature 2023, 623, 183–192. [Google Scholar] [CrossRef] [PubMed]
Dos Santos, A.; Cook, A.W.; Gough, R.E.; Schilling, M.; Olszok, N.A.; Brown, I.; Wang, L.; Aaron, J.; Martin-Fernandez, M.L.; Rehfeldt, F.; et al. DNA damage alters nuclear mechanics through chromatin reorganization. Nucleic Acids Res. 2021, 49, 340–353. [Google Scholar] [CrossRef] [PubMed]
Feng, Y.; Zhang, Y.; Lin, Z.; Ye, X.; Lin, X.; Lv, L.; Lin, Y.; Sun, S.; Qi, Y.; Lin, X. Chromatin remodeler Dmp18 regulates apoptosis by controlling H2Av incorporation in Drosophila imaginal disc development. PLoS Genet. 2022, 18, e1010395. [Google Scholar] [CrossRef] [PubMed]
Herbomel, G.; Grichine, A.; Fertin, A.; Delon, A.; Vourc’h, C.; Souchier, C.; Usson, Y. Wavelet transform analysis of chromatin texture changes during heat shock. J. Microsc. 2016, 262, 295–305. [Google Scholar] [CrossRef] [PubMed]
Luzhin, A.; Rajan, P.; Safina, A.; Leonova, K.; Stablewski, A.; Wang, J.; Pal, M.; Kantidze, O.; Gurova, K. Comparison of cell response to chromatin and DNA damage. bioRxiv 2023. [Google Scholar] [CrossRef] [PubMed]
Souliotis, V.L.; Vlachogiannis, N.I.; Pappa, M.; Argyriou, A.; Sfikakis, P.P. DNA damage accumulation, defective chromatin organization and deficient DNA repair capacity in patients with rheumatoid arthritis. Clin. Immunol. 2019, 203, 28–36. [Google Scholar] [CrossRef] [PubMed]
Cao, F.; Zhang, Y.; Cai, Y.; Animesh, S.; Zhang, Y.; Akincilar, S.C.; Loh, Y.P.; Li, X.; Chng, W.J.; Tergaonkar, V.; et al. Chromatin interaction neural network (ChINN): A machine learning-based method for predicting chromatin interactions from DNA sequences. Genome Biol. 2021, 22, 226. [Google Scholar] [CrossRef] [PubMed]
Challa, K.; Paysan, D.; Leiser, D.; Sauder, N.; Weber, D.C.; Shivashankar, G.V. Imaging and AI based chromatin biomarkers for diagnosis and therapy evaluation from liquid biopsies. NPJ Precis. Oncol. 2023, 7, 135. [Google Scholar] [CrossRef]
Gilbertson, E.N.; Brand, C.M.; McArthur, E.; Rinker, D.C.; Kuang, S.; Pollard, K.S.; Capra, J.A. Machine learning reveals the diversity of human 3D chromatin contact patterns. bioRxiv 2023.
Xu, D.; Forbes, A.N.; Cohen, S.; Palladino, A.; Karadimitriou, T.; Khurana, E. Recapitulation of patient-specific 3D chromatin conformation using machine learning. Cell Rep. Methods 2023, 3, 100578.
Delgado-Rodriguez, P.; Sanchez, R.M.; Roumeas-Noel, E.; Paris, F.; Munoz-Barrutia, A. Automatic classification of normal and abnormal cell division using deep learning. Sci. Rep. 2024, 14, 14241.
Irshaid, L.; Bleiberg, J.; Weinberger, E.; Garritano, J.; Shallis, R.M.; Patsenker, J.; Lindenbaum, O.; Kluger, Y.; Katz, S.G.; Xu, M.L. Histopathologic and Machine Deep Learning Criteria to Predict Lymphoma Transformation in Bone Marrow Biopsies. Arch. Pathol. Lab. Med. 2022, 146, 182–193.
Pantic, I.; Topalovic, N.; Corridon, P.R.; Paunovic, J. Oxidopamine-Induced Nuclear Alterations Quantified Using Advanced Fractal Analysis: Random Forest Machine Learning Approach. Fractal Fract. 2023, 7, 771.
Di, J.; Hickey, C.; Bumgardner, C.; Yousif, M.; Zapata, M.; Bocklage, T.; Balzer, B.; Bui, M.M.; Gardner, J.M.; Pantanowitz, L.; et al. Utility of artificial intelligence in a binary classification of soft tissue tumors. J. Pathol. Inform. 2024, 15, 100368.
Ahn, S. Building and analyzing machine learning-based warfarin dose prediction models using scikit-learn. Transl. Clin. Pharmacol. 2022, 30, 172–181.
Wu, Y.; Han, W.; Xu, D.; Wang, X.; Yang, J.; Lu, Z.; Chen, X.; Ding, Y. Identification of subtype specific biomarkers of clear cell renal cell carcinoma using random forest and greedy algorithm. Biosystems 2021, 204, 104372.
Bayat, S.; Broche, L.; Degrugilliers, L.; Porra, L.; Paiva, M.; Verbanck, S. Fractal analysis reveals functional unit of ventilation in the lung. J. Physiol. 2021, 599, 5121–5132.
Darawsheh, A.F.; Kolarovszki, B.; Hong, D.H.; Farkas, N.; Taheri, S.; Frank, D. Applicability of Fractal Analysis for Quantitative Evaluation of Midpalatal Suture Maturation. J. Clin. Med. 2023, 12, 4189.
Pantic, I.; Cumic, J.; Dugalic, S.; Petroianu, G.A.; Corridon, P.R. Gray level co-occurrence matrix and wavelet analyses reveal discrete changes in proximal tubule cell nuclei after mild acute kidney injury. Sci. Rep. 2023, 13, 4025.
Citir, M.; Karslioglu, H.; Uzun, C. Evaluation of mandibular trabecular and cortical bone by fractal analysis and radiomorphometric indices in bruxist and non-bruxist patients. BMC Oral Health 2023, 23, 522.
Di Ieva, A.; Reishofer, G. Fractal-Based Analysis of Arteriovenous Malformations (AVMs). Adv. Neurobiol. 2024, 36, 413–428.
Dos Santos, J.B.; Starosta, R.T.; Pilar, E.F.S.; Kunz, J.D.; Tomedi, J.; Cerski, C.T.S.; Ruppenthal, R.D. Nuclear morphometry and chromatin texture changes in hepatocellular carcinoma samples may predict outcomes of liver transplanted patients. BMC Gastroenterol. 2022, 22, 189.
Karperien, A. FracLac for ImageJ. Available online: http://rsb.info.nih.gov/ij/plugins/fraclac/FLHelp/Introduction.htm (accessed on 28 January 2023).
Mancini, M.; Bargiacchi, L.; De Vitis, C.; D’Ascanio, M.; De Dominicis, C.; Ibrahim, M.; Rendina, E.A.; Ricci, A.; Di Napoli, A.; Mancini, R.; et al. Histologic Analysis of Idiopathic Pulmonary Fibrosis by Morphometric and Fractal Analysis. Biomedicines 2023, 11, 1483.
Paunovic, J.; Vucevic, D.; Radosavljevic, T.; Vukomanovic Djurdjevic, B.; Stankovic, S.; Pantic, I. Effects of Iron Oxide Nanoparticles on Structural Organization of Hepatocyte Chromatin: Gray Level Co-Occurrence Matrix Analysis. Microsc. Microanal. 2021, 27, 889–896.
Kociołek, M.; Materka, A.; Strzelecki, M.; Szczypinski, P. Discrete wavelet transform–derived features for digital image texture analysis. In Proceedings of the International Conference on Signals and Electronic Systems, Lodz, Poland, 18–21 September 2001; pp. 163–168.
Strzelecki, M.; Szczypinski, P.; Materka, A.; Klepaczko, A. A software tool for automatic classification and segmentation of 2D/3D medical images. Nucl. Instrum. Methods Phys. Res. A 2013, 702, 137–140.
Szczypinski, P.; Strzelecki, M.; Materka, A. MaZda—A Software for Texture Analysis. In Proceedings of the ISITC 2007, Jeonju, Republic of Korea, 23 November 2007; pp. 245–249.
Szczypinski, P.; Strzelecki, M.; Materka, A.; Klepaczko, A. MaZda-A software package for image texture analysis. Comput. Methods Programs Biomed. 2009, 94, 66–76.
Zhang, X.; Liu, G.; Peng, X. A Random Forest Model for Post-Treatment Survival Prediction in Patients with Non-Squamous Cell Carcinoma of the Head and Neck. J. Clin. Med. 2023, 12, 5015.
Valjarevic, S.; Jovanovic, M.B.; Miladinovic, N.; Cumic, J.; Dugalic, S.; Corridon, P.R.; Pantic, I. Gray-Level Co-occurrence Matrix Analysis of Nuclear Textural Patterns in Laryngeal Squamous Cell Carcinoma: Focus on Artificial Intelligence Methods. Microsc. Microanal. 2023, 29, 1220–1227.
Tanaka, T. [Fundamentals] 5. Python+scikit-learn for Machine Learning in Medical Imaging. Nihon Hoshasen Gijutsu Gakkai Zasshi 2023, 79, 1189–1193.
Yang, Y.; Hu, P.; Chen, S.R.; Wu, W.W.; Chen, P.; Wang, S.W.; Ma, J.Z.; Hu, J.Y. Predicting the Activity of Oral Lichen Planus with Glycolysis-related Molecules: A Scikit-learn-based Function. Curr. Med. Sci. 2023, 43, 602–608.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Su, K.; Yuan, X.; Huang, Y.; Yuan, Q.; Yang, M.; Sun, J.; Li, S.; Long, X.; Liu, L.; Li, T.; et al. Improved Prediction of Knee Osteoarthritis by the Machine Learning Model XGBoost. Indian J. Orthop. 2023, 57, 1667–1677.
Chen, Y.; Liu, X.; Gao, L.; Zhu, M.; Shia, B.C.; Chen, M.; Ye, L.; Qin, L. Using the H₂O Automatic Machine Learning Algorithms to Identify Predictors of Web-Based Medical Record Nonuse among Patients in a Data-Rich Environment: Mixed Methods Study. JMIR Med. Inform. 2023, 11, e41576.
Ahn, J.M.; Kim, J.; Kim, K. Ensemble Machine Learning of Gradient Boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for Harmful Algal Blooms Forecasting. Toxins 2023, 15, 608.
Knudsen, B.S.; Jadhav, A.; Perry, L.J.; Thagaard, J.; Deftereos, G.; Ying, J.; Brintz, B.J.; Zhang, W. A pipeline for evaluation of machine learning/AI models to quantify PD-L1 immunohistochemistry. Lab. Investig. 2024, 104, 102070.
Zhang, W.; Koh, M.Y.; Sirohi, D.; Ying, J.; Brintz, B.J.; Knudsen, B.S. Predicting IHC staining classes of NF1 using features in the hematoxylin channel. J. Pathol. Inform. 2023, 14, 100196.
Pantic, I.V.; Cumic, J.; Valjarevic, S.; Shakeel, A.; Wang, X.; Vurivi, H.; Daoud, S.; Chan, V.; Petroianu, G.A.; Shibru, M.G.; et al. Computational approaches for evaluating morphological changes in the corneal stroma associated with decellularization. Front. Bioeng. Biotechnol. 2023, 11, 1105377.
Gupta, S.; Savala, R.; Gupta, N.; Dey, P. Fractal dimension and chromatin textural analysis to differentiate follicular carcinoma and adenoma on fine needle aspiration cytology. Cytopathology 2020, 31, 491–493.
Metze, K.; Adam, R.; Florindo, J.B. The fractal dimension of chromatin—A potential molecular marker for carcinogenesis, tumor progression and prognosis. Expert Rev. Mol. Diagn. 2019, 19, 299–312.
Yi, J.; Stypula-Cyrus, Y.; Blaha, C.S.; Roy, H.K.; Backman, V. Fractal Characterization of Chromatin Decompaction in Live Cells. Biophys. J. 2015, 109, 2218–2226.
Dheepak, G.; Christaline, J.A.; Vaishali, D. Brain tumor classification: A novel approach integrating GLCM, LBP and composite features. Front. Oncol. 2023, 13, 1248452.
Ouyang, Z.; Zhao, S.; Cheng, Z.; Duan, Y.; Chen, Z.; Zhang, N.; Liang, D.; Hu, Z. Dynamic PET Imaging Using Dual Texture Features. Front. Comput. Neurosci. 2021, 15, 819840.
Mishra, A.; Ravina, M.; Kote, R.; Kumar, A.; Kashyap, Y.; Dasgupta, S.; Reddy, M. Role of textural analysis parameters derived from FDG PET/CT in differentiating hepatocellular carcinoma and hepatic metastases. Nucl. Med. Commun. 2023, 44, 381–389.
Hu, J.; Szymczak, S. A review on longitudinal data analysis with random forest. Brief. Bioinform. 2023, 24, bbad002.

Figure 1. An example of micrograph textural analysis using the MaZda (version 4.6) software (A) performed on yeast cells exposed to a pro-apoptotic environment. Panels (B,C) show the damaged cells, which are visually indistinguishable from the control cells shown in panels (D,E). However, the wavelet coefficient energies and other textural indicators reveal significant differences, highlighting the subtle structural changes that a random forest model could detect.

Figure 2. The filtering cascade during the discrete wavelet transform with factor 2 subsampling. A combination of high-pass (H) and low-pass (L) filtering is applied to determine wavelet coefficient energies.
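
The cascade in Figure 2 can be illustrated with a minimal NumPy sketch. It assumes a Haar wavelet, a single decomposition level, and energy defined as the mean of squared coefficients in each subband; these are illustrative choices and may differ from what MaZda uses internally. The function names `haar_dwt2` and `subband_energies` are hypothetical.

```python
import numpy as np


def haar_dwt2(image):
    """One level of a 2D Haar DWT with factor-2 subsampling.

    Subband naming is an assumption of this sketch: the first letter is the
    filter applied along rows, the second the filter applied along columns.
    """
    x = np.asarray(image, dtype=float)
    # Crop to even dimensions so neighbouring samples can be paired.
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    s = np.sqrt(2.0)
    # Along each row: low-pass (scaled sum) and high-pass (scaled difference)
    # of neighbouring pixels, keeping one output per pair.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # The same pair of filters applied along columns of both outputs.
    return {
        "LL": (lo[0::2, :] + lo[1::2, :]) / s,
        "LH": (lo[0::2, :] - lo[1::2, :]) / s,
        "HL": (hi[0::2, :] + hi[1::2, :]) / s,
        "HH": (hi[0::2, :] - hi[1::2, :]) / s,
    }


def subband_energies(image):
    """Mean squared coefficient per subband (one possible energy definition)."""
    return {name: float(np.mean(band ** 2)) for name, band in haar_dwt2(image).items()}


# Placeholder image standing in for a segmented nuclear region of interest.
demo_image = np.random.default_rng(0).random((64, 64))
print(subband_energies(demo_image))
```

Applying `haar_dwt2` again to the `LL` output would give the next scale of the cascade.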

Figure 3. Python code using the Scikit-learn library to import fractal and wavelet data from an external database and train the RF model with hyperparameter tuning.
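
Because the figure itself is an image, the following is only a sketch of what such a training step might look like, not a reproduction of the authors' code. It assumes a grid search (`GridSearchCV`) as the tuning method, a binary control/treated label, and synthetic placeholder data in place of the external file; the feature names and the grid values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical names for the five inputs described in the abstract.
FEATURES = [
    "fractal_dimension",
    "lacunarity",
    "wavelet_energy_1",
    "wavelet_energy_2",
    "wavelet_energy_3",
]


def make_placeholder_data(n_per_class=100, seed=0):
    """Synthetic stand-in for measured features; real data would be loaded from a file."""
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, size=(n_per_class, len(FEATURES)))
    treated = rng.normal(1.5, 1.0, size=(n_per_class, len(FEATURES)))
    X = np.vstack([control, treated])
    y = np.repeat([0, 1], n_per_class)  # 0 = control, 1 = pro-apoptotic
    return X, y


X, y = make_placeholder_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Small illustrative grid; the values are assumptions, not the authors' settings.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

The held-out test split is kept out of the grid search so that `test_accuracy` is not biased by the tuning itself.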

Figure 4. Scikit-learn Python code for cross-validation and calculation of model performance indicators.
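
As a hedged illustration of this step, the sketch below runs stratified 5-fold cross-validation and computes several common classification metrics. The choice of metrics, the number of folds, and the synthetic placeholder data are assumptions; the original figure may use a different selection.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic placeholder for five chromatin features in two classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(1.5, 1.0, (100, 5))])
y = np.repeat([0, 1], 100)

model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Out-of-fold probability of the positive class for every sample.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
# Class labels obtained by thresholding the probability at 0.5.
pred = (proba >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y, pred),
    "precision": precision_score(y, pred),
    "recall": recall_score(y, pred),
    "f1": f1_score(y, pred),
    "roc_auc": roc_auc_score(y, proba),
}
cm = confusion_matrix(y, pred)  # rows: true class, columns: predicted class

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Using out-of-fold predictions means every sample is scored by a model that did not see it during training.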

Figure 5. The simplified model architecture with only 3 sample decision trees.
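
To make the three-tree picture concrete, the sketch below fits a forest with `n_estimators=3` on synthetic placeholder data and shows how the individual trees are combined. In Scikit-learn, the forest's class probabilities are the average of the per-tree probabilities; the data and the depth limit here are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder for five chromatin features in two classes.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (60, 5)), rng.normal(1.5, 1.0, (60, 5))])
y = np.repeat([0, 1], 60)

# A deliberately tiny forest, mirroring the three trees in the figure.
forest = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0)
forest.fit(X, y)

# Each fitted tree is available in forest.estimators_.
per_tree_proba = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
averaged_proba = per_tree_proba.mean(axis=0)

# The forest output matches the mean of the three trees' probabilities.
print(np.allclose(averaged_proba, forest.predict_proba(X)))
print(len(forest.estimators_), forest.feature_importances_)
```

A practical model would use many more trees; three are shown only because they are easy to inspect.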

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
