Development of an Early Lung Cancer Diagnosis Method Based on a Neural Network

Karymsakova, Indira; Kozhakhmetova, Dinara; Bekenova, Dariga; Ostroukh, Danila; Bekbayeva, Roza; Kydyralina, Lazat; Bugubayeva, Alina; Kurushbayeva, Dinara

doi:10.3390/computers14090397


by Indira Karymsakova ¹, Dinara Kozhakhmetova ¹,*, Dariga Bekenova ², Danila Ostroukh ¹, Roza Bekbayeva ¹, Lazat Kydyralina ¹, Alina Bugubayeva ³ and Dinara Kurushbayeva ¹

¹ Graduate School Digital Technologies and Construction, Department of Automation and Information Technologies, Shakarim University, St. Glinka, 20A, Semey 071412, Kazakhstan

² Department of Information Technologies, Higher School of Business and Digital Technologies, University “Turan-Astana”, Y Dukenuly, 29a Street, Astana 010000, Kazakhstan

³ Department of Digital Engineering and IT-Analytics, Faculty of Finance, Logistics and Digital Technologies, Karaganda University of Kazpotrebsouz, 9 Academic St., Karaganda 100009, Kazakhstan

* Author to whom correspondence should be addressed.

Computers 2025, 14(9), 397; https://doi.org/10.3390/computers14090397

Submission received: 3 July 2025 / Revised: 11 September 2025 / Accepted: 16 September 2025 / Published: 18 September 2025

(This article belongs to the Special Issue AI in Its Ecosystem)


Abstract

Cancer is one of the most lethal diseases in the modern world. Early diagnosis significantly contributes to prolonging the life expectancy of patients. The application of intelligent systems and AI methods is crucial for diagnosing oncological diseases. Primarily, expert systems or decision support systems are utilized in such cases. This research explores early lung cancer diagnosis through protocol-based questioning, considering the impact of nuclear testing factors. Nuclear tests conducted historically continue to affect citizens’ health. A classification of regions into five groups was proposed based on their proximity to nuclear test sites. The weighting coefficient was assigned accordingly, in proportion to the distance from the test zones. In this study, existing expert systems were analyzed and classified. Approaches used to build diagnostic expert systems for oncological diseases were grouped by how well they apply to different tumor localizations. An online questionnaire based on the lung cancer diagnostic protocol was created to gather input data for the neural network. To support this diagnostic method, a functional block diagram of the intelligent system “Oncology” was developed. The following methods were used to create the mathematical model: gradient boosting, multilayer perceptron, and Hamming network. Finally, a web application architecture for early lung cancer detection was proposed.

Keywords:

expert system; information systems; artificial intelligence; neural network; Hamming network; analysis; intellectual systems; databases

1. Introduction

The high incidence of oncological diseases across various age groups is a critical issue, so early detection and prevention of cancer remain crucial tasks. Regions that were exposed to nuclear testing show a high incidence of cancer, and hereditary factors also play a significant role in cancer development. Lung cancer has a genetic origin in a subset of cases, primarily among young patients with adenocarcinoma [1].

Early and accurate diagnosis increases the likelihood of effective treatment and extends patient survival. In the diagnosis of lung cancer, heredity and smoking are important factors.

The integration and utilization of information technologies in medicine are rapidly advancing. Numerous decision support systems (DSSs) now exist within the medical field and are widely applied in preliminary diagnostics for various diseases, including oncology, where expert systems have been significantly developed. These expert systems rely on different mathematical methods depending on the type of disease and its localization.

Developing a method for the early diagnosis of lung cancer will make it possible to identify the disease at an early stage and to take the measures needed to help patients maintain a full life. The proposed expert system-based method facilitates the assessment of individual risk levels for lung cancer, as well as for other pulmonary conditions. The approach has undergone an initial clinical evaluation and has demonstrated promising results. Its implementation could enable timely intervention in high-risk cases, thereby improving early diagnostic outcomes.

2. Materials and Methods

The present study uses methods of scientific analysis, synthesis, and classification of approaches in the development of expert systems, along with the application of Hamming neural networks.

Numerous expert systems have been developed for diagnosing oncological diseases, employing diverse approaches and methodologies. For instance, Karabatak et al. described constructing an expert system for breast cancer detection using association rules (ARs) and neural networks (NNs). This method involves developing an automated diagnostic system based on AR and NN, employing a three-fold cross-validation method for performance evaluation during testing. This approach facilitates quick automatic diagnostic systems for other diseases; however, it has a high dependency on feature selection [2].

Oyelade et al. proposed an approach utilizing semantic networks and natural language processing (NLP). They introduced a formalized input generation model addressing the challenge of raw data inputs through inference processes, a breast cancer lexicon, rule sets, and NLP. An input-generation algorithm using the NLP capabilities of Python 3.13.1 first filters the raw data and generates a preliminary input collection. This collection serves as input for the inference mechanism, which employs ontology-based rules and the lexicon to generate valid tokens that are subsequently processed by medical expert systems or diagnostic decision support systems (DDSSs). This model improved performance by 64% compared to systems processing raw patient data [3].

Physical research methods for expert systems in oncology diagnostics were proposed in [4], focusing on multimedia training systems for physicians. Disease localizations studied include esophagus, stomach, intestines, pancreas, and liver. The system’s practical application enhances the efficiency of oncologists’ postgraduate training and professional development [4].

Shereen A. Taie et al. developed a prototype expert system evaluating breast cancer risk based on criterion-based parameters. The system assesses lifestyle factors against specified criteria to determine breast cancer development risk, providing preventive recommendations that encourage lifestyle modifications. Although knowledge within this system derives from expert opinions and published research, further expert validation is needed for practical application [5].

Rahul Boadh et al. discussed fuzzy expert systems for diagnosing various cancers. Fuzzy reasoning has recently emerged as a dynamic strategy in disease recognition, particularly for malignant neoplasms. While identifying malignancies involves uncertainties and errors, fuzzy-based models have significantly advanced diagnostics. Integrating fuzzy logic and expert recommendations allows early analysis of human infections, reducing mortality. The Ant Colony Optimization (ACO) algorithm optimized fuzzy IF/THEN rule sets for enhanced diagnostic performance [6].

Intelligent screening methods for diagnosing and treating oncological diseases were introduced in [7]. This approach predicts patient conditions beyond medical history by simulating qualified physician prognostic processes and therapeutic decision-making. Unlike traditional statistical models, this approach builds knowledge databases based on patterns from similar patient situations. These knowledge-based models provide superior accuracy compared to classical statistical models [7].

Ritna Wahyuni et al. proposed a hybrid expert system construction method employing forward chaining for symptom tracking and certainty factor methods for accuracy in symptom condition assessment. The system facilitates early education in diagnosing and treating lymph node diseases, achieving an 80% user accuracy rating during initial system testing [8].

Tanyukevich et al. introduced an expert system classifying cancer risk groups based on a production model for formalizing medical knowledge. This study presented a Markov chain-based model for predicting cancer patient numbers, incorporating standard oncological classifications and latent disease periods. This expert system enables personalized risk-reduction programs, outperforming traditional systems [9].

Ali Keleş et al. investigated a neuro-fuzzy rule-based expert system for breast cancer diagnostics. Neuro-fuzzy methods identified rules employed in the Ex-DBC system’s inference engine, providing reliable diagnostic performance with 97% specificity, 76% sensitivity, 96% positive predictive value, and 81% negative predictive value. High positive predictive value facilitates unnecessary biopsy avoidance and can also serve educational purposes for medical students [10].

Shinsuke Morio et al. developed an expert system for early breast cancer detection aimed at nonspecialist users. The system facilitates interactions between users and a microcomputer, collecting symptom descriptions and offering preventive advice. The system, implemented in Prolog, outlines early detection methods and suggests subsequent actions [11].

In [12], new approaches using fuzzy expert systems for breast cancer prediction and diagnosis were examined. This study investigated developing an enhanced mobile fuzzy expert system (EMFES) employing type-2 fuzzy logic for accuracy and uncertainty management. Key aspects included comparative effectiveness, uncertainty handling, mobile feature integration, programming language selection, and dynamic fuzzy rule creation. Challenges and deployment opportunities for mobile EMFES were explored, highlighting the necessity for comprehensive, adaptable mobile solutions [13].

A Mamdani inference system with 67 fuzzy rules derived from expert opinions was implemented in MATLAB 2017a [14] and evaluated through 50 clinical case scenarios. Sensitivity was 92.1%, and specificity was 83.1%, demonstrating effectiveness [14,15,16,17,18,19].

The authors in [20] present a fuzzy expert system for diagnosing heart diseases. The system receives precise input data and performs a fuzzification step. After that, defuzzification is carried out to convert the fuzzy output of the system into a crisp value, which is easier for humans to interpret. The system measures the risk level of heart disease, providing one of the following outputs: low, high, or risky. The results demonstrate that the system is effective in identifying the appropriate risk levels [20].

In [21], the authors propose a hybrid approach wherein probabilistic rules generated from a decision tree are integrated with expert-provided rules. The aim of this integration is to augment the data derived from the decision tree with expert knowledge. The system is applied to the diagnosis of diabetes mellitus. Experimental results demonstrate that this combined approach outperforms the use of each rule set independently [21].

In [22], the Hamming method is applied. Subset accuracy evaluates the proportion of correctly classified labels (identical to the ground truth), which is generally considered too strict a metric for multi-label classification models. Hamming loss assesses performance based on errors in instance-label pair predictions; an error here indicates either a relevant label is missed or an irrelevant label is predicted [22].
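To make the two metrics concrete, the following minimal sketch computes both on a small set of hypothetical multi-label predictions. The function names and the data are illustrative assumptions and are not taken from [22].

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual instance-label pairs that are predicted wrongly."""
    n_labels = len(y_true[0])
    errors = sum(
        ti != pi
        for t, p in zip(y_true, y_pred)
        for ti, pi in zip(t, p)
    )
    return errors / (len(y_true) * n_labels)


# Hypothetical data: 3 samples, 4 binary labels each.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0]]

acc = subset_accuracy(y_true, y_pred)
loss = hamming_loss(y_true, y_pred)
print(acc, loss)
```

In this toy case only one of the three samples matches exactly (subset accuracy 1/3), although only 3 of the 12 individual labels are wrong (Hamming loss 0.25), which illustrates why subset accuracy is regarded as the stricter metric.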

In [23], it was established that the choice of similarity function has a significant impact on the system’s performance. The random forest similarity function was found to be the most accurate for the given datasets. This study highlights the importance of careful selection of the similarity function when developing a case-based reasoning system for medical diagnosis [23].

Approaches for expert systems in cancer diagnosis were analyzed, identifying strengths and weaknesses, and classified according to disease localization applicability.

3. Results

Analyzing these approaches enables their classification according to applicability, as summarized in Table 1:

Class 1: Methods applicable across various localizations.
Class 2: Methods applicable across fewer localizations.
Class 3: Methods applicable exclusively to specific localizations.

Table 1. Classification of approaches in expert systems.

Class	Approaches
Class 1	Neural network approach
Class 2	Approach using factor analysis and physical research methods
Class 3	Full formalization, associative rules, semantic networks, natural language analysis, and fuzzy logic approaches

The approaches used in the development of expert systems for cancer diagnosis were classified based on their applicability depending on the localization of the disease.

Class 1—most frequently used methods.

Class 2—less frequently used methods.

Class 3—least frequently used methods.

The classification of approaches used in expert systems makes it possible to identify the methods applicable across the broadest range of localizations, for subsequent use in developing a prototype of an information system for cancer diagnostics [24,25,26,27,28,29,30]. Based on this analysis, neural networks were chosen for constructing an expert system for preliminary diagnosis.

The following stages are proposed for building an intelligent system based on neural networks for analyzing and providing recommendations for cancer treatment:

Knowledge acquisition and analysis, identification, and conceptualization.
Knowledge modeling and formalization.
Development of a script-based question database and neural network training.
Creation of an intelligent system prototype.

1. Knowledge acquisition and analysis, identification, and conceptualization.

At this stage, the subject domain was analyzed, the tasks were defined, the requirements were formulated, and an informal description was compiled highlighting the main characteristics, key concepts/objects, their input/output values, the anticipated type of solution, and relevant knowledge.

2. Knowledge modeling and formalization.

Data on the medical indicators of patients diagnosed with lung cancer in the region over the past five years were collected, and the primary risk assessment indicators for lung cancer susceptibility were identified. Medical indicators and responses were determined for inclusion in a survey questionnaire, forming the knowledge base of the intelligent diagnostic system. This questionnaire was developed as an online survey following the lung cancer diagnostic protocol.

Two additional questions regarding patients’ place of birth and residence were included to consider the radioactive contamination factor resulting from nuclear testing. The weight of each response is allocated proportionally to its overall significance to facilitate subsequent development of the Hamming neural network.

To simulate oncological experts’ reasoning, a total of 21 rules were derived based on medical expert opinions. The relative weight of each response was assigned by oncologists based on the results of studies involving data from cancer patients. Each rule includes 2 to 5 responses, each with a numerical value corresponding to the degree of impact in Table 1 [31].

The online survey was completed by 1283 patients. The response percentages are presented in Table 2 and Table 3.

3. Development of a script-based question database and neural network training.

At this stage, a database of questions was created, and the neural network was trained on data obtained from patients’ responses to the online questionnaire.

4. Creation of an intelligent system prototype.

At this stage, the architecture and structure of the intelligent system were developed, including data file exchange, an information-logical model, and data, physical, and logical design of the system. A web application for medical organization staff was also developed. The functional structure of the expert system is shown in Figure 1.

The intelligent system consists of six modules: Symptoms Module, Medical History Module, Tests Module, Application Module, Treatment Module, and Prognosis Module.

The Symptoms Module stores information about symptoms associated with respiratory system diseases, diagnoses, and questions based on the protocol used during patient interviews. The Medical History Module stores information about the medical histories of patients who have visited the center with respiratory system complaints. The Tests Module stores information about the results of patient tests recorded in the database.

Data from the Symptoms, Medical History, and Tests modules are transferred to the Application Module. These data are processed in Power BI 2.145.1105.0, after which a mathematical model is created in Python, and patient response data are stored in an SQL database.

The Treatment Module stores information about possible diagnoses for respiratory system diseases and treatment protocols.

The Prognosis Module allows forecasting of respiratory diseases, including lung cancer specifically.

Knowledge modeling and formalization.

At this stage, questions and answers from patients’ oncology history cards are formalized for subsequent processing by the expert system.

In Equation (1), we define the model of the intelligent system (Figure 2):

Y = \{A, B, C, E\},    (1)

where Y is the set of resulting data.

A is the set of questions on the oncologic history.
B is the set of answers on the oncologic history.
C is the set of weights of answers on the oncologic history.
E is the set of diagnoses.

Figure 2. Neural network’s model. Here, A_i, B_i, and C_i are input sets; d_i is the activation function; and y_n are the resulting values.

The maximization function is used as the activation function d_i; the answer with the maximum weight is selected from the answer options. In Equation (2), we define the activation functions:

d_{i} = f\{\max(C_{i})\}    (2)
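As an illustration of Equation (2), the sketch below selects the maximum weight from each weight set C_i. The question names and weight values are hypothetical, and f is assumed to be the identity, since its exact form is not specified here.

```python
def activation(answer_weights):
    """d_i = f(max(C_i)), with f assumed to be the identity."""
    return max(answer_weights)


# Hypothetical weight sets C_i, one per question A_i (values are illustrative).
C = {
    "smoking_experience": [0.0, 0.2, 0.5, 0.8],
    "family_history": [0.0, 0.6],
    "chronic_cough": [0.0, 0.3, 0.4],
}

# One activation value d_i per question.
d = {question: activation(weights) for question, weights in C.items()}
print(d)
```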

Creating a database of scripted questions, training the neural network.

At this stage, a database of questions is created, and the neural network is trained using the data obtained from the online questionnaire of the respondents.

The neural network training process is shown in Figure 3.

The process of signal propagation according to one criterion is shown in Figure 4.

The functioning of this network can be described as follows:

The input layer receives data consisting of respondents’ questions and answers for processing and passes them along the network.

Hidden layer(s) select the maximum value among the weighted responses, process the data, and then transmit the results to the next layer.

The output layer predicts the likelihood of lung cancer susceptibility and classifies the results based on the processed data.

Each neuron in the neural network is connected to the adjacent layers through weights, which determine the importance of each input signal. The input values and the corresponding weights are multiplied, and the results are summed. If a certain threshold is reached, the neuron’s activation function is triggered, allowing the neuron to transmit its output to the next layer.
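The weighted-sum-and-threshold behaviour described above can be sketched as follows. The input values, weights, and thresholds are hypothetical, and passing the weighted sum on unchanged once the threshold is reached is an assumption made for illustration.

```python
def neuron_output(inputs, weights, threshold):
    """Weighted sum of the inputs, transmitted only if it reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum >= threshold else 0.0


# Hypothetical input signals and connection weights.
inputs = [1.0, 0.0, 1.0]
weights = [0.4, 0.9, 0.3]

fired = neuron_output(inputs, weights, threshold=0.5)   # weighted sum reaches 0.5
silent = neuron_output(inputs, weights, threshold=0.8)  # weighted sum stays below 0.8
print(fired, silent)
```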

A mathematical model based on a neural network was created in Python. The input parameters for this model are the weighted answers, and the target (endogenous) variable is the response to the question “Have you ever had cancer?” Each answer is assigned a specific weight, and based on these weights, the system calculates a resulting value. Consequently, the script outputs the probability of lung cancer susceptibility.

The following methods were used to create the mathematical model: gradient boosting, multilayer perceptron, and Hamming network.

Alongside widely used models such as multilayer perceptrons and gradient boosting methods, we selected the Hamming network for the following reasons:

1. Robustness to Input Noise.

The Hamming network is known for its robustness to input distortions, maintaining correct classification even in the presence of noise—provided the input remains sufficiently similar to the reference patterns. In our study, the responses are selected from five predefined answer options, resulting in inputs that are generally close to the reference vectors. Given these conditions, we consider the Hamming network to be a well-suited choice for achieving accurate classification.

2. Simplicity of Implementation.

The architecture of the Hamming network consists of only two layers:

- The first layer computes the correlation between the input vector and the reference patterns.
- The second layer implements a winner-takes-all mechanism, selecting the closest matching pattern.

This structure makes the model both transparent and easy to implement, whether in hardware or software.

3. High Recognition Speed.

Thanks to its simple architecture, the Hamming network can quickly identify the closest reference pattern without requiring extensive training or complex computations. This leads to high recognition speed, which is especially advantageous in applications that demand rapid decision-making.

4. Efficiency with Small Datasets.

Unlike deep neural networks that require large volumes of training data, the Hamming network can operate effectively with a limited number of reference patterns. This is particularly important for tasks where data are scarce or expensive to obtain. In our study, the dataset is based on medical indicators. Conducting surveys within medical institutions involves significant time and numerous administrative approvals from institutional leadership. As a result, the survey was conducted online, which limited the number of respondents. Given these constraints, the Hamming network was selected as an appropriate model for initial experimentation.

5. Applicability to Classification and Recognition Tasks.

The Hamming network is widely used for the following tasks:

Pattern and character recognition.
Signal and time-series analysis.
Data integrity verification.
Development of systems with limited computational resources.

Since this experiment was conducted online and tested on a single physician’s workstation, computational resources were limited. Therefore, the Hamming network was chosen for model validation due to its low computational requirements.

In summary, the choice of the Hamming network is justified by its simplicity, robustness to noise, fast processing speed, and suitability for tasks with constrained resources. In future research, with a larger volume of input data, deep neural network models will be explored and evaluated.
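The two-layer structure described above (a correlation layer followed by winner-takes-all selection) can be sketched in plain Python. This is a minimal illustration rather than the study's implementation: the bipolar (+1/-1) encoding, the reference patterns, the inhibition strength, and the function names are all assumptions.

```python
def similarity_layer(x, patterns):
    """Layer 1: number of matching components between x and each bipolar pattern."""
    n = len(x)
    return [(sum(a * b for a, b in zip(x, p)) + n) / 2 for p in patterns]


def winner_takes_all(scores, max_iter=1000):
    """Layer 2 (MAXNET-style): mutual inhibition until a single unit stays active."""
    m = len(scores)
    eps = 1.0 / (2 * m)  # inhibition strength, kept below 1/m
    y = list(scores)
    for _ in range(max_iter):
        total = sum(y)
        new_y = [max(0.0, v - eps * (total - v)) for v in y]
        active = sum(1 for v in new_y if v > 0)
        if active == 0:  # everything suppressed at once: keep the previous state
            break
        y = new_y
        if active == 1:
            break
    return max(range(m), key=lambda j: y[j])


# Hypothetical reference patterns (bipolar vectors of length 6).
patterns = [
    [1, 1, 1, -1, -1, -1],
    [-1, -1, 1, 1, 1, -1],
    [1, -1, -1, 1, -1, 1],
]

# Noisy input: the second pattern with its fifth component flipped.
x = [-1, -1, 1, 1, -1, -1]

scores = similarity_layer(x, patterns)
winner = winner_takes_all(scores)
print(scores, winner)
```

Despite the flipped component, the input still shares five of six components with the second pattern, so that pattern wins the competition. No gradient-based training is involved; the reference patterns themselves serve as the first-layer weights.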

Creating a model using the Hamming network.

The code that calculates the risk probabilities is shown in Figure 5.

The most significant factors identified were age, smoking duration, daily cigarette consumption, and the annual frequency of acute respiratory viral infections (ARVI).

To build a forecast model, the following numerical features were taken:

numeric_features = [
    "Your age",
    "Smoking experience",
    "How many packs/pieces do you consume per day?",
    "How many times a year do you get SARS?",
]

The test and training samples were generated as follows:

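from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X_train_full,
    y_train_full,
    test_size=0.2,
    random_state=42,
)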

The following parameters were defined as categorical features:

categorical_features = [
    "Your gender",
    "Oncoanamnesis of parents and close relatives?",
    "Do you use cigarettes?",
]

The output of the intermediate values of the neural network is shown in Figure 6.

Intermediate results were obtained in the first layer.

Train shape: (175, 21), Test shape: (44, 21)

After splitting the dataset into training and testing sets, we obtained 175 rows with 21 features in the training set and 44 rows with the same number of features in the test set. This confirms the correct 80/20 split and that all features were retained.
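The reported shapes are consistent with how a fractional test share is commonly resolved. The sketch below assumes that the test share is rounded up and the remainder goes to training, which is how scikit-learn's train_test_split is understood to behave; the total of 219 rows is inferred from the two reported shapes.

```python
import math

n_samples = 175 + 44  # total rows, inferred from the reported train/test shapes
test_size = 0.2

# Assumed rule: round the test share up, give the rest to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```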

Class balance in the training set: Counter({0.0: 169, 1.0: 6})

The training set exhibited significant imbalance: 169 negative instances (0) and only 6 positive instances (1). This imbalance necessitated balancing; hence, SMOTE was subsequently employed.

After ColumnTransformer: (175, 10).

Following numerical imputation, standardization, and one-hot encoding, the original 21 columns transformed into 10 features. The feature matrix size became 175 × 10, indicating that categories were correctly encoded and numerical features underwent scaling.
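The three preprocessing steps can be illustrated with a standard-library stand-in for the ColumnTransformer. This is a sketch only: the median imputation strategy, the toy columns, and the helper names are assumptions, and the resulting column count (6) applies to the toy data, not to the study's 10 features.

```python
from statistics import mean, median, pstdev


def scale_numeric(values):
    """Median-impute missing values (None), then scale to zero mean and unit variance."""
    fill = median(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    return [(v - mu) / sigma for v in filled]


def one_hot(values):
    """One 0/1 column per distinct category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical raw columns for four respondents.
age = [40, None, 60, 50]
gender = ["F", "M", "F", "M"]
smoker = ["yes", "no", "former", "no"]

scaled_age = scale_numeric(age)
gender_cols = one_hot(gender)   # 2 columns: F, M
smoker_cols = one_hot(smoker)   # 3 columns: former, no, yes

# Concatenate: 1 numeric + 2 + 3 one-hot columns per row.
features = [[a] + g + s for a, g, s in zip(scaled_age, gender_cols, smoker_cols)]
print(len(features), len(features[0]))
```

The number of output columns depends on how many distinct categories each categorical question has, which is why the column count changes after encoding.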

Number of features: 10.

Verification confirms that the final feature vector indeed consists of 10 columns after all transformations.

First 10 features:

Feature names output (get_feature_names_out) revealed one numerical feature (num_age) and nine categorical one-hot features. The names are meaningful, with no duplicates.

After SMOTE: (338, 10)

The SMOTE method generated synthetic samples for the minority class, increasing the number of rows to 338 while retaining 10 features. Each row continues to be described by the same feature set.

Class balance after SMOTE: Counter({0.0: 169, 1.0: 169})

The classes are now fully balanced, with 169 examples each. The model will be trained on equal numbers of positive and negative cases.
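The row counts above follow from the way SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified standard-library illustration of that idea, not the library implementation used in the study; the two-dimensional minority points, the function name, and the seed are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=42):
    """Create synthetic points on segments between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class points (6 positives, 2 features for brevity).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5], [1.8, 2.9], [2.4, 2.1]]
n_majority = 169

synthetic = smote(minority, n_majority - len(minority))
positives = minority + synthetic
total_rows = n_majority + len(positives)
print(len(synthetic), len(positives), total_rows)
```

With 6 original positives, 163 synthetic rows are needed to match the 169 negatives, giving 338 rows in total while the number of features stays unchanged.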

Classification report.

On the test dataset, the model achieves approximately 0.95 accuracy for the zero class but fails to recognize the positive class (precision and recall for class 1 are 0). This indicates that under current settings, RandomForest has not learned to identify the rare class in independent data.

ROC-AUC: 0.9524

Despite poor metrics for class 1, the overall ROC-AUC metric is high (~0.95). This occurs because the test dataset contains only two positive samples; even a single error significantly affects precision/recall but has minimal impact on the ROC curve.
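ROC-AUC depends only on how positive and negative samples are ranked against each other, not on any decision threshold. The sketch below counts correctly ordered positive-negative pairs; the scores are hypothetical values chosen to reproduce 0.9524 with 2 positive and 42 negative test samples and are not the model's actual outputs.

```python
def roc_auc(y_true, scores):
    """Share of positive-negative pairs in which the positive has the higher score."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical test set: 42 negatives scored 0.00 to 0.41, 2 positives scored below 0.5.
neg_scores = [i / 100 for i in range(42)]
pos_scores = [0.45, 0.375]

y_true = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

auc = roc_auc(y_true, scores)
predicted_positive = sum(1 for s in scores if s >= 0.5)
print(round(auc, 4), predicted_positive)
```

In this toy case 80 of the 84 positive-negative pairs are ordered correctly (80/84 ≈ 0.9524), yet no sample reaches a 0.5 threshold, so precision and recall for the positive class are both zero.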

The precision and recall values for the positive class are equal to 0. This is due to the fact that there are only two positive instances in the test set. While the model is capable of identifying positive cases, the likelihood of doing so is low. This is primarily because the survey was conducted among 1283 respondents, of whom only a small fraction tested positive for cancer.

In a future study, the questionnaire is expected to be administered across multiple medical institutions, including a specialized oncology center.

In particular, a model was built using the original input parameters based on gradient boosting with decision trees as the base algorithm. The model was implemented using the LightGBM framework. Class balancing was performed using the SMOTE method, while hyperparameter optimization was carried out with Optuna. The obtained results are presented in Figure 7.

The results obtained from the neural network are presented in Figure 8.

Subsequently, a model using a multilayer perceptron was implemented in Keras. The final performance metrics of this model are presented in Figure 9.

The problem of zero precision and recall for the positive class arose due to the following constraints:

- Insufficient amount of input data, as the survey was conducted among residents of regions affected by nuclear testing.
- A low number of positive responses indicating cancer incidence. Since the majority of respondents were healthy, the percentage of positive cases was consequently low.

We have taken the following steps to address the issue of zero precision for the positive class:

1. Class Imbalance Adjustment:

Using an oversampling technique, the size of the positive class was increased. The proportion of positive instances was raised to 40%. Additionally, class weighting was applied in the loss function to ensure that the model pays greater attention to the positive class.

2. Use of Additional Metrics:

To further evaluate the models, the following metrics were also employed: precision, recall, F1-score, and AUC-ROC. All three models demonstrated strong performance according to these metrics:

Gradient boosting using LightGBM: Cross-Validation ROC-AUC (best from Optuna) = 0.9059; Test ROC-AUC = 1.0000.
Multilayer perceptron model implemented in Keras: ROC-AUC (sklearn) = 0.9459.
Hamming network model: ROC-AUC (sklearn) = 0.95.

3. Classification Threshold Adjustment:

To improve the model’s sensitivity to the positive class, the classification threshold was lowered from 0.5 to 0.3.

4. Model Architecture Improvement:

To prevent overfitting, dropout regularization was applied to the models.
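The arithmetic behind the class imbalance and threshold steps can be sketched as follows. The class counts (169 negative, 6 positive) are those of the training set described earlier; the "balanced" weighting formula and the example scores are assumptions, since the exact weighting scheme and the model outputs are not specified here.

```python
import math

n_neg, n_pos = 169, 6

# Oversampling: smallest positive count p with p / (p + n_neg) >= 40%.
target_share = 0.40
needed_pos = math.ceil(target_share * n_neg / (1 - target_share))
extra_pos = needed_pos - n_pos
share = needed_pos / (needed_pos + n_neg)

# Class weights inversely proportional to class frequency (assumed heuristic):
# weight = n_samples / (n_classes * class_count)
n_samples = n_neg + n_pos
class_weight = {0: n_samples / (2 * n_neg), 1: n_samples / (2 * n_pos)}


def recall_precision(y_true, scores, threshold):
    """Recall and precision for the positive class at a given decision threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
    recall = tp / sum(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical predicted probabilities for four negatives and two positives.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45]

at_050 = recall_precision(y_true, scores, 0.5)
at_030 = recall_precision(y_true, scores, 0.3)
print(needed_pos, extra_pos, class_weight, at_050, at_030)
```

With these counts, 113 positive rows (107 more than the original 6) are needed to reach a 40% share. In the toy example, a threshold of 0.5 misses both positives, while a threshold of 0.3 recovers both at the cost of one false positive.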

After adjusting the models according to the above-described procedure, the following results were obtained, as shown in Figure 10, Figure 11 and Figure 12:

After model adjustment, all three models demonstrated adequate probability estimates, indicating the development of models suitable for predicting cancer risk.

To date, validation has been conducted using three types of models: multilayer perceptron, gradient boosting, and Hamming network. It should be noted that this study is not conclusive. As a larger cohort of respondents is planned, the amount of input data will increase accordingly. Therefore, future studies will focus on developing models using different types of neural networks depending on the data volume and computational resource constraints. A more detailed comparison of the results obtained from multiple model types will then be performed.

The model calculates a predicted_proba value indicating the numerical probability of susceptibility to lung cancer. This resulting value is classified into four risk categories as follows:

Class 1: 0–0.01, low risk.

Class 2: 0.01–0.1, moderate risk.

Class 3: 0.1–0.54, high risk.

Class 4: 0.55–1, very high risk.
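The mapping from predicted_proba to a risk class can be sketched as a simple lookup. The cut-offs come from the list above; how boundary values are assigned, and the treatment of the unlisted interval between 0.54 and 0.55 as Class 3, are assumptions made for this sketch.

```python
import bisect

# Cut-offs between classes. Values equal to a cut-off go to the higher class (assumption);
# scores between 0.54 and 0.55 are assigned to Class 3 (assumption).
CUTS = [0.01, 0.1, 0.55]
LABELS = [
    "Class 1: low risk",
    "Class 2: moderate risk",
    "Class 3: high risk",
    "Class 4: very high risk",
]


def risk_class(predicted_proba):
    """Map a probability in [0, 1] to one of the four risk classes."""
    if not 0.0 <= predicted_proba <= 1.0:
        raise ValueError("predicted_proba must lie in [0, 1]")
    return LABELS[bisect.bisect_right(CUTS, predicted_proba)]


for p in (0.005, 0.05, 0.3, 0.936779):
    print(p, risk_class(p))
```

For example, the score of 0.936779 reported in the Discussion falls into Class 4 (very high risk).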

Based on responses from 1283 respondents, a diagram illustrating the percentage distribution of risk classes was obtained, as shown in Figure 13.

Creation of an intelligent system prototype.

At this stage, the architecture and structure of the intelligent system were developed, including data file exchange, an information-logical model, and data, physical, and logical system design. Additionally, a web application layout was proposed for medical organization staff.

The system architecture comprises a survey questionnaire, server, mathematical module, application, expert decision-maker (oncologist), medical indicators verification module, and diagnostic module (Figure 14).

The structure of the system is shown in Figure 15.

The Questionnaire Module contains questions focused on the early diagnosis of lung cancer. The collected data are uploaded to the server and stored in a database. The web system is presently undergoing refinement.

4. Discussion

Three methods were tested to create the mathematical model: gradient boosting, multilayer perceptron, and Hamming network. The Hamming network was chosen because it combines sufficient accuracy with ease of implementation and interpretation, which is especially important for expert systems. In the mathematical module, the obtained data are processed in Python using the Hamming network. As a result, the system generates values representing the risk of lung cancer susceptibility. The input layer of the network receives data comprising respondents’ questions and answers and forwards them through the network. The hidden layer selects the maximum value among the weighted responses and processes the data before transmitting the results to the next layer. The output layer predicts the probability of susceptibility to lung cancer and classifies the outcomes based on the processed information.

The Application Module contains questions and corresponding responses based on the lung cancer diagnostic protocol, covering disease symptoms, medical history, heredity, and other relevant factors. Within this application, the patient answers the questions and receives a risk assessment for potential lung cancer.

High-risk values are then forwarded to the Expert Decision-Maker (an oncologist), who evaluates the results and directs the patient to appropriate medical tests. If these tests confirm the presence of malignant tumors, the patient is registered at an oncology center. Otherwise, the patient is referred to another specialist.

A patient completed the lung cancer risk assessment questionnaire. The resulting score was 0.936779, which was classified as Class 4—very high risk. The patient was advised to undergo a chest X-ray, which revealed a formation in the upper right lung. A subsequent contrast-enhanced CT scan of the chest was performed. The conclusion: CT signs of peripheral cancer in S1 of the upper lobe of the right lung, lymphadenopathy (mts) of the paratracheal intrathoracic lymph nodes, fibrotic-focal changes in S2 and S6 of the right lung and S1–2 and S6 of the left lung, chronic bronchitis, bullous emphysema, and atherosclerosis of the thoracic aorta.

Surgical treatment was carried out as follows: single-port video-assisted thoracoscopic biopsy of the upper lobe of the right lung, right-sided anterolateral thoracotomy, right upper lobectomy with lymph node dissection, and pleural cavity drainage on the right side.

The proposed method for early detection of lung cancer was initially evaluated on a single patient, with a positive outcome. Extensive validation involving a larger patient population is nevertheless required to establish its clinical utility. We anticipate that, upon successful validation, the method will facilitate prompt identification and management of high-risk cases. Future research plans include integrating an ultrasound image recognition module into the system, which is expected to further enhance diagnostic accuracy and facilitate timely intervention for subsequent patient treatment. The method may also assist in the detection of other pulmonary diseases.

5. Conclusions

The detection and treatment of oncological diseases remain among the most pressing issues today. The mortality rate associated with cancer is extremely high. Therefore, early diagnosis of cancer is one of the most critical tasks in both medicine and science. Early detection allows for prolonged patient life expectancy and reduced mortality.

An algorithm for constructing a prototype intelligent system for the early diagnosis of cancer has been developed. Population-based surveys were employed for knowledge acquisition and analysis in lung cancer diagnostics. A conceptual system model was created, detailing the principal processes and their interrelationships. Domain knowledge was subsequently formalized for neural network design: the questions and answers of the lung cancer diagnostic protocol were systematized, and response weights were assigned in accordance with their relative importance. Three methods were used to create the mathematical model: gradient boosting, a multilayer perceptron, and a Hamming network.

A neural network was then built to estimate the risk of lung cancer susceptibility. Finally, an architecture for a web-based intelligent system enabling early lung cancer diagnosis was proposed, with eventual deployment in clinical practice envisaged.

The proposed method for the early diagnosis of lung cancer was tested on one patient and gave a positive result; validation on a larger patient population is still required. If validated, such a method would enable a rapid response to high-risk cases. Future research plans include integrating an ultrasound image recognition module into the system, which is expected to further enhance diagnostic effectiveness, support early detection of a range of pulmonary diseases, and allow the stage of disease progression to be determined. The method may also help identify other diseases of the lungs: pulmonary tuberculosis (disseminated), pulmonary sarcoidosis, pneumonia, and pulmonary fibrosis.

Two challenges can be identified in the research. The first is survey coverage: implementing the application in local clinics would increase the coverage of the surveyed population. The second is that integrating ultrasound imaging into the system requires selecting an image recognition method that is well suited to this expert system.

Author Contributions

Conceptualization, writing—review and editing, I.K. and D.B.; writing—original draft preparation, D.K. (Dinara Kozhakhmetova); investigation, methodology, R.B. and L.K.; software, D.O. and A.B.; supervision, D.K. (Dinara Kurushbayeva). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. AP23485656).

Data Availability Statement

The original contributions presented in this study are included in the article. The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Correction Statement

This article has been republished with a minor change. The change does not affect the scientific content of the article and further details are available within the backmatter of the website version of this article.

Abbreviations

The following abbreviations are used in this manuscript:

DSS	decision support systems
AR	association rules
NN	neural networks
NLP	natural language processing
DDSS	diagnostic decision support systems
ACO	Ant Colony Optimization


Figure 1. Functional diagram of the intelligent system “Oncology”.

Figure 3. NN training process.

Figure 4. The process of signal propagation according to one criterion.

Figure 5. The code that calculates the risk probabilities.

Figure 6. Output of intermediate values of a neural network.

Figure 7. Results of the model obtained using the gradient boosting method.

Figure 8. Parameters of the neural network.

Figure 9. Results of the model obtained using a multilayer perceptron.

Figure 10. Implementation of the model using a multilayer perceptron.

Figure 11. Implementation of the model using the gradient boosting method.

Figure 12. Implementation of the model using the Hamming network.

Figure 13. Lung cancer risk status.

Figure 14. System architecture.

Figure 15. System structure.

Table 2. Rules and answers of the expert system.

Question (weights on the row below)	1st answer	2nd answer	3rd answer	4th answer	5th answer
Age	30–40	40–50	50–60	60 and above	20–30
	0.2	0.5	0.5	0.5
Gender	Male	Female
	1	0.6
Do you smoke?	Yes	No
	1	0.5
Smoking experience	Up to 10 years	From 10 to 20 years	From 20 to 30 years	From 30 and above	I do not smoke
	0.7	0.9	1	1
How many packs/pieces of cigarettes do you consume per day?	Up to 10 pieces	Up to 1 pack	1 pack	From 1 pack and above
	0.8	1	1	1
How many times a year do you get sick with ARVI?	1 time	2 times	periodically	Was not sick
	0.4	0.5	0.8	0.4
Oncologic history of parents and close relatives	One of the parents	Both parents	Among close relatives on the maternal side	Among close relatives on the paternal side	No one has
	0.9	0.9	0.5	0.5
Have you ever had hemoptysis?	Yes	No
	0.9	0.4
Do you ever have shortness of breath, a feeling of lack of air?	Periodically	Often	During physical exertion
	0.5	1	1
Was there a change in the voice?	Yes	No
	0.8	0.2
Do you ever have weakness that is not related to anything?	Periodically	Often	No
	0.3	0.5	0.2
Have you been coughing?	Not at all	A little	Not so little	Very much
	0.2	0.2	0.3	0.8
Have you had problems swallowing?	Not at all	A little	Not so little	Very much
	0.3	0.3	0.4	1
Have you had chest pain?	Not at all	A little	Not so little	Very much
	0.5	0.6	1	1
Have you had pain in your arm or shoulders?	Not at all	A little	Not so little	Very much
	0.5	0.5	0.7	1
Have you ever had a dry cough?	Not at all	A little	Not so little	Very much
	0.4	0.6	0.8	0.8
Have you experienced unrelated weight loss?	Not at all	A little	Not so little	Very much
	0.5	0.5	1
Have you experienced a loss of appetite?	Not at all	A little	Not so little	Very much
	1	1	1	1
Are you registered with a pulmonologist for other lung diseases?	Yes	No
	1	0.5
Birthplace	Region 1	Region 2	Region 3	Region 4	Other
	0.3	0.5	0.8	0.9	0.3
Place of residence	Region 1	Region 2	Region 3	Region 4	Other
	0.3	0.5	0.8	0.9	0.3
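As an illustration of how the rows of Table 2 could be held in code, the sketch below stores three of them as a nested dictionary and looks up the weight of each selected answer. The names `WEIGHTS` and `answer_weights` are hypothetical, and how the network combines these weights into a final score is not reproduced here.

```python
from typing import Dict, List

# Weights copied from three rows of Table 2; the other rows would follow the same shape.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "Gender": {"Male": 1.0, "Female": 0.6},
    "Do you smoke?": {"Yes": 1.0, "No": 0.5},
    "Have you ever had hemoptysis?": {"Yes": 0.9, "No": 0.4},
}


def answer_weights(answers: Dict[str, str]) -> List[float]:
    """Return the Table 2 weight of each selected answer, in the order given."""
    weights = []
    for question, answer in answers.items():
        try:
            weights.append(WEIGHTS[question][answer])
        except KeyError as exc:
            raise ValueError(f"no weight for {question!r} / {answer!r}") from exc
    return weights


# Hypothetical respondent.
respondent = {
    "Gender": "Female",
    "Do you smoke?": "Yes",
    "Have you ever had hemoptysis?": "No",
}
print(answer_weights(respondent))  # [0.6, 1.0, 0.4]
```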

Table 3. Percentage of responses received.

Question (percentages on the row below)	1st answer	2nd answer	3rd answer	4th answer	5th answer
Age	30–40	40–50	50–60	60 and above	20–30
	12.3	18.3	18.7	11	19.6
Gender	Male	Female
	46.6	53.4
Do you smoke?	Yes	No
	24.7	75.3
Smoking experience	Up to 10 years	From 10 to 20 years	From 20 to 30 years	From 30 and above	I do not smoke
	9.4	5.7	6.6	1.9	76.4
How many packs/pieces of cigarettes do you consume per day?	Up to 10 pieces	Up to 1 pack	1 pack	From 1 pack and above
	25.4	41.3	9.5	23.8
How many times a year do you get sick with ARVI?	1 time	2 times	periodically	Was not sick
	26.5	22.8	19.2	31.5
Oncologic history of parents and close relatives	One of the parents	Both parents	Among close relatives on the maternal side	Among close relatives on the paternal side	No one has
	15.1	4.1	5.5	7.8	67.6
Have you ever had hemoptysis?	Yes	No
	93.6	6.4
Do you ever have shortness of breath, a feeling of lack of air?	Periodically	Often	During physical exertion
	5.9	2.3	31.3	60.7
Was there a change in the voice?	Yes	No
	18.7	81.3
Do you ever have weakness that is not related to anything?	Periodically	Often	No
	26.9	10.5	62.6
Have you been coughing?	Not at all	A little	Not so little	Very much
	31.1	53	8.2	7.8
Have you had problems swallowing?	Not at all	A little	Not so little	Very much
	63	26.9	7.3	2.7
Have you had chest pain?	Not at all	A little	Not so little	Very much
	60.3	31.5	5.5	2.7
Have you had pain in your arm or shoulders?	Not at all	A little	Not so little	Very much
	49.8	38.8	7.3	4.1
Have you ever had a dry cough?	Not at all	A little	Not so little	Very much
	40.2	48.9	6.4	4.6
Have you experienced unrelated weight loss?	Not at all	A little	Not so little	Very much
	72.1	19.6	8.2
Have you experienced a loss of appetite?	Not at all	A little	Not so little	Very much
	62.1	26	6.4	5.5
Are you registered with a pulmonologist for other lung diseases?	Yes	No
	7.8	92.2
Birthplace	Region 1	Region 2	Region 3	Region 4	Other
	0.3	0.5	0.8	0.9	0.3
Place of residence	Region 1	Region 2	Region 3	Region 4	Other
	0.3	0.5	0.8	0.9	0.3
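A simple consistency check on Table 3 is to sum the percentages in each row. The sketch below does this for four rows, with values copied as they appear here; the helper name and the 0.5-point tolerance are assumptions. Rows it reports may simply warrant a comparison against the published table.

```python
from typing import Dict, List

# Percentages copied from four rows of Table 3 as they appear here.
ROWS: Dict[str, List[float]] = {
    "Age": [12.3, 18.3, 18.7, 11, 19.6],
    "Gender": [46.6, 53.4],
    "Smoking experience": [9.4, 5.7, 6.6, 1.9, 76.4],
    "Birthplace": [0.3, 0.5, 0.8, 0.9, 0.3],
}


def rows_not_summing_to_100(rows: Dict[str, List[float]], tolerance: float = 0.5) -> Dict[str, float]:
    """Return the rounded sum of every row whose total differs from 100 by more than tolerance."""
    return {
        name: round(sum(values), 1)
        for name, values in rows.items()
        if abs(sum(values) - 100.0) > tolerance
    }


flagged = rows_not_summing_to_100(ROWS)
print(flagged)  # {'Age': 79.9, 'Birthplace': 2.8}
```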

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
