Diagnosing Hyperlipidemia Using Association Rules

Data mining methodologies have been developed for the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. This paper presents a new approach that employs data mining to find association rules as an effective method for detecting hyperlipidemia. The proposed system is built on biochemistry blood parameters and is intended to help physicians in the diagnosis of hyperlipidemia. The basic lipid parameters, namely Total cholesterol, LDL, Triglyceride, HDL and VLDL, are used as inputs to the system, and Hyperlipidemia (T) and Hyperlipidemia (F) results are obtained at the end of the process. Data from 492 patients were evaluated with the proposed system. The results of the decision support system completely matched the physicians' decisions.


INTRODUCTION
Hyperlipidemia is a term doctors use to describe several conditions in which high concentrations of lipids (fats) exist in the bloodstream; it occurs because of an abnormal increase of fats in the blood [1]. Lipid is the scientific term for fats in the blood and, like vitamins and minerals, certain fats are useful to the body as an energy source and help build cells and hormones. Measuring plasma lipid levels makes it possible to identify hyperlipidemia patients so that the necessary precautions can be taken. Most commonly, changes in serum levels are taken as the basis for hyperlipidemia diagnosis. The parameters taken into consideration in hyperlipidemia diagnosis are Total cholesterol, LDL, Triglyceride, HDL and VLDL [2][3][4][5]. These parameters form large data stacks when used in each person's diagnosis. Also, as research of this kind in biology and medical science increases, the amount and variety of data related to real-life cases increases as well [6]. It has become quite difficult for people to comprehend hundreds of characteristics and thousands of images and turn them into meaningful knowledge in modern medical science [7,8]. DNA [9] and protein synthesis [10], biological state measurements [11], graphs [12] and enzymes [6] are some of the main types of data still in use. As the data grew, it became tedious and difficult for medical experts to prepare a guide for reading, simplifying and classifying the findings and for making a decision at the end. Many findings hidden in these data have therefore remained buried in the stack. These data need to be processed automatically in order to obtain useful information. Knowledge discovery in databases, or data mining, is the method most often used to solve these kinds of problems [13].
Increasingly, researchers in medical informatics are using natural language processing, information extraction, and data mining methods to utilize existing clinical data for secondary purposes. Researchers have employed such techniques in decision support efforts, to encode medical reports, to provide structured data for data mining and clinical research, in order entry and in information retrieval [14]. In this context, data mining has been applied successfully in many fields of medical science [13,15]. For instance, there is a lot of work in the literature on discovering rules for the diagnosis and treatment of acute ailments, applying the discovered rules, and performing complementary data mining to define the extension of the working procedures for the refined, shared, organized and produced information in the system [16]. In these works, data warehouses in the medical field are known as clinical warehouses [6]. These clinical warehouses, containing biological, clinical and administrative data, unite the patients' information. Thus, the usability of patient-related systems is improved [17].
Among data mining technologies, finding association rules in transaction databases is the most commonly seen. Association rule induction is a powerful data mining method for finding temporal trends in large datasets. The goal of data mining is to automate the process of finding interesting temporal patterns. The output of a data mining method should be a "summary" of the data sets. Such a goal is difficult to achieve because of the vagueness of the term "interesting". The solution is to define various types of trends (patterns) and to look only for those defined trends in the data sets. One such type of trend is the association rule [18]. Association rules identify the set of items that are most often purchased with another set of items. For example, an association rule may state that "95% of customers who bought items A and B also bought C and D." Association rules may be used for catalog design [19], biomedical applications [20], store layout [21], product placement, target marketing [22], etc. [23]. In biomedicine, association rules have been used in microarray gene expression applications [24], ant colony system applications [25], biomedical data classification [26], visual text mining [27], and data extraction from large databases [28].
In this study, a decision support system that will be helpful for the diagnosis of hyperlipidemia is developed; it performs classification using association rules, one of the data mining techniques. Lipid parameters from biochemistry data were used in the application, and the data of 492 patients were evaluated successfully.

THEORETICAL VIEW
In this part, background information that makes the construction of the decision support system for hyperlipidemia diagnosis easier to understand is presented in subsections.

Lipid Parameters
Hyperlipidemia is related to an increased concentration of plasma lipoproteins. More than one lipoprotein class can accumulate because of increased synthesis or release into the circulation, or because of decreased clearance or removal from the circulation. These changes in metabolic events are often related to changes in the apolipoproteins, receptors, enzymes or cofactors involved in lipoprotein metabolism. Changes of this kind that stem from genetic alterations are classified as primary disorders of lipid metabolism. In hyperlipidemia, Total Cholesterol, LDL, HDL, Triglyceride and VLDL values are taken as the basis [2][3][4][5][29]. These parameters are briefly explained below.
Total Cholesterol: Cholesterol is a steroid alcohol (sterol) found in animal fats and oils. It is widely distributed throughout the body, especially in the blood, brain, liver, kidneys, and nerve fiber myelin sheaths, and it is an essential component of cell membrane development and production of bile acids, adrenal steroids, and sex hormones. Cholesterol testing evaluates the risk of atherosclerosis, myocardial occlusion, and coronary arterial occlusion. Cholesterol relates to coronary heart disease (CHD) and is an important screening test for risk factors. It is part of the lipid profiles. Elevated cholesterol levels are a major component in the hereditary hyperlipoproteinemias.
Triglycerides: Triglycerides account for >90% of dietary intake and comprise 95% of fat stored in tissues. Because they are insoluble in water, they are the main plasma glycerol ester. Normally stored in adipose tissue as glycerol, fatty acids, and monoglycerides, the liver reconverts these to triglycerides [...] are in VLDL, and 15% are in LDL. This test evaluates suspected atherosclerosis and measures the body's ability to metabolize fat. Elevated triglycerides, together with elevated cholesterol, are atherosclerotic disease risk factors. Because cholesterol and triglycerides can vary independently of each other, measurement of both values is more meaningful.
HDL: HDL-C is a class of lipoproteins produced by the liver and intestines. HDL is composed of phospholipids and 1 or 2 apolipoproteins. It plays a role in the metabolism of the other lipoproteins and in cholesterol transport from peripheral tissues to the liver. LDL and HDL may combine to maintain cellular cholesterol balance through the mechanism of LDL moving cholesterol into the arteries and HDL removing it from the arteries. Decreased HDL levels are atherogenic, whereas elevated HDL levels protect against atherosclerosis by removing cholesterol from vessel walls and transporting it to the liver, where it is removed from the body.
LDL-VLDL: Most serum cholesterol is present in the LDL. LDLs are the cholesterol-rich remnants of the VLDL lipid transport vehicle. Because LDL has a longer half-life (3-4 days) than its precursor VLDL, LDL is more prevalent in the blood. It is mainly catabolized in the liver and possibly in nonhepatic cells as well. The VLDLs are major carriers of triglycerides. Degradation of VLDL is a major source of LDL. Circulating fatty acids form triglycerides in the liver, and these are packaged with apoprotein and cholesterol to be exported into the blood as VLDLs. Therefore, LDL is the test of choice because of its longer half-life and the fact that VLDLs are extremely hard to measure. This test is specifically done to determine CHD risk. LDLs are closely associated with increased incidence of atherosclerosis and CHD [30].

Data Mining
Data mining has recently emerged as a growing field of multidisciplinary research. It combines disciplines such as databases, machine learning, artificial intelligence, statistics, automated scientific discovery, data visualization, decision science, and high performance computing. It is our contention that data mining techniques can be used to provide the knowledge required to assist users in locating relevant information on the web, through automating the analysis of the current user's navigation and combining this with data-mined knowledge of multiple users' buying behaviour [31]. For instance, some standard software used in medical diagnosis systems was problematic because of unsystematic data, the absence of control, the use of too many different data types, and the inability to perform consistent and systematic analysis on databases, compare examples and determine critical differences. Data mining techniques are also accepted in the medical science world because they make work easier for experts and provide necessary and important help for practitioners [32]. In fact, data mining is regarded as a part of the knowledge discovery process both in medical science and in other fields [33]. Data mining stages are presented in Figure 1.
Data mining interacts with the user and the database. Interesting data patterns are shown to the user [35] and can also be saved in the database if desired. Accordingly, data mining continues until the hidden data patterns are found. To obtain meaningful information from the databases, the necessary data are first retrieved, then classified and processed. Recording the state of a patient and predicting the characteristics of data such as the laboratory test results, findings and signals of all patients is an important problem. These are also problems in machine learning and data mining, which work in many fields such as classification and problem detection [8]. Classification in data mining is used for the automatic identification of objects of interest in large data and for knowledge discovery in applications such as the classification of market trends. There are many methods for classifying such data; decision trees, association rules and genetic algorithms are among these techniques [36].

Association Rules
A number of data mining algorithms have been introduced to the community that perform summarization of the data, classification of data with respect to a target attribute, deviation detection, and other forms of data characterization and interpretation. One popular summarization and pattern extraction algorithm is the association rule algorithm, which identifies correlations between items in transactional databases. Given a set of transactions, each described by an unordered set of items, an association rule X → Y may be discovered in the data, where X and Y are conjunctions of items. The intuitive meaning of such a rule is that transactions in the database which contain the items in X tend to also contain the items in Y. An example of such a rule might be that many observed customers who purchase tires and auto accessories also buy some automotive services. In this case, X = {tires, auto accessories} and Y = {automotive services}. Two numbers are associated with each rule that indicate the support and confidence of the rule. The support of the rule X → Y represents the percentage of transactions from the original database that contain both X and Y. The confidence of rule X → Y represents the percentage of transactions containing items in X that also contain items in Y. Applications of association rule mining include cross marketing, attached mailing, catalog design and customer segmentation. An association rule discovery algorithm searches the space of all possible patterns for rules that meet the user-specified support and confidence thresholds. The problem of discovering association rules can be divided into two steps:
1. Find all itemsets (sets of items appearing together in a transaction) whose support is greater than the specified threshold. Itemsets with minimum support are called frequent itemsets.
2. Generate association rules from the frequent itemsets. To do this, consider all partitionings of the itemset into rule left-hand and right-hand sides. Confidence of a candidate rule X → Y is calculated as support(X ∪ Y) / support(X). All rules that meet the confidence threshold are reported as discoveries of the algorithm.
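The two-step procedure can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the implementation used in the study: the function names, the toy transactions and the thresholds are all assumptions introduced here, and the sketch keeps itemsets whose support is greater than or equal to the threshold. Practical miners such as Apriori prune candidates far more aggressively.

```python
from itertools import combinations

# Toy transactions, invented for illustration (item names follow the tires example).
transactions = [
    frozenset({"tires", "auto accessories", "automotive services"}),
    frozenset({"tires", "auto accessories", "automotive services"}),
    frozenset({"tires", "auto accessories"}),
    frozenset({"tires"}),
    frozenset({"automotive services"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate every itemset whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so none can exist at larger sizes
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent X and consequent Y."""
    rules = []
    for itemset, s_xy in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                y = itemset - x
                # Every subset of a frequent itemset is itself frequent, so x is in the dict.
                conf = s_xy / frequent[x]  # support(X ∪ Y) / support(X)
                if conf >= min_confidence:
                    rules.append((x, y, s_xy, conf))
    return rules


if __name__ == "__main__":
    freq = frequent_itemsets(transactions, min_support=0.4)
    for x, y, s, c in association_rules(freq, min_confidence=0.6):
        print(sorted(x), "->", sorted(y), f"support={s:.2f} confidence={c:.2f}")
```

The exhaustive enumeration in step 1 is exponential in the number of items; it is chosen here only because it mirrors the textual description directly.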
In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule. In association analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any items in common). The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.) Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent [37].
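These two measures can be written compactly. The notation below is an assumption introduced for illustration and does not come from the source: D denotes the set of transactions, |·| a count, and supp and conf the relative support and the confidence.

```latex
\[
\operatorname{supp}(Z) = \frac{\lvert \{\, t \in D : Z \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{supp}(X \rightarrow Y) = \operatorname{supp}(X \cup Y),
\qquad
\operatorname{conf}(X \rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad X \cap Y = \emptyset .
\]
```

As a purely hypothetical numerical case, if 40 of 100 transactions contain X and 30 of them also contain Y, the rule X → Y has support 30/100 = 0.30 and confidence 30/40 = 0.75.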

DEVELOPED METHOD
The most common tests that help physicians decide on a diagnosis are biochemistry tests, which are mostly successful in diagnosing. A hyperlipidemia diagnosis can be obtained by checking the lipid parameters considered for hyperlipidemia in the biochemistry test results. The construction of the developed decision support system is shown in Figure 2. The functions of the system consist of the following steps: [...] rational decision of basic trends. Thus, there is growing pressure for intelligent data analysis techniques to facilitate the creation of knowledge to support clinicians in making decisions.
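One way to picture how lipid parameters can feed an association rule step is to turn each patient record into a set of items and then measure a rule that ends in Hyperlipidemia (T). The sketch below is only an illustration under stated assumptions: the cut-off values, the item names, the helper functions and the four patient records are all hypothetical and are not taken from the study, which does not state its thresholds in this section.

```python
# Hypothetical cut-offs in mg/dL, chosen only for illustration (not from the source).
CUTOFFS = {
    "total_cholesterol": 200,
    "ldl": 130,
    "triglyceride": 150,
    "vldl": 30,
}
HDL_LOW = 40  # hypothetical: HDL below this value is flagged as low

# Invented toy records; "diagnosis" stands for the physician's decision.
records = [
    {"total_cholesterol": 250, "ldl": 170, "triglyceride": 210, "hdl": 35, "vldl": 42, "diagnosis": True},
    {"total_cholesterol": 180, "ldl": 100, "triglyceride": 120, "hdl": 55, "vldl": 24, "diagnosis": False},
    {"total_cholesterol": 230, "ldl": 150, "triglyceride": 140, "hdl": 45, "vldl": 28, "diagnosis": True},
    {"total_cholesterol": 190, "ldl": 110, "triglyceride": 130, "hdl": 50, "vldl": 26, "diagnosis": False},
]


def to_items(record):
    """Encode one patient record as a transaction (a set of items)."""
    items = set()
    for name, limit in CUTOFFS.items():
        if record[name] >= limit:
            items.add(f"{name}=high")
    if record["hdl"] < HDL_LOW:
        items.add("hdl=low")
    items.add("Hyperlipidemia(T)" if record["diagnosis"] else "Hyperlipidemia(F)")
    return frozenset(items)


def rule_stats(antecedent, consequent, transactions):
    """Return (relative support, confidence) of the rule antecedent -> consequent."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n_x = sum(x <= t for t in transactions)
    n_xy = sum(xy <= t for t in transactions)
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


if __name__ == "__main__":
    tx = [to_items(r) for r in records]
    s, c = rule_stats({"total_cholesterol=high", "ldl=high"}, {"Hyperlipidemia(T)"}, tx)
    print(f"support={s:.2f} confidence={c:.2f}")
```

Encoding the diagnosis as an ordinary item lets the same support and confidence measures rank rules whose consequent is Hyperlipidemia (T) or Hyperlipidemia (F).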

DISCUSSION AND CONCLUSION
As the knowledge obtained from biomedical research increases, the processes of reading, simplifying and classifying the findings and making a decision become more complex day by day. Understanding the major risk factors of a disease is important for clinicians in prevention strategy. The attending physician plays an important role in providing information to reduce those risk factors. It is up to the physician whether to warn patients at risk about the major causes of a particular disease and the degree of risk that they are facing. These processes have been automated since data mining entered this field. It has considerably helped medical experts and made it easier to prepare a guide. Also, many findings hidden in the data stacks obtained in these fields have been turned into useful information through data mining.
In this paper we dealt with association rules. We restricted ourselves to the "classic" association rule problem, that is, the generation of all association rules that hold in the medical data with respect to minimal thresholds for support and confidence. The proposed system is based on association rules, on which many intelligent diagnosis systems are built. Here, association rules, which play an important part in data mining, have been used for the feature extraction and classification stage. For this disease, the decision tree technique, another data mining technique, has been applied [38]. On the other hand, since association rules yield results with percentage rates, they are better suited than the decision tree technique to diseases that cannot be separated by definite rules.
The developed decision support system will be considerably helpful to expert physicians and practitioners in the interpretation of illnesses. The same system structure can also be used in the diagnosis of any illness for which criteria with well-defined parameters can be checked.

Figure 2. The algorithm of the decision support system.

Table 3. Performance of the decision support system.