Introduction of Materials Genome Technology and Its Applications in the Field of Biomedical Materials

Traditional research and development (R&D) on biomedical materials depends heavily on trial and error, which imposes a heavy economic and time burden. Recently, materials genome technology (MGT) has been recognized as an effective approach to addressing this problem. In this paper, the basic concepts of MGT are introduced, and the applications of MGT in the R&D of metallic, inorganic non-metallic, polymeric, and composite biomedical materials are summarized. In view of the existing limitations of MGT for the R&D of biomedical materials, potential strategies are proposed for the establishment and management of material databases, the upgrading of high-throughput experimental technology, the construction of data mining prediction platforms, and the training of relevant materials talent. Finally, future trends of MGT for the R&D of biomedical materials are proposed.


Introduction
As a dynamic and fast-growing branch of materials science, biomedical materials are used to diagnose and treat physiological diseases and to repair or replace biological tissues or organs in order to enhance or restore their functions. Biomedical materials have been extensively applied in clinical practice in the form of various medical devices, including sutures, scaffolds, dentures, artificial bones, and even artificial hearts. The research and development (R&D) of innovative biomedical materials requires breakthroughs across a wide range of fields, including materials science, engineering, medicine, and life science.
The utilization of biomedical materials dates back to 3500 B.C. Since the first use of horsehair as a suture, rapid developments in life science and materials science have led to extensive applications of biomedical materials. However, because tissues and organs are extremely complex, the ability of biomedical materials to precisely regulate growth, regeneration, and repair remains far from satisfactory. Growing interest in the structure-property relationship of biomedical materials, as well as in the interactions between biomedical materials and cells/tissues/organs, drives investigations that foster the R&D of new biomedical materials. Such R&D is often achieved through the development of novel chemical structures, the combination of materials, or the fusion of materials with living cells.
Traditionally, most biomedical materials are developed via the trial-and-error method. In detail, different material components and synthetic techniques are explored on the basis of existing theories or experience to achieve the desired material properties. During this process, repeated experiments are usually required to validate the design or formulation, which inevitably wastes material and time. Such a trial-and-error approach may be effective at small scale, but it greatly hinders the innovation of materials and the development of related industries for complex tasks or at large scale. In terms of economic and time costs, there is an urgent need for a new method to overcome the drawbacks of the traditional trial-and-error method in the R&D of biomedical materials [1][2][3][4].
Materials genome technology (MGT) offers a more efficient route to materials research than the traditional trial-and-error method. MGT combines high-throughput experimental techniques with data management and computational tools. The data are analyzed computationally to explore potential links between material parameters and material properties. In this way, the ideal material can be discovered more efficiently, which can improve experimental efficiency, reduce costs, and perhaps reduce experimental errors.
This review presents recent progress in the application of MGT to the R&D of biomedical materials, with the aim of halving their development cycle and cost. Core concepts of biomedical materials and MGT are briefly introduced, followed by the application of MGT in the R&D of metallic, inorganic non-metallic, polymeric, and composite biomedical materials. Finally, future trends in the MGT-empowered R&D of innovative biomedical materials are proposed.

Materials Genome
As the first program on materials genomic study, the Materials Genome Initiative (MGI) was launched in 2011 and includes three major aspects covering database establishment, experimental techniques, and means of material calculation (Figure 1b) [5][6][7][8].
Through a "simulation and prediction followed by experimental verification" approach, the MGI aims to establish the link between composition, process, microstructure, and performance (Figure 1c) to facilitate the R&D of materials [5]. Such a link can subsequently be used in the design and optimization of new materials to meet performance demands. As a consequence, MGT may halve both the development cycle and the development cost of new materials in industry, which is almost impossible through the traditional trial-and-error approach (Figure 1a).
Figure 1. ... [9] as an example (a). Three elements of MGI (b) and the connection between material composition, synthesis technology, structure, and material performance (c).
Database: Data are the basis for materials research, yet it is hard to find valid, high-quality data within the massive amount of data available, and the creation of databases becomes ...
Material foundation, non-ferrous materials and special alloys, ferrous materials, composite materials, organic polymer materials, inorganic non-metallic materials, information materials, energy materials, biomedical materials, natural materials and products, building materials and road traffic materials
High-throughput experimental tools: As an essential part of MGT, high-throughput synthesis and characterization of materials refers to the preparation and characterization of samples with different structures or components in parallel and in large quantities in a relatively short time [15]. Individual experiments usually yield only limited data, inefficiently and with inevitable human error. High-throughput experiments can improve the accuracy and reproducibility of the data at higher efficiency, and consequently accelerate the establishment of material databases, the testing of theoretical models, and the screening of new materials. Nowadays, high-throughput/combinatorial methods have been successfully applied to the development and production of metallic, ceramic, inorganic, and polymeric materials [16][17][18][19][20][21].
Materials calculation methods: Materials calculation methods usually refer to various types of materials computing software and algorithmic models. The complex potential connections between material components, structures, processes, and properties usually cannot be discovered directly by researchers, so computational tools are urgently needed. Although high-throughput first-principles calculations and density functional theory have achieved appreciable success in predicting and optimizing new materials, the huge computational cost makes it impossible to obtain ideal results quickly when the structure is more complex or the material search space is larger [21]. Because of their ability to establish accurate material performance prediction models from existing theoretical and empirical data, machine learning techniques have received great attention in predicting material properties [18,22,23], optimizing material composition [24,25], and discovering new materials [26].

Algorithmic Models in Material Genome Technologies
Because of the large volume of computational tasks and the automatic processing of computational results, it is extremely challenging to make full use of the huge and complex data and to reveal potential connections between relevant parameters and performance of materials. Computational tools, including various software and algorithm models, could be harnessed to solve the challenges in the processing of large amounts of data.
Regression, classification, or clustering tasks are usually employed in the processing of biomaterial data. Regression tasks mainly deal with continuous (real-valued) outputs, while classification and clustering tasks produce discrete outcomes from labeled and unlabeled input data, respectively. Both regression and classification aim to discover the relationship between data points through a predictive model and to achieve reliable predictions; for these purposes, some algorithms, such as support vector machines (SVM), random forests, and Bayesian neural networks, can be used for both regression and classification tasks on biomedical materials data (Table 2).

Table 2.
Algorithm: Radial basis function neural network (RBFNN). Principle: RBF is used as the activation function of the hidden layer neurons, and the output layer is a linear combination of the outputs of the hidden layer neurons. Characteristics: The structure is simple, training is concise, and learning converges quickly; it can approximate any nonlinear function and overcomes the local minimum problem. Approximation ability, classification ability, and learning speed are better than those of BPNN, but more neurons are needed. References: [33,40-47]
Task: Classification. Algorithm: Probabilistic neural network (PNN). Principle: A branch of RBFNN that combines density function estimation and Bayesian decision theory on the basis of the RBF network. Characteristics: Fault tolerance is good, the classification result is not sensitive to the choice of radial basis function, the number of neurons in each layer is fixed, and there is no need to retrain when the samples change; however, every sample has to be calculated and stored. References: [48-53]
Task: Regression & Classification. Algorithm: Support vector machine and support vector regression (SVM & SVR). Principle: Establish the maximum-margin separating line or hyperplane for sample classification, and find a balance between model learning accuracy and learning ability to obtain the best generalization ability. Characteristics: It can solve nonlinear problems with high precision and good generalization ability; it is difficult to apply to large-scale training samples and to multi-classification problems. References: [20,33,54-65]
Task: Regression & Classification. Algorithm: Random forest (RF). Characteristics: For unbalanced data sets, errors can be balanced; high-dimensional data can be processed with good accuracy; however, good classification results may not be obtained with small samples, and overfitting is likely in some noisy regression problems. References: [20,66-74]
Task: Clustering. Algorithm: K-means clustering. Principle: Use distance as the similarity evaluation index to group samples with high similarity into a cluster. Characteristics: The algorithm is simple and fast, with strong interpretability and a good clustering effect; however, its parameters are difficult to determine, it is sensitive to noise and abnormal points, it clusters severely unbalanced data poorly, and it takes a long time to process large sample sizes. References: [75-81]
Algorithm: Convolutional neural network (CNN). Principle: Multi-layer representation of the target using convolution and a multi-layer network structure; the abstract information contained in the data is expected to be expressed through multi-layer high-level features to obtain better feature robustness. Characteristics: Local connection and weight sharing greatly reduce the number of parameters, high-dimensional data can be processed without difficulty, the risk of overfitting is reduced, features need not be selected manually, and image data need no complicated preprocessing; however, a large sample size is required for parameter tuning, a GPU is preferable for training, and the physical meaning is not clear. References: [82-88]

Back Propagation Neural Network
Back propagation neural network (BPNN) [89] is characterized by two processes: forward propagation and error back propagation. In forward propagation, data are processed layer by layer, and the error between the result of the output layer and the actual sample value is evaluated. In back propagation, the calculated error is propagated backward, and the weights and thresholds of the network are continuously adjusted to minimize the sum of squared errors of the network. Because of its good nonlinear mapping ability and strong self-learning and self-adaptive ability, BPNN has become one of the most widely used neural networks at present. However, for a complex target task, the network converges slowly and is easily trapped in a local minimum, so the globally optimal result may not be obtained.
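The two processes described above can be sketched as follows. This is a minimal illustration rather than the implementation used in any cited study: the XOR data set, the single hidden layer of three sigmoid neurons, the learning rate, and the function names `forward` and `train_bpnn` are all assumptions chosen for brevity.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward propagation: input -> hidden layer -> single output neuron."""
    xb = list(x) + [1.0]  # the constant 1.0 carries the threshold (bias)
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = h + [1.0]
    y = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return xb, h, hb, y

def train_bpnn(samples, n_hidden=3, lr=0.3, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = []
    for _ in range(epochs):
        g_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_out = [0.0] * (n_hidden + 1)
        loss = 0.0
        for x, t in samples:
            xb, h, hb, y = forward(x, w_hidden, w_out)
            loss += 0.5 * (y - t) ** 2
            # Error back propagation: output delta first, then hidden deltas.
            d_out = (y - t) * y * (1.0 - y)
            for j in range(n_hidden + 1):
                g_out[j] += d_out * hb[j]
            for j in range(n_hidden):
                d_hid = d_out * w_out[j] * h[j] * (1.0 - h[j])
                for i in range(n_in + 1):
                    g_hidden[j][i] += d_hid * xb[i]
        losses.append(loss)
        # Adjust weights and thresholds against the accumulated gradient.
        for j in range(n_hidden + 1):
            w_out[j] -= lr * g_out[j]
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hidden[j][i] -= lr * g_hidden[j][i]
    return w_hidden, w_out, losses

# Hypothetical toy data: the XOR truth table.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_hidden, w_out, losses = train_bpnn(XOR)
print("half sum of squared errors: first epoch %.4f, last epoch %.4f" % (losses[0], losses[-1]))
```

Depending on the random initial weights, such a network may settle at a poor solution, which is the local-minimum behavior noted above.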

Radial Basis Function Neural Network
Radial basis function neural network (RBFNN) is a feedforward neural network with a single hidden layer that is based on function approximation [90]. After a radial basis function such as the Gaussian or multiquadric function is selected, the output is obtained according to the distance between the sample and the center point. Compared with BPNN, RBFNN is structurally simpler, converges faster, and is rarely trapped in a local optimum. In addition, RBFNN not only has a powerful nonlinear approximation capability, which transforms linearly inseparable problems into linearly separable ones, but can also be applied to data classification problems. However, it is difficult to determine the center points of the hidden layer, the width of the radial basis function, and the number of nodes, all of which may have a substantial impact on the output.
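A minimal sketch of this idea is given below, under simplifying assumptions: a Gaussian basis function, one center placed on every training sample, and a fixed width, so that the output weights follow from a single linear solve. The helper names (`fit_rbfnn`, `predict_rbfnn`, `solve`) and the XOR data are illustrative only; practical RBFNNs usually select fewer centers, which is the difficulty mentioned above.

```python
import math

def gaussian(r, width):
    return math.exp(-(r * r) / (2.0 * width * width))

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_rbfnn(X, y, width):
    # Hidden layer: one Gaussian unit per training sample (assumption).
    Phi = [[gaussian(dist(a, c), width) for c in X] for a in X]
    # Output layer: linear combination of hidden outputs, fitted exactly.
    return solve(Phi, y)

def predict_rbfnn(x, centers, weights, width):
    return sum(w * gaussian(dist(x, c), width) for w, c in zip(weights, centers))

# Hypothetical toy data: XOR is not linearly separable in the input space.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0.0, 1.0, 1.0, 0.0]
WIDTH = 0.7
weights = fit_rbfnn(X, y, WIDTH)
for xi, ti in zip(X, y):
    print(xi, ti, round(predict_rbfnn(xi, X, weights, WIDTH), 6))
```

Changing `WIDTH` or the set of centers changes the fitted surface between the training points, which illustrates why these choices affect the output.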

Probabilistic Neural Network
Probabilistic neural network (PNN) is a simply structured neural network based on Bayesian decision theory and is often used for pattern classification. PNN can also be regarded as a branch of RBFNN that combines density function estimation and Bayesian decision theory [91]. The unique advantage of PNN is that the network need not be retrained when samples are added or removed, and the classification results are insensitive to the choice of radial basis function. In addition, the number of neurons in each layer of the network is relatively fixed, which makes the network easy to implement in hardware. The drawback of PNN lies in its high computational and storage complexity, because every sample must be computed and stored individually.
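The following sketch illustrates the decision rule under stated assumptions: a Gaussian kernel with a hand-picked smoothing parameter `sigma`, equal class priors, and a hypothetical two-dimensional data set. The function name `pnn_classify` is illustrative.

```python
import math

def pnn_classify(x, train, sigma=0.5):
    """Assign x to the class with the largest average Gaussian kernel density."""
    scores = {}
    counts = {}
    for xi, label in train:  # every stored sample contributes one kernel
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(scores, key=lambda c: scores[c] / counts[c])

# Hypothetical labeled samples.
train = [
    ((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"),
    ((2.0, 2.0), "B"), ((1.8, 2.1), "B"), ((2.2, 1.9), "B"),
]
print(pnn_classify((0.1, 0.1), train))
print(pnn_classify((1.9, 2.0), train))

# Adding a sample of a new class requires no retraining, only storage.
train.append(((4.0, 4.0), "C"))
print(pnn_classify((3.9, 4.1), train))
```

The loop over all stored samples at prediction time is the computation and storage cost noted above.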

Support Vector Machine and Support Vector Regression
SVM is a binary classification model that aims to find the line or plane with the largest geometric margin to classify the samples [92]. For nonlinear problems, the vectors can be mapped to a higher-dimensional space in which the best separating plane is sought. Support vector regression (SVR) is an application of SVM to regression problems [93]. Instead of separating the sample points, it finds a plane that fits all the sample data so that the total deviation of the samples from the plane is minimized. Both are powerful models that use relatively few samples to find a balance between learning accuracy and learning ability and thereby obtain the best generalization ability. However, difficulties remain when dealing with multi-classification problems and large-scale samples, and the results are sensitive to the choice of kernel function.
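A minimal sketch of a linear soft-margin SVM is shown below. It is trained by subgradient descent on the regularized hinge loss, which is one common formulation and an assumption here; no kernel mapping is included. The data, the hyperparameters, and the names `train_linear_svm`, `svm_predict`, and `eps_insensitive_loss` are illustrative. The last function shows the epsilon-insensitive loss commonly used in SVR, in which deviations smaller than `eps` are not penalized.

```python
def train_linear_svm(samples, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by full-batch subgradient descent."""
    n_features = len(samples[0][0])
    n = len(samples)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                for i in range(n_features):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def eps_insensitive_loss(residual, eps=0.1):
    """SVR-style loss: zero inside the eps tube, linear outside it."""
    return max(0.0, abs(residual) - eps)

# Hypothetical linearly separable samples with labels +1 / -1.
DATA = [
    ((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
    ((-2.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1),
]
w, b = train_linear_svm(DATA)
print("w =", w, "b =", b)
print(svm_predict((1.0, 1.0), w, b), svm_predict((-1.0, -1.0), w, b))
```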

Random Forest
Random forest (RF) was proposed as a machine learning algorithm for classification and regression [94]. The "forest" means that the RF algorithm is a combination of multiple decision trees, and "random" means that each tree is trained on a randomly selected subset of the samples, while the remaining samples are used for error evaluation. RF achieves appreciable accuracy even when applied to large data sets and data sets with missing values. However, when dealing with small or low-dimensional data sets, RF may not produce good classification results. In addition, overfitting is likely to occur when processing some noisy data.
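The random subset selection and the evaluation on the remaining samples can be sketched as follows. To keep the example short, each "tree" is reduced to a one-split decision stump on a randomly chosen feature, which is a strong simplification; the data and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Pick the threshold on one feature with the fewest training errors."""
    best = None
    for x, _ in rows:
        thr = x[feature]
        left = [lab for v, lab in rows if v[feature] <= thr]
        right = [lab for v, lab in rows if v[feature] > thr]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, feature, thr, l_lab, r_lab)
    return best[1:]

def stump_predict(stump, x):
    feature, thr, l_lab, r_lab = stump
    return l_lab if x[feature] <= thr else r_lab

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0][0]))        # random feature
        stump = fit_stump([rows[i] for i in idx], feature)
        out_of_bag = set(range(len(rows))) - set(idx)   # samples not drawn
        forest.append((stump, out_of_bag))
    return forest

def forest_predict(forest, x):
    votes = Counter(stump_predict(s, x) for s, _ in forest)
    return votes.most_common(1)[0][0]

def oob_error(forest, rows):
    """Error on each sample, voted only by trees that did not train on it."""
    wrong = total = 0
    for i, (x, lab) in enumerate(rows):
        votes = Counter(stump_predict(s, x) for s, oob in forest if i in oob)
        if votes:
            total += 1
            wrong += votes.most_common(1)[0][0] != lab
    return wrong / total if total else None

# Hypothetical two-class data.
ROWS = [((1, 1), 0), ((1, 2), 0), ((2, 1), 0), ((2, 2), 0),
        ((8, 8), 1), ((8, 9), 1), ((9, 8), 1), ((9, 9), 1)]
forest = fit_forest(ROWS)
print(forest_predict(forest, (0.5, 0.5)), forest_predict(forest, (10, 10)))
print("out-of-bag error:", oob_error(forest, ROWS))
```

With only eight samples, individual bootstrap draws differ considerably, which hints at why small data sets can give unstable results.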

K-Means Clustering
For unlabeled sample data, only the similarity between the data can be used for grouping: unlabeled samples with high similarity form a cluster, and this process is called clustering [95]. K-means clustering is one of the most classical clustering algorithms. Given the number of clusters, the initial cluster centroids are set randomly. The spatial distance is used as the evaluation index of similarity, so that the sample points within a cluster are as close as possible and the sample points of different clusters are as far apart as possible. The algorithm is simple, interpretable, and effective, and is therefore widely used. However, the number of clusters has a great influence on the clustering results, and because there is no reference for choosing it, many trials and considerable experience are needed. The algorithm is also sensitive to outliers and noise, and it is difficult to obtain good clustering results for severely imbalanced samples.
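The procedure described above can be sketched as follows, assuming squared Euclidean distance, initial centroids drawn at random from the data, and a hypothetical set of six two-dimensional points; the function name `kmeans` is illustrative.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Hypothetical unlabeled points forming two well-separated groups.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(POINTS, k=2)
print(labels)
print(centroids)
```

The number of clusters `k` must be supplied by the user, which reflects the dependence on trials and experience mentioned above.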

Convolutional Neural Network
Inspired by Hubel's research on cat visual cortex cells, the convolutional neural network (CNN) was proposed [96]. The combination of convolutional and pooling layers in a CNN automatically performs feature extraction, while local connectivity and weight sharing greatly reduce the number of model parameters, lower the risk of overfitting, and reduce the complexity of the model. This is what distinguishes CNN from other neural networks. This advantage is even more evident when processing speech, image, or video data.
Although CNN does not require manual feature selection, which reduces human intervention, the question of which features are automatically extracted remains unanswered for the time being. In addition, CNN requires a large number of samples for model tuning and usually requires a GPU for model training. The opaque behavior and uncertain working principle of CNN have been questioned, but its performance is greatly improved over that of other methods.
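Weight sharing and pooling can be illustrated with a small sketch. The 5 x 5 image, the 2 x 2 kernel, and the function names are hypothetical, and the kernel is fixed by hand here, whereas a CNN would learn it during training. As in most CNN implementations, the "convolution" is computed as a cross-correlation without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# Hypothetical image with a vertical edge between columns 1 and 2.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to left-to-right intensity increases.
KERNEL = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(IMAGE, KERNEL)
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)

# Parameter count: the same 4 kernel weights are reused at all 16 positions,
# whereas a fully connected layer from 25 inputs to 16 outputs needs 25 * 16.
conv_weights = len(KERNEL) * len(KERNEL[0])
dense_weights = 25 * 16
print(conv_weights, dense_weights)
```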
The above-mentioned algorithm models have been widely used in processing biomedical data, and improvements have been extensively proposed.

Metallic Materials
Metals, especially alloys, have excellent mechanical properties, fatigue resistance, processability, and appreciable biocompatibility, and are widely used in the fabrication of implantable medical devices for the treatment of orthopedics, dentistry, and cardiovascular diseases. However, metallic materials are susceptible to the physiological environment and may lead to a series of problems such as degradation/corrosion, toxicity, and fatigue failure [97][98][99][100].
Amorphous alloys possess good strength, hardness, wear resistance, corrosion resistance, and soft magnetic properties that are not available in traditional alloys, and therefore have broad biomedical applications [101,102]. Although empirical guidelines have been constructive in the design of amorphous alloys, such approaches are characteristically time-consuming and somewhat haphazard [103]. The intervention of artificial intelligence can not only improve R&D efficiency but also explore unknown parameter space. With the aid of MGT, the relationship between the resistivity and glass-forming ability (GFA) of amorphous Ir-Ni-Ta-(B) alloys was explored via high-throughput characterization of resistance and composition. A set of high-throughput development methods for amorphous materials was built, including the preparation of composite films and the rapid characterization of composition, structure, and glass-forming ability, and a class of high-temperature amorphous materials was successfully designed [104]. Through SVM classification, the GFA of binary alloys with random composition was predicted, and the prediction efficiency was further improved by using a larger database and changing the input descriptors (Figure 2) [54]. In addition, to better understand and predict the GFA of new alloys, a machine learning clustering technique was harnessed to learn the structural properties of metallic glasses [105].
Pure titanium is known not only for its superior mechanical performance but also for its reactivity in certain biochemical environments, which makes it unsuitable for the fabrication of implantable medical devices; titanium alloys, with improved corrosion resistance and better biocompatibility, can substitute for pure titanium in the manufacturing of clinical implant devices. Artificial intelligence is helpful to the innovation of titanium alloys [106][107][108]. Banerjee collected the indentation hardness and elastic modulus of titanium alloy samples to build a database of composition, microstructure, and mechanical properties, which became the basis for building a fuzzy logic-based neural network and making predictions; the predicted results were subsequently validated through experiments [109]. Wan constructed a BPNN to predict the high-temperature rheological stresses of Ti-2.7Cu alloy and to provide theoretical support for practical hot forming of the alloy [110]. Based on the PNN and databases from experimental research on titanium alloys, Kulyk created software to define the optimal microstructure and properties of titanium alloy products [48]. Tkachenko described a method for identifying material categories using second-order Kolmogorov-Gabor polynomials and RF algorithms; this method was then used to determine the basic properties and identify the alloy category of a material based on parameters such as the microstructure and elemental composition of a titanium alloy powder. This approach can be used to optimize the development of powdered materials [66]. Izonin also used Ito decomposition and logistic regression to classify alloys in order to select materials with appropriate properties for the design of biocompatible medical products [111]. Izonin combined Wiener polynomials and SVM to classify medical titanium alloy implant materials; this combined method exhibited higher accuracy as well as shorter training time [55]. The above studies have clearly demonstrated the efficacy of different algorithms in guiding the optimization of the microstructure and processing routes of titanium alloys.
High-entropy alloys (HEAs), a rapidly developing class of new metallic materials, offer high strength together with wear and corrosion resistance, and may find wide clinical application [112]. However, the composition of HEAs is complex, and there is no linear relationship between performance and entropy value, so it is impossible to design multi-component materials with excellent performance merely from the entropy of mixing. In addition, as the number of constituent elements increases, the cost of the alloy rises accordingly. By combining high-throughput experimental techniques with artificial intelligence algorithms, experimental efficiency can be enhanced and a wider composition space can be explored. Moorehead explored the composition space of HEAs through high-throughput synthesis and characterization combined with modeling techniques, which could significantly accelerate the development of alloys and the assessment of their relative stability [113]. Coury utilized high-throughput nano-indentation techniques to effectively predict the yield strength and hardness trends of HEAs; the number of experiments required to find compositions in a large composition space was greatly reduced, thus promoting the development of multicomponent alloys [19]. Liu prepared 138 alloy samples through full-flow high-throughput preparation, and then constructed predictive models using different machine learning algorithms. The newly proposed method was at least 20 times faster than a permutation-based search in the full-component space (Figure 3) [20].
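For orientation, the ideal configurational entropy of mixing is commonly written as ΔS_mix = -R Σ c_i ln c_i, where c_i is the mole fraction of element i. The sketch below evaluates this expression under the ideal-solution assumption; the function name and the example compositions are hypothetical. Because the value depends only on the mole fractions and not on which elements are mixed, it cannot by itself distinguish alloys with different properties.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def mixing_entropy(fractions):
    """Ideal configurational entropy of mixing, -R * sum(c_i * ln c_i)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("mole fractions must sum to 1")
    return -R * sum(c * math.log(c) for c in fractions if c > 0)

# Hypothetical compositions of a five-component alloy.
equiatomic = [0.2] * 5
skewed = [0.6, 0.1, 0.1, 0.1, 0.1]
print(round(mixing_entropy(equiatomic), 3), "J/(mol K)")
print(round(mixing_entropy(skewed), 3), "J/(mol K)")
```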
Distinguishing the phases of high-entropy alloys is important for material design. Ouyang optimized the feature variables and used an SVM model for phase distinction. It was found that the differences in elastic energy and atomic size had a significant effect on the formation of different phases. Importantly, machine learning (especially SVM combined with kernel principal component analysis, KPCA) proved powerful in predicting alloy phases [65].

Polymeric Materials
The demand for polymeric biomaterials, either natural or synthetic, has become increasingly urgent in recent years. Most polymeric materials, including polyethylene, poly(methyl methacrylate), silicone rubber, cellulose, gelatin, and chitosan, are known for their good biocompatibility. However, most of these polymers suffer from insufficient mechanical strength and a mismatch between material degradation and tissue regeneration [3,114,115].
Chitosan nanoparticles have been widely used as drug delivery matrices because of their unique biocompatibility, degradability, and antimicrobial activity. Amani used artificial neural networks to analyze the effect of four preparation variables on the size, drug loading, and cytotoxicity of chitosan nanoparticles, ranked the influence of these variables on the dependent variables, and optimized the nanoparticles [116]. In another work, Amani analyzed the effects of ultrasonication time and amplitude on nanoparticle size during preparation [23].
Alexander and co-workers investigated the preparation of drug release matrices through high-throughput 3D printing of 253 ink formulations, and screened functional properties including the release of paroxetine, cytotoxicity, printability, and mechanical properties [117].
After implantation of biomedical materials, protein adsorption and cell attachment are the main determinants of the applicability of medical implant materials, especially in applications involving tissue regeneration. Both the surface chemistry and the surface physics of a polymeric implant can affect protein adsorption. Because these surface characteristics are complex, a fingerprint profile can be developed as a characteristic representation of polymers to enable the rational discovery of new materials for specific applications. Machine learning models can then be trained to quickly predict the properties of new polymer formulations and to provide the uncertainty of the predictions [118]. In addition, to study pathogen infection of implants, bacterial cell adhesion on the implant surface was investigated; machine learning was utilized to quantitatively predict and screen bacterial adhesion on polymer surfaces, and the screened polymers can be candidates for implants or indwelling medical devices [119].
Poly(lactic acid)/ploylactide (PLA)-based composites are ideal materials in bone repair, but PLA suffers from low cell adhesion on the material surface, poor mechanical properties, and high cost, which greatly limit its clinical applications [120]. Rojek investigated the customized fabrication of a PLA hand exoskeleton using 3D printing technique, and the artificial neural network (ANN) optimization method supported by GA was used to calculate and optimize the process parameters and material selection to achieve the maximum tensile force of the hand exoskeleton component. The combination of AI and 3D printing not only can optimize PLA properties but also has provide a good inspiration for using artificial intelligence to customize patient solutions [39].

Inorganic Materials
Inorganic materials are known for their high melting point, hardness, and resistance to oxidation, as well as their potential biocompatibility in clinical use. The dissolution of metal ions from a metallic implant may cause toxicity to host tissue, and oxide films on the implant surface can be used to address this issue. Both the formation and the thickness of oxide films may significantly affect the surface properties and biocompatibility of a metallic implant, but the effect of process parameters on film thickness is far from being elucidated, and general linear fitting methods cannot meet the needs of modeling such processes. To visually examine the quality of the oxide film on the surface of a magnesium alloy, Yang used a genetic algorithm (GA) to optimize the initial weights and thresholds of a back-propagation neural network (BPNN) to construct a film thickness prediction model (GA-BP). The GA-BP model was found to have better prediction accuracy than the BPNN model [121].
Titanium dioxide (TiO2) nanotube arrays have been found to promote cell adhesion, proliferation, and differentiation, and can bind strongly to titanium substrates; these characteristics can be harnessed to improve the biocompatibility of titanium/titanium alloy implants and have attracted the attention of biomaterial researchers. Mou fabricated gradient TiO2 nanotube micro-patterned films on the surface of titanium to facilitate the high-throughput screening of protein adsorption, platelet adhesion, bacterial adhesion, and the effect of octacalcium phosphate membrane layer construction. The gradient TiO2 nanotube micropatterning proved to be an effective tool in high-throughput screening for biomedical applications [122].
Besides oxide films on metal surfaces, metal oxides themselves are also good options for biomaterials. In order to select suitable metal oxides quickly and accurately, Hu applied machine learning and feature selection to predict the physical properties of metal oxides, and found that the RFR model combined with different feature selection methods (variance threshold, univariate feature selection, and the least absolute shrinkage and selection operator, LASSO) achieved better prediction accuracy [74].
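The GA-BP idea described above, in which a genetic algorithm chooses the starting weights and thresholds and back-propagation then refines them, can be sketched as follows. This is a minimal illustration under stated assumptions and not the model of [121]: the network size, GA settings, and the synthetic two-parameter data set are all hypothetical.

```python
import math
import random

def forward(w, x, n_hidden):
    """One hidden tanh layer, linear output. Flat layout of w:
    per hidden unit [input weights..., bias], then output weights, then output bias."""
    n_in = len(x)
    hidden = []
    for j in range(n_hidden):
        k = j * (n_in + 1)
        s = w[k + n_in] + sum(w[k + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(s))
    base = n_hidden * (n_in + 1)
    out = w[-1] + sum(w[base + j] * h for j, h in enumerate(hidden))
    return out, hidden

def mse(w, data, n_hidden):
    return sum((forward(w, x, n_hidden)[0] - y) ** 2 for x, y in data) / len(data)

def ga_initial_weights(data, n_in, n_hidden, rng, pop_size=30, generations=40):
    """Genetic search over weight vectors; returns (best weights, best error per generation)."""
    n_w = n_hidden * (n_in + 2) + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n_w)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((mse(w, data, n_hidden), w) for w in pop), key=lambda t: t[0])
        history.append(scored[0][0])
        nxt = [scored[0][1], scored[1][1]]                          # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = min(rng.sample(scored, 3), key=lambda t: t[0])[1]  # tournament selection
            p2 = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [g + rng.gauss(0, 0.2) if rng.random() < 0.1 else g for g in child]  # mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda w: mse(w, data, n_hidden))
    history.append(mse(best, data, n_hidden))
    return best, history

def train_bp(w, data, n_in, n_hidden, lr=0.05, epochs=300):
    """Full-batch gradient descent (back-propagation); returns the best weights seen."""
    w = list(w)
    best_w, best_err = list(w), mse(w, data, n_hidden)
    base = n_hidden * (n_in + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            out, hidden = forward(w, x, n_hidden)
            d_out = 2.0 * (out - y) / len(data)
            grad[-1] += d_out
            for j, h in enumerate(hidden):
                grad[base + j] += d_out * h
                d_s = d_out * w[base + j] * (1.0 - h * h)   # derivative through tanh
                k = j * (n_in + 1)
                for i in range(n_in):
                    grad[k + i] += d_s * x[i]
                grad[k + n_in] += d_s
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        err = mse(w, data, n_hidden)
        if err < best_err:
            best_w, best_err = list(w), err
    return best_w

# Synthetic stand-in data: two normalized process parameters -> a thickness-like response.
grid = [i / 4 for i in range(5)]
data = [((a, b), 0.5 * a + 0.3 * b * b + 0.1 * a * b) for a in grid for b in grid]

rng = random.Random(0)
w_ga, history = ga_initial_weights(data, n_in=2, n_hidden=4, rng=rng)
w_final = train_bp(w_ga, data, n_in=2, n_hidden=4)
print("GA best error:", history[-1])
print("after back-propagation:", mse(w_final, data, 4))
```

Elitism keeps the GA's best error from increasing between generations, and the back-propagation stage returns the best weights it encounters, so the refined network is never worse on the training data than the GA starting point.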
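Two of the feature selection steps named above, the variance threshold and univariate selection, can be sketched in a few lines. The sketch is an assumption-laden illustration rather than the pipeline of [74]: the feature names and values are invented, Pearson correlation stands in for the univariate score, and the LASSO step and the RFR model itself are omitted.

```python
import math
import statistics

def variance_threshold(features, threshold=0.0):
    """Names of features whose population variance exceeds the threshold."""
    return [name for name, vals in features.items()
            if statistics.pvariance(vals) > threshold]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def univariate_rank(features, target, names):
    """Rank the named features by absolute correlation with the target."""
    scores = [(name, abs(pearson(features[name], target))) for name in names]
    return sorted(scores, key=lambda t: t[1], reverse=True)

# Invented toy descriptors for five hypothetical oxides and one target property.
features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "feat_const": [7.0, 7.0, 7.0, 7.0, 7.0],   # carries no information
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]

kept = variance_threshold(features)            # drops the constant feature
ranked = univariate_rank(features, target, kept)
print(kept)
print(ranked)
```

Filtering constant features first also avoids a division by zero in the correlation step, which is one practical reason to chain the two methods in this order.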

Composite Materials
Composites are materials made by combining two or more materials with different properties in order to effectively make up for the deficiencies in biological and physicochemical properties of a single material, and further improve the applicability of the material in clinical applications [120,123,124].
Nano-emulsions have been utilized as carriers for oral drugs, but their cytotoxicity and low stability hinder their wide application. Artificial neural network analysis revealed that the concentration of surfactant is the main determinant of stability without causing dose-dependent cytotoxicity. Such findings paved the way for the preparation of nano-emulsions with optimized cytotoxicity and stability [125]. The nature of the composites can have an impact on the drug loading and the release behavior of the loaded drug. Bikiaris prepared a series of poly(ε-caprolactone)/poly(propylene glutarate) (PCL/PPGlu) polymer blends at different weight ratios as the matrix, using risperidone as the model drug, and then evaluated the interaction between the polymer and the drug. An artificial neural network, applied to simulate the drug dissolution behavior, gave a better fit and higher correlation than multiple linear regression (Figure 4) [37].
To better predict and optimize carrier performance, neural networks combined with other algorithms have been explored. Varshosaz combined a genetic algorithm and an artificial neural network to optimize and simulate the synthesis of agar nanospheres from agar, calcium chloride, hydroxypropyl-β-cyclodextrin, and bupropion hydrochloride. Satisfactory consistency was achieved between the values predicted by the ANN model and the actual values [126]. Wu and co-workers found that the combination of a neural network and a genetic algorithm could predict and optimize the formulation of nanoparticles better than the response surface method, achieving better controlled release behavior [127].
Composite materials based on chitosan have good biocompatibility and biodegradability, so composites with chitosan as one of the components surpass traditional materials to a certain extent for potential clinical use [128,129].
Fourier transform infrared spectroscopy and differential scanning calorimetry were employed to investigate the interactions between variable amounts of chitosan and sodium tripolyphosphate in the formation of nanoparticles; an artificial neural network was built on these data and used to predict not only the particle size but also the yield of nanoparticles [36]. Shang used data from the fish skin collagen extraction process to establish a BPNN, analyzed the different factors and levels in the extraction process, and screened the best parameters. The relative error between the value predicted by the network and the actual value obtained by the orthogonal experiment was not more than 5%, which shows the feasibility of combining a BP neural network with orthogonal experiments to optimize the collagen extraction process and indicates that the model has reliable predictive performance [130].
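The 5% criterion mentioned above is a relative error between predicted and measured values. A minimal sketch of that check is given below; the numbers are placeholders and not data from [130], and the helper names are hypothetical.

```python
def relative_errors(predicted, measured):
    """Relative error |predicted - measured| / |measured| for each pair."""
    return [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]

def within_tolerance(predicted, measured, tol=0.05):
    """True when every prediction lies within the relative tolerance."""
    return max(relative_errors(predicted, measured)) <= tol

# Placeholder values for three validation runs (not experimental data).
predicted = [10.2, 19.5, 30.9]
measured = [10.0, 20.0, 30.0]

print(relative_errors(predicted, measured))
print(within_tolerance(predicted, measured))
```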

Perspectives and Outlook
Materials genome technologies have changed the traditional material R&D paradigm; however, tremendous efforts on the following topics are required to meet the fast-growing needs and challenges in the R&D of biomedical materials. Mapping the relationship between the components, structures, and properties of biomedical materials is complex and challenging. This challenge comes mainly from the difficulty in obtaining adequate data on biomedical materials, for which only small-sample data sets are generally available. In this regard, either the development of high-throughput experimental tools or data enhancement would be highly desirable [131,132]. In addition, image data of biomedical materials are underutilized; tools for image feature extraction can be harnessed in the image analysis of biomedical materials to help extract image information [133].

Establishment and Management of the Database
Data are an important foundation of data-driven material R&D. Present biomedical material data are characterized by different sources, diverse types, and complexity, which to some extent hinder rapid progress. A sound material data standard system and data import templates are prerequisites for handling the variety of data types and formats and for the systematic management and storage of data. A MySQL database, which is small in size and low in cost, is a good option for establishing an open material database. Artificial intelligence can also be used to automatically collect and classify the latest literature data. In addition, experimental parameters have a huge impact on the performance and structure of a material. The relevant experimental conditions should therefore be recorded and kept up to date in the database so that simulation results can be validated.
The parameters of animal experiments should also be considered, as such data are unique and critical for biomedical materials.
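A relational layout that stores materials together with their experimental conditions and animal experiment parameters might look like the sketch below. It is only an illustration under stated assumptions: Python's built-in SQLite module is used here as a stand-in for MySQL so that the example runs without a server, and every table name, column name, and inserted value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
CREATE TABLE material (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    category TEXT NOT NULL          -- metallic, inorganic, polymeric, or composite
);
CREATE TABLE experiment (
    id          INTEGER PRIMARY KEY,
    material_id INTEGER NOT NULL REFERENCES material(id),
    kind        TEXT NOT NULL,      -- e.g. 'synthesis', 'in_vitro', 'animal'
    source      TEXT                -- literature reference or lab record
);
CREATE TABLE exp_condition (
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    name          TEXT NOT NULL,
    value         REAL,
    unit          TEXT
);
""")

# Placeholder records, not real measurements.
mat_id = conn.execute(
    "INSERT INTO material (name, category) VALUES (?, ?)",
    ("example scaffold", "polymeric"),
).lastrowid
exp_id = conn.execute(
    "INSERT INTO experiment (material_id, kind, source) VALUES (?, ?, ?)",
    (mat_id, "animal", "placeholder reference"),
).lastrowid
conn.executemany(
    "INSERT INTO exp_condition VALUES (?, ?, ?, ?)",
    [(exp_id, "implantation_time", 28.0, "day"),
     (exp_id, "animal_body_mass", 0.25, "kg")],
)

# Retrieve every recorded animal-experiment parameter for each material.
rows = conn.execute("""
    SELECT m.name, c.name, c.value, c.unit
    FROM material m
    JOIN experiment e ON e.material_id = m.id
    JOIN exp_condition c ON c.experiment_id = e.id
    WHERE e.kind = 'animal'
    ORDER BY c.name
""").fetchall()
print(rows)
```

Keeping conditions in a name/value/unit table lets new parameter types be added without changing the schema, which suits data that arrive from many sources in different formats.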

Development of High-Throughput Technology
Large-scale automation of experiments and calculations may significantly accelerate the establishment of the database. High-throughput techniques can be used to synthesize and characterize materials with high efficiency and precision, so the use of high-throughput equipment is of great significance. It may not only overcome the shortcomings of manual experiments but also strengthen the integration of high-throughput equipment with databases and calculation methods, and consequently improve the level of material development, production, and application. In addition, biomedical materials at the nanometer and micron scale require a significant reduction in experimental error.

Innovation of Algorithm
The simulation and prediction ability of algorithm models is essential for the R&D of biomedical materials. Animal models are commonly used in the pre-clinical investigation of biomedical materials. However, the inevitable inconsistency between the physiological/pathological environments of animals and humans can lead to confusing results during clinical translation. New algorithm models are needed to simulate, as closely as possible, the real human body environment and the behavior of biomedical materials in physiological tissues/organs. In addition, the relatively small data sets derived from open databases or from high-throughput synthesis/screening of biomedical materials do not meet the requirements of general machine learning and deep learning. It is therefore necessary to develop algorithms suitable for small data sets to obtain more accurate analysis of material data.
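When only a handful of samples exist, even estimating how well a model generalizes is difficult, because a held-out test set would consume much of the data. Leave-one-out cross-validation is one widely used evaluation technique in that situation; the sketch below illustrates it with a simple linear model. It is offered as background for the small-data problem and not as an algorithm proposed here, and the data values are invented.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv_rmse(xs, ys):
    """Root-mean-square error when each sample is predicted by a model fitted to all the others."""
    squared = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        squared.append((a + b * xs[i] - ys[i]) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Invented small data set: six samples of one descriptor and one property.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.2, 4.9, 7.1, 8.8, 11.3, 12.9]
noisy_rmse = loocv_rmse(xs, ys)
exact_rmse = loocv_rmse([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 5.0, 7.0, 9.0, 11.0])
print(noisy_rmse, exact_rmse)
```

Every sample serves once as the test point, so no data are set aside permanently; the cost is one model fit per sample, which is affordable precisely because the data set is small.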

Industrial Involvements
The majority of current efforts on MGT for biomedical materials have been devoted to basic research; however, the ultimate performance of biomedical materials strongly depends on processing and manufacturing parameters, and the involvement of industry is far from satisfactory. One can foresee that MGT will find greater application in the R&D of biomedical materials by receiving more input from industry.

Concluding Remarks
MGT has strongly accelerated the R&D of materials, including the rapid prediction and screening of materials and the optimization of the properties of biomedical materials. Meanwhile, the application of MGT in biomedical materials has also promoted innovation and development in science and industrial technology, basic theories, key technologies, and equipment. In the future, the development opportunities of MGT may be harnessed to facilitate the R&D of biomedical materials and to resolve the bottlenecks and difficulties in related fields.