Large-Scale Screening and Machine Learning to Predict the Computation-Ready, Experimental Metal-Organic Frameworks for CO2 Capture from Air

Abstract: The rising level of CO2 in the atmosphere has attracted attention in recent years. Capturing CO2 from concentrated sources, such as power plants, has been widely studied, but capturing CO2 directly from the air, where its concentration is much lower, remains a challenge. This study uses high-throughput computational screening (Monte Carlo and molecular dynamics simulations) and machine learning (ML) to study the CO2 adsorption and diffusion properties of 6013 computation-ready, experimental metal-organic frameworks (CoRE-MOFs) in air with very low CO2 concentrations. First, the structure-performance relationships governing CO2 adsorption and diffusion in air are obtained, and these relationships are then explored further with four ML algorithms. Random forest (RF) was found to be the optimal algorithm for predicting CO2 selectivity, with an R value of 0.981, and it was further applied to quantify the relative importance of each metal-organic framework (MOF) descriptor. Finally, 14 MOFs with the best properties were screened out, and it was found that a key to capturing low-concentration CO2 from the air is the diffusion performance of CO2 in MOFs. When the pore-limiting diameter (PLD) of a MOF is closer to the kinetic diameter of CO2, the MOF can possess higher CO2 diffusion selectivity. This study could provide valuable guidance for the experimental synthesis of new MOFs that capture low-concentration CO2 directly from the air.


Introduction
It is well known that the amount of CO2 discharged into the atmosphere increases with the rapid development of industry and population growth. In addition, deforestation, the direct discharge into the atmosphere of large amounts of CO2 and other gases generated by burning fossil fuels such as coal, oil, and natural gas, and the emissions from roasting limestone to produce cement have resulted in global carbon dioxide emissions increasing by 3.8% [1]. All of the above factors have aggravated carbon dioxide emissions, thereby increasing the urgency of counteracting the greenhouse effect and its associated global warming. The Kyoto Protocol and the Paris Agreement aim to control greenhouse gas emissions under the United Nations Framework Convention on Climate Change (UNFCCC), in which CO2 is listed as a major greenhouse gas that needs to be mitigated or recycled [2]. The greenhouse gases include more than CO2, however; in fact, the global warming potentials of CH4 and N2O are 25 times and 298 times that of CO2, respectively. Nevertheless, due to its relatively large emission levels, CO2 accounts for approximately 55% of the total greenhouse gas contribution [3,4]. Thus, the adsorption and separation of carbon dioxide from the air is particularly important. In addition, the successful capture of CO2 could have multifaceted practical value: first, oil recovery could be improved through appropriate reservoir engineering; second, the captured CO2 could be used to produce industrial chemicals, including concrete, paint, and fertilizer; third, CO2 captured from the atmosphere could be combined with hydrogen for direct synthesis into liquid hydrocarbons, which could then be used as fuels, including gasoline and diesel. Using CO2 as a raw material in this way can reduce the proportion of fossil energy and further control CO2 emissions, ultimately achieving carbon neutrality or even net negative carbon emissions [5].
Recently, carbon engineering has developed a series of capture technologies that remove carbon dioxide directly from the air. Carbon dioxide can be removed from the atmosphere using biological, chemical, or physical processes [6]. These methods have certain limitations, however. For example, biological processes are very economical, but they are usually very slow and ineffective. As for chemical processes, the waste of carbon resources and the volatilization of organic solvents lead to further environmental pollution, equipment corrosion, and complex post-treatment issues. The traditional technique for separating carbon dioxide is solvent washing, for example with an alcohol amine solution [7][8][9][10]. Although this conventional method can reduce the concentration of carbon dioxide in the air, it is extremely expensive, the solvent is difficult to regenerate, the operation is complicated, and it consumes a great deal of energy [11]. In fact, the energy consumption of solvent washing is 3 to 4 times that of capturing CO2 from exhaust gas [12]. Given these drawbacks, there is an urgent need to find a more efficient, convenient, and energy-saving technique to replace the traditional carbon dioxide capture method. Adsorption separation is a potential technique. It is not only inexpensive, but also simple in terms of operation and equipment, and relatively low in energy consumption when the adsorbent is regenerated (regeneration means desorbing the adsorbed substances from the adsorbent). Conventional adsorbents, however, including activated carbon, zeolite, silica gel, and metal oxides, are poor at scavenging carbon dioxide from the air because of their inferior separation selectivity and the difficulty of regenerating them. For example, silica gel, which is amorphous, does not have a continuous uniform porous structure and exhibits unfavorable diffusion properties [11]. Therefore, the development of a new type of adsorbent is imperative.
In recent years, studies have shown that using metal-organic frameworks (MOFs) to adsorb and separate carbon dioxide can not only make up for the shortcomings of the above adsorbents, but also offers the advantages of high selectivity and being non-polluting. A MOF is an organic-inorganic hybrid material with intramolecular pores, formed by the self-assembly of organic ligands and inorganic metal ions or clusters through coordination bonds [13]. Compared with common adsorbents, MOFs exhibit many advantages, such as diverse structures and properties, large specific surface area, high porosity, and structural control. Therefore, they are widely used in gas adsorption [11] and separation [14][15][16][17][18][19], as well as being general materials in processes including storage [20], optics [21], catalysis [22][23][24][25], and drug delivery [26,27]. To date, thousands of MOFs have been synthesized, some of which have been used in attempts to capture CO2 from the air. Peng et al. [28] designed and synthesized 2 incorporated MOFs to study their stability and ability to capture CO2 from the air. Liu et al. [11] used an amine-functionalized MOF and an ultra-microporous MOF to capture CO2 directly from the air, and further investigated the CO2 capture performance and the reproducibility of the MOFs under humid conditions. Osama et al. [29] synthesized an isomorphic MOF, SIFSIX-3-Cu, with uniform adsorption sites for capturing CO2 from the air.
Since capturing CO2 from the air requires MOFs with very high selectivity, screening MOFs for the best-performing candidates by the traditional experimental approach not only consumes a great deal of manpower and material resources, but also requires an extended study period and causes pollution to a certain extent. With the continuous advancement of computers, molecular simulation is playing an increasingly important role in the field of materials science [30]. Some studies have used high-throughput molecular simulation to screen large numbers of MOFs in a database, thereby successfully identifying MOFs with high selectivity and high working capacity for different target performance requirements. For example, Wilmer et al. simulated the adsorption of pure carbon dioxide, nitrogen, and methane in more than 130,000 hypothetical MOFs, and proposed a relationship between structural properties (pore size, volume, and surface area) and chemical functions, as well as evaluation criteria for adsorbents used to separate carbon dioxide [31]. In the presence of nickel dilution, Watanabe et al. combined pore size analysis with classical simulation to screen 1163 MOFs as membrane materials for CO2/N2 separation [32]. Lin et al. screened hundreds of thousands of theoretically predicted zeolites and zeolite MOFs and identified a number of potential materials for capturing carbon dioxide [33]. Based on 105 MOFs, Wu et al. proposed a relationship of CO2/N2 adsorption selectivity with porosity and the isosteric heat of adsorption [34]. Fernandez et al. [35] used advanced machine-learning (ML) algorithms to quickly identify high-performance MOFs among 292,050 hypothetical structures for pure CO2 adsorption (0.15 bar and 1 bar). These screening studies, however, were aimed at capturing high concentrations of CO2. Given that the concentration of CO2 in the atmosphere is comparatively low relative to the concentrations of natural gas and other components, it is undoubtedly a challenge to discover efficient MOF materials that can directly capture CO2 from the air.
To date, 6013 such MOFs have been reported, and finding the appropriate MOFs for a specific system in such a large database is undoubtedly a daunting task. This study simulated the adsorption and diffusion of CO2, N2, and O2 at infinite dilution in these MOFs in order to identify materials that perform well in both static adsorption and kinetic adsorption. The factors influencing the adsorption and diffusion of CO2 were first obtained by univariate analysis. Next, multivariate analyses were carried out with 4 ML algorithms: back propagation neural network (BPNN), decision tree (DT), random forest (RF), and support vector machine (SVM). Finally, the optimal algorithm was adopted to predict the parameters affecting CO2 selectivity, and 14 MOFs with both high diffusion selectivity and high adsorption selectivity were selected.

Molecular Model
In this work, we used molecular simulation to screen 6013 computation-ready, experimental MOFs (CoRE-MOFs version 2) [36] for their ability to capture CO2 from the air. Their crystal structures were derived from the Cambridge Crystallographic Data Centre (CCDC), and their parameters were compiled and verified by Chung et al. [37]. We removed all solvent and ligand molecules prior to running the simulations. Each MOF was characterized by 5 structural descriptors, namely volumetric surface area (VSA), largest cavity diameter (LCD), pore-limiting diameter (PLD), porosity φ, and density ρ, and by one energy descriptor, the isosteric heat of adsorption (Qst). Both LCD and PLD were calculated using the Zeo++ software package [38]. The VSA and φ were calculated in the RASPA software package [39] using N2 (3.64 Å) and He (2.58 Å) as probes, respectively. A VSA close to or equal to 0 indicates that the MOF cannot accommodate N2 molecules [40]. The Qst of each gas was calculated at infinite dilution using NVT Monte Carlo (NVT-MC) simulation, where N is the number of particles, V is the volume of the system, and T is the temperature of the system.
The force field parameters for the 3 gas components CO2/N2/O2 were taken from the transferable potentials for phase equilibria (TraPPE) force field [41] and are listed in Table S2. The CO2 molecule has a C-O bond length of 1.16 Å and a bond angle ∠OCO of 180°. N2 is treated as a 3-site model with an N-N bond length of 1.10 Å. O2 is also treated as a 3-site model, with an O-O bond length of 1.21 Å. The models of the 3 gases are shown in Figure S1. The atomic charges of the MOFs were estimated using the MOF electrostatic-potential-optimized charge scheme (MEPO-Qeq) method [42], which accurately evaluates electrostatic interactions. Because the MEPO-Qeq method is fast and accurate, it is widely used for various adsorbate-MOF systems [43][44][45]. The Lennard-Jones (LJ) parameters of the MOF atoms were obtained from the universal force field (UFF) [46] and are listed in Table S1. Previous studies have shown that the UFF-TraPPE force field combination can accurately predict the adsorption and diffusion behaviors of these 3 gases in MOFs [40,47,48]. The Lorentz-Berthelot combination rule was used to calculate the cross-LJ parameters.
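The Lorentz-Berthelot rule mentioned above takes the arithmetic mean of the two LJ size parameters and the geometric mean of the two well depths. A minimal Python sketch follows; the function name and the numerical values are placeholders chosen for illustration and are not taken from Table S1 or Table S2.

```python
import math

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Return (sigma_ij, eps_ij) for a pair of unlike Lennard-Jones sites."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)  # Lorentz: arithmetic mean of sizes
    eps_ij = math.sqrt(eps_i * eps_j)     # Berthelot: geometric mean of well depths
    return sigma_ij, eps_ij

# Placeholder parameters (sigma in Å, epsilon/kB in K), NOT values from the paper.
framework_site = (3.0, 50.0)
adsorbate_site = (3.2, 72.0)

sigma_ij, eps_ij = lorentz_berthelot(*framework_site, *adsorbate_site)
print(f"sigma_ij = {sigma_ij:.3f} Å, eps_ij/kB = {eps_ij:.3f} K")
```

In a simulation, the same rule would be applied to every pair of framework atom type (UFF) and adsorbate site (TraPPE).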

Screening Methods
For each MOF, the Henry's constant K and the diffusion coefficient D of CO2, N2, and O2 were estimated using Monte Carlo (MC) and molecular dynamics (MD) simulations, respectively, with the same settings. In principle, a single gas molecule should be added to a MOF to simulate infinite dilution; in practice, we added 30 gas molecules to each MOF and ignored the interactions between them, which is equivalent to simulating each gas molecule independently. The results for the 30 independent molecules were then statistically averaged. Throughout the simulations, the MOF framework was assumed to be rigid, and the simulation cell was extended to at least 24 Å in each of the three dimensions with periodic boundary conditions. A 12 Å spherical cutoff with long-range correction was used to calculate the LJ interactions, while the Ewald sum was used to calculate the electrostatic interactions. In each MOF, the MC simulation ran for 100,000 cycles, with the first 50,000 used for equilibration and the last 50,000 used for ensemble averaging. Each cycle consisted of n trial moves (n: number of adsorbed molecules), including translation, rotation, regrowth, and exchange (insertion and deletion). In the MD simulation, the 30 gas molecules were simulated for 10 ns in each MOF, and 5 ns was ultimately selected for statistical averaging. Sampling tests on dozens of MOFs showed that further increasing the number of MC cycles or the MD duration had little effect on the simulation results. All MC and MD simulations were performed using the RASPA software package [39].
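The K and D values obtained this way are combined into the adsorption and diffusion selectivities discussed in the following sections. The supplementary material lists these as Formulas S1 and S2, but the formulas themselves are not reproduced in this text, so the sketch below assumes the form suggested by the notation S(CO2/N2+O2): the CO2 quantity divided by the sum of the N2 and O2 quantities. All numerical values are hypothetical.

```python
from statistics import mean

def selectivity(x_co2, x_n2, x_o2):
    """Assumed form: CO2 property over the sum of the N2 and O2 properties."""
    return x_co2 / (x_n2 + x_o2)

# Hypothetical Henry's constants for one MOF (arbitrary but consistent units).
k_co2, k_n2, k_o2 = 1.0e-3, 1.0e-5, 1.0e-5
s_ads = selectivity(k_co2, k_n2, k_o2)

# Hypothetical per-molecule diffusion coefficients (units of 1e-9 m^2/s),
# averaged over the independent molecules before taking the ratio.
d_co2 = mean([1.0, 2.0, 3.0])
d_n2 = mean([2.5, 3.0, 3.5])
d_o2 = mean([1.5, 2.0, 2.5])
s_diff = selectivity(d_co2, d_n2, d_o2)

print(f"S_ads = {s_ads:.2f}, S_diff = {s_diff:.2f}")
```

Only three molecules per gas are shown for brevity; the study averages over 30.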

Univariate Analysis
In order to investigate how the adsorption and diffusion of CO2 in N2+O2 relate to the MOF structure during static adsorption and kinetic adsorption, we first analyzed how the adsorption selectivity Sads and the diffusion selectivity Sdiff of CO2/N2+O2 vary with the LCD, as shown in Figure 1. Most MOFs with large adsorption selectivity and diffusion selectivity have relatively small LCDs. Figure 1a indicates that the adsorption selectivity of CO2/N2+O2 decreases over the LCD range of 2.8-6.5 Å, and when the LCD is >15 Å, the adsorption selectivity gradually becomes stable, tending to 5, as depicted by the red line in Figure 1. This is because CO2 has a strong quadrupole moment, and even in infinitely large pores it is preferentially adsorbed compared to N2 and O2. This trend in gas separation is consistent with previous reports [49,50]. Figure 1b presents the relationship between Sdiff and LCD. Similar to Sads, larger diffusion selectivity (Sdiff > 1) only occurs for LCDs in the range of 2.4-5 Å, since the kinetic diameter of CO2 is smaller than those of O2 and N2 (the kinetic diameters of CO2, O2, and N2 are 3.3, 3.46, and 3.64 Å, respectively). When the LCD of a MOF is small, the smaller CO2 molecules diffuse faster, so the diffusion selectivity Sdiff(CO2/N2+O2) is larger. As the LCD increases, the diffusion selectivity gradually decreases. When the LCD is >15 Å, the diffusion selectivity tends to be stable and fluctuates at around 0.36. Comparison of Figure 1a,b reveals that the adsorption selectivity is generally >1, while it is rare for the diffusion selectivity to be >1. Because CO2 has a strong quadrupole moment, it interacts strongly with the MOF framework, which hinders the diffusion of CO2 and results in a slower diffusion rate that may be even smaller than the diffusion rates of N2 and O2.
Figure 1c,d show the relationships of Sads and Sdiff to the PLD, respectively. Comparing the panels in Figure 1 reveals that the PLD and LCD display the same trend in their relationships to the Sads and Sdiff of CO2/N2+O2. Larger Sads and Sdiff values appear when the LCD and PLD are small, and Sads and Sdiff both decrease with increasing PLD or LCD, eventually tending toward stability. Therefore, there is a greater possibility of finding MOFs with simultaneously high Sads and Sdiff among MOFs with small PLDs and LCDs.
Figure 2a shows that Sads(CO2/N2+O2) increases monotonically with increasing Qst, indicating that Qst may be the main parameter during the adsorption process. Since the concentration of CO2 in the atmosphere is low, it is close to the infinite dilution state. Hence, the selectivity is strongly dependent on the isosteric heat of adsorption of CO2 in the infinite dilution state. The larger Sdiff(CO2/N2+O2) values in Figure 2b occur when the VSA is close to zero. As the VSA continues to increase, Sdiff(CO2/N2+O2) gradually decreases and eventually stabilizes. This is because when the VSA is close to zero, the MOF either cannot pass any gas molecules or passes only a small amount of CO2 molecules. When the VSA is large, all the gas molecules can pass through the MOF. Therefore, the separation of CO2 cannot be achieved, i.e., the diffusion selectivity is substantially unchanged. Figure S11b,c indicate the relationship of adsorption selectivity with porosity and VSA, respectively. It can be observed that both of these parameters exert weak influences on adsorption selectivity.
In addition to adsorption and diffusion selectivity, the Henry coefficient of CO2 reflects the adsorption performance of CO2 in the infinite dilution state, helping to explain the capture performance of MOFs for air with very low CO2 concentration. Figure 3a clearly shows how KN2 changes with the porosity φ. When φ is small, the MOF has little space because of its limited pore volume, and only a small amount of N2 can be adsorbed; therefore, KN2 is small. When φ is in the range of 0-0.29, KN2 increases significantly with increasing φ. When φ > 0.29, the increase in KN2 slows down and KN2 gradually stabilizes with increasing φ. Figure 3b compares the Henry coefficients of the 3 gases. It can be seen that the trends of the Henry coefficients of N2 and O2 are almost the same; however, CO2 is different. First, in most MOFs, the Henry coefficient of CO2 is larger than those of N2 and O2. Second, when the LCD is >20 Å, the KCO2 value levels off and eventually stabilizes. The Henry coefficient of CO2 is still higher than the coefficients of N2 and O2, which also leads to a MOF selectivity >1 when the LCD is infinite, as seen in Figure 1a. Finally, it can be observed that only a few MOFs can be identified for which the CO2 Henry coefficient is >10^-1 mmol/g/Pa. Observing these MOF structures reveals that most have smaller or open metal sites. The above univariate analysis can only determine the relationship between individual parameters and performance. Qst, PLD, and LCD are considered to have dramatic impacts on adsorption selectivity and diffusion selectivity, but their relative influences cannot be quantified in this way. We therefore used 4 types of ML algorithms to obtain additional information about the structure-performance relationship.

Machine Learning
Machine learning has been used to predict the performance of materials and to filter high-performance materials from large databases [51]. Aiming to find a prediction method well suited to this system, we compared 4 ML algorithms commonly used in big data analysis, i.e., BPNN, DT, RF, and SVM. BPNN combines forward signal propagation with error back propagation, in which a gradient descent algorithm continuously adjusts the weights and thresholds until the error is less than a set threshold. The BPNN parameters were set as follows: the training function is Levenberg-Marquardt, the transfer function is the hyperbolic tangent sigmoid function, and the performance evaluation function is the mean square error (MSE). The number of hidden layer neurons was 18, the maximum number of training iterations was 1000, the required training accuracy was 0.001, and the learning rate was 0.01. DT is a traditional method for data classification and screening. Given the probabilities of the various possible outcomes, it analyzes data with a tree-shaped model to obtain the expected values. The DT parameters were set as follows: standard CART (classification and regression tree) was used to select the best split predictor at each node, and the criterion for splitting and pruning was the MSE function. After optimizing and pruning, the minimum number of branch node observations was 10, the minimum number of leaf node observations was 4, and the maximal number of decision splits was 1. The random forest algorithm is composed of multiple decision trees, each constructed from randomly selected sets of split attributes. The parameters for RF were set as follows: the number of trees was 200, the minimum leaf size was 10, and the number of variables randomly selected for each node split in each tree was 2.
SVM is an algorithm for binary classification of data through supervised learning, and employs mathematical transformation methods to divide data with a certain centralized structure into rules. We used the support vector machine regression model of the Statistics and Machine Learning Toolbox in MATLAB 2016b, where the kernel function is the radial basis function (Gaussian), the kernel scale parameter is set as "auto", and the loss function is epsilon-insensitive. The box constraint (also called the penalty coefficient, C) was 0.0567, and the half width of the epsilon-insensitive band (ε) was set as 0.0057. In the radial basis kernel function, gamma = 1/(2σ^2), where σ is the parameter of the kernel function, which can affect the complexity of the SVM regression algorithm. In our study, the value of gamma was 7.125. The solver for the convex quadratic programming problem is sequential minimal optimization (SMO). Before training and testing, we first shuffled the data set and then randomly divided it into training and testing sets in a ratio of 7:3. More detailed descriptions of the ML algorithms are given in the supporting information, and the corresponding diagrams of each algorithm are shown in Figures S2-S5.
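The data handling described above (shuffle, 7:3 split, and scoring by the correlation coefficient R) can be sketched in plain Python. This is a minimal illustration and not the MATLAB workflow used in the study: the selectivity values are synthetic, the "predictions" are simply the true values plus a small perturbation standing in for a trained model, and R is assumed to be the Pearson correlation coefficient.

```python
import math
import random

def shuffle_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows, then split it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Synthetic selectivities spanning several orders of magnitude (not paper data).
selectivities = [10 ** (0.05 * i) for i in range(100)]
log_s = [math.log10(s) for s in selectivities]  # log transform of the target

train, test = shuffle_split(log_s)

# Stand-in for model output: true value plus a small alternating error.
predicted = [y + 0.01 * (-1) ** k for k, y in enumerate(test)]
r_test = pearson_r(test, predicted)
print(len(train), len(test), round(r_test, 4))
```

Fixing the seed makes the split reproducible, which is convenient when several algorithms are compared on the same partition.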
We used BPNN, RF, DT, and SVM to predict the adsorption selectivity, and took the logarithm of the adsorption selectivity in order to reduce the differences associated with the varying magnitudes of the data. The correlation coefficients (R values) obtained by the 4 ML algorithms for the adsorption selectivity are listed in Table 1. The results of the testing and training are shown in Figure 4 and Figure S12. The points in Figure 4a-d are all distributed along upward-sloping straight lines. The different colors from top to bottom in the figure represent an increase in the number of points, and most of the points are concentrated on the diagonal, indicating that the prediction results are good. Figure 4 reveals that RF has the highest correlation coefficient (0.982), while the support vector machine algorithm has the lowest (0.886). Thus, among the 4 ML algorithms, RF gives the highest prediction accuracy and captures the most information about the structure-performance relationship for adsorption selectivity. RF has good generalization ability and strong model learning ability, and this type of ML is suitable for the system. To verify the accuracy of the four ML algorithms, we performed 5 repeated predictions, listed in Table S3. The repeated prediction results do not vary significantly, confirming that the RF algorithm is a suitable model. Because RF introduces two kinds of randomness (sampling randomness and feature randomness), it has strong generalization ability. In previous studies, random forest algorithms also exhibited the best prediction results [52,53]. Whether the model overfits is an important issue. We used combinations of different descriptors and 5 times 5-fold cross-validation to verify the RF model. The results showed that the selected model did not overfit.
During the material screening process, the relative importance of parameters may affect the ultimate screening results. We selected the best-performing algorithm, RF, to predict the relative importance of each descriptor. The relative importance percentages are shown in Figure 5 and Table S7. We used the mean squared error (MSE) to evaluate the relative importance of the 6 descriptors; the greater the relative importance percentage of a descriptor, the higher its relative contribution. According to the results presented in Figure 5, the percentage of Qst is the largest, indicating that Qst exerts the greatest impact on the adsorption selectivity. The relative importance of the MOF descriptors to the adsorption selectivity is Qst > ρ > LCD > VSA ≥ φ > PLD. The importances of VSA and φ are very close, indicating that the effects of these two descriptors on adsorption selectivity are roughly equal. From a materials science point of view, the larger the φ, the larger the VSA. This may be the reason why the effects of these parameters are essentially the same.
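The text states that MSE was used to rank the descriptors but does not give the exact procedure. One common MSE-based approach, assumed here, is permutation importance: shuffle one descriptor at a time, measure how much the prediction MSE increases, and normalize the increases to percentages. The sketch below uses a toy linear function as a stand-in for the trained random forest and random synthetic inputs, so the function names and numbers are illustrative only.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def relative_importance(predict, X, y, seed=0):
    """Percent share of the MSE increase caused by shuffling each column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between descriptor j and the target
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        increase = mse(y, [predict(row) for row in permuted]) - baseline
        increases.append(max(increase, 0.0))
    total = sum(increases)
    if total == 0.0:
        return [0.0] * len(increases)
    return [100.0 * inc / total for inc in increases]

def toy_model(row):
    """Stand-in for a trained model: strong, weak, and no dependence."""
    return 3.0 * row[0] + 1.0 * row[1]

data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

importance = relative_importance(toy_model, X, y)
print([round(p, 1) for p in importance])
```

The descriptor the toy model ignores receives zero importance, while the one with the largest coefficient dominates, which mirrors how a percentage ranking such as the one in Figure 5 would be read.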

Best Metal-Organic Frameworks (MOFs)
We selected 5 limiting conditions for Sads(CO2/N2+O2) and Sdiff(CO2/N2+O2), and chose 14 optimal MOFs from the 6013 MOFs, as listed in Table 2. Of the 14 materials, HIQPEE exhibited the largest Sdiff, which was as much as 62.27. NORGOS displayed the largest Sads, which also corresponded to its maximum heat of adsorption. In comparison, under the same conditions the optimal MOF selected in this study, which has a higher Qst, is also more selective (4712.33) than those predicted by Wu et al. (433, at Qst = 47.8 kJ/mol) [30] and Ravichandar Babarao et al. (500) [54]. It was discovered that diffusion selectivity is generally lower than adsorption selectivity. The diffusion of CO2 is the key property in determining the performance of MOFs for low concentrations of CO2 during the kinetic adsorption process. For these 14 MOFs, the ranges of LCD, φ, and PLD among the six descriptors were also consistent with those found in the previous univariate analysis. For the PLD in particular, the optimal range of 2.66-3.64 Å spans only about 1 Å and is very close to the kinetic diameters of the 3 gases. In such strictly restricted channels, only CO2 molecules can enter and be adsorbed, greatly increasing the probability of CO2 being captured at low concentrations. Therefore, the analysis of the optimal MOFs revealed that a PLD close to the kinetic diameter of CO2 is a key condition for good CO2 diffusion performance, which in turn results in excellent performance of the MOF in capturing CO2 from the air, thus providing effective guidance for the design and synthesis of new MOFs.
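The final selection step amounts to filtering the screened database by thresholds on both selectivities. The 5 limiting conditions themselves are not given in this text, so the sketch below uses two hypothetical thresholds and made-up records (the names MOF-A to MOF-D are placeholders, not entries from Table 2).

```python
from dataclasses import dataclass

@dataclass
class MofRecord:
    name: str
    s_ads: float   # adsorption selectivity, dimensionless
    s_diff: float  # diffusion selectivity, dimensionless

def screen(records, min_s_ads, min_s_diff):
    """Keep MOFs meeting both thresholds, ranked by diffusion selectivity."""
    hits = [r for r in records if r.s_ads >= min_s_ads and r.s_diff >= min_s_diff]
    return sorted(hits, key=lambda r: r.s_diff, reverse=True)

# Made-up records for illustration only.
records = [
    MofRecord("MOF-A", s_ads=900.0, s_diff=12.0),
    MofRecord("MOF-B", s_ads=1500.0, s_diff=0.4),
    MofRecord("MOF-C", s_ads=30.0, s_diff=20.0),
    MofRecord("MOF-D", s_ads=2500.0, s_diff=3.0),
]

# Hypothetical thresholds; the study's actual limiting conditions are not stated here.
best = screen(records, min_s_ads=100.0, min_s_diff=1.0)
names = [r.name for r in best]
print(names)
```

Requiring both thresholds at once reflects the point made above: a MOF with very high adsorption selectivity but poor diffusion selectivity (MOF-B in this toy set) is excluded.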

Conclusions
First, we simulated the adsorption and diffusion properties of CO 2 , N 2 , and O 2 in 6013 CoRE-MOFs using high-throughput MC and MD simulations. Then, we investigated the correlations between the MOF descriptors and the adsorption and diffusion selectivities for CO 2 by univariate analysis. Q st and PLD were found to be the most important descriptors for S ads (CO2/N2+O2) and S diff (CO2/N2+O2) , respectively. In conjunction with multivariate analysis, a comparison of 4 ML algorithms revealed that RF gave the best prediction results for adsorption selectivity, with an R value of 0.982. This indicated that the RF method was the most suitable for predicting the capture of low concentrations of CO 2 in MOFs. The relative importance analysis of the RF algorithm quantitatively indicated that the relative importance of the MOF descriptors for adsorption selectivity is Q st > ρ > LCD > VSA ≥ φ > PLD. It was also confirmed that Q st is the most important parameter, while VSA and φ are relatively less important. Through this high-throughput screening, 14 MOFs with optimal adsorption selectivity and diffusion selectivity were obtained. After comparison, it was found that their adsorption selectivity was generally higher than their diffusion selectivity. The diffusion separation performance of CO 2 is the key property in determining the performance of MOFs for low concentrations of CO 2 during the kinetic adsorption process. This study provides guidance for experiments seeking MOFs that effectively capture CO 2 from the air, and indicates that advanced ML algorithms can accelerate the research and development of new materials.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2076-3417/10/2/569/s1: Table S1: Lennard-Jones parameters of MOFs; Table S2: Lennard-Jones parameters and charges of adsorbates; Table S3: Training and testing R values of adsorption selectivity for the four ML algorithms, repeated 5 times; Table S4: Prediction using RF models with different descriptor combinations; Table S5: Prediction using RF models with different descriptor combinations, repeated 5 times; Table S6: Results of RF prediction with k times k-fold cross validation; Table S7: Relative importance of the six descriptors for adsorption selectivity, as predicted by RF; Formula S1: S ads (CO2/N2+O2) indicates the adsorption selectivity of CO 2 /N 2 +O 2 , and K i represents the Henry coefficient of component i (CO 2 , N 2 , and O 2 ); Formula S2: S diff (CO2/N2+O2) represents the diffusion selectivity of CO 2 /N 2 +O 2 , and D i represents the diffusion coefficient of component i.
Author Contributions: X.D., Z.S., H.L. and Z.Q. conceived the idea. Z.Q. calculated all the materials' structural parameters and obtained valid data about the structure descriptors and performance. X.D., W.Y. and Z.S. analyzed the relationship between structure descriptors and performance. X.D. and S.L. used univariate analysis to determine the factors influencing CO 2 adsorption and diffusion in air, and Z.S. used ML algorithms to predict the MOF performance. X.D. and Z.S. wrote the original draft. H.L. and Z.Q. wrote the manuscript with contributions from all authors. All authors have read and agreed to the published version of the manuscript.

Figure 5. Relative importance of the six descriptors for adsorption selectivity, as predicted by the random forest.

Table 1. Correlation coefficients (R values) of the 4 ML algorithms for predicting the adsorption selectivity.

a CSD Code is the code of MOFs in the Cambridge Structure Database; b LCD: largest cavity diameter; c VSA: volumetric surface area; d PLD: pore-limiting diameter.