Combining Computational Screening and Machine Learning to Predict Metal–Organic Framework Adsorbents and Membranes for Removing CH4 or H2 from Air

Separating and capturing small amounts of CH4 or H2 from a mixture of gases, such as coal mine spent air, at a large scale remains a great challenge. We used large-scale computational screening and machine learning (ML) to simulate and explore the adsorption, diffusion, and permeation properties of 6013 computation-ready experimental metal–organic framework (MOF) adsorbents and MOF membranes (MOFMs) for capturing clean energy gases (CH4 and H2) in air. First, we modeled the relationships between the adsorption and the MOF membrane performance indicators and their characteristic descriptors. Among three ML algorithms, the random forest was found to have the best prediction efficiency for two systems (CH4/(O2 + N2) and H2/(O2 + N2)). Then, the algorithm was further applied to quantitatively analyze the relative importance values of seven MOF descriptors for five performance metrics of the two systems. Furthermore, the 20 best MOFs were also selected. Finally, the commonalities between the high-performance MOFs were analyzed, leading to three types of material design principles: tuned topology, alternative metal nodes, and organic linkers. As a result, this study provides microscopic insights into the capture of trace amounts of CH4 or H2 from air for applications involving coal mine spent air and hydrogen leakage.


Introduction
With the development of society and the economy, the demand for fossil fuels is increasing steadily, leading to a widespread problem of CH4 leakage during fossil energy extraction, production, transportation, and application. For instance, leaks in underground boreholes for gas extraction can diminish CH4 production and contaminate groundwater, and CH4 also volatilizes at gas stations. More noteworthy is that the total amount of coal mine methane (coal mine gas with a methane concentration below 0.75%) is huge. During the mining process, the total amount of methane emitted is about 28 billion m3/a, which is equivalent to 420 million tons of CO2 and accounts for 8% of the total anthropogenic methane emissions. Moreover, owing to its low viscosity and the technical difficulty of its use, this methane has had to be vented entirely into the atmosphere for long periods, causing significant greenhouse gas pollution. Apart from fossil fuels, natural gas, hydrogen energy, and hydrogen fuel cells are also developing rapidly owing to their high green energy value. However, one of the main factors hindering the commercialization of hydrogen-fueled vehicles is the safety risk and cost associated with H2 leakage. Meanwhile, CH4 and H2 are clean energy carriers with high calorific value and low carbon emissions, so capturing the CH4 and H2 that escape into the air would be worthwhile. Therefore, determining how to separate and capture the relatively small amounts of CH4 and H2 from air (O2 and N2) has become a focus of research. Developing diverse new adsorbents or separation membranes is the key to the safe and cost-effective storage and separation of these gases (CH4 and H2).
In recent years, a newly developed category of porous coordination polymer materials has attracted widespread attention in scientific research and industry. Obtained by self-assembly of metal ions and organic ligands, metal-organic frameworks (MOFs) [1][2][3] not only have crystalline structures with regular pores similar to those of zeolite molecular sieves but also have higher specific surface areas than conventional porous materials, and their organic components make them both designable and tailorable, with adjustable pore sizes and easy functionalization of the channel surfaces. Based on these structural features, MOF materials acting as adsorbents can achieve high-density energy storage of clean fuel gases. MOFs can also interact differently with different gas molecules to achieve economical and energy-efficient separation of gases, and they are therefore widely used in gas adsorption and separation [4][5][6][7], storage [8], catalysis [9,10], drug transport [11], and sensing applications [12], especially in gas adsorption and separation, where a number of breakthroughs have been accomplished. Chang [13] showed in breakthrough experiments that a microporous MOF (SBMOF-1) allowed good separation of methane from N2. Xu [14] found it effective to combine B-substitution and Li-decoration or C48B12Li insertion as a strategy to improve the capacity of CO2 to absorb H2 and methane and to separate it from CO2/CH4 and CO2/H2 mixtures. This multiple modification strategy opened a door for the production of porous nanomaterials with higher gas adsorption and separation capabilities. Kang [15] found that MOF JUC-150 membranes have a significant preferential permeability to H2 compared to other gas molecules. At ambient temperature, the selectivity factors of the membranes for H2/CH4, H2/N2, and H2/CO2 were 26.3, 17.1, and 38.7, respectively.
MOFs have also been reported to possess ultra-high surface areas and, especially for a series of CH4/H2 separations, higher selective adsorption capacities than conventional porous materials, indicating that such materials are the most attractive adsorbent materials for CH4/H2 separation [16,17]. Eddaoudi and co-workers [18] developed several novel MOF materials for efficient gas storage, and one of them, a hybrid material, could carry out the cost-effective storage of methane.
MOFs have experienced rapid development over the past decade. In addition to being used as adsorbents, the properties of MOF membrane materials make them suitable for gas separation. Kang et al. [19] summarized the advances in improving the performance of MOF membranes (MOFMs) in recent years, including practical application problems faced by the design and growth of MOFs. Membrane-based separation processes have been shown to be efficient, to have low energy costs, and to be easy to perform [20], and they have advantages in terms of environmental safety and scalability. Conventional materials, such as carbon, zeolites, and polymers, have been explored as membranes for the separation of natural gas. However, the number of zeolite structures and the diversity of zeolite membranes are restricted, and it is difficult to control the precise pore size and pore function of zeolites [21]. Polymeric membranes, in turn, are often subject to a trade-off between permeability and selectivity (called the Robeson upper bound); that is, an increase in the permeability of conventional membrane materials occurs at the cost of selectivity [22]. MOF membrane materials, by contrast, show strong potential. Hou et al. [16] reported a 10-fold higher separation coefficient, up to 25, for ZIF-722-8 membranes in CO2/methane mixtures. Liu's group [23] found that fcu-MOF membranes exhibited good permeability and selectivity for the separation of H2/CO2/N2, CO2/CH4, and N2/CH4 mixtures. Some MOF membrane materials even exceeded the Robeson upper bound. For example, Wang [24] developed a ZIF-62 MOF glass membrane with separation coefficients of 50.7, 34.5, and 36.6 for H2/CH4, CO2/N2, and CO2/CH4 mixtures, respectively.
To date, a large number of MOFs have been synthesized and reported, and the number of possible MOFs is almost infinite owing to the large number of possible metal ions and organic linkers. Consequently, filtering MOFs from a large database for a specific application is inefficient, and the high-throughput computational screening (HTCS) method based on molecular simulations provides a suitable alternative. Numerous studies [25][26][27][28] have shown that discovering high-performance target materials and mining the quantitative structure-property relationships (QSPRs) from a large number of MOFs is an effective way to select superior MOFs on a large scale. However, many inefficient computations are performed during high-throughput computing, resulting in a considerable waste of computational resources and valuable research time. Recently, the emergence of machine learning (ML) has made up for this shortcoming. ML has been gradually applied in many fields, such as material discovery, structure analysis, property prediction, and reverse design, and has shown high potential in materials research [6,[29][30][31][32]. Li et al. [33] summarized the latest advances in the use of MOFs for gas storage from the three aspects of H2, CH4, and C2H2. In addition to gas storage, their review highlights some of the significant advances made by MOF materials in the separation of important gases in recent years. According to Shi et al.'s work, trained ML models were two to three orders of magnitude faster than HTCS. The combination of ML and simulation techniques is an effective way to find the optimal MOF quickly and with the greatest probability [6]. For example, Yan et al. [34] screened the dynamic adsorption of O2 and N2 in 6013 computation-ready experimental MOFs (CoRE-MOFs) by using ML and large-scale calculations.
They also systematically analyzed the influences of all the metals in the periodic table on the performances of the materials and proposed six rational design criteria for adsorbent materials, successfully designing a series of materials with superior performances. Recently, Jiang and co-workers [35] deployed a hierarchical approach using molecular simulations and machine learning to rapidly screen 100,000+ MOFs that were synthesized experimentally for C3 separation, and they used trained ML models for rapid screening of other MOF datasets (e.g., experimental Cambridge Structural Database (CSD) MOFs and hypothetical MOFs). For the CSD MOFs, the out-of-sample predictions closely coincided with the simulation results, which indicated good transferability of the ML model from the CoRE-MOFs to CSD MOFs. Furthermore, nine CSD MOFs that exhibited better separation performances than the best performing CoRE-MOFs were revealed. Rosen et al. [36] trained an ML model on a quantum database (QMOF) with 14,000 MOFs to rapidly discover MOFs with targeted electronic structure properties. By reconstructing the typical track of failed experiments, Moosavi et al. [37] reported an ML approach to obtaining chemical intuition and successfully searched for the optimal synthesis conditions, which yielded the highest surface area of HKUST-1 reported to date. They also illustrated the importance of quantifying this intuition for the synthesis of new materials. Azar [38] predicted the H2 permeability and H2/N2 selectivity of 3765 different types of MOFMs, and the results demonstrated that MOFMs had a high H2 permeability, 2.5 × 10^3 to 1.7 × 10^6 barrer. Qiao et al. [26] identified five optimized MOFs at 298 K and 10 bar by simulating the adsorption, diffusion, and permeation of CO2/N2/CH4 mixtures in 24 samples using a computational study of the high-throughput screening of 137,953 MOFs.
Qiao also selected 4764 CoRE-MOFs for membrane separation of ternary gas mixtures (CO2/N2/CH4) at 298 K and 10 bar and finally identified the seven best MOFs, with PLDs of 2.91-3.26 Å and pore size distributions (PSD% (2.4 to 3.5 Å)) of 48.2-64.1% [39]. Recently, Bai et al. [40] computationally screened 6013 MOFMs for H2 separation and explored the relationship between material characteristics and properties from the perspective of ML. The main difference from Bai's research is that this work addresses the separation of trace amounts of CH4 (or H2) and ternary systems. All these studies again confirmed that the synergistic use of ML and HTCS is an effective way to achieve faster, better predictions and to maximize the probability of finding high-performance MOFs or MOFMs.
Gas separation using MOFs is generally divided into two categories: equilibrium-based gas separations and kinetic-based gas separations [41]. In equilibrium-based separations, where MOFs are used as adsorbents, the selectivity is controlled by the affinity of the MOF for adsorbing one gas relative to another. In kinetic-based separations, where MOFs are used as membranes, the selectivity is controlled by the combination of adsorption and diffusion and is determined by the different transport rates of the gas species through the membrane pores [42]. Admittedly, the use of membrane separation for processing trace gases requires a larger driving force and increases cost; however, MOFMs have certain properties, and an applicability to specific sites, that are superior to those of MOF adsorbents and other types of membrane materials. Firstly, the present simulations were performed under atmospheric pressure and temperature conditions, which are applicable to general situations. Secondly, MOF adsorbents are less stable in aqueous or high heat environments, while membranes have stronger stability, e.g., ZIF-90 membranes show good stability in the presence of steam [43]. Therefore, membrane materials may be more advantageous when operating in liquid or high temperature environments. Gokay Avci et al. [44] also proposed that the separation of H2 is more advantageous in membrane applications than that of CO2 because H2 has a higher molar fraction and a smaller kinetic diameter, which allows H2 to penetrate smaller membrane pores more quickly. In addition, MOFMs are better suited to further modification and refinement of the material, e.g., as filler particles in polymers to improve the polymer separation performance; making MOF-based mixed-matrix membranes (MMMs) is much more economical than making pure MOF membranes, since the former requires only a small amount of MOF [43].
MOFs have good potential in pressure-driven membrane processes, as well as in selective filtration or permeation-driven membrane processes. After comprehensive consideration, in some specific occasions, MOF membranes will have better performance and more economical applications. In summary, it is also necessary to develop materials with good membrane separation performance.
The purpose of this work was to investigate the adsorption, diffusion, and permeation properties of 6013 MOF adsorbents and MOFMs for the capture of clean energy gases (CH4 and H2) in air using large-scale computational screening and ML. In Section 2, we describe the atomic models of the CoRE-MOFs/MOFMs and gases, the simulation methodologies, and the machine learning principles. In Section 3, we discuss the relationships between the adsorbent performance metrics and their characteristic descriptors. With the help of ML techniques, we found the best ML model for each performance metric of the CH4/O2 + N2 and H2/O2 + N2 mixtures. Furthermore, this algorithm was applied to quantify the relative importance of each MOF descriptor for the performance metrics. Finally, high-performance MOF materials for different systems were identified, and commonalities between the high-performance MOFs were evaluated to propose design principles.

Model
In this work, we used molecular simulation to screen 6013 computation-ready, experimental MOFs (CoRE-MOFs) for their ability to capture energy gases (CH4 and H2) from air. All crystal structures were taken from the Cambridge Crystallographic Data Centre (CCDC), whose parameters were compiled and validated by Chung et al. [45]. Each MOF is described by five structural descriptors (largest cavity diameter (LCD, Å), void fraction (φ), volumetric surface area (VSA, m2/cm3), pore limiting diameter (PLD, Å), and density (ρ, kg/m3)) and two energy descriptors (heat of adsorption (Qst, kJ/mol) and Henry's constant (K, mol/kg/Pa)). Specifically, N2 with a diameter of 3.64 Å and He with a diameter of 2.58 Å were used as probes in RASPA to calculate VSA and φ, respectively. LCD and PLD were calculated using the Zeo++ [46] package. K was simulated in RASPA using the NVT-MC scheme (N is the number of molecules, V is the volume, T is the temperature, and MC denotes Monte Carlo). The framework atoms of the MOFs are described by the Lennard-Jones (LJ) potential and the electrostatic potential, and all the LJ potential parameters are derived from the universal force field (UFF) [47] and are listed in Table S1. The atomic charges of the MOFs can be quickly calculated using the MOF electrostatic-potential-optimized charge equilibration (MEPO-Qeq) method, which helps to accurately screen MOFs for the adsorption of specific gases. The force field parameters of CH4, H2, O2, and N2 are derived from the TraPPE [48] force field and are listed in Table S2. A number of studies have verified that the UFF force field, the MEPO-Qeq charge algorithm, and the TraPPE force field can accurately predict the adsorption and diffusion of gases in different MOF materials [49,50].

Simulation Method
We simulated the adsorption and diffusion characteristics of 6013 CoRE-MOFs for the capture of energy gases (CH4 and H2) from air using grand canonical MC (GCMC) and molecular dynamics (MD) at 1 bar and 298 K. The cross interactions between the MOFs and the adsorbate molecules were calculated by the Lorentz-Berthelot rule, and periodic boundary conditions were imposed in three dimensions, with the simulation cell expanded to at least 24 Å along each direction. A spherical cutoff of 12 Å with long-range corrections was used to calculate the LJ interactions. The electrostatic interactions between the framework and gas molecules and between the gas molecules were calculated using the Ewald summation [51]. In each MOF, the MC simulation was run for 100,000 cycles. The first 50,000 cycles were used to equilibrate the simulated system, and the last 50,000 cycles were used to calculate the ensemble averages. Each cycle consisted of n trial moves (n: number of adsorbed molecules), including translation, rotation, regrowth, and exchange. One GCMC [52] simulation and one MD simulation were run independently for each of the 6013 MOFs. In the MD simulations, the duration for each ternary gas mixture system in each MOF was 7 ns, and the last 5 ns was used for production. It was found that further increases in the number of cycles and MD duration had little effect on the simulation results. All simulations were run with the RASPA package.
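The Lorentz-Berthelot rule, the 12 Å cutoff, and the 24 Å minimum cell size mentioned above can be illustrated with a minimal Python sketch. It is not the RASPA implementation: the function names are hypothetical, the cell is assumed to be orthorhombic (for tilted cells the perpendicular widths would be needed instead of the edge lengths), and the truncated potential omits the long-range correction.

```python
import math

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Cross LJ parameters: arithmetic mean of sigma, geometric mean of epsilon."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    eps_ij = math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij

def replication_counts(cell_lengths, min_length=24.0):
    """Unit-cell copies per axis so that every supercell edge is >= min_length (Å).

    ASSUMPTION: orthorhombic cell, so edge lengths equal perpendicular widths.
    """
    return tuple(math.ceil(min_length / length) for length in cell_lengths)

def lj_energy(r, sigma, eps, cutoff=12.0):
    """12-6 Lennard-Jones energy truncated at `cutoff` (Å); no tail correction."""
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# Illustrative numbers only (not UFF or TraPPE parameters).
sigma_ij, eps_ij = lorentz_berthelot(3.0, 50.0, 4.0, 200.0)
print(sigma_ij, eps_ij)
print(replication_counts((10.0, 12.0, 30.0)))
```

The 24 Å minimum is twice the 12 Å cutoff, which keeps a molecule from interacting with more than one periodic image of the same framework atom.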

Machine Learning
The rapidly evolving field of ML-assisted materials research and development has made it possible to combine materials databases and machine learning methods to drive materials discovery and design and to predict material properties. The machine learning part of this work was performed using a Python-based automated machine learning development tool, the tree-based pipeline optimization tool (TPOT), a fast model selection and tuning method based on genetic algorithms. In addition, two machine learning algorithms (decision tree (DT) and random forest (RF)) from the Scikit-learn package [53] in Python 3.9 were used to predict each performance metric (adsorption selectivity (Sads), diffusion selectivity (Sdiff), diffusion coefficient (D), permeability (P), and permselectivity (Sperm)) for each target gas (CH4 and H2). It is worth mentioning that the first three performance indicators correspond to MOF adsorbent applications, and the last two correspond to MOF membrane applications. More details of the algorithms are shown in Figures S6-S8. For each ML method, the data set was randomly divided into a training set and a test set, where 80% was used for training the model and the rest for testing. We used k-fold cross-validation (k = 5) for model construction to reduce the effect of data partitioning during learning, and we set a fixed random seed to ensure the reproducibility of the results. Each algorithm was run five times, and the final results were averaged for each performance metric prediction. In addition, the accuracy of the models was evaluated using the Pearson correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE). More details can be found in the Supporting Materials (SM).
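The data partitioning and the three accuracy measures described above can be sketched in plain Python. This is only an illustration, not the code used in this work: the helper names are hypothetical, the seed value is arbitrary, and applying the 5-fold split to the training portion only is an assumption, since the text does not state which portion was cross-validated.

```python
import math
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle 0..n-1 with a fixed seed and return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between simulated and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

train, test = train_test_split_indices(6013, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

In practice the same split and metrics are available in Scikit-learn; the stdlib version is used here only so that the arithmetic is visible.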

Results and Discussion
In this work, 6013 CoRE-MOFs were used to adsorb and separate mixtures of energy gases (CH4 or H2 at 1000 ppm) with air (N2:O2 = 78:21), an initial concentration that mimics the CH4 concentration of spent coal mine air and the concentration of a small amount of H2 leaking inside a confined hydrogen fuel cell. The aim of the separation is the capture of CH4 or H2. At this trace level and atmospheric pressure (1 bar), it can also be shown theoretically that separating CH4 or H2 from air is meaningful. In order to design superior adsorbents or membranes, the relationships between the seven structural/energetic descriptors (LCD, φ, VSA, PLD, ρ, Qst, and K) of the 6013 CoRE-MOFs and the static adsorption, diffusion, and permeation properties (D, Sads, Sdiff, P, and Sperm) were first explored by univariate analysis for the two systems (CH4/N2 + O2 and H2/N2 + O2). From these relationships, the preliminary factors influencing the adsorption and diffusion of CH4 (or H2) in air were obtained. Then, the complex structure-property relationships were further explored by machine learning methods (TPOT, DT, and RF). Finally, the best candidates were identified for different applications, and the corresponding design principles of adsorbent/membrane materials were proposed.
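As a small worked example of the feed composition stated above, the sketch below converts 1000 ppm of the trace gas and an N2:O2 ratio of 78:21 into mole fractions. It assumes that ppm is a mole (volume) basis and that the remaining 99.9% of the mixture is split between N2 and O2 in the 78:21 ratio; the text does not spell out this normalization, and the function name is hypothetical.

```python
def feed_mole_fractions(trace_ppm=1000.0, n2_to_o2=(78.0, 21.0)):
    """Mole fractions of a trace gas (CH4 or H2) diluted in N2/O2 air.

    ASSUMPTION: ppm is on a mole basis and the balance is only N2 and O2.
    """
    x_trace = trace_ppm * 1e-6
    balance = 1.0 - x_trace
    n2, o2 = n2_to_o2
    return {
        "trace": x_trace,
        "N2": balance * n2 / (n2 + o2),
        "O2": balance * o2 / (n2 + o2),
    }

x = feed_mole_fractions()
print({name: round(value, 5) for name, value in x.items()})
```

Under these assumptions the trace gas has a mole fraction of 0.001, and at 1 bar total pressure its partial pressure is 0.001 bar, which is why the infinite-dilution quantities discussed in the following subsections are relevant.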

Adsorption and Diffusion
Understanding how the diffusion coefficient of CH4 (or H2), the adsorption selectivity, and the diffusion selectivity over N2 and O2 correlate with the structure/energy descriptors of the MOFs can help to uncover materials with potential for specific applications. Figures S1 and S2 show highly similar trends in the structure-property relationships of both systems, so only the CH4/N2 + O2 system is elaborated here. Figure S1a-d,g clearly shows that D, Sads, and Sdiff first increased and finally leveled off as the five descriptors (LCD, φ, VSA, PLD, and K) increased. When the pore size was small, the pore walls of different framework materials adsorbed the gases with very different strengths, and high variability in the selectivity occurred. As the pore size increased, both N2 and O2 entered the MOFs in large amounts, and the selectivity decreased dramatically and eventually converged to one for both the CH4/N2 + O2 and H2/N2 + O2 systems. The φ, VSA, and PLD showed similar relationships, which was consistent with the previously reported trends [25,54]. It is worth noting that in Figure S1e, the three performance metrics, Sads, Sdiff, and D, all tended to decrease as ρ increased. This was because a higher MOF density corresponded to a very dense internal space. As shown in Figure S1f, the three performance indicators, Sads, Sdiff, and D, showed no significant trends with Qst, which indicated the insignificant role of Qst.
In addition to the adsorption and diffusion selectivity, the Henry's constant of CH4 (or H2) reflects the adsorption performance of CH4 (or H2) in an infinitely dilute state and helps to explain the trapping performances of the MOFs in air with very low CH4 (or H2) concentrations. Figure 1 clearly shows the trend of the K values with PLD for the four gases. The trends of K for N2 and O2 were almost the same. In general, K decreased in the order CH4 > N2 (≈O2) > H2 because CH4 had the highest affinity for most MOFs, while H2 had the weakest affinity. This is the reason that CH4 and H2 could be separated from the CH4/O2/N2 and H2/O2/N2 mixtures, respectively. Then, the quantitative relationship log D = a·PLD − b was established between the diffusion coefficients of CH4, H2, N2, and O2 at infinite dilution and PLD, as shown in Figure 2a,b. For most of the MOFs, DH2 > DO2 > DN2 > DCH4; according to Table S3, this was because the gas molecules with smaller kinetic diameters diffused more quickly than their larger counterparts. This conforms to the conclusions reached in previous reports [26,34]. The above univariate analysis could only tentatively determine the relationship between the individual parameters and performance. We further utilized the ML algorithms to systematically and comprehensively analyze the integrated structure-performance relationship for the MOFs.
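The relationship log D = a·PLD − b can be obtained by an ordinary least-squares fit in log space. The sketch below shows one way to do this; it assumes a base-10 logarithm (the text writes only "log"), and the function name and the synthetic data are illustrative, not values from this work.

```python
import math

def fit_log_linear(pld, d):
    """Least-squares fit of log10(D) = a * PLD - b; returns (a, b).

    ASSUMPTION: base-10 logarithm; every D must be strictly positive.
    """
    y = [math.log10(value) for value in d]
    n = len(pld)
    mean_x = sum(pld) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in pld)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(pld, y))
    a = sxy / sxx
    b = a * mean_x - mean_y  # intercept of the fitted line is -b
    return a, b

# Synthetic data generated from a = 0.5 and b = 10 to check the fit.
pld_values = [2.0, 4.0, 6.0, 8.0]
d_values = [10 ** (0.5 * x - 10.0) for x in pld_values]
a, b = fit_log_linear(pld_values, d_values)
print(round(a, 6), round(b, 6))
```

The same routine applies to the log P versus PLD relationship discussed in the Permeation subsection, with P in place of D.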

Permeation
To explore the separation performances of MOF membranes, two commonly used membrane separation performance metrics were further calculated: the permeability P (P = K × D) and the permeation selectivity Sperm (Sperm = Sads × Sdiff). This evaluation method has been shown in multiple studies [26,40,55] to be applicable for calculating MOF membrane performances. Figure 2c,d shows the P versus PLD plots of CH4, H2, N2, and O2 for the 6013 MOFs. The variation of P with PLD was essentially similar to that of D with PLD (Figure 2a,b) because P = K × D, and K was almost independent of PLD. Therefore, there was also a quantitative relationship between log P and PLD: log P = c·PLD − d. The slopes c for CH4, N2, O2, and H2 were 1.484, 1.134, 0.827, and 0.631, respectively. It is clear that the magnitude of the slope was positively correlated with the kinetic diameters of the four gases. The permeability of CH4 was most affected by PLD, and the permeability of H2 was least affected by PLD because CH4 had the largest molecular diameter of the four gases, while H2 had the smallest. The effects of the descriptors, including LCD, φ, VSA, and PLD, on P and Sperm are shown in Figures S3 and S4. For the CH4/O2 + N2 system, the trends of the structure-property relationships were almost the same as those in Figures S1 and S2. For H2/O2 + N2, Figure S4 shows variations of P with the seven descriptors similar to those of the CH4 system. However, Sperm showed a completely opposite trend with the descriptors LCD, φ, VSA, PLD, and K. As shown in Table S3, this was mainly because O2 and N2 had larger kinetic diameters than H2 and smaller kinetic diameters than CH4. Small-pore MOFMs had different permeabilities to gas molecules of different sizes, and this difference in permeability eventually led to the opposite trend in the permeation selectivity.
Sperm decreased with an increase in any of the above five MOF descriptors. When PLD was small, the H2 molecules could pass through the MOFMs, whereas the N2 and O2 molecules could hardly pass through. Thus, such MOFMs possessed a high Sperm. As PLD increased, the other two gas molecules could also pass through the MOFMs freely; thus, Sperm decreased significantly and eventually tended to one. Sperm had a weakly decreasing trend with increasing ρ, as shown in Figure S4e. This was because high-density MOFMs had only small free spaces that were permeable to gas molecules, which led to low permeability values. However, some of the discrete points exhibited high PH2 values at ρ > 2000 kg/m3. These MOFs had relatively large free spaces, and they were composed of very heavy metal atoms (e.g., gold, platinum, and uranium), consistent with the findings of previous studies [39,40]. A variation trend of Sperm with QH2 is not evident in Figure S4f, and most of the MOFMs exhibited moderate QH2 values (8-12 kJ/mol). The greater the gas adsorption heat of the material was, the stronger the forces between the H2 gas molecules and the surface of the adsorbent material were, and the better the selectivity was. However, at the same time, the energy cost required for desorption was higher, so a good adsorbent material should have a suitable adsorption heat. From the above analysis, it can be seen that it is not possible to obtain MOFMs with excellent performances based on a single descriptor. To understand the complex relationships between multiple descriptors and each performance metric, ML was further used to analyze the structure-performance relationships of the MOFMs.
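The two membrane metrics defined at the start of this subsection, P = K × D and Sperm = Sads × Sdiff, can be written as a short Python sketch. The helper `ideal_selectivities` is an added illustration that assumes binary, infinite-dilution selectivities taken as simple ratios of K and D; this work uses ternary mixtures, so that helper is an assumption rather than the authors' definition. All numbers are made up.

```python
def permeability(henry_constant, diffusivity):
    """P = K x D; the result carries the product of the units of K and D."""
    return henry_constant * diffusivity

def permselectivity(s_ads, s_diff):
    """S_perm = S_ads x S_diff."""
    return s_ads * s_diff

def ideal_selectivities(k_i, d_i, k_j, d_j):
    """ASSUMPTION: binary infinite-dilution selectivities as plain ratios."""
    return k_i / k_j, d_i / d_j

# Illustrative values only: gas i is adsorbed more strongly but diffuses more slowly.
k_i, d_i = 4.0e-5, 1.0e-9
k_j, d_j = 1.0e-5, 2.0e-9
s_ads, s_diff = ideal_selectivities(k_i, d_i, k_j, d_j)
s_perm = permselectivity(s_ads, s_diff)
print(s_ads, s_diff, s_perm)
print(permeability(k_i, d_i) / permeability(k_j, d_j))
```

Under the ratio assumption, Sperm equals the ratio of the two permeabilities, which shows how a high adsorption selectivity can be partly cancelled by an unfavorable diffusion selectivity.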

Machine Learning
Through univariate analysis, the relationships between the MOF structure/energy descriptors and the adsorption/membrane separation performance were preliminarily obtained. In order to further understand the deeper relationships between the material performance and structure, especially the combined effects of the various MOF descriptors on the performance and the ranking of their influence, we introduced an automatic machine learning development tool, TPOT. TPOT uses the structural characteristics and target performances of the MOFs as inputs and automatically generates models to achieve regression predictions of the target performances. At the same time, the best code pipeline for different target performances can be exported for further optimization and learning. In addition, two commonly used ML algorithms were employed: DT and RF. DT achieves regression predictions by building a binary tree; the model is easy to implement and has strong interpretability. The RF algorithm is composed of multiple decision trees, which improves the generalization and fault tolerance of DT.
In order to find a machine learning method suitable for this system and obtain better prediction results, we used the TPOT, DT, and RF algorithms to predict each performance index (D, Sads, Sdiff, P, and Sperm). All the results are shown in Table 1 and Figure 3. It can be seen from Table 1 that the three algorithms yielded good predictions (R ≥ 0.85) for all the performance metrics of the two systems, except for the diffusion selectivity of H2 over N2 and O2, which was not predicted well. The TPOT and RF models had good prediction abilities for the adsorption selectivity Sads and permeability P, and their R values were above 0.97 (see Figure 3). It is worth noting that, to reduce the poor fitting caused by the wide span of the data, we used logarithms to analyze Sads(H2/O2+N2), Sdiff(H2/O2+N2), P(CH4/O2+N2), and Sperm(CH4/O2+N2). It can be seen from Figure 3 that the predictions of the RF model for the CH4 adsorption selectivity and gas permeability were the closest to the simulation values and had the best prediction effect. The R values of Sads(CH4/O2+N2) and P(CH4/O2+N2) on the test set reached 0.96 and 0.99, the MAE values were 0.36 and 0.26, and the RMSE values were 0.64 and 0.36, respectively. In terms of adsorption and separation, the prediction effects of the three ML models on the performance indicators of the two systems were in the order of Sads > Sdiff > D. However, in terms of membrane performance, for the three ML models, although the R values of Sperm and P in the test set were similar, the MAE and RMSE of P were more than 1000 times those of Sperm, which was caused by the characteristics of the data.
In conclusion, for the prediction of all the performance indicators, the RF prediction effect was generally better than that of the DT model for the different gas mixture systems, which may have been because the RF model has a strong generalization ability. In terms of the adsorption performances of the two systems, the TPOT algorithm was better than the DT algorithm for most of the predictions. This may have been because TPOT could find the best combination of models and parameters by using a genetic algorithm, which led to better prediction results on different data sets. Although the prediction results of RF and TPOT were similar, the R and error (MAE and RMSE) values of most of the performance indicators showed that the prediction accuracy of the RF algorithm was slightly higher than that of TPOT. Therefore, RF was considered to be an optimal, stable, and useful algorithm for the MOF system in this work. In many previous studies, the RF algorithm also showed a good prediction ability for MOF systems [56][57][58].
Based on the discussion above, RF was further used to estimate the influence of the seven MOF descriptors on the performances of the MOF adsorbents and membranes. The results are shown in Figure 4. The relative importance (RI) analysis showed that in the CH4/N2 + O2 system (Figure 4a), the Henry's constant had the highest importance for Sads(CH4/O2+N2) (about 49%, more than twice that of Qst), and the RI order was KCH4 > Qst(CH4) > LCD > φ > ρ > PLD ≥ VSA. This is consistent with the conclusion of Cai et al. [59] that the Henry's constant is the key descriptor of the trade-offs of C1, C2, and C3. In addition, the RF model showed quantitatively that PLD was the most important descriptor of the CH4 diffusion coefficient and diffusion selectivity, a conclusion similar to that of Yang et al. [60]. For PCH4, the RI values of LCD and Qst(CH4) of the MOFMs were large, at 33.50% and 28.14%, respectively. The RF model analysis also showed that KCH4 was the most important descriptor of Sperm, and LCD was the second most important. In the H2/N2 + O2 system (Figure 4b), LCD was the most important influencing factor for both Sads and Sdiff, with an RI of about 40%; φ was the most important descriptor of P, with a large RI (about 5.6 times that of ρ), and the RI order was φ > ρ > Qst(H2) > PLD > VSA ≥ LCD > KH2. In addition, φ was an important factor for Sperm, and a change in PLD (RI of about 67%) could also affect Sperm. These results showed that the RF algorithm could accurately predict the performance indicators of the two systems from the relevant MOF descriptors and quantitatively determine the importance of the descriptors. This is conducive to screening candidate materials from a large number of MOFs to further guide experiments.
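The RI values discussed above are percentages that can be ranked directly. A minimal sketch of that bookkeeping step is given below, assuming that the RF implementation returns one non-negative raw importance score per descriptor; how the scores were actually produced is not specified here. The function name and the raw scores are hypothetical placeholders, not the values behind Figure 4.

```python
def relative_importance(raw_scores):
    """Convert raw importance scores to percentages and rank them.

    raw_scores: dict mapping descriptor name -> non-negative raw score.
    Returns a list of (name, percent) tuples sorted from most to least important.
    """
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw importance scores must sum to a positive value")
    percents = {name: 100.0 * score / total for name, score in raw_scores.items()}
    return sorted(percents.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores for the seven descriptors (illustration only).
raw = {
    "K": 4.9,
    "Qst": 2.2,
    "LCD": 1.2,
    "phi": 0.7,
    "rho": 0.5,
    "PLD": 0.3,
    "VSA": 0.2,
}

ranking = relative_importance(raw)
top_name, top_percent = ranking[0]
```

Normalizing to percentages makes rankings comparable across performance metrics, since each metric has its own model and its own raw score scale.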

Top-Performing MOFs and MOFMs
To select MOFs and MOFMs with excellent performances for the different systems, we used the limiting conditions shown in Table S4 to determine the best MOFs of the two systems, together with D, Sads, Sdiff, P, and Sperm of each adsorbate gas molecule in the MOFMs. Five optimal MOFs and five optimal MOFMs were screened for each system and are listed in Table 2. In Figure 5a,b, the blue dots in the pink shaded region represent the highest-potential MOFs, which showed the best combination of adsorption and diffusion selectivity together with high DCH4 or DH2 values. Most of the MOFs in Table S4 had large Sdiff(H2/(O2+N2)) values, but a MOF was more conducive to separation only when the diffusion coefficient of H2 was also large. Among the five materials selected for the CH4/N2 + O2 mixed system, the Sads(CH4/O2+N2), Sdiff(CH4/O2+N2), and DCH4 values of the ITAHEQ MOF were 7.61, 6.79, and 4.84 × 10⁻⁶ cm²/s, respectively, which were the largest adsorption and diffusion selectivity values of the materials considered. In addition, Sdiff(CH4/O2+N2) was generally lower than Sads(CH4/O2+N2), which also meant that the diffusion property of CH4 was the key performance metric for MOFs capturing low-concentration CH4 from N2 and O2 during dynamic adsorption and membrane separation. Moreover, some MOFs in Figure 5a,c with very small pores could adsorb only a single specific gas species, leading to high selectivity. However, these MOFs also have very low loading or permeability, so they are still not good candidates. This was consistent with the previous RI analysis, in which PLD was the key descriptor controlling DCH4 and Sdiff(CH4/O2+N2).
Likewise, Sumer [61] pointed out that CH4 has a stronger adsorption capacity than N2; in that work, the better-performing MOFs screened showed high adsorption selectivity for CH4 (1.3-9) over weak diffusivity selectivity for N2 (1-4), which makes them methane-selective membranes. Qiao [26] also showed that diffusion is influenced not only by the size of the gas molecules but also by gas-framework interactions, and that Sdiff(N2/CH4) is much larger than Sads(N2/CH4). In that case, methane diffuses much more slowly, and the N2/CH4 separation is controlled by diffusion. In the H2/N2 + O2 system, the analysis of the optimal MOFs showed that the LCD range was very concentrated, from 2.89 to 2.98 Å. At LCDs in this range, the N2 and O2 molecules could not enter the channels owing to their larger kinetic diameters, which significantly improved the H2 separation performance. According to the RI of the descriptors calculated by the RF algorithm, LCD is also the key descriptor controlling H2/N2 + O2 separation. This showed that an LCD close to the kinetic diameter of H2 was the key to capturing H2 molecules from the air and achieving a good separation effect, which provides effective theoretical guidance for the design and synthesis of new MOFs. The optimal MOFMs of the two systems, those with the highest P and Sperm values, are shown with blue dots in Figure 5c,d. These optimal MOFMs often have narrow channel structures, especially for H2/N2 + O2, for which LCD was mainly concentrated in the range of 2.74-2.83 Å and PLD in the range of 2.44-2.52 Å. For the optimal MOFMs for CH4 separation, the PLD ranged from 2.85 to 3.49 Å; these values were higher than those for the H2 separation membranes because the CH4 molecule is larger than the H2 molecule and therefore requires larger pores to pass through the membrane.
Since VSA was calculated by GCMC using N2 gas molecules as probes, VSA was reported as 0 whenever the pore diameter was smaller than the N2 molecular diameter. Therefore, the VSA of almost all the optimal MOFMs was 0. These MOFMs showed excellent separation performances: their PH2 values were between 6000 and 10,000 barrer, and Sperm was above 10, which is suitable for H2 membrane separation applications.
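The size-exclusion argument for the optimal H2 adsorbents can be expressed as a simple geometric check. The sketch below assumes a hard-sphere criterion, in which a gas enters a pore only if its kinetic diameter does not exceed the pore diameter. The CH4 value (3.8 Å) is the one quoted in this work; the H2, O2, and N2 values are commonly tabulated kinetic diameters and are assumptions here. The function names are hypothetical.

```python
# Kinetic diameters in angstroms. CH4 is quoted in the text; the others are
# commonly tabulated values, assumed here for illustration.
KINETIC_DIAMETER_A = {"H2": 2.89, "O2": 3.46, "N2": 3.64, "CH4": 3.80}


def admitted_gases(pore_diameter_a):
    """Gases whose kinetic diameter does not exceed the pore diameter."""
    return sorted(
        gas
        for gas, diameter in KINETIC_DIAMETER_A.items()
        if diameter <= pore_diameter_a
    )


def sieves_h2_from_air(pore_diameter_a):
    """True if the pore admits H2 but excludes both O2 and N2."""
    admitted = admitted_gases(pore_diameter_a)
    return "H2" in admitted and "O2" not in admitted and "N2" not in admitted


# The LCD window reported for the optimal H2 adsorbents is 2.89 to 2.98 angstroms.
window_ok = all(sieves_h2_from_air(lcd) for lcd in (2.89, 2.95, 2.98))
```

The simulations reported here rely on force-field interactions rather than a hard-sphere cutoff, so this check is only a first-pass filter. It would, for example, reject the optimal MOFMs whose PLD is slightly below the kinetic diameter of the permeating gas.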

Design Strategies of MOFs and MOFMs with High Performances
To further guide experiments and the design and synthesis of new MOF adsorbents or membranes with excellent performance, this section discusses design strategies for new hypothetical MOF materials based on the ML and HTCS predictions. First, 10 pairs of materials were selected for each system (20 pairs in total). Each pair contains one MOF with excellent performance and one MOF with poor performance that differ in only one component, such as the topology, organic linker, or metal center, as shown in Table S5. On this basis, we propose three design strategies to facilitate the capture of CH4 or H2 from the air, as shown in Figure 6. Following previous work [34,62], three representative pairs were selected for each system. Figure 6a-c show the three strategies for the CH4/O2 + N2 system, and Figure 6d-f show the three strategies for the H2/O2 + N2 system. Figure 6a,d show that changing the organic linker results in different separation effects. Consistent with the RI analysis, KCH4 and LCD strongly affected the adsorption selectivity and permeability. For the MOFMs in Figure 6a, the different organic linkers change the size of the channel and make it more compact: the LCD is reduced from 6.906 Å to 5.769 Å. Both LCD and PLD are then closer to the kinetic diameter of CH4 (3.8 Å), which enhances the separation of CH4. For the MOF adsorbents, the Henry's constant of the MOF with excellent performance is larger (1.36 × 10⁻⁵ > 8.07 × 10⁻⁶), which, together with the influence of LCD described above, allows CH4 to be adsorbed preferentially. The pairs in Figure 6b,e have the same topology and organic linkers but different metal centers. In Figure 6b, the MOF with good performance has V as the metal center. Because the atomic radius of V is similar to that of Cr but V is the more active metal, this MOF shows better adsorption, separation, or permeation performance.
In Figure 6e, the atomic radius of Zn is smaller than that of Cd, so changing the metal center alters the pore size of the MOF and, in turn, its properties: Sads, Sdiff, and Sperm all increased by 60-300%. Therefore, finding suitable metal nodes for a given system can be considered an effective strategy to improve the separation performance. The pairs in Figure 6c,f have the same metal centers and organic linkers, but their different topologies lead to different separation effects. Figure 6c shows two different topological structures (mmm and rtl) assembled from the Cd-based metal center and 1,3,5-benzenetricarboxylic acid. Because the pore channels formed by different topological networks have different pore diameters and shapes, the diffusion coefficient is increased by 79%, and the gas permselectivity is increased by 120%. Figure 6f shows two different topologies (hcb and mog) assembled from the Ag-based metal center and 1,3,5,7-tetrazatricyclo[3.3.1.13,7]decane. The better-performing structure has the more complex pore shape and the better H2 separation performance.
Therefore, for different systems, better MOF adsorbents or membranes can be obtained or designed by choosing more appropriate topological structures, organic linkers, or metal centers.
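The paired comparisons above are reported as relative changes between the poorly performing and the well-performing MOF of each pair. As a worked example, the sketch below applies a percent-change helper to the two quantities quoted for the linker-substitution pair in Figure 6a. The helper name is hypothetical, and treating the poorly performing MOF as the reference is an assumption.

```python
def percent_change(reference, modified):
    """Relative change of `modified` with respect to `reference`, in percent."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (modified - reference) / reference


# Values quoted in the text for the linker-substitution pair (Figure 6a).
lcd_change = percent_change(6.906, 5.769)         # LCD in angstroms
henry_change = percent_change(8.07e-6, 1.36e-5)   # Henry's constant of CH4
```

With these inputs the LCD shrinks by roughly 16% while the Henry's constant grows by roughly 69%, which puts the two effects of the linker substitution on a common scale.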

Conclusions
In this work, the adsorption and separation performances of 6013 CoRE-MOF adsorbents and membranes for CH4/N2 + O2 and H2/N2 + O2 mixtures were simulated using large-scale computational screening and ML. First, the close relationship of D and P with PLD in both systems was examined by univariate analysis, and the quantitative relationships log D = aPLD − b and log P = cPLD − d were established. Then, three ML methods, TPOT, DT, and RF, were used to predict the performance indices of each system. The results showed that the model prediction accuracies ranked as RF ≥ TPOT > DT. Moreover, the feature importance was determined by the RF algorithm, and the RI ranking for the prediction of the CH4 adsorption selectivity was KCH4 > Qst(CH4) > LCD > φ > ρ > PLD ≥ VSA. In contrast, the descriptor with the greatest impact on DCH4 and Sdiff(CH4/O2+N2) was PLD. LCD and Qst(CH4) were the two important descriptors of P and Sperm. In the H2/N2 + O2 system, LCD had the largest RI for the performance of the H2 adsorbent, while for P, the RI ranking was φ > ρ > Qst(H2) > PLD > VSA ≥ LCD > KH2. Furthermore, five optimal MOFs and five optimal MOFMs were screened for each of the two systems. The analysis of the optimal MOFs further verified that the diffusivity of CH4 is a key indicator to test in dynamic adsorption or membrane separation processes, and that an LCD close to the kinetic diameter of H2 is an overriding condition for separating H2 from air. Finally, three types of design strategies, namely tuned topology, alternative metal nodes, and organic linkers, were proposed to facilitate the capture of low-concentration CH4 or H2 from the air for specific applications.
This study may provide an effective guide for experimental researchers to find MOFs/MOFMs for capturing energy gases (CH4 and H2) from air and can lead to new research ideas for applications involving coal mine spent air and hydrogen leakage.
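The relationship log D = aPLD − b summarized above is a straight line in semi-log coordinates, so its coefficients can be estimated by ordinary least squares on (PLD, log D) pairs. The following minimal sketch assumes a base-10 logarithm and uses synthetic data generated from arbitrary coefficients; the function names and the coefficients are hypothetical and are not the fitted values of this work.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_log_d_vs_pld(pld, diffusivity):
    """Return (a, b) such that log10(D) = a * PLD - b in the least-squares sense."""
    log_d = [math.log10(d) for d in diffusivity]
    slope, intercept = fit_line(pld, log_d)
    return slope, -intercept


# Synthetic, noise-free data generated from a = 0.5 and b = 7.0 (illustration only).
pld_values = [3.0, 4.0, 5.0, 6.0]
d_values = [10 ** (0.5 * p - 7.0) for p in pld_values]

a, b = fit_log_d_vs_pld(pld_values, d_values)
```

The same helper applies to log P = cPLD − d by passing permeabilities in place of diffusivities.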
Supplementary Materials: The supplementary material can be downloaded at: https://www.mdpi.com/article/10.3390/membranes12090830/s1. Figure S1: The relationship between the descriptors of MOFs and adsorbent performance in CH4/N2 + O2; Figure S2: The relationship between the descriptors of MOFs and adsorbent performance in H2/N2 + O2; Figure S3: The relationship between the descriptors of MOFMs and adsorbent performance in CH4/N2 + O2; Figure S4: The relationship between the descriptors of MOFMs and adsorbent performance in H2/N2 + O2; Figure S5: Diffusion coefficient D and permeability P versus PLD for N2 and O2 in 6013 CoRE-MOFs; Figure S6: Tree-based pipeline optimization tool; Figure S7: Decision tree; Figure S8: Random forest; Figure S9: k-fold cross validation; Table S1: Lennard-Jones parameters of MOFs; Table S2: Lennard-Jones parameters and charges of adsorbates;