Optimization of Proton Exchange Membrane Electrolyzer Cell Design Using Machine Learning

Abstract: We propose multiple efficient machine learning (ML) models, specifically polynomial and logistic regression models, to predict the optimal design of proton exchange membrane (PEM) electrolyzer cells. The models predict eleven parameters of the cell components from four input parameters: hydrogen production rate, cathode area, anode area, and the type of cell design (single or bipolar). We trained the models on 148 samples and validated their performance on a test set of 16 samples. The average accuracy of the classification models is 83.6% and the mean absolute error of the regression models is 6.825, which indicates that the proposed technique performs well. We also measured the hydrogen production rate of a custom-made PEM electrolyzer cell fabricated according to the predicted parameters and compared it with the target rate supplied to the models. The two agree well, with a mean absolute error of 0.615 mL/min. Finally, optimal PEM electrolyzer cells for commercial-scale hydrogen production rates ranging from 500 to 5000 mL/min were designed using the ML models. To the best of our knowledge, this is the first work to model the PEM design problem with this many predicted parameters from these specific input parameters using machine learning. This study opens a route toward technology that can greatly reduce the cost and time required to develop water electrolyzer cells for future hydrogen production.


Introduction
Hydrogen production through water electrolysis is attracting considerable attention from researchers and industry experts. It is the most environmentally friendly of the many hydrogen production methods because hydrogen can be produced with zero carbon emissions. Water electrolysis occurs in an electrolyzer cell consisting of two compartments, an anode and a cathode, separated by a diaphragm (membrane). Hydrogen is produced at the cathode side and oxygen at the anode side. Electrolyzer cells are categorized by the type of electrolyte: alkaline, proton exchange membrane (PEM), and solid oxide (SO). Alkaline water electrolysis (AWE) is well established for hydrogen production and is considered the most commercially used technology globally [1][2][3][4]. The AWE cell consists of two electrodes separated by a diaphragm and immersed in a liquid alkaline electrolyte of sodium hydroxide (NaOH) or potassium hydroxide (KOH). AWE has many advantages, including a low manufacturing cost and a relatively long lifetime; however, its drawbacks include the low purity of the produced hydrogen (<99.8%), the complexity and large size of the electrolyzer, and rapid corrosion of the electrode materials. Solid oxide electrolyzers (SOE) use water vapor at a high temperature of 500-700 °C and an oxygen ion conductor (ceramic proton conducting materials) as the electrolyte [5][6][7]. SO electrolyzers are still at the research stage because of their high implementation cost and the particular working environment they require, such as steam power plants. Unlike AWE and SOE, PEM electrolyzers are easy to implement and produce high-purity hydrogen [8,9].
Energies 2022, 15, 6657
The PEM electrolyzer cell consists of two electrodes (cathode and anode) separated by a solid polymer electrolyte called a proton exchange membrane. In PEM electrolysis, water enters the anode, where it is split into oxygen (O₂), protons (H⁺), and electrons (e⁻). The protons pass through the electrolyte membrane to the cathode side, where two protons react with two electrons to form hydrogen, as shown in Figure 1. The anode and cathode half-reactions are given in Equations (1) and (2), respectively, and the overall cell reaction is given in Equation (3):

Anode: $$\mathrm{H_2O\,(l)} \rightarrow \tfrac{1}{2}\,\mathrm{O_2\,(g)} + 2\,\mathrm{H^+} + 2\,e^- \tag{1}$$

Cathode: $$2\,\mathrm{H^+} + 2\,e^- \rightarrow \mathrm{H_2\,(g)} \tag{2}$$

Overall: $$\mathrm{H_2O\,(l)} \rightarrow \mathrm{H_2\,(g)} + \tfrac{1}{2}\,\mathrm{O_2\,(g)} \tag{3}$$

Although the working principle seems simple, the optimal design of the PEM electrolyzer cell requires the maximization of the hydrogen production rate of the cell. Many designs rely on noble metals such as Iridium (Ir) and Platinum (Pt), which raises the capital cost of the electrolyzer. To address these issues, novel materials have been investigated using polarization curves under various operating conditions, such as varied electrolyte flow rate, pH, and operating temperature. These materials include porous transport layers, various support materials, different types of ion-exchange membranes, and a variety of cell configurations. Additionally, catalysts with high activity for the hydrogen evolution or oxygen evolution reactions have been studied at the cathode and anode electrodes for long-term, high-performance electrolyzer operation using non-noble metals or low loadings of noble metals [10][11][12][13][14][15]. Recent research has also focused on enhancing the electrochemical reaction of PEM electrolysis cells. One research group achieved a uniform coating of catalyst particles over an electrode surface [16] and also explored the optimal operating conditions for improving the performance of PEM water electrolysis cells [17]. Lee et al. [18] created micro-dimples on the surface of the gas diffusion electrode (GDE) using laser plasma to increase the open area of the original pores. Voronova et al. provided a logical framework for building a uniform accelerated stress test (AST) protocol applicable to PEM water electrolysis systems driven by renewable energy, along with the ensuing material-development tactics to reduce degradation processes [19]. Menesy et al. obtained ideal PEM fuel cell parameter values for farmland fertility, making the suggested model fit well with the observed PEM data [20].
Optimization through experimental methods is demanding in terms of cost and time. With this in mind, we propose a solution that helps to reach the optimal design of the PEM electrolyzer cell rapidly and easily, using machine learning (ML) methods with existing experimental data collected from previous research. The data were used as training data for the proposed model. Water electrolysis is a relatively well-established method for hydrogen production; however, few research papers have modelled specific parameters of the electrolyzer cell design process using machine learning approaches. Bahr et al. proposed a machine learning method based on artificial neural networks to estimate the aging of electrolyzer cell stacks [21]. They showed how the degradation of the stack performance depends on the current, the temperature, and the operation time. They modelled the relation between the input parameters (stack current, stack temperature, and operation time) and the output parameter, the voltage difference relative to the starting point of the respective operation setup. Curtains et al. simulated the electrolysis process in wastewater treatment using an artificial neural network [22]. They employed single and stacked artificial neural networks to predict the efficiency of the electrolysis process after removing the chlorophyll from the wastewater in the wastewater treatment plant. They also modelled the relationship between the wastewater components (e.g., Total Suspended Solids (TSS), chlorophyll, and Chemical Oxygen Demand (COD)) and the cell input parameters, specifically electrical power, temperature, time, electrode distance, electrode type, and the initial value of the considered output. Wang et al. proposed a machine learning method to predict the hydrogen and oxygen evolution reactions (HER and OER) using non-precious-metal catalysts [23]. There are few, if any, previous examples of efforts to employ ML methods for the optimal design of the PEM electrolyzer cell. Tapan et al. developed ML models based on data collected from past research to optimize the design and operation of direct alcohol fuel cells [24]. Satjaritanun et al. created an ML model to study the oxygen-preferred paths across porous transport layers of polymer electrolyte water electrolyzers at various water flow rates and current densities [25]. Hossain et al. suggested an ML technique based on Support Vector Machine (SVM) regression, Regression Trees, and Gaussian Process Regression (GPR) to estimate the effect of palladium-supported carbon nanotubes utilized in formic acid electro-oxidation for DBFC [26].
In this study, we evaluated the reliability of ML for a specific purpose. Our main contribution is modelling the PEM electrolyzer cell design problem with multiple ML models while considering a number of design parameters that were not considered in related research. Current studies related to PEM design using ML [21][22][23][24][25][26] are limited to modelling only one or two parameters, whereas our study aims to model all the cell parameters we could find in recent PEM design research.
Polynomial regression extends the linear regression model, which assumes a linear relationship between the input and the output of the model. The relationship can be stated as

$$Y = W_1 X + W_0 \tag{4}$$

where X and Y are the input and output parameters, respectively, and W_1 and W_0 are the learnable parameters representing the slope of the line and its intercept with the vertical axis, respectively. The linear equation can be solved with the least-squares method, which finds the line that best fits the input data to the output data by minimizing the sum of squared errors. However, the relation between the input and output parameters considered here is not linear. It is more complex, following a curve whose degree depends on the nature of the inputs and outputs. Polynomial regression [39] can be expressed as

$$Y = \sum_{i=0}^{N} W_i X^{i} \tag{5}$$

where {W_0, W_1, W_2, ...} are the learnable parameters of the model that map the input parameters to the output parameters, N is the polynomial degree, i is an index running from 0 to N, and N + 1 is the number of learnable parameters.
The effect of the polynomial degree on the learned curve parameters is shown in Figure 2.
For multiple input parameters and a single output parameter, the model learns the weights that map the inputs to the output. In this case, the multiple-input polynomial regression function can be given by

$$Y = W_0 + \sum_{j=1}^{M} \sum_{i=1}^{N} W_{ji} X_j^{\,i} \tag{6}$$

where X_j represents the j-th input parameter and j is an index over the M input parameters.
We used an adaptive degree for the polynomial regression training, in which predefined constraints were applied to the output to select the best polynomial degree.
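The single-input polynomial model described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit solved through the normal equations, which is one common choice and not necessarily the training procedure used in this study; the data are synthetic and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a·w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - known) / m[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares weights W_0..W_degree for y ≈ sum_i W_i * x**i."""
    n = degree + 1  # a degree-N polynomial has N + 1 weights
    # Normal equations (A^T A) w = A^T y, where A[k][i] = xs[k] ** i
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    return solve_linear(ata, aty)


def predict(weights, x):
    """Evaluate the fitted polynomial at x."""
    return sum(w * x ** i for i, w in enumerate(weights))


def mean_abs_error(weights, xs, ys):
    """Mean absolute error of the fitted polynomial on (xs, ys)."""
    return sum(abs(predict(weights, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): an exact quadratic.
xs = [k / 10 for k in range(11)]
ys = [1.0 + 2.0 * x - 3.0 * x ** 2 for x in xs]

for degree in (1, 2, 3):
    weights = fit_polynomial(xs, ys, degree)
    print(degree, [round(w, 3) for w in weights],
          round(mean_abs_error(weights, xs, ys), 4))
```

On this synthetic curve a straight line underfits, while degree 2 and above reproduce the data, which mirrors the dependence on polynomial degree discussed in the text.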
Logistic regression is a popular linear model in machine learning that is commonly used for classification, including multiclass problems. In our case, the model is required to predict the category of materials such as cathodic and anodic catalysts from the fixed set of catalyst types available in the dataset. The logistic regression model [40] uses a logistic function (the sigmoid) to model the probability of each output class. The sigmoid function can be stated as

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{7}$$

where e is the base of the natural logarithm. The output of this function is the probability of a class given the values of the inputs. The predicted probabilities over all classes must sum to 1 since they form a probability distribution. An optimization method called maximum likelihood estimation (MLE) is used to estimate the model parameters.
MLE can be expressed mathematically as

$$\hat{W} = \arg\max_{W} \sum_{i} \log P(y_i \mid x_i; W) \tag{8}$$

where x_i and y_i are the input and output vectors. An example of the logistic regression curve based on Equations (7) and (8) is shown in Figure 3. For multiple input parameters, the logistic regression function can be given by

$$P(y \mid x) = \frac{1}{1 + e^{-(W_1 x_1 + W_2 x_2 + W_3 x_3 + \cdots)}} \tag{9}$$

where x_1, x_2, x_3, ... are the input parameters.
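As an illustration of how class probabilities that sum to 1 can be obtained from linear scores, the following minimal sketch uses a softmax over one weight row per category. Softmax is an assumption made here (a one-vs-rest sigmoid scheme is another common option); the weights and scaled inputs are hypothetical and are not the values reported in Appendix A.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(weights, x):
    """Softmax over linear scores; weights holds one row of input weights per category."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: 3 categories x 4 inputs (Nc x Ni values).
weights = [
    [0.8, -0.2, 0.1, 0.5],
    [-0.4, 0.3, 0.2, -0.1],
    [0.1, 0.1, -0.3, 0.0],
]
# Hypothetical scaled inputs: production rate, cathode area, anode area, cell type.
x = [0.25, 0.9, 0.9, 1.0]

probs = class_probabilities(weights, x)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely category
print(probs, predicted)
```

With two categories and the second score fixed at zero, the softmax reduces to the sigmoid, so the two views are consistent.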


Dataset Collection
Since training a machine learning model requires sufficient experimental data, we collected actual design data only from previous research papers that report all 15 design parameters. The dataset information is provided in the Supplementary Materials.
Lee et al. [27] built a dynamic simulation of PEM electrolysis that shows the effects of different temperatures and flow rates on the experimental output for the efficient generation of hydrogen and oxygen. Navarro-Solís et al. [28] carried out PEM electrolyzer cell experiments to demonstrate hydrogen production with PEM electrolysis assisted by effluent treatment in an industrial wastewater anolyte, an economical method for producing hydrogen from industrial wastewater. The experiments showed the effects of different electrode materials (e.g., stainless steel and carbon plate), treatment solutions for the electrolyte, and applied voltages on the hydrogen production rate. Sarno et al. [29] reported a new nanocatalyst, a combination of MoS₂ nanosheets and RuS₂ nanoparticles (NPs), that produced hydrogen at a high rate during PEM water electrolysis. Gibson et al. [30] designed a photovoltaic (PV)-electrolysis system and optimized its efficiency by obtaining the maximum output power of the photovoltaic source used in PEM electrolysis. The authors showed that an efficient water electrolysis process increases the efficiency of hydrogen generation. Caravaca et al. [31] studied the influence of the system parameters of a PEM water electrolyzer cell with a bimetallic Pt-Ru carbon-based anode and a Pt carbon-based cathode on hydrogen production. Siracusano et al. [32] studied the electrochemical properties of a PEM cell stack for water electrolysis, carrying out experiments on a 9-cell PEM stack at different temperatures and voltages, both of which affect the hydrogen production rate. Valverde et al. [33] developed an electrochemical sub-model that depends on the properties of the membrane and electrocatalyst. Selamet et al. [34] optimized a highly efficient 10-cell PEM water electrolyzer stack to produce a considerable amount of hydrogen. They carried out experiments to investigate the effect of different temperatures and water flow rates on the PEM cell performance. Garcia et al. [35] designed a single-cell PEM electrolyzer for hydrogen production to show the effect of different membrane materials, catalyst loadings, and clamping pressures on cell performance. Naga et al. [36] evaluated the performance of 5 wt% and 10 wt% Palladium (Pd) on activated carbon as cathodic catalysts in a 10 cm² single-cell PEM water electrolyzer at different operating temperatures, which affected the hydrogen yield. Tebibel and Medjebour [37] conducted three experiments with different electrolytes (water, methanol, and hybrid sulfur) in PEM cells powered by a grid-connected PV system. Based on the results, the methanol and hybrid sulfur electrolytes produced 65% and 100% more hydrogen, respectively, than water as an electrolyte. Siracusano et al. [38] optimized the electrochemical performance of three bipolar PEM electrolyzer cells (100 cm²) using Ti-based backing layer current collectors of different thicknesses to reduce the ohmic contact resistance between the bipolar plates and to obtain efficient PEM electrolyzer performance.
These papers provide 164 experimental designs at an operating pressure of 1 atm for training and testing our machine-learning-based multi-model (148 samples for training and 16 samples for testing). Table 1 shows the categories and the ranges of the parameter values in the collected dataset.
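The 148/16 partition can be sketched as follows. This is a minimal illustration that assumes a seeded random shuffle; the text does not state how the test samples were chosen, and the integer placeholders stand in for the actual design records.

```python
import random


def split_dataset(samples, test_size, seed=0):
    """Shuffle a copy of the samples and hold out test_size of them for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (an assumption)
    return shuffled[test_size:], shuffled[:test_size]


samples = list(range(164))  # placeholders for the 164 collected designs
train, test = split_dataset(samples, test_size=16)
print(len(train), len(test), round(100 * len(test) / len(samples), 1))
```

Holding out 16 of 164 samples corresponds to roughly 9.8% of the data, which is the approximately 90/10 split referred to in the text.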

Machine Learning Models Training and Validation
In this study, the multiple machine learning models for predicting the optimal design of a PEM electrolyzer cell have four input parameters (hydrogen production rate, cathode area, anode area, and the type of cell design (single or bipolar)) and eleven output parameters (anode type, cathode type, membrane type, cathode catalyst, anode catalyst, anolyte, catholyte, power, water flow rate, cell temperature, and number of cells). The models employ two machine learning methods: polynomial regression (PR) and logistic regression (LR). LR is a statistical method used for classification (the prediction is a category). It was used to predict the following seven output parameters: anode type, cathode type, membrane type, cathodic catalyst, anodic catalyst, anolyte, and catholyte. PR was used to predict the following four numerical outputs (regression): power, water flow rate, cell temperature, and number of cells. We employed adaptive degree prediction to optimize the PR model by iteratively changing the polynomial degree to obtain the best prediction bounded by predefined constraints: power > 0.01 W, water flow rate of 0-750 mL/min, temperature of 298-359 K, and number of cells ≥ 1. The constraints are based on the ranges of those parameters in the dataset to avoid out-of-range predictions. The LR models are trained for different numbers of iterations depending on the best fit of the data to the curve. Since each model has four input parameters and one output parameter, the learned relationship is five-dimensional and cannot be plotted directly. The learned numerical values of the weights of the LR models are reported in Appendix A.
Although a separate ML model is used for each output parameter, the learned models are indirectly related: they are trained on the same input parameters and therefore learn the cross-relation between the input parameters and the corresponding output parameters, which can be shown mathematically as in Appendix B.
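The constraint-bounded degree selection can be sketched as below. This is a minimal sketch under stated assumptions: the selection rule (take the lowest degree whose prediction falls inside the bounds), the inclusive treatment of the bounds, the `max_degree` limit, and the toy predictor are all hypothetical; only the numerical bounds come from the text.

```python
# Bounds quoted in the text; None means unbounded on that side.
# Bounds are treated as inclusive here, which is a simplification.
CONSTRAINTS = {
    "power": (0.01, None),             # W
    "water_flow_rate": (0.0, 750.0),   # mL/min
    "temperature": (298.0, 359.0),     # K
    "number_of_cells": (1.0, None),
}


def within_bounds(value, bounds):
    """Check a prediction against a (low, high) pair."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def select_degree(predict_for_degree, bounds, max_degree=5):
    """Return (degree, prediction) for the first in-bounds degree, or None if none qualifies."""
    for degree in range(1, max_degree + 1):
        prediction = predict_for_degree(degree)
        if within_bounds(prediction, bounds):
            return degree, prediction
    return None


# Toy stand-in for "fit a degree-d polynomial and predict the temperature":
toy_predictions = {1: 412.0, 2: 371.5, 3: 340.2}
result = select_degree(lambda d: toy_predictions[d],
                       CONSTRAINTS["temperature"], max_degree=3)
print(result)
```

In practice `predict_for_degree` would refit the polynomial model at the given degree and evaluate it at the requested inputs, so the chosen degree can differ from one query to the next.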
The models were trained on 148 experimental samples until the best fit of the PR and LR models to the output parameters was obtained. We validated the models' performance on the 16 samples of the test set. We also carried out four experiments with different input and output parameters to validate the PEM cell-design ML models. The multi-models have four inputs (hydrogen production rate (mL/min), cathode area (mm²), anode area (mm²), and the type of cell design). The hydrogen production rate was set to 2.5, 5, 7.5, and 10 mL/min, while the anode and cathode areas were both fixed at 900 mm². In addition, the single-cell design was used to evaluate the parameters. The models then predicted the optimum design parameters of the PEM cell (anode type, cathode type, membrane type, cathode catalyst, anode catalyst, anolyte, catholyte, power (W), water flow rate (mL/min), cell temperature (K), and number of cells). Table 2 lists the input and predicted parameters (materials and values) of the PEM electrolyzer cell design for the four experiments. The ML models suggest the same predicted parameters for all experiments except for the catalysts, operating power, temperature, and water flow rate. The result indicates that those parameters significantly affect the hydrogen production rate in the relatively low range of 2.5 to 10 mL/min. However, one should be cautious about concluding that the effects of the other parameters on the hydrogen production rate are negligibly small. It is possible that the amount of data from the 148 experiments used in ML model training was insufficient. It is well known that using acidic or basic solutions as the anolyte and catholyte is more advantageous for hydrogen production through electrolysis. We collected all available data and trained the ML models to make predictions based on that data. More custom experiments are ongoing and will be added to the dataset in future studies. The weights learned by the logistic regression models using Equation (9) are shown in Appendix A. The number of learned weights for each model is the number of input parameters (Ni), which is four, multiplied by the number of categories (Nc) of the predicted parameter (Ni × Nc). The weights of the polynomial regression cannot be shown because we applied an online training method that uses adaptive degree learning during inference, in which the degree of the polynomial is changed based on the input parameter values. Hence, the polynomial weights are dynamic and vary from one inference execution to another.

Preparation of the Catalysts
Most of the research papers did not provide the details of the catalyst preparation; hence, we used a standard method of catalyst preparation for the four experiments in this study. The catalyst ink (RuO₂, 10% Pd, and 20% Pt) was made by sonicating a mixture of 30 mg of catalyst particles (~1 µm) in 3.6 mL of isopropyl alcohol solution (50 wt%) with 30 µL of Nafion solution (DuPont (DuPont de Nemours, Wilmington, DE, USA), 1100 EW 5 wt%). The catalyst ink was then sprayed over a 9 cm² carbon gas diffusion layer (GDE, Toray (Tokyo, Japan), TGP-H-060). The catalysts were uniformly dispersed on the carbon paper after the isopropanol had thoroughly evaporated from the surface. The process was repeated until a fine coating of the catalysts was developed.

Experimental Method
We designed a single cell (9 cm²) operating at a pressure of 1 atm, using a carbon plate as the electrode material (cathode and anode) and Nafion115 as the membrane. Pt/C (20% Pt) on the carbon electrode was used as the cathode catalyst for experiments 1 and 2, and Pd/C (10% Pd) on carbon was used as the cathode catalyst for experiments 3 and 4. Ruthenium dioxide (RuO₂) was used as the anode catalyst for all four experiments. Figure 4 shows the experimental setup used in this study. A regulated DC power supply was used to control and set the power. We set the cell voltage on the regulated power supply, which set the cell current automatically, and adjusted the voltage until we obtained the required power. The anolyte and the catholyte were deionized water in each experiment. They were supplied to each side of the cell using a rotary pump (Fisher Scientific (Hampton, NH, USA), Variable-Flow Peristaltic Pumps 13-876-2). The temperature of the electrolytes was set at 60 °C with a heater (Mtops (Seoul, Korea), Hot plate-Magnetic stirrer heater MS300HS).

Performance of the Machine Learning Models
We trained our models on 90% of the collected dataset (148 samples) and validated them on the remaining 10% (16 samples). The metric chosen for validating the classification (LR) models (anode type, cathode type, cathode catalyst, anode catalyst, catholyte, anolyte, and membrane type) is the accuracy score, as shown in Equation (10):

$$\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100\% \tag{10}$$

The metric selected for validating the regression (PR) models (power, water flow rate, number of cells, and temperature) is the mean absolute error (MAE), as shown in Equation (11):

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - \hat{Y}_i \right| \tag{11}$$

where Y is the ground truth value, Ŷ is the predicted value, N is the number of samples (dataset size), and i is an index over the dataset samples. Figure 5a shows the training and validation accuracy for each classification model, while Figure 5b shows the training and validation MAE for each regression model. The mean test accuracy of the classification models is 83.6%, which is fairly good since the model tries to select the best category for each parameter rather than the exact category recorded in the dataset for the given input parameters. Additionally, the mean test MAE of the regression models is small (MAE = 6.825), which reflects the good performance of the proposed method in modelling the PEM design parameters. The classification models perform similarly, all with an accuracy of around 83.6%. For the regression models, the MAE values depend strongly on the range of the predicted parameter itself; e.g., the test MAEs of the temperature, power, and water flow rate PR models are relatively high compared to that of the number-of-cells model, since the ranges of the temperature, power, and water flow rate in the dataset are 298~359 K, 0~1300 W, and 5~750 mL/min, respectively, whereas the number of cells ranges only from 1 to 20. Overall, the classification and regression models are good enough to design a reliable PEM water electrolyzer (PEMWE) cell that closely meets the design requirements defined by the user. For further validation of the proposed models, we performed four laboratory experiments to ensure that the models can simulate real-world conditions, as described in Section 6.2.
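The two validation metrics can be written compactly as in the following minimal sketch. The labels and values are hypothetical examples, not samples from the dataset, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the reference category."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of |Y_i - Y_hat_i| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classification example: cathode catalyst categories.
true_labels = ["Pt/C", "Pd/C", "Pt/C", "Pt/C"]
pred_labels = ["Pt/C", "Pd/C", "Pd/C", "Pt/C"]
print(accuracy(true_labels, pred_labels))

# Hypothetical regression example: cell temperature in K.
true_temps = [300.0, 320.0]
pred_temps = [310.0, 315.0]
print(mean_absolute_error(true_temps, pred_temps))
```

Because MAE carries the units of the predicted quantity, values for parameters with wide ranges (such as power in W) are not directly comparable with those for narrow ranges (such as the number of cells).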

Model Validation Using Laboratory Experiments
Table 2 shows the optimal design and operating conditions of PEM electrolyzer cells for a particular hydrogen production rate. During the PEM water electrolysis experiments, the rate of hydrogen produced at the cathode side was measured with PEM electrolyzer cells prepared according to the parameters tabulated in Table 2 and compared to the model input value (hydrogen production rate). Each of experiments 1, 2, 3, and 4 was replicated five times, and the average hydrogen production over the replications is summarized in Table 3. The average hydrogen production rate for experiments 1, 2, 3, and 4 was 3.2, 5.8, 7.0, and 9.6 mL/min, respectively. The measured hydrogen production rates agree well with the required input hydrogen production rates (2.5, 5.0, 7.5, and 10 mL/min, respectively) in the ML models. Figure 6 shows that the absolute errors in hydrogen production for the four experiments are 0.7, 0.8, 0.5, and 0.4 mL/min, with a mean absolute error of 0.615. These errors stem from design parameters that most of the research papers used in the dataset did not report, such as catalyst loading and electrode path pattern, and from unclear input power settings during the experiments. However, the error values are small, which means the proposed method simulates the real PEM cell performance well.
Our model predicts the same electrode type (carbon plate), membrane type (Nafion 115), catholyte (pure water), anolyte (pure water), anode catalyst (RuO₂), and temperature for each experiment because only one input parameter, the hydrogen production rate, changes between experiments (2.5, 5, 7.5, and 10 mL/min for experiments 1, 2, 3, and 4, respectively). However, the model predicts different cathodic catalysts for experiments 1 and 2 than for experiments 3 and 4. As the model tries to find similar input conditions in the training data, it predicts Pt/C (20% Pt) as the cathodic catalyst for the low hydrogen production rates of 2.5 and 5 mL/min in experiments 1 and 2, and it recommends Pd/C (10% Pd) for the higher hydrogen production rates of 7.5 and 10 mL/min in experiments 3 and 4. As expected, the predicted power and water flow rate increase with the required hydrogen production rate.
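The error figures quoted above can be reproduced with a few lines of Python. This is only a minimal sketch that uses the rounded averages reported in the text; the variable names are ours and are not part of the original study. With these rounded inputs the mean absolute error comes out at 0.60 mL/min, slightly below the reported 0.615, and we assume the difference arises because the reported value was computed from the unrounded measurements.

```python
# Target (model input) and measured average hydrogen production rates in mL/min,
# taken from the text for experiments 1-4 (rounded values).
target_rates = [2.5, 5.0, 7.5, 10.0]
measured_rates = [3.2, 5.8, 7.0, 9.6]

# Absolute error per experiment: |measured - target|.
abs_errors = [abs(m - t) for m, t in zip(measured_rates, target_rates)]

# Mean absolute error over the four experiments.
mae = sum(abs_errors) / len(abs_errors)

print([round(e, 2) for e in abs_errors])
print(round(mae, 3))
```

The per-experiment errors match the values given for Figure 6; only the mean differs in the second decimal place because of rounding in the inputs.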

ML Model Performance for a Commercial-Scaled PEM Water Electrolyzer Cell Design
After validating the reliability of the ML model, we used it to predict design parameters for commercial-scale PEM water electrolyzer cells, since our dataset contains commercial-size designs. For example, we used the model to design commercial-scale bipolar PEM water electrolyzer cell stacks (e.g., electrodes of 100 cm² for each cell) for hydrogen production rates of 500, 1000, 3000, and 4500 mL/min. The design parameters predicted for each case are summarized in Table 4. For the high hydrogen production rates of 3000 and 4500 mL/min, the model selected fewer cells but electrode materials (anode and cathode) and a cathode catalyst with better electrochemical characteristics. The anode and cathode are selected to be titanium because of the superior electrochemical characteristics of titanium electrodes. The cathodic catalyst is selected to be Pt/C (30% Pt), which is much better than Pt/C (10% Pt). The predicted power, water flow rate, and temperature increase with the required hydrogen production rate. For the 500 and 1000 mL/min hydrogen production rates, the anode catalyst class "Ruthenium oxide (RuO₂) or Iridium oxide (IrO₂)" leaves the choice between the two to the designer. IrO₂ (~$570/g, high durability for OER) is much more expensive than RuO₂ (~$78/g, low durability for OER); therefore, manufacturers can select the cheaper catalyst for lower hydrogen production rates in the suggested commercial-sized PEM cell designs, provided that its lower durability for OER is acceptable.

Conclusions
We developed ML models that predict eleven parameters from four inputs to optimize the design of PEM electrolyzer cells. We trained the machine learning models on 148 data samples and evaluated their performance on a test set of 16 samples. The training and validation values matched well, indicating that the models fit well. The average accuracy of the classification models is 83.6%, and the mean absolute error of the regression models is 6.825, suggesting that the proposed approach performs well. The reliability of the models was also verified by comparing the target hydrogen production rate with the rate measured in electrolyzer cells of four different designs. The laboratory results agreed well with the model values, with a low mean absolute error of approximately 0.615. In addition, the validation of the proposed models showed high performance in both classification and regression of the PEM cell design, which supports the good fit of the models. We also showed that the models can design commercial-sized PEM electrolyzer cells suitable for high hydrogen production rates ranging from 500 to 5000 mL/min. The proposed work has clear limitations, such as the limited number of material categories covered (i.e., cathode, anode, catholyte, anolyte, and cathode and anode catalyst types) and the limited working range of the numerical values (i.e., temperature, power, electrode area, water flow rate, and number of cells). The experimental data in the dataset are also restricted by the nature of the underlying sources. To develop a more universal machine learning-based design method for PEM water electrolysis, we will focus in future work on expanding the current dataset to encompass more materials and a wider range of numerical parameters.

Figure 1. Schematic of the PEM electrolyzer cell.
Although the working principle seems simple, the optimal design of the PEM electrolyzer cell requires the maximization of the hydrogen production rate of the cell. Many parameters must be taken into account for optimization, and the electrochemical reactions are affected by many unknown variables. Many researchers work on the development of electrolyzers to enhance electrolyzer performance; investigate the corrosion of electrodes at high anodic potentials and a low pH; and analyze the use of expensive transition metals

Figure 2. Effect of changing the degree of the polynomial regression on the learning curve. The black squares represent the data samples.

Figure 3. An example of a learned logistic regression curve.

$$\text{score} = \frac{\text{Number of correctly predicted samples}}{\text{Total number of samples}}$$
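The score above is the fraction of samples whose predicted class matches the true class. A minimal Python sketch follows; the function name `classification_score` and the example labels are hypothetical and chosen only for illustration, not taken from the dataset.

```python
def classification_score(true_labels, predicted_labels):
    """Return the fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    # Count the samples for which prediction and ground truth agree.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical cathode-catalyst labels for four test samples (illustration only).
example_true = ["Pt/C", "Pt/C", "Pd/C", "Pd/C"]
example_pred = ["Pt/C", "Pd/C", "Pd/C", "Pd/C"]
example_score = classification_score(example_true, example_pred)
print(example_score)
```

In this illustration three of the four predictions match, so the score is 0.75; the same calculation applies to each of the classification models on its test set.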

Figure 5. Training and testing accuracy/MAE for each ML model. (a) Training/testing accuracy for each classification model (anode type, cathode type, membrane type, cathode catalyst, anode catalyst, catholyte, and anolyte); (b) training/testing MAE for each regression model (power, temperature, water flow rate, and number of cells).

Figure 6. Hydrogen production rate for experiment and ML model.

Table 1. Categories and value ranges of the dataset.

Table 2. Input parameters and PEM cell design predicted by our model.

Table 3. Hydrogen flow rate (mL/min) measurement for each experiment.

Table 4. Optimal predicted parameters for suggested commercial-sized PEM cell designs.