Using Self-Organizing Maps to Elucidate Patterns among Variables in Simulated Syngas Combustion

Abstract: This study demonstrates the use of a self-organizing map (SOM) algorithm to elucidate patterns among variables in simulated syngas combustion. The work was implemented in two stages: (1) modelling and simulation of syngas combustion under various feed compositions and reactor temperatures in the AspenPlus™ chemical process simulation software, and (2) pattern recognition among variables using the SOM algorithm implemented in MATLAB. The levels of feed syngas composition and reactor temperature were randomly sampled from uniform distributions using the Morris screening technique, creating 4800 simulation conditions. Running these conditions in the process simulation produced the multivariate dataset used in the SOM analysis. Results show that the cylindrical SOM topology models the dataset at lower quantization error and topographic error than the rectangular SOM topology, indicating the suitability of the former for variable pattern elucidation in the simulated combustion. Nonetheless, the variable patterns in the component planes from the rectangular SOM (9 × 28 grid) and those from the cylindrical SOM (9 × 28 grid) are nearly the same, indicating that either architecture may be used for variable pattern analysis. The component planes of process variables from a trained SOM are a convenient visualization of the trends across all process variables.
Author Contributions: visualization, D.L.B.F., M.C., A.D., S.K., M.L. and A.F.; supervision, D.L.B.F.; funding acquisition, D.L.B.F., W.S., E.R., R.H. and M.Z.


Introduction
Like many traditional and emerging energy and materials technologies, syngas production and combustion have been actively evaluated for cleaner implementation at pilot and industrial scales [1]. Syngas (short for 'synthesis gas') is typically produced through gasification of resources such as natural gas, coal and biomass by reaction with steam or oxygen [2]. The chemistry and physics of syngas have been studied extensively, resulting in established literature on reaction mechanisms, reaction kinetic parameters, etc. Some of this knowledge, in the form of simulations of system performance, has been used to give insight into possible design and operation considerations [3,4].
Clean Technol. 2020, 2

Chemical process simulators (such as AspenPlus, CHEMCAD, and COMSOL Multiphysics) have been instrumental not just in training but also in the practice of chemical process design and operation [5]. Most of these simulators have extensive property databanks and they allow rigorous material and energy balance calculations for modeling complex process plants [6]. These features make process simulators key tools in performing system-oriented analysis, which is the root of the modern process design paradigm [7]. This work aims to demonstrate that systems analysis with process simulators may be enhanced through the use of machine learning algorithms that extract patterns within multivariate datasets of simulations. Specifically, this work implements self-organizing map (SOM) techniques to elucidate patterns among variables in a simulated syngas combustion modeled in AspenPlus™.
SOMs, which are also called Kohonen networks after their original proponent [8-10], are derivatives of artificial neural network (ANN) algorithms with applications in data visualization, feature extraction, clustering, and supervised classification [11,12]. These applications leverage the efficiency of SOMs in discovering patterns in high-dimensional multivariate data through projections onto low dimensions such as 2D, hence their designation as 'maps' [12,13]. In essence, an SOM defines an 'elastic net' of neuron nodes that are fitted to the input data space to approximate its density function in a topologically ordered mapping [8]. The process by which such mappings are formed is the SOM algorithm schematically depicted in Figure 1. So far, SOM is one of two ANNs executing competitive learning, a form of unsupervised learning that allows each neuron in the network to specialize, so that the network ultimately becomes a feature-discovering tool for high-dimensional data [14]. The unsupervised learning nature of SOM also means that it does not require model calibration towards a target response, as is required in regression.
Figure 1. Schematic of basic sequential learning SOM applied to a dataset $X$: (a) a two-dimensional SOM modelling an $n$-dimensional input data vector $x_j = (\xi_1, \ldots, \xi_n)$ onto a hexagonal lattice sheet map of neurons with associated weight vectors, such as $m_i$ in neuron $i$, including the best-matching unit (BMU) $c$ with associated weight vector $m_c$; (b) $x_j$ is projected to all the weight-initialized neurons in the grid to determine the BMU, in this case neuron $c$, to initiate the learning process; and (c) the weight vectors $m_i$ of the BMU and the neighboring neurons are updated to move the active neurons in the BMU neighborhood closer to $x_j$ in a recursive manner. The SOM trains until the neuron weight vectors asymptotically converge on all samples in $X$.
The following are some of the areas of application for SOM: min-max objective for multirobot multigoal path planning [15]; graph-mining for economic measures [16]; social interactions [17]; fault diagnosis [8,18]; various clustering tasks [8,19,20]; various classification tasks [8,21]; speech recognition [8]; and image simplification of atmospheric data [22] for cloud detection and land cover mapping [23]. The potential of SOM as a tool in chemical process systems was proposed by Simula and Kangas [24] around two decades ago, based on the topology-preserving property of SOM. Liukkonen, Hiltunen [25] used SOM to diagnose the various states of a utility-scale fluidized bed combustion process. Muñoz, Martín-Torre [26] used SOM to extract patterns and cluster a dataset on element release from sediments in contact with acidified seawater, obtained via laboratory leaching tests. Ganhadeiro, Christo [27] used SOM to analyze electric energy distribution systems. Li, Du [28] used SOM as a process monitoring computational tool on the Tennessee Eastman chemical process. However, a survey of the current literature indicates that there is presently no work on the use of SOM as a data analytics tool for chemical process simulators. Hence, this work aims to demonstrate the potential of SOM to elucidate patterns in multivariate datasets from chemical process simulators to aid in the study and clean implementation of alternative technologies such as syngas.

Methodology
The work was implemented in two stages (Figure 2): (1) modelling and simulation of syngas combustion under various feed composition and reactor temperature implemented in AspenPlus™ chemical process simulation software, and (2) pattern recognition among variables using SOM algorithm implemented in MATLAB. For definitions of symbols, refer to the Nomenclature.

Figure 2. Schematic of the two-stage workflow. The set of all simulation variable values collected from each simulation run is used as the input dataset for the SOM.

Process Kinetics Simulation
The AspenPlus chemical process kinetics simulation model (Figure 3) consisted of a mixing unit and a combustion reactor modeled as a plug-flow reactor (PFR) under adiabatic condition, which is a common model for syngas combustion [3]. The PFR was sized according to previous works [29] with 0.381 m internal diameter and 5.90 m total length. The thermodynamic properties were calculated through the Soave-Redlich-Kwong (SRK) equation of state as implemented by AspenPlus V10.

The reaction kinetics were modeled in power-law form, and the reaction constant, $k_b$, was modeled in Arrhenius form to account for the effect of reactor temperature on the reaction constant. The reaction mechanism and parameters used in the PFR (Table 1) were based on the works of Khan, DeJong [30] and De Kam, Morey [3]. The feed gas component mole fractions $y_{0,a}$ of the syngas and the reactor temperature, hence the exhaust temperature, were varied according to the Morris sampling technique [31,32]. The Morris sampling technique, which is typically used in global sensitivity analysis, randomly samples $r(k + 1)$ combinations of the $k$ variables with corresponding sets of lower bound and upper bound levels (see Table 2), where $r$ is the number of increments within the sampling bounds for each variable. In this work, the variables are the six feed gas components and the reactor temperature, which makes $k = 7$. Based on previous works implementing the Morris sampling technique, a conservative level of $r = 200$ was set. To ensure that the combinations of variable levels randomly covered a wide region of the design space, the Morris sampling was performed using 3 random seed generators. Hence, the total number of simulation runs was $3 \times [200(7 + 1)] = 4800$. The Morris random sampling was implemented in the R statistical software using the 'sensitivity' R package. See the Supplementary Information for the summary of the 4800 simulation conditions and the accompanying R script used in implementing Morris sampling on the variables.
Table 1. Syngas combustion key reaction steps, rate law expression, and reaction constant parameter as function of temperature T implemented in the PFR. Columns: Reaction; Rate Expression; Reaction Constant Parameter.
Setting the PFR at a constant temperature allowed for the determination of the heat duty supplied by (−) or to (+) the reactor. The variable levels in Table 2 were based on the work of McDonell [33] on syngas from coal, and of Martínez, Mahkamov [4] on syngas from biomass. The feed syngas enters the PFR at 486 K and 1 atm at a steady total molar flow rate of 5.58 kmol/hr, based on previous works [29]. Pure oxygen was assumed to be mixed with the syngas going to the PFR and was supplied at 5% excess, based on Martínez, Mahkamov [4], calculated for each reactant composition.
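The run count quoted for the Morris sampling can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python (the study itself used the R 'sensitivity' package); the function name `morris_run_count` and its parameter names are hypothetical and simply encode the expression r(k + 1) repeated over the random seeds, as stated in the text.

```python
def morris_run_count(r: int, k: int, n_seeds: int = 1) -> int:
    """Number of sampled combinations, r(k + 1), repeated for each random seed."""
    return n_seeds * r * (k + 1)


# Values taken from the text: r = 200, k = 7 variables, 3 random seeds.
total_runs = morris_run_count(r=200, k=7, n_seeds=3)
print(total_runs)  # 4800
```

The result matches the 4800 simulation conditions reported for the study.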

Variables Pattern Recognition via SOM
The SOM computations in the study were implemented in MATLAB (MathWorks®) with program codes built on functions from the public-domain add-in SOM Toolbox version 2.0 (available for download at http://www.cis.hut.fi/somtoolbox/), developed and maintained by the Laboratory of Computer and Information Science at Helsinki University of Technology in Finland, where Kohonen [8] and associates conducted many of the works on SOM. This toolbox was designed to accompany the SOM literature material by Kohonen [8], and it has been used and tested in several research works. In addition to the script-embedded documentation on the use of the SOM Toolbox, literature explaining some of the main features of the code functions and their usage has been published previously [34,35]. The following are the key components of the SOM implementation in the software. The program scripts used in the study are summarized in the Supplementary Material.

Data Pre- and Post-Processing
The simulation variables used for SOM training are the following (16 variables): heat duty (Q); mole fractions of syngas components ($y_{0,a}$): H2, CO, CH4, CO2, N2, H2O; feed O2 (at 5% excess); reactor temperature ($T_{reactor}$); total molar flow of the outlet stream; and mole fractions of outlet stream components ($y_a$): CO, CH4, CO2, N2, H2O, and O2. These are typical analysis variables for chemical reactors, specifically in syngas combustion. The datasets from the AspenPlus simulation runs were summarized in spreadsheets. Then, in MATLAB, the variables were scaled through normalization so that all variables were treated as equally important, i.e., to eliminate the dominance of high-value variables in the mapping [35]. For post-processing after SOM training, the values (the weight vectors at the end of SOM training) were de-normalized to the original scale to aid the meaningful interpretation of results [34].
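The scaling and de-normalization steps can be illustrated as follows. This is a minimal sketch that assumes z-score scaling (zero mean, unit variance per variable); the text does not state which normalization was used, and the function names are hypothetical. The three heat duty values are the minimum, median, and maximum reported later in the Results, used here only as sample numbers.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Return the mean and standard deviation of one variable (assumed z-score scaling)."""
    mu = mean(column)
    sigma = pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)  # guard against constant columns


def normalize(column, mu, sigma):
    """Scale a variable so that large-valued variables do not dominate distances."""
    return [(v - mu) / sigma for v in column]


def denormalize(column, mu, sigma):
    """Map scaled values (e.g., trained weight vectors) back to original units."""
    return [v * sigma + mu for v in column]


heat_duty = [-17497.0, 14690.0, 42498.0]  # cal/sec, illustrative sample only
mu, sigma = fit_scaler(heat_duty)
scaled = normalize(heat_duty, mu, sigma)
restored = denormalize(scaled, mu, sigma)
```

Applying the same `mu` and `sigma` in `denormalize` to the trained weight vectors is what returns the component planes to physical units.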

Lattice Structure, Map Shape, and Size
The lattice structure may be rectangular or hexagonal. The hexagonal lattice (see Figure 1) was preferred because it does not favor horizontal and vertical directions as much as the rectangular array [8]. The global map shape may be a 2D sheet, a cylinder, or a toroid [34]. Although the 2D sheet map projection has been found to be sufficiently effective in capturing patterns from the original high-dimensional dataset [10], this study evaluated both the 2D sheet and cylinder projections. The sheet map must be elongated, i.e., length > width, to attain a stable orientation as the learning process proceeds [8].
The total number of neurons should be at most $5\sqrt{n \times S}$, where $n$ is the number of variables used for SOM training (i.e., $\xi_1, \ldots, \xi_n$) and $S$ is the number of samples [35,36], or simulation runs in this work (4800 simulation runs).
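The practical difference between the sheet and cylinder map shapes described above is how distance between neurons is measured on the grid. The sketch below is a simplified illustration, not the SOM Toolbox implementation: it uses a rectangular index grid (ignoring the hexagonal offsets) and assumes that the cylinder wraps along the column direction. The function name `grid_distance` is hypothetical.

```python
import math


def grid_distance(a, b, n_cols, shape="sheet"):
    """Distance between neurons a=(row, col) and b=(row, col) on the map grid.

    Assumption: for shape="cyl" the first and last columns are joined,
    so column differences wrap around.
    """
    d_row = abs(a[0] - b[0])
    d_col = abs(a[1] - b[1])
    if shape == "cyl":
        d_col = min(d_col, n_cols - d_col)
    return math.hypot(d_row, d_col)


# On a 9 x 28 grid, the two ends of a row are far apart on a sheet
# but adjacent on a cylinder.
print(grid_distance((0, 0), (0, 27), n_cols=28, shape="sheet"))  # 27.0
print(grid_distance((0, 0), (0, 27), n_cols=28, shape="cyl"))    # 1.0
```

Because the neighborhood function acts on this grid distance, a cylinder has no edge neurons along the wrapped direction.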

SOM Initialization and Training
The SOM Toolbox uses the Euclidean metric for vectorial distance calculations: $c = \arg\min_i \lVert x_j - m_i \rVert$. The neuron with index $c$ is the BMU with weight vector $m_c$, which will be the first neuron to 'learn'. This step implements the competitive learning nature of SOM. Initialization of the weight vectors was linear, and training was by batch learning. Initialization and training were done in two stages: (1) rough training, with a large neighborhood kernel radius and a fast learning rate, and (2) fine-tuning, with a small neighborhood kernel radius and a slow learning rate. These were the default parameters in the Toolbox, and the default models were a Gaussian model for the neighborhood function $h_{ci}(t)$ and a reciprocally decreasing learning rate $\alpha(t)$ (see Supplementary Material for model forms) [34,35].
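The BMU search and the neighborhood-weighted update can be sketched in a few lines. The example below illustrates the sequential update depicted in Figure 1, not the batch training used in this work, and it is not the SOM Toolbox code. The Gaussian form exp(−d²/(2σ²)) on a rectangular index grid, the function names, and the tiny 2 × 2 map are all assumptions made for illustration.

```python
import math


def squared_distance(x, m):
    """Squared Euclidean distance; its argmin equals that of the Euclidean norm."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))


def find_bmu(x, weights):
    """Return the grid position c of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: squared_distance(x, weights[pos]))


def sequential_update(x, weights, alpha, radius):
    """Move the BMU and its grid neighbors toward x (one sequential learning step)."""
    c = find_bmu(x, weights)
    for pos, m in weights.items():
        grid_d2 = (pos[0] - c[0]) ** 2 + (pos[1] - c[1]) ** 2
        h = math.exp(-grid_d2 / (2 * radius ** 2))  # assumed Gaussian neighborhood
        weights[pos] = [mi + alpha * h * (xi - mi) for xi, mi in zip(x, m)]
    return c


# Hypothetical 2 x 2 map with two-dimensional weight vectors.
weights = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0],
           (1, 0): [1.0, 0.0], (1, 1): [0.9, 0.9]}
x = [1.0, 1.0]
before = {pos: squared_distance(x, m) for pos, m in weights.items()}
bmu = sequential_update(x, weights, alpha=0.5, radius=1.0)
after = {pos: squared_distance(x, m) for pos, m in weights.items()}
```

In the rough stage a large `radius` and `alpha` pull most of the map toward each sample, whereas in the fine-tuning stage small values adjust only the BMU and its closest neighbors.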
Because SOM is an unsupervised learning algorithm, there is no target response against which the 'best fit' of a particular SOM architecture to the dataset can be evaluated. However, two parameters are typically used to measure the performance of SOM mapping on data: (1) quantization error (Qe), and (2) topographic error (Te) [35]. Qe is the average distance between each data vector and its BMU, and it measures map resolution. Te is the proportion of all data vectors for which the first and second BMUs are not adjacent units, and it measures topology preservation. Pölzlbauer [37] found these two measures of SOM quality to be inversely proportional when map sizes are large. To establish a single measure of map quality, Qe and Te were each normalized within their respective range of values, and the normalized Qe and Te (nQe and nTe) were added, i.e., sum(nQe,nTe). Hence, the map sizes that minimize sum(nQe,nTe) indicate a good compromise between the quantization and topographic errors of an SOM model.
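The two quality measures and their combined score can be written out as a short sketch. It follows the definitions given above, but several details are assumptions: adjacency is tested on a rectangular index grid with four neighbors (the study used a hexagonal lattice), the normalization within the range of values is taken to be min-max scaling, and all function names and sample numbers are hypothetical.

```python
import math


def two_best_units(x, weights):
    """Grid positions of the first and second best-matching units for x."""
    ranked = sorted(weights, key=lambda pos: math.dist(x, weights[pos]))
    return ranked[0], ranked[1]


def quantization_error(data, weights):
    """Qe: average distance between each data vector and its BMU."""
    return sum(math.dist(x, weights[two_best_units(x, weights)[0]])
               for x in data) / len(data)


def topographic_error(data, weights):
    """Te: share of data vectors whose first and second BMUs are not adjacent."""
    def adjacent(a, b):  # assumed 4-neighborhood on a rectangular index grid
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
    return sum(1 for x in data
               if not adjacent(*two_best_units(x, weights))) / len(data)


def min_max(values):
    """Normalize values within their own range (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def combined_score(qe_list, te_list):
    """sum(nQe, nTe) for a list of candidate map architectures."""
    return [q + t for q, t in zip(min_max(qe_list), min_max(te_list))]


# Hypothetical 1 x 3 map with one-dimensional weights and three samples.
weights = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
data = [[0.1], [1.9], [1.0]]
qe = quantization_error(data, weights)
te = topographic_error(data, weights)
scores = combined_score([2.0, 1.0, 3.0], [0.0, 0.25, 0.5])
```

The architecture with the smallest entry in `scores` would be the candidate offering the best compromise between the two errors.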

Results
The data collected from the process simulations are summarized in the Supplementary Information. Of interest for discussion are the outputs of the SOM trained on the process simulation dataset.

SOM Architecture
Many possible combinations of size and shape may be specified to create an SOM architecture to map a dataset. When the formula for the number of neurons (size), $5\sqrt{n \times S}$, is used with $n = 16$ and $S = 4800$, the size of the SOM should be 1386 neurons. With this SOM size, the resulting U-matrices showed random patterns instead of the hypothesized organized patterns on the maps (see Supplementary Material showing the 47 × 28 rectangular SOM grid). According to Kohonen [10], it is not possible to determine the exact size of the SOM beforehand, and the size must be determined by trial and error. Hence, using the U-matrix as an indicator of patterns in the SOM, a trial-and-error analysis of the number of neurons was done. It was found that organized patterns emerge if $n$ is reduced to the number of variables originally varied through Morris sampling, i.e., $n = 7$, and if the formula for size is changed to $2\sqrt{n \times S}$, which is consistent with the proposed mathematical form to estimate SOM size [38]. Consequently, a working maximum size of $2\sqrt{7 \times 4800} = 367$ neurons was used (a graphical comparison of the U-matrix between a 9 × 28 rectangular grid and a 47 × 28 rectangular grid is shown in the electronic Supplementary Material).
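The two map-size estimates can be reproduced with a short calculation. The sketch below is only a check of the arithmetic in the text; the helper name `max_neurons` is hypothetical, and rounding to the nearest integer is an assumption that happens to reproduce the reported values.

```python
import math


def max_neurons(factor, n_vars, n_samples):
    """Upper bound on map size: factor * sqrt(n_vars * n_samples), rounded."""
    return round(factor * math.sqrt(n_vars * n_samples))


initial_bound = max_neurons(5, 16, 4800)  # 5 * sqrt(16 * 4800)
working_bound = max_neurons(2, 7, 4800)   # 2 * sqrt(7 * 4800)
print(initial_bound, working_bound)       # 1386 367

# The grids mentioned in the text respect their respective bounds.
print(47 * 28 <= initial_bound)  # True (1316 neurons)
print(9 * 28 <= working_bound)   # True (252 neurons)
```

The 9 × 28 grid used for the component planes therefore sits well below the working maximum of 367 neurons.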
The sum of the normalized Qe and normalized Te is summarized in Figure 4 for the rectangular SOM (Figures 4 and 5) and in Figure 6 for the cylindrical SOM (Figures 6 and 7). The sum of errors (Figures 4 and 6) has been compared with one of the response variables, heat duty, to examine the trends of mapping performance with varying SOM size and shape. Frequency distribution plots of the 16 variables are summarized in the supporting information (see Supplementary Material). The heat duty values in this study range from negative to positive (minimum: −17,497 cal/sec; maximum: 42,498 cal/sec; median: 14,690 cal/sec; mean: 15,138 cal/sec). The lowest nominal value of the heat duty at a neuron center was noted for each SOM architecture used for mapping. The closer the nominal heat duty value at a neuron center is to the minimum value in the dataset (−17,497 cal/sec), the better the mapping of the data. The lower bound heat duty observed on a neuron for each architecture is summarized in Figures 5 and 7. In the chemical system, negative heat duty indicates an exothermic process, while positive heat duty indicates an endothermic process.
Comparing Figures 4 and 5 for the rectangular SOM, the sizes and shapes of SOM that minimize the sum of errors (8 × 9 and 8 × 10 grids) (Figure 4) do not overlay well with the sizes and shapes of SOM that render the heat duty value closest to the lower bound value (for example, 8 × 22 to 8 × 29 grids, or 10 × 22 to 10 × 28 grids) (Figure 5). For the cylindrical SOM, on the other hand, the sizes and shapes of SOM that minimize the sum of errors (Figure 6) overlay well with the sizes and shapes that render the heat duty value closest to the lower bound value (for example, the 9 × 28 grid or the 10 × 28 grid) (Figure 7).

Syngas Combustion Variables Pattern
From the evaluation of the effect of SOM architecture on the mapping performance for the simulated syngas combustion dataset (Section 3.1), one rectangular SOM architecture (9 × 28 hexagonal grid) and one cylindrical SOM architecture (9 × 28 hexagonal grid) were used to render the patterns in the simulated variables. Figures 8 and 9 summarize the variable patterns from the trained rectangular SOM and cylindrical SOM, respectively. One of the advantages of using SOM for pattern extraction in multivariate data becomes apparent in these surface renderings. As graphically illustrated in Figure 1, each SOM neuron after training contains the quantized values of all the attributes (variables). When the quantized values are rendered as surface maps, with each attribute having its own map (also called a component plane), the maps of all the attributes may then be compared for pattern analysis. The black dots in the surface maps in Figures 8 and 9 indicate the neurons of the SOM. A neuron (black dot) at a specific row and column index in one component plane is the same neuron at the same row and column index in another component plane. In other words, component plane rendering is like 'slicing' a trained SOM into its various variables. This makes the comparison of variable patterns a visual technique, which may be more advantageous than numerical measures such as variable correlation [12].
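The 'slicing' of a trained SOM into component planes can be shown with a small sketch. The codebook below is a hypothetical 2 × 2 map with two variables, and the numbers are made up for illustration; they are not results from this study. The function name `component_plane` is also hypothetical.

```python
def component_plane(codebook, var_index):
    """Extract one variable from every neuron, keeping the grid layout.

    codebook[row][col] is the weight vector of the neuron at (row, col).
    """
    return [[neuron[var_index] for neuron in row] for row in codebook]


variables = ["heat_duty", "y_H2"]           # hypothetical variable order
codebook = [
    [[-100.0, 0.40], [50.0, 0.30]],         # row 0: neurons (0,0) and (0,1)
    [[200.0, 0.20], [400.0, 0.10]],         # row 1: neurons (1,0) and (1,1)
]

heat_duty_plane = component_plane(codebook, variables.index("heat_duty"))
h2_plane = component_plane(codebook, variables.index("y_H2"))
print(heat_duty_plane)  # [[-100.0, 50.0], [200.0, 400.0]]
print(h2_plane)         # [[0.4, 0.3], [0.2, 0.1]]
```

Because the same (row, col) position refers to the same neuron in every plane, reading two planes at one position gives the values of both variables for that neuron.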

In general, the variable patterns in the component planes from the rectangular SOM (Figure 8) are nearly the same as those from the cylindrical SOM (Figure 9). This shows that either the rectangular SOM (9 × 28 grid) or the cylindrical SOM (9 × 28 grid) may be used to analyze the relationships among variables. A variable pattern of interest for clean technologies is the use of syngas for an exothermic process (negative heat duty) that produces fewer harmful emissions [1]. In both the rectangular and cylindrical SOM results, the heat duty (Figures 8a and 9a) approaches negative values (exothermic) when the mole fraction of H2 in the syngas (Figures 8b and 9b) increases while, at the same time, the mole fractions of CO and CH4 decrease (Figures 8c,d and 9c,d). Interestingly, the reactor temperature (Figures 8i and 9i) does not seem to show any definite inverse or direct relationship with the heat duty (Figures 8a and 9a). The mole fraction of CO in the outflow stream (Figures 8k and 9k) does not seem to vary much from the mole fraction of CO in the feed syngas (Figures 8c and 9c). This may be due to the high activation energy for the conversion of CO to CO2 (reaction 2 in Table 1), which requires higher reaction temperatures to increase the reaction constant. The H2 gas in the syngas supplied in all the runs was completely consumed; hence, no component plane for H2 in the outlet stream was rendered.

Discussion
SOM has been finding applications in areas beyond the original problems it was intended for in the 1970s and 1980s, namely speech recognition with the inherent tasks of clustering, visualization, and abstraction [8]. The unprecedented efforts in SOM research and applications in recent decades may indicate a growing realization of the potential of SOM as a computational tool across a broader spectrum of problems. Nonetheless, the current literature does not show applications of SOM in the area of computer-based simulations of chemical systems.
Simulation-based analysis of technologies aimed at clean implementation has been gaining interest, owing to the availability of computational tools. Syngas combustion has been a case study in many modelling and simulation analyses due to the availability of rigorous models [39] and of empirically estimated model parameter values [40,41]. Inherent to the use of rigorous models is the task of dealing with multivariate datasets arising from chemical simulations of the process. This paper demonstrates the use of a proven machine learning algorithm, SOM, in aiding the analysis of patterns in multivariate dynamical systems such as syngas combustion. This technique is proposed as a data analytics tool for studies that require evaluating the effects of variables that are deliberately varied or that naturally fluctuate within certain ranges, and in which a set of process conditions must be identified for the design and operation of the system. For example, the composition of the syngas was found by Kousheshi, Yari [40] to significantly affect the performance of the reactivity controlled compression ignition (RCCI) syngas/diesel engine. Moreover, the composition of syngas from the gasification of natural gas, coal, or biomass typically varies, with H2 as the main component followed by CO, CO2 and CH4 [2].
As shown in Figures 4-7, both rectangular and cylindrical SOMs may be designed to map the original dataset while both the topographic error and the quantization error are minimized. The importance of a properly designed SOM (size, shape, lattice structure, etc.) is apparent in these results (Figures 5 and 7). The quantization of the dataset via each neuron centroid may result in an aggregation of data that does not closely map the original dataset. Selecting the lower-bound value of the heat duty (minimum: −17,497 cal/sec) as an analysis variable for this aspect revealed that certain sizes and shapes of SOM result in neuron centroids that are close to the reference value (−17,497 cal/sec) (upper right portion of Figures 5 and 7), while other SOM architectures produce neuron centroids far from the reference value (lower left portion of Figures 5 and 7). Another SOM design consideration, already accounted for during the planning of the simulation experiment, is the need for a sufficient number of data [8,10], which usually means a large amount of data. This requirement, however, is easily addressed in computer simulation-based works, as up-to-date computers are fast enough to run multitudes of chemical process simulations through advanced methods such as parallel computation for solving large-scale equation-oriented models [42]. Further, the 4800 simulation runs in AspenPlus in this work were performed within a couple of days. Most of that time was spent organizing the data into the data summary spreadsheets rather than on the simulation runs, each of which took less than a minute on a laptop computer. The results of properly designed and trained SOMs (Figures 8 and 9) may then be used to render the component planes as surface plots. These component plane surfaces (Figures 8 and 9) are 2D visualizations of the originally multi-dimensional dataset. These 2D maps preserve the relationships among all the variables, as the variables are the various attributes of a fixed set of neurons on the SOM. Hence, comparison of the multivariate patterns becomes a task of comparing the surface plot levels of the various variables on a particular area of the maps, since the same portion of each component plane pertains to the same set of neurons 'sliced' to render the component planes. This dimension reduction (usually to 2D) while preserving the variable relations in the original dataset is one of the strengths of SOM [10,43,44].
This demonstration of the use of SOM for the pattern analysis of multivariate datasets from chemical process simulation aims to establish the potential of the SOM algorithm for improving the data analytics of simulation-based works on clean technologies such as syngas. Numerous similar applications are foreseen: integration of the SOM algorithm in chemical simulation packages such as AspenPlus, COMSOL Multiphysics, etc.; application in other simulation-heavy areas such as computational fluid dynamics; mining of material properties suitable for clean technology implementation; and visualization of the variable patterns of traditional systems, such as the operational variables of wastewater treatment facilities. The introduction of machine learning algorithms such as SOM into the methodology of developing and implementing traditional and emerging technologies may be one of the key paths to realizing clean technology implementation.