Crystal-Site-Based Artificial Neural Networks for Material Classification

Abstract: In materials science, crystal structures are the cornerstone in the structure-property paradigm. The description of crystal compounds may be ascribed to the number of different atomic chemical environments, which are related to the Wyckoff sites. Hence, a set of features related to the different atomic environments in a crystal compound can be constructed as input data for artificial neural networks (ANNs). In this article, we show the performance of a series of ANNs developed using crystal-site-based features. These ANNs were developed to classify compounds into halite, garnet, fluorite, hexagonal perovskite, ilmenite, layered perovskite, -o-tp-perovskite, perovskite, and spinel structures. Using crystal-site-based features, the ANNs were able to classify the crystal compounds with a 93.72% average precision. Furthermore, the ANNs were able to retrieve missing compounds with one of these archetypical structure types from a database. Finally, we showed that the developed ANNs were also suitable for a multitask learning paradigm, since the extracted information in the hidden layers linearly correlated with lattice parameters of the crystal structures.


Introduction
In recent years, machine learning algorithms have emerged as an alternative tool to model the properties and structure of materials [1][2][3][4][5][6][7][8][9][10][11]. These algorithms have allowed scientists to work with large particle systems in shorter times and at lower computational costs than the commonly used quantum methods [12][13][14][15]. Agrawal and Choudhary [16] have suggested that machine learning nowadays constitutes a fourth modeling paradigm in science, which relies on the information stored in large databases. In addition, strategic initiatives [17][18][19][20] that seek to accelerate the material discovery-commercialization process have come to the public scene. Among all machine learning algorithms, artificial neural networks (ANNs) are perhaps the most widespread, mainly due to their success in automating tasks that had been regarded as exclusive to humans [21][22][23][24][25]. Deriving from this success, deep learning [26,27] has emerged from machine learning as a discipline that gathers all the activity related to the current zoo of ANNs.
An important factor influencing the performance of ANNs is the input data [28,29]. In fact, conceiving the components of the input data, which are called descriptors or features, falls in a major area called feature engineering. In this sense, it turns out that it is necessary to revisit how the input data was conceived in reported crystal-chemical works.
Fedorov and Shamanaev [30] have described inorganic crystal compounds in terms of topological centers [31] to estimate thermodynamic variables with feed-forward ANNs. These topological centers contained chemical features such as electronegativity, [...] used methodology has two notable criteria: (a) quantum calculations are not required to obtain a feature, and (b) different compounds with the same structure type can be managed regardless of their crystal system. This last fact is a consequence of the crystal definition.
The ANNs were developed using the code patolli.py, which is available at GitHub [48]. Patolli was the name of a game in ancient Mexico, which also had divinatory properties. With the code patolli.py, the aim was to create ANNs, called patollis, focused on the classification and even the prediction of the structure type of crystal materials.

Nomenclature
The compounds with the perovskite, spinel, garnet, hexagonal perovskite, layered perovskite, -o-tp-perovskite, ilmenite, halite, and fluorite structures (Figure 1) were used to create different collections to develop the ANNs. A brief review of the mentioned structure types is provided in the Supplementary Materials. In this article, we refer to those compounds characterized only by a vertex-shared octahedral framework as perovskite structures. Additionally, we refer to Ruddlesden-Popper and Dion-Jacobson structures as layered perovskites [49].

Figure 1.
Crystal structure types used in this work to classify the compounds: (a) garnet structure, (b) spinel structure, (c) perovskite structure. An example of the layered perovskite structure is shown in (d), which corresponds to the Ruddlesden-Popper phase. Similarly, an example of hexagonal perovskite is shown in (e). The -o-tp-perovskite structure is depicted in (f), whereas the ilmenite structure is shown in (g). The fluorite and halite structures are labeled (h,i). The characteristic polyhedral frameworks of the fluorite and halite structures are not shown to facilitate their visualization.
In addition to the structure types of Figure 1, compounds without any of the previously mentioned structure types were used as examples of a not-identified structure type. We refer to these not-identified compounds as the "others" structure type.
All the ANNs were feed-forward, fully connected networks with two hidden layers. The architecture of these ANNs is described in Table 1. The ANNs were named 4S4O-NEF, 4S4O-WEF, 6S4O-NEF, 6S4O-WEF, 6S8O-NEF, 6S8O-WEF, and 6S10O-WEF. The number preceding the letter S refers to the maximum number of crystal sites characterizing the compounds of the collection used to develop the ANNs (either 4 or 6). The number preceding the letter O indicates the number of outputs of the ANN, which could be 4, 8, or 10. The number of outputs corresponded to the different structure types. Additionally, the suffix indicates whether a set of extra features was included in the characterization of the compounds: WEF (with extra features) or NEF (no extra features). This set of extra features corresponded to the average atomic radius and electronegativity of the crystal sites, as well as to the density of the crystal compound.
The number of crystal sites determined the number of features (input data) used to characterize the crystal compounds. When the set of extra features was used, the number of features was 42 and 163 in the 4S- and 6S-ANNs, respectively. Otherwise, the number of features was 33 and 150. Except for the 4S4O-ANNs, all ANNs were developed with a collection of compounds having up to six Wyckoff sites. For the 4S4O-ANNs, a collection of compounds with up to four Wyckoff sites was used. The collections used to develop the 4S4O- and 6S4O-ANNs had compounds with the garnet, perovskite, spinel, and "others" structure types. Similarly, the collection used to develop the 6S8O-ANNs had compounds with the garnet, hexagonal perovskite, ilmenite, layered perovskite, -o-tp-perovskite, perovskite, spinel, and "others" structure types. The collection used to develop the ANN 6S10O-WEF also included the fluorite and halite structure types.

Features
The methodology published by Gómez-Peralta and Bokhimi [46,47] uses the number of symmetry sites to characterize a crystal compound. The features are related to structural factors, such as geometric and packing ones, as well as local functions related to the chemical environment of the atoms in the crystal sites. The local functions [46,47] model the interaction of all neighbor atoms in a j-Wyckoff site over a central atom in the i-Wyckoff site, within cutoff radius Rc = 25 Å (Equation (1)).
The function used to model this interaction has a Gaussian profile: the neighbor atoms closer to the central atom have a larger contribution to the local function. Additionally, the magnitude of the interaction is modulated depending on the nature of the involved atom pair.
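The idea of a Gaussian-profile local function can be illustrated with a minimal sketch. It is not the published Equation (1): the Gaussian width `sigma`, the scalar `pair_weight`, and the function name are hypothetical choices made for illustration. Only the Gaussian profile, the pair-dependent modulation, and the cutoff radius of 25 Å are taken from the text.

```python
import math

R_CUT = 25.0  # cutoff radius in angstroms, as stated in the text


def local_function(distances, pair_weight, sigma=1.0, r_cut=R_CUT):
    """Sum Gaussian-weighted contributions of j-site neighbors around an i-site atom.

    distances   -- distances (angstroms) from the central atom to each neighbor
    pair_weight -- hypothetical scalar modulating the i-j atom-pair interaction
    sigma       -- hypothetical Gaussian width (not given in the text)
    """
    total = 0.0
    for r in distances:
        if r <= r_cut:  # neighbors beyond the cutoff do not contribute
            total += pair_weight * math.exp(-(r / sigma) ** 2)
    return total


near = local_function([2.0], pair_weight=1.5)      # close neighbor
far = local_function([4.0], pair_weight=1.5)       # more distant neighbor
outside = local_function([30.0], pair_weight=1.5)  # beyond the cutoff
```

In this sketch a closer neighbor contributes more than a distant one, and a neighbor beyond the cutoff contributes nothing, which is the qualitative behavior described above.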
The detailed list of the used features may be consulted in the Supplementary Materials. Since the referred methodology uses the number of combinations to compute the features, there were 6 geometric factors, 15 packing factors, and 12 local chemical environment functions to characterize the collection of compounds with up to four sites. Similarly, there were 15 geometric factors, 105 packing factors, and 30 chemical environment functions to characterize the collection of compounds with up to six sites. This makes 33 and 150 features for the collections with four and six sites. However, we also added the average atomic radius and electronegativity of each site, which provided 8 and 12 features more, as well as the density of the compound. With these extra features, there were 42 and 163 features for the collections of compounds with up to four and six sites, respectively.
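The feature counts above can be reproduced with a short calculation. The combinatorial expressions below are an assumption: they were chosen because they reproduce the numbers stated in the text, not because the source spells them out. The function name `feature_counts` is hypothetical.

```python
from math import comb


def feature_counts(n_sites, extra_features=False):
    """Return (geometric, packing, local, total) counts for n_sites Wyckoff sites."""
    geometric = comb(n_sites, 2)      # assumed: one factor per pair of sites
    packing = comb(geometric, 2)      # assumed: one factor per pair of site pairs
    local = n_sites * (n_sites - 1)   # assumed: one function per ordered (i, j), i != j
    total = geometric + packing + local
    if extra_features:
        # average radius and electronegativity per site, plus the compound density
        total += 2 * n_sites + 1
    return geometric, packing, local, total


four_sites = feature_counts(4)
six_sites = feature_counts(6)
four_sites_wef = feature_counts(4, extra_features=True)
six_sites_wef = feature_counts(6, extra_features=True)
two_sites = feature_counts(2)
two_sites_wef = feature_counts(2, extra_features=True)
```

Under these assumptions, a compound described with only two sites has 3 non-zero features without the extra features and 8 with them, which matches the numbers given later in the text for the fluorite and halite compounds.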
The features were arranged as described in Figure 2. This arrangement did not influence the training of the ANNs, but it is important to keep in mind once the ANNs are developed.

Figure 2.
General architecture of the ANNs developed in this work. All ANNs had three layers. The activation function in the hidden layers was a hyperbolic tangent, whereas in the output layer the SoftMax function was used. The input data consisted of a series of features that could be grouped in the manner described above in the figure. The features enclosed in dashed lines were not used in the NEF ANNs, but they were in the WEF type.
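The architecture in this caption can be sketched as a plain NumPy forward pass. This is an illustration only, not the Keras implementation in patolli.py: the weights are random placeholders, and the helper names are hypothetical. The layer widths (163 inputs, 815 and 652 hidden nodes, 10 outputs) are those reported later in the text for the ANN 6S10O-WEF.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def forward(x, weights, biases):
    """Feed x through tanh hidden layers and a softmax output layer.

    Returns the activations of every layer, hidden layers first.
    """
    activations = []
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ w + b)
        activations.append(h)
    activations.append(softmax(h @ weights[-1] + biases[-1]))
    return activations


rng = np.random.default_rng(10)
sizes = [163, 815, 652, 10]  # layer widths reported for the ANN 6S10O-WEF
weights = [rng.normal(scale=0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

batch = rng.normal(size=(3, 163))  # three placeholder compounds
layer1, layer2, probs = forward(batch, weights, biases)
```

Each row of `probs` is a probability distribution over the structure-type outputs, and `layer2` corresponds to the second-layer activations that the article later uses as extracted features.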

Software and Computational Infrastructure
The ANNs were developed with the Python code patolli.py, which is available via GitHub. Patolli uses a collection of crystal compounds taken from the Crystallography Open Database [50][51][52][53]. This collection of compounds contains information about the CIF, formula, number of Wyckoff sites, number of different elements in the formula, space group, number of atoms in the unit cell, and the occupation of each Wyckoff site. The atomic occupation of each Wyckoff site was assessed with the Python library pymatgen [54]. In fact, this description of occupied Wyckoff sites is crucial for the definition of the features in the input data.
The code patolli.py is executed via the terminal and calls two text files that need to be specified by the user: one defining all structure types in terms of their space groups and occupied Wyckoff sites, and another that defines the characteristics of the ANNs to be trained, as well as the hyperparameters of these. The crystal definition of the structure types used in the research is given in the Supplementary Materials. The crystallographic definition of the occupied sites in the perovskite and spinel structure was consulted in references [55][56][57][58]. Similarly, compounds with the hexagonal perovskite, layered perovskite, and -o-tp-perovskite structure were consulted according to the information provided by Tilley [49]. The other structure types were crystallographically defined after visual inspection of some of their compounds with the VESTA software package [59].
When patolli.py is executed, the code asks the user whether further constraints will be taken into account to create the collection of compounds used to develop the ANNs. These constraints are related to the number of atoms within the unit cell, the number of different elements in the formula, and the highest number of Wyckoff sites to consider for a structure type. The code patolli.py also asks the user whether the extra features related to the atomic radii, electronegativities, and density of the compound are included in the input data. Finally, the code patolli.py asks whether the ANNs will be trained to differentiate not-identified compounds. This subset of not-identified compounds corresponds to the label "others" and is created by randomly choosing compounds that do not match the crystal definitions provided.
During the execution of patolli.py, the collection of compounds was created according to the crystal structure definitions provided in the structure text files. The code automatically computed the features and split the whole collection into the TRAining-VALidation (traval) set and the test set. After splitting the collection, csv files containing the compounds, as well as NumPy files containing the features, were created. The ANNs were trained with the compounds of the traval set, which were previously standardized. After the training of all ANNs was completed, these were tested with the compounds reserved in the test set. If the user chose to perform a second test, patolli.py tested the ANNs with all the remaining compounds of the integrated collection, which were not used in the traval or test sets.
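The split and standardization steps can be sketched as follows. This is a minimal illustration, not the code in patolli.py: the test fraction of 0.2, the use of traval statistics to standardize the test set, and the function name are assumptions. The seed value of 10 is the one the article reports for NumPy.

```python
import numpy as np


def split_and_standardize(features, test_fraction=0.2, seed=10):
    """Shuffle, split into traval/test, and standardize with traval statistics."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(test_fraction * len(features)))
    test_idx, traval_idx = order[:n_test], order[n_test:]

    traval, test = features[traval_idx], features[test_idx]
    mean = traval.mean(axis=0)
    std = traval.std(axis=0)
    std[std == 0] = 1.0  # constant columns (e.g., unoccupied sites) stay at zero
    return (traval - mean) / std, (test - mean) / std, traval_idx, test_idx


# Placeholder feature matrix: 100 compounds with 33 features each.
X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(100, 33))
X_traval, X_test, traval_idx, test_idx = split_and_standardize(X)
```

Standardizing the test set with the mean and standard deviation of the traval set keeps information from the test compounds out of the training process.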
The library Keras 2.2.4 is required to execute patolli.py satisfactorily. The Keras backends that have been proven to work are Theano 1.0.3 and TensorFlow 1.14.0. In addition, the libraries pydot and graphviz are required to display some items related to the library Keras. The 6S-ANNs were developed on the computer Mixkhua, which used the NVIDIA GPU Tesla K40c. The training of the 6S-ANNs on this GPU lasted 3-6 min, depending on the size of the ANN. Mixkhua is located in the Artificial Intelligence Laboratory of the Institute of Physics, UNAM. In contrast, the 4S-ANNs were trained on a personal computer with a processor AMD 9-9420 Radeon R5 and 8 GB RAM. On average, the training of the 4S-ANNs lasted 5 min on the personal computer. Training the 6S-ANNs on this personal computer required longer times, even up to 1 h. In all cases, the NumPy random seed used to develop the ANNs was 10. This seed is preconfigured in patolli.py.
The linear regressions between the extracted features of the ANN 6S10O-WEF and the lattice parameters were established via the Python library scikit-learn. For this purpose, a Jupyter Notebook was prepared. This Jupyter Notebook can be consulted electronically (see Data Availability section at the end of the article).

Classification of Crystal Compounds
Figure 3 shows the distribution of the compounds used to develop the 6S10O-WEF in terms of the structure types of Figure 1. Based on Figure 3, the collection of compounds used to develop the 6S8O-ANNs did not include those with the fluorite and halite structure. Similarly, the collection of compounds used to develop the 6S4O- and 4S4O-ANNs only contained the perovskite, garnet, and spinel compounds. It is important to remember that the collection used to develop the 4S4O-ANNs did not have compounds with more than four Wyckoff sites. Additionally, the spinel compounds with the hausmannite structure [57], which are described with three Wyckoff sites, were not included in the collection used to develop the 4S4O-ANNs. The reason behind this will be explained later in the text. In all the collections, the number of compounds labeled as "others" structure type was equal to the sum of the compounds of all structure types.
Table 2 shows the precision in the classification of the test set compounds into each structure type by the developed ANNs. Since the compounds of the traval set were used during the learning process (optimization) of the ANNs, the precisions obtained with that set are provided to the reader in the Supplementary Materials. The precisions obtained with the traval set were 2-3% higher than those of the test set, which is normally observed in the development of this kind of model. The metric known as precision is defined as the following quotient:

$$\mathrm{precision} = \frac{TP}{TP + FP}$$

where TP stands for true positive cases, whereas FP stands for false positive ones (samples misclassified after comparison with the actual label). This metric is important for predictive purposes since it tracks the rate of correct predictions. The mean precision of all developed ANNs was 93.72% for the test set compounds and 95.92% for the compounds of the traval set. These values are similar to those reported by Gómez-Peralta and Bokhimi [46] for a binary classification model, where the ANN outputs the probability of adopting the perovskite structure. The highest precisions in the classification were obtained for the compounds with the garnet (97.60%, on average), perovskite (94.58%), spinel (95.25%), layered perovskite (96.37%), and -o-tp-perovskite structure (91.67%). In addition to these, the compounds labeled as "others" structure type were classified by the ANNs with a mean precision of 94.65%. In contrast, the precisions obtained for the fluorite, halite, ilmenite, and hexagonal perovskite compounds were moderate: 87.10%, 87.01%, 80.95%, and 77.78% on average, respectively. Since the number of compounds with the hexagonal perovskite and ilmenite structures was low, we suggest that the precisions may improve after enriching the dataset with compounds of these structure types. Furthermore, the diversity within the hexagonal perovskite compounds could also contribute to this diminished performance, since different polytypes of the hexagonal perovskite structure (such as 2H-, 6H-, 8H-, and 10H-structures) were included under the same label. The low number of available compounds in the data set was the reason for gathering the mentioned structures under the label hexagonal perovskite.
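For a multiclass classifier, the precision of each structure type can be read from a confusion matrix. The sketch below assumes rows are actual labels and columns are predicted labels; the counts are toy values, not results from the article, and the function name is hypothetical.

```python
import numpy as np


def precision_per_class(confusion):
    """Precision TP / (TP + FP) for each class.

    confusion[i, j] counts compounds whose actual label is i and whose
    predicted label is j, so column j collects everything predicted as j.
    """
    confusion = np.asarray(confusion, dtype=float)
    true_positives = np.diag(confusion)
    predicted_totals = confusion.sum(axis=0)  # TP + FP per predicted class
    return np.divide(true_positives, predicted_totals,
                     out=np.zeros_like(true_positives),
                     where=predicted_totals > 0)


# Toy counts (not from the article) for three labels: garnet, perovskite, "others".
toy = [[48, 0, 2],
       [0, 90, 10],
       [2, 10, 88]]
precisions = precision_per_class(toy)
```

With these toy counts, the precision is 48/50 for garnet, 90/100 for perovskite, and 88/100 for "others".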
It is important to mention that all the compounds, except those adopting the fluorite or halite structure, were described with at least three Wyckoff sites. Fluorite and halite structures require only two Wyckoff sites to be described. Since the methodology developed by Gómez-Peralta and Bokhimi depends on the number of occupied sites, there were not enough features to characterize compounds with the halite and fluorite structures. In fact, the referred methodology yields only three non-zero features in the input data for these compounds. To alleviate this issue, the average atomic radius and electronegativity per site, as well as the density of the compound, were included in the feature set. The inclusion of the compound density is justified since it could potentially model the presence of vacancies or solid solutions. After including the mentioned features, there were eight non-zero features for the fluorite and halite compounds. Nevertheless, the precisions obtained for fluorite and halite suggest that more features are needed to improve the performance for these structure types.
Regarding each ANN developed, the 4S4O-ANNs had the highest precisions, with 97.73% for the 4S4O-NEF ANN and 96.88% for the 4S4O-WEF ANN (last row of Table 2). This result can be explained by the sufficiently large number of compounds with the garnet, spinel, and perovskite structures, as well as by the small number of outputs in those ANNs. Similarly, the 6S4O-NEF and 6S4O-WEF ANNs had the same structure type outputs as the 4S-ANNs and classified the compounds with mean precisions of 95.03% and 96.19%, respectively. The 6S8O-NEF and 6S8O-WEF ANNs included in their outputs the hexagonal perovskite, ilmenite, layered perovskite, and -o-tp-perovskite structures and had mean precisions of 88.71% and 90.36%. The 6S8O-NEF had the lowest average score of all developed ANNs. It is interesting to compare the performance of the 6S8O-NEF (88.71%) and the 6S10O-WEF (91.17%). The differences between these ANNs were (i) the use of more features (average atomic radii and electronegativity per site, as well as the compound density) and (ii) the inclusion of the halite and fluorite compounds. In addition, the use of the halite and fluorite structure types seems to improve the metrics for the perovskite, hexagonal perovskite, and ilmenite compounds, but not for the spinel compounds. The lower precision for the spinel compounds in the 6S10O-WEF with respect to the 6S8O-NEF may be related to the number of different elements in their formula, which sometimes can be similar to that of fluorite and halite.
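As a quick arithmetic check, the mean of the seven per-ANN precisions quoted in this paragraph reproduces the overall mean precision of 93.72% reported for the test set. The snippet only restates numbers from the text; the variable names are illustrative.

```python
from statistics import mean

# Mean test-set precisions quoted in the text for each ANN (percent).
ann_precision = {
    "4S4O-NEF": 97.73,
    "4S4O-WEF": 96.88,
    "6S4O-NEF": 95.03,
    "6S4O-WEF": 96.19,
    "6S8O-NEF": 88.71,
    "6S8O-WEF": 90.36,
    "6S10O-WEF": 91.17,
}

overall = mean(ann_precision.values())
lowest = min(ann_precision, key=ann_precision.get)
print(f"overall mean precision: {overall:.2f}%  (lowest: {lowest})")
```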
We performed a second and larger test on the developed ANNs. For this second test, the remaining compounds of the database were used, i.e., compounds that were not used in either the traval or test sets. In principle, it can be considered that the remaining compounds belonged to the structure type labeled as "others", since these compounds did not match the provided definition of the crystal structure types. For this second test, the 4S4O, 6S4O, 6S8O, and 6S10O ANNs were tested with 12,264, 19,229, 18,179, and 16,667 compounds, respectively. The difference in the number of compounds used in the second test depended on the number of compounds in the traval and test sets. The results of the classification of the remaining "others" compounds by the ANNs are given in Table 3. More details regarding the number of compounds with respect to their Wyckoff sites are provided in the Supplementary Materials. The results in Table 3 correspond to the metric known as recall, which is defined as follows:

$$\mathrm{recall} = \frac{TP}{TP + FN}$$

where TP stands for true positive cases, whereas FN stands for false negative cases. Table 3 shows the recall of the "others" compounds in terms of their number of Wyckoff sites. The recall of the "others" compounds was consistent with the precision shown in Table 2. The NEF ANNs slightly outperformed the WEF ANNs in classifying the compounds with fewer than three Wyckoff sites. The lower value in the classification of the two-site compounds with the 6S10O-WEF can be attributed to confusion with the halite and fluorite structures. Therefore, we may suggest that the compounds with the halite and fluorite structures might have worked to improve the performance in the classification of the compounds with some of the structure types of Figure 1. In contrast, the recall of the five-site compounds was the lowest, which was a consequence of the small number of existing compounds with that number of Wyckoff sites in the collection.

Table 3.
Recall with the remaining compounds having structure types different from those depicted in Figure 1 (second test).
As previously established, the developed ANNs should classify all the remaining compounds as "others" structure type. Except for the ANN 6S10O-WEF, the developed ANNs were able to discriminate almost perfectly the compounds with fewer than three Wyckoff sites as "others". This result is noteworthy since none of the compounds of Figure 1 are described with fewer than three Wyckoff sites, except those with the fluorite and halite structures, as is well known by an experienced crystal chemist. This capability to discriminate the compounds with fewer than three Wyckoff sites can be ascribed to the quality of the used features.
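The grouping of recall by number of Wyckoff sites can be sketched as below. The sketch assumes, as in the second test, that every compound has "others" as its actual label, so the recall is the fraction of compounds the ANN predicts as "others". The records are toy values, not results from the article, and the function name is hypothetical.

```python
from collections import defaultdict


def recall_by_sites(records, target="others"):
    """Recall TP / (TP + FN) of `target`, grouped by number of Wyckoff sites.

    records -- iterable of (n_sites, predicted_label); every record is assumed
               to have `target` as its actual label, as in the second test.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_sites, predicted in records:
        totals[n_sites] += 1
        if predicted == target:
            hits[n_sites] += 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy predictions (not from the article).
toy = [(2, "others"), (2, "halite"), (3, "others"), (3, "others"),
       (5, "perovskite"), (5, "others"), (5, "spinel"), (5, "others")]
recall = recall_by_sites(toy)
```

In this toy case, a two-site compound predicted as halite counts as a false negative for "others", which mirrors the kind of confusion discussed for the ANN 6S10O-WEF.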

Retrieval of Compounds with an Archetypal Structure Type
In general, it was observed that the misclassifications in the test set occurred between the compounds belonging to the structure types of Figure 1 and the "others" structure type. In fact, the purpose of the "others" structure output was to prevent ANNs from systematically mixing up two structure types of Figure 1. The compounds confused by the ANN between the phases depicted in Figure 1 were BaMnO3, Ba2Co2O5.56, and CsNiF3, which were hexagonal perovskites misclassified as perovskites; MgSiO3, which was an ilmenite misclassified as perovskite; Cs2NaYCl6 and Cs2AgAuCl6, which were perovskites misclassified as -o-tp-perovskites; FeBO3, GaBO3, and Na0.22FeF3, which were perovskites misclassified as spinel; and Fe1.2Mn1.6O4 and Ca(InS2)2, which were spinel misclassified as ilmenite and perovskite, respectively. The structures of the mentioned compounds were visually verified.
So far, we have not identified a pattern among the compounds with the structure types of Figure 1 that were misclassified as "others". Understanding these misclassified compounds can help us to design new features that avoid these errors. In contrast, we found that some compounds of the test set initially labeled as "others" were systematically classified as one of the structure types of Figure 1. This was the case for the spinel compounds Ag2SO4 (space group ) and Li2SO4 ( ); the ilmenite compounds Fe0.33Sc0.33O, Nb2Mn4O9 and BiTi0.375Fe0.25Mg0.375O3 ( 3 ); the hexagonal perovskite compound Ba3CrS5 ( 6 3 ); and several perovskite compounds with the space groups (Pb0.998Ti0.964O2.9), and 2 (K0.73Na0.27NbO3). After visual inspection, we verified that the ANNs correctly classified the listed compounds. For example, Pb2ReMnO6 had the distorted vertex-shared octahedral framework, and Rb2ZrCl6 can be considered a double cubic perovskite structure where half of the octahedral sites were occupied, to mention some (Figure 4).
In addition, a similar trend was observed with some compounds of the second test. It is important to mention that these compounds are a small fraction of the non-recovered compounds (errors) in Table 3. The list of these compounds is provided to the reader in the Supplementary Materials. Interestingly, the ANNs were able to recognize the hexagonal perovskite structure in compounds with vacancies such as Cs3W2Cl9 ( 6 3 / ) and RbBa2Fe2F9 ( 3 ̅ ), as well as the perovskite structure in cyanide compounds such as K2FeCu(CN)6 ( 3 ̅ ). NbFeF6 ( 3 ̅ ), for instance, had the distorted vertex-shared octahedral framework with unoccupied cuboctahedral sites. It is also noteworthy that all hausmannite [57] compounds ( 4 1 / ), which were not used in the collections to develop the 4S4O-ANNs, were retrieved as spinel structures by the trained ANNs. The systematic confusions of compounds initially labeled as the "others" structure type have their origin in the provided definition of the structure types: neither the space groups nor the Wyckoff occupation of these mislabeled compounds were considered. Nevertheless, the developed ANNs were able to recognize the characteristic polyhedral pattern of the structure types in Figure 1.

Lattice Parameter Assessment with the Extracted Features by the ANN
So far, we have focused on applications of the ANNs such as the automated classification and retrieval of compounds from a crystal database. The success of these ANNs is ascribed to the quality of the features used in the characterization of the crystal compounds. Beyond these applications, it could be of interest to establish correlations with other crystal variables using the information processed within the hidden layers of the ANNs. In fact, the information processed in the hidden layers constitutes new, finer features that may be more suitable for certain tasks.
We used the extracted features in the second layer of the ANN 6S10O-WEF to establish a correlation with the lattice parameters of the simple cubic perovskite ( 3 ̅ ), double cubic perovskite ( 3 ̅ ), orthorhombic perovskite ( ), trigonal perovskite ( 3 ̅ ), garnets ( 3 ̅ ), spinels ( 3 ̅ ), and tetragonal Ruddlesden-Popper structures ( 4/ ). According to the architecture of the ANN 6S10O-WEF (Table 1), there were 652 extracted features in the second layer for each compound. The extracted features were computed after feeding the ANN 6S10O-WEF with the input data except the local functions and the density of the compound, which were set to zero (Figure 5). It was necessary to hide the local functions and the density of the compound since their calculation requires knowledge of the lattice parameter a priori. Sets of 54, 33, 28, 127, 87, 22, and 31 compounds of the test set of the mentioned crystal structures were used to test the linear regression fit. We found compounds of the test set that did not follow the linear regression fit. For these outlier compounds, we found that the lattice parameters were exaggerated, and therefore they were not considered in the calculation of the mean square error. The number of non-outlying compounds also appears in Table 4. In all cases, a correlation coefficient above 0.90 was established between the extracted features and the lattice parameters of the studied structure types.
It is expected that having a larger number of compounds of each crystal structure will lower the number of outlier cases. The outlier compounds can be explained as cases where it was not possible to establish a good interpolation. A larger collection of compounds to fit a linear regression could also improve the metrics of the linear regression performance. In addition, it is important to mention that these correlations between the extracted features and lattice parameters could not be established with the nodes of the first layer (which provided 815 extracted features) or the third layer (10 extracted features). Other than the well-known relationship between the ionic radii in the aristotype perovskite structure and the lattice parameter, an analog of this has been barely sketched for spinels, garnets, and Ruddlesden-Popper structures. Concerning the perovskite compounds, it is worth mentioning that Javed et al. [60] and Majid et al. [61,62] used similar features to assess the lattice parameters of cubic, monoclinic, and orthorhombic perovskite compounds with support vector machines. Recently, Zhang et al. [63] established a relationship via Gaussian process regression between the ionic radii and the lattice parameter of monoclinic double perovskites. Interestingly, Song et al. [64] deduced a tolerance factor for garnet structures after reversion of the distorted dodecahedral cations to a regular cube.
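The regression of a lattice parameter on hidden-layer outputs can be sketched with ordinary least squares. The article used scikit-learn for this step; the NumPy version below is a stand-in with the same linear model. All data are synthetic, the eight-column `hidden` array is a small placeholder for the 652 second-layer outputs, and the function names are hypothetical.

```python
import numpy as np


def fit_linear(features, targets):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.hstack([features, np.ones((len(features), 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict_linear(coef, features):
    """Evaluate the fitted linear model on new features."""
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ coef


def r_squared(y_true, y_pred):
    """Coefficient of determination of the predictions."""
    residual = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - residual / total


rng = np.random.default_rng(10)
hidden = np.tanh(rng.normal(size=(120, 8)))  # stand-in for second-layer outputs
true_w = rng.normal(size=8)
# Synthetic lattice parameter (angstroms) that depends linearly on the hidden outputs.
lattice = 4.0 + hidden @ true_w + rng.normal(scale=0.01, size=120)

coef = fit_linear(hidden[:90], lattice[:90])                      # fit on 90 compounds
r2 = r_squared(lattice[90:], predict_linear(coef, hidden[90:]))   # test on the other 30
```

Fitting on one subset and scoring on held-out compounds follows the procedure described above, where compounds of the test set were used to evaluate the linear regression fit.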

Features' Influence on the Performance of the ANNs
Finally, we used the compounds of the test set to analyze the influence of the features on the performance in the classification of crystal structures. Table 5 contains the results of this analysis. This analysis was performed after hiding the features of a given block of Figure 2, i.e., the hidden features were set to zero. With this approach, the ANNs do not receive complete information; thus, some connections between the nodes will not be triggered. The results in Table 5 indicate that the most crucial set of features for the performance of the ANNs was the local functions. After removal of the local functions, the precision dropped to 50.84% or below, as in the 6S4O-NEF, and reached 16.23% in the 6S8O-NEF. The local functions sum up the interactions of all neighbor atoms over a central atom in a crystal site. The magnitude of the interaction depended on the nature of the involved atom pair.
The second most important block of features was the packing factors. The packing factors measured the efficiency of the space-filling by the atoms of the crystal sites. One of the best-known packing factors is the Goldschmidt tolerance factor, which is used in the perovskite compounds. The most notable reductions in the precision were obtained for the 6S8O-NEF (49.65%), 6S8O-WEF (60.52%), and 6S10O-WEF (69.09%), which are the ANNs with the most structure type outputs. In contrast, hiding the geometric factors did not affect the performance of the ANNs. The geometric factors were quotients of atomic radii, which are related to the geometry defined by the first neighbor atoms.
The average atomic radii and electronegativities per site and the density of the compound were features used in the WEF ANNs. In these ANNs, the omission of the average atomic radii and electronegativities block affected the performance in the classification. The precisions dropped to 91.41%, 86.28%, 74.15%, and 80.30% with the ANNs 4S4O-WEF, 6S4O-WEF, 6S8O-WEF, and 6S10O-WEF, respectively. In contrast, the omission of the density of the compound did not affect the performance of the WEF ANNs.
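The hiding procedure used in this analysis amounts to setting one block of input columns to zero before feeding the trained ANN. The sketch below shows only that masking step. The column layout in `blocks` is hypothetical (the actual order follows Figure 2), and the 42-column array is a random placeholder for the standardized inputs of a 4S WEF collection.

```python
import numpy as np


def hide_block(features, block):
    """Return a copy of the features with one block of columns set to zero."""
    hidden = features.copy()
    hidden[:, block] = 0.0
    return hidden


# Hypothetical column layout for a 4S WEF input (42 features).
blocks = {
    "geometric": slice(0, 6),
    "packing": slice(6, 21),
    "local": slice(21, 33),
    "radii_electronegativity": slice(33, 41),
    "density": slice(41, 42),
}

X = np.random.default_rng(10).normal(size=(5, 42))  # placeholder standardized inputs
ablated = {name: hide_block(X, cols) for name, cols in blocks.items()}
```

Each array in `ablated` would then be passed to the trained ANN, and the precision recomputed, to obtain one column of an analysis like the one in Table 5.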

Conclusions
We have shown that crystal-site-based features enabled the ANNs to classify the crystal compounds with an average precision of 93.72%. The ANNs classified compounds with garnet, perovskite, spinel, layered perovskite, -o-tp-perovskite, fluorite, halite, ilmenite, and hexagonal perovskite structures with mean precisions of 97.60%, 94.58%, 95.25%, 96.37%, 91.67%, 87.10%, 87.01%, 80.95%, and 77.78%, respectively. The lower scores were ascribed to the limited availability in the database of compounds with the corresponding structure types. In addition, the compounds not belonging to any of the mentioned structures were classified by the ANNs with a mean precision of 94.65%. Hence, the ANNs developed with the used feature construction may find application in automated systems for classifying and retrieving compounds from crystal databases.
In addition to the mentioned application, we were able to establish linear correlations between the lattice parameters and the extracted features in the hidden layers of the ANNs. More specifically, correlation coefficients above R² = 0.9422 were established with the lattice parameters of garnet, spinel, Ruddlesden-Popper, and perovskite compounds. We suggest that the information derived from the hidden layers of the ANNs may serve to establish other correlations, for instance with optical or electronic properties. Thus, the developed ANNs are suitable for multitask learning applications.
It is important to mention that the ANNs were developed using the crystallographic information of already-synthesized compounds. Each developed ANN constitutes a function to map the chemical composition and atomic spatial arrangement into a structure type. Therefore, we expect that the crystal-site-based ANNs may also serve as a more accurate tool to probe the space of possible new crystal compounds.

Supplementary Materials:
The review of crystal structures, the list of used features, the crystallographic definitions of the structure types, the results in the traval and second test, and the retrieved compounds in the second test are available online at www.mdpi.com/2073-4352/11/9/1039/s1.

Data Availability Statement:
The data presented in this study are openly available in https://github.com/gomezperalta/support_data_cs-anns.