1. Introduction
The concept of hypernetworks [1,2,3,4,5,6,7] is used to describe a higher-level model that is capable of producing a separate neural network by generating model weights. It is a type of meta-learning architecture in which the hypernetwork produces the weights of a new network, or target network.
Implementing a hypernetwork therefore requires neural networks that can perform tasks they were not necessarily trained on. Hypernetworks can generate entirely new models that require no training of their own and are not informed by transfer learning or one- or few-shot learning. This shifts neural networks away from their initial design and calls for new training and inference techniques that can satisfy the challenging requirements of hypernetworks.
Although several approaches have been proposed to address hypernetworks, this area of advanced computing is still generally considered underexplored. Approaching hypernetworks from a generative framework requires that the training data be given special consideration, given the complexity of the task. While hypernetwork research has explored a variety of approaches, hypernetwork training data is commonly based on conditioning input such as task embeddings, feature distributions, or latent variables [1]. The training data presents common challenges such as high dimensionality and overall generalization.
Despite these efforts, the development of hypernetworks remains a challenging task, and research efforts are still continuing. Here we prepared the first dataset of neural networks designed for hypernetwork research. The ultimate purpose of the dataset is to enable models that can generate neural networks rather than train them.
Datasets of neural networks have been studied in the past [8]. A dataset of neural networks can be used to train a classifier to identify the machine learning problem that a network solves. For instance, a classifier can distinguish between neural networks trained on MNIST and neural networks trained on CIFAR [8]. Other studies aimed at predicting the performance of a neural network classifier [9]. Some architectures have also been proposed for analyzing the weight spaces of neural networks [10,11,12].
While these are based on datasets of neural networks, they were not designed for the purpose of hypernetworks. For instance, the ability to distinguish between a classifier trained on MNIST data and a classifier trained on CIFAR data does not necessarily provide tools that can be used to generate a classifier in the context of hypernetworks. Therefore, the dataset of neural networks described here is based on a single image dataset, Imagenette. Each class in the dataset contains neural networks trained to identify a certain Imagenette class. That is achieved by conceptualizing the problem as a binary classification problem, such that one class contains images from the class of interest, while the other class is a collection of random images from all other classes. Such a dataset can be used to support Generative Adversarial Networks [13,14] that, instead of generating text or images, can ultimately generate neural networks.
A unique trait of hypernetworks is the efficiency they can introduce into the training process compared to traditional feedforward and backpropagation cycles. Primary networks that are lighter and contain a smaller number of parameters can produce larger networks containing a higher number of parameters.
The ability to generate neural networks can ideally lead to solutions to AI tasks without the need to train a neural network for each specific task. Since the training of a neural network is often computationally demanding, generating neural networks can provide a faster and more energy-efficient alternative to training them. It can also lead to a more general AI system that does not require the collection of large training sets for each specific task.
The codebase and dataset are available publicly at https://github.com/davidkurtenb/Hypernetworks_NNweights_TrainingDataset (accessed on 1 July 2025) and https://huggingface.co/datasets/dk4120/neural_network_parameter_dataset_lenet5_binary/tree/main (accessed on 1 July 2025), respectively. Historically, machine learning research has been driven by the availability of benchmark datasets such as ImageNet [15], among many others [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32], that enabled the advancement of the field.
These benchmark datasets served as substantial factors in the rapid progression of machine learning and artificial intelligence. They provide researchers with convenient access to data, allowing them to focus on the development of their algorithms. As benchmarks, they also allow researchers to compare the performance of algorithms developed by different research teams on the same datasets. For instance, the sub-field of automatic face recognition was powered by the availability of face datasets such as ORL [16] or FERET [17]. Similarly, the task of automatic object recognition benefited substantially from benchmark datasets such as ImageNet [15], among many others. Since benchmark datasets of neural networks are not yet available, the availability of this open dataset can assist in the advancement of hypernetwork research.
2. Background
While there are multiple research efforts around the study of hypernetworks and their applications, the subfield is somewhat nascent, with ample areas to be further explored. The core idea of leveraging a higher-order neural network, sometimes containing fewer parameters than the target model, to generate the weights of a separate neural network is a concept that shifts from the “typical” manner in which neural networks are created. The concept gained its initial traction with the work of [1], which sought frameworks to expand the existing methods of training a neural network.
For instance, hyper-representations with layer-wise loss normalization were used to aggregate knowledge from model zoos [6], allowing new models to be generated based on that knowledge.
Bayesian hypernetworks [2] extend Bayesian deep learning by transforming a noise distribution into a distribution over the parameters of a different neural network. They have been demonstrated to be more resistant to adversarial data [2].
Applications of hypernetworks have seen a number of use cases with a variety of applicability. Their potential has spread across multiple domains such as meta-learning, continual learning, neural architecture search, and reinforcement learning [33,34,35,36]. In particular, they have the ability to train neural networks in cases of limited training data with few-shot learning.
For instance, hypernetworks have been used to improve continual learning. By using the concept of task-conditioned hypernetworks, it has been shown that it is possible to overcome the problem of catastrophic forgetting in “standard” artificial neural networks trained on several different tasks [34].
The task of continual learning using hypernetworks was also studied by [36], using task-conditioned hypernetworks to make learning sufficiently fast. The use of these hypernetworks makes on-the-fly learning practical, thereby allowing one to avoid the relatively long response times typical of stationary learning models.
The concept of Graph HyperNetworks was used to identify the most effective neural network architecture for a certain machine learning problem without the computationally challenging need to train and test all of these architectures [35].
Hypernetworks have also been found effective in representations of conditional sentences [7], which involve embedding pre-computed conditions into the corresponding layers, allowing a sentence to be handled differently based on the condition.
Hypernetworks have demonstrated theoretical value in their application to advance continual learning by resolving catastrophic forgetting. In traditional neural networks, model weights are adjusted during the training process and then remain static until the model is retrained. Hypernetworks redefine that paradigm by proposing the notion of dynamic weights. The application of a dynamic-weight scheme serves to improve network adaptability and performance [1].
While the study of hypernetworks presents promising potential, they are not without their challenges. Hypernetworks have faced stability and scalability concerns as models grow increasingly complex [37]. These challenges are amplified by computational requirements, which have also been difficult to overcome. The relationship between a hypernetwork and its target network must be carefully designed in their architecture. Another significant challenge of hypernetworks is having access to a robust and relevant training dataset. The work covered in this paper aims to begin resolving this challenge and to create a path toward novel applications of hypernetworks in conjunction with generative approaches.
3. A Dataset of Neural Networks
The dataset contains 10,000 instances of neural networks, divided into a total of 10 classes. Each neural network is a two-way, one-versus-all image classifier, and each class contains 1000 neural networks that can identify the images of that class. The different classes are taken from the Imagenette dataset [15], specifically the Imagenette 320 px V2 dataset with classes 0: Tench, 1: English Springer, 2: Cassette Player, 3: Chain Saw, 4: Church, 5: French Horn, 6: Garbage Truck, 7: Gas Pump, 8: Golf Ball, and 9: Parachute.
Imagenette is a well-studied benchmark dataset in a mature stage of its life cycle. That allows one to minimize risks such as missing data, imbalanced classes, or label accuracy, which can be a problem with new datasets [38].
The code repository also includes model performance metrics, aggregated by class, and performance plots for each of the 10,000 models. Additionally, to further drive accessibility, the model parameters of the 10,000 LeNet-5 binary classifiers have been compiled into two files. One file condenses the weights and biases by model, referred to as modelwise: the individual parameters of each model are provided as a single flattened tensor. The other file captures parameters across classes by layer, referred to as layerwise: the parameters of each of the 10 classes are saved by layer. For example, the “church”/“conv2d” dictionary contains the parameters (combined weights and biases) of the first convolutional layer for all 1000 LeNet-5 models trained for binary, one-vs.-all classification of a church.
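To make the modelwise layout concrete, the sketch below flattens hypothetical per-layer weight and bias arrays into a single parameter vector with NumPy. The layer names and array shapes here are illustrative assumptions, not the dataset's exact contents:

```python
import numpy as np

# Hypothetical per-layer parameter arrays for one trained classifier
# (shapes are illustrative, not the exact shapes used in the dataset).
rng = np.random.default_rng(0)
layer_params = {
    "conv2d":   (rng.normal(size=(5, 5, 3, 6)),  np.zeros(6)),
    "conv2d_1": (rng.normal(size=(5, 5, 6, 16)), np.zeros(16)),
    "dense":    (rng.normal(size=(400, 120)),    np.zeros(120)),
    "dense_1":  (rng.normal(size=(120, 2)),      np.zeros(2)),
}

def flatten_modelwise(params):
    """Concatenate alternating weight/bias arrays into one flat vector,
    mirroring the 'modelwise' file layout described in the text."""
    pieces = []
    for name, (w, b) in params.items():
        pieces.append(w.ravel())   # weights first...
        pieces.append(b.ravel())   # ...then the matching biases
    return np.concatenate(pieces)

flat = flatten_modelwise(layer_params)
print(flat.shape)  # one flattened parameter tensor per model
```

The layerwise file reverses this organization: instead of concatenating arrays within one model, the same per-layer arrays would be grouped by layer name across all models of a class.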
To generate the dataset of neural networks, each neural network is trained as a two-way classifier. All images of the first class are taken from one class of Imagenette, while the images of the other class are taken randomly from all other Imagenette classes. Each model's training dataset contains 9–10% images of the target class.
That leads to 10 classes such that each class contains 1000 neural networks that can identify images of one class against all other classes; the dataset is therefore balanced [29]. All models were trained for 25 epochs and achieved an average accuracy of 91.5%. Because the images of the other class are selected randomly, each neural network in the dataset was trained with different images, and it is therefore different from the other neural networks in its class.
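The sampling scheme described above can be sketched as follows; `make_one_vs_all` is a hypothetical helper, and the fixed 1:9 positive-to-negative ratio is an assumed reading of the 9–10% target-class fraction:

```python
import numpy as np

def make_one_vs_all(labels, target_class, rng):
    """Build a one-vs-all binary task: all images of the target class
    are positives, and negatives are drawn at random from the
    remaining classes (as described in the text)."""
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == target_class)
    neg_pool = np.flatnonzero(labels != target_class)
    # Roughly a 1:9 positive/negative ratio, so positives make up
    # about 9-10% of each model's training set.
    neg = rng.choice(neg_pool, size=9 * len(pos), replace=False)
    idx = np.concatenate([pos, neg])
    rng.shuffle(idx)
    return idx, (labels[idx] == target_class).astype(int)

# Toy example with 10 balanced classes of 100 samples each.
rng = np.random.default_rng(42)
toy_labels = np.repeat(np.arange(10), 100)
idx, y = make_one_vs_all(toy_labels, target_class=3, rng=rng)
print(y.mean())  # fraction of positives, ~0.1
```

Because the negative pool is re-sampled for every model, repeating this call with different seeds yields 1000 distinct training sets per class, which is what makes the networks within a class differ.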
The architecture used for this dataset was LeNet-5 [18]. The motivation for selecting a relatively simple architecture was to ensure that the generation of the dataset was computationally practical. Another reason was to avoid the curse of dimensionality by using an architecture with fewer weights than other common architectures such as ResNet or VGG. A deeper architecture would have a higher number of parameters, making it more challenging to use for the purpose of generating new neural networks due to the higher dimensionality.
Training a very large number of neural networks is a computationally intensive task. The training required over twenty-seven hours on a powerful computing cluster with more than 10,000 cores. The cluster was made up of 1296 cores of Xeon E5-2690, 1296 cores of Xeon E5-2680, 2048 cores of Xeon E5-2683, 2400 cores of Xeon E5-2630, 1823 cores of Xeon Gold 6130, 2176 cores of AMD EPYC 7452, and 96 Nvidia GeForce RTX 2080 Ti GPUs, making up a large cluster with a total of 11,039 cores. Xeon processors were manufactured by Intel, Santa Clara, CA, USA. EPYC processors were manufactured by AMD, Santa Clara, CA, USA. GeForce GPUs were manufactured by Nvidia, Santa Clara, CA, USA.
Using a deeper architecture with more parameters would have led to a dataset that would be impractical to generate even with a powerful cluster. Additionally, a relatively simple architecture simplifies the analysis and use of the dataset. Such an analysis can include training a neural network that classifies neural networks, or generating neural networks automatically.
Table 1 shows the classification accuracy, precision, recall, and F1 score of the neural networks of the different classes. Since each class contains 1000 neural networks, and each neural network is trained separately using different data, the performance of the neural networks contained in each class is not expected to be identical.
LeNet-5 Model Training Specifications
The proposed dataset of neural networks contains simple neural networks trained as one-versus-all binary classification models. As mentioned in Section 3, these neural networks follow the LeNet-5 architecture. The total number of trainable parameters in each model is 91,481. For comparison, the number of parameters in the common ResNet-50 architecture is over 25 million.
Table 2 summarizes the LeNet-5 architecture and the number of parameters.
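To make the parameter accounting concrete, the sketch below sums per-layer counts for the classic LeNet-5 configuration on 32 × 32 RGB input. These layer sizes are illustrative assumptions only: this textbook configuration does not reproduce the reported 91,481 total, which corresponds to the exact configuration given in Table 2:

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights plus biases for a 2-D convolution with k x k kernels."""
    return in_ch * out_ch * k * k + out_ch

def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

# Classic LeNet-5 layer sizes (illustrative; the dataset's exact
# configuration is the one summarized in Table 2).
total = (conv2d_params(3, 6, 5)       # first convolutional layer
         + conv2d_params(6, 16, 5)    # second convolutional layer
         + conv2d_params(16, 120, 5)  # third convolutional layer
         + dense_params(120, 84)      # first dense layer
         + dense_params(84, 2))       # second (output) dense layer
print(total)  # prints 61326 for this illustrative configuration
```

The same two helper functions applied to the layer sizes of Table 2 yield the 91,481 total reported above.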
Each model produced a total of 10 arrays containing alternating model weight and bias information, saved as an HDF5 file. The length of each array varies, ranging from 1 to 48,000 parameters. Using the model weights as a source of training data presents a unique approach to the training of hypernetworks. Because each neural network is trained with different images, the distribution of the weights within each class of model is distinct. Most of the individual model weights were near-zero numbers.
Figure 1 displays the distribution of all weights across all classes, and Figure 2 displays the weight distributions separated by class. The plots are scaled to highlight the near-zero distributions of each model due to the large concentration of values within this range. The values of the weights are not identical, which is expected given that each neural network is trained with a different set of images. The distinct curves for a given class provide evidence of the distinct patterns and features across weight values for the object-classification LeNet-5 models. The distributions by layer can be found in Figure A3, Figure A4, Figure A5, Figure A6 and Figure A7 in Appendix A.
Table 3 shows the distribution of common weights in the trained neural networks among the different classes.
4. Parameter Distribution
Understanding the distributions and the distinctions in patterns between layers, separated by class, is critical to learning the characteristics of the dataset. It is not just the overall distribution per model that is important; one should also examine the distributional differences across the LeNet-5 model layers. Analysis was performed to better understand the distributions as well as to compare divergence between classes.
As parameters traverse the LeNet-5 architecture, there is an expected reshaping of their distribution. The convolutional layers reduce the total range of the distribution, which then undergoes significant transformations as information passes through the dense layers. Each class has its own unique pattern but follows a similar profile. The Jensen–Shannon (JS) divergence was used to assess the level of similarity between class parameter distributions by layer. The JS divergence by layer is displayed in Figure A1 and Figure A2 in Appendix A. As expected, the parameter distributions at the third and final convolutional layer were the most similar, with nearly overlapping characteristics when comparing two different classes. This is expected, as convolutional layers reduce complexity within the distributions and limit the feature space, an aspect of convolutional layers’ ability to focus on spatial relationships and employ weight sharing. At the opposite extreme, the second dense layer was the most diverse. This again is expected, as the dense layer connects all neurons passed by the first dense layer, which opens up the range of the parameter distribution.
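The comparison described above can be sketched as follows, computing the JS divergence between shared-bin histograms of layer weights; the weight samples here are synthetic stand-ins rather than the actual dataset:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (log base 2, so the result lies in
    [0, 1]) between two discrete distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Compare the (synthetic) weight distributions of one layer across two
# classes by histogramming the weights on a shared set of bins.
rng = np.random.default_rng(0)
w_class_a = rng.normal(0.0, 0.05, 10_000)  # near-zero weights
w_class_b = rng.normal(0.0, 0.08, 10_000)
bins = np.linspace(-0.5, 0.5, 101)
p, _ = np.histogram(w_class_a, bins=bins)
q, _ = np.histogram(w_class_b, bins=bins)
d = js_divergence(p, q)
print(round(d, 4))  # 0 = identical distributions, 1 = fully disjoint
```

Applying such a computation layer by layer, for every pair of classes, yields per-layer similarity matrices of the kind shown in Figure A1 and Figure A2.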
5. Automatic Classification of Neural Networks
To further explore the potential of the dataset in developing hypernetworks, the model weights were used in classification tasks. By demonstrating that the training set can be effectively classified using traditional machine learning and deep learning approaches, one can reason that the model weights contain ample features. This is a primary requirement for developing hypernetworks from a robust training dataset.
As mentioned in Section 3, the dataset is fully balanced and contains no missing values. Therefore, classification accuracy higher than mere chance reflects the ability of the classifier to distinguish between the neural networks. The effectiveness of the classifier was measured by the classification accuracy [29], as well as the specificity, sensitivity, and F1 score.
5.1. Classification Methods
Traditional methods of classification were applied to establish a baseline of model performance. Given the high dimensionality of the data, a deep learning model was also applied. Classification used the layer weights and biases, with a total of 91,481 parameters per model. Following standard practices [39], the experiments were performed such that 70% of the samples were allocated for training, and the rest of the data was used for testing/validation.
The deep neural network that was used is a fully connected multi-layer perceptron with three hidden layers of sizes 256, 128, and 64, with batch normalization. The activation functions are ReLU, and the dropout rate was set to 0.6.
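A minimal PyTorch sketch of this classifier is shown below. Only the input size, layer widths, batch normalization, ReLU activations, and 0.6 dropout come from the text; the class name and everything else (including the absence of an explicit training loop and its hyperparameters) are assumptions:

```python
import torch
import torch.nn as nn

class WeightSpaceMLP(nn.Module):
    """Sketch of the classifier described above: a fully connected
    network over flattened model weights (91,481 inputs, 10 classes)
    with batch normalization, ReLU, and dropout of 0.6."""
    def __init__(self, n_in=91_481, n_classes=10, p_drop=0.6):
        super().__init__()
        layers = []
        for width in (256, 128, 64):  # hidden layer sizes from the text
            layers += [nn.Linear(n_in, width), nn.BatchNorm1d(width),
                       nn.ReLU(), nn.Dropout(p_drop)]
            n_in = width
        layers.append(nn.Linear(n_in, n_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = WeightSpaceMLP()
logits = model(torch.randn(8, 91_481))  # a batch of 8 weight vectors
print(logits.shape)  # torch.Size([8, 10])
```

Such a network would be trained with a standard cross-entropy loss over the 10 class labels; the regularization (batch normalization and heavy dropout) reflects the overfitting concerns discussed below.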
5.2. Classification Results
The results for the entire model are summarized in Table 4. As the table shows, the classification accuracy is far higher than the expected 10% chance level, showing that the neural networks can be differentiated from each other by their weights. Naive Bayes achieves the highest classification accuracy of 72%.
The results observed using deep learning classification capture some of the challenges within the subfield of hypernetworks. The high dimensionality of model weights is challenging to work with and prone to overfitting. Even within this example, practices such as batch normalization, dropout, regularization, random search parameter tuning, and experimentation with model architecture were used with minimal success in terms of improving accuracy.
Figure 3 shows the loss and accuracy of the deep learning model when using all weights. The training/validation loss by layer is shown in Figure A8 and Figure A9 in Appendix A.
6. Discussion
The dataset of 10,000 neural networks introduced here was designed specifically for hypernetwork research. It is therefore important that the neural networks be distinguishable through an automatic process, as that shows that the weights of the different neural networks exhibit different patterns that are identifiable by machine learning algorithms.
An attempt to use a classifier that can predict the class that a neural network identifies showed that the classifier can identify the class through the weights of the neural network at an accuracy far higher than mere chance. That provides an indication that the dataset can be used for studies that involve machine learning.
For the purpose of automatic classification of neural networks, the deep neural network did not perform well compared to other algorithms, while Naive Bayes showed the best performance. Naive Bayes assumes that each parameter is independent and therefore performs well when the input variables are independent of each other [40]. Weights in a neural network are largely independent values; for instance, a weight normally cannot be predicted from other weights, unlike other types of data such as the values of pixels in an image. It can therefore be expected that Naive Bayes provides the best classification accuracy for this specific task.
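This reasoning can be illustrated with a hedged scikit-learn sketch; the synthetic generator below merely mimics per-class differences in weight distributions and is not the published dataset, and its 200-dimensional vectors stand in for the real 91,481-dimensional ones:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: flattened "weight vectors" whose
# per-class means differ slightly, mimicking the distinct per-class
# weight distributions described in the text.
rng = np.random.default_rng(0)
n_per_class, n_dims, n_classes = 100, 200, 10
X = np.concatenate([rng.normal(loc=0.01 * c, scale=0.05,
                               size=(n_per_class, n_dims))
                    for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# 70/30 split, mirroring the protocol described in Section 5.1.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# GaussianNB treats every input dimension as independent, matching the
# independence argument made above for neural network weights.
clf = GaussianNB().fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc > 0.1)  # well above the 10% chance level
```

Because the per-dimension likelihoods factorize, Naive Bayes scales gracefully to very high-dimensional inputs, which is consistent with its strong performance on the 91,481-dimensional weight vectors.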
The fact that the neural networks can be separated using machine learning indicates the existence of patterns in the weights. The presence of such patterns is also an indication that such distributions can be produced by generative AI for the purpose of hypernetworks. Generative AI is often used to generate images, audio, video, text, and code [41]. Tools such as AlphaEvolve [42] show that it can also be used to generate new algorithms. Here we provide research resources for exploring the contention that generative AI can also be used to generate artificial neural networks.
For the direct purpose of generative AI, the classifier of neural networks shows that a GAN discriminator is feasible. The results can also be used as a baseline for future algorithms that classify between neural networks. Improving the classification accuracy can lead to better discriminators.
7. Conclusions
Here we introduced an open dataset for the study of hypernetworks. The generation of the dataset involved substantial computing resources, resulting in neural networks separated into 10 classes based on Imagenette data. The purpose of the dataset is to enable the research of hypernetworks. The dataset is open and available to the public. Using a known dataset such as Imagenette to generate the neural networks will allow one to better understand the nature of the content of the dataset, but it can also allow one to expand the dataset in the future by training new image classes against the Imagenette images.
While datasets of neural networks exist, the dataset described here is designed specifically for the purpose of hypernetwork research. For instance, it is based on a single dataset, rather than an attempt to distinguish between neural networks trained on two completely different datasets [8]. It also uses a single neural network architecture, as it does not aim at identifying the ideal architecture for a given classification problem [9].
The dataset of neural networks, separated into 10 classes, is far smaller than the number of classes and images in a dataset such as ImageNet. Another limitation is that it covers only one CNN architecture. Naturally, large datasets of neural networks require substantial computing resources to generate each sample and are far more demanding than adding an image sample to a “traditional” dataset. A more complex CNN architecture would have a higher number of parameters and would require far more powerful computing resources to train. Yet, the dataset can provide research infrastructure for the development of the concept of hypernetworks and can be used for a variety of purposes, including supervised machine learning, unsupervised machine learning, and generative AI.
The dataset is based on the relatively simple LeNet-5 architecture, which can be trained within a reasonable time using a powerful computing cluster. Future benchmarks will include other common architectures such as ResNet, although using more complex architectures with a higher number of parameters will require substantially stronger computing resources. A higher number of parameters will also require more complex hypernetworks trained on these neural networks. That will require stronger computing and longer training, not merely to generate the dataset but also to train the hypernetworks.
Future work will also include the development of GANs that can generate neural networks. While GANs are often used to generate images or text, they can also be used to generate neural networks. That, however, requires a suitable dataset of neural networks that can allow the training of a GAN that generates neural networks. Such GANs will require modification to the commonly used GAN architectures. The availability of datasets of neural networks as described here can enable the development and testing of such GANs.