1. Introduction
Neuromorphic engineering aims to implement artificial neural systems that reproduce key computational principles and dynamics observed in the biological brain [1,2,3,4]. Many demonstrated neuromorphic systems have focused on pattern recognition tasks, such as character and image recognition [5,6,7,8,9], often relying on unsupervised learning mechanisms based on spike-timing-dependent plasticity (STDP) and deep neural network architectures with a large number of layers and synaptic connections [2,7,8,9,10,11,12]. While effective, these approaches typically result in complex hardware implementations with significant area and design overhead [13,14].
Beyond deep architectures, alternative bio-inspired models seek to emulate cortical processing using relatively compact spiking neural networks trained through biologically plausible learning rules [7,10,11,15,16]. However, reproducing the massive synaptic connectivity of the mammalian brain, on the order of 10⁴ synapses per neuron, remains a major challenge for conventional CMOS technologies [17]. This limitation has motivated increasing interest in emerging nanoelectronic devices, particularly memristors, as candidates for electronic synapses in neuromorphic systems [18,19,20,21].
Memristive devices offer several advantages over purely CMOS-based implementations, including low power consumption, scalability, non-volatility, and analog conductance modulation that can be directly mapped onto synaptic weights. In this context, memristive crossbar arrays enable highly efficient matrix–vector multiplication by exploiting Ohm’s and Kirchhoff’s laws, leading to significant improvements in speed and energy efficiency compared to conventional CMOS accelerators [5,10,16,22]. Among the various memristive technologies, resistive random access memory (RRAM) has been extensively investigated for neuromorphic applications due to its capability to implement STDP-like learning rules [10,11,15,23]. Recent advances in oxide-based RRAM (OxRAM), including scaling to the 10 nm node, large-scale array demonstrations, and high endurance [24,25,26,27,28,29], further support its potential for future large-scale neuromorphic systems [30].
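As a minimal illustration of the crossbar matrix–vector product, the following Python sketch assumes an ideal array: read voltages applied to the rows, columns held at virtual ground, and no wire resistance or sneak-path currents. The function name `crossbar_column_currents` is hypothetical. Each device contributes a current G·V (Ohm’s law), and the currents sharing a column add up (Kirchhoff’s current law), so all column currents are obtained in a single read step.

```python
import numpy as np

def crossbar_column_currents(conductances: np.ndarray, row_voltages: np.ndarray) -> np.ndarray:
    """Ideal crossbar read: I_j = sum_i G_ij * V_i.

    conductances: (n_rows, n_cols) array in siemens, one device per cross-point.
    row_voltages: (n_rows,) read voltages applied to the rows, in volts.
    Returns the (n_cols,) currents collected at the grounded columns, in amperes.
    """
    # Ohm's law gives each device current G_ij * V_i; Kirchhoff's current law
    # sums the device currents flowing into each column.
    return conductances.T @ row_voltages

# Three rows (inputs) and two columns (outputs); only the first two rows are read.
G = np.array([[1e-4, 2e-5],
              [3e-5, 5e-5],
              [0.0, 1e-4]])
V = np.array([0.2, 0.2, 0.0])
print(crossbar_column_currents(G, V))   # about [2.6e-05 1.4e-05] A
```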
Despite these advances, most memristor-based neuromorphic studies have focused on deep architectures or flat associative memory structures. In contrast, the integration of self-organizing maps (SOMs) with memristive synapses has received comparatively limited attention, even though SOMs offer attractive properties such as unsupervised clustering, topological organization, and robustness. Moreover, existing works rarely explore the combination of SOM-based representations with associative learning mechanisms within a hierarchical framework.
In this work, we propose a biologically plausible neuromorphic architecture based on a neuromorphic self-organizing map (NSOM) module. Using an OxRAM synaptic model calibrated with experimental data, the proposed system forms a hierarchical multi-layer network trained exclusively through unsupervised and associative learning rules. The architecture is composed of identical NSOM modules and enables the natural classification and association of heterogeneous information sources, inspired by hierarchical processing principles observed in the brain.
The proposed architecture differs from prior work in the following aspects:
- Unified topographical and hierarchical organization: it combines self-organizing topographical maps with a hierarchical modular structure within a single memristive framework.
- Explicit associative binding layer: it introduces a dedicated associative layer that binds distributed representations originating from multiple input modalities.
- Bidirectional multimodal recall with local learning: it enables bidirectional recall between modalities using exclusively local STDP-based learning rules, without relying on global optimization procedures.
- Concept-level multisensory integration: it focuses on concept-level multisensory integration as an architectural principle, rather than on benchmark-driven classification performance or hardware acceleration metrics.
These features enable progressive abstraction through hierarchical self-organization, natural multisensory integration via associative binding, and bidirectional recall across modalities. By relying exclusively on local STDP-based learning, the architecture maintains simplicity, biological plausibility, and scalability without requiring global supervision or backpropagation. In summary, this work presents an architectural proof-of-concept demonstrating how self-organization combined with associative learning can emerge from simple, modular building blocks, highlighting a principled route toward hierarchical multisensory integration in memristive neuromorphic systems.
2. Symmetric and Linear Weight Updates Through Spike-Timing-Dependent Plasticity (STDP) in OxRAM Devices
One of the goals in implementing analog electronic synapses is to induce small changes in the synaptic weight of the device, i.e., in its conductance. Linear and symmetric synaptic weight updates are of great interest, since they help optimize the training speed and learning performance of an artificial neural network [5,21]. However, they remain challenging to implement in memristive devices because of the non-linearity of their conductance response [5,21,31,32].
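To make the distinction concrete, the sketch below contrasts an ideal linear, symmetric update with a generic state-dependent ("soft-bound") update, in which each pulse moves the normalized conductance by a fixed fraction of the remaining distance to its bound. The soft-bound form is only a common phenomenological illustration of non-linearity, not the device model of [33]; the bounds, step sizes and function names are assumptions.

```python
G_MIN, G_MAX = 0.0, 1.0   # normalized conductance bounds (illustrative)

def linear_update(g, n_pulses=1, step=0.05, potentiate=True):
    """Ideal synapse: every pulse changes g by the same step in either direction."""
    delta = step if potentiate else -step
    return min(G_MAX, max(G_MIN, g + n_pulses * delta))

def soft_bound_update(g, n_pulses=1, alpha=0.2, potentiate=True):
    """Non-linear synapse: each pulse covers a fixed fraction alpha of the
    remaining distance to the bound, so steps shrink as g approaches it."""
    for _ in range(n_pulses):
        g = g + alpha * (G_MAX - g) if potentiate else g - alpha * (g - G_MIN)
    return g

# One potentiating pulse followed by one depressing pulse, starting from g = 0.5.
g_lin = linear_update(linear_update(0.5), potentiate=False)
g_soft = soft_bound_update(soft_bound_update(0.5), potentiate=False)
print(round(g_lin, 3), round(g_soft, 3))   # 0.5 0.48
```

The linear synapse returns to its starting value, whereas the soft-bound synapse does not (0.5 → 0.6 → 0.48): the size of each update depends on the current state, which is the asymmetry that complicates training.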
Because the proposed hierarchical system is demonstrated through simulation, a suitable model describing the electrical behavior of memristors is required. In this work, a previously proposed model for memristive devices [33] is used, calibrated with experimental data. For this purpose, OxRAM devices based on TiN/Ti-HfO₂-W metal–insulator–metal (MIM) structures with a 10 nm oxide thickness and an area of 5 × 5 μm² were measured. The OxRAM model used in this work already accounts for several device non-idealities, including non-linear conductance updates and saturation effects. However, statistical cycle-to-cycle and device-to-device variability are not explicitly considered in this proof-of-concept study.
Details on the manufacturing process can be found in [34]. The samples display resistive-switching behavior (Figure 1a), consisting of transitions from the high-resistance state (HRS) to the low-resistance state (LRS) and vice versa, identified as the SET and RESET processes, respectively. Figure 1b shows some examples of pulsed G-V curves (gray dots), with the conductance G expressed in units of G₀ (G₀ = 77.5 µS). The electrical characterization results indicate that the tested devices are suitable to act as synapses in a neuromorphic crossbar array, where the conductance of each OxRAM device is identified with a synaptic weight [35]. Specifically, pulse programming allows the synaptic weight to be controlled in an analog manner within a range bounded by the gmax and gmin values marked in Figure 1b. In [36], the experimental G-V characteristics were modeled to simulate the electrical behavior of the synaptic devices; an example of a modeled G-V curve is displayed in Figure 1b (blue and red lines). Moreover, a linear dependence between the conductance and the applied pulse amplitude was found between 0.65 V and 1 V. The devices can thus be operated within a linear regime, referred to as the linear region (brown striped area in Figure 1b), where an analog synaptic weight that depends linearly on the pulse amplitude can be programmed. The pulse waveforms applied to the device terminals (Figure 2a,b) control the voltage across the device (Figure 2c) and can therefore be designed to obtain a linear STDP behavior (Figure 2d).
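How the linear region can yield a linear STDP window is sketched below under explicit assumptions: the pre-synaptic pulse is a ramp decaying linearly from +0.5 V to 0 V over 10 ms, the post-synaptic pulse is a short −0.5 V pulse, and the device sees their difference. These waveforms and amplitudes are hypothetical and are not the ones of Figure 2; only the 0.65 V to 1 V programming window comes from the measurements described above. Neither pulse alone exceeds 0.65 V, so the conductance changes only when the pulses overlap, and because the overlap voltage decreases linearly with the delay, so does the weight change.

```python
V_TH, V_MAX = 0.65, 1.0    # linear programming window reported in the text (V)
T_PRE = 10e-3              # hypothetical duration of the decaying pre-synaptic ramp (s)
A_PRE, A_POST = 0.5, 0.5   # hypothetical pulse amplitudes (V)

def device_voltage(dt):
    """Voltage across the device when a short post-synaptic pulse of amplitude
    -A_POST arrives dt seconds after the start of a pre-synaptic ramp that decays
    linearly from +A_PRE to 0 over T_PRE. V_device = V_pre - V_post."""
    v_pre = A_PRE * max(0.0, 1.0 - dt / T_PRE) if dt >= 0 else 0.0
    return v_pre + A_POST

def delta_g(dt, k=0.1):
    """Normalized conductance change (potentiation branch only): zero below V_TH,
    linear in the device voltage inside [V_TH, V_MAX], reaching k at V_MAX."""
    v = min(device_voltage(dt), V_MAX)
    return k * max(0.0, v - V_TH) / (V_MAX - V_TH)

for dt_ms in (0.0, 3.5, 7.0, 10.0):
    print(dt_ms, round(delta_g(dt_ms * 1e-3), 4))
```

Only the potentiating branch is shown; a depressing branch would follow the same construction with pulses of opposite polarity.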
3. The Modules in a Hierarchical Neuromorphic Self-Organizing Map
A hierarchical computing system is an architecture in which multiple layers are arranged at different hierarchy levels. The layers at the bottom levels of the hierarchy pre-process the information given as input to the whole system, and the output of the first level is fed as input to the layers of the next level. Hierarchical organization is found in many biological structures, such as the brain [37]. Multiple studies have shown that hierarchical biological neural networks are spatially organized so as to reduce their amount of wiring, by having fewer long connections and by placing neurons optimally to reduce the wiring between them [38]. Furthermore, hierarchical neural networks can associate information by integrating data coming from multiple sources, giving rise to more complex or abstract representations. In biology, this process is referred to as multisensory integration [39].
Multisensory integration encompasses primary sensory processing areas, such as the primary visual and auditory cortices of the mammalian brain, which present topographical organization, and higher-order processing regions such as the association areas (parietal and temporal areas), which constitute the largest area of the cortex in primates. A diagram of this multisensory integration system is depicted in Figure 3a, where the information processed in the auditory, visual and somatosensory areas (all in the first, or bottom, level of the hierarchy) flows to the integration areas (second level of the hierarchy) and further to the frontal control areas (third, or top, level).
Our goal is to propose a hardware multi-layered hierarchical NSOM based on identical modules, emulating the hierarchical multisensory integration system of the biological brain. Each module consists of a single synaptic layer arranged as a crossbar array, where each synapse, implemented with an OxRAM device (Figure 1), connects an input neuron with an output neuron. The input neurons are meant to be connected to external sensors, and information is coded by the location of the active input neurons. The NSOM is trained with a learning algorithm based on the self-organizing map, combined with an integrate-and-fire neuron model with a firing threshold [40]. The self-organizing map is an unsupervised learning algorithm with applications in many practical fields, especially those handling high-dimensional data, such as medicine, meteorology, oceanography and economics. Its goal is to organize the output neuron layer according to the different features of the input dataset: each output neuron learns a unique combination of synaptic weights that represents one of these features. After the network has been trained with examples from the input dataset, the output neuron layer shows a topological organization, meaning that nearby output neurons represent (or are sensitive to) similar input features, whereas distant output neurons are related to different features. The first version of the NSOM was proposed in [41], where the impact of the device-level variability of the OxRAM devices on system performance was studied by simulation; it is briefly described here for completeness.
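For reference, the classic rate-based Kohonen self-organizing map on which the NSOM learning rule is based can be sketched in a few lines of Python. This is the textbook algorithm (best-matching unit by Euclidean distance, Gaussian neighborhood, decaying learning rate and radius), not the spiking NSOM itself; the map size, schedules and function name are illustrative assumptions.

```python
import numpy as np

def train_som_1d(data, n_out=20, epochs=200, lr0=0.5, sigma0=5.0, seed=0):
    """Classic Kohonen SOM with a one-dimensional output layer (non-spiking reference)."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_out, data.shape[1]))   # random initial weights in [0, 1)
    pos = np.arange(n_out)                    # output-neuron positions along a line
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood radius
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))           # best-matching unit
            h = np.exp(-((pos - bmu) ** 2) / (2.0 * sigma ** 2))     # Gaussian neighborhood
            w += lr * h[:, None] * (x - w)        # pull BMU and neighbors toward x
    return w

colors = np.eye(3)   # pure R, G and B inputs
w = train_som_1d(colors)
print([int(np.argmin(np.linalg.norm(w - x, axis=1))) for x in colors])
```

After training on the three pure colors, each color typically has its own best-matching unit, and the units lying between them along the line respond to intermediate colors; this neighborhood-driven ordering is what the NSOM reproduces with spikes and STDP.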
In particular, the NSOM example of Figure 4a is used as one of the modules of the proposed hierarchical NSOM. It consists of three input neurons (red, green and blue circles) and a one-dimensional output neuron layer (black circles), connected by OxRAM synaptic devices located at each cross-point. Here, the NSOM is trained to map and classify colors according to the RGB components of the pixels of an image. The input dataset therefore consists of the red (R), green (G) and blue (B) components of a color, which are related to the activation of the first, second and third input neurons, respectively (Figure 4a). Initially, the synapses have random synaptic weights. An input is entered into the network by activating a single input neuron or a group of input neurons. An active input neuron fires a voltage pulse (i.e., the pre-synaptic pulse in Figure 2a) towards the synaptic layer. Each output neuron receives a current determined by the conductances of its synapses, which are arranged in a column (Figure 4a). This current increases the output neuron’s transmembrane potential until it reaches a predefined voltage threshold. The change in transmembrane potential also measures the intensity of the post-synaptic neuron’s response to a particular input; this response is referred to as the neuron’s activation. When an output neuron reaches the threshold, it fires a pulse towards its synapses (e.g., the post-synaptic pulse in Figure 2b), which overlaps temporally with the pre-synaptic pulse and causes a voltage drop across the involved synapses (Figure 2c). The synaptic weights are then updated according to the STDP function (Figure 2d). Finally, the firing output neuron communicates with its neighboring neurons, which also fire a post-synaptic pulse with a certain delay. Their synaptic weights are also updated, but by a smaller amount, since the time delay between the pre- and post-synaptic pulses is larger. This process is repeated iteratively, and the output neurons and their neighbors progressively become responsive to specific input features.
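The training loop described above can be summarized by the simplified, event-level Python sketch below. It relies on several assumptions that are not taken from the original work: normalized weights in [0, 1]; identical thresholds, so that the first output neuron to fire is simply the one receiving the largest column current; a neighbor firing delay proportional to the distance from the winner, so that the STDP magnitude decreases linearly with distance; and depression of the synapses whose input neuron was silent (without such a term, all weights would saturate). The constants `ETA` and `RADIUS` and all function names are hypothetical.

```python
import numpy as np

ETA = 0.1     # hypothetical maximum weight change per pulse pair (normalized units)
RADIUS = 2    # hypothetical neighborhood radius, in output-neuron positions

def first_to_fire(w, x):
    """Integrate-and-fire with equal thresholds: the output neuron receiving
    the largest column current (x @ w) reaches threshold first."""
    return int(np.argmax(x @ w))

def stdp_step(w, x, winner):
    """Update weights w (n_in, n_out) after `winner` fires for binary input x.
    Neighbors fire later, so (linear STDP) their update shrinks with distance.
    Assumption: active rows are potentiated and silent rows are depressed."""
    dist = np.abs(np.arange(w.shape[1]) - winner)
    eta = ETA * np.clip(1.0 - dist / RADIUS, 0.0, None)   # delay-dependent magnitude
    sign = np.where(x > 0, 1.0, -1.0)                      # +1 active row, -1 silent row
    return np.clip(w + sign[:, None] * eta[None, :], 0.0, 1.0)

def train_nsom(inputs, n_out=10, epochs=50, w_init=None, seed=1):
    """Present the binary input patterns in random order for several epochs."""
    rng = np.random.default_rng(seed)
    if w_init is None:
        w = rng.random((inputs.shape[1], n_out))   # random initial weights, as in the text
    else:
        w = np.array(w_init, dtype=float)
    for _ in range(epochs):
        for x in inputs[rng.permutation(len(inputs))]:
            w = stdp_step(w, x, first_to_fire(w, x))
    return w

rgb = np.eye(3)   # pure R, G and B inputs
w = train_nsom(rgb, w_init=np.full((3, 10), 0.5))
print([first_to_fire(w, x) for x in rgb])
```

With the uniform initialization used in the example, the three colors settle on separated output neurons (positions 0, 2 and 4, in the order in which the colors were first presented), and the neurons in between respond to both neighboring colors. With random initialization (`w_init=None`), the final layout depends on the seed.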
The results of this training stage are shown in Figure 4b,c. In Figure 4b, the normalized synaptic weights of the crossbar array after training are represented in gray scale. The synaptic weights are normalized to simplify the simulation; the normalization is based on the experimentally observed range of available conductance states of the OxRAM devices described above. The intensity of the response of the output neurons, related to the change in their transmembrane potential, is plotted in Figure 4c. Each of the three curves shows the response of the output neurons when a single input neuron was active, that is, when one of the three RGB color components was perceived by the system (red squares for R, green circles for G and blue triangles for B). When the learning process has concluded, the output neuron layer of the NSOM is topographically organized in clusters responding to similar cases of the input dataset: groups of nearby neurons are sensitive to the same color component, whereas distant output neurons react to completely different colors. Each output neuron acts as a linear combiner of the three color components and, depending on the state of the synapses it is connected to after training, reacts more strongly to a particular color. The results of [36,41] showed that the system in Figure 4a classified any other input according to its similarity to the previously learnt RGB color components. This particular NSOM will be referred to as the RGB map and is the baseline of the hierarchical NSOM proposed in this work. Moreover, because the learning algorithm relies on rules suited to unsupervised learning, such as STDP, multiple self-organizing synaptic crossbar arrays can be concatenated, with information flowing bidirectionally between them. By adding self-organizing computing layers, the neuromorphic hierarchical computing system of Figure 3b can be built, as described in Section 5.
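The classification of unseen inputs by similarity can be illustrated with a hand-written, hypothetical trained weight matrix (not the one of Figure 4b). Each output neuron computes a weighted sum of the R, G and B inputs, and the input is assigned to the labelled neuron that responds most strongly; the intermediate neurons respond to mixtures of two components. The matrix, labels and function name are assumptions for illustration.

```python
import numpy as np

# Hypothetical trained 3 x 5 weight matrix (rows: R, G, B inputs; columns: output
# neurons). Columns 0, 2 and 4 are the red, green and blue "cluster" neurons;
# columns 1 and 3 are neighbors tuned to mixtures.
W = np.array([[1.0, 0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 1.0, 0.5, 0.0],
              [0.0, 0.0, 0.0, 0.5, 1.0]])
LABELS = {0: "red", 2: "green", 4: "blue"}

def classify(color_rgb, w=W):
    """Each output neuron acts as a linear combiner of the R, G, B inputs; the
    input is assigned to the labelled neuron with the strongest response."""
    response = np.asarray(color_rgb, dtype=float) @ w
    labelled = {name: response[j] for j, name in LABELS.items()}
    return max(labelled, key=labelled.get), response

label, response = classify([1.0, 0.3, 0.1])   # a reddish, unseen color
print(label, response)
```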
4. Extension of the NSOM: Data Encoding
The NSOM shown in Figure 4a is a simple academic case, in the sense that the input dataset is based on the presence or absence of features, which in this case are the RGB components of an image. NSOMs can also process other types of input datasets, such as quantitative data, for which encoding the input is not straightforward. In the present work, we first introduce a way to encode the input dataset when the variables under study cannot be represented in terms of the presence or absence of a feature, but rather by a numerical value within a range. An example of this type of feature is the frequency content of sounds. In this section, an example of the proposed NSOM extension is provided: our objective is to build a neuromorphic system capable of mapping Spanish vowel sounds according to the values of their first two formant frequencies. This is motivated by the evidence of topological organization in the cochlea and in the primary auditory cortex of the human brain [42]. This example of an extended NSOM will therefore be referred to as the auditory map.
In the case of the human voice, the formant frequencies are often measured as amplitude peaks in the frequency spectrum and give an estimate of the vocal tract resonances. Figure 5a represents vowel sounds recorded by the first author and the author’s partner in terms of their formant frequencies, in particular the first (F1) and second (F2) formant frequencies, for the three vowel phonemes \ε\, \i\ and \u\ (as in the words “red”, “green” and “blue”, respectively). To determine these frequencies, six instances of each vowel were recorded, each lasting approximately 0.2 s, and the formant frequencies were extracted from them. In this section, for simplicity, we focus only on the first author’s voice as a particular example. Figure 5b gives the average values of F1 and F2 for the vowel phoneme \ε\ as pronounced by the first author. Since the objective is to develop an NSOM in which different sounds are mapped, or represented, in terms of F1 and F2, a single synaptic layer connecting the input and output neuron layers is considered. When all three phonemes are considered, the output neuron layer is expected, after training, to organize in clusters, where the neurons within the same group respond strongly to similar values of the input features, which in this case are sound frequencies.
To feed the input data to the network, instead of using a single input neuron per feature as in the RGB map described in the previous section, we now use multiple neurons to represent each input feature. We propose to divide the input neuron layer into as many groups as the input dataset has features. In this case, two groups of neurons are required: one representing F1 and another representing F2. Each group can contain an arbitrary number of input neurons. The input data (frequencies between 0 Hz and 3000 Hz) are first divided into ranges (in this case, six frequency ranges of 500 Hz each), and each range is coded in binary by one input neuron that is either active or inactive. Therefore, each input neuron within a group represents a particular range of frequencies; for instance, the first neuron within the F1 and F2 groups represents the range 0 Hz–500 Hz, whereas the last one represents the range 2500 Hz–3000 Hz. Hence, the input neuron layer consists of 12 neurons, representing six frequency ranges for F1 (the first six input neurons, or rows, denoted as In₁,ₓ, where x is the range number) and six for F2 (the last six input neurons, denoted as In₂,ₓ). A scheme of the input neuron layer distribution is shown in Figure 6a, where a neuromorphic crossbar array with a size of 12 × 100 is depicted.
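A minimal sketch of this range-based encoding is given below. Rows 0 to 5 correspond to In₁,₁ to In₁,₆ and rows 6 to 11 to In₂,₁ to In₂,₆. The rule that a frequency lying exactly on a boundary (e.g., 500 Hz) goes to the upper range, the function names, and the example formant values are assumptions made for illustration; they are not the measured values of Figure 5.

```python
import numpy as np

F_MAX_HZ = 3000.0
N_RANGES = 6                       # six ranges per formant, as in the text
RANGE_HZ = F_MAX_HZ / N_RANGES     # 500 Hz per range

def range_index(freq_hz):
    """0-based range index: 0 Hz-500 Hz -> 0, ..., 2500 Hz-3000 Hz -> 5.
    Assumption: a frequency on a boundary belongs to the upper range."""
    if not 0.0 <= freq_hz <= F_MAX_HZ:
        raise ValueError("formant frequency outside 0-3000 Hz")
    return min(int(freq_hz // RANGE_HZ), N_RANGES - 1)

def encode_formants(f1_hz, f2_hz):
    """12-element binary row-activation vector: rows 0-5 code F1, rows 6-11 code F2.
    A 1 means that crossbar row receives the input neuron's voltage pulse."""
    x = np.zeros(2 * N_RANGES)
    x[range_index(f1_hz)] = 1.0
    x[N_RANGES + range_index(f2_hz)] = 1.0
    return x

# Illustrative \i\-like formants: F1 = 300 Hz, F2 = 2300 Hz -> rows In_{1,1} and In_{2,5}.
print(np.flatnonzero(encode_formants(300, 2300)).tolist())   # [0, 10]
```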
The presence of a phoneme is coded in terms of the activation of the input neurons representing the ranges where the F1 and F2 values of that phoneme are located. The input neurons’ activation is translated to the application of a certain voltage to that particular row in the synaptic crossbar array.
The system shown in Figure 6a is trained with the same learning algorithm used for the RGB map, using the phonemes of the first author’s voice as the input dataset (see the \ε\ phoneme in Figure 5b as an example). Figure 6b shows the synaptic crossbar array after training, where the normalized synaptic weight is represented with the same gray scale as in Figure 4b. As can be seen in Figure 6b, the system has learned the three phonemes shown during training, i.e., only the connections between the output neurons and the input neurons coding the phonemes of Figure 5 have been established. A relatively large normalized synaptic weight (the lighter the color, the larger the weight) is observed for the rows of the input frequencies, whereas the synaptic devices in the other rows present a very low synaptic weight (darker colors in Figure 6b), since the system has never been exposed to sounds with other combinations of F1 and F2. As with the RGB map, the output neurons are organized topographically: some groups of nearby neurons are sensitive to one of the presented phonemes, whereas distant neurons are sensitive to other F1 and F2 combinations. This topological organization can be observed in the output neuron responses depicted in Figure 6c, where each curve corresponds to the change in the transmembrane potential of the output neurons when each of the three phonemes is fed individually to the network after training (red squares correspond to \ε\, green circles to \i\ and blue triangles to \u\). Despite the different type of input data, the simulated auditory map behaves like the RGB map: the topographical organization of the output neuron layer allows new input data to be classified according to their similarity to the training data. Both NSOMs share the same structure, being identical modules except for their size. In the following section, the two NSOMs are used to build a hierarchical system whose task is to perform multisensory integration in a third synaptic layer.
5. Implementation of the Neuromorphic Hierarchical System
Our next step towards the implementation of a hierarchical NSOM consisted of training a third NSOM with the outputs of the RGB and auditory maps described in the previous sections. These two maps are at the bottom level of the hierarchy (first level in Figure 3b) and process, in parallel, the input information coming from different data sources. The third NSOM is at a higher level of the hierarchy (second level in Figure 3b) and associates the information coming from the RGB map with that coming from the auditory map. At this higher level of the hierarchy, association processes can take place. These processes give rise to symbolic learning, an approach based on the assumption that many aspects of intelligence can be achieved by manipulating symbols [42]. In the example detailed in this section, the system is meant to learn relationships between colors and sounds: the blue (B) component of an image with a sound such as the word ‘blue’ in English [ˈblu]; the red (R) color component with the frequency components of the phoneme \ε\, as in the word ‘red’ [ˈrɛd]; and the green (G) component with the \i\ phoneme in the word ‘green’ [ˈɡriːn]. Hence, the higher level of the hierarchy computes the symbols ‘blue’, ‘green’ and ‘red’, which relate the colors to the sounds of their names in a particular language. In this way, the hierarchical NSOM is not only able to classify information at a higher level of abstraction, e.g., identifying both the blue-colored pixels of an image and the spoken word as the symbol ‘blue’, but also learns that there is a relationship between a particular color component and a set of sound frequencies, exhibiting associative behavior.
An example of such a neuromorphic hierarchical system is depicted in Figure 7. It consists of three NSOMs: the RGB map (Figure 7a), the auditory map (Figure 7b) and the associative map (Figure 7c), the latter consisting of 200 input neurons and three output neurons. In the two maps at the bottom level of the hierarchy, the input neurons are located at the top of the RGB map and at the bottom of the auditory map, respectively, and the output neurons are located on the right side of both maps. Both maps are assumed to be already trained, following the procedure explained in Sections 3 and 4, whereas no associative learning has taken place yet. The inputs of the associative layer (Figure 7c) are the outputs of the RGB and auditory maps: when an output neuron of an NSOM reaches the threshold potential, it triggers a pulse not only towards its synapses in the RGB or auditory map, but also towards its other synapses, located in the associative layer. Therefore, the output neurons of the RGB and auditory maps behave as the input neurons of the associative map.
Specifically, Figure 7 shows that the associative layer already has some connections with the RGB map. In this scenario, a pure green color (G color component) is being perceived by the system (green arrow in Figure 7a). The activation of input neuron G in the RGB map (Figure 7a) excites the output neurons of the RGB map. The output neuron with the largest response triggers a pulse towards all of its synapses: three of them are in a row of the RGB map, and three of them are in a row of the associative layer. The green arrows in Figure 7c represent the current flowing from the RGB output neuron layer through the associative map. In this example, only one of the three synapses located in the same row of the associative map has a relatively large synaptic weight (represented with a light gray color). Most of the current therefore flows through this synapse and excites associative output neuron #2. The response of the associative output neurons in this situation is displayed in Figure 7d.
On the one hand, there are a total of three connections between the associative layer and the RGB map that show a relatively large synaptic weight in Figure 7c. The corresponding synaptic weight updates were performed following the training procedure of a regular NSOM (i.e., by computing the change in the transmembrane potential of the output neurons and applying the STDP function for the synaptic weight update). On the other hand, none of the connections with the auditory map has a significant synaptic weight, because no associative learning process has occurred yet.
To associate the data coming from the RGB and auditory maps in the associative layer, an associative learning process has to be implemented. The simplest case of associative learning involves at least three neurons and two synapses, arranged as in the diagram of Figure 7e. An experimental demonstration of this simple case with two of our OxRAM devices can be found in [43], where Pavlov’s dog experiment was emulated. Initially, the associative output neuron responded only to one of the input sources, the unconditioned stimulus (in the present case, this stimulus comes from the RGB map). The other input source is the neutral stimulus (which comes from the auditory map). It was found that, by presenting both input sources together with the proper pre- and post-synaptic pulse amplitudes, the associative output neuron could relate the two, so that it ended up responding to either of them when presented individually. In the present case, the auditory data would become a conditioned stimulus if the association process is successful. Although the system processes information coming from heterogeneous sensory sources, multisensory integration is not performed through explicit feature-level fusion. Instead, each modality is first mapped onto a distributed, topographically organized neural representation by a dedicated self-organizing map. Integration across modalities emerges at a higher hierarchical level through associative learning, where coincident activations from different sensory maps are bound together via STDP.
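The pairing mechanism can be illustrated with a minimal Python sketch of the three-neuron, two-synapse case. The threshold, learning step and initial weights are hypothetical, and the potentiation is a simple linear step applied to synapses whose pre-synaptic input is active when the post-synaptic neuron fires, standing in for the pulse-overlap STDP of Section 2.

```python
THRESHOLD = 0.5   # hypothetical firing threshold (normalized current)
ETA = 0.1         # hypothetical potentiation per paired event

def fires(w, x):
    """The associative neuron fires when its summed synaptic current reaches threshold."""
    return sum(wi * xi for wi, xi in zip(w, x)) >= THRESHOLD

def present(w, x):
    """One presentation: if the neuron fires, every synapse whose pre-synaptic
    input was active is potentiated (linear step, clipped to 1)."""
    if fires(w, x):
        w = [min(1.0, wi + ETA) if xi else wi for wi, xi in zip(w, x)]
    return w

w = [0.9, 0.1]            # [unconditioned (RGB map), neutral (auditory map)] weights
print(fires(w, [0, 1]))   # False: the sound alone does not excite the neuron
for _ in range(5):        # pairing: color and sound presented together
    w = present(w, [1, 1])
print(fires(w, [0, 1]))   # True: the sound has become a conditioned stimulus
```

Because the color input alone already drives the neuron above threshold, every paired presentation produces a post-synaptic spike, and only the synapses active at that moment are strengthened; the sound synapse therefore grows until it can trigger the neuron by itself.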
Let the scenario of Figure 7 be the starting point of the associative learning process. Nothing is expected to happen if an output neuron of the auditory map (e.g., the one responding to phoneme \i\) fires a pulse towards the associative layer, since the involved synapses have a very low synaptic weight. However, if the phoneme \i\ (activation of auditory input neurons In₁,₁ and In₂,₅) is presented simultaneously with a color (e.g., green), associative output neuron #2 will fire a pulse towards the associative layer, causing a voltage drop across the involved synapses. Since the connections between the associative output neurons and the input neurons related to the auditory map have a low synaptic weight, the voltage drop will cause a synaptic weight increment, given by the STDP function. In this case, associative output neuron #2 will pair the ‘G’ color component (green) with the F1 and F2 frequencies of the \i\ phoneme (as in the word “green”). The system therefore learns that there is a relationship between these two inputs, which corresponds to the concept of “green”. After associative learning has occurred, the associative output neurons can be excited by the individual presentation of either input (a color component or a sound). If the two inputs related to the same concept are presented simultaneously, the associative output neuron response will be more intense, since it receives a larger current.
Because the neurons of the proposed system can behave as both input and output neurons, the information flow within the system is bidirectional. This means that the following situations can now take place. (1) If a neuron corresponding to the green symbol at the higher level of the hierarchy is activated (i.e., associative output neuron #2 in Figure 7c), the system will remember which color corresponds to the learnt green concept (Figure 7a) and how the vowel in the word “green” sounds. This recall occurs by activating the related neurons in the lower hierarchy levels. (2) If the word “green” is perceived (\i\ phoneme), the green color component (G) will be recalled. In other words, a spiking neuron from the auditory layer first activates associative output neuron #2, and, through the synapses of the latter located in the associative map, an output neuron of the RGB map is also excited. Therefore, a single active input source (which is perceived) causes the activation of the neurons related to the other input sources (which are recalled, not perceived), owing to their connections in the associative layer.
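The bidirectional recall can be sketched by reading the same associative synapses in both directions: bottom-up, the lower-map outputs drive the associative neurons (a vector–matrix product), and top-down, an active associative neuron drives the lower maps through the same weights (the transposed product). The sketch assumes only three output neurons per lower map and hand-set, already learnt weights; the sizes, indices and function names are hypothetical.

```python
import numpy as np

# Hypothetical learnt associative weights. Rows: 3 RGB-map output neurons followed by
# 3 auditory-map output neurons; columns: associative neurons for "red", "green", "blue".
A = np.zeros((6, 3))
for concept in range(3):
    A[concept, concept] = 1.0        # RGB output neuron <-> concept
    A[3 + concept, concept] = 1.0    # auditory output neuron <-> concept

def up(lower_activity):
    """Bottom-up: current reaching each associative neuron."""
    return lower_activity @ A

def down(assoc_activity):
    """Top-down recall: the same synapses driven from the associative side."""
    return A @ assoc_activity

def recall_color_from_sound(auditory_winner):
    """A perceived sound excites an associative neuron, which in turn excites
    the RGB-map neuron bound to the same concept."""
    lower = np.zeros(6)
    lower[3 + auditory_winner] = 1.0           # only the sound is perceived
    concept = int(np.argmax(up(lower)))        # associative neuron that fires
    recalled = down(np.eye(3)[concept])        # its pulse reaches both lower maps
    return concept, int(np.argmax(recalled[:3]))

print(recall_color_from_sound(1))   # (1, 1): the \i\ sound recalls the G color neuron
```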
6. Addition of Other Input Data Sources
As mentioned above, many modules can be added to the system in any of the hierarchy levels, and associative relationships can be learnt in different areas of the associative layer. As to give an example, consider a new NSOM module located at the bottom level of the hierarchy, in charge of processing images (shapes), which are related to facial features. This NSOM is referred to as the people map. The input dataset used to train the auditory map has been extended and now consists of vowel sounds coming from two different people.
Figure 5a shows the average formant frequencies obtained from recordings of the first author’s voice (person #1, depicted with circles), used in the previous example, and of her partner’s voice (person #2, depicted with triangles).
Figure 5b shows the average F1 and F2 values of the /ε/ phoneme for the two voices. It can be seen in Figure 5c that the F1 values of the two people activate the same input neuron (In_{1,2}), whereas the F2 values activate different input neurons (In_{2,4} and In_{2,5} for person #1 and person #2, respectively). The system is expected to distinguish between the two people's voices despite their similar F1 and F2 values. To do so, associations between the processed phonemes (from the auditory map) and the shapes related to facial features (from the people map) must be established in an untrained region of the associative layer, referred to as the extended region. No association with the color information from the RGB map is considered for the extended region. The expected behavior of this extended region is represented in the diagram of Figure 8a, which shows the connections built between the auditory and people maps through the associative layer. The example of Figure 8a corresponds to the situation in which the /ε/ phoneme from the voice of person #1 and the facial features of person #1 are the inputs of the system. Because the extended region is not trained with the RGB data, no connection between its associative output neurons and the RGB map is expected to acquire a significant synaptic weight.
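The text does not detail how formant frequencies are mapped onto the input neurons In_{k,j}. A minimal sketch, assuming that each formant range is divided into equal-width frequency bins with one input neuron per bin, reproduces the qualitative situation of Figure 5c: two voices whose F1 values fall in the same bin while their F2 values do not. The frequency ranges, the number of bins and the formant values below are illustrative assumptions, not the measured data, so the resulting indices differ from those in the figure.

```python
# Hypothetical uniform-bin encoding of formant frequencies onto input neurons.
# Ranges, bin counts and formant values are illustrative, not the paper's data.

F1_RANGE = (200.0, 1000.0, 5)   # (f_min in Hz, f_max in Hz, number of input neurons)
F2_RANGE = (800.0, 2800.0, 5)


def input_neuron(freq_hz, f_min, f_max, n_bins):
    """Return the 1-based index of the input neuron whose bin contains freq_hz."""
    if not f_min <= freq_hz < f_max:
        raise ValueError(f"{freq_hz} Hz is outside [{f_min}, {f_max}) Hz")
    width = (f_max - f_min) / n_bins
    return int((freq_hz - f_min) // width) + 1


def encode(f1, f2):
    """Map a vowel's (F1, F2) pair to the labels of its two active input neurons."""
    return (f"In_1,{input_neuron(f1, *F1_RANGE)}",
            f"In_2,{input_neuron(f2, *F2_RANGE)}")


person1 = encode(560.0, 1900.0)  # illustrative /ε/ formants for voice #1
person2 = encode(600.0, 2250.0)  # illustrative /ε/ formants for voice #2
print(person1, person2)
```

With this encoding both voices share the same F1 input neuron, so only the F2 neuron differs. This is why the sketch alone cannot robustly tell the voices apart, and why the association with the people map in the extended region is needed.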
Figure 8b shows the whole proposed system after the associative learning process has taken place. Here, the associative layer is indicated with a green frame and has a size of 250 × 5. It includes the extended region (250 × 2), which is represented before and after training. Figure 8b also shows the people map (a 2 × 50 array), indicated with a blue frame. In this new scenario, the auditory map has been trained with the new input dataset, which includes the F1 and F2 values of the /ε/ phoneme of person #2. The activation of the auditory output neurons is plotted for the /ε/ phonemes of person #2 (triangles) and person #1 (circles). The activation of the output neurons of the people map is also shown for the shapes of person #2 (triangles) and person #1 (circles).
Two different regions of the associative layer can be distinguished. The first one (250 × 3) relates the color components with sounds, and its output neurons (labeled ‘1’, ‘2’ and ‘3’) are sensitive to both types of data. Since these associative output neurons do not respond to information from the people map, their connections to the people map neurons present low synaptic weights. The second region is the so-called extended region (250 × 2), whose output neurons are intended to respond both to sounds and to shapes related to facial features. After associative learning has occurred, the extended region (located at the bottom right of Figure 8b) has connections with large synaptic weights to these two pre-processing maps, so significant currents flow through these synapses.
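The current paths in Figure 8b follow from the crossbar operation of the associative layer: each synapse contributes an Ohmic current V·G, and each column sums these currents according to Kirchhoff's current law. The sketch below is a reduced toy version (9 rows × 5 columns instead of 250 × 5) of an ideal crossbar without wire resistance, sneak paths or device variability. The read voltage, the two conductance states, the row names and the set of potentiated synapses (STRONG) are assumptions. Columns ‘1’ to ‘3’ form the first region and columns P1 and P2 the extended region.

```python
# Idealized column currents in a reduced associative crossbar.
# Hypothetical: sizes, read voltage, conductances and potentiated synapses are assumed.

V_READ = 0.1                 # read voltage applied to active rows (V), assumed
G_ON, G_OFF = 100e-6, 1e-6   # high/low conductance states (S), assumed

ROWS = ["R", "G", "B", "eps_p1", "eps_p2", "i", "u", "shape_p1", "shape_p2"]
COLS = ["1", "2", "3", "P1", "P2"]   # first region: 1-3; extended region: P1, P2

# Synapses left in the high-conductance state after associative learning.
# First-region columns have no strong link to the people map (shape rows),
# and extended-region columns have no strong link to the RGB map.
STRONG = {
    "1": {"B", "u"}, "2": {"G", "i"}, "3": {"R", "eps_p1"},
    "P1": {"eps_p1", "shape_p1"}, "P2": {"eps_p2", "shape_p2"},
}


def conductance(row, col):
    """Conductance of the synapse at (row, col) in the toy array."""
    return G_ON if row in STRONG[col] else G_OFF


def column_currents(active_rows):
    """Each column collects the Ohmic currents V_READ * G of its active synapses."""
    return {c: sum(V_READ * conductance(r, c) for r in active_rows) for c in COLS}


# Person #1 says /ε/ while the person-#1 shape is perceived.
I = column_currents(["eps_p1", "shape_p1"])
for col, amps in sorted(I.items(), key=lambda kv: -kv[1]):
    print(f"{col}: {amps * 1e6:.1f} µA")
```

In this toy setting the largest current appears in column P1, which receives strong contributions from both the voice and the shape, and the second largest in column ‘3’, driven by /ε/ alone. This mirrors the two kinds of current path highlighted in Figure 8b.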
Figure 8b also shows the principal current paths (red arrows) when the system is perceiving the person-#1-related shape and hearing the /ε/ phoneme uttered by person #1. The /ε/ phoneme affects not only the associative output neuron related to the “person #1” concept, but also associative output neuron ‘3’. Because the flow of information is bidirectional, the RGB map is also affected by the presentation of /ε/, even though no color component is considered as an input in this new situation (no color is being perceived). Hence, associative output neuron ‘3’ naturally recalls that the phoneme /ε/ is related to red (color component R). The person-#1-related shape contributes directly only to the activation of the associative output neuron related to the “person #1” concept, through the new module (the people map), and therefore affects only the extended region of the associative map. Thanks to the combined contributions of /ε/ and the person-#1-related shape, the system can match the voice of person #1 with the corresponding shape, forming the “person #1” concept. Therefore, the system can perform multisensory integration and behave as an associative memory.
Although not shown in the present work, if the features of the input dataset change over time, the system is expected to redistribute its synaptic weights to represent them properly, enabling an on-line (in situ) unsupervised learning process. It has thus been shown that simulated neuromorphic systems based on OxRAM devices are able to process input data in a parallel, distributed and hierarchical way. These characteristics are of great interest for decision-making tasks in applications that require parallel computation, such as real-time control systems.
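The expected weight redistribution under changing input statistics can be sketched with a simplified, winner-take-all form of STDP: when an output neuron fires, synapses from inputs that spiked just before are potentiated and the others are depressed, within the conductance bounds. This is an assumed additive rule with hypothetical step sizes and initial conductances, not the OxRAM device model or the STDP waveforms used in the simulations.

```python
# Simplified winner-take-all STDP showing on-line weight redistribution.
# Hypothetical: bounds, step sizes, initial weights and input patterns are assumed.

G_MIN, G_MAX = 0.0, 1.0      # normalized conductance bounds
DG_LTP, DG_LTD = 0.1, 0.05   # potentiation / depression steps


def stdp_update(weights, active_inputs, winner):
    """When `winner` fires, potentiate synapses from inputs active in this
    presentation and depress all its other synapses, clamped to the bounds."""
    col = weights[winner]
    for i in range(len(col)):
        if i in active_inputs:
            col[i] = min(G_MAX, col[i] + DG_LTP)
        else:
            col[i] = max(G_MIN, col[i] - DG_LTD)


def present(weights, active_inputs):
    """The output neuron with the largest summed input fires and learns."""
    scores = [sum(w[i] for i in active_inputs) for w in weights]
    winner = max(range(len(weights)), key=scores.__getitem__)
    stdp_update(weights, active_inputs, winner)
    return winner


weights = [[0.5] * 4, [0.4] * 4]   # 2 output neurons x 4 inputs (initial state assumed)

for _ in range(30):                # phase 1: inputs 0 and 1 co-occur
    present(weights, {0, 1})
after_phase1 = list(weights[0])

for _ in range(30):                # phase 2: input statistics drift to inputs 1 and 2
    present(weights, {1, 2})

print(after_phase1)   # [1.0, 1.0, 0.0, 0.0]
print(weights[0])     # [0.0, 1.0, 1.0, 0.0]
```

The same neuron keeps winning because its weights stay closest to the drifting pattern, and its conductances migrate from inputs {0, 1} to inputs {1, 2} without any external supervision.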
The examples given are intended as a proof of concept of the proposed architecture; further studies quantifying the performance of the system for different types of input data and system parameters (system size, electrical behavior of the synaptic devices and neurons, parasitic effects, etc.) remain to be conducted.
7. Conclusions
In this work, a novel hierarchical neuromorphic architecture based on neuromorphic self-organizing map (NSOM) modules has been presented. The proposed system combines biologically plausible self-organization driven by STDP-based unsupervised learning with associative mechanisms operating across multiple hierarchical layers. Using an OxRAM device model calibrated on experimental data, the architecture has been evaluated through simulations as a proof-of-concept, demonstrating how classification and association of heterogeneous information can naturally emerge from modular crossbar-based building blocks.
The results show that the proposed architecture can form topologically organized representations at lower hierarchical levels and associate these representations at higher levels to encode abstract concepts, without external supervision. Rather than functioning as an associative memory in the conventional engineering sense, the system establishes relationships between distributed representations generated by preceding NSOM layers.
This work is intended as an architectural and functional study. Future research directions include the integration of the proposed architecture with physical OxRAM arrays, the analysis of circuit-level non-idealities and device variability, and the evaluation of system-level performance under realistic operating conditions. These steps will be essential to translate the proposed proof-of-concept architecture into practical neuromorphic hardware systems.