Neuromodulated Dopamine Plastic Networks for Heterogeneous Transfer Learning with Hebbian Principle

Abstract: Plastic modifications in synaptic connectivity arise primarily from changes triggered by neuromodulated dopamine signals. These activities are governed by neuromodulation, which is itself under the brain's control, and the brain's resulting self-modifying abilities play an essential role in learning and adaptation. Artificial neural networks with neuromodulated plasticity have been used to implement transfer learning in the image classification domain, with applications in image detection, image segmentation, and the transfer of learned parameters. This paper proposes a novel approach, called NDHTL (Neuromodulated Dopamine Hebbian Transfer Learning), to enhance transfer learning accuracy between a heterogeneous source and target by applying neuromodulation to the Hebbian learning principle. Neuromodulated plasticity offers a powerful technique for training transfer-learning-motivated convolutional neural networks (CNNs) with asymmetric backpropagation based on Hebbian principles. In this biologically motivated scheme, concurrent positive activation of connected cells strengthens the synaptic connection between the network neurons. In the NDHTL algorithm, the degree of plasticity between the neurons of the CNN layer is directly managed by the value of the dopamine signal. The discriminative nature of transfer learning fits well with this technique: the learned model's connection weights must adapt to unseen target datasets with the least cost and effort. Using a distinctive learning principle such as dopamine-modulated Hebbian learning for asymmetric gradient weight updates in transfer learning is a novel approach. The paper presents the NDHTL algorithmic technique, in which synaptic plasticity controlled by dopamine signals is applied to transfer learning for classifying images across source and target datasets.
Standard transfer learning using gradient backpropagation is a symmetric framework. Experimental results on the CIFAR-10 and CIFAR-100 datasets show that the proposed NDHTL algorithm can enhance transfer learning efficiency compared to existing methods.


Introduction
Real brain neurons motivated and inspired the development of neural networks. Deep neural networks such as convolutional neural networks (CNNs), trained with stochastic gradient descent (SGD) and backpropagation, together with optimizations of such algorithms, are popular contemporary techniques [1][2][3]. However, neuroscience strongly suggests that biological learning is closer to methods such as Hebbian learning rules and spike-timing-dependent plasticity (STDP) [4,5].
The propensity to continually adapt and learn in unseen environments is a crucial aspect of human intelligence. Instilling such characteristics into artificial intelligence is a challenging task. Most machine learning models assume real-world data to be stationary. However, the real world is non-stationary, and the distribution of acquired data changes over time. When these models are fine-tuned with new data, their performance degrades on the original data [6,7]. These are some of the challenges deep neural networks face in long-term learning scenarios [8,9].
The goal in this scenario is to learn consecutive tasks and new data representations across varied tasks, while sustaining performance on, and preserving, previously learned tasks. Many real-world applications require such learning, including transfer learning, model adaptation, and domain adaptation. Most recognized learning methods for neural network architectures require independent and identically distributed samples from a stationary training distribution. In real-world applications, however, there are class imbalances in the training data distribution, and the test data on which the model is expected to perform is not initially available. In such situations, deep neural networks face the challenge of integrating newly learned knowledge while maintaining stability by preserving existing knowledge [10,11].
Plasticity using neuromodulated dopamine improves the performance of neural networks on supervised transfer learning tasks. Seeking more plausible models that mimic biological brains, researchers have introduced alternative learning rules for artificial intelligence. In this work, we explore Hebbian learning rules in the context of modern deep neural networks for image classification. Hebbian learning refers to a family of biologically inspired learning rules which state that "the weight associated with a synaptic connection increases proportionally to the values of the pre-synaptic and post-synaptic stimuli at a given instant of time" [12,13].
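The quoted rule can be sketched numerically. This is a generic illustration of Hebb's principle, not the paper's implementation; the learning rate and activation values are arbitrary:

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.01):
    # Hebb's rule: the change in w_ij is proportional to the product of
    # pre-synaptic activation i and post-synaptic activation j.
    return w + lr * np.outer(pre, post)

pre = np.array([1.0, 0.0, 1.0])   # pre-synaptic stimuli
post = np.array([0.5, 1.0])       # post-synaptic stimuli
w = hebbian_update(np.zeros((3, 2)), pre, post)
```

Connections between co-active units (e.g., pre[0] and post[1]) grow, while connections from silent units (pre[1]) stay at zero.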
Contemporary deep learning models have achieved high accuracy in image segmentation [14,15]. The latest optimization techniques, such as gradient descent, are very effective [16][17][18], and various neural network models such as AlexNet and VGGNet have been successfully applied to object recognition and image classification tasks [19][20][21][22][23]. In these models, overfitting troubles researchers when training deep learning algorithms with insufficient labeled images [24]. Without proper care, a CNN's performance decreases on small datasets. Methods like parameter fine-tuning and image augmentation have shown significant success in overcoming overfitting and produce robust results [25]. However, pre-trained networks perform poorly when labeled data are lacking in the target domain. The primary problem is therefore to find a learning technique that consolidates learned knowledge and reduces development cost in the target domain.
Neuroscience suggests that the SGD optimization process works differently from the human brain's fundamental processes [26]. Researchers have studied the plastic nature of neuronal responses [27], formalized in Hebb's law, named after the Canadian psychologist Donald Hebb, who posited it in 1949. Hebb's rule forms substantial connection weights and a better connection path from the input to the output of the neural network: if a synapse between two neurons is repeatedly activated at the same time the postsynaptic neuron fires, the structure or chemistry of the neurons changes and the synapse is strengthened. This is known as Hebbian learning [28]. Implementing plastic networks using a mathematical form of Hebb's law has been demonstrated successfully [29]. Extensive research and surveys on learning to transfer weights, focusing on parameter fine-tuning based on error backpropagation, have been published [30][31][32][33]. The ability to acquire new information and retain it over time using Hebb's plasticity rule has significant effects [34].
This paper presents an algorithm called Neuromodulated Dopamine Hebbian Transfer Learning (NDHTL), which introduces advanced techniques for transferring parameter connection weights using Hebbian plasticity. The algorithm alters the parameter weights and administers the plastic coefficients responsible for each connection weight's flexibility [35][36][37]. The neuromodulated dopamine signal controls the plasticity coefficient and decides how much plasticity each neuron connection requires at runtime. Exploiting the network's flexible nature, the paper defines the network architecture and the CNN connection weight parameters separately. The purpose of the proposed method is to enhance the speed and ease of relaying learned parameters while improving the algorithm's efficiency. Training becomes faster with transfer learning and requires fewer computations; otherwise, plenty of labeled data and suitable computer hardware are needed to run algorithms consuming trillions of CPU cycles. Transfer learning in the biomedical domain is essential for applications such as cancer classification using DCNNs, mammographic tumor detection, pediatric pneumonia diagnosis, and visual categorization [38][39][40][41][42][43]. In a non-medical domain, CNNs on FPGA chipsets are a futuristic approach.
This work's critical contribution is an algorithm utilizing a CNN-based hybrid architecture that combines a standard CNN and a neuromodulated plastic layer in one hybrid structure. The NDHTL method is the essential element of the proposed technique. We aim to enhance the standard CNN algorithm with newly engineered asymmetric synaptic consolidation while retaining the standard connection-weight updates. A linear system yields a symmetric weight update: in the traditional setting, backpropagation requires a symmetric framework, where the feedforward and feedback connections use the same weights for the network's forward and backward passes. The proposed algorithm is easy to extend to other deep learning techniques and domains.
For the experiments, we use the CIFAR-10 and CIFAR-100 image datasets to check the proposed algorithm's effectiveness. The experimental results show that NDHTL outperforms the standard transfer learning approach. The NDHTL algorithm is also a novel attempt to apply advanced techniques from the transfer learning domain to image classification and recognition in computer vision.

Related Works
Hebbian learning explains many human learning traits in long-term learning [44]. The difference in electric signals, pre- and post-spike, in brain cells enables learning in neural networks. According to Hebb's rule, learning alters the strength of existing synaptic connection weights, and memory is attributed to weight plasticity [45][46][47]. As per Hebbian learning theory, the related synaptic strength increases while the degree of plasticity decreases to preserve previously learned knowledge [48]. In [49], Hebb presented various biologically inspired research. The Hebbian softmax layer [50] can improve learning by interpolating between SGD and Hebbian updates. Non-trainable Hebbian-learning-based associative memory has been implemented with fast weights. Differentiable plasticity [51] uses symmetric SGD to optimize plasticity coefficients together with standard weights. A symmetric update rule assumes that the feedforward and feedback connections linking two units carry the same weight.
Several notable works address catastrophic forgetting through task-specific synaptic consolidation [52][53][54], which protects historically learned knowledge by dynamically adjusting synaptic strengths to consolidate memories. Some regularization strategies are also helpful in long-term learning [55], e.g., Elastic Weight Consolidation (EWC) [56].
Other approaches, such as Synaptic Intelligence (SI) [57], track the cumulative change in individual synapses over the entire training task. Another approach is Memory Aware Synapses (MAS) [58]. Each plastic CNN connection consists of a fixed weight, where traditional slow learning stores long-term knowledge, and a plastic, fast-changing weight for temporary associative memory [59,60]. Approaches such as the Hebbian softmax layer [61], augmenting slow weights in the fully connected layer with a fast weights matrix [62], differentiable plasticity [63,64], and neuromodulated differentiable plasticity [65] are extensions of these techniques. However, most of these methods focus on rapid learning over a distribution of tasks or datasets [66,67].
Such abilities are primarily enabled by plastic changes in the synaptic connectivity of the brain's neurons. Most importantly, these changes are actively controlled by neuromodulation, which is itself under the brain's control. The brain's resulting self-modifying abilities play an important role in learning and adaptation and are significant for biological memorization and learning sustainability. Genetic coding carries evolutionary information from one generation to another. Neuromodulated plasticity can be applied to train artificial neural networks with gradient descent, and differentiable neuromodulation of plasticity offers a robust framework for doing so. This neuromodulation of plasticity through dopamine plays an essential role in learning and adaptation. The latest plastic neuromodulation can protect previously learned knowledge by avoiding irrelevant weight consolidation, specifically integrating relevant information while maintaining current weight strengths [68][69][70][71][72][73][74].
Evolution shaped neuromodulation, endowing the human/animal brain with self-modifying abilities and enabling efficient lifelong learning. Surveying the related literature, we find that networks with neuromodulation outperform non-neuromodulated and non-plastic networks in various tasks [75][76][77]. Neuromodulation mitigates catastrophic forgetting, allowing neural networks to learn new skills without overwriting previously learned ones. By allowing positive plasticity values only in the subset of weights relevant to the task currently being performed, knowledge stored in other weights about different tasks is left unaltered, alleviating forgetting [78,79]. The plasticity of individual synaptic connections can be optimized by gradient descent in the same way as standard synaptic weights [80,81].
The plasticity of connections can be modified moment-to-moment based on a signal computed by the network itself. This allows the network to decide where and when to be plastic, giving it true self-modifying abilities. Earlier work directly optimized the neuromodulation of plasticity within a single network through gradient descent [79]; however, the evolved networks operated on low-dimensional problem spaces and were relatively small, and applications of the Hebbian rule remain limited to relatively shallow networks [82][83][84][85]. Machine learning algorithms do well with large amounts of data [86]. Integrating neuromodulated plastic connections that apply Hebbian learning into a modern algorithm can be a solution: neuromodulated plasticity in transfer learning can accelerate neural network training while increasing efficiency through Hebbian synaptic consolidation. Such techniques will speed the application of the technology to real-world problems and further empower its adoption in a larger part of the world to improve human life.

Problem Definition
The heterogeneous transfer learning problem statement is described in this section [87]. A task T is represented by a label space Y and a prediction function f(·), which is learned dynamically at runtime from a dataset D = {x_i, y_i}.
Goal: The algorithm aims to learn the approximation function f_T(·) on the target task T_T, utilizing the source and target classification datasets and exploiting what was learned on the previously learned source task T_S. The two tasks are different, i.e., T_S ≠ T_T, since they have different label spaces, Y_S ≠ Y_T. The source and target domains are also different, i.e., D_S ≠ D_T, with a source domain dataset D_S = {(x_S1, y_S1), ..., (x_Sn, y_Sn)} and a target domain dataset D_T = {(x_T1, y_T1), ..., (x_Tn, y_Tn)}. The predictive function f_T(·), which predicts the label y_T of an RGB image x_T, is stored as the model connection weights W_T. Table 1 contains all the notations and their descriptions.
Input: We provide the source dataset D_S, the learned connection weights W_S calculated by training on the source task T_S, and the image data for the target classification task, D_T, as input to the algorithm.
Output: The final learned parameters W_T obtained with NDHTL, i.e., the connection weights for the target task dataset.

The Algorithm
While traditional transfer learning with SGD implements symmetric, synchronous backpropagation, the NDHTL algorithm is asymmetric. In NDHTL, the hyperparameter K controls when the algorithm performs a backpropagation step: N is the number of feedforward steps, while N/K is the number of backpropagation steps, with a matching number of cost function calculations. In NDHTL Hebbian transfer learning, we update the Hebbian matrix on all N feedforward cycles but backpropagate only after every K episode cycles, thereby introducing asymmetric weight updates into the backpropagation process.
The same holds for the cross-entropy cost calculation: until K episodes are reached, we only update the Hebbian matrix for each feedforward pass performed. The feedforward and backpropagation steps in the NDHTL Hebbian transfer learning algorithm are therefore asynchronous: a given backpropagation step is independent of any specific feedforward step within the last K episodes of the training cycle. Consequently, the connection weight updates and synaptic weight consolidation are asymmetric; the Hebbian matrix values and connection weights at the i-th feedforward differ from those at the j-th backpropagation and synaptic gradient consolidation. Our NDHTL framework thus implements asymmetric backpropagation, and this asymmetric version of backpropagation significantly improves performance. Figure 1 shows the flow chart describing the steps and control flow of the NDHTL algorithm. The CNN architecture is kept the same for all tasks to make the transfer learning experiments comparable. In the first step, SGD training is applied to the source task. Next, the target task model is initialized with the parameters learned in the previous step. In the last step, the Neuromodulated Dopamine Hebbian Transfer Learning algorithm is applied to fine-tune the weights.
The Hebbian principles are used with heterogeneous source and target datasets. Hebbian plasticity for a connection can be modelled as a time-dependent quantity called the Hebbian trace (Hebb_i,j) [65]. The η parameter is a learned scalar that controls how quickly new experiences are consolidated into the plastic component. Each connection stores both a fixed weight w_i,j and a plastic trace Hebb_i,j, updated using Equations (1) and (2). The symbol σ in Equation (2) denotes a non-linearity, i.e., an activation function of the neural network such as tanh. The α_i,j parameter adjusts the magnitude of Hebb_i,j. The Hebb matrix accumulates the mean hidden activations of the NDHTL layer for each target task; the pre-synaptic activations of neurons i are used to compute the post-synaptic activations of neurons j in the hidden layer.
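Equations (1) and (2) are not reproduced in the extracted text; following the differentiable-plasticity formulation the paper builds on [65], they can be sketched as below, with tanh as the non-linearity σ. The dimensions and parameter values are illustrative assumptions:

```python
import numpy as np

def plastic_forward(x, w, alpha, hebb, eta):
    # Sketch of Eqs. (1)-(2): each connection has a fixed weight w_ij plus
    # a plastic component alpha_ij * Hebb_ij; eta controls how quickly new
    # activity is consolidated into the Hebbian trace.
    y = np.tanh(x @ (w + alpha * hebb))               # Eq. (1), sigma = tanh
    hebb = (1.0 - eta) * hebb + eta * np.outer(x, y)  # Eq. (2), trace update
    return y, hebb

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # pre-synaptic activations
w = rng.standard_normal((4, 3)) * 0.1 # fixed weights
alpha = np.full((4, 3), 0.5)          # plasticity magnitudes
hebb = np.zeros((4, 3))               # Hebbian trace, zeroed at the start
y, hebb = plastic_forward(x, w, alpha, hebb, eta=0.1)
```

Repeated calls accumulate correlated activity in `hebb`, which then contributes to future forward passes through the `alpha * hebb` term.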
Our approach implements transfer learning in the CNN model with dopamine effects on plasticity. This technique introduces neuromodulated plasticity within the transfer learning framework. Plasticity is modulated by a network-controlled neuromodulatory signal, M(t), as shown in Equation (3).
M(t) is merely a single scalar output of the network, either used directly as a single value for all connections or passed through a vector of weights (one for each connection). Because η determines the rate of plastic change, placing it under network control allows the network to control how plastic connections should be at any given time. To balance the effects of both rises and dips in the baseline dopamine levels, M(t) can be positive or negative [88].
Hebbian plasticity uses an eligibility trace to modify the synaptic weights indirectly: it creates a fast-decaying "potential" weight change, which is applied to the actual weights only if the synapse receives dopamine within a short time window. Each synapse is composed of a standard weight and a plastic (fast) weight that automatically increases or decreases with ongoing activity over time.
As a result, biological Hebbian traces practically implement a so-called eligibility trace [89], keeping the memory of what synapses contributed to recent activity, while the dopamine signal modulates the transformation of these eligibility traces into actual plastic changes.
In Equations (3) and (4), E_i,j(t) (the eligibility trace at connection i, j) is calculated as a simple exponential average of the Hebbian product of the pre- and post-synaptic activity, with a trainable decay factor η. The dopamine signal M(t) is passed through a tanh non-linearity, and the actual plastic component Hebb_i,j(t) of the connection accumulates the eligibility trace, gated by the current value of M(t). M(t) is a symmetric value, as it is calculated by a symmetric function.
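The retroactive scheme of Equations (3) and (4) can be sketched as follows. The update form follows the neuromodulated-plasticity formulation in [65]; all variable values are illustrative, and M(t) is taken as a given scalar rather than computed by a network:

```python
import numpy as np

def retroactive_update(hebb, elig, pre, post, m, eta):
    # Eq. (4) sketch: eligibility trace as an exponential average of the
    # Hebbian product of pre- and post-synaptic activity.
    elig = (1.0 - eta) * elig + eta * np.outer(pre, post)
    # Eq. (3) sketch: the plastic component accumulates the eligibility
    # trace, gated by the dopamine signal M(t).
    hebb = hebb + m * elig
    return hebb, elig

pre = np.array([1.0, -1.0])
post = np.array([0.5, 0.25, -0.5])
hebb = np.zeros((2, 3))
elig = np.zeros((2, 3))
hebb, elig = retroactive_update(hebb, elig, pre, post, m=0.8, eta=0.1)
```

With m = 0, recent activity is remembered in `elig` but never committed to `hebb`, which is exactly the "potential weight change" behaviour described above.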
The clip function restricts the values of the Hebbian matrix, which is more biologically plausible, as synaptic strength is not unbounded. In the algorithm below, on line 7, the clamp function is applied with the clip variable clipval set to a small value such as 0.5. The clamp function from the PyTorch library uses clipval as the minimum and maximum bound when clamping the float tensor. By doing so, the value of Hebb_i,j(t) is constrained within the clamp range, preventing the Hebbian matrix values from growing disproportionately.
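A minimal sketch of the clamping step, using NumPy's clip in place of PyTorch's clamp and assuming the symmetric range [-clipval, clipval] implied by the text:

```python
import numpy as np

clipval = 0.5  # bound on Hebbian trace values, as in the algorithm text

hebb = np.array([[0.9, -0.2],
                 [-1.4, 0.3]])
hebb = np.clip(hebb, -clipval, clipval)  # constrain Hebb_ij within the range
```

Values already inside the range pass through unchanged; only the outliers are pinned to ±clipval.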
One epoch of training is represented as one lifetime. For NDHTL algorithmic learning, we experimented with the CNN architecture from [90], trained with SGD. The Hebb matrix is initialized to zero only at the start of learning the first task. The Hebbian update, reflected in line 7 of the algorithm, is repeated for each unique class in the target task. Quick learning, enabled by a highly plastic weight component, improves accuracy for a given task.
To prevent interference, the plastic component decays between tasks, and selective consolidation yields a stable component that preserves old memories, effectively enabling the model to learn to remember by shaping plasticity into a learned neural memory. This allows it to scale efficiently with an increasing number of input tasks. The hidden activations are saved as compressed episodic memory in the Hebbian trace matrix, reflecting individual episodic memory traces (similar to the role of the hippocampus in biological neural networks [91,92]). The model prevents forgetting of consolidated classes by making their connections less plastic, preserving the old information.
Furthermore, by making connections more plastic, the network can integrate and quickly learn new information. Our method encourages task-specific consolidation that alters the standard weights and plastic connection weights only where change is required, keeping irrelevant connection weights unchanged via the Hebbian matrix during the process.
The NDHTL algorithm (Algorithm 1) can be described as a biologically motivated approach in computer vision, where simultaneous activation positively increases the synaptic connection strength between individual cells. The discriminative nature of learning to search for features in image classification fits well with the Hebbian learning rule: neurons that fire together wire together.
In heterogeneous transfer learning, the connection weights of the learned model should adapt to the new target dataset with minimum effort. A discriminative learning rule such as Hebbian learning can improve performance by quickly adapting to discriminate between the different classes defined by the target task. We apply the Hebbian principle as synaptic plasticity in heterogeneous transfer learning for the classification of heterogeneous image datasets.
A better algorithm must accommodate negative transfer and overfitting and provide a better synaptic weight consolidation process, solving the challenges faced in heterogeneous transfer learning scenarios. It should also use biologically proven Hebbian rules that form substantial connection weights and better connection paths from the input to the output of the neural network, together with a better hybrid architecture. Similarly, existing methods enhanced with plasticity need only minor changes to the parameters of the CNN layers, which makes weight adaptation a quick process.
This discriminating property of Hebbian learning makes our proposed algorithm a practical approach for techniques such as transfer learning. Relatively minor weight fine-tuning, driven by the pre- and post-synaptic activity of the network, enhances fine-tuning toward an unfamiliar target dataset domain. Only a fragment of the Algorithm 1 listing survives in the extracted text: the loop iterates over episodes and over the (inputs, targets) batches of the train loader; when episode % k == 0 (k is a parameter), the BCE loss is calculated and backpropagated (line 9), and the Hebbian matrix is recomputed via model.calculate.HebbMatrix() (line 10).
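The training loop described in this section can be sketched as follows. This is an illustrative reconstruction, not the paper's exact Algorithm 1: the model is reduced to a single plastic layer, and the backpropagation step is a placeholder counter so that only the asymmetric K-schedule is demonstrated:

```python
import numpy as np

def ndhtl_train(data, w, alpha, eta, episodes, K, clipval=0.5):
    # Sketch of the NDHTL loop: the Hebbian trace is updated on every
    # feedforward pass, but the gradient step runs only once every K
    # episodes (asymmetric schedule). The "gradient step" here is a
    # placeholder counter, not real backpropagation.
    hebb = np.zeros_like(w)           # Hebb matrix zeroed at the start
    backprop_steps = 0
    for episode in range(episodes):
        for x, _target in data:
            y = np.tanh(x @ (w + alpha * hebb))             # feedforward
            hebb = (1.0 - eta) * hebb + eta * np.outer(x, y)  # Hebbian update
            hebb = np.clip(hebb, -clipval, clipval)         # clamp trace
        if episode % K == 0:          # backpropagate only every K episodes
            backprop_steps += 1       # placeholder for loss + SGD update
    return hebb, backprop_steps

rng = np.random.default_rng(0)
data = [(rng.standard_normal(4), 0) for _ in range(3)]
w = rng.standard_normal((4, 2)) * 0.1
alpha = np.full((4, 2), 0.5)
hebb, steps = ndhtl_train(data, w, alpha, eta=0.1, episodes=6, K=3)
```

With 6 episodes and K = 3, the Hebbian trace is updated on every feedforward pass, but only two gradient steps occur (episodes 0 and 3).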

CNN Hybrid Architecture
The hybrid NDHTL architecture is from [90], with modifications. Plastic connections Hebb_i,j and the NDHTL plastic layer have been added at the end of the CNN. The NDHTL plastic layer, which follows the five convolutional layers, contains the Hebb_i,j traces; the Hebbian trace defines the plasticity of every connection weight. The filter configuration of the layers is 64, 192, 384, 256, and 256, as shown in Figure 2.


Datasets
In experiments, we use the source dataset CIFAR-10 and target domain CIFAR-100. Figures 3 and 4 show example images.

We use all class categories of the CIFAR-10 dataset to calculate the source model weights W_S. The CIFAR-10 source dataset includes truck, ship, horse, frog, dog, deer, cat, bird, automobile, and airplane. We made ten different subsets of CIFAR-100 categories for the target domain by grouping similar superclasses together, as shown in Table 2.

Experimental Setup
Training begins with the source CIFAR-10 classification task, with learning rates of 0.0001 and 0.1. Under our experimental setup, we use the cross-entropy loss. Training works in phases, where a lifetime is mapped to one epoch, i.e., one cycle of the fine-tuning process [81].
The meta-parameter η defines the plasticity rate of the algorithm, and n is the number of episodes in one lifetime. The NDHTL technique uses an input image batch size of 1 at every step of the algorithm, and the same batch size is used for each forward propagation through the NDHTL hybrid neural network architecture.
After value consolidation with the Hebb trace matrix, the last step of every episode is to calculate the loss and perform connection weight consolidation and synaptic consolidation through backpropagation, which updates w_i,j and α_i,j. The Hebbian matrix is then re-initialized at the end of each episode.
We also need to compute η (the plasticity rate), which is determined by neuromodulation. The network independently decides the η of the connection weights and the plasticity percentage of the neural connections, analogously to the dopamine system. Inside the brain, dopamine drives dopaminergic cell systems and pathways, neuromodulation, and functions in reinforcement and reward. In NDHTL, the output of the neural network layers controls η, the plastic nature of the connection weights, which in turn governs the transfer learning efficacy and the overall accuracy of the model.
Unlike traditional backpropagation, which is a symmetric system for gradient backpropagation, the NDHTL gradient is backpropagated asymmetrically at the end of the episodes, following the classification loss calculation described below.
To calculate the gradient, the loss is computed using Equation (5) and then backpropagated. To keep the configurations comparable, the validation loss function used in NDHTL is the same as in the standard transfer learning setup: cross-entropy.
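The standard cross-entropy loss that Equation (5) refers to (the equation itself is not reproduced in the extracted text) can be sketched for a single example as:

```python
import numpy as np

def cross_entropy(logits, target):
    # Standard cross-entropy for one example: negative log-probability of
    # the target class under a softmax over the logits.
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target]

logits = np.array([2.0, 0.5, -1.0])  # illustrative network outputs
loss = cross_entropy(logits, target=0)
```

The loss shrinks toward zero as the logit of the correct class dominates the others.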
We collect the average validation loss, top-1 validation accuracy, and top-5 validation accuracy for each lifetime during training, followed by testing on the validation dataset. In standard transfer learning, the STL algorithm applies SGD for all iterations over the dataset. In contrast to STL, the NDHTL algorithm only backpropagates at the end of each episode, as controlled by the hyperparameter K.
In the NDHTL algorithm, the hyperparameter K controls when the algorithm performs backpropagation. In STL, N is the number of feedforward steps and also the number of backpropagation steps. In NDHTL, N is the total number of feedforward steps, while N/K is the number of backpropagation steps; a corresponding number of cost function calculations and reductions is performed. We can therefore say that NDHTL is theoretically quicker than STL. However, NDHTL also computes the Hebb matrix for the same number of feedforward steps, so the time saved by performing fewer backpropagation steps offsets the time consumed by the Hebbian matrix plastic weight calculations.
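The step counts can be made concrete with illustrative numbers (N and K below are arbitrary examples, not values from the paper):

```python
# Step counts for STL vs. NDHTL over one lifetime (illustrative numbers).
N = 1000   # feedforward steps, the same for both algorithms
K = 10     # NDHTL backpropagates once every K episodes

stl_backprop = N          # STL: one backward pass per forward pass
ndhtl_backprop = N // K   # NDHTL: one backward pass every K-th episode
```

Here NDHTL performs 100 backward passes instead of 1000, with the Hebbian matrix update still running on all 1000 forward passes.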

Experimental Results
In the NDHTL learning method, we use the feature-extraction layers and re-weight the output layer, followed by fine-tuning with the NDHTL algorithm, to determine whether the parameter distribution difference between the source and target datasets is reduced. Our algorithm is generic and can be extended to any neural network architecture by integrating a feature extractor with a plastic NDHTL classification layer. The experimental results show that NDHTL achieves better accuracy than the STL method, with an average top-1 accuracy improvement of 1.3%. Based on these results, we may say that the NDHTL algorithm achieves much better performance when heterogeneous source and target domains are used. Table 3 displays the top-1 and top-5 validation accuracy of standard transfer learning (STL) and NDHTL for the setup described in Section 4.2. The STL and NDHTL data plots for the ten different datasets, in terms of top-5 accuracy, are shown in Figure 5. The curves for the CIFAR-100 target datasets (T_1 to T_10) show a significant improvement in top-5 accuracy for the NDHTL algorithm. The experimental results are shown in Table 3 and Figure 5.
The source is kept the same (CIFAR-10), while the target datasets vary from dataset t_1 to dataset t_10. The target task performance differs across target datasets, as shown in Figure 5 and Table 3 for top-1 and top-5 accuracy. To understand the overall effect, the algorithm's average performance over all the datasets is calculated, as shown in Table 3. The average values over the ten datasets show a significant improvement compared to the standard method.
In particular, a significant improvement of +3.60% with NDHTL implies that the NDHTL is more effective for transfer learning between heterogeneous source and target. The average improvement in the top-1 accuracy with NDHTL is +1.30%, and improvement with the top-5 accuracy is +0.71%.
Keeping the goal of transfer learning in mind, what has been learned in one setting is exploited to improve generalization in another setting. Furthermore, from practical experience, source model training must begin from a randomly initialized network. When the network is only partially trained, the weight values remain generic and can quickly adapt or fine-tune to any target task dataset.
Overtraining the source model may result in poor transfer-learning accuracy on the target-task dataset. We keep the image classification task as the context of this explanation.
Based on the experimental results, we may conclude that heterogeneous datasets serve as the best source task for Hebbian transfer-learning classification. Indeed, extensive data were collected by experimenting with more than ten different datasets and a similar number of experiments. We conclude that the Hebbian transfer-learning algorithm yields a substantially better validation-accuracy improvement than standard transfer-learning algorithms.

Discussion on Applications
The effects of biologically inspired neuromodulated dopamine on plastic transfer learning have been studied in detail through multiple experiments. This paper presents a novel NDHTL algorithm that is extendable to multiple domains from an application point of view. NDHTL, combined with traditional CNN approaches, is an efficient approach, and the algorithm can easily be extended to related deep-learning domains.
We do not claim that such an asymmetric backpropagation algorithm is implemented in the brain. However, evolutionary learning contributes to learning in an animal's brain. The brain's resulting self-modifying abilities play an important role in learning and adaptation and constitute a significant basis for biological memorization and learning sustainability. In the literature, dopamine has been shown to retroactively manage the plasticity induced by a recent past event within a short time window of about 1 s [93][94][95][96][97]. Such processes have been modeled in computational neuroscience studies. This paper also introduces a neuromodulation scheme that takes inspiration from the short-term retroactive effects of neuromodulatory dopamine on Hebbian plasticity in animal brains. Backpropagation through a plastic layer for Hebbian learning has been studied previously as well; however, this new combination of plasticity and dopamine neuromodulation plays an essential role in learning and adaptation.
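A common computational idealization of this retroactive effect (a toy sketch under our own assumptions, not the paper's implementation; all constants are illustrative) stores Hebbian co-activity in a short-lived eligibility trace and commits it to the weight only when the dopamine signal arrives within the decay window:

```python
# Toy model of retroactive dopamine gating: Hebbian co-activity is first
# stored in a decaying eligibility trace; only when a dopamine signal
# arrives before the trace has decayed is it written into the weight.
decay = 0.9          # per-step decay of the eligibility trace (~short window)
eta = 0.5            # learning rate applied when dopamine gates the update

w = 0.0
trace = 0.0
events = [            # (pre activity, post activity, dopamine signal)
    (1.0, 1.0, 0.0),  # correlated firing, no reward yet
    (0.0, 0.0, 0.0),  # no activity: the trace decays
    (0.0, 0.0, 1.0),  # dopamine arrives: recent co-activity is consolidated
]
for pre, post, dopamine in events:
    trace = decay * trace + pre * post   # Hebbian eligibility trace
    w += eta * dopamine * trace          # dopamine gates the weight change

# After the loop: trace = 0.9 * 0.9 = 0.81, and w = 0.5 * 1.0 * 0.81 = 0.405
```

The key property is that the weight change depends on co-activity that happened *before* the dopamine signal, mirroring the roughly one-second retroactive window reported in the cited studies.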
For image classification models trained on big datasets, some researchers propose transferring a pre-trained deep image classification model to a specific task to solve scene-object recognition in TV programs, such as movies, TV plays, variety shows, and short videos. Many personalized advertisement recommendation studies suffer from the problem that only certain tagged items can be recommended during video playback. Transferring knowledge in this way helps address the data-volume problem and provides users with a wider range of options. We often encounter situations where a target domain has an insufficient amount of high-quality data while plenty of auxiliary data exists in related domains. Transfer learning aims to exploit this additional data to improve learning performance in the target domain. Innovative transfer-learning applications include learning in heterogeneous cross-media domains, online recommendation, social media, and social network mining.
Transfer learning has been applied to alleviate the data-scarcity problem in many urban computing applications. For example, chain-store site recommendation leverages knowledge from domains semantically relevant to the target city. A possible solution is flexible multi-modal transfer learning that transfers knowledge to the target city to alleviate the data-scarcity problem.
Another widely encountered task in the bioinformatics domain is gene expression analysis, e.g., predicting associations between genes and phenotypes, where very few known associations are usually available. In such an application, one of the main challenges is the data-sparsity problem. As a potential solution, transfer learning can be used to leverage additional information and knowledge. For example, one approach analyzes and predicts gene-phenotype associations using transfer learning based on the Label Propagation Algorithm (LPA). In addition to Wi-Fi localization tasks, transfer learning has also been employed in wireless network applications. It has been applied to recognition tasks such as hand-gesture recognition, face recognition, activity recognition, and speech-emotion recognition, and can likewise be applied to driver-pose recognition. Similarly, NDHTL can be utilized for anomalous-activity detection and traffic-sign recognition.
In addition to imaging analysis, transfer learning has other applications in the medical area. Deep learning has been applied to breast-cancer identification, detection, and risk analysis. Transfer learning for pediatric pneumonia diagnosis and lung-pattern analysis is beneficial. Applying transfer learning in biomedical image analysis is a promising domain and supports a general-purpose cause.
NDHTL may be applied in natural language processing (NLP). NDHTL transfer-learning expertise may also be incorporated into sentiment analysis, fraud detection, social networks, and hyperspectral image analysis. With the large amounts of data related to our cities, for example in traffic monitoring, health care, and social security, the world needs algorithms such as NDHTL for better problem-solving approaches.

Conclusions
In transfer learning with deep learning, the training data are usually from a similar domain or share the feature space, and the data distribution is symmetrical and does not change much over many iterations. A discriminative learning rule such as Hebbian learning can improve learning performance by quickly adapting to discriminate between the different classes defined by the target task. We apply the Hebbian principle as synaptic plasticity in heterogeneous transfer learning to classify images using a heterogeneous source-target dataset and compare the results with the standard transfer-learning case. Hebbian learning is a form of discriminative, activity-dependent synaptic plasticity for finding the specific features needed in object recognition or image classification: correlated activation of pre- and post-synaptic neurons strengthens the connection between the two neurons. Hebbian plasticity is thus a form of synaptic plasticity that is induced by, and further amplifies, correlations in neuronal activity.
However, in real-world situations, data distributions vary and are ever-changing in nature. A dynamic technique that adjusts its hyperparameters to the target task over time may significantly improve an algorithm's learning efficiency. In this paper, such a solution for the heterogeneous transfer-learning domain is presented: a neuromodulated dopamine transfer-learning algorithm based on the Hebbian learning principle.
We investigated the use of neuromodulated dopamine-controlled synaptic plasticity implementing the Hebbian theory using plasticity and asymmetric backpropagation, and applied it to heterogeneous transfer learning.
The NDHTL algorithm is a generic cross-domain CNN training algorithm. Using plastic NDHTL layer integration, fine-tuning of the CNN weights is easily accomplished. In implementing NDHTL, minimal alteration of the weights is required to fine-tune on the target dataset, owing to synaptic weight consolidation. In the NDHTL transfer-learning algorithm, a hybrid architecture is initialized with a pretrained model from the source task, the last layer is re-weighted, and the model is then fine-tuned on the target task.
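The initialization step described above, copying the pretrained source weights and re-weighting only the last layer, can be sketched as follows (a minimal illustration with invented layer names and sizes, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretrained source model: a stack of named weight matrices.
source_model = {
    "conv1": rng.standard_normal((16, 8)),
    "conv2": rng.standard_normal((8, 4)),
    "fc":    rng.standard_normal((4, 10)),   # source head: 10 classes
}

def init_target_model(source, n_target_classes):
    """Copy every pretrained layer, then re-initialize ("re-weight")
    the last layer for the target label space before fine-tuning."""
    target = {name: w.copy() for name, w in source.items()}
    in_dim = source["fc"].shape[0]
    target["fc"] = rng.standard_normal((in_dim, n_target_classes)) * 0.01
    return target

# e.g. CIFAR-10 source head (10 classes) -> CIFAR-100 target head (100 classes)
target_model = init_target_model(source_model, n_target_classes=100)
```

The pretrained layers carry the transferable knowledge, while the freshly initialized head is the only part that must be learned from scratch on the target task.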
By using the NDHTL algorithm, the connection-weight parameter-distribution difference between the source and target task is quickly reduced. It may also be applied to scenarios with negative transfer, where the goal of transfer learning is to enhance the performance of a target task using an auxiliary/source domain, but transferring knowledge from the source domain negatively impacts the target model. This negative transfer occurs primarily when the source domain has very little in common with the target. The NDHTL algorithm is a possible solution to such issues in transfer learning.
The algorithm is easily extendible to similar problems, given that the corresponding solution aims to implement neural network architecture with a weight-extraction and cost-function reduction design.
Experiments were conducted to compare the efficiency of the proposed algorithm with standard transfer learning. The experiment used CIFAR-10 as the source domain and CIFAR-100 as the target domain.
A CIFAR-10 source and CIFAR-100 target dataset were studied implementing NDHTL, and we conclude from the generated output data that asymmetric NDHTL achieves better accuracy than the symmetric STL method.
The recorded average top-1 accuracy improvement is 1.3%. Based on the experimental results, we may conclude that the NDHTL algorithm performs much better on cross-domain image classification and heterogeneous source-target transfer-learning tasks.
The experimental results showed that NDHTL achieves better accuracy than the STL method, precisely when heterogeneous source and target domains are used. Future experiments with fully plastic and neuromodulated plastic CNN architectures for quicker training and better performance are possible, and researchers may investigate applying NDHTL techniques beyond image datasets.