PCNN Model Guided by Saliency Mechanism for Image Fusion in Transform Domain

In heterogeneous image fusion, the time-of-flight and visible light images collected by binocular acquisition systems in orchard environments are produced by different imaging mechanisms, and enhancing the fusion quality is key to the solution. A shortcoming of the pulse coupled neural network model is that its parameters are set by manual experience and its iteration cannot terminate adaptively. During the ignition process, it ignores the impact of image changes and fluctuations on the results, leading to pixel artifacts, area blurring, and unclear edges. Aiming at these problems, an image fusion method in the pulse coupled neural network transform domain guided by a saliency mechanism is proposed. A non-subsampled shearlet transform is used to decompose the accurately registered images; the time-of-flight low-frequency component, after multiple lighting segmentation using a pulse coupled neural network, is simplified to a first-order Markov situation, and the significance function is defined as first-order Markov mutual information to measure the termination condition. A new momentum-driven multi-objective artificial bee colony algorithm is used to optimize the link channel feedback term, link strength, and dynamic threshold attenuation factor. The low-frequency components of the time-of-flight and color images, after multiple lighting segmentation using a pulse coupled neural network, are fused using the weighted average rule, and the high-frequency components are fused using improved bilateral filters. The results show that, according to nine objective image evaluation indicators, the proposed algorithm achieves the best fusion effect on the time-of-flight confidence image and the corresponding visible light image collected in natural scenes, and is suitable for heterogeneous image fusion in complex orchard environments in natural landscapes.


Introduction
Automatic apple fruit picking in natural environments can reduce the intensity of heavy manual labor, which is an inevitable choice for modern agriculture [1]. The natural light in northwest China is strong, and the visible light images collected in the natural environment are vulnerable to changing light and complex backgrounds, so the recognition effect lacks robustness [2]. In the complex environment of orchard operations, heterogeneous image fusion (IF) between time-of-flight (ToF) images and visible light images is one of the most promising technologies in vision research on picking robots. The collected images have a variety of complementary attributes, including light invariance, spatial hierarchy, infrared perception, reliability of discrimination data, etc. [2]. The ToF image is generated indirectly from depth information, which can reflect the near-far relationship and infrared reflection characteristics of different objects in the scene, and the effect is not affected by light changes [2]. Image fusion is an information processing process that interprets the scene from different source images, producing information that cannot be obtained from any single sensor [2,3]. Determining how to fuse, with high quality, ToF images and visible light images that have different wavelength ranges and imaging mechanisms is currently a topic of great interest in image fusion research.
A non-subsampled shearlet transform (NSST) is a multi-scale, multi-directional, translation-invariant transform domain image decomposition method which is widely used in image fusion [4]. An NSST avoids the down-sampling operation and has the characteristics of translation invariance, simple operation, low time complexity, etc. [5]. Compared with transforms such as the discrete wavelet transform (DWT), stationary wavelet transform (SWT), discrete cosine transform (DCT), curvelet transform, and contourlet transform, an NSST performs well at capturing edges and contours. Deep learning methods involve large numbers of neural network layers, which can lead to low efficiency and high cost. The advantage of an NSST is that it can fully fuse the source image information, and the fused image has good correlation coefficient and information entropy values; it is therefore well suited to situations where the image background in the natural orchard environment is complex and contour and texture information must be fused at the same time.
Related works are summarized as follows: A pulse coupled neural network (PCNN) is a neural network model established by simulating the activities of visual nerve cells in the cerebral cortex. Similar pattern features are classified into categories based on the principles of similarity clustering and capture characteristics [6]. In terms of image fusion in a transform domain, Cheng et al. used an adaptive dual-channel pulse coupled neural network with triple connection strength in the local non-down-sampled shear wave transform domain to solve the spectral difference between infrared and visible light [7]. Panigrahy et al. proposed a new medical fusion method in a non-down-sampled shear wave transform domain based on a weighted parameter adaptive dual channel PCNN [8].
In terms of image fusion in saliency attention models, Liu et al. proposed a saliency detection model that combines a global saliency map with a local saliency map [9]. Yang et al. designed a new fuzzy logic rule based on global saliency measurements to fuse the details extracted from panchromatic images with high spatial resolution and multispectral images with low spatial resolution [10]. Li et al. used a segmentation-driven low-rank matrix recovery model to detect the saliency of each individual image in an image set, highlighting the regions with sparse features in each image [11]. In terms of the optimization of image fusion parameters, Zhu et al. applied PCNN parameters, improved through quantum-behaved particle swarm optimization, to infrared and visible image fusion [12]. Huang et al. used an NSCT to independently decompose the intensity-hue-saturation components of the image, a PCNN to fuse high-frequency sub-band images and low-frequency images, and a hybrid leapfrog algorithm to optimize the PCNN parameters [13]. Dharini et al. proposed a nature-inspired optimal feature selection method using ant colony optimization to reduce the complexity of the PCNN fusion of infrared and visible images [14]. In research on overexposure problems, Muhuri et al. used polarization fraction variation in temporal RADARSAT-2 C-band full-polarimetric SAR data to study climate change-related environmental changes over mountainous areas [15]. Raskar et al. introduced a novel technique that allows a user to interact with projected information and to update it [16].
A PCNN classifies similar pattern features into categories based on the principles of similarity aggregation and capture characteristics. The segmentation combination has the advantages of the grayscale aggregation lighting mechanism and the same grayscale attribute priority lighting. This is consistent with the basic idea of cluster analysis. Qiu et al. proposed a new density peaks-based clustering method, called clustering with local density peaks-based minimum spanning tree [17]. Huang et al. proposed new adaptive spatial regularization for the representation coefficients to improve the robustness of the model to noise [18]. Huang et al. proposed ultra-scalable spectral clustering and ultra-scalable ensemble clustering methods [19].
Although scholars have studied the optimization and improvement of PCNN parameters, there are still cases of pixel artifacts, region blurring and unclear edges due to ignoring the impact of image changes and fluctuations on the results during the ignition process.
This paper introduces the concept of entropy [20] from information theory and proposes a PCNN model guided by a saliency mechanism (SMPCNN). The ToF low-frequency component after multiple lighting segmentation using a PCNN is simplified into a first-order Markov situation, and the significance function is defined as first-order Markov mutual information. On this basis, a PCNN model guided by a saliency mechanism for image fusion in the transform domain (NSST-SMPCNN) is proposed to fuse ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment.
We summarize our main contributions below. First, we aim to solve the following existing problems:
1. The traditional method of spatial domain fusion is to create a fusion model in the image gray space, which has the disadvantage that the texture and boundary features of the source image are not easy to find.
2. A PCNN model has the defects of empirically set parameters, non-adaptive termination, and easy over-segmentation. In the ignition process, it ignores the impact of image change fluctuations on the results, resulting in pixel artifacts, area blurring, and unclear edges.
3. The differences in imaging mechanisms between ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment lead to low fusion quality.
Second, the innovations and novelties of this paper are as follows:
1. A PCNN model guided by a saliency mechanism is proposed and applied to the fusion of ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment.
2. The ToF low-frequency component after multiple lighting segmentation using a PCNN is simplified into a first-order Markov situation, and the significance function is defined as first-order Markov mutual information.
3. The significance function is used as the termination condition of the PCNN model iteration, and Kullback-Leibler (KL) divergence is used to measure the dynamic threshold amplification coefficient of the PCNN model.
4. A new momentum-driven multi-objective artificial bee colony algorithm is proposed to optimize the link channel feedback term, link strength, and dynamic threshold attenuation factor. Momentum update strategies are used for the hired bees and observation bees. A grid density construction ensures that the optimal solution distribution is not too dense, and the absolute value of the difference between the grid index values of the same dimension of a nondominated solution is used as its deletion selection probability when constructing the optimal solution set. Cross entropy (CE) and mutual information (MI), two image fusion quality evaluation functions, are selected as the multi-objective fitness functions.
5. The low-frequency components of the ToF and color images after multiple lighting segmentation using a PCNN are fused using the weighted average rule, and the high-frequency components are fused using improved bilateral filters.
Third, the advantage of our work is as follows: the proposed NSST-SMPCNN method combines the saliency mechanism, the significance function, and the PCNN clustering segmentation mechanism. It has the advantages of a grayscale clustering lighting mechanism and same-grayscale-attribute priority lighting, which makes it suitable for heterogeneous image fusion in complex orchard environments in Gansu.
The paper structure is summarized as follows: Section 1 contains the introduction, a description of related works, and the highlights and contributions of this paper. Basic concept definitions of an NSST and a PCNN, as well as the proposed definition of the significance function, are given in Section 2. In Section 3, a PCNN model guided by a saliency mechanism is proposed. A PCNN transform domain image fusion method guided by a saliency mechanism is then constructed in Section 4. Lastly, the final section contains a description of the experiments and the conclusions.

NSST Transform Domain Decomposition Method
The traditional method of spatial domain fusion is to create a fusion model in the image gray space. The disadvantage of this is that it is difficult to find the texture and boundary features of the source image. The NSST transform domain decomposition method, which is proposed in reference [4], is used to perform the non-subsampled pyramid filter bank (NSP) u-level transformation on the two accurately registered heterogeneous images to obtain one low-frequency sub-band and u high-frequency sub-bands, realizing translation invariance. The high-frequency sub-band is then decomposed into 2 v directional high-frequency subbands by shear filter bank (SF) v-level multi-directional decomposition, so as to effectively capture directional information and maintain anisotropy [4]. The decomposed sub-band is the same size as the source image, has high sparsity, and accurately represents the fusion information.
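As a rough illustration of this structure (one low-frequency band plus same-size detail bands, with no down-sampling), the following sketch substitutes a simple box-filter pyramid for the actual NSP and SF filter banks of the NSST; the filter choice, `levels` value, and function names are illustrative assumptions only:

```python
import numpy as np

def multiscale_decompose(img, levels=2):
    """Sketch of an NSP-style multiscale split: one low-frequency band and
    `levels` high-frequency detail bands, all the same size as the source
    (no down-sampling, mimicking translation invariance)."""
    def blur(x):
        # 3x3 box filter with edge padding (stand-in for the NSP filter bank)
        p = np.pad(x, 1, mode='edge')
        return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    highs, current = [], img.astype(float)
    for _ in range(levels):
        low = blur(current)
        highs.append(current - low)   # detail (high-frequency) band
        current = low
    return current, highs             # low-frequency band + detail bands

img = np.eye(4)
low, highs = multiscale_decompose(img, levels=2)
# every sub-band has the source size, and the split is perfectly invertible
assert np.allclose(low + sum(highs), img)
```

Because each detail band is simply the difference between successive smoothings, summing the low band and all detail bands recovers the source exactly, mirroring the "same size as the source image" property noted above.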

PCNN Lighting Segmentation Mechanism
The PCNN model proposed in reference [6] includes a feedback input domain, a coupling link domain, and a pulse generation domain, which can be described by the mathematical equations shown in Formulas (1)-(5). A PCNN classifies similar pattern features into categories based on the principles of similarity aggregation and capture characteristics, giving it an aggregation and illumination segmentation mechanism.
In the formulas, I ij is the external stimulation of the neuron, represented by the gray value of the input image; F ij (n) is the feedback input field; L ij (n) is the link input field; W ij,kl is the link coefficient; β indicates the link strength, which determines the weight of the coupling link channel; U ij (n) is the internal state signal of the model; θ ij is the dynamic threshold of the neuron; V θ is the dynamic threshold amplification coefficient, which controls how much the threshold increases after neuron activation, and V L is the link amplification coefficient; α L and α θ determine the decay rates of the link channel feedback term and the dynamic threshold, respectively; Y ij (n) is the pulse output of the current neuron, which is the response result of the comparison between the internal activity item and the dynamic threshold in the pulse generator. When U ij (n) > θ ij (n), the ignition condition is reached and the output is Y ij (n) = 1. Step represents a step function whose output is 0 or 1, and n represents the nth iteration.
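Formulas (1)-(5) are not reproduced here, but one standard simplified PCNN iteration consistent with these symbol definitions can be sketched as follows; the parameter values and the 3x3 linking kernel are illustrative assumptions, not the paper's optimized settings:

```python
import numpy as np

def pcnn_step(I, L, theta, Y, beta=0.2, aL=0.5, aTheta=0.3, VL=1.0, VTheta=20.0):
    """One ignition iteration of a simplified PCNN. Parameter values are
    illustrative, not the optimized ones obtained by the MMOABC search."""
    # linking term: sum of the 8-neighbourhood pulses (uniform W kernel)
    P = np.pad(Y, 1)
    link = sum(P[i:i + Y.shape[0], j:j + Y.shape[1]]
               for i in range(3) for j in range(3)) - Y
    F = I                                          # feedback input: external stimulus
    L = np.exp(-aL) * L + VL * link                # coupled linking input
    U = F * (1.0 + beta * L)                       # internal activity
    Y = (U > theta).astype(float)                  # pulse output (step function)
    theta = np.exp(-aTheta) * theta + VTheta * Y   # dynamic threshold update
    return L, theta, Y, U

I = np.array([[0.9, 0.8], [0.2, 0.1]])             # toy gray-value stimulus
L = np.zeros_like(I); theta = np.full_like(I, 0.5); Y = np.zeros_like(I)
L, theta, Y, U = pcnn_step(I, L, theta, Y)
# the bright cluster fires first; dark pixels stay silent
assert Y.tolist() == [[1.0, 1.0], [0.0, 0.0]]
```

Repeated calls to `pcnn_step` produce the sequence of ignition segmentation maps that the saliency mechanism below operates on: fired neurons raise their thresholds sharply, so different gray clusters ignite in successive iterations.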

Proposed Definition of Significance Function
The saliency mechanism originates from the visual attention mechanism (VAM) proposed by Itti and other scholars [21], inspired by the behavior and neuronal structure of early primate visual systems [22]. When the saliency mechanism processes a scene, it automatically attends to the regions of interest and selectively ignores the regions of non-interest. In this paper, a new saliency mechanism defining the significance function is proposed.

Definition 1. Significance first-order Markov situation.
The ToF low-frequency component after an NSST decomposition is fed into the PCNN model. The PCNN segments it iteratively many times, showing a dynamic ignition segmentation state. The two ignition segmentation diagrams at times t and t + 1 are correlated but independent of the ignition segmentation diagrams at earlier times. Therefore, the ignition segmentation diagrams at time interval 2 can be defined as a first-order Markov situation.

Definition 2. Significance one-step transition probability.
When the model is in the state s u after ignition segmentation at time t, the probability of model transition to the state s v after ignition segmentation at time t + 1 is defined as the significant one-step transition probability, which is expressed in Formula (6).

Definition 3. Significant conditional entropy.
In the significance first-order Markov situation, the average uncertainty of the model when it is transferred to the state s v ∈ S under any state condition s u ∈ S is defined as the significance conditional entropy, which is expressed in Formula (7).
Definition 4. Significance first-order Markov information source entropy.
The overall uncertainty of the sequence formed by the ignition segmentation map in the significance first-order Markov situation is defined as the significance first-order Markov information source entropy, which is expressed in Formula (8).
Definition 5. Significance first-order Markov mutual information.
The amount of information transmitted in the model state transition of the ignition segmentation map at different times is defined as the significant first-order Markov mutual information, which is expressed as Formula (9).
The PCNN is segmented by multiple iterations. The ignition segmentation graphs separated by two time intervals have significant feature differences, representing, in numerical terms, the maximum information transmission rate and the maximum amount of mutual information. Because mutual information has a maximum under certain conditions, the significance function is numerically defined as the significant first-order Markov mutual information. Substituting Formulas (7) and (8) into Formula (9) yields Formula (10).
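Under the assumption that the firing states are binary (fire / no-fire), the significance function of Formula (10) reduces to the mutual information of the joint state distribution of two ignition segmentation maps, which can be sketched as:

```python
import numpy as np

def markov_mutual_information(seg_t, seg_t1, n_states=2):
    """Sketch of the significance function: mutual information between the
    PCNN firing states of two segmentation maps, treated as one step of a
    first-order Markov chain over states s_u -> s_v (binary fire/no-fire)."""
    joint = np.zeros((n_states, n_states))
    for u, v in zip(seg_t.ravel().astype(int), seg_t1.ravel().astype(int)):
        joint[u, v] += 1                     # empirical transition counts
    joint /= joint.sum()
    pu, pv = joint.sum(axis=1), joint.sum(axis=0)
    mi = 0.0
    for u in range(n_states):
        for v in range(n_states):
            if joint[u, v] > 0:
                mi += joint[u, v] * np.log2(joint[u, v] / (pu[u] * pv[v]))
    return mi

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 1], [0, 0]])               # identical maps share maximal information
assert abs(markov_mutual_information(a, b) - 1.0) < 1e-9
```

When the two maps are statistically independent the measure drops to zero, so tracking it across iterations gives a natural signal of when the segmentation has stabilized, which is exactly how it is used as a termination criterion below.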

PCNN Model Guided by Saliency Mechanism
PCNN region segmentation and a saliency mechanism can locate the most interesting object region in the image well. Combining the saliency mechanism, saliency function and PCNN clustering segmentation mechanism, a PCNN model guided by a saliency mechanism is proposed, which has the advantages of a grayscale clustering lighting mechanism and the same grayscale attribute lighting priority, and is suitable for heterogeneous image fusion in complex orchard environments in Gansu.
The PCNN model has certain shortcomings: its parameters are limited by manual experience settings, it cannot terminate adaptively, and ignoring the impact of image changes and fluctuations on the results during the ignition process leads to pixel artifacts, area blurring, and unclear edges. The iteration termination condition, the dynamic threshold amplification coefficient V θ, the link channel feedback term α L, the link strength β, and the dynamic threshold attenuation factor α θ are therefore improved adaptively. A new momentum-driven multi-objective artificial bee colony algorithm (MMOABC) is used for parameter optimization and applied to the proposed PCNN model guided by a saliency mechanism. The improved SMPCNN model enhances pulse connections of the same type, reduces the difficulty of parameter setting, and improves image segmentation performance.

Adaptive Iteration Termination Conditions
The traditional PCNN model has the defects of non-adaptive termination and over-segmentation. The authors of [23] used the maximum information entropy as the termination condition, but over-segmentation often occurs when the entropy is at its maximum, and background regions with the same gray value are mistaken for the target area and segmented together.
In this paper, the significance function is used as the criterion for model iteration termination, which is expressed as Equation (11). For a low-frequency ignition segmentation map, the greater the significance of the first-order Markov mutual information, the better the regional consistency.

Adaptive Dynamic Threshold Amplification Coefficient V θ
In the ToF image, the fruit target is often shown as a region with a high gray value and normal distribution. Two ignition segmentation images are used to measure the PCNN dynamic threshold amplification coefficient, which is expressed as Equation (12). The probability distribution p(s u ) corresponding to the state s u ∈ S, as well as the probability distribution p(s v ) corresponding to the state s v ∈ S, and the KL divergence of the two states are calculated. This formula is used to measure the similarity between the probability distributions of two ignition segmentation maps. The closer the probability distribution of the two ignition segmentation images is, the smaller the dynamic threshold amplification coefficient is, which will enable the PCNN model to ignite when the target region tends to be stable during continuous iteration.
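A minimal sketch of this measure follows, assuming the state distribution of a segmentation map is summarized by its fraction of fired pixels; this is a simplification of Equation (12), whose exact form is not reproduced in the text:

```python
import numpy as np

def kl_amplification(seg_prev, seg_curr, eps=1e-12):
    """Sketch of the adaptive V_theta: KL divergence between the firing-state
    distributions of two successive ignition segmentation maps."""
    def dist(seg):
        p1 = seg.mean()                       # fraction of fired pixels
        return np.array([1.0 - p1, p1]) + eps # [no-fire, fire] distribution
    p, q = dist(seg_prev), dist(seg_curr)
    return float(np.sum(p * np.log(p / q)))   # KL(p || q)

stable = kl_amplification(np.array([1, 1, 0, 0]), np.array([1, 1, 0, 0]))
changed = kl_amplification(np.array([1, 1, 0, 0]), np.array([1, 1, 1, 0]))
# similar distributions -> smaller coefficient -> ignition near stability
assert stable < changed
```

This matches the behaviour described above: as the target region stabilizes, successive distributions converge, the KL divergence shrinks, and the threshold amplification coefficient becomes small.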

Parameter Optimization of Momentum Driven Multi-Objective Artificial Bee Colony Algorithm
An artificial bee colony (ABC) algorithm [24] is a swarm intelligence optimization algorithm proposed to simulate the foraging behavior of bee swarms. It has the advantages of strong global optimization ability, few parameters, high accuracy, and strong robustness. However, its simple and random optimization strategy can make the algorithm premature and cause convergence stagnation, among other problems. In order to accelerate the convergence rate of the artificial bee colony algorithm, the concept of momentum [25,26] from deep learning is introduced, and a new momentum-driven multi-objective artificial bee colony algorithm is proposed to optimize the three parameters: the link channel feedback term α L, the link strength β, and the dynamic threshold attenuation factor α θ.

Hiring Bees Momentum Updating Strategy
NP food sources were randomly generated. During a food source update evolution, a randomly selected food source X k = (x k1 , x k2 , · · · , x kd ) was attached to a hired bee in the bee colony. In the d-dimensional space, the randomly selected jth dimension component x ij of each food source X i = (x i1 , x i2 , · · · x ij , · · · , x id ) in the food source information space database was evolved through the following hired bee momentum update strategy, as shown in Equations (14) and (15), to obtain a new food source X new i = (x i1 , x i2 , · · · , x new ij , · · · , x id ). Here, i, k ∈ [1, 2, · · · , NP], i ≠ k, j ∈ [1, 2, · · · , d], and r ∈ [−1, 1]. In Equations (14) and (15), a ij represents the update step size of the previous update evolution, a new ij represents the update step size obtained after the current momentum update evolution, and γ represents momentum, with the value 0.9.
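The hired-bee step can be sketched as below; the perturbation form `gamma * a + r * (x_i - x_k)` is an assumption consistent with the description of Equations (14) and (15), not the paper's exact formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

def employed_bee_momentum_update(x_ij, x_kj, a_ij, gamma=0.9):
    """Sketch of the hired-bee momentum strategy: the previous step a_ij is
    damped by momentum gamma and combined with the classic ABC perturbation
    toward a random neighbour component x_kj."""
    r = rng.uniform(-1.0, 1.0)                 # r in [-1, 1]
    a_new = gamma * a_ij + r * (x_ij - x_kj)   # momentum-accumulated step
    x_new = x_ij + a_new                       # candidate food source component
    return x_new, a_new

x_new, a_new = employed_bee_momentum_update(x_ij=0.5, x_kj=0.3, a_ij=0.1)
assert x_new == 0.5 + a_new
```

Compared with the plain ABC update, the accumulated step `a_ij` lets successive evolutions keep moving in a consistently improving direction, which is the convergence-acceleration effect described above.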

Observation Bees Nesterov Momentum Updating Strategy
In a food source update evolution, the selection probability of the observation bees was calculated according to Formula (16), and a randomly selected food source X t = (x t1 , x t2 , · · · , x td ) was attached to an observation bee in the bee colony. In the d-dimensional space, the randomly selected jth dimension component x ij of each food source X i = (x i1 , x i2 , · · · x ij , · · · , x id ) in the food source information space database was evolved through the following observation bee Nesterov momentum updating strategy, as shown in Formulas (17) and (18), to obtain a new food source X new i = (x i1 , x i2 , · · · , x new ij , · · · , x id ). Here, i, t ∈ [1, 2, · · · , NP], i ≠ t, j ∈ [1, 2, · · · , d], and r ∈ [−1, 1]. In Formulas (17) and (18), b ij represents the update step size of the previous update evolution, b new ij represents the update step size obtained after the current Nesterov momentum update evolution, and γ represents momentum, with the value 0.9. In Formula (16), Target = 2 and j = 1, 2, · · · , NP.
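Analogously, the observation-bee step with Nesterov-style look-ahead might look as follows; this is again an assumption consistent with the description of Formulas (17) and (18), and the fitness-proportional selection of Formula (16) is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

def onlooker_nesterov_update(x_ij, x_tj, b_ij, gamma=0.9):
    """Sketch of the observation-bee Nesterov strategy: the perturbation is
    evaluated at the look-ahead point x_ij + gamma*b_ij rather than at x_ij
    itself, which tends to damp oscillation near an optimum."""
    r = rng.uniform(-1.0, 1.0)
    lookahead = x_ij + gamma * b_ij            # peek ahead along the momentum
    b_new = gamma * b_ij + r * (lookahead - x_tj)
    return x_ij + b_new, b_new

x_new, b_new = onlooker_nesterov_update(x_ij=0.4, x_tj=0.2, b_ij=0.05)
assert x_new == 0.4 + b_new
```

The only difference from the hired-bee step is where the perturbation is measured: evaluating at the look-ahead point gives the correction a chance to act before the momentum overshoots.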

Pareto Grid Density Construction Method
In multi-objective optimization problems, individuals are judged by dominance and density information. In this paper, a grid density construction method is used to ensure that the distribution of solutions in the Pareto optimal solution set is not too dense. The grid dynamically divides the range (−inf, +inf ) into nGrid equal intervals, where nGrid is a variable representing the number of divided grids and inf stands for a sufficiently large number approximating infinity.
The maximum and minimum values of each dimension of the nondominated solutions were determined. The predefined nGrid was used to divide the current interval, yielding nGrid + 1 boundary points. The minimum interval starts from negative infinity −inf and the maximum interval ends at positive infinity +inf, to prevent the nondominated solutions from crossing the boundary and to make every nondominated solution fall in a grid cell. The formula for solving the grid index value is shown in (19), where low i represents the minimum boundary value of the grid and Target represents the number of objective functions; i = 1, · · · , Target, Target = 2, and j = 1, · · · , nGrid.
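A sketch of the grid index computation consistent with this description follows; the exact form of Formula (19) is not reproduced in the text, and the `n_grid` and cost values here are illustrative:

```python
import numpy as np

def grid_index(costs, n_grid=5):
    """Sketch of the Pareto grid construction: each objective axis is split
    into n_grid cells between the observed min and max, with open-ended
    boundary cells so no nondominated solution falls outside the grid."""
    costs = np.asarray(costs, float)
    idx = np.empty_like(costs, dtype=int)
    for j in range(costs.shape[1]):                    # per objective axis
        lo, hi = costs[:, j].min(), costs[:, j].max()
        edges = np.linspace(lo, hi, n_grid + 1)        # n_grid + 1 boundaries
        edges[0], edges[-1] = -np.inf, np.inf          # open-ended guard cells
        idx[:, j] = np.searchsorted(edges, costs[:, j], side='right') - 1
    return idx

costs = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]           # two objectives (Target = 2)
idx = grid_index(costs, n_grid=5)
assert idx[0, 0] == 0 and idx[2, 0] == 4   # extremes land in the boundary cells
```

The resulting per-axis indices describe how densely each region of objective space is populated, which is what the deletion probability below is built from.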

Pareto Optimal Solution Set Construction Method
First, constructing the optimal solution set requires randomly deleting redundant nondominated solutions with a certain probability. The deletion selection probability is constructed using the absolute value of the difference between the grid index values of the same dimension of the nondominated solution, as shown in Formulas (20) and (21). The larger the poss i of a nondominated solution in Formula (20), the harder it is to delete. The advantage of this is that the preference for a certain optimization objective introduced by the nondominated solution interval is reduced, and all optimization objectives can be treated uniformly and fairly to obtain a relatively fair deletion probability for each solution.
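For the two-objective case the construction can be sketched as follows; the conversion from poss_i into a normalized deletion probability is an assumption, since Formulas (20) and (21) are not reproduced in the text:

```python
import numpy as np

def deletion_probabilities(grid_idx):
    """Sketch of the deletion selection probability: for each nondominated
    solution, poss_i is the absolute difference between its grid indices on
    the two objective axes; larger poss_i means harder to delete."""
    grid_idx = np.asarray(grid_idx)
    poss = np.abs(grid_idx[:, 0] - grid_idx[:, 1]).astype(float)
    # turn "hardness" into a deletion probability: smaller poss -> likelier
    keep = poss + 1.0
    p_del = (1.0 / keep) / np.sum(1.0 / keep)
    return poss, p_del

poss, p_del = deletion_probabilities([[0, 4], [2, 2], [4, 0]])
assert poss.tolist() == [4.0, 0.0, 4.0]
assert p_del[1] > p_del[0]   # the most "balanced" solution is likeliest to go
```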

Calculation Method of Multi-Objective Fitness
To address the diversity of image fusion quality evaluation functions, two of them, cross entropy (CE) and mutual information (MI), are selected to form a two-objective optimization problem, fitness pareto = max{CE, MI}, as shown in Formula (22).
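Histogram-based versions of the two fitness objectives can be sketched as below; the bin count and the exact CE and MI definitions used here are common conventions and may differ in detail from Formula (22):

```python
import numpy as np

def hist2d_mi(a, b, bins=8, eps=1e-12):
    """Mutual information between two images via a joint gray histogram."""
    j, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    j = j / j.sum() + eps
    pa, pb = j.sum(1, keepdims=True), j.sum(0, keepdims=True)
    return float(np.sum(j * np.log2(j / (pa * pb))))

def fitness_vector(src, fused, bins=8, eps=1e-12):
    """Sketch of the two-objective fitness [CE, MI] used by the MMOABC
    search: CE between gray-level histograms, MI via the joint histogram."""
    ha, _ = np.histogram(src, bins=bins, range=(0, 1)); p = ha / ha.sum() + eps
    hb, _ = np.histogram(fused, bins=bins, range=(0, 1)); q = hb / hb.sum() + eps
    ce = float(np.sum(p * np.log2(p / q)))     # cross entropy (lower is better)
    return ce, hist2d_mi(src, fused, bins)     # MI (higher is better)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
ce, mi = fitness_vector(img, img)
assert ce < 1e-6            # identical histograms -> near-zero cross entropy
assert mi > 0.0             # an image shares information with itself
```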


PCNN Model Structure Guided by Saliency Mechanism
The model structure is shown in Figure 1.


Fusion Rules
(1) Low-Frequency Fusion Rules
In this paper, using the characteristics of a PCNN model's clustering and lighting segmentation, the significance function is used as the criterion for the PCNN model's iteration termination, and the ToF low-frequency component after an NSST decomposition is ignited and segmented. The component is recorded as C L ToF , and the low-frequency component of the color image after an NSST decomposition is recorded as C L RGB . According to the characteristics of the images collected by heterogeneous systems in the mountainous planting environment and the natural scenes of the disordered planting orchard picking operation in Gansu Province, the low-frequency components of color images have sufficient detailed texture information, while the low-frequency components of ToF images can extract targets at a certain distance and separate the background, but provide less detailed texture information. Therefore, the ToF low-frequency components and color image low-frequency components after multiple lighting segmentation using a PCNN are fused. The fusion rule uses a weighted average, expressed as Formula (23), to highlight more foreground information belonging to the highlighted part of the ToF image.
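The low-frequency rule can be sketched as follows, using the PCNN ignition map as a foreground mask; the weight value 0.7 and the mask-based weight switching are illustrative assumptions about Formula (23):

```python
import numpy as np

def fuse_low(c_tof, c_rgb, fire_mask, w_fg=0.7):
    """Sketch of the weighted-average low-frequency rule: inside the PCNN
    ignition (foreground) mask the ToF component C_L_ToF gets the larger
    weight, elsewhere the colour component C_L_RGB dominates."""
    w = np.where(fire_mask > 0, w_fg, 1.0 - w_fg)   # per-pixel ToF weight
    return w * c_tof + (1.0 - w) * c_rgb

tof = np.array([[1.0, 0.0]])
rgb = np.array([[0.0, 1.0]])
mask = np.array([[1, 0]])                           # first pixel ignited
fused = fuse_low(tof, rgb, mask)
assert np.allclose(fused, 0.7)
```

With this rule, the ignited (fruit) region is dominated by the ToF low band while the background keeps the richer colour texture, matching the motivation stated above.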
(2) High-Frequency Fusion Rules
Bilateral filtering is a local, nonlinear, and noniterative technique. High-frequency fusion rules are introduced to measure the similarity between the ToF image and the color image at the corresponding position of the decomposed high-frequency component, as shown in Formula (24). Let the high-frequency component of the ToF image decomposed by an NSST be C H ToF , and the high-frequency component of the color image decomposed by an NSST be C H RGB . The spatial neighborhood Gaussian function w Neighborhood is shown in Equation (25), and the high-frequency component gray value similarity Gaussian function w Similarity is shown in Equation (26).
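The combined weight behind Formulas (24)-(26) can be sketched as the product of the two Gaussians, as in classic bilateral filtering; the sigma values are illustrative assumptions:

```python
import numpy as np

def bilateral_weight(di, dj, g_tof, g_rgb, sigma_d=1.0, sigma_r=0.1):
    """Sketch of the similarity measure: a spatial-neighbourhood Gaussian
    (Equation (25) style) times a gray-value-similarity Gaussian
    (Equation (26) style), multiplied as in Formula (24)."""
    w_neigh = np.exp(-(di**2 + dj**2) / (2.0 * sigma_d**2))   # spatial closeness
    w_sim = np.exp(-(g_tof - g_rgb)**2 / (2.0 * sigma_r**2))  # gray similarity
    return w_neigh * w_sim

# close in both space and gray value -> weight near 1
assert bilateral_weight(0, 0, 0.50, 0.50) == 1.0
# large gray-value difference -> weight collapses toward 0
assert bilateral_weight(0, 0, 0.9, 0.1) < 1e-10
```

Because the weight collapses when the two high-frequency components disagree, edges that exist in only one modality are preserved rather than averaged away, which is what strengthens the ToF-colour similarity in the fused detail bands.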

Heterogeneous Image Fusion Process
The fusion process is shown in Figure 2.



NSST-SMPCNN Method Multi-Source Image Fusion Steps
The NSST-SMPCNN algorithm is proposed, named Algorithm 1. The fusion steps of the NSST-SMPCNN algorithm for multi-source images are as follows.
Step 1: An NSST decomposition is performed, generating one low-frequency sub-band image and u high-frequency sub-band images, each of which is decomposed into 2 v directional sub-band images.
Step 5: Stop running and output the fused image.
Note: M and N represent the image size, u represents the NSST decomposition level, v represents the NSST decomposition direction number, and n represents the current ignition number. The maximum number of food source stagnations is limit, the maximum number of iterations of algorithm evolution is maxCycle, the number of food sources is NP, and the dimension of a bee individual component is d. γ represents momentum, Rep represents the number of nondominated solutions, nGrid represents the number of divided grids, and Target represents the number of objective functions. Here, d = 3, Target = 2, and γ = 0.9. α L represents the link channel feedback term, β represents the link strength, and α θ represents the dynamic threshold attenuation factor.

Image Fusion Evaluation Index
Six models were selected for testing to evaluate the image fusion performance of the heterogeneous vision system, including a non-subsampled contourlet transform (NSCT) model [27], a fusion method for infrared and visible light images based on an NSCT (ImNSCT) [28], a DWT model [29], a simplified pulse coupled neural network (SPCNN) model [30], a single target SPCNN fusion model (ST-SPCNN) [31] and the NSST-SMPCNN model described in this paper. Nine objective image evaluation indicators [32] were selected to objectively evaluate image quality, including average gradient (AG), edge strength (ES), information entropy (IE), standard deviation (SD), peak signal to noise ratio (PSNR), spatial frequency (SF), image clarity (IC), mutual information (MI), and structural similarity (SSI). The higher the values of these nine indicators, the better the fusion image quality.

Public Dataset Image Fusion Experiment
In this paper, three public datasets are used for the experimental testing of heterogeneous image fusion, namely, infrared and color vineyard heterogeneous public datasets taken in natural scenes [33] and apple RGB-D image datasets published by Universitat de Lleida in Spain named fuji_apple [34,35] and PApple_RGB-D-Size [36]. The above three datasets were recorded as dataset I, dataset II and dataset III, respectively, and four groups of data in each of the three datasets were selected for testing. The results are shown in Tables 1-3, respectively. The fusion effect is shown in Table 4. The data results show that the objective evaluation indexes of the NSST-SMPCNN method described in this paper, such as AG, ES, SF, IC, and MI are the best in dataset I. For dataset II and dataset III, AG, ES, IE, SF, IC, MI, and other objective evaluation indexes of the first and fourth groups of test data are the best. The values of SD and PSNR of the five other algorithms are better than those of the algorithm in this paper. The SSI value of the DWT algorithm is the best.

Heterogeneous Image Fusion Experiment of Natural Orchard
In this paper, a heterogeneous vision system is established using a ToF industrial depth camera (Basler AG, Ahrensburg, Germany) and a color camera (Canon Inc., Tokyo, Japan). The ToF camera outputs four types of data: a ToF intensity image, ToF range data, a ToF confidence map, and a ToF point cloud [37]. The data collection site is the experimental base of the Fruit Research Institute, Qinzhou District, Tianshui City, Gansu Province, China. More than 1000 ToF intensity images, depth images, confidence images, and color images were collected with the heterogeneous vision system under different lighting conditions between 10:00 and 19:00. The heterogeneous images collected from the natural orchard scene were recorded as dataset IV, and four groups of data, each comprising a ToF confidence image and the corresponding visible light image, were selected as test samples. The results are shown in Table 5, and the fusion effect is shown in Table 4. The results show that the NSST-SMPCNN algorithm described in this paper achieves the best fusion effect on the ToF confidence images and the corresponding visible light images collected in the natural scene, with excellent performance on all nine indicators: AG, ES, IE, SD, PSNR, SF, IC, MI, and SSI.
In conclusion, the experimental results show that the NSST-SMPCNN algorithm presented in this paper performs well on the three public datasets in terms of AG, ES, SF, IC, MI, and other objective evaluation indicators. This is because the significance function serves as the iteration termination condition of the PCNN model, enabling adaptive firing termination, and a new momentum-driven multi-objective artificial bee colony algorithm optimizes the PCNN parameters, which strengthens the model's mechanisms of gray-level aggregation firing and priority firing of pixels with the same gray level. On dataset IV, established in this paper, the proposed NSST-SMPCNN algorithm performs well on all nine indicators. This is because the weighted average rule used to fuse the low-frequency components highlights more foreground information belonging to the highlighted regions of the ToF image, while the improved bilateral filter used to fuse the high-frequency components strengthens the similarity between the ToF image and the color image. The proposed NSST-SMPCNN method is therefore suitable for heterogeneous image fusion in complex orchard environments in Gansu.
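The two sub-band fusion rules discussed above can be illustrated schematically. In this sketch the sub-bands are assumed to be NumPy arrays; the fixed weight and the max-absolute high-frequency rule are simplified stand-ins, not the SMPCNN-derived weights or the improved bilateral filter described in the paper.

```python
import numpy as np

def fuse_low_frequency(lf_tof, lf_color, w_tof=0.5):
    # Weighted-average rule for the low-frequency sub-bands. The weight here
    # is a fixed illustrative value; in the paper the weighting follows the
    # SMPCNN segmentation of the low-frequency components.
    return w_tof * lf_tof + (1.0 - w_tof) * lf_color

def fuse_high_frequency(hf_tof, hf_color):
    # Max-absolute selection as a simple stand-in for the paper's improved
    # bilateral-filter rule: keep, at each position, the coefficient with the
    # larger magnitude, i.e., the stronger detail response.
    return np.where(np.abs(hf_tof) >= np.abs(hf_color), hf_tof, hf_color)
```

The design intuition carries over: low-frequency sub-bands hold overall brightness and should be blended smoothly, while high-frequency sub-bands hold edges and texture, where the stronger response from either source should win.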

Conclusions
The traditional spatial-domain fusion method builds a fusion model in the image gray space, which has the disadvantage that the texture and boundary characteristics of the source image are not easily preserved. The PCNN model suffers from empirically set parameters, non-adaptive termination, and a tendency to over-segment. This paper proposes a PCNN model guided by a saliency mechanism and applies it to the fusion of ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment. The iteration termination condition, the dynamic threshold amplification coefficient Vθ, the link channel feedback term αL, the link strength β, and the dynamic threshold attenuation factor αθ are adapted automatically, and a new momentum-driven multi-objective artificial bee colony algorithm (MMOABC) is used for parameter optimization. The proposed NSST-SMPCNN method combines the saliency mechanism, the saliency function, and the PCNN clustering segmentation mechanism; it benefits from gray-level clustering firing and priority firing of pixels with the same gray level, making it suitable for heterogeneous image fusion in complex orchard environments in Gansu. The results show that the NSST-SMPCNN algorithm achieves the best fusion effect on the ToF confidence image and the corresponding visible light image collected in the natural environment, with excellent performance on all nine indicators: AG, ES, IE, SD, PSNR, SF, IC, MI, and SSI.
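To make the role of these parameters concrete, the following sketch implements a generic simplified PCNN firing iteration using the parameter names above. The coupling kernel, initial threshold, and default parameter values are illustrative assumptions, not the MMOABC-optimized settings of the paper.

```python
import numpy as np

def spcnn_fire(S, alpha_L=1.0, beta=0.2, V_L=1.0,
               alpha_theta=0.5, V_theta=20.0, n_iter=10):
    """Simplified PCNN firing pass over a normalized gray stimulus S.
    alpha_L: link channel decay; beta: link strength; alpha_theta: dynamic
    threshold attenuation factor; V_theta: threshold amplification coefficient.
    Returns the iteration at which each pixel first fires (0 = never fired)."""
    L = np.zeros_like(S)            # link channel
    E = np.ones_like(S) * S.max()   # dynamic threshold
    Y = np.zeros_like(S)            # firing map
    fire_time = np.zeros_like(S)
    # 3x3 linking kernel (center excluded): neighbors couple into L
    W = np.array([[0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.5]])
    for n in range(1, n_iter + 1):
        # neighborhood firing activity via a 'same'-size correlation
        pad = np.pad(Y, 1)
        neigh = sum(W[i, j] * pad[i:i + S.shape[0], j:j + S.shape[1]]
                    for i in range(3) for j in range(3))
        L = np.exp(-alpha_L) * L + V_L * neigh
        U = S * (1.0 + beta * L)                # internal activity
        Y = (U > E).astype(float)               # fire where activity exceeds E
        E = np.exp(-alpha_theta) * E + V_theta * Y  # raise E after a fire
        fire_time[(fire_time == 0) & (Y == 1)] = n
    return fire_time
```

Because the dynamic threshold decays by exp(-αθ) each step and jumps by Vθ after a fire, brighter pixels (and, through β, pixels whose neighbors have fired) cross the threshold earlier, which is the gray-level clustering behavior the fusion rules exploit.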
However, some test results on the public datasets still show a poor fusion effect, which needs further improvement. In future work, a deep learning convolutional neural network will be introduced to further explore the algorithm structure, capture better image features, and improve the fusion effect.