Are Chinese Residents Willing to Recycle Express Packaging Waste? Evidence from a Bayesian Regularized Neural Network Model

While enriching people’s lives, the rapid development of online shopping has posed a severe challenge to the environment. Questionnaires focusing on the intention to recycle packaging waste are designed. These questionnaires contain first-level variables such as recycling behavior attitude, recycling behavior cognition, situational factors, historical recycling behavior, and recycling behavior intention. With the collected questionnaire data, a regression analysis is first conducted to check the selection of variables and their predictive effect. After ensuring the validity of the variables, 15 second-level variables are extracted into eight principal components using principal component analysis. These components serve as input to a Bayesian regularized neural network. Subsequently, a three-layer (8-15-1) neural network model is constructed; the trained neural network model achieves a high degree of fit between the predicted and measured values of the test set, thus further supporting the rationality of the selected variables and the neural network model. Finally, this study uses the connection weights matrix of the neural network model and the Garson formula to analyze in depth the specific impact of each second-level variable on the intention to recycle packaging waste. Note that, given the particularity of packaging waste recycling behavior, the impact of social norms, recycling behavior knowledge, values, and publicity on behavioral intentions among the second-level variables is different from that obtained in similar previous studies.


Introduction
Thanks to the rapid development of China's economy and urbanization [1] as well as the improvement in people's living standards, China has become the most developed country in the world for online shopping. This has brought convenience to people's lives, but it has also been accompanied by a large increase in packaging waste that exerts tremendous pressure on the environment [2]. According to the "Report on Status and Trends of Green Packaging Development in China's Express Industry" (2017) [3] issued by the China National Post Office, in 2016 China's express service directly consumed about 3.2 billion woven bags, about 6.8 billion plastic bags, 3.7 billion packaging boxes, and a total of 330 million rolls of sticky tape. The corrugated box consumption was equivalent to 72 million trees. However, the overall recycling rate of China's express packaging waste is less than 20%. Residents discard express packaging waste as ordinary garbage because of the low profitability of recycling express packaging and the lack of convenient recycling options [4]. Recyclable packaging waste consists of renewable resources, such as packaging boxes, as well as non-degradable substances, such as plastic bags and tape. Therefore, the impact of discarding express packaging waste should not be underestimated from either the economic or the environmental point of view.
From a macro perspective, the greenhouse effect has aroused global attention in recent years, especially carbon dioxide emissions, which have seriously contributed to global warming [5]. The final product structure and final demand structure are the factors that hinder the reduction in carbon emission intensity [6]. Thus, recent years have witnessed strengthened control over the recycling industry in China as governments have gradually attached importance to the problem of proliferating express packaging waste [7]. Various measures have been proposed to support China's developing concept of New-type urbanization [8], such as regulating express packaging production, optimizing recycling systems, and building a recycling platform, but they are all at the preliminary recommendation stage rather than enacted legal regulations. Although the state is optimizing the external environment for Express Packaging Waste Recycling (EPWR), it is also important to pay attention to the behavioral intentions of the participants in express packaging waste collection. Hence, it is necessary to analyze the relevant psychological variables of the participants and predict their behavioral intentions [7].
Among the many common methods, multiple linear regression may ignore the interactions among the variables of each dimension and the nonlinear causal relationships examined in this paper. Logistic regression is sensitive to multicollinearity among the independent variables in the model. If two highly correlated independent variables are placed into the model at the same time, the sign of the weaker one's coefficient may be reversed. Structural equation modeling is a confirmatory method that validates a model based on existing theory. Since extended variables are added in our questionnaire, the relationships between variables need to be explored rather than merely validated. Therefore, this paper selects a neural network, which can learn and store a large number of input-output mapping relationships. In addition, this network automatically adjusts its internal neuron weight parameters to predict and analyze variables of different dimensions. Moreover, many factors affect the behavioral intention of recycling and conservation (BIRC), and they are likely to interfere with each other; principal component analysis (PCA) is therefore used to screen the factors, reduce the complexity of the neural network, and improve the prediction accuracy. PCA has been effectively integrated with neural networks in the fields of electricity [9], agriculture [10], tourism [11], and industrial manufacturing [12], but not yet in research on environmental behavior. Furthermore, studies in the fields mentioned above have rarely considered the regularization of neural networks when combining principal component analysis with neural network models. Therefore, based on principal component analysis, this study predicts behavioral intentions by using a Bayesian regularized neural network.
By referring to the contributions of researchers in environmental behavior, this study designs a questionnaire about intentions in EPWR behavior. After the questionnaire is distributed, collected, and tested for reliability and validity, principal components are extracted from the relevant variables using PCA. Based on these components, the Bayesian regularized neural network model is employed to simulate intentions of EPWR. Finally, the influence of each principal component on behavioral intention is analyzed by calculating the sensitivity of the output of the neural network model. The results can contribute to public participation in EPWR in China.
This paper is organized as follows. Section 2 presents a review of relevant literature. Section 3 provides an introduction to the methods used in this study. The preliminary analysis and pre-processing of the data required by the neural network model are described in Section 4. Section 5 introduces the construction and training of the Bayesian regularized neural network as well as a measurement and discussion of the sensitivity coefficient of each variable. Finally, we conclude this study.

Literature Reviews of Packaging Waste and Recycling Behavior
China's economy has grown the fastest in the world, and this growth has resulted in a burgeoning waste management problem [13]. Municipal solid waste in China mainly consists of residential, institutional, street cleaning, commercial, and industrial wastes. Chinese municipal solid waste increased from 31.3 to 113.0 million tons between 1980 and 1998, an annual increase rate of 3-10%. Its categories comprise kitchen wastes, paper, plastic, glass, batteries, metal, brick and stones, fabric, pottery, and discarded domestic appliances [14].
Some researchers have realized that the rapid development of express delivery and online shopping is imposing a serious burden on the environment [15] and have tried to mitigate the pollution caused by express packaging through low-carbon design [16], yet studies focusing on EPWR are rare. Most studies focus on the design and recycling of food packaging [17,18] and plastic packaging [19,20]. Among these, European researchers carry out more studies on the recycling of packaging waste: Rui et al. [21] study the economic feasibility of a packaging waste recycling system and compare the possibilities between Portugal and Belgium; Mrkajić et al. [22] use quantitative and qualitative methods to evaluate the effectiveness of the Serbian packaging waste recycling system and find that prolonging the producer responsibility system could effectively improve the operating efficiency of recycling; Yıldız-Geyhan et al. [23] measure different packaging waste recycling systems from the perspective of the social life cycle and discover that a regular recycling system scores better than existing recycling systems and informal recycling channels.
Recycling behavior has gradually become a topic of global concern as an easy-to-implement and enforceable environmentally responsible behavior [24,25]. As early as a decade ago, Tonglet et al. [26] and Robinson and Read [27] investigate the recycling behavior of residents in the London Borough and Brixworth areas using questionnaires. In recent years, researchers study recycling behavior at a more microscopic scale. In predicting recycling behavior, Chan and Bishop [28] examine how the moral code extends the TPB theory. Similarly, Wan et al. [29] expand the model of recycling attitude and recycling behavior, then propose a new research variable known as policy effect perception. Taking the point at which recycling attitude affects recycling behavior as an entry point, Huffman et al. [24] compare the various effects of social factors and worldview on both self-reported and observed recycling behavior. Miliute-Plepiene et al. [30], Oztekin et al. [31], and Poškus and Žukauskienė [25] focus on the effects of maturity, gender, and personality type on the recovery mechanism, respectively. With the continuous advancement of society, research on recycling behavior is no longer limited to traditional recyclables. Hu and Yu [32] and Wang et al. [33] study the intention to recycle e-waste. In studying recycling behaviors, new research methodologies are rare, and researchers focus mainly on structural equation models [29,31,32,34], linear or logistic regression [24,35-37], or a combination of the above methods [38].

Theoretical Framework of Behavioral Science
As a necessary process of behavioral occurrence, behavioral intention is the decisive factor before the behavior occurs [39], as well as the psychological tendency and subjective probability of the individual before performing the behavior [40]. Behavioral intention is an important mediator of behavior because other subjective psychological factors indirectly affect actual behavior through behavioral intention. Researchers often apply the Theory of Planned Behavior (TPB) and Attitude-Behavior-Condition (ABC) theory to predict behavior and behavioral intentions. TPB is the theory of the relationship between attitude and behavior proposed by Ajzen. It inherits and extends rational behavior theory and attitude theory and is an influential theoretical framework in various fields such as behavioral research. A large number of empirical studies have shown that it can significantly improve the ability to interpret and predict behavior [41]. According to TPB, human behavior is planned, and recycling behavior is determined by behavioral intention. Behavioral attitudes, subjective norms, and perceived behavioral control are the three major factors influencing behavioral intentions. Unlike TPB, ABC theory treats external conditions as an important factor in promoting and restricting behavior. External conditions mainly refer to behavioral convenience, namely situational factors, and ABC theory holds that behavior will occur when the cumulative effect of external conditions and attitudes is positive [42]. Mannetti et al. [38] propose that these two theoretical frameworks can be used to study people's participation in recycling and that the incentives of the two frameworks are attitudes and material incentives, respectively. Previous studies in psychology have focused on the framework of attitudes. In recent years, researchers in different fields have been more willing to combine the two frameworks so that each informs the other.
Behavioral attitude is an important psychological variable for predicting environmental behavior, and a positive environmental attitude will significantly promote the generation of environmental behavior [43]. Sia et al. [44] hold that attitude variables include values, beliefs, and environmental concern, whereas Kaiser et al. [45] argue that environmental attitude variables include environmental knowledge, environmental values, and environmental behavioral tendencies. Values are the foundation of attitude formation [46], and environmental issues always involve conflicts between individual and collective interests. Therefore, values play an important role in predicting environmental behavior. Stern et al. [47] divide values into ecological, egoistic, and altruistic values. Later researchers discover that different values could form different new ecological paradigms. For example, altruistic and ecological values are positively related to environmental behavior, whereas egoistic values are negatively related to it [48]. Environmental concern is also important to forming environmental attitudes, and an improved attitude can further consolidate recycling behavior; hence, environmental concern is a positive latent variable of recycling intention [34,49]. Based on the relevant theory of behavior, there is a positive correlation between knowledge and behavior even as individual personalities vary [50]. Environmental knowledge is an important antecedent variable of behavior, which has a significant impact on the intention to recycle and, thus, promotes the generation of behavior [32,45,51].
On the cognitive level of recycling behavior, Stern et al. [47] find that environmental responsibility is of great importance for predicting recycling behavior and that individual behavioral intention is also restricted by other individuals or groups. Castronova [52] and Robinson and Read [25] consider that role models are conducive to promoting interactional emulation and learning potential; that is, a herd mentality can lead to the generation of recycling behavior. Regarding the important components of the behavioral system, Davies et al. [53], Tonglet et al. [26], and Wan et al. [29] believe that behavioral result perception has an impact on behavioral intention and that the perceived results mainly affect behavior through their information and motivation functions, that is, through psychological cognition [54]. Behavioral control perception, a key variable in the formation of TPB theory, has a positive effect on behavioral intentions. A strong perception of behavioral control can enhance an individual's willingness to carry out a behavior [55]. The conclusions of Davies et al. [53] and Tonglet et al. [26] support this theory, and Oztekin et al. [56] further find that, compared with men, women's recycling behavior is more susceptible to behavioral control perception.
As stated above, environmental behavior is also affected by the external environmental context [37,57]. Of the situational factors, social norms are the basic principles for determining and adjusting people's common activities and the relationships between people and are the necessary code of conduct for the entire society and members of various social groups [58]. Whitmarsh [59] discovers that public pressure from families and neighbors is highly effective in directing the environmental behavior of residents and can be a significant factor in predicting behavioral intention. Therefore, the role of social norms in environmental behavior should be emphasized. Miliute-Plepiene et al. [30] also propose that social norms are particularly important in the early stages of recycling systems. In addition, the perceived pressure of social norms is particularly significant in the Chinese cultural environment, which encourages people to adopt relevant behavior to integrate into society smoothly [60,61]. Chen et al. [62] and Poortinga et al. [63] also find that economic incentives are important external dependent variables that influence behavioral intentions. Policy institutions are an important manifestation of government constraints on individual behavior, and states can adopt persuasive or mandatory mechanisms to increase the enthusiasm of public participation [64]; such mechanisms are an important inducement for residents to participate in specific behavior [65]. At the same time, publicity can enhance residents' perceptions and understanding, thereby improving residents' behavioral choices and regulating the influence of behavioral intention on recycling behaviors [32,66,67]. However, some studies find that the impact of publicity on behavioral intention varies due to differences in social and cultural background [68].
In addition, the interpersonal behavior theory proposed by Triandis [69] states that behavioral habits and rules also have an impact on the occurrence of behavior. In other words, the more entrenched a specific habit is, the fewer obstacles there are for implementing behavior and the easier it is to generate the behavior. Michiyo [70] finds that when predicting recycling behavior, historical recovery experience is a better predictor than behavioral attitude; Tonglet et al. [26], Klöckner and Oppedal [71], and Knussen and Yule [72] also consider recycling behavior habits and lifestyle as important predictive variables and find that the recycling habits of men have a greater effect on behavioral intentions [56].
In summary, researchers hold many different opinions on the factors affecting the intention to recycle, yet most of them are based on the theoretical framework of TPB and ABC. TPB allows variables to be added to the theoretical model to enhance its explanatory power and predictive validity [31,39,73]. Thus, this study, referring to TPB and ABC theory, augments indicator variables such as knowledge of recycling, concern about recycling problems, herd mentality, behavioral result perception, social norms, and historical recycling behavior to improve the accuracy of behavioral intention prediction. The questionnaire selects a set of variables that predict the behavioral intention of recycling, as shown in Figure 1: recycling behavioral attitudes (including environmental values, concern about recycling problems, and knowledge of recycling), recycling behavior recognition (including environmental responsibility, herd mentality, behavioral result perception, and perceived behavioral control), situational factors (including social norms, economic incentives, perceived effectiveness of policy, and publicity), and historical recycling behavior (including habit adjustment behavior and interpersonal facilitation behavior). The specific logical hypothesis is that the psychological characteristic factors, namely recycling behavior attitude (RBA) and recycling behavior recognition (RBR), as well as historical recycling behavior (HCB), have a direct impact on the behavioral intention of recycling and conservation (BIRC). The situational factor (SF) is the adjustment factor of the psychological characteristics affecting BIRC, and the social population variable (SPV) affects people's recycling behavioral habits. In addition, the normative nature of the recycling system [29] and the availability of recycling facilities [36,41,74] also have impacts on behavioral intentions. Therefore, these considerations are included when designing the questionnaire content. Table 1 presents the meanings of the abbreviations for these variables in the text.
Sustainability 2018, 10, x FOR PEER REVIEW 5 of 24

Regression Analysis
Since the theoretical model in this study adds extended variables based on TPB and ABC theory, a hierarchical regression analysis is first carried out on the variables to ensure the validity of the predictive model. Based on the variance ($R^2$) explained by the models, it is evaluated whether adding variables is reasonable. Finally, a brief analysis is performed on the predictive effects of each variable on the BIRC to provide a reference for the prediction results of the neural network model.

Principal Component Analysis (PCA)
As there are many variables in the questionnaire and a certain degree of collinearity between them, feeding all the indicators into the neural network would increase network complexity and degrade training performance. However, abandoning some variable indicators would result in a loss of information. With principal component analysis, feature dimensionality reduction can be achieved by orthogonally transforming multiple features into a few integrated features, so that the new principal components derived from the original variables can describe or explain most of the features of the multivariate variance-covariance structure [75]. This approach can decrease the correlation between neural network input variables, streamline the neural network structure, and improve neural network prediction accuracy [76].
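The orthogonal transformation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pca`, the synthetic data, and the choice of an eigendecomposition of the covariance matrix are assumptions made only for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its leading principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]    # orthogonal transformation
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return scores, explained

# Hypothetical data: six collinear indicators driven by three latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(200, 6))

scores, explained = pca(X, n_components=3)
corr = np.corrcoef(scores, rowvar=False)       # components are mutually uncorrelated
```

In this sketch the component scores are uncorrelated by construction, which is the property that makes them convenient inputs for a neural network.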

BP Neural Network
As a feed-forward network using an error back propagation (BP) algorithm, a BP neural network is usually composed of an input layer, one or more hidden layers, and an output layer [77]. According to the Kolmogorov theorem, a BP network with three layers (that is, a single hidden layer with enough neurons) can approximate a function to arbitrary precision [78]. The neurons between the layers are connected by the corresponding network weights. Network learning is the process of adjusting these weights until the network error reaches the convergence criterion. As a result, through back propagation the output approaches the expected output. The essence is to discover the mapping relations between input and output contained in the finite sample data, so that an appropriate output can be given for an input that was not used in training. This generalization ability is an important measure of the performance of the neural network [79].
Although the BP neural network has strong nonlinear mapping ability, the gradient descent method it uses depends on the initial conditions, and the network may converge to a local minimum instead of the global minimum. To achieve a better fit, the network has to be trained on the data many times, which can lead to over-fitting [80]. Moreover, its learning speed, accuracy, and generalization ability are not ideal.
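The weight-adjustment process described above can be sketched as follows. This is a minimal illustration only: the synthetic data, the tanh hidden activation, the learning rate, and the number of epochs are assumptions and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 8))           # hypothetical inputs in [0, 1]
y = X.mean(axis=1, keepdims=True)              # hypothetical target

# One hidden layer: 8 inputs -> 15 hidden neurons -> 1 output.
W1 = rng.normal(scale=0.3, size=(8, 15)); b1 = np.zeros(15)
W2 = rng.normal(scale=0.3, size=(15, 1)); b2 = np.zeros(1)
lr, losses = 0.02, []

for epoch in range(500):
    # Forward pass.
    H = np.tanh(X @ W1 + b1)                   # hidden-layer activations
    out = H @ W2 + b2                          # linear output layer
    err = out - y
    losses.append(float(np.mean(err ** 2)))    # mean squared error

    # Backward pass: propagate the error gradient layer by layer.
    d_out = 2 * err / len(X)
    dW2 = H.T @ d_out; db2 = d_out.sum(axis=0)
    dH = (d_out @ W2.T) * (1 - H ** 2)         # derivative of tanh
    dW1 = X.T @ dH; db1 = dH.sum(axis=0)

    # Gradient-descent weight update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Because plain gradient descent starts from random weights, a different seed can end at a different solution, which is the sensitivity to initial conditions noted above.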

Bayesian Regularized Neural Network
Regularization refers to limiting the scale of the weights and thresholds to improve the generalization ability of the neural network. In other words, a penalty term that reflects the complexity of the network function is added to the neural network error function (the MSE), which changes the performance function of the neural network to Equation (1):
$$F = \beta E_D + \alpha E_W \tag{1}$$
where the square of the network weights is described as Equation (2):
$$E_W = \frac{1}{n}\sum_{i=1}^{n} W_i^2 \tag{2}$$
Here $W_i$ is the weight of the neural network connection; $n$ is the total number of samples; $E_D$ is the sum of the squared residuals between the predicted and target values of the neural network; and $\alpha$ and $\beta$ are the regularization parameters that determine the training target of the neural network and control the degree of fit achieved.
Bayesian regularization takes the objective function of the traditional neural network model as a likelihood function. The regularizer corresponds to the prior probability distribution on the network weights, and the network weights are regarded as random variables [81]. A Bayesian regularized neural network refers to a feed-forward neural network trained with Bayesian regularization [82]. Using a hypothesized parameter probability distribution, this network learns in the whole weight space and evaluates the relevant parameters. It then adjusts the regularization parameters adaptively using Bayesian inference based on the posterior distribution [83]. The optimal weights are determined according to the probability density of the weights, and, under the premise of ensuring the smallest squared network error, the weights are minimized to provide effective control of network complexity and to improve network generalization ability [84]. By improving the training performance function of the neural network, Bayesian regularization optimizes the fit of the neural network to the training samples while minimizing model complexity.
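As a small numerical illustration of the regularized performance function, the sketch below assumes the commonly used form F = beta * E_D + alpha * E_W, with E_D taken as the sum of squared residuals and E_W as the mean squared weight. The function name and the hand-picked values of alpha and beta are hypothetical; in Bayesian regularization these two parameters are re-estimated during training rather than fixed.

```python
import numpy as np

def regularized_objective(weights, predictions, targets, alpha, beta):
    """Assumed form: F = beta * E_D + alpha * E_W."""
    weights = np.asarray(weights, dtype=float)
    residuals = np.asarray(targets, dtype=float) - np.asarray(predictions, dtype=float)
    e_d = np.sum(residuals ** 2)       # data misfit term E_D
    e_w = np.mean(weights ** 2)        # weight penalty term E_W
    return beta * e_d + alpha * e_w

# Hypothetical values: the same fit quality with small and with large weights.
small = regularized_objective([1.0, -2.0, 3.0], [1.0, 2.0], [1.5, 2.5], alpha=0.1, beta=1.0)
large = regularized_objective([10.0, -20.0, 30.0], [1.0, 2.0], [1.5, 2.5], alpha=0.1, beta=1.0)
```

With identical residuals, the network with larger weights receives the larger objective value, which is how the penalty discourages unnecessarily complex mappings.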

Questionnaire Survey and Scale Test
A five-point Likert scale is employed for the questionnaire, where 1 indicates that the description of an item is completely inconsistent and 5 indicates that the item description is completely consistent. After the questionnaire design is finished, a small-scale pre-study is carried out to ensure the validity of the questionnaire. A total of 187 questionnaires are distributed in the pre-study, of which 151 questionnaires are valid. The pre-study questionnaire is tested for reliability and validity; except for two variables, "environmental responsibility" and "social norms", the Cronbach's alpha coefficients of the other variables are all above 0.72. After checking the test results, it is found that the "item and overall correlation coefficient" of each question under these two variables is less than 0.2, and therefore these two items are deleted from the formal questionnaire. After re-testing the reliability, the Cronbach's alpha coefficient of all variables is between 0.71 and 0.92, indicating that the modified questionnaire has a high degree of reliability. In the structural validity test, all variables are divided into independent, dependent, and regulatory variables. The KMO values of the three are all around 0.8. The chi-square value of the Bartlett sphericity test is large enough, and the significance probability (Sig) is 0.000, indicating that the structure of the pre-study questionnaire is good.
After this, the formal questionnaire is distributed. A total of 628 questionnaires are collected, of which 526 are valid, giving an effective recovery rate of 84%. Table 2 shows the reliability and validity test results of the questionnaire. These results indicate that the credibility and structure of the questionnaire design are improved relative to the pre-study questionnaire and that the questionnaire data could be used for further regression and prediction.
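The reliability coefficient used above can be computed with the standard Cronbach's alpha formula, as in the following minimal sketch. The function name and the sample responses are hypothetical and are not the study's data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items matrix of Likert scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                   # number of items
    item_var_sum = items.var(axis=0, ddof=1).sum()       # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)            # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses of five people to three items of one variable.
responses = np.array([[4, 5, 4],
                      [2, 2, 3],
                      [5, 4, 5],
                      [1, 2, 1],
                      [3, 3, 3]])
alpha_value = cronbach_alpha(responses)

# Items that move in perfect lockstep give the upper bound of 1.
perfect = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]])
```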

Descriptive Statistical Analysis
Gender, age, education level, occupation type, monthly income level, city of residence, number of permanent residents in the household, and family type are incorporated into the demographic sociological variables in the questionnaire (see Table 3). The proportion of men and women is essentially balanced. Age is mainly concentrated in the range between 18 and 50 years old, accounting for 98.2% of the sample. The proportion of people with education from junior college to a Master's degree is 93.6%, which is roughly consistent with the age distribution and academic level of online shopping customers in China. The cities of residence cover the eastern, central, and western regions in China, including Hong Kong, Macao, and Taiwan. The rest of the demographic social variables provide more comprehensive income and family type information. Therefore, from the perspective of demographic sociological variables, the questionnaire respondents have a certain degree of representativeness, indicating that they could be used as a microcosm to study the intention of EPWR in China. To test the explanatory power of each variable before predicting the BIRC, hierarchical regression of the data is performed by adding one dimension at a time in the order RBA, RBR, SF, and RBH.
Table 4 shows that $R^2$, representing the degree of fit of the model, gradually increases as variables are added, and the significance values (Sig) of the four regressions are all below 0.005. The increase in $R^2$ is the largest when the psychological dimension of recycling is added; $R^2$ also increases when historical recycling behavior is added, but not to a marked extent. This indicates that, on the basis of TPB and ABC theory, it is reasonable to add the extended variables to the questionnaire. As for the significance of the path coefficient of each variable (see Table 5), among the variables under the four dimensions the predictive effects of ECV and SN are not obvious, but the other variables have significant predictive effects on BIRC. It is worth noting that KR and ER have a negative predictive effect on BIRC. Analysis of the predictive effect also suggests that, apart from these two individual variables, the approach used in this study is effective for selecting and designing variables to predict BIRC. Since the regression prediction results are used only as a reference in this study, the two factors with weaker predictive effects are not discarded.
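The step-by-step comparison of explained variance can be sketched as below. This is an illustration under stated assumptions: the data are synthetic, the block sizes (3, 4, 4, and 2 columns) are hypothetical placeholders for the four dimensions, and ordinary least squares via `numpy.linalg.lstsq` stands in for whatever statistical package the study used.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 300
# Hypothetical predictor blocks, entered one dimension at a time.
blocks = [rng.normal(size=(n, k)) for k in (3, 4, 4, 2)]
y = (blocks[0] @ np.array([0.5, 0.3, 0.2])
     + 0.4 * blocks[1][:, 0]
     + rng.normal(size=n))

# Fit nested models: block 1, blocks 1-2, blocks 1-3, blocks 1-4.
r2_steps = [r_squared(np.hstack(blocks[:k]), y) for k in range(1, 5)]
```

For nested least squares models $R^2$ can never decrease when a block is added, so the size of each increase, rather than its sign, is what indicates whether the added dimension contributes explanatory power.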

Principal Component Analysis
Although the data in this study share the same dimensions, they are normalized and mapped to the 0-1 range to ensure the convergence speed and accuracy of the iterative solution in later calculations.
The normalization formula is given in Equation (3):
$$X = \frac{X^{*} - X_{\min}}{X_{\max} - X_{\min}} \quad (3)$$
where $X^{*}$ is the raw data of a variable, $X_{\min}$ and $X_{\max}$ are, respectively, the minimum and maximum values in the original data, and $X$ is the normalized data.
The principal components of the variables are then extracted by principal component analysis. When extracting the principal components, the eigenvalue threshold is set to >0.6 to improve the contribution rate of the principal components and to ensure the accuracy of the behavioral intention prediction. As a result, there is a certain discrepancy between the factor loading distribution of each variable and the dimensions of the scale design. The cumulative variance explanation rate is 81.625%, which means that the eight factors explain 81.625% of the information in the 15 variables. The variables are thus compressed and integrated while the information in the raw data is largely preserved. In Table 6, the maximum loading of each variable index is used to identify the principal component to which the variable belongs and hence the meaning of that component, and each component is marked with a different color. The principal component values can then be calculated from the principal component equation and the eigenvector matrix. The principal component equation is Equation (4):
$$Y = \sum_{j} a_j X_j \quad (4)$$
where $a_j$ is the factor loading of variable $j$ in the eigenvector matrix, $X_j$ is the normalized value of each variable, and $Y$ is the principal component value. The eight principal component values $Y_1$-$Y_8$ are calculated in sequence. Pearson correlation analysis shows that there is no correlation between the principal components. Therefore, $Y_1$-$Y_8$ can be used as inputs of the BP neural network prediction model.
The topology of a neural network typically consists of an input layer, one or more hidden layers, and an output layer. Generally speaking, as long as a sufficient number of hidden-layer neurons are present, a three-layer network can approximate any nonlinear function with a finite number of discontinuities to arbitrary precision, and can thus realize an arbitrary nonlinear mapping. Therefore, this study constructs a three-layer neural network consisting of an input layer, a hidden layer, and an output layer.
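As a rough illustration of Equations (3) and (4), the following Python sketch normalizes one variable to the 0-1 range and forms a principal component value as a loading-weighted sum. The function names, the toy Likert answers, and the toy loadings are hypothetical and are not taken from the study data or from Table 6; the handling of a constant column is likewise an assumption.

```python
def min_max_normalize(values):
    """Map raw values X* onto [0, 1], following Equation (3)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column has no spread, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def component_score(loadings, normalized_row):
    """Weighted sum Y = sum_j a_j * X_j, following Equation (4)."""
    if len(loadings) != len(normalized_row):
        raise ValueError("loadings and variables must have the same length")
    return sum(a * x for a, x in zip(loadings, normalized_row))


# Hypothetical five-level Likert answers for one variable (not study data).
raw_scores = [1, 2, 3, 4, 5]
normalized = min_max_normalize(raw_scores)

# Hypothetical loadings for a three-variable toy case (not from Table 6).
toy_loadings = [0.5, 0.25, 0.25]
toy_row = [1.0, 0.5, 0.0]
toy_score = component_score(toy_loadings, toy_row)
```

In the study itself each of the eight component values would be computed from all 15 normalized variables, so `component_score` would be called once per component with that component's loading vector.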

Selection of BP Neural Network Nodes
In general, the number of input and output variables determines the number of nodes in the input and output layers. The input data in this study are a matrix consisting of eight principal component values. The output data are the BIRC score matrix. Therefore, the numbers of nodes in the input and output layers are eight and one, respectively.
The number of nodes in the hidden layer is especially important to neural network performance. With too few nodes, the network may not be able to learn and identify the input information fully; with too many nodes, over-fitting and poor fault tolerance may result, and the model training time may be extended. The minimum number of hidden-layer nodes that still ensures prediction accuracy should therefore be selected. There is still no clear and unified formula for selecting the number of nodes. Usually, the optimal number of hidden-layer nodes is identified through repeated trials, by measuring the network training error and the quality of the network fit. Using Equation (5) as a guide, this study determines the number of hidden-layer nodes to be 15, at which the training effect is optimal, and thus establishes a three-layer 8-15-1 neural network model. In Equation (5), n and m are the numbers of nodes in the input and output layers, respectively, and α is an arbitrary constant between 1 and 10.

Selection of BP Neural Network Training Function and Training Parameters
The training function of the network is the Bayesian regularization algorithm trainbr, the training performance function is MSE, the transfer function from the input layer to the hidden layer is the hyperbolic tangent sigmoid function tansig, and the transfer function from the hidden layer to the output layer is the linear function purelin. The network learning rate Lr is set to 0.05, the maximum number of training iterations is 1000, and the training convergence criterion is 0.001.
At the 172nd epoch, MU reaches its maximum value of 7.85 × 10^10 and, therefore, the network stops learning. MU is a damping coefficient that increases when further iterations make the error increase; reaching the maximum MU value indicates that the minimum error has been found and that the training has converged [81]. Figure 3 shows the training convergence process. The convergence curve reveals that after 172 iterations, the fitting accuracy reaches 0.00054935, and the number of effective network parameters is 137. In the model training process, convergence is fast and learning efficiency is high, and the trained network can be used for prediction on the test set.
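To make the 8-15-1 structure concrete, the sketch below runs one forward pass with a hyperbolic tangent hidden layer and a linear output layer, counts the trainable parameters, and evaluates a regularized objective of the form F = β·E_D + α·E_W, which is the usual way Bayesian regularization is formulated. It is a plain-Python illustration, not the MATLAB trainbr routine used in the study: the random weights, the function names, the assumption of one bias per hidden and output node, and the α and β values are all hypothetical.

```python
import math
import random


def init_network(n_in=8, n_hidden=15, n_out=1, seed=0):
    """Small random weights for an n_in-n_hidden-n_out network (illustrative)."""
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b_o = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return w_ih, b_h, w_ho, b_o


def forward(x, params):
    """Forward pass: tanh hidden layer, linear output layer."""
    w_ih, b_h, w_ho, b_o = params
    # Hidden layer uses the hyperbolic tangent (the role tansig plays in the text).
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_ih, b_h)]
    # Output layer is linear (the role purelin plays in the text).
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_ho, b_o)]


def count_parameters(params):
    """Total weights plus biases, assuming one bias per hidden and output node."""
    w_ih, b_h, w_ho, b_o = params
    return sum(len(r) for r in w_ih) + len(b_h) + sum(len(r) for r in w_ho) + len(b_o)


def regularized_objective(errors, weights, alpha, beta):
    """F = beta * E_D + alpha * E_W, with E_D and E_W as sums of squares."""
    e_d = sum(e * e for e in errors)
    e_w = sum(w * w for w in weights)
    return beta * e_d + alpha * e_w


params = init_network()
output = forward([0.5] * 8, params)       # one prediction for one toy input row
n_params = count_parameters(params)       # 8*15 + 15 + 15*1 + 1 = 151
toy_objective = regularized_objective([1.0, 2.0], [3.0], alpha=0.5, beta=2.0)
```

Under the bias assumption above, the network has 151 parameters in total, which gives some context for the 137 effective parameters reported after training.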

Training of Neural Network Models

Predictive Simulation by the Neural Network Model
To verify the validity of the network as determined, 80 test data points are entered into the trained neural network model. The expected output of the test is the normalized measured value of BIRC, and the simulation prediction output is the network output obtained after inputting the principal component values of the 80 samples.
Figure 4 shows that, among the 80 samples, 69 have errors in the interval [-0.1, 0.1], which indicates that the prediction accuracy is high. The predicted and measured values of most of the samples in Figure 5 coincide or show a similar trend. However, the positive errors in Figure 4 are relatively larger, which leads in Figure 5 to a larger proportion of predicted values that exceed the measured values.

Since the current predicted value is obtained from the normalized input matrix, the predicted value is anti-normalized and compared with the measured BIRC values of the 80 samples. Figure 6 shows the difference between the two values. The resulting values contain many decimal places. To predict the participants' BIRC more intuitively, the anti-normalized data are rounded upward so that the predicted result corresponds to the five-level Likert scale used in the questionnaire. After anti-normalization, 64 samples show zero error between predicted and measured values, and the errors in the remaining samples fall within the interval [-1, +1]. This further supports the conclusion that the Bayesian regularized neural network model based on the principal components constructed in this study is robust and predicts behavioral intentions well.
When analyzed from the perspective of degree of fit (see Figure 7), the training set has the highest fitting level, and the fitting degree of the test set also indicates a high generalization ability of the model, because the overall fitting degree of the model is close to 95%. Moreover, this model does not exhibit over-fitting, i.e., the phenomenon in which the prediction error increases after the error has been reduced to a certain value. These results support the reasonableness of the neural network model design and of the selection of training methods and parameters, and further indicate that a good choice of input variables helps achieve ideal training results.
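The error checks described in this section can be sketched as follows: count how many normalized predictions fall within ±0.1 of the measured value, then anti-normalize to the 1-5 Likert range and count exact matches. This is a minimal sketch under stated assumptions: "rounded upward" is read literally as a ceiling (swapping in nearest-integer rounding is a one-line change), the scale bounds 1 and 5 come from the five-level Likert scale, and the three sample values are invented for illustration, not taken from the 80 test samples.

```python
import math


def anti_normalize(x, x_min=1.0, x_max=5.0):
    """Invert min-max normalization: X* = X * (X_max - X_min) + X_min."""
    return x * (x_max - x_min) + x_min


def to_likert(x, x_min=1, x_max=5):
    """Anti-normalize, round upward (assumed to mean ceiling), clamp to the scale."""
    level = math.ceil(anti_normalize(x, x_min, x_max))
    return max(x_min, min(x_max, level))


def summarize_errors(predicted, measured, tolerance=0.1):
    """Return (count within +/- tolerance, count with zero error on the Likert scale)."""
    within = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= tolerance)
    exact = sum(1 for p, m in zip(predicted, measured) if to_likert(p) == to_likert(m))
    return within, exact


# Hypothetical normalized predictions and measurements (not study data).
predicted = [0.48, 0.80, 0.20]
measured = [0.50, 0.75, 0.50]
within, exact = summarize_errors(predicted, measured)
```

Applied to the study's 80 test samples, the first count would correspond to the 69 samples reported for Figure 4 and the second to the 64 zero-error samples reported after anti-normalization, provided the rounding rule matches the one the authors used.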

Results of Sensitivity Calculation for Principal Components
According to Garson [85], the influence (relative contribution) of an input variable can be calculated from the products of the connection weights that the neural network adjusts automatically during training; this value reflects the degree of influence of the input variable on the output variable, i.e., the sensitivity. The sensitivity coefficient formula is shown in Equation (6):
$$I_j = \frac{\sum_{m=1}^{N_h}\left(\dfrac{|W^{ih}_{jm}|}{\sum_{l=1}^{N_i}|W^{ih}_{lm}|}\,|W^{ho}_{m}|\right)}{\sum_{k=1}^{N_i}\left[\sum_{m=1}^{N_h}\left(\dfrac{|W^{ih}_{km}|}{\sum_{l=1}^{N_i}|W^{ih}_{lm}|}\,|W^{ho}_{m}|\right)\right]} \quad (6)$$
where $I_j$ is the weight of influence of the jth input variable on the output variable, $N_i$ and $N_h$ are the numbers of input-layer and hidden-layer nodes, $W^{ih}$ is the weight from the input layer to the hidden layer, and $W^{ho}$ is the weight from the hidden layer to the output layer. A larger $I_j$ value indicates a greater impact on the output and a higher sensitivity. Table 7 shows the weights upon completion of neural network training. Figure 8 presents the sensitivity coefficients of principal components 1-8, obtained by substituting the weights into the formula, together with the variables that make up each principal component.
Of the eight principal components, the sensitivity coefficients of the second to sixth principal components are all greater than 12%, and that of the third principal component reaches 15.83%. This result shows that in this study, Behavioral Results Perception and Perceived Behavioral Control under the psychological cognitive dimension of EPWR have the most significant predictive effects on behavioral intention. The other four principal components with a sensitivity coefficient greater than 12% are the following: KR, CRP; ER, HAB, SN; EI, PEP; and HM. Their predictive influence on behavioral intentions is more pronounced, and the variables just named explain most of the psychological characteristics of recovery, situational factors, and historical recycling behavior. Contrary to general expectation, although the sensitivity of the three values in the recycling behavior attitude dimension is greater than 10%, their predictive effect is not as good as that of the other variables. The sensitivity coefficient of the seventh principal component is the lowest, indicating that the publicity index has limited effectiveness in predicting BIRC relative to the other variables.
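Garson's formula (Equation (6)) can be sketched in a few lines of Python for a network with a single output node. This is an illustrative implementation of the commonly cited form of the algorithm, not the authors' own code; the function name, the weight layout (`w_ih[m][j]` is the weight from input j to hidden node m, `w_ho[m]` the weight from hidden node m to the output), and the toy weights are assumptions and are not the values in Table 7.

```python
def garson_importance(w_ih, w_ho):
    """Relative importance of each input for a single-output network.

    w_ih[m][j]: weight from input j to hidden node m.
    w_ho[m]:    weight from hidden node m to the output node.
    """
    n_inputs = len(w_ih[0])
    contributions = [0.0] * n_inputs
    for hidden_weights, out_weight in zip(w_ih, w_ho):
        row_total = sum(abs(w) for w in hidden_weights)
        if row_total == 0:
            continue  # a hidden node with no incoming weight contributes nothing
        for j, w in enumerate(hidden_weights):
            # Share of input j in this hidden node, scaled by the node's output weight.
            contributions[j] += abs(w) / row_total * abs(out_weight)
    total = sum(contributions)
    if total == 0:
        raise ValueError("all weights are zero; importance is undefined")
    return [c / total for c in contributions]


# Hypothetical 2-input, 2-hidden-node weights (not the values in Table 7).
toy_w_ih = [[1.0, 3.0], [2.0, 2.0]]
toy_w_ho = [1.0, 1.0]
importance = garson_importance(toy_w_ih, toy_w_ho)
```

Because the formula uses absolute values, the result expresses only the magnitude of each input's influence, not its direction; the sign of an effect (for example, the negative effect of KR and ER found in the regression) has to come from another analysis.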

Analysis and Discussion of Sensitivity Results
Generally speaking, among all the variable dimensions used to predict BIRC, the psychological cognitive dimension of recycling behavior has the strongest effect on behavioral intention prediction, which is consistent with the theory that behavior is an external activity governed by psychology. The predictive utility of BRP and PBC also shows that the generation of behavior is regulated by perceived behavioral results [54,55], which play a decisive role in the generation of BIRC through psychological cognition in EPWR. In other words, the richness of the resources and opportunities that individuals need to complete EPWR behavior largely determines their BIRC. According to social psychology, an individual's psychology is also constrained by other people or groups and therefore expresses certain social characteristics [25,58]. The higher sensitivity coefficients of HM and SN are thus also supported. This indicates that residents will learn from and imitate the EPWR behavior of others, or follow relevant social practice, when performing recycling behavior, thereby forming a correct recycling awareness and an understanding of the importance of having their behavior recognized by other people and by society. It is worth mentioning that, according to the regression analysis described in Section 3.3.2, the predictive effect of SN on BIRC is not obvious. However, regression prediction measures the influence of each single variable on the dependent variable after controlling for the other variables, and SN is an important part of SF. In the theoretical model used in this study, SF regulates attitude and the psychological cognition of recycling behavior. Therefore, when predicting BIRC in EPWR, the direct impact of social norms alone is limited, but social norms act through other variables to achieve a more significant regulatory effect.
Behavioral attitude, as an important dimension in the theory of planned behavior, also plays a decisive role in behavioral intentions. The coefficients of KR and CRP are relatively normal, but the regression prediction results show that the degree of mastery of KR is a negative predictor of BIRC. A likely explanation is that, compared with other environmentally friendly behavior, EPWR is perceived as lower-grade and offers meager returns. EPWR has therefore not been officially and comprehensively promoted in China, which places it in a "gray area" of citizen perceptions. In other words, people who acquire more and deeper KR or have higher education levels may be less involved in EPWR because of their busy work schedules, high salaries, and the traditional Chinese concept of face [55]. By contrast, people with lower education levels, who often have unstable work and low income, treat recycling as part of their income and actively participate in it for money. People who are willing to recycle may thus be driven by economic benefits rather than by environmental values. Those with positive values may find participating in EPWR either too troublesome or inconsistent with their status, especially because the economic benefits of recycling are far smaller than their own wealth. The result is an attitude-behavior gap, which further leads to lower predictive efficacy for BIRC. This line of reasoning can also explain the negative predictive effect of ER on BIRC in the regression prediction.
Apart from the psychological factors of the actors themselves, the external stimulation system also plays a regulatory role and is treated as a primary factor in determining behavior. Therefore, the government's adoption of economic means or the formulation and implementation of relevant policies and regulations can play a positive guiding role in EPWR behavior, but can also stiffen persistence in non-recycling behavior. Publicity is not as much of an incentive as economic means, nor is it as binding as policy means, which makes it the variable index with the lowest sensitivity coefficient when predicting BIRC in EPWR. This finding differs from the conclusions of some other researchers [26,56]. From the perspective of historical recycling habits, both HAB and IFB are found to have more than 10% sensitivity to recycling behavior intentions, and HAB combined with ER and SN is found to have a greater degree of influence. This shows that active persuasion and encouragement have an impact on BIRC, but that individuals' own EPWR experience and habits have a more profound effect on their BIRC. When the behavior is completed, a "warm effect" is generated, which can promote further environmentally friendly behavior [86], especially for low-cost environmentally friendly activities like EPWR.
In brief, in the future EPWR process, more attention should be paid to the psychological cognition of residents: ensuring that they have a high sense of environmental responsibility, setting role models that provide a reference template for residents' recycling behavior, and shaping a social atmosphere that encourages residents to participate in recycling. For those who do not have comprehensive environmental values and knowledge of recycling, efforts should be made to help them realize on a fundamental level that participating in EPWR is not only an act that brings certain economic benefits, but also an environmentally friendly behavior that can effectively protect the environment. Residents who have higher environmental awareness should be brought to realize that participating in EPWR will not make them lose face, but is rather worthy of promotion, so that they can successfully translate correct environmental awareness and attitudes into environmental behavior. In addition, a focus is needed on cultivating and enhancing residents' positive BRP and PBC, on increasing news media exposure of the status quo of express packaging waste to raise residents' attention to this issue, and on strengthening the popularization of relevant recycling knowledge, thereby subtly influencing the environmental values of residents. In terms of external situational factors, relevant departments should not increase publicity intensity excessively, but should put more energy into implementing economic instruments and policy means and into establishing social norms that guide parents to train their children from an early age to recycle waste for environmental purposes. As a result, recycling behavior becomes habitual and rooted. This will also have a positive impact on individuals and even groups in the children's social circle, and a virtuous circle will take shape.

Conclusions
This study has predicted the intention of express packaging waste recycling behavior based on data collected from a questionnaire. The questionnaire is designed based on TPB theory and ABC theory, and relevant literature is used to expand the set of variables: recycling knowledge, concern about recycling problems, herd mentality, behavioral results perception, social norms, and historical recycling behavior. Recycling behavioral attitudes, recycling psychological cognition, situational factors, and historical recycling behavior constitute the variable dimensions that measure behavioral intentions. In this study, a regression analysis is carried out on the rationality of extending the set of variables, and 15 variables are extracted into 8 principal components. This avoids collinearity between variables while simplifying the variable set, thus improving the training efficiency and predictive accuracy of the neural network model. Subsequently, the eight principal component values are entered into the neural network model, and the 526 questionnaires are divided into a training set and a test set, with the former used to train the neural network model and the latter to verify the validity of the model after training. Finally, the sensitivity of each principal component to the output result is analyzed based on the weights in the neural network model. The main conclusions are as follows:
1. The extended variable of historical recycling behavior effectively improves the predictive power of the intention to recycle.
2. The input of the neural network can be effectively streamlined by extracting the principal components from the variables.
3. A neural network based on Bayesian regularization can optimize the generalization ability of the network: the fitting precision is 0.00054935 after 172 iterations, and an ideal training effect is achieved. The simulation results from the verification set indicate that the selection of variables and training models in this study is reasonable. In the future, BIRC could be predicted accurately from measurements of the related variables.
4. According to the calculation results of the Garson formula, the sensitivity coefficients of behavioral results perception and perceived behavioral control are the highest among the second-level variables, whereas the sensitivity coefficient of publicity is the lowest. The predictive effect of values on behavioral intention is low, which indicates that an attitude-behavior gap has arisen in the recycling behavior of citizens. The sensitivity of cognitive behavior among first-level variables is the highest, highlighting the importance of psychological cognition in recycling practice. As for recycling behavior attitude, concern about recycling problems and knowledge of recycling have a good predictive effect on behavioral intention. Among the situational factors, social norms, economic incentives, and perceived effectiveness of policy all have higher sensitivity to behavioral intentions. Historical recycling behavior also makes a better contribution to behavioral intention prediction, and the forecasting accuracy for habit adjustment behavior is better.
As China's express packaging waste problem becomes worse, this study fills in some of the gaps in research into EPWR behavioral intentions. Unlike previous studies that use regression or BP neural network models to predict behavior or behavioral intentions, this study employs a Bayesian regularized neural network based on the principal components of the variable index to predict behavioral intention. This methodology has strong generalization ability and high prediction accuracy, and achieves a sound balance between the complexity and the degree of fit of the neural network model. By measuring and analyzing the sensitivity of the eight principal components to behavioral intentions, this study could provide a reference for the government or relevant departments in conducting EPWR activities and in encouraging the public to become involved in EPWR.

Figure 1. Research model of intention of express packaging waste recycling.
Discussion of the Bayesian Regularized BP Neural Network Model
5.1. Construction of the Bayesian Regularized BP Neural Network Model
5.1.1. Determination of the Number of BP Neural Network Layers

Figure 2 presents the model map used in this study. Fifteen variable indicators are compressed into eight principal components by principal component analysis, and the eight principal component values are then used as input to the 8-15-1 three-layer Bayesian regularized BP neural network model. The 526 questionnaires collected are divided into a training set of 446 and a validation set of 80. The training set is then input into the established neural network model.

Figure 2. Route map of PCA and Bayesian BP network.
Sustainability 2018, 10, x FOR PEER REVIEW 14 of 24

Figure 3. Training process of neural network.

Figure 4. Comparison error between predicted value and measured value.

Figure 5. Coincidence graph of predicted value and measured value.

Figure 6. Comparison error between predicted value and measured value after anti-normalization.

Figure 7. Fit of training and test sample.

5.3. Sensitivity Analysis Based on Neural Network Output Weights
5.3.1. Results of Sensitivity Calculation for Principal Components

Figure 8. Sensitivity coefficients and variable meanings of the main components for the output.

Table 2. Reliability and validity test results of questionnaire.

Table 3. Descriptive analysis of questionnaire population variables.
Note: 1 means married without children or with children who do not live with them; 2 means married and living with children.
4.3.2. Analysis of the Predictive Effect of BIRC
Furthermore, the predictive effects of RBA, RPR, SF, and RBH on BIRC are analyzed using linear regression, as a preliminary check before the neural network analysis.

Table 5. Analysis of the predictive effects of RBA, RPR, SF, and RBH on the BIRC.

Table 6. Orthogonal rotation of matrix for components.

Table 7. Connection weight matrix of input layer to hidden layer and hidden layer to output layer.