Using Dynamic Adjusting NGHS-ANN for Predicting the Recidivism Rate of Commuted Prisoners

Abstract: Commutation is a judicial policy that is implemented in most countries. The recidivism rate of commuted prisoners directly affects people's perceptions and trust of commutation. Hence, if the recidivism rate of a commuted prisoner could be accurately predicted before the person returns to society, the number of reoffences could be reduced, thereby enhancing trust in the process. Therefore, it is of considerable importance that the recidivism rates of commuted prisoners are accurately predicted. The dynamic adjusting novel global harmony search (DANGHS) algorithm, as proposed in 2018, is an improved algorithm that combines dynamic parameter adjustment strategies and the novel global harmony search (NGHS). The DANGHS algorithm improves the searching ability of the NGHS algorithm by using dynamic adjustment strategies for the genetic mutation probability. In this paper, we combined the DANGHS algorithm and an artificial neural network (ANN) into a DANGHS-ANN forecasting system to predict the recidivism rate of commuted prisoners. To verify the prediction performance of the DANGHS-ANN algorithm, we compared the experimental results with five other forecasting systems. The results showed that the proposed DANGHS-ANN algorithm gave more accurate predictions. In addition, the use of the threshold linear posterior decreasing strategy with the DANGHS-ANN forecasting system resulted in more accurate predictions of recidivism. Finally, the metaheuristic algorithm performs better searches with the dynamic parameter adjustment strategy than without it.


Introduction
Parole is the temporary and conditional release of a prisoner prior to the completion of their maximum sentence period. Commutation is the substitution of a lesser penalty for that originally given at the time of conviction. However, whether on parole or commutation, if the prisoner reoffends it can cause social disruption. This highlights the need for accurate recidivism predictions for parolees and commutation offenders. Carroll et al. [1] stated that "in Pennsylvania, the parolees with alcohol problems, younger parolees, and those originally convicted of property crimes (rather than assaultive or drug crimes) were more likely to commit new crimes on parole. Offenders with past heroin use were convicted of more serious crimes on parole. Absconding was significantly more predictable for cases with prior convictions, previous parole violations, and miscellaneous negative statements by the institution about the inmate's personality." Williams [2] demonstrated that in California, non-sex offenders, drug registrants, offenders with more than one felony conviction, frequently unemployed offenders, offenders with unstable living arrangements, offenders aged 25 to 30, previous parole violators, and unmarried offenders were more likely to abscond without leave. MacKenzie and Spencer [3] showed that in northern Virginia, offenders committed more crimes when they had high-risk behaviors such as using drugs, abusing alcohol, and carrying a gun; conversely, they committed fewer crimes when they were employed or lived with spouses. Benda's findings [4] indicated that caregiving factors have an inverse relationship with the rate of recidivism, whereas low self-control, drug use and sales, gang membership, peer association with criminals, carrying weapons, and poor social skills have a positive relationship with recidivism rates. Trulson et al. [5] stated that "generally, males, those younger at first contact with the juvenile justice system, those with a greater number of felony adjudications, gang members, institutional dangers, those in poverty, and those with mental health issues were significantly more likely to recidivate." Previous studies have focused on qualitative research or statistical analysis. However, over the past two decades, many artificial intelligence methods, such as the artificial neural network (ANN) [6,7], the support vector machine (SVM) [8], and association rules (AR) [9], have been developed and applied to many problems. Among these methods, the artificial neural network has been widely used to solve numerous types of forecasting problems and can achieve better accuracy than the SVM in supervised learning [8,10]. Therefore, this paper adopted the ANN as its forecasting tool.
An ANN is a computational mechanism inspired by the human brain. A typical ANN structure, also known as a multilayer perceptron (MLP), contains a number of layers, each composed of several basic components. The first layer is called the input layer, the last layer is called the output layer, and the other layers are hidden layers [11]. There are two types of basic components in a typical neural-network structure, namely, neurons and the links between them, as shown in Figure 1. The neurons n_i are the processing elements, and the links are the interconnections. Every link has a corresponding weight parameter w_j or bias parameter b_i. When a neuron receives stimuli from other neurons via the links, it processes the information and produces an output. According to their layers, neurons are categorized into three kinds: input, hidden, and output. Input neurons receive stimuli from outside the network. Hidden neurons receive stimuli from neurons at the front of the network and relay their output to neurons at the back of the network. Output neurons transfer the output externally [12].
In the ANN system, the weights are updated by a systematic algorithm during the learning process. The backpropagation (BP) learning algorithm is the most popular procedure for training an ANN [13]. Initially, the weights and biases are assigned randomly. BP uses gradient descent to search for the point(s) with minimum error on an error surface (error as a function of the ANN weights and biases). In other words, the weights and biases are updated using the error, which is calculated from the network output and the actual data. Once the neural network training is completed, new data can be predicted or classified using the calculation with the received stimuli (the new input data), the weights, and the biases. However, gradient descent, i.e., the learning process of the backpropagation network (BPN), is prone to becoming trapped in a local optimum.
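The layered forward computation described above can be sketched as follows. This is a minimal Python illustration with made-up weights and a tiny 2-3-1 topology like the example network of Figure 2; the paper's actual network, described later, is 12-5-2, and none of the names or values below come from the paper itself.

```python
import math

def forward(x, weights, biases, activation=math.tanh):
    """Propagate an input vector through an MLP, layer by layer.

    weights[l][i][j] is the weight on the link from neuron j in layer l
    to neuron i in layer l + 1; biases[l][i] is that neuron's bias.
    """
    a = x
    for W, b in zip(weights, biases):
        a = [activation(sum(w * v for w, v in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a

# A tiny 2-3-1 network with arbitrary illustrative weights and biases.
W = [[[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],   # input -> hidden (3 x 2)
     [[0.7, -0.6, 0.5]]]                        # hidden -> output (1 x 3)
b = [[0.0, 0.1, -0.1], [0.2]]
y = forward([1.0, 2.0], W, b)
```

Training then amounts to choosing W and b so that y matches the target data, whether by backpropagation or, as in this paper, by a metaheuristic search.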
To eradicate the aforementioned disadvantage of BPN, combinations of different metaheuristic algorithms and ANNs (metaheuristic-ANNs) have been presented in many studies. Kattan and Abdullah [14] trained ANNs for pattern-classification problems using the harmony search (HS) algorithm. Tavakoli et al. [11] used the HS-ANN, the novel global harmony search ANN (NGHS-ANN), and the intelligent global harmony search ANN (IGHS-ANN) for three well-known classification problems. Kumaran and Ravi [15] combined the ANN and the HS algorithm for forecasting long-term, sector-wise electrical energy use. Göçken et al. [16] integrated metaheuristic and ANN algorithms for improved stock price prediction. In these papers, the results showed that the metaheuristic-ANN has a better forecasting ability than the BPN. Moreover, in the past two decades, the HS algorithm and its variants have been widely proposed, discussed, and applied in many studies. Therefore, we combined different HS algorithms with the ANN in this paper.
In a metaheuristic-ANN, a trial solution of the metaheuristic algorithm encodes the set of ANN weights and biases. The metaheuristic-ANN uses a random search mechanism to determine the best weights and biases, i.e., those that minimize a forecasting error such as the mean squared error (MSE) or the error rate. Figure 2 shows a small example presenting the relationship between a metaheuristic algorithm and an ANN; there are two input neurons, three hidden neurons, and one output neuron. Figure 3 shows the procedure of the metaheuristic-ANN. The MSE and error rate are given in Equations (1) and (2).
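Equations (1) and (2) themselves do not survive in this copy; their standard definitions, consistent with how the text uses them, would be:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2 \quad (1)

\mathrm{error\ rate} = \frac{n_{\mathrm{mis}}}{n} \times 100\% \quad (2)
```

where $y_t$ is the actual value, $\hat{y}_t$ the forecast, $n$ the number of examples, and $n_{\mathrm{mis}}$ the number of misclassified examples.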
The dynamic adjusting novel global harmony search (DANGHS) algorithm [17], proposed in 2018, is a novel metaheuristic algorithm that combines the novel global harmony search (NGHS) [18-20] with dynamic parameter adjustment strategies. In NGHS, the genetic mutation probability (p_m) is a fixed given value. However, the searching ability of a metaheuristic algorithm can be improved by appropriate parameter settings, the importance of which has been described in many studies [11,18-22]. Therefore, in DANGHS, the genetic mutation probability is dynamically adjusted in each iteration. Chiu et al. [17] found that the DANGHS algorithm is more efficient and effective than other HS algorithms. However, in their paper, they used the DANGHS algorithm to solve only 14 benchmark continuous optimization problems. In this paper, we investigate the searching performance of the DANGHS algorithm further. Therefore, a DANGHS-ANN recidivism forecasting system is proposed. According to the numerical results, the DANGHS-ANN system provided more accurate forecasts than the five other systems (BPN, HS-ANN, IHS-ANN, SGHS-ANN, and NGHS-ANN).
The remainder of this paper is divided into three sections. Section 2 reviews the harmony search (HS), improved harmony search (IHS), self-adaptive global best harmony search (SGHS), novel global harmony search (NGHS), and dynamic adjusting novel global harmony search (DANGHS) algorithms. Section 3 discusses the experiments carried out to test and compare the performances of the six forecasting systems. Conclusions and suggestions for future research are provided in Section 4.

A Review of Five Harmony Search Algorithms
In this section, HS, IHS, SGHS, NGHS, and DANGHS are reviewed.

Harmony Search Algorithm
Geem, Kim, and Loganathan [23] first proposed the HS algorithm in 2001. In concept, HS is similar to other metaheuristic algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), and ant colony optimization (ACO). These algorithms combine rules and randomness to imitate the processes that inspired them. The HS algorithm draws its inspiration not from biological or physical processes but from the improvisation process of musicians, such as a jazz trio [11,14].
In musical improvisation, musicians play pitches within a set range, and then combine and order them to form a harmony. If the harmony is pleasant, it is stored in each musician's memory, thereby increasing the possibility of performing it again in the future [24]. Similarly, in engineering optimization, each decision variable initially selects a value within a given feasible range, and all the variables together give a single solution vector [20]. In the HS algorithm, the trial solution (harmony) for the problem is composed of several decision variable values (pitches). Thus, a pleasing harmony corresponds to a good trial solution to the problem [11]. If all the decision variable values compose a good trial solution, then they are stored in each variable's memory, and the possibility of generating a good solution in the future is increased [20]. Figure 4 shows a comparison between music improvisation and engineering optimization. In Figure 4, each musician of the jazz trio plays an instrument simultaneously to form a harmony; the pitch of the piano represents the value of decision variable 1.

The HS algorithm consists of several parameters: the harmony memory size (m), the harmony memory considering rate (HMCR), the pitch adjusting rate (PAR), the bandwidth (BW), and the maximum number of iterations (NI). Among these parameters, the HMCR, PAR, and BW are particularly important because HS generates a new trial solution from the harmony memory (HM) or by random selection according to the HMCR, and then adjusts the new trial solution using the PAR and BW. The whole search process of the HS algorithm can be described as the following steps.

•
Step 1: Determine the problem and initial algorithm parameters, including m, HMCR, PAR, BW, current iteration k = 1, and NI.

•
Step 2: Generate the initial solutions (harmony memory) randomly and calculate the fitness of each solution.

•
Step 3: Generate a trial solution by HMCR, PAR, and BW. The pseudocode of the HS algorithm is shown in Algorithm 1.
Algorithm 1 The Pseudocode of HS

1: For j = 1 to D do
2:   If r_1 ≤ HMCR then
3:     x_j^{k+1} = x_ij^k, where i is selected from {1, 2, ..., m} at random
4:     If r_2 ≤ PAR then
5:       x_j^{k+1} = x_j^{k+1} ± r_3 × BW
6:     End
7:   Else
8:     x_j^{k+1} = x_jL + r_4 × (x_jU − x_jL)
9:   End
10: End

Here, D represents the number of problem dimensions; r_1, r_2, r_3, and r_4 represent random numbers in the region of [0, 1]; x_ij^k (i = 1, 2, ..., m; j = 1, 2, ..., D) represents the jth component of the ith solution in current iteration k; x_jL and x_jU represent the lower and upper bounds of decision variable x_j.

•
Step 4: If the trial solution is better than the worst solution in the HM, replace the worst solution by the trial solution.

•
Step 5: If the maximum number of iterations NI is satisfied, return the best solution in the HM; otherwise, set the current iteration k = k + 1 and go back to Step 3.
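The five steps above can be sketched as a compact implementation. This is a Python sketch of the standard HS formulation; the function name, parameter defaults, and the demo sphere objective are our own illustrations, not taken from the paper.

```python
import random

def harmony_search(f, D, lb, ub, m=10, HMCR=0.9, PAR=0.3, BW=0.05, NI=2000):
    """Minimize f over [lb, ub]^D with the basic HS algorithm."""
    # Steps 1-2: set the parameters and fill the harmony memory randomly.
    HM = [[random.uniform(lb, ub) for _ in range(D)] for _ in range(m)]
    fit = [f(h) for h in HM]
    for _ in range(NI):
        # Step 3: improvise a trial solution via HMCR, PAR, and BW.
        trial = []
        for j in range(D):
            if random.random() <= HMCR:            # memory consideration
                xj = HM[random.randrange(m)][j]
                if random.random() <= PAR:         # pitch adjustment
                    xj = min(max(xj + (2 * random.random() - 1) * BW, lb), ub)
            else:                                  # random selection
                xj = random.uniform(lb, ub)
            trial.append(xj)
        # Step 4: replace the worst harmony if the trial is better.
        worst = max(range(m), key=fit.__getitem__)
        ft = f(trial)
        if ft < fit[worst]:
            HM[worst], fit[worst] = trial, ft
    # Step 5: return the best harmony found.
    best = min(range(m), key=fit.__getitem__)
    return HM[best], fit[best]

random.seed(7)  # fixed seed so the demo run is reproducible
x, fx = harmony_search(lambda v: sum(t * t for t in v), D=5, lb=-10.0, ub=10.0)
```

In the metaheuristic-ANN setting of this paper, the objective f would be the MSE or error rate of the network whose weights and biases are given by the trial vector.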

Improved Harmony Search Algorithm
Mahdavi, Fesanghary, and Damangir [25] presented the IHS algorithm in 2007 for solving optimization problems. The main difference between IHS and the traditional HS method is that two key parameters, PAR and BW, are adjusted in each iteration, as shown in Equations (3) and (4). In their paper, they state that PAR and BW are important parameters for searching decision variable values, and that these two parameters can be useful in speeding up the convergence of HS to the optimal solution. Therefore, the fine adjustment of these parameters is of particular interest.
In Equation (3), PAR_k represents the pitch adjustment rate in the current iteration k; PAR_min is the minimum adjustment rate, and PAR_max is the maximum adjustment rate. In Equation (4), BW_k is the distance bandwidth in the current iteration k; BW_min is the minimum bandwidth, and BW_max is the maximum bandwidth. Figure 5 shows that the PAR and BW values vary dynamically with the iteration number.
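Equations (3) and (4) do not survive in this copy; in Mahdavi et al.'s formulation [25], which matches the symbol definitions above, they take the form:

```latex
PAR_k = PAR_{\min} + \frac{PAR_{\max} - PAR_{\min}}{NI}\, k \quad (3)

BW_k = BW_{\max}\exp\!\left(\frac{\ln\left(BW_{\min}/BW_{\max}\right)}{NI}\, k\right) \quad (4)
```

so that PAR increases linearly and BW decreases exponentially as the iteration counter k runs from 1 to NI.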


Self-Adaptive Global Best Harmony Search Algorithm
In 2010, Pan et al. [21] introduced the SGHS algorithm for continuous optimization problems. The main difference between SGHS and the traditional HS method is that HMCR and PAR are dynamically adjusted using normal distributions, and BW is altered in each iteration.
In each iteration k, SGHS generates the value of HMCR_k from the mean HMCR (HMCR_m) and its standard deviation. Similarly, the value of PAR_k is calculated from the mean PAR (PAR_m) and its standard deviation. In their paper, HMCR_m is in the range [0.9, 1.0] and the standard deviation of HMCR is 0.01; PAR_m is in the range [0.0, 1.0] and the standard deviation of PAR is 0.05.
Furthermore, when the generated harmony is better than the worst harmony in the HM, SGHS records the HMCR_k and PAR_k. After a specified learning period (LP), SGHS recalculates HMCR_m by averaging all the HMCR_k values recorded during the learning period. In the same way, PAR_m is recalculated by averaging all the recorded PAR_k values. In subsequent iterations, SGHS generates new HMCR_k and PAR_k values from the new HMCR_m, PAR_m, and the given standard deviations.
In addition, BW_k decreases during the first half of the iterations, as shown in Equation (5), and is then fixed at BW_min for the second half of the iterations.
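Equation (5) is also missing from this copy; in Pan et al.'s formulation [21] it takes the form:

```latex
BW_k = \begin{cases}
BW_{\max} - \dfrac{BW_{\max}-BW_{\min}}{NI}\, 2k, & k < NI/2 \\[4pt]
BW_{\min}, & k \ge NI/2
\end{cases} \quad (5)
```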
The whole search process of SGHS algorithm can be described as the following steps.

•
Step 1: Determine the problem and initial algorithm parameters, including m, HMCR_m, PAR_m, BW_max, BW_min, LP, current iteration k = 1, and NI.

•
Step 2: Generate the initial solutions (harmony memory) randomly and calculate the fitness of each solution.

•
Step 3: Generate the algorithm parameters in current iteration k, including HMCR_k, PAR_k, and BW_k.

•
Step 4: Generate a trial solution by HMCR_k, PAR_k, and BW_k. The pseudocode of the SGHS algorithm is shown in Algorithm 2.

Algorithm 2 The Pseudocode of SGHS [21]

1: For j = 1 to D do
2:   If r_1 ≤ HMCR_k then
3:     x_j^{k+1} = x_ij^k ± r_2 × BW_k, where i is selected from {1, 2, ..., m} at random
4:     If x_j^{k+1} > x_jU then
5:       x_j^{k+1} = x_jU
6:     Else if x_j^{k+1} < x_jL then
7:       x_j^{k+1} = x_jL
8:     End
9:     If r_3 ≤ PAR_k then
10:      x_j^{k+1} = x_best,j^k
11:    End
12:  Else
13:    x_j^{k+1} = x_jL + r_4 × (x_jU − x_jL)
14:  End
15: End

Here, x_best,j^k represents the jth component of the best solution in current iteration k.

•
Step 5: If the trial solution is better than the worst solution in the HM, replace the worst solution by the trial solution and record the values of HMCR and PAR in current iteration k.

•
Step 6: Recalculate HMCR_m and PAR_m.

•
Step 7: If the maximum number of iterations NI is satisfied, return the best solution in the HM; otherwise, set the current iteration k = k + 1 and go back to Step 3.
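The self-adaptive parameter bookkeeping of Steps 3, 5, and 6 can be sketched as follows. This is a Python sketch under the settings quoted in the text (HMCR sampled with standard deviation 0.01, PAR with 0.05, means recomputed every LP iterations); the class and method names, and the default mean values, are our own illustrations.

```python
import random
import statistics

class SGHSParams:
    """Self-adaptive HMCR/PAR bookkeeping for SGHS."""

    def __init__(self, hmcr_m=0.98, par_m=0.9, lp=100):
        self.hmcr_m, self.par_m, self.lp = hmcr_m, par_m, lp
        self.hmcr_hist, self.par_hist = [], []
        self.k = 0

    def draw(self):
        # Step 3: sample HMCR_k and PAR_k from normal distributions
        # centered on the current means, clipped to [0, 1].
        hmcr = min(max(random.gauss(self.hmcr_m, 0.01), 0.0), 1.0)
        par = min(max(random.gauss(self.par_m, 0.05), 0.0), 1.0)
        return hmcr, par

    def record_success(self, hmcr, par):
        # Step 5: remember the parameters of an improving harmony.
        self.hmcr_hist.append(hmcr)
        self.par_hist.append(par)

    def end_iteration(self):
        # Step 6: after each learning period, recenter the means on the
        # average of the recorded values and start a fresh history.
        self.k += 1
        if self.k % self.lp == 0 and self.hmcr_hist:
            self.hmcr_m = statistics.mean(self.hmcr_hist)
            self.par_m = statistics.mean(self.par_hist)
            self.hmcr_hist, self.par_hist = [], []
```

The main SGHS loop would call `draw()` at the start of each iteration, `record_success()` whenever the trial replaces the worst harmony, and `end_iteration()` at the end.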

Novel Global Harmony Search Algorithm
Zou et al. proposed the NGHS algorithm in 2010 for task assignment problems [18], continuous optimization problems [19], and unconstrained problems [20]. The NGHS algorithm is an improved algorithm that combines HS, PSO [26-29], and GA [30-32]. A prominent characteristic of PSO is that individual particles attempt to imitate the social experience; that is, particles are affected by other, better particles in the PSO algorithm. A prominent characteristic of GA is that the trial solution can escape from a local optimum through mutation. In other words, NGHS tries to generate a new trial solution by moving the worst solution toward the best solution or by mutation. In addition, HMCR, PAR, and BW are excluded from NGHS, while the genetic mutation probability (p_m) is included. Moreover, NGHS replaces the worst solution in the HM with a new solution, even if the new solution is worse than the worst solution. These three characteristics are the key differences between the HS and NGHS algorithms. The whole search process of the NGHS algorithm can be described as the following steps.

•
Step 1: Determine the problem and initial algorithm parameters, including m, p_m, current iteration k = 1, and NI.

•
Step 2: Generate the initial solutions (harmony memory) randomly and calculate the fitness of each solution.

•
Step 3: Generate a trial solution by p_m. The pseudocode of the NGHS algorithm is shown in Algorithm 3.

Algorithm 3 The Pseudocode of NGHS [18-20]

1: For j = 1 to D do
2:   x_R = 2 × x_best,j^k − x_worst,j^k
3:   If x_R > x_jU then
4:     x_R = x_jU
5:   Else if x_R < x_jL then
6:     x_R = x_jL
7:   End
8:   x_j^{k+1} = x_worst,j^k + r_1 × (x_R − x_worst,j^k)
9:   If r_2 ≤ p_m then
10:    x_j^{k+1} = x_jL + r_3 × (x_jU − x_jL)
11:  End
12: End

Here, x_R represents the trust region, and x_worst,j^k represents the jth component of the worst solution in current iteration k.

•
Step 4: Replace the worst solution by the trial solution, even if the trial solution is worse than the worst solution.

•
Step 5: If the maximum number of iterations NI is satisfied, return the best solution in the HM; otherwise, set the current iteration k = k + 1 and go back to Step 3.
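Steps 3 and 4 can be sketched as follows. This Python sketch follows the standard NGHS trial-generation rule (position updating toward the trust region, then optional genetic mutation); the function name and demo values are our own illustrations, not from the paper.

```python
import random

def nghs_trial(HM, fit, lb, ub, pm):
    """Improvise one NGHS trial solution (Algorithm 3).

    Returns the trial vector and the index of the worst harmony, which
    the caller replaces unconditionally (Step 4), even if the trial is
    worse than the worst harmony.
    """
    m, D = len(HM), len(HM[0])
    best = min(range(m), key=fit.__getitem__)    # minimization
    worst = max(range(m), key=fit.__getitem__)
    trial = []
    for j in range(D):
        xr = 2.0 * HM[best][j] - HM[worst][j]    # trust region
        xr = min(max(xr, lb), ub)                # clamp to the bounds
        # Position updating: move the worst component toward the trust region.
        xj = HM[worst][j] + random.random() * (xr - HM[worst][j])
        if random.random() <= pm:                # genetic mutation
            xj = lb + random.random() * (ub - lb)
        trial.append(xj)
    return trial, worst
```

Because the replacement is unconditional, NGHS keeps the memory moving even through temporarily worse solutions, which is one of the three differences from HS listed above.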

Dynamic Adjusting Novel Global Harmony Search
Chiu et al. [17] first proposed the DANGHS algorithm for continuous optimization problems. As mentioned above, the main difference between DANGHS and NGHS is that the genetic mutation probability (p_m) is dynamically adjusted in each iteration by an adjustment strategy. Chiu et al. pointed out that the mutation probability can be adjusted using different strategies; hence, 16 different strategies were investigated in their paper. All 16 strategies are shown in Table 1, and Figures 6-8 illustrate them. In Table 1, p_m^k is the mutation probability in the current iteration k, p_m_min is the minimum genetic mutation probability, p_m_max is the maximum genetic mutation probability, mr is the modification rate, and cc is the coefficient of cycle.
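Two such schedules can be illustrated as follows. These Python sketches show a plain linearly decreasing schedule and our reading of a "threshold linear posterior decreasing" schedule (hold p_m at its maximum for the first half of the run, then decrease linearly); the parameter values are illustrative, and Table 1 gives the authors' exact formulas for all 16 strategies.

```python
def pm_linear_decreasing(k, NI, pm_min=0.005, pm_max=0.1):
    """Linearly anneal the genetic mutation probability over the run."""
    return pm_max - (pm_max - pm_min) * k / NI

def pm_threshold_linear_posterior(k, NI, pm_min=0.005, pm_max=0.1):
    """Hold p_m at its maximum for the first half of the run, then
    decrease it linearly to p_m_min over the second half."""
    if k < NI / 2:
        return pm_max
    return pm_max - (pm_max - pm_min) * (2 * k - NI) / NI
```

DANGHS simply evaluates the chosen schedule at the start of each iteration and passes the resulting p_m^k to the NGHS trial-generation step.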

Data Setting
The investigation samples were provided by the Information Department of the Taiwan Ministry of Justice. The samples are criminal tracing records established over three years, from July 16, 2007 to July 15, 2010. The data are used solely for academic research on predicting recidivism. To ensure personal privacy, the samples were preprocessed (de-identified).
The total number of samples collected for this paper was 9498: 8569 male and 929 female. For the purposes of this paper, the definition of recidivism was "having a record of prosecution." Of the samples collected, 5408 (56.94%) were recidivists and 4090 (43.06%) were non-recidivists. The input and output variables that were used are shown in Table 2.
Of the original 9498 samples, some were found to have the same combination of input variables but different output variables, as shown in Figure 9a. Such samples make it impossible to accurately train the ANN. Therefore, in this paper, we used a statistical method to recalculate the output variable as the recidivism rate, as shown in Figure 9b. After recalculation, the total number of samples was 6825. A recidivism rate above or equal to 0.5 was defined as high, and a rate below 0.5 was defined as low, as shown in the last column of Table 2.
Lastly, proper data representation plays an important role in the design of a successful ANN [33]. Therefore, the output variable was encoded as two binary digits representing the rate of recidivism, where 10 represented a low rate and 01 a high rate, and all data were standardized to values between 0.1 and 0.9. Hence, there are 12 neurons in the input layer and 2 in the output layer.
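The recalculation and encoding just described can be sketched as follows. This is a Python sketch of our reading of the procedure (collapse samples sharing an input combination into an empirical rate, then threshold at 0.5 and encode as two binary digits); the data layout and function names are our own, not from the paper.

```python
from collections import defaultdict

def recidivism_rates(samples):
    """Collapse samples sharing the same input combination into one
    record whose target is the empirical recidivism rate (Figure 9b).

    samples is an iterable of (features_tuple, reoffended) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # features -> [recidivists, total]
    for features, reoffended in samples:
        counts[features][0] += int(reoffended)
        counts[features][1] += 1
    return {k: r / n for k, (r, n) in counts.items()}

def encode(rate):
    """Binary target coding from the text: (1, 0) = low, (0, 1) = high,
    with rates >= 0.5 defined as high."""
    return (0, 1) if rate >= 0.5 else (1, 0)
```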

Computing Environment Settings
We used Microsoft Visual Studio 2010 C# (64-bit) as the compiler for writing the program to find the solution. The solution-finding equipment comprised an Intel Core (TM) i7-4720HQ (2.6 GHz) CPU, 8 GB of memory, and the Windows 10 Home edition (64-bit) OS.


Experimental Structure and Results of BPN
In the BPN experiments, 65% of the samples were used for training, 25% for validation, and 10% for testing [34]. However, the performance of the forecasting system could be affected by the quality of the training samples. To prevent this, previously conducted studies have used K-fold cross-validation. In this paper, we also used K-fold cross-validation (K = 10) for repeated experiments to verify the robustness of the experimental results under different training samples. In each K-fold experiment, 30 independent experiments were carried out. The total number of training iterations (epochs) was 10,000. One hidden layer was used, and the number of hidden neurons (NHN) was set to five according to Equation (6) [33]. In Equation (6), N_in represents the number of input neurons and N_out represents the number of output neurons. Therefore, the total number of weights and biases is 77, and the example-to-weight ratio (EWR) is 88.64, as calculated in Equation (7). Dowla and Rogers [35] and Haykin [36] found that the EWR needs to be larger than 10; hence, the ANN used in this paper has a reasonable and acceptable structure. Figure 10 shows the MLP structure used in this paper. For the purposes of this study, the investigation is a classification problem in which the forecasting output is either a high (01) or low (10) recidivism rate. Therefore, the prediction error used is the error rate, as given in Equation (2).
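The bookkeeping behind these numbers can be verified directly. Equation (6) itself is not shown in this excerpt; the geometric-mean rule below is an assumed form that reproduces NHN = 5 for 12 inputs and 2 outputs.

```python
def mlp_parameter_count(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected MLP with one hidden layer."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

n_in, n_out = 12, 2
nhn = round((n_in * n_out) ** 0.5)              # assumed Equation (6); gives 5
params = mlp_parameter_count(n_in, nhn, n_out)  # 12*5 + 5 + 5*2 + 2 = 77
ewr = 6825 / params                             # Equation (7): about 88.64 > 10
```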
Lastly, we used the trial-and-error method to decide the best learning rate (η) and momentum (α) of the BPN. We defined three learning rates: 0.1, 0.2, and 0.3. Zupan and Gasteiger suggest that the learning rate and the momentum sum to 1 [37]; therefore, the three corresponding momenta are 0.9, 0.8, and 0.7. The trial-and-error experimental results are shown in Table 3. In Table 3, LR represents the learning rate, MT the momentum, and Std the standard deviation; each value represents the mean error rate of 30 independent experiments. For example, 3.37 × 10⁻¹ represents the mean error rate of 30 independent experiments where LR is set to 0.1, MT is set to 0.9, and K is set to 1. Based on the means (3.25 × 10⁻¹, 3.53 × 10⁻¹, and 3.52 × 10⁻¹) and the p-value, we can observe that the error rate where the learning rate is set to 0.1 and the momentum to 0.9 is significantly smaller than those of the other two parameter combinations in the training, validation, and testing datasets.
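The trial-and-error procedure amounts to a small grid search over (η, α) pairs constrained by η + α = 1. Here `evaluate` is a hypothetical stand-in for running the 30 BPN experiments at one parameter setting:

```python
def trial_and_error(evaluate, learning_rates=(0.1, 0.2, 0.3)):
    """Pick the (learning rate, momentum) pair with the lowest mean error
    rate, pairing each rate eta with momentum 1 - eta, following the
    suggestion that the two sum to 1."""
    best = None
    for eta in learning_rates:
        alpha = 1.0 - eta
        err = evaluate(eta, alpha)  # mean error rate over 30 runs
        if best is None or err < best[2]:
            best = (eta, alpha, err)
    return best
```

With the means reported in Table 3, this selects η = 0.1 and α = 0.9, matching the conclusion drawn from the p-values.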

Experimental Results of the Metaheuristic-ANN Forecasting System
We combined the DANGHS algorithm with an ANN (DANGHS-ANN) to solve the recidivism rate prediction problem. In order to verify the performance of the DANGHS-ANN forecasting system, we compared the extensive experimental results of DANGHS-ANN with those of five other systems, including various HS-ANNs and one BPN. We referred to previous references [17,20-22] and used the trial-and-error method to decide the parameters of the different HS algorithms. The parameters of the compared HS algorithms are shown in Table 4. For each algorithm, in each K-fold experiment, thirty independent experiments (n) were carried out with 10,000 iterations. No overtraining was detected in the metaheuristic-ANN forecasting systems; therefore, we combined the training and validation datasets into the learning dataset. This means that 90% of the samples were used for learning and 10% for testing. The experimental results from the learning and testing datasets, obtained using the 16 different adjustment strategies in the DANGHS-ANN forecasting system, are shown in Tables 5 and 6. The experimental results from the learning and testing datasets, as obtained using the six different forecasting systems, are shown in Tables 7 and 8.
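The 10-fold protocol partitions the 6825 samples so that each fold serves once as the 10% testing set while the remaining folds form the 90% learning set. A minimal index-partition sketch (the helper name is ours):

```python
def kfold_splits(n_samples, k=10):
    """Return k (learn_idx, test_idx) pairs: each fold is the testing set
    exactly once, and the remaining folds form the learning set
    (training plus validation combined)."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    splits, start = [], 0
    for i in range(k):
        # spread the remainder over the first few folds
        stop = start + fold_size + (1 if i < remainder else 0)
        test = indices[start:stop]
        learn = indices[:start] + indices[stop:]
        splits.append((learn, test))
        start = stop
    return splits
```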
Figure 11 presents a typical solution history graph of the six different forecasting systems over the iterations. Tables 9-14 present the comparisons between actual and predicted recidivism for the six forecasting systems. In Table 5, several experimental results are given. Firstly, in the learning dataset where K = 1, the error rate (3.1190 × 10⁻¹) with the threshold linear posterior decreasing strategy (Threshold_4) is lower than those of the other strategies. Secondly, the mean error rate (3.1132 × 10⁻¹) with Threshold_4 is the lowest of all strategies. Thirdly, the standard deviation (1.2896 × 10⁻⁶) with Threshold_4 is lower than those of the other strategies. Lastly, according to the p-value, the error rate for Threshold_4 is significantly lower than those of the other strategies, except for the straight linear decreasing strategy (Straight_2).
In Table 6, several experimental results are given. Firstly, in the testing dataset where K = 1, the error rate (3.3934 × 10⁻¹) with the threshold linear posterior decreasing strategy (Threshold_4) is lower than those of the other strategies. Secondly, the mean error rate (3.3878 × 10⁻¹) with Threshold_4 is the lowest of all the strategies. Thirdly, the standard deviation (3.2137 × 10⁻⁵) with the threshold linear prior increasing strategy (Threshold_1) is lower than those of the other strategies. However, according to the p-value, the error rate with Threshold_4 is significantly lower than those of the other strategies, except for strategy Straight_2. Based on the experimental results in Tables 5 and 6, the best strategy for the DANGHS-ANN forecasting system, and specifically for the recidivism prediction problem, is the threshold linear posterior decreasing strategy.
In Table 7, several experimental results are given. Firstly, in the learning dataset, the DANGHS-ANN error rates are the lowest of all the forecasting systems for all K-fold experiments. Additionally, the DANGHS-ANN mean error rate (3.1132 × 10⁻¹) and standard deviation (1.1356 × 10⁻³) are the lowest of all the forecasting systems. This means that the forecasting ability of the DANGHS-ANN system is relatively robust. Also, according to the p-value, the DANGHS-ANN error rate is significantly lower than those of the other systems in the learning dataset. Lastly, the mean error rate of the IHS-ANN system (3.1315 × 10⁻¹) is lower than that of the HS-ANN system (3.1357 × 10⁻¹), and the mean error rate of the DANGHS-ANN system (3.1132 × 10⁻¹) is lower than that of the NGHS-ANN system (3.1545 × 10⁻¹).
In Table 8, several experimental results are given. Firstly, in the testing dataset, most of the DANGHS-ANN error rates are smaller than those of the other forecasting systems, except where K = 2 and K = 7; in those two folds, the IHS-ANN system has the lowest error rates (3.3982 × 10⁻¹ and 3.4056 × 10⁻¹). Secondly, the DANGHS-ANN mean error rate (3.3878 × 10⁻¹) is lower than those of the other forecasting systems. Thirdly, the SGHS-ANN standard deviation (5.7819 × 10⁻³) is the lowest of all the forecasting systems, which means that the forecasting ability of the SGHS-ANN system is relatively robust. However, according to the p-value, the error rate of the DANGHS-ANN system is significantly lower than those of the other systems in the testing dataset. Lastly, the IHS-ANN mean error rate (3.4039 × 10⁻¹) is lower than that of the HS-ANN system (3.4126 × 10⁻¹), and the DANGHS-ANN mean error rate (3.3878 × 10⁻¹) is lower than that of the NGHS-ANN system (3.4421 × 10⁻¹).
In Figure 11, four experimental results are given. Firstly, the BPN system graph displays a horizontal line, which means that it easily falls into a local optimum and is unlikely to escape it. The SGHS-ANN and NGHS-ANN graphs show that these systems also fall into local optima in the later iterations. However, the HS-ANN, IHS-ANN, and DANGHS-ANN graphs show that these three systems continuously decrease the error rate as the iterations progress. Moreover, comparing the NGHS-ANN and DANGHS-ANN graphs, we find that the DANGHS-ANN error rate is lower than that of the NGHS-ANN system. In other words, the DANGHS-ANN system has a better forecasting ability than the NGHS-ANN system.
Finally, we analyze and discuss the comparison between actual and predicted recidivism. In Table 9, the number of predictions of low recidivism by the BPN system is 2652, of which the actual number of high-recidivism cases is 886, a high-recidivism rate of 33.41%. In Table 10, the number of predictions of low recidivism by the HS-ANN system is 2596, of which the actual number of high-recidivism cases is 841, a rate of 32.40%. In Table 11, the number of predictions of low recidivism by the IHS-ANN system is 2397, of which the actual number of high-recidivism cases is 767, a rate of 32.00%. In Table 12, the number of predictions of low recidivism by the SGHS-ANN system is 2807, of which the actual number of high-recidivism cases is 931, a rate of 33.17%. In Table 13, the number of predictions of low recidivism by the NGHS-ANN system is 2401, of which the actual number of high-recidivism cases is 792, a rate of 32.99%. Lastly, in Table 14, the number of predictions of low recidivism by the DANGHS-ANN system is 2508, of which the actual number of high-recidivism cases is 798, a rate of 31.82%.
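These percentages follow directly from the confusion counts in the tables; for instance, for the BPN and DANGHS-ANN systems:

```python
def low_group_high_rate(predicted_low, actual_high_among_low):
    """Percentage of prisoners predicted as low-recidivism who were in
    fact high-recidivism cases."""
    return 100.0 * actual_high_among_low / predicted_low

# Counts taken from Tables 9 and 14 in the text.
bpn = low_group_high_rate(2652, 886)      # about 33.41% (BPN)
danghs = low_group_high_rate(2508, 798)   # about 31.82% (DANGHS-ANN)
```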
Particular attention is paid to the decisions made for managing the prisoners. For example, the management decision for prisoners predicted to fall within the high-recidivism category is to "continue serving their sentences"; this kind of decision positively affects the safety of the public. Conversely, prisoners predicted to fall within the low-recidivism category re-enter society through commutation or parole, and their subsequent behavior could directly affect the safety of the public. The purpose of this paper is to develop a recidivism rate forecasting system to reduce the associated reoffences and improve public acceptance of commutation and parole policies. According to the above argument, the type I error of the traditional presumption-of-innocence principle is not applicable to this paper. In Tables 9-14, the prisoners predicted by the BPN forecasting system to fall into the low-recidivism category had the highest actual recidivism rate (33.41%) among all forecasting systems, whereas the prisoners predicted by the DANGHS-ANN forecasting system to fall into the low-recidivism category had the lowest actual recidivism rate (31.82%).
According to the above experimental results, the DANGHS-ANN system was the most accurate, in comparison with the BPN and the four other HS-related ANN forecasting systems, in predicting the recidivism rates of commuted prisoners.

Conclusions and Future Research
We presented a combined DANGHS-ANN forecasting system. Extensive experiments and comparisons were carried out to solve the problem of accurately predicting recidivism rates for commuted prisoners. The experimental results provide several findings that are worth noting.
Firstly, using the threshold linear posterior decreasing strategy with the DANGHS-ANN forecasting system yielded the best results for recidivism prediction. Additionally, according to the comparison of the six forecasting systems, the metaheuristic algorithm performs better searches with the dynamic parameter adjustment strategy than without it.

Figure 4. Comparison between music improvisation and engineering optimization.

Figure 5. (a) Linear variation of pitch adjusting rate (PAR) with the iteration number; (b) nonlinear variation of bandwidth (BW) with the iteration number.

Figure 6. Straight linear and threshold linear strategies.

Figure 9. Example of the original data and the statistical data. (a) Original data; (b) statistical data.

Table 3. Trial and error experimental results of the backpropagation network (BPN).

Table 4. Parameters of the compared harmony search (HS) algorithms. 1 m: the harmony memory size; 2 HMCR: the harmony memory considering rate; 3 PAR: the pitch adjusting rate; 4 BW: the bandwidth; 5 LP: the learning period; 6 p_m: the genetic mutation probability; 7 NI: the maximum number of iterations; 8 n: the total number of independent experiments.

Table 9. Comparison between the actual and predicted recidivism in the BPN system.

Table 10. Comparison between the actual and predicted recidivism in the HS-ANN system.

Table 11. Comparison between the actual and predicted recidivism in the IHS-ANN system.

Table 12. Comparison between the actual and predicted recidivism in the SGHS-ANN system.

Table 13. Comparison between the actual and predicted recidivism in the NGHS-ANN system.

Table 14. Comparison between the actual and predicted recidivism in the DANGHS-ANN system.