# Hybrid Approach with Improved Genetic Algorithm and Simulated Annealing for Thesis Sampling


## Abstract


## 1. Introduction

## 2. Background

#### 2.1. Thesis Inspection via Sampling

#### 2.2. Sampling Rules

- (1) The sampling rate of master's theses from each master-degree-conferring institution is about 5%.
- (2) About 10% of foreign master's theses need to be extracted for sampling inspection.
- (3) The sampling rate of theses whose tutor guided ten or more postgraduates in the same year is about 10%.
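As a quick illustration of these rules, the sketch below (helper and parameter names are mine, not the paper's) computes the target number of theses each rule would extract from given pool sizes:

```python
# Hypothetical helper (names are assumptions): given the number of theses
# falling under each rule, compute how many theses each rule extracts.
def target_counts(n_per_institution, n_foreign, n_crowded_tutor):
    """Target sample sizes under rules (1)-(3): ~5%, ~10%, ~10%."""
    return {
        "institution": round(n_per_institution * 0.05),   # rule (1): ~5%
        "foreign": round(n_foreign * 0.10),               # rule (2): ~10%
        "crowded_tutor": round(n_crowded_tutor * 0.10),   # rule (3): ~10%
    }

print(target_counts(400, 50, 30))  # {'institution': 20, 'foreign': 5, 'crowded_tutor': 3}
```

Because the rates are only "about" 5% or 10%, the optimization below measures how far a concrete sampling scheme deviates from these targets.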

#### 2.3. Genetic Algorithm

- It has no bias toward specific problem areas.
- It has the capacity to perform a fast, randomized search.
- It uses a simple search process driven by the evaluation function.
- Its iterations rely on a probabilistic (random) mechanism.
- It is extensible and easy to integrate with other algorithms.

#### 2.4. Simulated Annealing

- Step 1: Initialization: choose a sufficiently large initial temperature T; take the original best solution x obtained after the GA is applied, and let ${x}_{new}$ denote the solution obtained from the current iteration.
- Step 2: Calculate the increment $\Delta f=f\left({x}_{new}\right)-f\left(x\right)$, where $f$ is the optimization target. Here, the goal is to minimize the mean absolute error $\epsilon$ calculated according to the sampling rules after the sampling result is obtained by decoding the solution; we use $1-\epsilon$ as the fitness function value (see Section 3.4).
- Step 3: If $\Delta f>0$, ${x}_{new}$ is accepted as the new current solution; otherwise, ${x}_{new}$ is accepted as the new current solution with probability $\mathrm{exp}(\frac{\Delta f}{kT})$.
- Step 4: If the termination condition is satisfied, output the current solution as the optimal solution and terminate. Otherwise, generate a new ${x}_{new}$ and go to Step 2.
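The acceptance rule in Step 3 can be sketched in Python as follows (a minimal illustration; since the fitness $f = 1 - \epsilon$ is maximized, an improving move has a positive increment):

```python
import math
import random

def sa_accept(delta_f, T, k=1.0, rng=random):
    """Metropolis-style acceptance for a maximized fitness (Step 3).

    An improving move (delta_f > 0) is always accepted; a worsening move is
    accepted with probability exp(delta_f / (k * T)), which shrinks toward
    zero as the temperature T is lowered.
    """
    if delta_f > 0:
        return True
    return rng.random() < math.exp(delta_f / (k * T))
```

Early on, when T is large, even clearly worse solutions are accepted often, which helps the search escape local optima; as T decays, the rule approaches pure hill climbing.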

## 3. The Proposed Algorithm

#### 3.1. The Main Idea

- Step 1: Initialize the variables of GA and SA; set the initial annealing temperature ${T}_{0}$, temperature reduction parameter k, population size s, crossover probability $pcross$ and mutation probability $pmutation$ to the values obtained from the experiment in Section 4.2;
- Step 2: Randomly generate the initial population ${p}_{0}$ (of size 10) and encode each chromosome;
- Step 3: Calculate the mean absolute error $\epsilon$ of the sampling scheme corresponding to each chromosome according to the sampling rules, and set each chromosome's fitness to $f=1-\epsilon$ (see Section 3.4);
- Step 4: Use the roulette wheel method to select s chromosomes from ${p}_{0}$ to constitute the new population p; the probability of choosing an individual depends directly on its fitness value f;
- Step 5: Randomly choose two chromosomes from p and apply the crossover operator to them;
- Step 6: Apply the mutation operator described in Section 3.3 to the new population;
- Step 7: Let the current population be the new population;
- Step 8: If the mean absolute error $\epsilon$ is less than 0.2%, the convergence criterion is satisfied; stop. Otherwise, go to Step 4.
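The steps above can be sketched as the following Python skeleton. The crossover and mutation operators are left abstract (the SA-based acceptance happens inside mutation, per Section 3.3), and the fitness is assumed to be $f = 1 - \epsilon$ as in Step 3:

```python
import random

def roulette_select(population, fitness, s, rng=random):
    """Step 4: fitness-proportionate (roulette-wheel) selection of s chromosomes."""
    return rng.choices(population, weights=fitness, k=s)

def ga_loop(init_pop, fitness_fn, crossover, mutate, s, eps_target=0.002, max_iter=10000):
    """Steps 2-8 with abstract operators; stops once eps = 1 - f drops below 0.2%."""
    pop = list(init_pop)
    best = max(pop, key=fitness_fn)
    for _ in range(max_iter):
        fit = [fitness_fn(c) for c in pop]
        if 1.0 - max(fit) < eps_target:   # convergence criterion of Step 8
            break
        pop = mutate(crossover(roulette_select(pop, fit, s)))
        best = max(pop + [best], key=fitness_fn)
    return best
```

This is only a structural sketch under the stated assumptions, not the authors' implementation; the concrete operators are defined in Sections 3.2 and 3.3.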

#### 3.2. Coding and Initialization Methods

- Floating-point coding does not require a decoding process.
- It can represent a larger range of numbers, which is convenient when searching larger spaces.
- It simplifies the traditional genetic algorithm and improves computational efficiency.
- Furthermore, floating-point encoding avoids the precision errors caused by conversion between number systems.
- It is easy to combine with classical optimization methods.
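The paper's exact encoding is not reproduced here; as one common illustration of float genes (an assumption of mine, the "random-key" scheme), a chromosome can carry one float per candidate thesis, with the k largest keys selecting the sample:

```python
import random

def random_key_chromosome(n_theses, rng=random):
    """One float gene per candidate thesis (hypothetical random-key encoding)."""
    return [rng.random() for _ in range(n_theses)]

def decode_sample(chromosome, k):
    """Indices of the k theses with the largest keys."""
    order = sorted(range(len(chromosome)), key=lambda i: chromosome[i], reverse=True)
    return sorted(order[:k])

print(decode_sample([0.1, 0.9, 0.5, 0.7], 2))  # [1, 3]
```

A random-key chromosome is always a valid solution after crossover or mutation, which is one reason float representations mix well with other operators.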

#### 3.3. Selection, Crossover and Mutation

```
Algorithm 1: Mutation strategy
Input:  thesis data, mutation probability, chromosome length, number of
        chromosomes, and the fitness value of each chromosome
Output: mutated chromosomes

Initialize parameter ε
for i = 1 to the size of the population do
    for k = 1 to the length of the chromosome do
        Denote the extracted thesis set (chromosome) as S, then calculate the
        error of the unique subset of S determined by the sampling rules
        according to the value of the genes
    end for
    Assume that subset a of S has the maximum error and subset b of S has
    the minimum error
    On the current chromosome, perform a mutation by randomly replacing a
    gene corresponding to an element of set a with a gene that satisfies
    set b
    Check that the same gene does not appear twice on the mutated chromosome
    if the mutated chromosome meets the requirement then
        Accept the mutated chromosome according to Equation (3)
    else
        Undo the mutation operation
    end if
end for
```
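The core swap in this mutation strategy can be sketched as below. The data structures `subset_errors` and `candidates` are assumptions of mine, since the paper's concrete representation of the rule-defined subsets is not shown here:

```python
import random

def mutate_once(chrom, subset_errors, candidates, rng=random):
    """Sketch of the mutation move. `subset_errors` maps each rule-defined
    subset name to (error, member genes); `candidates` maps a subset name to
    the genes that would satisfy it. Both representations are hypothetical.
    """
    # Subset a has the maximum sampling error, subset b the minimum.
    a = max(subset_errors, key=lambda name: subset_errors[name][0])
    b = min(subset_errors, key=lambda name: subset_errors[name][0])
    in_a = [g for g in chrom if g in subset_errors[a][1]]
    pool = [g for g in candidates[b] if g not in chrom]
    if not in_a or not pool:           # no legal swap: undo (keep the original)
        return chrom
    mutated = list(chrom)
    mutated[mutated.index(rng.choice(in_a))] = rng.choice(pool)
    # Requirement: the same gene must not appear twice on the chromosome.
    return mutated if len(set(mutated)) == len(mutated) else chrom
```

In the full algorithm the mutated chromosome is then accepted or rejected via the SA criterion of Equation (3) rather than unconditionally.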

#### 3.4. The Definition of Fitness Function

## 4. Experimental Setup

#### 4.1. Experimental Setup

#### 4.2. Parameters

#### 4.3. Redundancy Reduction in Data

## 5. Experimental Result Analysis

## 6. Conclusions and Future Work

- (1) Using the same probability for every individual can be considered unfair: for good individuals, the probability of crossover and mutation should be reduced so that they are preserved as much as possible, while for inferior individuals it should be increased so that their inferior condition can be changed as much as possible.
- (2) A fixed probability also cannot meet the needs of the population's evolution process. In the early iterations, the population needs higher crossover and mutation probabilities in order to find the optimal solution quickly; in the later convergence stage, it needs smaller crossover and mutation probabilities, which help the population converge quickly once the optimal solution has been found.
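One common way to realize such fitness-dependent probabilities (this particular linear schedule is an assumption, not the paper's formula) keeps the full rate for at-or-below-average individuals and reduces it for fitter ones:

```python
def adaptive_prob(p_max, p_min, f, f_avg, f_max):
    """Hypothetical adaptive crossover/mutation probability: individuals at or
    below average fitness keep the full probability p_max, while fitter
    individuals get a linearly reduced probability down to p_min, so good
    chromosomes are disturbed less, as argued in point (1).
    """
    if f <= f_avg or f_max == f_avg:
        return p_max
    return p_max - (p_max - p_min) * (f - f_avg) / (f_max - f_avg)
```

Because the population's average fitness rises over the run, this schedule also yields the higher-early, lower-late behavior asked for in point (2).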

## Author Contributions

## Funding

## Conflicts of Interest

## References


| Parameter | Value |
|---|---|
| Size of initial population | 10 |
| Size of population $sizepop$ | 4 |
| Probability of crossover $pcross$ | 0.97 |
| Probability of mutation $pmutation$ | 0.3 |
| Mode of selection $fselect$ | ‘roulette’ |
| Method of coding $fcode$ | ‘float’ |
| Mode of crossover $fcross$ | ‘float’ |
| Temperature of annealing ${T}_{0}$ | 100 |
| Temperature reduction parameter k | 0.98 |

| Index | P${}_{\mathit{e}}$ | GA | H${}_{\mathit{a}}$ | H${}_{\mathit{a},\mathit{m}}$ |
|---|---|---|---|---|
| 1 | 10% | 10.044% | 10.044% | 10.156% |
| 2 | 10% | 8.917% | 9.251% | 10.022% |
| 3 | 5% | 5.117% | 5.015% | 5.015% |
| 4 | 5% | 4.921% | 5.042% | 4.991% |
| 5 | 5% | 4.960% | 5.035% | 5.017% |
| 6 | 5% | 5.078% | 5.000% | 5.078% |
| 7 | 5% | 4.990% | 4.991% | 5.053% |
| 8 | 5% | 5.103% | 5.029% | 5.029% |
| 9 | 5% | 4.979% | 5.025% | 5.026% |
| 10 | 5% | 5.109% | 5.109% | 5.109% |
| 11 | 5% | 4.762% | 4.762% | 4.762% |
| 12 | 5% | 4.962% | 4.962% | 4.962% |
| 13 | 5% | 5.000% | 5.000% | 5.000% |
| 14 | 5% | 4.978% | 5.004% | 5.004% |
| 15 | 5% | 5.000% | 5.000% | 5.000% |
| 16 | 5% | 5.168% | 5.048% | 5.048% |
| 17 | 5% | 4.991% | 4.991% | 5.035% |
| 18 | 5% | 4.598% | 4.885% | 4.885% |
| 19 | 5% | 4.938% | 5.010% | 5.085% |
| 20 | 5% | 4.607% | 4.878% | 4.878% |
| 21 | 5% | 5.556% | 4.938% | 4.938% |
| 22 | 5% | 4.615% | 5.384% | 5.384% |
| 23 | 5% | 4.947% | 5.015% | 4.981% |
| 24 | 5% | 4.167% | 5.000% | 5.000% |
| 25 | 5% | 0.000% | 2.272% | 4.545% |
| 26 | 5% | 4.494% | 4.494% | 5.617% |
| 27 | 5% | 0.000% | 4.764% | 5.000% |
| 28 | 5% | 0.000% | 5.274% | 5.274% |
| 29 | 5% | 4.286% | 4.286% | 4.286% |
| 30 | 5% | 0.000% | 0.000% | 5.166% |
| 31 | 5% | 0.000% | 5.882% | 5.142% |
| 32 | 5% | 4.938% | 4.761% | 5.000% |
| 33 | 5% | 5.714% | 5.263% | 3.947% |
| 34 | 5% | 0.000% | 0.000% | 5.769% |
| 35 | 5% | 0.000% | 0.000% | 5.111% |
| 36 | 5% | 4.968% | 5.000% | 5.000% |
| 37 | 5% | 5.882% | 5.882% | 5.882% |

| | GA | H${}_{\mathit{a}}$ | H${}_{\mathit{a},\mathit{m}}$ |
|---|---|---|---|
| Iterations | 31,568 | 32,594 | 287 |
| Time (min) | 324.37 | 341.64 | 2.96 |
| Mean absolute error (%) | 1.155 | 0.644 | 0.164 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Johnson, S.; Han, J.; Liu, Y.; Chen, L.; Wu, X. Hybrid Approach with Improved Genetic Algorithm and Simulated Annealing for Thesis Sampling. *Future Internet* **2018**, *10*, 71.
https://doi.org/10.3390/fi10080071
