The tourist site of interest classification obtained from the Naïve Bayes machine learning mining model conforms to tourists’ interests and needs. On this basis, we design an optimal tourist site mining algorithm based on the membership degree searching propagating tree to mine tourist sites with optimal geographic distribution. The mined optimal tourist sites serve as the nodes of tour routes, from which we develop a smart tour route planning algorithm that incorporates factors influencing tourists’ travel experiences, such as tourism GIS services, traffic information services, and tourist site information services. The algorithm outputs optimal tour routes that conform to actual conditions, meet tourists’ interests and motive benefits, and decrease travel expenditures. Sub-optimal tour routes are also provided to tourists.

#### 3.1. Optimal Tourist Site Mining Algorithm based on Membership Degree Searching Propagating Tree

The tourist site of interest classification distribution matrix $A$ formed by the Naïve Bayes machine learning mining process is the critical model through which the smart machine learns tourists’ needs and interests. In matrix $A$, an arbitrary row ${A}_{w}$ represents one feasible sort of tourist site classification and quantity. The classification and quantity of every row can meet the needs of tourists, but the rows differ in their specific tourist sites, which leads to different tour routes. According to the definition, for one row ${A}_{w}$ of matrix $A$, the number of feasible combinations of specific tourist sites is ${\prod}_{j=1}^{m}{C}_{{s}_{j}}^{{a}_{wj}}$, s.t. ${a}_{wj}\ne 0$, but not all of these combinations are optimal. The tourists start from the temporary accommodation in the city, visit all selected tourist sites, and finally return to the temporary accommodation, and the whole process forms an integrated tour route. The selected tourist sites should meet the tourists’ needs and interests while costing the minimum expenditure. Thus, within the neighbourhood of the temporary accommodation center, the nearer a tourist site is to the center, the more beneficial it is. Among the ${\prod}_{j=1}^{m}{C}_{{s}_{j}}^{{a}_{wj}}$ combinations, only one is optimal in geographic distribution. The membership degree relationship is used to set up the neighbourhood searching arc for the seed tourist site and to iterate over the tourist sites to generate the propagating tree. The search for subordinate seed tourist sites ranges over one tourist site cluster or across clusters, that is, the tourist sites may belong to the same cluster or to different clusters. The final output of the process is a propagating tree of tourist sites with optimal geographic distribution, and the nodes of the tree are the mined optimal tourist sites.
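The size of this search space can be checked with a short calculation. The following is a minimal sketch, assuming a hypothetical city with three classifications; the function name `count_site_selections` and the sample numbers are illustrative and do not come from the paper.

```python
from math import comb, prod

def count_site_selections(sites_per_class, row):
    """Number of ways to pick row[j] sites from the sites_per_class[j] sites
    of each classification, taken over classifications with row[j] != 0."""
    return prod(comb(s, a) for s, a in zip(sites_per_class, row) if a != 0)

# Hypothetical example: s = (4, 5, 3) sites per classification,
# and row A_w = (2, 0, 1) requests two sites of class 1 and one of class 3.
print(count_site_selections([4, 5, 3], [2, 0, 1]))  # C(4,2) * C(3,1) = 18
```

Even for small cities the count grows quickly, which is why the algorithm below searches for one geographically optimal combination instead of enumerating all of them.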

**Definition** **7.** Tourist site clustering center $K$. The temporary accommodation, which is confirmed and checked in before the trip, is set as the starting point and terminal point of the whole tour activity. The temporary accommodation is the first critical point of the planned tour route and is called the tourist site clustering center $K$. The center $K$ is determined by the temporary accommodation’s location, here defined as the longitude and latitude $(l,B)$.

The center $K$ will change with the tourist’s decision on the temporary accommodation and will directly influence the propagating tree’s formation, shape and distribution, and influence the mined optimal tourist sites.

**Definition** **8.** Seed tourist site ${G}_{e}$ and seed tourist site vector ${G}_{w}$. Starting from the tourist site clustering center $K$, the optimal tourist sites that are searched and confirmed within the neighbourhood via the objective function and membership degree are called seed tourist sites ${G}_{e}$. Under the condition of one sort of tourist site classification and quantity, the number of seed tourist sites searched for each tourist site classification ${c}_{j}$ is ${a}_{wj}$, and the total quantity of seed tourist sites is ${\sum}_{j=1}^{m}{a}_{wj}$ for any $w$, with $e\in (0,{\sum}_{j=1}^{m}{a}_{wj}]\subset {\mathrm{Z}}^{+}$, according to the tourist site of interest classification distribution matrix $A$ and its arbitrary row vector ${A}_{w}$. We store the ${\sum}_{j=1}^{m}{a}_{wj}$ seed tourist sites in the vector elements from left to right, in the sequence of the propagating tree’s nodes, and this vector is called the seed tourist site vector ${G}_{w}$.

Under the condition of the confirmed tourist site clustering center $K$, matrix $A$ can generate $p$ seed tourist site vectors ${G}_{w}$, with $w\in (0,p]\subset {\mathrm{Z}}^{+}$, according to the definition. Each ${G}_{w}$ stores the searched optimal tourist sites of row ${A}_{w}$.

**Definition** **9.** Subordinate seed tourist site ${G}_{e}^{\ast}$ and non-subordinate seed tourist site $\neg {G}_{e}^{\ast}$. For the starting center $K$ or a seed tourist site ${G}_{e}$, the tourist site that is searched and confirmed as closest to it via the subordinate function and membership degree relationship model, and is therefore stored in the propagating tree nodes, is called the subordinate seed tourist site ${G}_{e}^{\ast}$. In the same searching process, the other tourist sites, which are not stored in the propagating tree nodes, are called non-subordinate seed tourist sites $\neg {G}_{e}^{\ast}$.

**Definition** **10.** Seed tourist site searching arc. We set the initial seed tourist site ${G}_{e}$ as the circle center and draw the arc with the neighbourhood radius confirmed by the objective function. The arc is used to search for the subordinate seed tourist site ${G}_{e}^{\ast}$. The combined structure of the radius and arc is called the seed tourist site searching arc. The seed tourist site searching arc is the direction and path along which the seed tourist site is searched. In one searching process, the smart machine scans all of the tourist sites, and the seed tourist site is bound to lie on the seed tourist site searching arc with the minimum objective function value.

**Definition** **11.** Optimal tourist site propagating tree $tre{e}_{w}$. The tree structure searched and confirmed by the seed tourist site and subordinate seed tourist site generation model is called the optimal tourist site propagating tree. The nodes of the optimal tourist site propagating tree are tourist sites that are optimally distributed geographically, conform to tourists’ interests and feature attributes, and cost the least expenditure.

According to the definition, under the condition of the confirmed tourist site clustering center $K$, matrix $A$ can generate $p$ optimal tourist site propagating trees $tre{e}_{w}$, with $w\in (0,p]\subset {\mathrm{Z}}^{+}$. Following the definitions and the idea of optimal tourist site propagating tree modeling, the optimal tourist site mining algorithm based on the membership degree searching propagating tree is designed and developed in the steps below.

**Step 1.** Confirm the propagating tree universe of discourse

According to the definition of the tourist site classification vector $C$, for a given tourism city, a specific tourist site of tourist site classification ${c}_{j}$ is ${c}_{js}$. The tourist site quantity of ${c}_{j}$ is ${s}_{j}$, with $s\in (0,{s}_{j}]\subset {\mathrm{Z}}^{+}$.

We set the city tourist site set as ${C}_{s}=\{{c}_{11},{c}_{12},\dots ,{c}_{1{s}_{1}},{c}_{21},{c}_{22},\dots ,{c}_{2{s}_{2}},\dots ,{c}_{m1},{c}_{m2},\dots ,{c}_{m{s}_{m}}\}\subset {R}^{s}$, which is called the universe of discourse. ${c}_{js}={({c}_{js}^{1},{c}_{js}^{2},\dots ,{c}_{js}^{u})}^{\mathrm{T}}\in {R}^{s}$ is the feature vector of a sample to be observed, and it corresponds to one point of the universe of discourse feature space, that is, a tourist site in the city geographic space. ${c}_{js}^{\alpha}$ is the feature attribute value of the $\alpha $-th dimension of the feature vector. For the tourist site itself, the feature attribute values contain the tourist site’s longitude $l$, the tourist site’s latitude $B$, and the tourist site’s attraction index $\epsilon $.

**Step 2.** Divide the propagating tree universe of discourse into clusters

Starting from the tourist site clustering center $K$, we divide the propagating tree universe of discourse into $m$ clusters ${c}_{j}$. Each cluster ${c}_{j}$ corresponds to one tourist site classification and thus forms a tourist site cluster ${c}_{j}$. The clusters of the propagating tree universe of discourse are therefore ${c}_{1}$, ${c}_{2}$, …, ${c}_{m}$, and they satisfy the conditions of formula (7).

In the process of searching seed tourist sites starting from the tourist site clustering center $K$, the searched subordinate seed tourist site ${G}_{e}^{\ast}$ and initial seed tourist site ${G}_{e}$ may be in the same classification cluster or in the different classification clusters, and the searching process should meet the constraint conditions. Here, the seed tourist sites in the same classification cluster are noted as ${G}_{e+}^{\ast}$, while in the different classification cluster, they are noted as ${G}_{e-}^{\ast}$.

**Step 3**. Set up the objective function and subordinate function

The space searching relationship of the tourist site clustering center $K$, seed tourist site ${G}_{e}$, and subordinate seed tourist site ${G}_{e}^{\ast}$ is determined by the clustering principle of $K$, ${G}_{e}$, and ${G}_{e}^{\ast}$. The principle is the second order Minkowski distance. According to the definition of tourist site feature attributes, the Minkowski distance between the tourist site clustering center $K$ and the first mined seed tourist site ${G}_{1}$, and the Minkowski distance between the seed tourist site ${G}_{e}$ and subordinate seed tourist site ${G}_{e}^{\ast}$ are determined by their feature attributes. Other than the tourist site’s longitude $l$, latitude $B$, and tourist site attraction index $\epsilon $, there exist factors that influence the process to search the subordinate seed tourist site.

**Definition** **12.** Membership degree direct influence factor ${\lambda}_{v1}$. In a single searching process, the factors that directly influence whether a certain tourist site is the subordinate seed tourist site ${G}_{e}^{\ast}$ of the initial seed tourist site ${G}_{e}$ are called the membership degree direct influence factors ${\lambda}_{v1}$, ${v}_{1}\in (0,\mathrm{max}{v}_{1}]\subset {\mathrm{Z}}^{+}$.

**Definition** **13.** Membership degree indirect influence factor ${\delta}_{v2}$. In a single searching process, the factors that indirectly influence whether a certain tourist site is the subordinate seed tourist site ${G}_{e}^{\ast}$ of the initial seed tourist site ${G}_{e}$ are called the membership degree indirect influence factors ${\delta}_{v2}$, ${v}_{2}\in (0,\mathrm{max}{v}_{2}]\subset {\mathrm{Z}}^{+}$.

According to the actual travel process and city tourism services, the membership degree direct influence factors ${\lambda}_{v1}$ include the ferry distance between the tourist site clustering center $K$ and a tourist site (km), the ferry distance between two tourist sites (km), the quantity of subway and bus lines within the ferry interval, the taxi fee of the ferry interval, and the road traffic jam index. The membership degree indirect influence factors ${\delta}_{v2}$ include the quantity of traffic lights between the tourist site clustering center $K$ and a tourist site, the quantity of traffic lights between two tourist sites, the average walking distance from a tourist site to the nearest subway or bus station (km), the average waiting time for a taxi (h), and the average quantity of traffic-jammed roads. According to the definitions, factors ${\lambda}_{v1}$ and ${\delta}_{v2}$ are represented in text format. The symbol “$dir+$” stands for factors ${\lambda}_{v1}$, and the symbol “$indir-$” stands for factors ${\delta}_{v2}$. The text format is defined as <Factor, Relationship, Algorithm, Attribute>, and each factor is represented as follows.

< Direct factor 1: <${\lambda}_{1}$, ferry distance, temporary accommodation → tourist site ${S}_{1}$ (km, ${S}_{1}\in {\mathrm{R}}^{+}$), ${\lambda}_{1}={S}_{1}^{-1}$, $dir+$>; Indirect factor 1: <${\delta}_{1}$, quantity of traffic lights, temporary accommodation → tourist site ${N}_{1}$ (${N}_{1}\in {\mathrm{Z}}^{+}$), ${\delta}_{1}=-0.01{N}_{1}$, $indir-$>>

< Direct factor 2: <${\lambda}_{2}$, ferry distance, tourist site → tourist site ${S}_{2}$ (km, ${S}_{2}\in {\mathrm{R}}^{+}$), ${\lambda}_{2}={S}_{2}^{-1}$, $dir+$>; Indirect factor 2: <${\delta}_{2}$, quantity of traffic lights, tourist site → tourist site ${N}_{2}$ (${N}_{2}\in {\mathrm{Z}}^{+}$), ${\delta}_{2}=-0.01{N}_{2}$, $indir-$>>

< Direct factor 3: <${\lambda}_{3}$, quantity of subway and bus lines ${N}_{3}$ (${N}_{3}\in {\mathrm{Z}}^{+}$), ${\lambda}_{3}=0.1{N}_{3}$, $dir+$>; Indirect factor 3: <${\delta}_{3}$, ferry distance, tourist site → nearest subway or bus station ${S}_{3}$ (km, ${S}_{3}\in {\mathrm{R}}^{+}$), ${\delta}_{3}=-0.01{S}_{3}$, $indir-$>>

< Direct factor 4: <${\lambda}_{4}$, taxi fee of ferry distance, $cost$ ($cost\in {\mathrm{R}}^{+}$), ${\lambda}_{4}={cost}^{-1}$, $dir+$>; Indirect factor 4: <${\delta}_{4}$, average waiting time for a taxi, $t$ (h, $t\in {\mathrm{R}}^{+}$), ${\delta}_{4}=-0.01t$, $indir-$>>

< Direct factor 5: <${\lambda}_{5}$, road traffic jam index, $d$ ($d\in {\mathrm{R}}^{+}$), ${\lambda}_{5}=1-d$, $dir+$>; Indirect factor 5: <${\delta}_{5}$, average quantity of traffic jams, ${N}_{4}$ (${N}_{4}\in {\mathrm{Z}}^{+}$), ${\delta}_{5}=-0.01{N}_{4}$, $indir-$>>.
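The factor definitions above translate directly into code. The following is a minimal sketch that evaluates the listed formulas for one ferry interval; the function names, argument names, and dictionary keys are hypothetical, and how the factors enter the objective function is left to formula (8).

```python
def direct_factors(ferry_km, n_lines, taxi_fee, jam_index):
    """Direct influence factors for one ferry interval.
    ferry_km plays the role of S_1 (from K) or S_2 (between two sites)."""
    return {
        "lambda_distance": 1.0 / ferry_km,   # lambda_1 or lambda_2 = S^-1
        "lambda_3": 0.1 * n_lines,           # subway and bus lines N_3
        "lambda_4": 1.0 / taxi_fee,          # reciprocal of the taxi fee
        "lambda_5": 1.0 - jam_index,         # road traffic jam index d
    }

def indirect_factors(n_lights, station_km, taxi_wait_h, n_jams):
    """Indirect influence factors for one ferry interval.
    n_lights plays the role of N_1 (from K) or N_2 (between two sites)."""
    return {
        "delta_lights": -0.01 * n_lights,    # delta_1 or delta_2
        "delta_3": -0.01 * station_km,       # distance to nearest station S_3
        "delta_4": -0.01 * taxi_wait_h,      # average taxi waiting time t
        "delta_5": -0.01 * n_jams,           # average quantity of jams N_4
    }

# Hypothetical interval: 4 km, 3 transit lines, taxi fee 20, jam index 0.4.
lam = direct_factors(ferry_km=4.0, n_lines=3, taxi_fee=20.0, jam_index=0.4)
dlt = indirect_factors(n_lights=12, station_km=0.5, taxi_wait_h=0.2, n_jams=2)
```

Every direct factor is positive and grows as travel becomes easier, while every indirect factor is a small negative penalty, which matches the “$dir+$” and “$indir-$” labels.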

According to the definition, the city tourist site set ${C}_{s}$ can be stored as a $u\times {\sum}_{j=1}^{m}{s}_{j}$ dimension matrix. The matrix columns correspond to specific tourist sites, while the rows correspond to the feature attributes. The feature attributes include the membership degree direct influence factors ${\lambda}_{v1}$, the membership degree indirect influence factors ${\delta}_{v2}$, the tourist site longitude $l$, the tourist site latitude $B$, and the tourist site attraction index $\epsilon $, so that $\mathrm{max}u=\mathrm{max}{v}_{1}+\mathrm{max}{v}_{2}+3$. According to Definitions 12 and 13, the factors ${\lambda}_{v1}$ and ${\delta}_{v2}$ between the tourist site clustering center $K$ and the first searched seed tourist site, and between a seed tourist site and its subordinate seed tourist site, are determined by the two points involved. If either point changes, the values of ${\lambda}_{v1}$ and ${\delta}_{v2}$ change with it, so the factor values fluctuate. The objective function between the tourist site clustering center $K$ and the first searched seed tourist site, and between a seed tourist site and its subordinate seed tourist site, is determined by the feature attributes, as shown in formula (8).

**Definition** **14.** Objective function descending order vector $Q$. In a single search for the subordinate seed tourist site ${G}_{e}^{\ast}$, we store the searched objective function values in the elements of a vector from left to right in descending order, and this vector is called the objective function descending order vector $Q$.

**Definition** **15.** Objective function fluctuating curve. In a single search for the subordinate seed tourist site ${G}_{e}^{\ast}$, the curve that reflects the tendency of the objective function values is called the objective function fluctuating curve.

The objective function fluctuating curve changes with the tourist site clustering center $K$ and the selected tourist site classification and quantity ${A}_{w}$. When $K$ or ${A}_{w}$ changes greatly, the tendency of the objective function fluctuating curve also changes greatly. Each search for the subordinate seed tourist site ${G}_{e}^{\ast}$, starting from the clustering center $K$ or a seed tourist site ${G}_{e}$, generates one group of objective function values. The objective function fluctuating curve visually reflects the affinity between the seed tourist site and the other tourist sites in one single searching process.

**Definition** **16.** Seed tourist site full rank for classification ${c}_{j}$. For a nonzero tourist site classification ${a}_{wj}\ne 0$ of the tourist site classification and quantity ${A}_{w}$ in matrix $A$, when the quantity of searched seed tourist sites for this classification reaches ${a}_{wj}$ during the searching process, the seed tourist site for the classification ${c}_{j}$ is full rank under the condition of ${A}_{w}$, and it is noted as ${c}_{j}^{\wedge}$. When the seed tourist site for the classification is full rank, the propagating tree will not accept further searched seed tourist sites of the same classification.

One single searching process confirms and mines only one tourist site as the subordinate seed tourist site; all other tourist sites are non-subordinate seed tourist sites. The subordinate function $\mu ((K,{c}_{js}),{c}_{j\prime s\prime})=\mu (K,{c}_{js})({c}_{j\prime s\prime})$ uses the membership degree to represent the subordinate relationship between tourist site ${c}_{j\prime s\prime}$ and the initial seed tourist site ${c}_{js}$ or the tourist site clustering center $K$ in one single searching process. The subordinate function is given in formula (9).

**Definition** **17.** Membership degree distribution matrix $\mu (c)$. When the tourist site ${c}_{j\prime s\prime}$ is the subordinate seed tourist site for the clustering center $K$ or the initial seed tourist site ${c}_{js}$, the membership degree value of tourist site ${c}_{j\prime s\prime}$ is 1; otherwise, the value is 0. One single searching process confirms one tourist site’s membership degree value as 1 and the other tourist sites’ values as 0. The matrix that represents the subordinate relationship of all tourist sites via subordinate function values is called the membership degree distribution matrix $\mu (c)$.

Formula (10) represents the distribution of seed tourist sites. Each matrix row is one sort of tourist site classification and quantity. Each matrix column is the membership degree of the No. $s$ tourist site for that sort of tourist site classification and quantity. The number of columns is $\mathrm{max}{s}_{j}$, and vacant elements are set to 0. When the clustering center $K$ or ${A}_{w}$ changes, the membership degree distribution matrix also changes.

**Step 4.** Set up the optimal tourist site mining algorithm

The objective function descending order vector $Q$ stores the objective function values. If the tourist site relating to the first element of $Q$ belongs to a classification that is not full rank and is not among the previous seed tourist sites, then it is mined as the subordinate seed tourist site ${G}_{e}^{\ast}$ of the tourist site clustering center $K$ or the initial seed tourist site ${G}_{e}$. If it is in the same cluster, we note it as ${G}_{e+}^{\ast}$; if it is in a different cluster, we note it as ${G}_{e-}^{\ast}$. The tourist sites relating to the objective function values in the other elements are non-subordinate seed tourist sites $\neg {G}_{e}^{\ast}$. Starting from the clustering center $K$, the process of searching the seed tourist site vector ${G}_{w}$ and obtaining the objective function descending order vector $Q$ and the membership degree distribution matrix $\mu (c)$ is as follows.

Sub-step 1. Confirm matrix $A$. The tourist selects one sort of tourist site classification and quantity vector ${A}_{w}$.

Sub-step 2. We set up $1\times {\displaystyle {\sum}_{j=1}^{m}{a}_{wj}}$ dimension seed tourist site vector ${G}_{w}$, $1\times {\displaystyle {\sum}_{j=1}^{m}{s}_{j}}$ dimension objective function descending order vector $Q$ and $m\times \mathrm{max}{s}_{j}$ dimension membership degree distribution matrix, and set all elements as 0.

Sub-step 3. We set up the Open list and the Closed list. The Open list stores all non-seed tourist sites to be searched. The Closed list stores all searched seed tourist sites. The storage format of the Open list and Closed list is the same as that of the tourist site classification vector $C$, and the elements of the two lists are arranged in the sequence of tourist site classification and order. According to the definition, the Open list and Closed list each contain ${\sum}_{j=1}^{m}{s}_{j}$ elements. We store all elements of the city tourist site set ${C}_{s}$ in the Open list.

Sub-step 4. Search and confirm the No.1 seed tourist site ${G}_{1}$. Here is the definition of seed tourist site searching angle.

**Definition** **18.** Seed tourist site searching angle $\phi $. Starting from a central point, we draw a ray ${l}_{1}$ pointing to geographic north and another ray ${l}_{2}$ connecting the central point with another point. The included angle from the north ray ${l}_{1}$ to ray ${l}_{2}$ in the clockwise direction is called the searching angle. If the central point is the clustering center $K$ or the initial seed tourist site ${G}_{e}$ and the other point is a tourist site ${c}_{js}$ to be searched, the included angle from the north ray at $K$ or ${G}_{e}$ to the ray toward the tourist site ${c}_{js}$ is called the seed tourist site searching angle $\phi $, noted as $\phi (K,{c}_{js})$ or $\phi ({G}_{e},{c}_{js})$.
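The searching angle of Definition 18 is a clockwise bearing measured from north. The following is a minimal sketch of one way to compute it from longitude and latitude, assuming a spherical Earth and the standard great-circle initial bearing formula; the paper does not state which formula it uses, and the function name `searching_angle` is hypothetical.

```python
import math

def searching_angle(center, site):
    """Clockwise angle in degrees, in [0, 360), from geographic north at
    `center` to the direction of `site`.
    Both points are (longitude, latitude) pairs in degrees, matching (l, B)."""
    lon1, lat1 = map(math.radians, center)
    lon2, lat2 = map(math.radians, site)
    d_lon = lon2 - lon1
    # Initial bearing on a sphere (assumption: great-circle model)
    y = math.sin(d_lon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon))
    return math.degrees(math.atan2(y, x)) % 360.0

# A site due east of the center has a searching angle of about 90 degrees.
east = searching_angle((0.0, 0.0), (1.0, 0.0))
```

At city scale a planar approximation such as `atan2(dx, dy)` would give nearly the same angle; the spherical form is used here only so that the sketch stays valid for any pair of coordinates.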

The process of searching the No.1 seed tourist site is as follows.

(I) The clustering center $K$ is set as the central point to confirm the ${\sum}_{j=1}^{m}{s}_{j}$ searching angles $\phi (K,{c}_{11})$, $\phi (K,{c}_{12})$, …, $\phi (K,{c}_{m{s}_{m}})$ for the tourist sites;

(II) search and calculate the objective function value $\sigma (K,{c}_{11})$ in the direction of the searching angle $\phi (K,{c}_{11})$ and objective function value $\sigma (K,{c}_{12})$ in the direction of the searching angle $\phi (K,{c}_{12})$;

① if $\sigma (K,{c}_{11})\ge \sigma (K,{c}_{12})$, store $\sigma (K,{c}_{11})$ into the first element of vector $Q$, and store $\sigma (K,{c}_{12})$ into the second element of vector $Q$;

② if $\sigma (K,{c}_{11})<\sigma (K,{c}_{12})$, store $\sigma (K,{c}_{12})$ into the first element of vector $Q$, and store $\sigma (K,{c}_{11})$ into the second element of vector $Q$;

(III) search and calculate the objective function value $\sigma (K,{c}_{13})$ on the direction of the searching angle $\phi (K,{c}_{13})$:

① if $\sigma (K,{c}_{11})\ge \sigma (K,{c}_{12})\ge \sigma (K,{c}_{13})$, keep the first and second element unchanged, and store $\sigma (K,{c}_{13})$ into the third element of vector $Q$;

② if $\sigma (K,{c}_{11})\ge \sigma (K,{c}_{13})\ge \sigma (K,{c}_{12})$, keep the first element unchanged, store $\sigma (K,{c}_{13})$ into the second element, and descend $\sigma (K,{c}_{12})$ to the third element of vector $Q$;

③ if $\sigma (K,{c}_{13})\ge \sigma (K,{c}_{11})\ge \sigma (K,{c}_{12})$, descend $\sigma (K,{c}_{11})$ and $\sigma (K,{c}_{12})$ to the second and third elements of vector $Q$, and ascend $\sigma (K,{c}_{13})$ to the first element of vector $Q$; and,

④ as to $\sigma (K,{c}_{11})<\sigma (K,{c}_{12})$, the comparison method of $\sigma (K,{c}_{13})$ and other two values is the same as step (III) sub-steps ①–③.

(IV) Return to step (I)–(III) and continue searching and comparing the objective function values of other searching angles, store the function values into vector $Q$, and finally find the objective function descending order vector ${Q}_{1}$ and objective function fluctuating curve $curv{e}_{1}$ searched by the central point of the clustering center $K$.

(V) Extract the first element value of vector ${Q}_{1}$, and its searching angle’s related tourist site is ${Q}_{11}$. Enter the following judgment steps:

① Search the Closed list. If ${Q}_{11}$ appears in the Closed list, jump to the second element ${Q}_{12}$ of vector ${Q}_{1}$;

② If ${Q}_{12}$ appears in the Closed list, continue to jump to the third element ${Q}_{13}$ of vector ${Q}_{1}$;

③ Start searching from tourist site ${Q}_{11}$, according to the method of step (V) sub-steps ① and ②, if tourist site ${Q}_{1{v}_{1}}$ appears in the Closed list, continue searching; if one certain tourist site ${Q}_{1{v}_{1}}$ does not appear in the Closed list, then jump to step ④, ${v}_{1}\in (0,{\displaystyle {\sum}_{j=1}^{m}{s}_{j}}]\subset {\mathrm{Z}}^{+}$;

④ Judge and confirm the tourist site classification ${c}_{j}$ for tourist site ${Q}_{1{v}_{1}}$:

(i) If the tourist site classification ${c}_{j}$ is not full rank $\neg {c}_{j}^{\wedge}$, confirm tourist site ${Q}_{1{v}_{1}}$ as the No.1 seed tourist site ${G}_{1}$ and store it into the first element of the seed tourist site vector ${G}_{w}$. Set the membership degree of this seed tourist site to the clustering center $K$ as 1 and the membership degrees of all other tourist sites as 0. Store ${G}_{1}$ into the Closed list and delete ${G}_{1}$ from the Open list; and,

(ii) If the tourist site classification ${c}_{j}$ is full rank ${c}_{j}^{\wedge}$, return to step (V) sub-steps ①–③ and search the next tourist site ${Q}_{1{v}_{2}}$ that does not appear in the Closed list. Enter the judgment of step (V) sub-step ④. Repeat the process until the seed tourist site is searched and confirmed, and then store it into the first element of vector ${G}_{w}$.
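Steps (II)–(IV) of Sub-step 4 insert each new objective function value at its place among the values already stored, which is an insertion sort in descending order. The following is a minimal sketch of that construction, assuming the objective function of formula (8) is available as a callable `sigma(center, site)`; the name `sigma`, the function `build_descending_q`, and the toy values are hypothetical.

```python
from bisect import insort

def build_descending_q(center, sites, sigma):
    """Return [(objective value, site), ...] ordered from largest to smallest."""
    q = []
    for site in sites:
        # insort keeps the list in ascending order, so the value is negated
        # to obtain descending order of the objective function.
        insort(q, (-sigma(center, site), site))
    return [(-negated, site) for negated, site in q]

# Toy objective values standing in for formula (8).
toy_values = {"c11": 0.2, "c12": 0.9, "c13": 0.5}
q1 = build_descending_q("K", ["c11", "c12", "c13"], lambda k, s: toy_values[s])
print(q1)  # [(0.9, 'c12'), (0.5, 'c13'), (0.2, 'c11')]
```

Keeping the site next to its value means that step (V) can read off the tourist site relating to each element of ${Q}_{1}$ without a second lookup.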

Sub-step 5. Search and confirm the No.2 seed tourist site and subsequent seed tourist sites.

(I) According to Sub-step 4, set the initial seed tourist site ${G}_{1}$ as the central point. Search the No.2 seed tourist site ${G}_{2}$ in the whole geographic range and store it into vector ${G}_{w}$. Set the membership degree of this seed tourist site to the initial seed tourist site ${G}_{1}$ as 1 and the membership degrees of the other tourist sites as 0. Store ${G}_{2}$ into the Closed list, and delete it from the Open list;

① If the tourist site classification for the seed tourist site ${G}_{1}$ is not full rank $\neg {c}_{j}^{\wedge}$, that is, ${G}_{1}$ and ${G}_{2}$ are in the same cluster, note ${G}_{2}$ as ${G}_{1+}^{*}$; and,

② If the tourist site classification for the seed tourist site ${G}_{1}$ is full rank ${c}_{j}^{\wedge}$, that is, ${G}_{1}$ and ${G}_{2}$ are in two different clusters, note ${G}_{2}$ as ${G}_{1-}^{*}$.

(II) Set the initial seed tourist site ${G}_{2}$ as the central point. Search the No.3 seed tourist site ${G}_{3}$ in the whole geographic range and store it into vector ${G}_{w}$. Set the membership degree of this seed tourist site to the initial seed tourist site ${G}_{2}$ as 1 and the membership degrees of the other tourist sites as 0. Store ${G}_{3}$ into the Closed list, and delete it from the Open list. The method to note the cluster of ${G}_{3}$ is the same as in Sub-step 5 step (I); and,

(III) According to Sub-step 5 steps (I) and (II), search and store subsequent seed tourist sites until each tourist site of interest classification ${c}_{j}$ reaches full rank ${c}_{j}^{\wedge}$, $j=1,2,\dots ,m$, and the seed tourist site vector ${G}_{w}$ is also full rank. The method to note the cluster is the same as in Sub-step 5 step (I). In the process of searching the seed tourist sites, the objective function descending order vector and the objective function curve relating to each seed tourist site are also obtained.
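Sub-steps 3 to 5 together form a greedy loop: rank the remaining sites from the current center, take the best one whose classification is not yet full rank, and make it the next center. The following is a minimal sketch under stated assumptions: the objective function of formula (8) is replaced by a hypothetical callable `sigma`, here the reciprocal of planar distance, and the names `mine_seed_sites`, `sites`, and `quota` are illustrative.

```python
from math import dist

def mine_seed_sites(k, sites, quota, sigma):
    """Greedy sketch of Sub-steps 3-5.

    k     : identifier of the clustering center K
    sites : dict site_id -> classification index j
    quota : dict j -> a_wj (sites wanted from classification j)
    sigma : callable(center_id, site_id) -> objective value (assumed)
    """
    open_list = list(sites)            # non-seed sites still to be searched
    closed_list = []                   # seed sites in tree order, i.e. G_w
    found = {j: 0 for j in quota}      # seeds confirmed per classification
    center, total = k, sum(quota.values())
    while len(closed_list) < total:
        # Objective function descending order vector Q for this center
        q = sorted(open_list, key=lambda s: sigma(center, s), reverse=True)
        # First site in Q whose classification is not yet full rank
        seed = next((s for s in q
                     if found.get(sites[s], 0) < quota.get(sites[s], 0)), None)
        if seed is None:
            break                      # remaining sites cannot fill the quota
        found[sites[seed]] += 1
        open_list.remove(seed)
        closed_list.append(seed)
        center = seed                  # the new seed becomes the next center
    return closed_list

# Toy data: planar coordinates and classification of four sites.
coords = {"K": (0, 0), "a": (1, 0), "b": (2, 0), "c": (1.5, 0), "d": (10, 0)}
classes = {"a": 0, "b": 0, "c": 1, "d": 1}
toy_sigma = lambda p, s: 1.0 / dist(coords[p], coords[s])
g_w = mine_seed_sites("K", classes, {0: 1, 1: 1}, toy_sigma)
print(g_w)  # ['a', 'c']
```

Because sites leave the Open list as soon as they are chosen, the Closed list check of step (V) is implicit here; the order of `closed_list` is the node order of the propagating tree.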

Figure 2 shows the process of searching and mining subordinate seed tourist sites with previously searched seed tourist sites as the central points.

**Step 5.** Generate the optimal tourist site propagating tree

Starting from the clustering center $K$, generate the optimal tourist site propagating tree in the sequence of the seed tourist site vector ${G}_{w}$ element. This tree is the tendency of optimal tourist sites that meet tourists’ needs and interests and have the optimal geographic distribution. It is also the visualized process for a smart machine to output the optimal tourist sites according to the selected tourist classification and quantity.

**Step 6.** Generate the membership degree distribution matrix $\mu (c)$.

Based on the searched seed tourist site vector ${G}_{w}$, the membership degree distribution matrix $\mu (c)$ is generated. This matrix can intuitively reflect the quantity of the seed tourist sites as well as their distribution of each tourist site classification.
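Step 6 can be illustrated with a small helper. The following is a minimal sketch that fills an $m\times \mathrm{max}{s}_{j}$ matrix with 1 for each mined seed tourist site and 0 elsewhere, as described in Sub-step 2 and Definition 17; the function name and the sample input are hypothetical.

```python
def membership_matrix(site_counts, seeds):
    """Build the membership degree distribution matrix.

    site_counts : list where site_counts[j-1] = s_j
    seeds       : iterable of (j, s) pairs, 1-based like the indices of c_js
    """
    width = max(site_counts)                      # number of columns: max s_j
    mu = [[0] * width for _ in site_counts]       # vacant elements stay 0
    for j, s in seeds:
        mu[j - 1][s - 1] = 1                      # seed site c_js has degree 1
    return mu

# Hypothetical case: s = (3, 2); seeds are c_12 and c_21.
mu = membership_matrix([3, 2], [(1, 2), (2, 1)])
print(mu)  # [[0, 1, 0], [1, 0, 0]]
```

Summing a row gives the number of seed tourist sites mined for that classification, which should equal ${a}_{wj}$ once the classification is full rank.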

#### 3.2. Tour Route Planning Algorithm Modeling based on Optimal Closed-loop Structure

The smart machine automatically plans optimal tour routes that meet tourists’ best motive benefits, according to the tourists’ interests learned from the Naïve Bayes machine learning module and the optimal tourist sites searched by the membership degree searching propagating tree. All of the designed and developed algorithms are based on one-day trips. Within one day, the smart machine confirms no more than five optimal tourist sites for tourists. It ensures that the mined tourist sites meet tourists’ needs and interests and cost the least expenditure with the optimal geographic distribution, and it also considers tourists’ physical conditions, so that tourists have sufficient time to visit all the recommended tourist sites. Starting from the temporary accommodation $K$, the whole trip of ferrying from one tourist site to another, visiting each tourist site, and then returning to $K$ is an integrated closed-loop process, in which the quantity of visited optimal tourist sites is denoted $\tau $, with $\tau ={\sum}_{j=1}^{m}{a}_{wj}$, $\tau \in (0,5]\subset {\mathrm{Z}}^{+}$. Under the condition of a confirmed $K$, there are ${\mathrm{A}}_{\tau}^{\tau}$ sorts of tour routes, but not all of them meet the tourists’ best motive benefits; some are optimal and others are sub-optimal. The optimal ones are the primary recommendation to tourists, while the sub-optimal ones are also recommended. The attraction and motive benefits of one tour route depend on the influence of all the factors on the tour route, including factors ${\lambda}_{v1}$ and ${\delta}_{v2}$ in the actual trip, which are extracted to set up the objective function $J(\sigma (K,{c}_{js}),\sigma ({c}_{js},{c}_{j\prime s\prime}))$.

**Definition** **19.** Generation tree of the closed-loop structure ${O}_{\omega}$. Starting from the temporary accommodation $K$, the whole trip of ferrying from one tourist site to another, visiting each tourist site, and then returning to $K$ is an integrated closed-loop structure, and this structure is called a generation tree closed-loop structure ${O}_{\omega}$.

According to the quantity of tourist sites $\tau $, the ${\mathrm{A}}_{\tau}^{\tau}$ quantity of closed-loop structures can be confirmed, $\omega \in (0,{\mathrm{A}}_{\tau}^{\tau}]\subset {\mathrm{Z}}^{+}$, $\tau \in (0,5]\subset {\mathrm{Z}}^{+}$. One closed-loop structure relates to one tour route generation tree.
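Since ${\mathrm{A}}_{\tau}^{\tau}=\tau !$ and $\tau \le 5$, there are at most 120 closed-loop structures, so they can be enumerated directly. The following is a minimal sketch using the standard library; the function name `closed_loops` and the site labels are hypothetical.

```python
from itertools import permutations

def closed_loops(k, seeds):
    """Yield every closed loop K -> G_... -> K over the given seed sites."""
    for order in permutations(seeds):
        yield (k, *order, k)

loops = list(closed_loops("K", ["G1", "G2", "G3"]))
print(len(loops))  # 3! = 6 closed-loop structures
print(loops[0])    # ('K', 'G1', 'G2', 'G3', 'K')
```

Each loop visits all $\tau $ seed tourist sites exactly once and starts and ends at $K$, so it contains $\tau +1$ consecutive pairs of points.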

**Definition** **20.** Generation tree sub-unit $H(\cdot )$. In the whole trip process of one closed-loop structure, tourists pass $\tau +1$ independent ferry intervals, and each ferry interval is called a generation tree sub-unit $H(\cdot )$.

According to the closed-loop structure, the ferry intervals between $K$ and a tourist site, between two tourist sites, and between a tourist site and $K$ are noted as $H(K,{G}_{e})$, $H({G}_{e},{G}_{e+1})$, and $H({G}_{e},K)$, respectively. A generation tree sub-unit is the basic unit structure used to output a sub-unit motive function value and a generation tree motive function value. Here, generation tree sub-units are defined to be independent of each other; the motive benefit that tourists obtain in one sub-unit has no relationship with any other sub-unit.

**Definition** **21.** Sub-unit motive function $I(\cdot )$. In each sub-unit, the function starts from the same initial motive iteration value ${I}_{0}$, iterates with the membership degree direct influence factors ${\lambda}_{v1}$ and indirect influence factors ${\delta}_{v2}$, and outputs the motive iteration value of the independent interval $H(\cdot )$. This function is called the sub-unit motive function $I(\cdot )$, as shown in formula (11).

The sub-unit motive function $I(\cdot )$ reflects the motive benefits of the ferry interval. The higher the function value is, the bigger the influence of the factors on motive benefits will be, and the more satisfied tourists will be. Within the ferry interval of a sub-unit, the motive function $I(\cdot )$ is a monotone increasing function whose value increases as the tourists’ ferry distance increases. It finally outputs the maximum value of the interval, which is taken as the sub-unit motive function $I(\cdot )$ value. Different sub-units have different function values; thus, the sub-unit motive function $I(\cdot )$ values fluctuate with distance over the whole trip. The sub-unit motive function $I(\cdot )$ value is non-directional, that is, within the same sub-unit, the function $I(\cdot )$ value is the same in both travel directions.

**Definition** **22.** Sub-unit motive weight $h(\cdot )$. The reciprocal of the sub-unit motive function $I(\cdot )$ value is defined as the sub-unit motive weight $h(\cdot )$. The sub-unit motive weight $h(\cdot )$ is the edge weight between two connected points in the closed-loop. It is used as an edge weight parameter to search for the optimal closed-loop structure.

According to the definition, the sub-unit motive weight $h(\cdot )$ meets formula (12). The sub-unit motive weight $h(\cdot )$ is also non-directional. Thus, the graph composed of the clustering center $K$ and the seed tourist sites ${G}_{e}$ is a connected, undirected graph.
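The reciprocal relationship in Definition 22 and the non-directional property can be sketched as follows. The motive function values below are hypothetical placeholders rather than outputs of formula (11), and the names `motive_values`, `h`, and `weight` are illustrative assumptions. Keying each edge by an unordered pair is one simple way to make the weight independent of travel direction.

```python
# Hypothetical sub-unit motive function values I for K and two seed sites.
# Each edge is keyed by an unordered pair, so direction does not matter.
motive_values = {
    frozenset(("K", "G1")): 4.0,
    frozenset(("K", "G2")): 2.0,
    frozenset(("G1", "G2")): 8.0,
}

# Definition 22: the motive weight h is the reciprocal of the motive value I.
h = {edge: 1.0 / value for edge, value in motive_values.items()}

def weight(a, b):
    """Edge weight between two points of the undirected graph."""
    return h[frozenset((a, b))]

print(weight("K", "G1"), weight("G1", "K"))
```

A larger motive function value therefore gives a smaller edge weight, which is why the later steps search for the closed loop with the minimum total weight.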

**Definition** **23.** Generation tree weight function $L(\cdot )$. The function which is iterated by the $\tau +1$ sub-unit motive weights $h(\cdot )$ and reflects the motive benefits of the tour route of one generation tree closed-loop is called the generation tree weight function $L(\cdot )$, as shown in formula (13). One generation tree weight function $L(\cdot )$ relates to one tour route, and the lower the function value is, the more motive benefits the tourists will get from the tour route.

According to the definition, in one closed-loop structure, the generation tree weight function $L(\cdot )$ is a monotone increasing function whose value increases as the tourists’ ferrying distance increases, and it finally outputs a maximum value. The ${\mathrm{A}}_{\tau}^{\tau}$ values of the function $L(\cdot )$ are the elements of the generation tree weight function minimum heap.
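The monotone behaviour described above can be illustrated with a running total. This is only a sketch under a stated assumption: formula (13) is not reproduced in this section, so the code assumes that $L(\cdot )$ accumulates the $\tau +1$ sub-unit motive weights additively along the loop. The weights and the name `running_weight` are hypothetical.

```python
from itertools import accumulate

# Hypothetical sub-unit motive weights h for K and two seed sites (undirected).
h = {
    frozenset(("K", "G1")): 0.25,
    frozenset(("G1", "G2")): 0.125,
    frozenset(("K", "G2")): 0.5,
}

def running_weight(loop, h):
    """Partial sums of edge weights along the loop (assumed additive L)."""
    edge_weights = [h[frozenset(edge)] for edge in zip(loop, loop[1:])]
    return list(accumulate(edge_weights))

partial = running_weight(("K", "G1", "G2", "K"), h)
print(partial)      # grows with every ferry interval
print(partial[-1])  # final value, taken here as L(K, K) for this loop
```

Because every weight is positive, the partial sums never decrease, and the last one is the value that would be stored for this closed loop.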

**Definition** **24.** Generation tree weight function minimum heap $R$. The minimum heap, which is formed by generation tree weight function values stored as array elements, is called the generation tree weight function minimum heap $R$.

According to the seed tourist site quantity $\tau $ and generation tree quantity ${\mathrm{A}}_{\tau}^{\tau}$, the minimum heap meets the following conditions:

(1) it contains ${\mathrm{A}}_{\tau}^{\tau}$ elements;

(2) set $n={\mathrm{A}}_{\tau}^{\tau}$, its element serial numbers ${k}_{1}$, ${k}_{2}$, …, ${k}_{n}$ meet: ${k}_{i}\le {k}_{2i}$, ${k}_{i}\le {k}_{2i+1}$, $1\le i\le \lfloor n/2\rfloor $;

(3) the level of parent node is No.0. The height of the tree is $d$, and the other nodes are either on the No. $d$ level or on the No. $d-1$ level;

(4) when $d\ge 1$, there are ${2}^{d-1}$ nodes on the No. $d-1$ level;

(5) the branch nodes of the No. $d-1$ level all gather on the left of the tree;

(6) element value of each node is smaller than its child nodes; and,

(7) of all the node elements in the same level, left element is smaller than the right one.
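Condition (2) is the usual array form of the min-heap property and can be checked mechanically. The sketch below uses Python's standard `heapq` module on hypothetical weight function values; the helper name `satisfies_condition_2` is an assumption for illustration. Note that `heapq.heapify` guarantees only the parent-child ordering of condition (2), not the left-to-right ordering of condition (7).

```python
import heapq

def satisfies_condition_2(values):
    """Check k_i <= k_2i and k_i <= k_(2i+1) for 1 <= i <= floor(n/2)."""
    n = len(values)
    k = [None] + list(values)  # shift to the 1-based numbering used in the text
    for i in range(1, n // 2 + 1):
        if k[i] > k[2 * i]:
            return False
        if 2 * i + 1 <= n and k[i] > k[2 * i + 1]:
            return False
    return True

# Hypothetical weight function values for tau = 3, so n = 3! = 6 elements.
weights = [0.9, 0.4, 0.7, 0.6, 0.8, 0.5]
heapq.heapify(weights)
print(weights[0])                      # smallest value sits at the root
print(satisfies_condition_2(weights))
```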

According to the definitions above, the tour route planning algorithm model based on the optimal closed-loop structure is set up. The basic idea is as follows: the motive weights between the clustering center $K$ and each seed tourist site ${G}_{e}$, and between any two seed tourist sites ${G}_{e}$ and ${G}_{e\prime}$, are confirmed by the sub-unit motive function. After searching the ${\mathrm{A}}_{\tau}^{\tau}$ generation tree weight function values, a minimum heap sorting algorithm is used to build the minimum heap with the weight function values in ascending order, which finally confirms the optimal tour routes and sub-optimal tour routes. The specific steps of the algorithm are as follows.

**Step 1.** Confirm the algorithm parameters:

(I) Confirm ${\lambda}_{v1}$ and ${\delta}_{v2}$. Extract the basic geographic information data of a certain tourism city and confirm the membership degree direct influence factors ${\lambda}_{v1}$ and indirect influence factors ${\delta}_{v2}$ between the clustering center $K$ and each seed tourist site ${G}_{e}$, and between any two seed tourist sites ${G}_{e}$ and ${G}_{e\prime}$;

(II) Confirm $l$, $B$ and $\epsilon $. Extract the basic geographic information data and confirm the longitude and latitude coordinates $(l,B)$ of the clustering center $K$ and each seed tourist site ${G}_{e}$. Mine the tourism data information and obtain tourist site attraction indexes. Set the attraction index of the clustering center $K$ as ${\epsilon}_{K}=0$, as it is the starting point of the tour route.

**Step 2.** Iterate and calculate the sub-unit motive function values. Using formula (11), calculate the ${\mathrm{C}}_{\tau +1}^{2}$ motive function $I(\cdot )$ values between the clustering center $K$ and each seed tourist site ${G}_{e}$, and between any two seed tourist sites ${G}_{e}$ and ${G}_{e\prime}$.

Sub-step 1. Confirm the $\tau $ motive function values between the clustering center $K$ and each seed tourist site ${G}_{e}$. The clustering center $K$ is the starting point and terminal point of the tour route;

Sub-step 2. Confirm the ${\mathrm{C}}_{\tau}^{2}$ motive function values between any two seed tourist sites.
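As a quick consistency check on the counts in Step 2, the two sub-steps together cover all pairs among the $\tau +1$ points. The worked numbers below take the largest allowed value $\tau =5$ purely as an illustration:

$$
{\mathrm{C}}_{\tau +1}^{2}=\tau +{\mathrm{C}}_{\tau}^{2},\qquad {\mathrm{C}}_{6}^{2}=15=5+10 .
$$

That is, for $\tau =5$ there are 5 values between $K$ and the seed tourist sites and 10 values between pairs of seed tourist sites.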

**Step 3.** Confirm the sub-unit motive weight. According to the sub-unit motive function values, confirm the ${\mathrm{C}}_{\tau +1}^{2}$ sub-unit motive weights between the clustering center $K$ and each seed tourist site ${G}_{e}$, and between any two seed tourist sites ${G}_{e}$ and ${G}_{e\prime}$. The motive weight value is the edge weight of the connected, undirected graph composed of the clustering center $K$ and the seed tourist sites ${G}_{e}$.

**Step 4.** Search the generation tree weight function minimum heap $R$. Use an edge correcting method to search the ${\mathrm{A}}_{\tau}^{\tau}$ generation tree weight function values relating to the ${\mathrm{A}}_{\tau}^{\tau}$ tour routes of the generation tree closed-loops. Then use a sorting algorithm to obtain the generation tree weight function minimum heap $R$, in which the generation tree weight function values are stored in sorted order as an array.

Sub-step 1. Set up a generation tree basic structure loop. Define a virtual closed-loop circle and evenly place the points of the clustering center $K$ and all seed tourist sites ${G}_{e}$ on the circle. The connecting arc or line between two points can be clipped or connected in accordance with the algorithm conditions, as shown in Figure 3. For the convenience of setting up the algorithm, denote the clustering center $K$ as ${v}_{1}$, seed tourist site ${G}_{1}$ as ${v}_{2}$, and so on, up to the seed tourist site ${G}_{\tau}$ as ${v}_{\tau +1}$.

Sub-step 2. Search the initial generation tree closed-loop structure ${O}_{1}$, and set:

${O}_{1}={v}_{1},{v}_{2},\dots ,{v}_{i},\dots ,{v}_{j},\dots ,{v}_{\tau +1},{v}_{1}$, $1<i\le j<\tau +1$, and $i,j,\tau \in {\mathrm{Z}}^{+}$.

(I) in structure ${O}_{1}$, search the $\tau +1$ sub-unit motive weights $h(\cdot )$ of adjacent ${v}_{i}$ and ${v}_{i+1}$;

(II) iterate the generation tree weight function value ${L}_{1}(K,K)$ of the closed-loop structure ${O}_{1}$; and,

(III) store the weight function value ${L}_{1}(K,K)$ into the parent node ${R}_{1}$ of minimum heap $R$.

Sub-step 3. Search the next generation tree closed-loop structure ${O}_{2}$. Find $i,j$ that meet the following conditions:

(1) $1<i+1<j<\tau +1$;

(2) $h({v}_{i},{v}_{j})+h({v}_{i+1},{v}_{j+1})<h({v}_{i},{v}_{i+1})+h({v}_{j},{v}_{j+1})$.

Clip and rebuild the closed-loop structure ${O}_{1}$:

(I) delete sub-unit $H({v}_{i},{v}_{i+1})$ in ${O}_{1}$;

(II) delete sub-unit $H({v}_{j},{v}_{j+1})$ in ${O}_{1}$;

(III) add sub-unit $H({v}_{i},{v}_{j})$; and,

(IV) add sub-unit $H({v}_{i+1},{v}_{j+1})$.

The structure of the rebuilt generation tree closed-loop is:

${O}_{2}={v}_{1},{v}_{2},\dots ,{v}_{i},{v}_{j},{v}_{j-1},\dots ,{v}_{i+1},{v}_{j+1},{v}_{j+2},\dots ,{v}_{\tau +1},{v}_{1}$, in which the segment from ${v}_{i+1}$ to ${v}_{j}$ is traversed in reverse order. Search the weight function value of the generation tree closed-loop structure ${O}_{2}$.
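The clip-and-rebuild operation of Sub-step 3 has the same form as the classical 2-opt edge exchange, and can be sketched as below. The function names `improves` and `two_opt_move`, the numeric point labels, and the absolute-difference weight are hypothetical stand-ins for the sub-unit motive weights $h(\cdot )$; positions $i$ and $j$ follow the 1-based numbering of the text.

```python
def improves(weight, loop, i, j):
    """Condition (2): the two new edges weigh less than the two removed ones."""
    v_i, v_i1 = loop[i - 1], loop[i]   # v_i and v_(i+1)
    v_j, v_j1 = loop[j - 1], loop[j]   # v_j and v_(j+1)
    return weight(v_i, v_j) + weight(v_i1, v_j1) < weight(v_i, v_i1) + weight(v_j, v_j1)

def two_opt_move(loop, i, j):
    """Remove edges (v_i, v_i+1) and (v_j, v_j+1); reverse v_i+1 .. v_j."""
    return loop[:i] + loop[i:j][::-1] + loop[j:]

# Hypothetical loop v_1 .. v_6, v_1 with points on a line; weight = distance.
def weight(a, b):
    return abs(a - b)

loop = [0, 1, 3, 2, 4, 5, 0]
if improves(weight, loop, 2, 4):
    loop = two_opt_move(loop, 2, 4)
print(loop)
```

Because the weights are non-directional, reversing the inner segment leaves all of its internal edge weights unchanged, so only the two exchanged edges affect the weight function value.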

(V) In structure ${O}_{2}$, search the $\tau +1$ sub-unit motive weights $h(\cdot )$ of adjacent ${v}_{i}$ and ${v}_{i+1}$;

(VI) iterate the generation tree weight function value ${L}_{2}(K,K)$ of the closed-loop structure ${O}_{2}$; and,

(VII) compare the generation tree weight function value ${L}_{1}(K,K)$ and ${L}_{2}(K,K)$, and update the generation tree weight function minimum heap $R$:

① If ${L}_{1}(K,K)\le {L}_{2}(K,K)$:

(i) keep the weight function value ${L}_{1}(K,K)$ stored in the parent node ${R}_{1}$ of minimum heap $R$ unchanged; and,

(ii) store the weight function value ${L}_{2}(K,K)$ into the child node ${R}_{2}$ of parent node ${R}_{1}$ in minimum heap $R$.

② If ${L}_{1}(K,K)>{L}_{2}(K,K)$:

(i) delete the parent node ${R}_{1}$ value ${L}_{1}(K,K)$;

(ii) store the weight function value ${L}_{2}(K,K)$ into the parent node ${R}_{1}$ in minimum heap $R$; and,

(iii) store the weight function value ${L}_{1}(K,K)$ into the child node ${R}_{2}$ of parent node ${R}_{1}$ in minimum heap $R$.

Sub-step 4. Return to Sub-step 3 and use the same method to search the next generation tree closed-loop structure ${O}_{3}$.

(I) in structure ${O}_{3}$, search $\tau +1$ sub-unit motive weights $h(\cdot )$ of adjacent ${v}_{i}$ and ${v}_{i+1}$;

(II) iterate the generation tree weight function value ${L}_{3}(K,K)$ of the closed-loop structure ${O}_{3}$; and,

(III) compare the generation tree weight function values ${L}_{1}(K,K)$, ${L}_{2}(K,K)$ and ${L}_{3}(K,K)$, and then update the generation tree weight function minimum heap $R$:

① If ${L}_{1}(K,K)\le {L}_{2}(K,K)$:

(i) if ${L}_{1}(K,K)\le {L}_{2}(K,K)\le {L}_{3}(K,K)$, keep the stored weight function values ${L}_{1}(K,K)$ and ${L}_{2}(K,K)$ unchanged, and store the weight function value ${L}_{3}(K,K)$ into the child node ${R}_{3}$ of parent node ${R}_{1}$ in minimum heap $R$;

(ii) if ${L}_{1}(K,K)\le {L}_{3}(K,K)<{L}_{2}(K,K)$, keep the stored weight function value ${L}_{1}(K,K)$ unchanged, delete the child node ${R}_{2}$ value, store the weight function value ${L}_{3}(K,K)$ into the child node ${R}_{2}$ of parent node ${R}_{1}$, and store the weight function value ${L}_{2}(K,K)$ into the child node ${R}_{3}$ of parent node ${R}_{1}$ in minimum heap $R$; and,

(iii) if ${L}_{3}(K,K)<{L}_{1}(K,K)\le {L}_{2}(K,K)$, delete the values of nodes ${R}_{1}$ and ${R}_{2}$, store the weight function value ${L}_{3}(K,K)$ into the parent node ${R}_{1}$, and store the weight function values ${L}_{1}(K,K)$ and ${L}_{2}(K,K)$ into the child nodes ${R}_{2}$ and ${R}_{3}$ of parent node ${R}_{1}$, respectively, in minimum heap $R$.

② If ${L}_{1}(K,K)>{L}_{2}(K,K)$:

(i) if ${L}_{3}(K,K)\ge {L}_{1}(K,K)>{L}_{2}(K,K)$, keep the stored weight function values ${L}_{1}(K,K)$ and ${L}_{2}(K,K)$ unchanged, and store the weight function value ${L}_{3}(K,K)$ into the child node ${R}_{3}$ of parent node ${R}_{1}$ in minimum heap $R$;

(ii) if ${L}_{1}(K,K)>{L}_{3}(K,K)\ge {L}_{2}(K,K)$, keep the stored weight function value ${L}_{2}(K,K)$ unchanged, delete the child node ${R}_{2}$ value, store the weight function value ${L}_{3}(K,K)$ into the child node ${R}_{2}$ of parent node ${R}_{1}$, and store the weight function value ${L}_{1}(K,K)$ into the child node ${R}_{3}$ of parent node ${R}_{1}$ in minimum heap $R$; and,

(iii) if ${L}_{1}(K,K)>{L}_{2}(K,K)>{L}_{3}(K,K)$, delete the values of nodes ${R}_{1}$ and ${R}_{2}$, store the weight function value ${L}_{3}(K,K)$ into the parent node ${R}_{1}$, and store the weight function values ${L}_{1}(K,K)$ and ${L}_{2}(K,K)$ into the child nodes ${R}_{3}$ and ${R}_{2}$ of parent node ${R}_{1}$, respectively, in minimum heap $R$.

Sub-step 5. Return to Sub-step 3, and use the same method to search all remaining generation tree closed-loop structures ${O}_{4}$ to ${O}_{{\mathrm{A}}_{\tau}^{\tau}}$ and obtain the rebuilt generation tree weight function minimum heap $R$. When Step 4 ends, go to Step 5.
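The case-by-case updates in Sub-steps 3 and 4 all amount to inserting a new value so that the stored array stays in ascending order. A minimal sketch of that idea, using the standard `bisect` module as a substitute for the explicit case analysis (the name `update_heap` and the numeric values are hypothetical), is:

```python
import bisect

def update_heap(R, value):
    """Insert a weight function value so that R stays in ascending order."""
    bisect.insort(R, value)
    return R

# Hypothetical values with L1 > L3 >= L2, i.e. case (ii) under condition 2.
R = []
for L in (0.9, 0.6, 0.75):  # L1, L2, L3 in the order they are found
    update_heap(R, L)
print(R)  # R[0] is the parent node R_1; R[1], R[2] are its child nodes
```

An ascending array automatically satisfies ${k}_{i}\le {k}_{2i}$ and ${k}_{i}\le {k}_{2i+1}$, so this single insertion rule reproduces the outcome of each listed case without enumerating them.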

**Step 5.** Output the tour route sorting heap relating to the generation tree weight function minimum heap $R$. The weight function value ${L}_{\omega}(K,K)$ relates to the generation tree closed-loop structure ${O}_{\omega}$, which in turn relates to a tour route. According to the algorithm rule, the weight function value stored in the parent node of minimum heap $R$ relates to the optimal tour route. As its generation tree weight function value is the minimum one, the iteration value of all its sub-unit motive function values is the maximum one. In terms of the comprehensive output result, the optimal tour route performs best on tourist site classification, tourist site quantity, confirmed specific tourist sites, tourist site distribution, tour sequence, GIS service, traffic information service, tourist site star level, etc. The two child nodes of the parent node relate to sub-optimal tour routes. The smart machine outputs the visualized results for tourists according to the input conditions.
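Steps 2 to 5 can be tied together in a brute-force sketch that ranks every closed loop and reads off the optimal route and two sub-optimal routes. It rests on assumptions not taken from the paper: the edge weights are hypothetical, $L(\cdot )$ is assumed to be the sum of the $\tau +1$ sub-unit motive weights because formula (13) is not reproduced here, and exhaustive enumeration replaces the edge correcting search.

```python
import heapq
from itertools import permutations

# Hypothetical sub-unit motive weights h for K and three seed sites (undirected).
h = {
    frozenset(("K", "G1")): 0.25,
    frozenset(("K", "G2")): 0.5,
    frozenset(("K", "G3")): 0.25,
    frozenset(("G1", "G2")): 0.125,
    frozenset(("G2", "G3")): 0.125,
    frozenset(("G1", "G3")): 0.5,
}

def tree_weight(loop):
    # Assumption: L is the sum of the tau + 1 edge weights of the loop.
    return sum(h[frozenset(edge)] for edge in zip(loop, loop[1:]))

# All tau! closed loops, paired with their weight function values.
loops = [("K", *order, "K") for order in permutations(("G1", "G2", "G3"))]
ranked = heapq.nsmallest(3, ((tree_weight(loop), loop) for loop in loops))

optimal, sub_optimal = ranked[0], ranked[1:]
print(optimal)
print(sub_optimal)
```

Under these assumptions a route and its reverse always share the same weight function value, so the reversed optimal route appears among the sub-optimal candidates. An implementation may want to treat the two directions as one route before presenting results.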