In the first stage, the decision tree model is built. The process begins with an examination of the survey area characteristics, after which the reduced binary elimination matrix is formed; this matrix serves as the input variable for the decision tree model, from which the decision tree is constructed. An advantage of the decision tree is that it needs to be created only once, provided that the elimination matrix has not changed, i.e., that the performances of the technologies have remained the same. These performances change only if new technologies or upgrades of existing technologies become available. The second stage starts with the evaluation of the built model by setting criteria for hydrographic survey quality, followed by quantifying and ranking the criteria. Determining the criteria values ensures adequate quality in the optimization process. The final part of this stage is the optimization result, i.e., a ranking of technologies from top to bottom: the optimal technology is at the top and the least suitable technology at the bottom.
2.1. Stage 1—Problem Formulation Stage
The problem formulation stage begins with the diversity of the areas in which the hydrographic survey is carried out. Therefore, identifying the area’s characteristics is of paramount importance. To facilitate this process, the proposed distribution of characteristics and sub-characteristics of an area is represented in
Table 1.
Table 1 is made on the basis of the processed literature [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43]. It represents the framework that hydrographers or marine industry employees use. The literature references are used to obtain the minimum set of survey area data. The derived minimum survey area data set, together with the survey technologies, forms the basis for the binary reduction matrix elements. Each identified characteristic has been divided by a single division into sub characteristics that are mutually exclusive. From
Table 1, it can be observed that five (5) distribution characteristics are proposed to create the basic framework of the survey area and, as such, form the basic elements for the binary reduction matrix. They are sorted into characteristics A, B, C, D, and E. Additionally, each characteristic contains sub characteristics. Some characteristics have numerical values (A, B, and C), and some are descriptive (D and E). The proposed distribution of characteristics and sub characteristics can be applied to any area and easily supplemented with new data as needed. It has to be pointed out that not all areas are accessible to all technologies. Distribution A (survey area coverage per day) refers to the urgency of the hydrographic survey. For example, distribution A is divided into four (4) categories/variables: a1, which covers a survey area of less than 1 km²/day; a2, a survey area coverage between 2 and 25 km²/day; a3, a survey area between 26 and 65 km²/day; and a4, which covers a survey area coverage greater than 66 km²/day. Distributions B (the minimum depth to be surveyed), C (the maximum depth of a survey area), and E (seabed type) refer to the different performance of survey techniques depending on depth and seabed type. For example, distribution B is divided into three (3) categories: variable b1, concerning those technologies which can give satisfactory results at depths less than or equal to 1 m; variable b2, which represents minimum survey depths between 2 and 5 m; and variable b3, which denotes a minimum depth to be surveyed between 6 and 20 m. Distribution D (the possibility of hazards to operation) refers to some technologies' potential to disable the survey operation and should also be considered. It contains three (3) descriptive sub characteristics/variables: variable d1 describes the possibility of hazards to surface navigation, variable d2 the possibility of hazards to underwater navigation, and variable d3 the possibility of hazards to both surface and underwater navigation. Additionally, the available technologies were: remotely operated underwater vehicle (ROV), unmanned aerial vehicle (UAV), light detection and ranging (LIDAR), autonomous underwater vehicle (AUV), satellite-derived bathymetry (SDB), and multibeam echosounder (MBES). Each has a set of specific characteristics which the manufacturer determines.
Once the survey area data are obtained and the available technologies are known, the reduced binary elimination matrix can be created. A hydrographer fills in the elimination matrix based on the survey area data and the performances of the available technologies. The survey area data are data that the hydrographer must be aware of when planning a survey, while the technologies' performance data vary depending on the model and are available from the manufacturer. Therefore, filling in the elimination matrix is a simple process performed by a hydrographer based on the available data. Since not all available technologies are suitable for every survey area, the elimination matrix is used for the rapid and transparent elimination of inappropriate technologies. The values from the elimination matrix represent the input to the decision tree model.
The basic reduced binary elimination matrix is shown in
Figure 2.
The rows in Figure 2 represent the characteristics and sub characteristics of the survey area (described in Figure 1), and the columns represent the available technologies. The technologies are represented as alternatives that come into consideration. Each element of the reduced binary elimination matrix indicates the binary value (0 or 1) of the k-th alternative with respect to the z-th survey area characteristic, Expressions (1) and (2):
A binary filling is proposed to make the process simpler and faster. A value of 0 indicates that a particular alternative is not suitable for the specified area. Conversely, a value of 1 indicates that the observed alternative is suitable for the given survey area characteristic. To make the selection more precise, the general characteristics of the survey area are divided into several sub characteristics, Expression (3):
where each group of variables denotes the sub characteristics of the corresponding survey area characteristic.
In practice, the number of characteristics in the matrix can be smaller than suggested in
Figure 2. If, for example, there is no possibility of hazard (characteristic D), then it can be dropped from the matrix and not considered. Each characteristic under consideration usually has only one sub characteristic. For example, if characteristic C is observed, the maximum depth is generally known, and characteristic C is represented by only one sub characteristic. In contrast, characteristic E (seabed type) of the survey area does not have to be unambiguously defined. For example, one part of the bottom may be rocky, while another part may be covered by heavy vegetation. In such a case, two sub characteristics of characteristic E are considered. All available alternatives and the sub characteristics defined for a particular characteristic of the survey area define submatrices. The total number of submatrices depends on the total number of characteristics of the hydrographic survey area. Each submatrix must have at least one non-zero column; otherwise, the alternative to which the zero column belongs is discarded from further analysis. Such an outcome means that this alternative is not appropriate for the specified survey area. Additionally, it is important to note that if a technology is no longer available, the corresponding column needs to be deleted from the matrix. This procedure does not affect the other values in the elimination matrix.
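The screening step described above can be sketched in a few lines of Python. The submatrix values below are illustrative assumptions, not data from the paper; the sketch only shows the rule that an alternative is discarded when its column is all zeros in some submatrix:

```python
# Minimal sketch of the reduced binary elimination matrix screening.
# Alternative suitability values here are hypothetical examples.
alternatives = ["ROV", "UAV", "LIDAR", "AUV", "SDB", "MBES"]

# One submatrix per considered survey area characteristic: rows are the
# relevant sub characteristics, columns the alternatives (1 = suitable).
submatrices = {
    "B (min depth)": {
        "b1": {"ROV": 1, "UAV": 1, "LIDAR": 1, "AUV": 0, "SDB": 1, "MBES": 0},
    },
    "E (seabed)": {
        "rocky":      {"ROV": 1, "UAV": 0, "LIDAR": 1, "AUV": 1, "SDB": 0, "MBES": 1},
        "vegetation": {"ROV": 1, "UAV": 0, "LIDAR": 0, "AUV": 1, "SDB": 0, "MBES": 1},
    },
}

def screen(alternatives, submatrices):
    """Keep only alternatives whose column is non-zero in every submatrix."""
    kept = []
    for alt in alternatives:
        ok = all(
            any(row[alt] for row in sub.values())  # at least one 1 in the column
            for sub in submatrices.values()
        )
        if ok:
            kept.append(alt)
    return kept

print(screen(alternatives, submatrices))  # → ['ROV', 'LIDAR']
```

The surviving alternatives are then passed to the decision tree model as input.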
The flowchart in
Figure 3 represents the proposed approach.
The flowchart in
Figure 3 is based on fuzzy rules. The process of identifying suitable technology with regard to the characteristics of the hydrographic survey area can be explained by Expression (4):
The structural description does not necessarily need to be represented as rules. Hence, decision trees, which specify sequences of decisions that need to be made along with the resulting recommendation, are another popular means of expression [
44].
It is known that decision tree learning is a supervised machine learning technique for inducing a decision tree from training data [45,46]. It represents one of the most intuitive and frequently used data science techniques [47,48]. In decision tree theory, the idea is to split the dataset based on the homogeneity of the data [49]. Many measures can be used to determine the best way to split the records. These measures are defined in terms of the class distribution of the attributes before and after splitting and are called attribute selection measures (ASM). The most popular ASMs for classification problems are the Gini index and the information gain ratio [50].
The Gini index determines the purity of a specific class after splitting along a particular attribute. The best split increases the purity of the sets resulting from the split. If a data set D contains examples from n classes, the Gini index, Gini(D), is defined as in Equation (5) [51,52]:

Gini(D) = 1 - \sum_{i=1}^{n} p_i^2,    (5)

where p_i is the relative frequency of class i in D.
If the dataset is split on an attribute A into two subsets D_1 and D_2, with sizes N_1 and N_2, respectively, the Gini index can be calculated as in Equation (6) [51]:

Gini_A(D) = (N_1/N) Gini(D_1) + (N_2/N) Gini(D_2),    (6)

where N is the size of D. The reduction in impurity is calculated as in Equation (7) [51]:

\Delta Gini(A) = Gini(D) - Gini_A(D).    (7)
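As a small sketch, the Gini computations of Equations (5)-(7) can be written directly from their definitions (the labels below are hypothetical examples):

```python
from collections import Counter

def gini(labels):
    """Gini index of a label set, Eq. (5): 1 minus the sum of squared class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    """Size-weighted Gini index after splitting a dataset into d1 and d2, Eq. (6)."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

labels = ["suitable"] * 5 + ["unsuitable"] * 5
d1, d2 = ["suitable"] * 5, ["unsuitable"] * 5
print(gini(labels))                        # 0.5 for a perfectly mixed binary set
print(gini(labels) - gini_split(d1, d2))   # impurity reduction, Eq. (7): 0.5
```

A perfect split into pure subsets realizes the maximum impurity reduction, which is why such a split would be chosen first when growing the tree.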
The information gain ratio is the ratio of information gain to entropy. Information gain can be expressed as in Equation (8):

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} (|S_v| / |S|) Entropy(S_v),    (8)

where A is a feature, Values(A) is the set of different values of feature A, and S_v is the subset of S for which A = v. Entropy is defined as the sum over all labels of the probability of each label times the log probability of that same label, as shown by Equation (9) [51,52]:

Entropy(S) = - \sum_{i=1}^{n} p_i \log_2 p_i.    (9)
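Equations (8) and (9) translate directly into code; the seabed labels below are hypothetical examples used only to exercise the formulas:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a label set, Eq. (9): -sum of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Information gain of a split, Eq. (8): parent entropy minus the
    size-weighted entropy of the resulting subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

parent = ["rock", "rock", "sand", "sand"]
print(information_gain(parent, [["rock", "rock"], ["sand", "sand"]]))  # 1.0
```

A split into pure subsets yields the full parent entropy as gain, which is the best possible outcome for that node.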
2.2. Stage 2—Optimization Stage
After the technology suitable for the required hydrographic survey area has been selected and visually substantiated by the decision tree, multi-criteria optimization is applied. Multicriteria decision making (MCDM) is a procedure that combines the performance of decision alternatives across several contradicting, qualitative, and/or quantitative criteria and yields a compromise solution [53,54]. The relevant methods are frequently applicable, implementable, and explicable in numerous real-life problems. They are widely used where sets of decision alternatives are evaluated against conflicting criteria [54,55]. The most commonly used multicriteria decision-making approach is the weighted sum method (WSM) [56,57,58].
Figure 4 shows MCDM and WSM procedures.
From
Figure 4, it can be observed that there are three major steps. The first step starts with the construction of a pair-wise comparison matrix A, of type n × n, between the sets of criteria, as shown by Equation (10) [59]:

A = [a_{ij}]_{n \times n}, with a_{ii} = 1 and a_{ji} = 1/a_{ij},    (10)

where a_{ij} is an element of the positive reciprocal matrix A expressing the relative importance of criterion i over criterion j.
Further, the pair-wise comparison matrix between criteria is obtained using Saaty's scale from Table 2 [60,61,62].
For quantifying qualitative data, Saaty suggested a scale of relative importance, as shown in Table 2. The scale has five degrees of intensity (1, 3, 5, 7, and 9) and four intermediate levels (2, 4, 6, and 8). A value judgment expresses how many times one criterion is more important than another: the value assigned to a pair of criteria ranges from one (1), where they are equally important, to nine (9), where one is enormously more important than the other.
Once the comparison matrix has been made, it is necessary to sum the values in each column of the pair-wise matrix, divide each element of the matrix by its column total, and calculate the weight vector by finding the row averages, as in Equation (11) [60,63]:

w_i = (1/n) \sum_{j=1}^{n} ( a_{ij} / \sum_{k=1}^{n} a_{kj} ).    (11)

The weighted sum matrix is found by multiplying the pair-wise comparison matrix by the weight vector. Then, dividing each element of the weighted sum matrix by its respective priority vector element, the consistency vector (CV) is obtained, as in Equation (12) [60,61,62,63]:

CV_i = (A w)_i / w_i.    (12)
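The weight derivation of Equations (11) and (12) can be sketched as follows; the 3 × 3 comparison matrix is an illustrative assumption, not data from the paper:

```python
# Hypothetical pair-wise comparison matrix on Saaty's scale (3 criteria).
A = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
n = len(A)

# Eq. (11): normalize each column by its total, then average across each row.
col_totals = [sum(A[i][j] for i in range(n)) for j in range(n)]
w = [sum(A[i][j] / col_totals[j] for j in range(n)) / n for i in range(n)]

# Eq. (12): consistency vector = (A w) divided element-wise by w.
Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
cv = [Aw[i] / w[i] for i in range(n)]

print([round(x, 3) for x in w])   # weight vector, sums to 1
print([round(x, 3) for x in cv])  # each entry close to n for consistent judgments
```

For a perfectly consistent matrix every entry of the consistency vector equals n; the spread of the entries around n is what the consistency index then quantifies.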
Step 1 ends with computing the consistency index (CI) and consistency ratio (CR). CI measures the degree of inconsistency, as shown by Equation (13) [60,61,62,63]:

CI = (\lambda_{max} - n) / (n - 1),    (13)

where n is the matrix size and \lambda_{max} is the largest principal eigenvalue of the positive reciprocal pair-wise comparison matrix.
If the pair-wise comparisons are perfectly consistent, then \lambda_{max} is equal to the size of the matrix and CI = 0. The larger the inconsistency between comparisons, the larger \lambda_{max} and, consequently, the larger CI. Then, the consistency ratio CR is calculated as in Equation (14) [58,64]:

CR = CI / RI,    (14)

where RI is the random consistency index obtained from a randomly generated pair-wise comparison matrix.
If CR ≤ 0.1, then the comparisons are acceptable. However, if CR > 0.1, the values of the ratio are indicative of inconsistent judgments; in such a case, the judgments should be reconsidered and revised [58,60,61,62,63,65]. The last part of this step is thus to compute the consistency vector, index, and ratio.
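The consistency check of Equations (13) and (14) can be sketched as below; the consistency vector values are hypothetical, and the RI table lists Saaty's published random consistency indices:

```python
# Saaty's random consistency index RI for matrix sizes 1..9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_ratio(cv):
    """CR from a consistency vector cv: lambda_max is estimated as the
    average of cv, then CI = (lambda_max - n) / (n - 1) and CR = CI / RI."""
    n = len(cv)
    lam_max = sum(cv) / n          # estimate of the largest eigenvalue
    ci = (lam_max - n) / (n - 1)   # Eq. (13)
    return ci / RI[n]              # Eq. (14)

cv = [3.04, 3.05, 3.02]            # hypothetical consistency vector, n = 3
print(consistency_ratio(cv) <= 0.1)  # True: the judgments are acceptable
```

When the ratio exceeds 0.1, the pair-wise comparison matrix should be revised before the weights are used in Step 2.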
In Step 2, a decision matrix X of the alternatives on each criterion is created, as in Equation (15):

X = [x_{ij}]_{m \times n}.    (15)

The next part of the second step is calculating the normalized decision matrix with positive attributes, as in Equation (16) [58,64]:

x*_{ij} = x_{ij} / x_j^{max},    (16)

where x_{ij} is the score of the i-th alternative with respect to the j-th criterion and x_j^{max} is the maximum value in the j-th column of X.
From Equations (15) and (16), the weighted normalized decision matrix is created, as shown by Equation (17) [58]:

v_{ij} = w_j x*_{ij}.    (17)

Finally, in step 3 of the WSM, the optimum solution for each alternative is obtained by Equation (18) [58,63]:

A_i^{WSM} = \sum_{j=1}^{n} w_j x*_{ij},    (18)

where A_i^{WSM} represents the weighted sum score of the i-th alternative and w_j is the weight of the j-th criterion.
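Steps 2 and 3 can be sketched end to end as follows; the alternatives, criteria weights, and scores are illustrative assumptions (all criteria treated as benefit criteria), not values from the paper:

```python
# Minimal WSM sketch following Eqs. (15)-(18) with hypothetical inputs.
alternatives = ["MBES", "LIDAR", "AUV"]
weights = [0.6, 0.3, 0.1]   # criteria weights from Step 1, summing to 1
X = [                       # decision matrix, Eq. (15): one row per alternative
    [90.0, 7.0, 3.0],
    [70.0, 9.0, 5.0],
    [80.0, 5.0, 4.0],
]

# Eq. (16): normalize each column by its maximum (benefit criteria assumed).
col_max = [max(row[j] for row in X) for j in range(len(weights))]
X_norm = [[row[j] / col_max[j] for j in range(len(weights))] for row in X]

# Eqs. (17) and (18): weighted sum score per alternative, then rank.
scores = [sum(w * x for w, x in zip(weights, row)) for row in X_norm]
ranking = sorted(zip(alternatives, scores), key=lambda p: p[1], reverse=True)
print(ranking[0][0])  # the top-ranked (optimal) technology
```

The sorted list corresponds to the final output of the optimization stage: the optimal technology at the top and the least suitable one at the bottom.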