# Decision Fusion Framework for Hyperspectral Image Classification Based on Markov and Conditional Random Fields

*Reviewer 1:* Anonymous

*Reviewer 2:* Anonymous

*Reviewer 3:* Anonymous

**Round 1**

*Reviewer 1 Report*

The manuscript is certainly interesting. Unfortunately, some parts of the exposition present problems. It is a real shame that the notation and the equations are sometimes unclear.

I hope that the following suggestions can help to improve the exposition.

1) The MRF and CRF acronyms are introduced in the abstract but not in the body text.

2) Only the acronyms MRF and CRF are reported in the keywords.

3) In line 35, it is not so evident that the spatial-spectral expression refers to the concept previously exposed.

4) The vector x is introduced in line 114, but the index n is introduced in line 137. I would like to suggest considering adopting an uppercase symbol to indicate the number of pixels.

5) The symbol N_i in Eq. 1 is hastily introduced in line 122. Perhaps this symbol should be bold, since it represents a set of pixels.

6) The formulation of the second term of Eq. 1 could puzzle the reader.

7) Perhaps lines 137-140 could be moved to the beginning of Section 2.2.

8) I would like to propose some changes in lines 133-136: "In this work, the MRFs and CRFs are used as decision fusion methods, by combining multiple decision sources in their energy functions. We propose the fusion of two decision sources. The first is the probability outputs from the Multinomial Logistic Regression classifier (MLR) [36], i.e. a supervised classification of the spectral reflectance values. The second source of information is produced by considering the sparse spectral unmixing method SunSAL proposed in [29]". Lines 141 and 144 should be adjusted accordingly.

9) I would like to suggest considering fusing Section 2.1.1 with 2.3 and Section 2.1.2 with 2.4.

10) Sections 2.3 and 2.4 contain very complex notation, and the figures are not completely clear. I would like to encourage the authors to improve this part. The authors should consider whether this part of the manuscript could be extended, making it more independent from the references.

11) MRF_a, MRF_p, CRF_a, CRF_p are not clear. I can suppose, for instance, that MRF_a is the MRF applied to the output of SunSAL.

12) The SVM classifier mentioned in Figure 6 could be added to the other approaches considered in the analysis of the performance of the proposed methods. Clearly, a different up-to-date classifier that does not adopt a fusion approach would be equally suitable.

Indeed, I was particularly impressed by the quality of the figures, and in particular by Figure 6. I am curious to know which program was used to produce this figure. Furthermore, I would like to know whether the authors took inspiration from papers already published or independently developed the layout of this figure.

Each plot of Figure 5 should have the same point of view and the same axis orientation. In addition, the vertical scale should be the same. In Figure 7 and Figure 8, the legends can be omitted. It could be sufficient for the labels on the x-axis to be bold, i.e. more evident. From this perspective, the colors can also be removed.

Furthermore, I would like to suggest checking ALL the bibliography because there are some problems. In particular:

1) In [25], an author is missing.

2) The authors of [27] are wrong.

3) In [44], the first name of the first author is also reported.

Indeed, I have been really interested in this manuscript, so I would like to share the most interesting idea that popped into my head when I read it. Indeed, I am glad when the reviewing process comes close to a friendly discussion among colleagues. I hope that the authors can find my suggestion interesting.

Thinking about the characteristics of the proposed method, I came to the conclusion that Figure 6, Table II and Table III are not completely effective in synthesizing the results. In particular, Figure 6 gives a limited contribution to understanding the behavior of the proposed solution. For this reason, I would like to suggest considering a new table. I hope that the HYPOTHETICAL considerations reported about the table can be confirmed by the real data. In any case, they should be useful as a base for the discussion. If this new table and the corresponding discussion were really interesting, they could be included in the revised version of the manuscript.

The decision fusion framework considers SunSAL and MLR. Applying a MAP classifier to the abundances and probabilities obtained by SunSAL and MLR respectively, the corresponding classifiers are defined. Therefore, for each class C_k of the classification map, for each pixel P(i,j), one of these three events can occur:

•Event E1: both classifiers (SunSAL and MLR) mark P(i,j) as C_k.

•Event E2: only one classifier marks P(i,j) as C_k.

•Event E3: neither classifier marks P(i,j) as C_k.

For each C_k:

•S(C_k,E1) is the set of pixels of Event E1.

•S(C_k,E2) is the set of pixels of Event E2.

•S(C_k,E3) is the set of pixels of Event E3.

And

•n_(C_k,E1) is the number of pixels in S(C_k,E1).

•n_(C_k,E2) is the number of pixels in S(C_k,E2).

•n_(C_k,E3) is the number of pixels in S(C_k,E3).

The first column of the new table should report the labels of the classification map (C1, C2, …, C10 in Figure 6).

Columns 2-4 should report the n_(C_k,E1), n_(C_k,E2) and n_(C_k,E3) values. Indeed, column 4 is not particularly important.

The columns 5-7 should contain:

•The number of pixels nT_(C_k,E1) of S(C_k,E1) that are really C_k.

•The number of pixels nT_(C_k,E2) of S(C_k,E2) that are really C_k .

•The number of pixels nT_(C_k,E3) of S(C_k,E3) that are really C_k.

The columns 8-10 should contain:

•The number of pixels nFu_(C_k,E1) in (C_k,E1) that the decision fusion framework marks C_k.

•The number of pixels nFu_(C_k,E2) in (C_k,E2) that the decision fusion framework marks C_k.

•The number of pixels nFu_(C_k,E3) in (C_k,E3) that the decision fusion framework marks C_k.

The columns 11-13 should contain:

•The number of pixels nFuT_(C_k,E1) in (C_k,E1) that the decision fusion framework marks C_k and that are really C_k.

•The number of pixels nFuT_(C_k,E2) in (C_k,E2) that the decision fusion framework marks C_k and that are really C_k.

•The number of pixels nFuT_(C_k,E3) in (C_k,E3) that the decision fusion framework marks C_k and that are really C_k.

Reasonably, the following facts should usually be observed:

•n_(C_k,E1) should be greater than n_(C_k,E2), because the two classifiers should generally be in accordance.

•nT_(C_k,E1) should usually be only slightly smaller than n_(C_k,E1). When both classifiers provide the same class, it should be the right class. Indeed, this fact should be all the more true when the chosen abundance and probability are much greater than the other ones.

•nFu_(C_k,E1) should be more or less equal to n_(C_k,E1). When the classifiers provide the same class, the decision fusion framework should confirm the previous choice.

•nFuT_(C_k,E1) should be more or less equal to nT_(C_k,E1). When the classifiers provide the same class, the decision fusion framework should confirm the previous choice, and it should be the right choice.

•Since the E3 event is also analyzed, some anomalies could be highlighted. In particular, nFu_(C_k,E3) might not be zero.

The comparison between n_(C_k,E2), nFu_(C_k,E2) and nFuT_(C_k,E2) should be particularly interesting. When the two classifiers produce discordant results, the decision fusion framework should in fact resolve the conflict by taking the right decision.

In my humble opinion, the proposed table should help in understanding the results obtained. For the sake of brevity, I take into consideration only the University of Pavia dataset.

Class = [ 1 2 3 4 5 6 7 8 9]

Figure6-TableA_i,i = [22.40 66.60 32.10 85.60 83.10 30.30 47.50 20.80 88.10];

SunSAL = [33.04 68.95 60.59 84.61 95.43 46.80 48.80 36.74 98.61];

MLR = [50.72 67.17 70.93 88.95 97.81 55.85 80.75 61.60 95.52];

CRFL = [77.86 88.74 92.71 89.51 99.28 63.07 96.55 58.25 99.97];

•n_(C_2,E2) should be large, and the decision fusion framework produces a relevant improvement.

•n_(C_4,E2) should be small, so that the decision fusion framework cannot work.

•The new table could play a key role in understanding the results of the k=8 class.

Further interesting considerations can be developed if the event E2 is divided into two different events: E2A, when SunSAL marks the pixel as C_k, and E2B, when MLR marks it as C_k. Carrying out this analysis, it should be possible to assess which classifier plays a key role in the decision fusion process. This analysis could be useful when tuning the parameters.

*Author Response*

We thank the reviewers for their comments and suggestions for improvement. Below is a point-by-point reply to the reviewers. In the manuscript, all changes are denoted in red.


**Reviewer 1**

1) MRF and CRF acronyms are introduced in the abstract but not in the body text.

We have now introduced the MRF and CRF acronyms in the body text.

2) Only the acronyms MRF and CRF are reported in the keywords.

We have now used the full names in the keywords.

3) In line 35, it is not so evident that the spatial-spectral expression refers to the concept previously exposed.

We have now made this clearer.

4) The vector x is introduced in line 114, but the index n is introduced in line 137. I would like to suggest considering adopting an uppercase symbol to indicate the number of pixels.

Since a lowercase symbol has been consistently used in the text as a pixel index, we prefer to keep this.

We have now defined n as the number of pixels in line 114.

5) The symbol N_i in Eq. 1 is hastily introduced in line 122. Perhaps this symbol should be bold, since it represents a set of pixels.

We have adopted this suggestion throughout the manuscript.

6) The formulation of the second term of Eq. 1 could puzzle the reader.

We have now split the double sum into two separate sums. We hope that this reduces possible confusion.

7) Perhaps lines 137-140 could be moved to the beginning of Section 2.2.

We followed the suggestion.

We followed the suggestion.

8) I would like to propose some changes in lines 133-136: "In this work, the MRFs and CRFs are used as decision fusion methods, by combining multiple decision sources in their energy functions. We propose the fusion of two decision sources. The first is the probability outputs from the Multinomial Logistic Regression classifier (MLR) [36], i.e. a supervised classification of the spectral reflectance values. The second source of information is produced by considering the sparse spectral unmixing method SunSAL proposed in [29]". Lines 141 and 144 should be adjusted accordingly.

We have made these changes.

9) I would like to suggest considering fusing Section 2.1.1 with 2.3 and Section 2.1.2 with 2.4.

We would like to explicitly make the distinction between using the graphical models as regularizers (2.1.1 and 2.1.2) and using them as decision fusion methods (2.3 and 2.4). In our opinion, merging these sections would weaken this statement.

10) Sections 2.3 and 2.4 contain very complex notation, and the figures are not completely clear. I would like to encourage the authors to improve this part. The authors should consider whether this part of the manuscript could be extended, making it more independent from the references.

Given that the concepts of using MRF and CRF graphical models have been used before for decision fusion, we think it is only fair to properly cite these works. Also, the concept of cross links has been used before in a paper on the fusion of multispectral and Lidar data [reference 28]. In our paper, we apply the same concept to the fusion of the two decision sources from hyperspectral images, so in the text we explicitly refer to that reference for more details. We tried to explain the procedure as clearly as possible. We do not believe that a thorough discussion on the optimization by graph-cut α-expansion or on the choice of a contrast-sensitive Potts model would improve the readability of the manuscript, and prefer to refer to the proper literature for these.

We did include a complexity analysis of the methods in the revised version.

11) MRF_a, MRF_p, CRF_a, CRF_p are not clear. I can suppose, for instance, that MRF_a is the MRF applied to the output of SunSAL.

The reviewer is right. We clarified this in the new version.

12) The SVM classifier mentioned in Figure 6 could be added to the other approaches considered in the analysis of the performance of the proposed methods. Clearly, a different up-to-date classifier that does not adopt a fusion approach would be equally suitable.

The reviewer is right. Since the MLR classifier is used in the fusion approaches (an SVM classifier with a soft classification output could be used as well), it seemed only logical to show that one in the tables.

Indeed, I was particularly impressed by the quality of the figures, and in particular by Figure 6. I am curious to know which program was used to produce this figure. Furthermore, I would like to know whether the authors took inspiration from papers already published or independently developed the layout of this figure.

Figure 6 was produced using the standard built-in Matlab function **confusionmat** in combination with the **plotConfMat** method from Vahe Tshitoyan (20/08/2017). His method can be downloaded from:

https://github.com/vtshitoyan/plotConfMat/blob/master/plotConfMat.m

https://www.mathworks.com/matlabcentral/mlcdownloads/downloads/submissions/67631/versions/2/previews/plotConfMat.m/index.html
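For readers who prefer not to rely on the Matlab toolchain, the underlying counting is tiny. The Python sketch below is our own illustration (the function name is ours, not from the authors' code); it mirrors what `confusionmat` counts, leaving the rendering to any plotting library:

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix C with C[i][j] = number of samples whose true label
    is i and predicted label is j, mirroring what Matlab's confusionmat
    reports (here with 0-based class labels)."""
    C = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C

# Tiny usage example with three classes (labels 0..2).
C = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], 3)
```

The diagonal entries are the correctly classified counts per class; normalizing each row by its sum gives the class-wise accuracies shown in such plots.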

Each plot of Figure 5 should have the same point of view and the same axis orientation. In addition, the vertical scale should be the same. In Figure 7 and Figure 8, the legends can be omitted. It could be sufficient for the labels on the x-axis to be bold, i.e. more evident. From this perspective, the colors can also be removed.

We agree with the reviewer and adapted the figures accordingly. The plots in Figure 5 now have the same point of view and axis orientation. In the boxplots of Figures 7 and 9, the legends are omitted, and the labels on the x-axis are in bold. We opted to retain colors. We realize that the original colors were not very distinguishable; we therefore improved the visualization by choosing a more distinguishable color palette, following these visualization color guidelines:

https://blog.graphiq.com/finding-the-right-color-palettes-for-data-visualizations-fcd4e707a283 (Right color palettes for data visualization)

https://www.perceptualedge.com/articles/b-eye/choosing_colors.pdf (Choosing colors)

With these changes, each method has its own very distinct color, which facilitates locating the accuracies of each particular method with a very short glance at the plot, compared to when they would all be in black.

Furthermore, I would like to suggest checking ALL the bibliography because there are some problems. In particular:

1) In [25], an author is missing.

2) The authors of [27] are wrong.

3) In [44], the first name of the first author is also reported.

The references have been corrected, and the complete list has been checked and corrected.

Indeed, I have been really interested in this manuscript, so I would like to share the most interesting idea that popped into my head when I read it. Indeed, I am glad when the reviewing process comes close to a friendly discussion among colleagues. I hope that the authors can find my suggestion interesting.

Thinking about the characteristics of the proposed method, I came to the conclusion that Figure 6, Table II and Table III are not completely effective in synthesizing the results. In particular, Figure 6 gives a limited contribution to understanding the behavior of the proposed solution. For this reason, I would like to suggest considering a new table. I hope that the HYPOTHETICAL considerations reported about the table can be confirmed by the real data. In any case, they should be useful as a base for the discussion. If this new table and the corresponding discussion were really interesting, they could be included in the revised version of the manuscript.

The decision fusion framework considers SunSAL and MLR. Applying a MAP classifier to the abundances and probabilities obtained by SunSAL and MLR respectively, the corresponding classifiers are defined. Therefore, for each class C_k of the classification map, for each pixel P(i,j), one of these three events can occur:

•Event E1: both classifiers (SunSAL and MLR) mark P(i,j) as C_k.

•Event E2: only one classifier marks P(i,j) as C_k.

•Event E3: neither classifier marks P(i,j) as C_k.

For each C_k:

•S(C_k,E1) is the set of pixels of Event E1.

•S(C_k,E2) is the set of pixels of Event E2.

•S(C_k,E3) is the set of pixels of Event E3.

And

•n_(C_k,E1) is the number of pixels in S(C_k,E1).

•n_(C_k,E2) is the number of pixels in S(C_k,E2).

•n_(C_k,E3) is the number of pixels in S(C_k,E3).

The first column of the new table should report the labels of the classification map (C1, C2, …, C10 in Figure 6).

Columns 2-4 should report the n_(C_k,E1), n_(C_k,E2) and n_(C_k,E3) values. Indeed, column 4 is not particularly important.

The columns 5-7 should contain:

•The number of pixels nT_(C_k,E1) of S(C_k,E1) that are really C_k.

•The number of pixels nT_(C_k,E2) of S(C_k,E2) that are really C_k .

•The number of pixels nT_(C_k,E3) of S(C_k,E3) that are really C_k.

The columns 8-10 should contain:

•The number of pixels nFu_(C_k,E1) in (C_k,E1) that the decision fusion framework marks C_k.

•The number of pixels nFu_(C_k,E2) in (C_k,E2) that the decision fusion framework marks C_k.

•The number of pixels nFu_(C_k,E3) in (C_k,E3) that the decision fusion framework marks C_k.

The columns 11-13 should contain:

•The number of pixels nFuT_(C_k,E1) in (C_k,E1) that the decision fusion framework marks C_k and that are really C_k.

•The number of pixels nFuT_(C_k,E2) in (C_k,E2) that the decision fusion framework marks C_k and that are really C_k.

•The number of pixels nFuT_(C_k,E3) in (C_k,E3) that the decision fusion framework marks C_k and that are really C_k.

Reasonably, the following facts should usually be observed:

•n_(C_k,E1) should be greater than n_(C_k,E2), because the two classifiers should generally be in accordance.

•nT_(C_k,E1) should usually be only slightly smaller than n_(C_k,E1). When both classifiers provide the same class, it should be the right class. Indeed, this fact should be all the more true when the chosen abundance and probability are much greater than the other ones.

•nFu_(C_k,E1) should be more or less equal to n_(C_k,E1). When the classifiers provide the same class, the decision fusion framework should confirm the previous choice.

•nFuT_(C_k,E1) should be more or less equal to nT_(C_k,E1). When the classifiers provide the same class, the decision fusion framework should confirm the previous choice, and it should be the right choice.

•Since the E3 event is also analyzed, some anomalies could be highlighted. In particular, nFu_(C_k,E3) might not be zero.

The comparison between n_(C_k,E2), nFu_(C_k,E2) and nFuT_(C_k,E2) should be particularly interesting. When the two classifiers produce discordant results, the decision fusion framework should in fact resolve the conflict by taking the right decision.

In my humble opinion, the proposed table should help in understanding the results obtained. For the sake of brevity, I take into consideration only the University of Pavia dataset.

Class = [ 1 2 3 4 5 6 7 8 9]

Figure6-TableA_i,i = [22.40 66.60 32.10 85.60 83.10 30.30 47.50 20.80 88.10];

SunSAL = [33.04 68.95 60.59 84.61 95.43 46.80 48.80 36.74 98.61];

MLR = [50.72 67.17 70.93 88.95 97.81 55.85 80.75 61.60 95.52];

CRFL = [77.86 88.74 92.71 89.51 99.28 63.07 96.55 58.25 99.97];

•n_(C_2,E2) should be large, and the decision fusion framework produces a relevant improvement.

•n_(C_4,E2) should be small, so that the decision fusion framework cannot work.

•The new table could play a key role in understanding the results of the k=8 class.

Further interesting considerations can be developed if the event E2 is divided into two different events: E2A, when SunSAL marks the pixel as C_k, and E2B, when MLR marks it as C_k. Carrying out this analysis, it should be possible to assess which classifier plays a key role in the decision fusion process. This analysis could be useful when tuning the parameters.

We appreciate the effort that the reviewer has taken to think along with us. As a matter of fact, we have also thought about other ways to present the experimental results. In the end, we decided to present the data in the standard way, as is done in most of the literature, i.e. by listing classification accuracies and/or confusion matrices.

The method that is presented by the reviewer may indeed be an interesting way to validate the complementarity of both decision sources and the efficacy of their fusion.

There is however one important remark to make: as far as we can follow the reasoning of the reviewer, his conclusions would be correct if the proposed fusion methods based their decisions only on the decisions made by the two sources. This is however not the case: the MRFL and CRFL methods perform spatial regularization along with the decision fusion, and thus the decisions also rely on the decisions made on the neighbors! Because of this, some of the reviewer's conclusions no longer hold; e.g., nFu_(C_k,E3) does not just represent some anomalies, but represents a large fraction of the decisions.

Another remark is that all these experiments are performed with very low numbers of training samples (10 per class). The results of Figure 6 are based on one such experiment, for illustration purposes, while the results in the tables are averages over 100 independent runs. Notice the large standard deviations, which are unavoidable in the case of low numbers of training samples. If the reviewer's suggestion is carried out on one experiment, the results may not be representative, while, averaged over 100 experiments, the envisaged effects may average out.

We did perform the counting based on one experiment, and the reviewer can find the results below. The reviewer will see that the numbers follow some of his conclusions, while others do not. Because of this, the reviewer's suggestion of a class-specific analysis based on these numbers is also not foolproof.

Anyhow, we decided not to include this analysis in the manuscript, since it would require quite some space to properly explain, and much more analysis and discussion. A comparison would, e.g., be required between the methods MRF_a, MRF_p and MRFL, so as to investigate the effect of the spatial regularization, but even then, this regularization will be done in different ways by all these methods.
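For reference, counts of this kind can be reproduced with a few lines of code. The Python sketch below is our own illustration (the function name and flat label-array layout are assumptions, not the authors' code); it implements the reviewer's event definitions for one class:

```python
def event_counts(sunsal, mlr, fused, truth, k):
    """Per-event counts for class k, following the reviewer's definitions.
    Inputs are flat sequences of integer class labels per pixel.
    E1: both base classifiers say k; E2: exactly one does; E3: neither.
    Returns {event: (n, nT, nFu, nFuT)}."""
    counts = {e: [0, 0, 0, 0] for e in ("E1", "E2", "E3")}
    for s, m, f, t in zip(sunsal, mlr, fused, truth):
        votes = (s == k) + (m == k)        # 0, 1 or 2 base classifiers vote k
        c = counts[("E3", "E2", "E1")[votes]]
        c[0] += 1                          # n: pixels in this event
        c[1] += (t == k)                   # nT: really class k
        c[2] += (f == k)                   # nFu: fusion marks the pixel as k
        c[3] += (f == k and t == k)        # nFuT: fusion marks k and is right
    return {e: tuple(c) for e, c in counts.items()}
```

Splitting E2 into the reviewer's E2A/E2B would only require testing `s == k` versus `m == k` inside the `votes == 1` branch.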

**Pavia** event counts (**MRF**):

| class | E1 | E2 | E2a | E2b | E3 | nT_E1 | nT_E2 | nT_E3 | nF_E1 | nF_E2 | nF_E3 | nFuT_E1 | nFuT_E2 | nFuT_E3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 885 | 4733 | 1640 | 3093 | 36978 | 696 | 3017 | 1917 | 725 | 2910 | 1486 | 650 | 2538 | 1261 |
| 2 | 9971 | 10526 | 5544 | 4982 | 22099 | 8615 | 7255 | 2105 | 9717 | 8206 | 1295 | 8571 | 6843 | 851 |
| 3 | 1135 | 3840 | 1304 | 2536 | 37621 | 757 | 824 | 228 | 953 | 1350 | 436 | 750 | 773 | 163 |
| 4 | 3563 | 2253 | 1578 | 675 | 36780 | 1639 | 370 | 50 | 3483 | 848 | 76 | 1628 | 318 | 17 |
| 5 | 1033 | 315 | 27 | 288 | 41248 | 921 | 200 | 9 | 1032 | 268 | 9 | 921 | 199 | 5 |
| 6 | 2163 | 10585 | 5540 | 5045 | 29848 | 1356 | 2384 | 1170 | 1249 | 1782 | 315 | 1131 | 1472 | 213 |
| 7 | 958 | 3327 | 2314 | 1013 | 38311 | 502 | 517 | 166 | 582 | 625 | 204 | 502 | 516 | 150 |
| 8 | 842 | 4419 | 1068 | 3351 | 37335 | 505 | 1639 | 684 | 684 | 2313 | 833 | 470 | 1422 | 384 |
| 9 | 936 | 2222 | 2095 | 127 | 39438 | 547 | 37 | 3 | 863 | 199 | 153 | 547 | 37 | 3 |

**Pavia** event counts (**CRF**):

| class | E1 | E2 | E2a | E2b | E3 | nT_E1 | nT_E2 | nT_E3 | nF_E1 | nF_E2 | nF_E3 | nFuT_E1 | nFuT_E2 | nFuT_E3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 885 | 4733 | 1640 | 3093 | 36978 | 696 | 3017 | 1917 | 493 | 1708 | 1016 | 436 | 1473 | 860 |
| 2 | 9971 | 10526 | 5544 | 4982 | 22099 | 8615 | 7255 | 2105 | 9619 | 8123 | 1724 | 8451 | 6521 | 893 |
| 3 | 1135 | 3840 | 1304 | 2536 | 37621 | 757 | 824 | 228 | 926 | 1190 | 361 | 734 | 713 | 132 |
| 4 | 3563 | 2253 | 1578 | 675 | 36780 | 1639 | 370 | 50 | 3371 | 1056 | 274 | 1569 | 260 | 19 |
| 5 | 1033 | 315 | 27 | 288 | 41248 | 921 | 200 | 9 | 1021 | 255 | 12 | 915 | 191 | 6 |
| 6 | 2163 | 10585 | 5540 | 5045 | 29848 | 1356 | 2384 | 1170 | 1141 | 2165 | 874 | 1076 | 1355 | 195 |
| 7 | 958 | 3327 | 2314 | 1013 | 38311 | 502 | 517 | 166 | 711 | 1315 | 943 | 502 | 511 | 146 |
| 8 | 842 | 4419 | 1068 | 3351 | 37335 | 505 | 1639 | 684 | 430 | 1252 | 424 | 317 | 867 | 288 |
| 9 | 936 | 2222 | 2095 | 127 | 39438 | 547 | 37 | 3 | 904 | 602 | 686 | 547 | 37 | 3 |

**Pines** event counts (**MRF**):

| class | E1 | E2 | E2a | E2b | E3 | nT_E1 | nT_E2 | nT_E3 | nF_E1 | nF_E2 | nF_E3 | nFuT_E1 | nFuT_E2 | nFuT_E3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 621 | 1505 | 950 | 555 | 7294 | 458 | 513 | 342 | 518 | 591 | 216 | 447 | 427 | 139 |
| 2 | 589 | 1635 | 677 | 958 | 7196 | 301 | 335 | 82 | 521 | 641 | 218 | 297 | 312 | 67 |
| 3 | 360 | 345 | 86 | 259 | 8715 | 321 | 87 | 19 | 324 | 84 | 5 | 319 | 72 | 2 |
| 4 | 532 | 241 | 96 | 145 | 8647 | 491 | 126 | 49 | 518 | 141 | 49 | 491 | 126 | 40 |
| 5 | 447 | 97 | 85 | 12 | 8876 | 432 | 8 | 0 | 447 | 12 | 2 | 432 | 8 | 0 |
| 6 | 390 | 930 | 598 | 332 | 8100 | 303 | 336 | 266 | 344 | 360 | 222 | 302 | 286 | 137 |
| 7 | 530 | 2178 | 687 | 1491 | 6712 | 458 | 1141 | 739 | 428 | 873 | 247 | 411 | 790 | 220 |
| 8 | 303 | 1219 | 794 | 425 | 7898 | 186 | 280 | 44 | 260 | 598 | 88 | 186 | 245 | 34 |
| 9 | 809 | 480 | 376 | 104 | 8131 | 757 | 339 | 71 | 798 | 363 | 66 | 754 | 322 | 59 |
| 10 | 154 | 740 | 336 | 404 | 8526 | 105 | 176 | 53 | 130 | 265 | 91 | 105 | 172 | 51 |

**Pines** event counts (**CRF**):

| class | E1 | E2 | E2a | E2b | E3 | nT_E1 | nT_E2 | nT_E3 | nF_E1 | nF_E2 | nF_E3 | nFuT_E1 | nFuT_E2 | nFuT_E3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 621 | 1505 | 950 | 555 | 7294 | 458 | 513 | 342 | 527 | 615 | 241 | 456 | 461 | 167 |
| 2 | 589 | 1635 | 677 | 958 | 7196 | 301 | 335 | 82 | 440 | 524 | 155 | 301 | 313 | 77 |
| 3 | 360 | 345 | 86 | 259 | 8715 | 321 | 87 | 19 | 335 | 46 | 7 | 313 | 24 | 2 |
| 4 | 532 | 241 | 96 | 145 | 8647 | 491 | 126 | 49 | 523 | 133 | 46 | 491 | 117 | 38 |
| 5 | 447 | 97 | 85 | 12 | 8876 | 432 | 8 | 0 | 447 | 21 | 5 | 432 | 8 | 0 |
| 6 | 390 | 930 | 598 | 332 | 8100 | 303 | 336 | 266 | 357 | 372 | 217 | 303 | 291 | 146 |
| 7 | 530 | 2178 | 687 | 1491 | 6712 | 458 | 1141 | 739 | 449 | 996 | 345 | 430 | 877 | 298 |
| 8 | 303 | 1219 | 794 | 425 | 7898 | 186 | 280 | 44 | 259 | 578 | 63 | 186 | 255 | 35 |
| 9 | 809 | 480 | 376 | 104 | 8131 | 757 | 339 | 71 | 804 | 395 | 55 | 757 | 335 | 44 |
| 10 | 154 | 740 | 336 | 404 | 8526 | 105 | 176 | 53 | 121 | 270 | 74 | 101 | 148 | 26 |


*Reviewer 2 Report*

The paper is well organized. A well-planned set of experiments was conducted to verify the efficacy of the proposed method. One question: why are the results presented in this paper different from those in your 2018 IGARSS conference paper?

*Author Response*

We thank the reviewers for their comments and suggestions for improvement. Below is a point-by-point reply to the reviewers. In the manuscript, all changes are denoted in red.

**Reviewer 2**

The paper is well organized. A well-planned set of experiments was conducted to verify the efficacy of the proposed method. One question: why are the results presented in this paper different from those in your 2018 IGARSS conference paper?

The reviewer has correctly noticed differences of the order of a few percent with the results of the IGARSS 2018 paper. There are a number of reasons that explain the differences. First of all, we performed new experiments. Given the small training sizes, the standard deviations on the results are always quite large. Second, in the IGARSS paper, we used different unary potentials (given by -alpha and -p rather than -ln(alpha) and -ln(p)). For some reason, they gave slightly better results, but it was hard to justify their use (other than referring to a paper that did the same). In this manuscript, we chose to use the unary terms properly, for both the proposed methods and the methods we compared with. As a result, all accuracies went down by a few percent. Finally, in the IGARSS paper, we used fixed values for the parameters lambda, beta and gamma, while in this manuscript, we performed a grid search.
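To make the difference between the two unary-term choices concrete, here is a small Python sketch of our own. It is purely illustrative: the equal weighting `lam` and the `eps` guard against log(0) are assumptions, not the published energy function, which combines the sources with its own tuned parameters.

```python
import math

def unary_energies(alphas, probs, lam=0.5, eps=1e-12):
    """Per-class unary energies fusing abundances (alphas, e.g. from SunSAL)
    and probabilities (probs, e.g. from MLR) with the -ln(.) form; eps
    guards against log(0). Lower energy means a more likely class.
    An IGARSS-style variant would instead use -lam*a - (1-lam)*p directly."""
    return [-lam * math.log(a + eps) - (1 - lam) * math.log(p + eps)
            for a, p in zip(alphas, probs)]

# MAP decision on the unary terms alone for one pixel with three classes.
E = unary_energies([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
map_label = min(range(len(E)), key=E.__getitem__)
```

The -ln(.) form makes the unaries proper negative log-likelihood terms, which is what graph-cut energy minimization expects.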

*Reviewer 3 Report*

The work presented here is about supervised pixel classification in hyperspectral remote sensing under small training size scenarios (10 pixels per class). Authors propose decision fusion frameworks based on Markov and Conditional Random Fields with cross Links (MRFL and CRFL, respectively). These frameworks use two decision sources: (i) fractional abundances from sparse spectral unmixing (SunSAL), and (ii) probability outputs from a supervised classifier (Multinomial Logistic Regression - MLR), where both spatial and spectral information are employed. Additionally, these frameworks can be extended to a third decision source, being quite flexible.

Experiments use two well-known datasets: Indian Pines and Pavia University. The proposed MRFL and CRFL methods are compared to other approaches: SunSAL, MLR, Linear Combination (LC), a couple of MRFG variants from literature (MRFG_a and MRFG), and MRF/CRF applied to abundances or probabilities as a single source (MRF_a, MRF_p, CRF_a and CRF_p). Comparison is made in terms of Overall Accuracy (OA), Average Accuracy (AA), class-wise accuracy and kappa coefficient, including visualization maps.

Further analyses on (i) the effects of Beta and Alpha, (ii) slightly different sources employed, and (iii) using three different decision sources (the original two plus a new one: probabilities derived from morphological features) are also included.

In general terms, I think this is a good paper, well written, well structured, and holding some merit. Before publication, some issues would need to be addressed:

1) It would be great to have some analysis or at least comments on the computational complexity of the algorithms. How complex are they? Resources needed? Execution time? In other words, what is the cost of improving the classification accuracy?

2) The authors made it quite clear in the abstract and introduction that very limited training data are used. However, in the analysis, only one training size is used (10 samples per class). Does that mean the proposed methods do not work as well in comparison with the rest under larger training sizes? If so, where are the boundaries for this? 20 samples per class? 50?

3) Lines 275-278: ‘One can clearly notice that there is more confusion between SunSAL and the MLR classifier than between the MLR and SVM classifiers, indicating that the abundances are more complementary to the MLR probabilities than the SVM class probabilities.’ I’m not sure about the meaning of this sentence. Could you please re-write/explain?

4) Lines 315 and 332: Are these subsection captions? They are not correctly formatted.

5) Section 3 caption is all in capitals. It should be corrected.

*Author Response*

We thank the reviewers for their comments and suggestions for improvement. Below is a point-by-point reply to the reviewers. In the manuscript, all changes are denoted in red.

**Reviewer 3**

The work presented here is about supervised pixel classification in hyperspectral remote sensing under small training size scenarios (10 pixels per class). Authors propose decision fusion frameworks based on Markov and Conditional Random Fields with cross Links (MRFL and CRFL, respectively). These frameworks use two decision sources: (i) fractional abundances from sparse spectral unmixing (SunSAL), and (ii) probability outputs from a supervised classifier (Multinomial Logistic Regression - MLR), where both spatial and spectral information are employed. Additionally, these frameworks can be extended to a third decision source, being quite flexible.

Experiments use two well-known datasets: Indian Pines and Pavia University. The proposed MRFL and CRFL methods are compared to other approaches: SunSAL, MLR, Linear Combination (LC), a couple of MRFG variants from literature (MRFG_a and MRFG), and MRF/CRF applied to abundances or probabilities as a single source (MRF_a, MRF_p, CRF_a and CRF_p). Comparison is made in terms of Overall Accuracy (OA), Average Accuracy (AA), class-wise accuracy and kappa coefficient, including visualization maps.

Further analyses are also included on (i) the effects of Beta and Alpha, (ii) slightly different sources, and (iii) the use of three decision sources: the original two plus a new one (probabilities derived from morphological features).

In general terms, I think this is a good paper, well written, well structured, and holding some merit. Before publication, some issues would need to be addressed:

It would be great to have some analysis or at least comments on the computational complexity of the algorithms. How complex are they? Resources needed? Execution time? In other words, what is the cost of improving the classification accuracy?

We have included the following paragraph on the computational complexity of the proposed algorithms:

“Our proposed method uses the graph-cut $\alpha$-expansion algorithm [38-41], which has a worst-case complexity of $O(mn^2|P|)$ for a single optimization problem, where $m$ denotes the number of edges, $n$ denotes the number of nodes in the graph and $|P|$ denotes the cost of the minimum cut. Thus, the theoretical computational complexity of our proposed method is $O(kCmn^2|P|)$, with $k$ the upper bound on the number of iterations and $C$ the number of classes. With a non-cautious addition of edges in the graph, for instance adding a cross link between each node and all other nodes from the second source, there would be a quadratic increase in the computational complexity.

On the other hand, the empirical complexity in real scenarios has been shown to be between linear and quadratic w.r.t. the graph size \cite{empComplexity}.”

@ARTICLE{empComplexity,
  author  = {Yuri Boykov and Vladimir Kolmogorov},
  title   = {An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2004},
  volume  = {26},
  number  = {9},
  pages   = {1124--1137}}
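The edge-growth argument in the quoted paragraph can be made concrete. The sketch below is our own illustration, not code from the paper: it assumes a 4-connected image grid per decision source and compares the cautious choice (one cross link per node) against dense all-to-all cross links, showing the linear versus quadratic growth in edge count.

```python
# Illustrative sketch (not from the paper): edge counts for a two-source
# cross-link graph, assuming one 4-connected rows x cols grid per source.

def grid_edges(rows: int, cols: int) -> int:
    """Number of 4-neighbourhood edges in a rows x cols grid."""
    return rows * (cols - 1) + cols * (rows - 1)

def edge_count(rows: int, cols: int, dense_cross_links: bool) -> int:
    """Total edges: two spatial grids plus cross links between the sources."""
    n = rows * cols
    spatial = 2 * grid_edges(rows, cols)  # one grid per decision source
    if dense_cross_links:
        cross = n * n  # every node linked to all nodes of the other source
    else:
        cross = n      # one cross link per node: the cautious choice
    return spatial + cross

if __name__ == "__main__":
    # 10 x 10 grid (n = 100): 460 edges with sparse cross links,
    # 10360 with dense ones -- the quadratic term dominates quickly.
    print(edge_count(10, 10, dense_cross_links=False))
    print(edge_count(10, 10, dense_cross_links=True))
```

Since the graph-cut complexity grows with the number of edges $m$, keeping the cross links sparse keeps $m$ linear in the number of pixels.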

We also included some words on the actual execution times of the experiments.

“The experiments were run on a PC with an Intel i7-6700K and 32GB RAM. The execution time for one run with fixed parameters was on the order of a second for the MRFL and a minute for the CRFL. When performing grid search and averaging over 100 runs, we ran the experiments on the UAntwerpen HPC (CalcUA Super-computing facility), which has nodes with 128GB and 256GB RAM and 2.4GHz 14-core Broadwell CPUs, over which the different runs were distributed, leading to speedups by a factor of 10-50.”

2) The authors made it quite clear in the abstract and introduction that very limited training data are used. However, in the analysis, only one training size is used (10 samples per class). Does that mean the proposed methods do not perform as well as the others under larger training sizes? If so, where are the boundaries for this? 20 samples per class? 50?

This is a good question. We have been running experiments with increasing training sample sizes and noticed that the differences between the fusion methods became smaller. This indicates that the advantages of the proposed method level out for larger training sizes. We decided not to include these experiments since they would make the manuscript more complicated. However, in the new version, we have included this as a remark in the discussion.

3) Lines 275-278: ‘One can clearly notice that there is more confusion between SunSAL and the MLR classifier than between the MLR and SVM classifiers, indicating that the abundances are more complementary to the MLR probabilities than the SVM class probabilities.’ I’m not sure about the meaning of this sentence. Could you please re-write/explain?

This part has been rewritten as:

“One can clearly notice that there is a higher spread in the confusion matrices of SunSAL versus MLR than in those of SVM versus MLR. This indicates that SunSAL and MLR disagree more than MLR and SVM do, and that the abundances provide more complementary information to the MLR probabilities than the SVM class probabilities do. This makes the abundances a good candidate decision source in a decision fusion approach.”
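The "spread" notion in the rewritten passage can be quantified. The following minimal sketch is our own illustration (the names `cross_confusion` and `spread` are not from the paper): it builds a pairwise confusion matrix between two classifiers' label maps (not against ground truth) and measures disagreement as the off-diagonal fraction, a simple proxy for how complementary two decision sources are.

```python
from collections import Counter

def cross_confusion(labels_a, labels_b):
    """Pairwise confusion counts between two label maps (not vs. ground truth)."""
    return Counter(zip(labels_a, labels_b))

def spread(cm):
    """Fraction of pixels on which the two label maps disagree (off-diagonal mass)."""
    total = sum(cm.values())
    agree = sum(count for (a, b), count in cm.items() if a == b)
    return 1.0 - agree / total

if __name__ == "__main__":
    # Hypothetical label maps for 6 pixels over 3 classes.
    mlr    = [0, 0, 1, 1, 2, 2]
    svm    = [0, 0, 1, 1, 2, 0]  # disagrees with MLR on 1 of 6 pixels
    sunsal = [0, 1, 1, 2, 2, 0]  # disagrees with MLR on 3 of 6 pixels
    print(spread(cross_confusion(mlr, svm)))     # lower spread: less complementary
    print(spread(cross_confusion(mlr, sunsal)))  # higher spread: more complementary
```

Under this measure, the passage's claim corresponds to `spread(SunSAL, MLR) > spread(SVM, MLR)` on the datasets used.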

4) Lines 315 and 332: Are these subsection captions? They are not correctly formatted.

This has been corrected.

5) Section 3 caption is all in capitals. It should be corrected.

This has been corrected.

**Round 2**

*Reviewer 1 Report*

Dear authors,

I was a little bit crestfallen that my suggestion was not included in the revised version of the manuscript. After spending a lot of time trying to improve a manuscript without being an author, this outcome certainly is not particularly gratifying.

Thinking about it, I am glad I tried. I had an idea that I considered interesting and, positively and generously, I proposed it to the authors. Therefore, I would like to thank the authors for carrying out the experimentations. I hope that they have found some interesting elements in my proposal, anyway.

Unfortunately, since the journal gives only three days for providing this review, I am obliged to postpone the examination of the tables provided until later.

The manuscript is certainly ready to be published. My only recommendation, left to the goodwill of the authors, is to include a reference to the routine plotConfMat developed by Vahe Tshitoyan [ref1] in the bibliography of the paper. In this way, the author of the routine will receive due recognition and the manuscript will implicitly suggest how to produce such effective figures. By the way, [ref1] differs from the link provided by the authors, since that link did not work.

[ref1] https://www.mathworks.com/matlabcentral/fileexchange/64185-plot-confusion-matrix

*Reviewer 3 Report*

I’m satisfied with the changes introduced by the authors and, therefore, I think the paper can be published if the editors agree.