Exploration of Data Fusion Strategies Using Principal Component Analysis and Multiple Factor Analysis
Round 1
Reviewer 1 Report
Overview:
The present study evaluated the efficacy of unsupervised methods for generating data fusion models with proper chemical and sensorial data integration applicable for authenticity studies.
Overall, the paper is well written and the methods are carefully described. The objectives and the design of the study are clearly stated and identified in the Introduction section. The paper is well structured throughout, and the research design is appropriate.
With regard to the presentation of the results, I found the manuscript a bit difficult to navigate. Nonetheless, the paper presents a comprehensive workflow with detailed data curation steps throughout, and its topic is relevant and of interest to the readers of the journal.
Given the assessment of the differences between the PCA and the MFA unsupervised analyses when handling large, but most importantly, different types of data (chemical and sensorial), the present study provides a valuable tool which merits further exploration for authenticity purposes. In addition, model comparison and efficiency (including redundancies) are well discussed and provide inferences to support the findings on data fusion suitability. Furthermore, I believe that the approach presented by the authors could represent a valuable tool for the identification of novel fingerprinting markers based on associations between various attributes/descriptors supplied through data fusion.
I recommend undergoing minor revision.
Minor comments:
In the present study, the authors describe the sensorial data sets as qualitative in nature; as such, the descriptors (sample attributes) collected as ratings were converted to frequencies, standardized, and used for data fusion. Would the approach presented here be appropriate for the integration of E-Nose and/or E-Tongue sensorial datasets?
If the study focused on the stability of wines at various temperatures and different time periods, and the multivariate analysis was performed separately for each winery and each cultivar, how would this approach account for the effect of time on wine stability?
Line 284: The abbreviations PDB and KZC for the data sets appear for the first time and need explanation. Are these wineries? This information also needs to be included in the table captions.
Tables A1-A5: I believe they should be presented as Supplementary information and given as Table S1-S5.
Author Response
As long as the data from the E-nose/E-tongue are captured and statistically treated appropriately, the authors see no problem in applying the same fusion strategy. One of the purposes of our work was the evaluation of a certain strategy that could be adapted and applied in various data modalities.
The issue of time and sample stability was described in more depth in the previous paper 'A multivariate approach to evaluating chemical and sensorial evolution ...'. In this paper we focused instead on modelling the data fusion, using these data sets as an application.
The coding of the samples is now included in the M&M section. L179: The sample sets are identified by three letters corresponding to each winery (i.e., AVN, CDB, DTK, FRV, KZC, PDB).
Changed to Supplementary (Tables S1-S5).
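For readers following the first comment and its response, the fusion of standardized chemical and sensory blocks can be illustrated with a minimal sketch. This is not the authors' code. The data are synthetic, the function names (standardize, mfa_weight, fuse) and the block sizes are hypothetical, and the block weighting shown (dividing each standardized block by its first singular value before a global PCA) is the usual textbook form of MFA, assumed here rather than taken from the manuscript.

```python
import numpy as np

def standardize(block):
    """Center each column and scale it to unit variance."""
    centered = block - block.mean(axis=0)
    std = centered.std(axis=0, ddof=0)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    return centered / std

def mfa_weight(block):
    """Standardize a block, then divide it by its first singular value (MFA-style weighting)."""
    z = standardize(block)
    first_sv = np.linalg.svd(z, compute_uv=False)[0]
    return z / first_sv

def fuse(blocks, weighted=True):
    """Concatenate blocks column-wise and return (sample scores, singular values) from an SVD-based PCA."""
    prepared = [mfa_weight(b) if weighted else standardize(b) for b in blocks]
    fused = np.hstack(prepared)
    u, s, _ = np.linalg.svd(fused, full_matrices=False)
    return u * s, s

# Synthetic stand-ins for the two data modalities (assumed shapes, not the study's data).
rng = np.random.default_rng(0)
chemical = rng.normal(size=(12, 40))  # 12 samples x 40 chemical variables
sensory = rng.normal(size=(12, 6))    # 12 samples x 6 descriptor frequencies

scores_pca, sv_pca = fuse([chemical, sensory], weighted=False)  # plain PCA on concatenated blocks
scores_mfa, sv_mfa = fuse([chemical, sensory], weighted=True)   # MFA-style weighted fusion
```

In the unweighted case the larger chemical block can dominate the leading components simply because it has more columns; the weighting step puts each block on the same footing before fusion. Under the same assumptions, an E-Nose or E-Tongue block would enter as one more array in the list passed to fuse.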
Reviewer 2 Report
very nice paper
Author Response
n/a