Peer-Review Record

Multimodal Data Fusion for Depression Detection Approach

Computation 2025, 13(1), 9; https://doi.org/10.3390/computation13010009

by Mariia Nykoniuk¹, Oleh Basystiuk¹,*, Nataliya Shakhovska¹,² and Nataliia Melnykova¹

Reviewer 1: Anonymous

Reviewer 2: Tina Tomazic

Submission received: 28 November 2024 / Revised: 22 December 2024 / Accepted: 25 December 2024 / Published: 2 January 2025

(This article belongs to the Special Issue Artificial Intelligence Applications in Public Health: 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents two multimodal information fusion networks for detecting depression: early fusion and late fusion. These networks leverage CNN layers, Bi-LSTM, and a self-attention mechanism to integrate audio and text data from the DAIC-WOZ and EDAIC-WOZ datasets. The early fusion network achieved higher classification accuracy, suggesting the efficacy of combining data modalities at an early stage in the model architecture.
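To make the distinction summarized above concrete, the following is a minimal sketch of early versus late fusion. It is not the authors' architecture: a single logistic unit stands in for the CNN, Bi-LSTM, and self-attention layers, and all feature values, weights, and function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(features, weights, bias=0.0):
    """Weighted sum of features; a stand-in for a learned network."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion(audio_feats, text_feats, weights, bias=0.0):
    """Concatenate the modality features first, then apply ONE classifier."""
    fused = list(audio_feats) + list(text_feats)
    return sigmoid(linear_score(fused, weights, bias))


def late_fusion(audio_feats, text_feats, audio_weights, text_weights):
    """Classify each modality separately, then average the two probabilities."""
    p_audio = sigmoid(linear_score(audio_feats, audio_weights))
    p_text = sigmoid(linear_score(text_feats, text_weights))
    return 0.5 * (p_audio + p_text)


# Hypothetical feature vectors for one interview segment.
audio = [0.2, -0.1]
text = [0.5, 0.3, -0.4]

p_early = early_fusion(audio, text, weights=[1.0, 0.5, 0.8, -0.2, 0.1])
p_late = late_fusion(audio, text, [1.0, 0.5], [0.8, -0.2, 0.1])
print(f"early fusion: {p_early:.3f}, late fusion: {p_late:.3f}")
```

In the early variant the classifier sees both modalities at once and can weigh them jointly; in the late variant each modality is scored in isolation and only the decisions are combined.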

Strengths:

1. The paper's strength is its use of multimodal data (audio and text), which captures verbal and nonverbal indicators of depression, enhancing the model's ability to detect subtle signs.

2. The paper effectively compares early and late fusion models, offering insights into the advantages of each approach. The early fusion model's success underscores the importance of modality integration during feature extraction.

Drawbacks:

1. The training accuracy reaches over 95%, while the validation accuracy is slightly above 90%. This indicates possible overfitting, especially given the dataset's relatively small size and the models' high complexity. The fluctuations in the late fusion model's accuracy and loss plots further indicate overfitting or instability issues.

2. Although the paper mentions addressing the class imbalance, the techniques described (e.g., data augmentation) might not fully resolve this problem, especially given the class imbalance in the original DAIC-WOZ and extended EDAIC-WOZ datasets. The late fusion model's weaker performance relative to the early fusion model might indicate that imbalance issues were not adequately mitigated.
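One common complement to data augmentation for the imbalance concern raised above is to weight the loss by inverse class frequency. The sketch below illustrates that general technique only; it is not taken from the paper, and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return weights proportional to inverse class frequency.

    Uses weight_c = n_samples / (n_classes * count_c), so that every class
    contributes the same total weight to the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Hypothetical split: 70 non-depressed (0) and 30 depressed (1) samples.
labels = [0] * 70 + [1] * 30
weights = balanced_class_weights(labels)
print(weights)
```

The minority class receives the larger weight, so misclassifying a depressed sample costs more during training than misclassifying a non-depressed one.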

Recommendations:

1. Including metrics such as AUC, MCC, and sensitivity-specificity trade-offs would provide a deeper understanding of the model's performance, especially in detecting true cases of depression without misclassification.

2. Add a discussion section focusing on practical implementation, ethical challenges, and privacy concerns. Depression detection models must respect patient privacy and avoid stigmatization, so explaining how these issues will be managed in practice is crucial.

3. The paper hints at an attention mechanism, but a full transformer-based approach could capture cross-modal interactions more effectively. Consider leveraging a transformer model to improve the simultaneous handling of text and audio inputs.
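The metrics named in the first recommendation can be computed as in the sketch below. It assumes binary labels with 1 for the depressed class; the labels, scores, and the 0.5 threshold are hypothetical and do not come from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Recall on the positive class: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Recall on the negative class: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and model scores for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
mcc_value = mcc(tp, tn, fp, fn)
auc_value = roc_auc(y_true, scores)
print(sens, spec, mcc_value, auc_value)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.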

Author Response

Comments 1: Including metrics such as AUC, MCC, and sensitivity-specificity trade-offs would provide a deeper understanding of the model's performance, especially in detecting true cases of depression without misclassification.
Response 1: Thanks a lot for your suggestion. We added AUC and MCC metrics to Table 1, Table 2, and Table 3. We are not sure whether separate sensitivity-specificity trade-offs need to be reported, since in our case these correspond to Recall for the Depression class and Recall for the Non-Depression class.

Comments 2: Add a discussion section focusing on practical implementation, ethical challenges, and privacy concerns. Depression detection models must respect patient privacy and avoid stigmatization, so explaining how these issues will be managed in practice is crucial.
Response 2: Thanks a lot for your suggestion; we reinforced the discussion section with privacy concerns and described the system's practical utilization more clearly.

Comments 3: The paper hints at an attention mechanism, but a full transformer-based approach could capture cross-modal interactions more effectively. Consider leveraging a transformer model to improve the simultaneous handling of text and audio inputs.
Response 3: Thank you very much for your suggestion. We consider it important for our research, but we had already decided to treat the study of multimodal transformer models in a separate paper. In this study, we concentrated on classification models for determining the patient's condition.
Multimodal transformer models that process text and audio input simultaneously deserve a separate paper with a detailed discussion of the results and future development prospects. We are in the final stage of this research and will shortly present the outcomes in a new research paper.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper contains sufficiently new and suitable information and adheres to the journal’s standards. The topic and level of formality are appropriate for the journal’s readership. Its style and readability are suitable. A lot of information is given throughout the article, but I would suggest revising the paper (major revision).

The article doesn’t demonstrate an adequate understanding of the relevant literature in the field and doesn’t cite an appropriate range of literature sources. References, resource material, and literature are poor. I suggest supplementing the Related Work chapter. Above all, the newest literature from the field should be used. I suggest adding some new appropriate works and supplementing the relevant (old) literature review. In the list of References, we can’t see the publishing year.

Figure 1 should be explained in more detail.

The methodological concept is clear, and the selected methodology is scientifically appropriate. But I suggest explaining the Data Processing.

Further, I recommend rewriting the Conclusions and Discussion. The concluding remarks should be more specific and better explained. I miss some more implications! I would suggest strengthening the whole text here by pointing out what are the main scientific contributions of your paper. The future directions are presented appropriately.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Comments 1: The article doesn’t demonstrate an adequate understanding of the relevant literature in the field and doesn’t cite an appropriate range of literature sources.
Response 1: Thanks a lot; we expanded our coverage of the relevant literature and cited a broader range of sources in lines 70-104 and 107-115.

Comments 2: I suggest adding some new appropriate works and supplementing the relevant (old) literature review.
Response 2: Thanks a lot; we updated the references with new appropriate works and supplemented the literature review.

Comments 3: In the list of References, we can’t see the publishing year.
Response 3: Thanks a lot; we revised the references to include the publication year and the dataset retrieval date.

Comments 4: Figure 1 should be explained in more detail.
Response 4: Thanks a lot; we explained Figure 1 in more detail in lines 167-174.

Comments 5: I suggest explaining the Data Processing.
Response 5: Thanks a lot; we have a separate section, 3.2 Data Processing, which explains and gives an overview of the data processing.

Comments 6: Further, I recommend rewriting the Conclusions and Discussion. The concluding remarks should be more specific and better explained. I miss some more implications!
Response 6: Thanks a lot; we strengthened the Conclusions and Discussion sections with implications in lines 544-553 and 574-588.

Comments 7: I would suggest strengthening the whole text here by pointing out what are the main scientific contributions of your paper.
Response 7: Thanks a lot; we strengthened the Introduction with a statement of our contributions in lines 53-64.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The review is ok.
