Dual-Branch Network for Blind Quality Assessment of Stereoscopic Omnidirectional Images: A Spherical and Perceptual Feature Integration Approach
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper proposes a novel deep learning framework for the quality assessment of stereoscopic omnidirectional images (SOIs) by integrating spherical and perceptual features. The paper is well written, with a detailed discussion of the results. The authors need to consider the following modifications:
- The abstract must be rewritten to motivate readers to read further.
- Details about the architecture design should be summarized; the detailed explanation of the architecture belongs in the architecture section.
- A summary of the results should be added to the abstract.
- Figure 2 looks irrelevant.
- There is a typo in the labels of Tables 3, 4, 5, and 6.
- Add more results to the conclusions section and present the conclusions as bullet points.
- A nomenclature is needed, as there are many abbreviations in the text.
- The authors should split Table 6 into two tables: one for PLCC and another for SROCC.
- A more detailed comparison with previous work in the literature is needed.
- Figure 5 needs a more detailed explanation of its functionality.
Author Response
The detailed response can be found in the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors present an approach to no-reference quality assessment of stereoscopic omnidirectional images, based on a dual-branch neural network architecture that integrates spherical and perceptual information. Spherical convolution is employed to accurately preserve the spatial structure of the image, while a binocular difference module, utilizing the discrete wavelet transform, is introduced to precisely extract depth-related features. Additionally, the authors design a mechanism for combining global and local features, enabling a more comprehensive analysis of visual content (an illustrative sketch of such a dual-branch layout is given after the list of comments below). However, there are a few important issues for which I did not find answers and which should be addressed in the text of the article:
a. In my opinion, the Related Works section does not include studies that incorporate visual attention mechanisms, such as eye tracking or saliency maps, which play a crucial role in image perception within VR environments. It also overlooks important research on video quality assessment, resulting in a lack of reference to temporal aspects. Additionally, there is no review of methods that offer adaptability to different types of image distortions. Furthermore, the discussion omits perceptual asymmetry and the impact of selective attention on the perceived quality of spherical content.
b. I think the limitation of the described method requires further explanation: the algorithm operates on statically defined viewports, disregarding actual information about the user's gaze direction. Incorporating eye-tracking data could significantly improve the alignment of the quality assessment with real visual perception.
c. Moreover, the described method demonstrates strong performance across various datasets, but its architecture lacks a mechanism for automatic adaptation to specific types of distortions.
d. A significant constraint, in my view, is that a fixed number of six viewports is used during preprocessing, which may fail to capture the diverse spatial perception patterns of individual users.
e. I noticed that the described algorithm was designed for individual SOI frames, without incorporating mechanisms to account for temporal consistency or perceptual changes across video sequences.
f. It is my observation that nothing is known about how the proposed method handles unknown or mixed types of distortions that were not present in the training data.
g. Furthermore, some tables are labeled with a generic placeholder such as <This is a table>, which suggests incomplete editorial work or temporary captions. They should be properly numbered (e.g., Table 1, Table 2) and accompanied by descriptive, informative titles. In figures where colors are used, for instance to indicate data flow or different branches of the network, a legend or brief explanation should be provided to clarify the meaning of each color.
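For orientation only, the following minimal PyTorch sketch illustrates the kind of dual-branch layout summarized above: two placeholder feature branches and a Haar-DWT-based binocular difference module whose pooled features are fused and regressed to a quality score. It is not the authors' implementation; plain convolutions stand in for spherical convolution, and all module names, channel widths, and the routing of the left/right views are hypothetical.

```python
# Minimal, illustrative PyTorch sketch (NOT the authors' implementation).
# Plain Conv2d layers stand in for spherical convolution; the routing of the
# left/right views to the two branches and all channel widths are hypothetical.
import torch
import torch.nn as nn


def haar_dwt(x):
    """Single-level 2D Haar DWT via strided slicing; returns (LL, LH, HL, HH)."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    return (a + b + c + d) / 2, (a + b - c - d) / 2, \
           (a - b + c - d) / 2, (a - b - c + d) / 2


class BinocularDifference(nn.Module):
    """Depth-related cue extracted from the left-right difference in the wavelet domain."""
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(4 * in_ch, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, left, right):
        bands = haar_dwt(left - right)              # decompose the binocular difference
        return self.conv(torch.cat(bands, dim=1))


class DualBranchSketch(nn.Module):
    """Two feature branches plus a binocular-difference cue, pooled and regressed to one score."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU())
        self.spherical_branch = branch()    # placeholder for the spherical-convolution path
        self.perceptual_branch = branch()   # placeholder for the perceptual path
        self.binocular = BinocularDifference()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(32 * 3, 1)    # global pooling + linear quality regressor

    def forward(self, left, right):
        f_sph = self.pool(self.spherical_branch(left)).flatten(1)
        f_per = self.pool(self.perceptual_branch(right)).flatten(1)
        f_bin = self.pool(self.binocular(left, right)).flatten(1)
        return self.head(torch.cat([f_sph, f_per, f_bin], dim=1))


if __name__ == "__main__":
    left = torch.rand(2, 3, 128, 256)       # dummy left/right viewports
    right = torch.rand(2, 3, 128, 256)
    print(DualBranchSketch()(left, right).shape)  # torch.Size([2, 1])
```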
Author Response
The detail response can be found in the attached file.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper presents a dual-branch deep learning framework for assessing the quality of stereoscopic omnidirectional images (SOIs). It combines spherical convolution for spatial preservation, binocular feature extraction for depth perception, and a global-local integration module that enhances accuracy by capturing both global structures and local details.
- Including the results measured by the performance metrics (e.g., PLCC, SROCC, RMSE) in the abstract would strengthen the evidence of superiority (a brief example of how these metrics are conventionally computed is sketched after this list).
- The authors should include essential details concerning the model architecture, training procedures, design choices, and the values used for the model parameters.
- Hyperparameters significantly affect deep learning models but are only briefly mentioned; therefore, the authors need to quantify the impact of the dropout rate and learning rate choices.
- Potential limitations of the model should be acknowledged.
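To make the evaluation criteria concrete, the snippet below shows how PLCC, SROCC, and RMSE between predicted and subjective scores are conventionally computed; the score arrays are invented purely for illustration and are not results from the paper.

```python
# Conventional computation of PLCC, SROCC, and RMSE between predicted scores
# and subjective (MOS) scores. The arrays below are hypothetical examples.
import numpy as np
from scipy import stats

predicted = np.array([3.1, 4.2, 2.7, 4.8, 3.9])   # hypothetical model outputs
subjective = np.array([3.0, 4.5, 2.5, 4.9, 3.6])  # hypothetical MOS values

plcc, _ = stats.pearsonr(predicted, subjective)    # linear correlation
srocc, _ = stats.spearmanr(predicted, subjective)  # rank-order correlation
rmse = np.sqrt(np.mean((predicted - subjective) ** 2))

print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}, RMSE={rmse:.3f}")
```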
Comments for author File: Comments.pdf
Author Response
The detailed response can be found in the attached file.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I conclude that the authors have addressed the submitted comments in a satisfactory manner, implementing significant revisions in both content and editorial quality. Although not all issues were fully resolved, with some treated as suggestions for future work, I consider the extent of the changes sufficient and recommend the article for publication.