Accessible Interface for Museum Geological Exhibitions: PETRA—A Gesture-Controlled Experience of Three-Dimensional Rocks and Minerals
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The research established a gesture-controlled experience of 3D rocks & minerals. The findings provide more insight into an open-source, gesture-controlled system (PETRA). The conclusions are solid and convincing, and the manuscript is of high quality. Although this work is of interest to the fields of museum geological exhibitions and educational and teaching applications, there are some concerns that could be addressed in the next version. The review comments are as follows:
- While the innovation is notable, the distinctions in technical principles (e.g., MediaPipe integration, gesture recognition algorithms) from existing solutions (e.g., Leap Motion/Kinect alternatives) should be further accentuated through comparative analysis in Section 2.3.
- Incorporate quantitative metrics (e.g., latency, accuracy rates) when benchmarking against similar gesture interaction systems (e.g., museum-based NUIs) to explicitly demonstrate technical superiority.
- Figure 1 (System Architecture Diagram) currently lacks detail; recommend adding data flow annotations (e.g., "Gesture Data → Processing → Rendered Feedback") to enhance clarity.
- Figure 4a (TouchDesigner Node Graph) requires annotations for key nodes to facilitate understanding among non-specialist readers.
- When framing PETRA as an "evolution of traditional museum kiosks" in Section 4.2, integrate museum learning theories (e.g., Falk & Dierking's Contextual Model) to strengthen theoretical grounding.
- The qualitative observations in Section 3.2 would benefit from supplementary quantitative metrics (e.g., average interaction duration, error trigger rate) to triangulate findings.
- The mention of "exploring full-body pose detection" in Section 4.4 could be contextualized within educational use cases (e.g., children’s immersive learning environments) to highlight practical relevance.
- As PETRA is defined as the project’s full name (not an acronym), ensure consistent usage throughout the manuscript, avoiding inconsistent abbreviation practices.
- Some sentences exhibit excessive complexity, potentially impeding readability. A comprehensive language edit is advised to prioritize conciseness and accessibility for diverse audiences.
- Some references are not formatted to a consistent standard; revisions are recommended, for example in Ref. 7, Ref. 8, Ref. 11, Ref. 23 …
Author Response
Dear colleague,
Dear reviewer,
Thank you for your insightful and thorough review. I sincerely appreciate your high-level feedback, which has been invaluable for strengthening the paper’s theoretical grounding and overall rigor. My detailed responses to your comments are below.
Comments 1: While the innovation is notable, the distinctions in technical principles (e.g., MediaPipe integration, gesture recognition algorithms) from existing solutions (e.g., Leap Motion/Kinect alternatives) should be further accentuated through comparative analysis in Section 2.3.
Response 1: Thank you for this excellent suggestion. I agree that a more direct technical comparison is beneficial. I have added a new paragraph to Section 2.3 that explicitly compares the technical principles of PETRA's vision-based approach (MediaPipe) with sensor-based systems like Kinect (structured light/time-of-flight) and Leap Motion (infrared stereo cameras), highlighting the differences in hardware dependency and underlying tracking methodology. This can be found in the updated Section 2.3 (Software implementation).
[Updated text added to the end of Section 2.3]
This software-driven approach distinguishes PETRA from earlier gesture-based systems that rely on specialized hardware. While sensors like the Kinect use structured light or time-of-flight cameras to generate a depth map of the environment, and Leap Motion uses infrared stereo cameras to specifically model hands, MediaPipe achieves robust tracking through machine learning models applied to a standard 2D webcam feed. This reliance on software rather than specialized hardware is a core tenet of PETRA's accessibility.
Comments 2: Incorporate quantitative metrics (e.g., latency, accuracy rates) when benchmarking against similar gesture interaction systems (e.g., museum-based NUIs) to explicitly demonstrate technical superiority.
Response 2: Thank you for pointing out the value of quantitative benchmarking. I agree that this is an important area for research. However, as the current study was an initial, qualitative case study focused on real-world feasibility and user reception, a formal quantitative benchmark against other systems was outside its scope. I have now explicitly acknowledged this as a key limitation and a primary direction for future work. This has been added to the "Evaluation Limitations" bullet point in Section 4.3 (Benefits and Limitations) and the "Future Work" paragraph in Section 4.4.
[Updated text in Section 4.3, Evaluation Limitations]
The findings from this case study are qualitative, based on direct observation of “in-the-wild” user interactions. While this provides valuable initial insights, a formal quantitative evaluation was not performed. A crucial direction for future work is to conduct controlled studies that not only use validated instruments like the System Usability Scale (SUS), but also benchmark PETRA's technical performance (e.g., gesture accuracy rates, system latency) against other interactive systems. This would provide the rigorous data needed to explicitly validate the system's effectiveness and potential technical advantages [1,30].
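For readers unfamiliar with the System Usability Scale mentioned above, the following is a minimal sketch of the standard SUS scoring rule, not code from the manuscript. The function name `sus_score` is hypothetical; the rule itself (odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5) is the conventional one.

```python
def sus_score(responses):
    """Return the System Usability Scale score (0-100) for one respondent.

    `responses` holds the ten item ratings in questionnaire order, each 1-5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (indices 0, 2, ...) are positively worded;
        # items 2, 4, 6, 8, 10 are negatively worded and are reversed.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5
```

A respondent who answers 3 to every item receives `sus_score([3] * 10) == 50.0`; per-respondent scores would then be averaged across participants in a future controlled study.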
Comments 3: Figure 1 (System Architecture Diagram) currently lacks detail; recommend adding data flow annotations (e.g., "Gesture Data → Processing → Rendered Feedback") to enhance clarity.
Response 3: I agree. The figure has been updated to be more detailed, and the caption has been revised to explicitly describe the data flow in a step-by-step manner.
Comments 4: Figure 4a (TouchDesigner Node Graph) requires annotations for key nodes to facilitate understanding among non-specialist readers.
Response 4: Thank you for this excellent point. To improve clarity for a broader audience, I have now added annotations directly onto Figure 4a, labeling the key functional areas (i.e., "Camera Input", "Scene setup and Interaction logic", "3D Models & Audio", "UI setup", and "Final Output"). The figure caption has also been updated to reflect these additions.
[Updated Figure 4 caption]
... (a) The complete project network within the TouchDesigner development environment, illustrating the node-based logic and the color-coded organization described in the methods, where 1) is the main web camera input node; 2) the group nodes with scene setup and interaction logic; 3) the group nodes with 3D models and audio playlist; 4) user interface setup; and 5) the final output nodes. ...
Comments 5: When framing PETRA as an "evolution of traditional museum kiosks" in Section 4.2, integrate museum learning theories (e.g., Falk & Dierking's Contextual Model) to strengthen theoretical grounding.
Response 5: This is a constructive suggestion to strengthen the theoretical foundation. I have now integrated this concept into Section 4.2 (Contribution and Context), discussing how PETRA's open, social format supports learning across the personal, sociocultural, and physical contexts described by Falk & Dierking (2016).
[Updated text added to Section 4.2]
...the large-screen, public-facing nature of PETRA was observed to encourage collaborative use, offering a potential solution to this long-standing challenge in museum-based HMI (Shehade & Stylianou-Lambert, 2020). This fosters a shared experience that aligns with established museum learning theories, such as Falk & Dierking's Contextual Model of Learning, which emphasizes the importance of the sociocultural context in shaping a visitor's understanding (Falk and Dierking, 2016).
Comments 6: The qualitative observations in Section 3.2 would benefit from supplementary quantitative metrics (e.g., average interaction duration, error trigger rate) to triangulate findings.
Response 6: Thank you. I agree that supplementary quantitative metrics would be valuable. As this was an "in-the-wild" observational study, I was not able to formally collect this data. I have acknowledged this as a limitation of the current study and have added the need for collecting such metrics as a key component of the formal user studies proposed in the "Future Work" section (Section 4.4).
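As an illustration of how the two metrics named in this comment could be derived in a future study, here is a minimal sketch under stated assumptions. The `Session` record, its field names, and the idea that an observer labels accidental activations are all hypothetical; no such log was collected in the current study.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One visitor interaction, as a hypothetical logger might record it."""
    start_s: float   # time the hand was first tracked, in seconds
    end_s: float     # time the hand left the frame, in seconds
    triggers: int    # all manipulation-mode activations in the session
    accidental: int  # activations an observer judged unintended


def summarize(sessions):
    """Return mean interaction duration and error trigger rate."""
    if not sessions:
        raise ValueError("need at least one session")
    durations = [s.end_s - s.start_s for s in sessions]
    total_triggers = sum(s.triggers for s in sessions)
    total_accidental = sum(s.accidental for s in sessions)
    return {
        "mean_duration_s": sum(durations) / len(durations),
        # Share of all activations that were unintended; 0.0 if none occurred.
        "error_trigger_rate": (
            total_accidental / total_triggers if total_triggers else 0.0
        ),
    }
```

Pooling the error rate over all activations, rather than averaging per-session rates, keeps short sessions with few gestures from dominating the figure.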
Comments 7: The mention of "exploring full-body pose detection" in Section 4.4 could be contextualized within educational use cases (e.g., children’s immersive learning environments) to highlight practical relevance.
Response 7: I agree. To highlight the practical relevance, I have revised that sentence in Section 4.4 to explicitly link full-body pose detection to immersive and playful learning environments for younger audiences.
[Updated text in Section 4.4]
...Another promising direction is exploring full-body pose detection to lay the groundwork for more advanced and embodied interactions, such as creating immersive learning games or animated avatars for younger visitors.
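Comments 8: As PETRA is defined as the project’s full name (not an acronym), ensure consistent usage throughout the manuscript, avoiding inconsistent abbreviation practices.
Response 8: Thank you for highlighting the need for consistency. I have reviewed the entire manuscript and ensured that "PETRA" is used consistently as a proper name. To further clarify this for the reader and provide context for the name, I have also added a sentence in the Introduction (Section 1) explaining its etymology.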
[Updated text added to the manuscript in Section 1]
Addressing this need for accessible and engaging museum experiences, this paper introduces PETRA: an accessible interface for museum geological exhibitions. The name PETRA, meaning rock in both Greek and Latin, was chosen to reflect the project's focus on geological heritage. The system utilizes 3D models of geological specimens as its core content and allows visitors to explore them through intuitive, webcam-based gesture control.
Comments 9: Some sentences exhibit excessive complexity, potentially impeding readability. A comprehensive language edit is advised to prioritize conciseness and accessibility for diverse audiences.
Response 9: I appreciate this feedback. The entire manuscript has undergone a comprehensive language and style edit to improve clarity, simplify complex sentences, and enhance readability for a broad, interdisciplinary audience.
Comments 10: Some references are not formatted to a consistent standard; revisions are recommended, for example in Ref. 7, Ref. 8, Ref. 11, Ref. 23 …
Response 10: Thank you. I have thoroughly reviewed the entire reference list and corrected all entries to ensure they are complete and strictly adhere to the journal's formatting standards.
Reviewer 2 Report
Comments and Suggestions for Authors
Peer Review Report
Manuscript Title: Accessible interface for museum geological exhibitions: PETRA – a gesture-controlled experience of 3D rocks & minerals
Recommendation: Minor Revision
General Comment
This is a well-written and highly relevant manuscript that introduces a novel, accessible gesture-based system for interacting with 3D geological specimens in museum and educational settings. The integration of MediaPipe and TouchDesigner, combined with low-cost hardware, aligns perfectly with the goals of digital mineralogy and the emerging Mineralogy 4.0 paradigm. The real-world implementation during a public event gives strong practical value to the research. However, to maximise its clarity, usability, and scientific impact, a number of revisions—mostly minor—are suggested below. These have been separated into issues of form and content, with references to the relevant lines in the manuscript.
I. Form
- Line 11
Replace “engaging solutions” with “more engaging and inclusive solutions” to emphasise accessibility and universal design.
- Lines 203–204
The sentence describing the "pinch gesture" control is overly long. Consider breaking it into two separate sentences for better readability.
- Line 392
The word “framework” is repeated. Reword the sentence to avoid redundancy.
- Line 300
“Metaphorical nature” could be made clearer as “intuitive metaphorical mapping”, which better reflects the human-object cognitive analogy.
- Lines 179–216
Add visual arrows or motion indicators to illustrate finger movements more clearly.
- Lines 253–256
Enrich the figure caption by describing the actions shown (e.g., “Child using pinch gesture to rotate the 3D model”).
- Multiple Headings
Standardise the capitalisation in all section and subsection titles for consistency.
- Lines 418–527
Ensure all DOI links are correctly hyperlinked and uniformly formatted in the references.
- Lines 150–157
In the licensing discussion, explicitly mention how the open licence facilitates access for low-resource educational institutions.
- Lines 161–165
Consider adding a figure or inset that visually labels the colour-coded functional zones in the TouchDesigner node layout.
II. Content
- Lines 347–350
The evaluation is entirely qualitative. Adding basic metrics like average interaction time or model switch frequency would strengthen the results.
- Lines 266–273 and 347–350
Consider the inclusion of validated usability tools (e.g., SUS or NASA-TLX) to quantify user experience in future work.
- Lines 197–200
Clarify how the gesture thresholds were chosen—through testing, empirical adjustment, or estimation?
- Line 339
Since lighting affects hand tracking, recommend minimum lighting levels (e.g., in lux or practical conditions) to guide replication.
- Lines 344–346
The lack of gesture-based access to mineral metadata is a limitation. A suggestion: use a simple additional gesture to trigger this function.
- Lines 301–326
A comparison table between PETRA, Kinect, Leap Motion, and VR systems (cost, complexity, hygiene, etc.) would improve contextualisation.
- Lines 122–129
Add minimum system requirements (processor, RAM, webcam resolution) to help others replicate or adopt the system.
- Lines 351–372
The authors should briefly discuss PETRA’s potential use in remote or hybrid teaching environments (e.g., online geology labs).
- Lines 363–373
While the potential in other domains is mentioned, one concrete example (e.g., protein structure exploration in biology) would improve the argument.
- Lines 374–387
Suggest exploring ML-based gesture adaptation to personalise the interface to different users and enhance accessibility.
Author Response
Dear colleague,
Dear reviewer,
Thank you for your exceptionally detailed and constructive review. I am very grateful for your meticulous suggestions on both the form and content, which have been instrumental in polishing the manuscript to a much higher standard. Please find my detailed responses below.
For the FORM section:
Comments 1: Line 11 Replace “engaging solutions” with “more engaging and inclusive solutions” to emphasise accessibility and universal design.
Response 1: Agree. I have made this change to better emphasize the project's goals of accessibility and universal design. This is updated in the Abstract.
Comments 2: Lines 203–204 The sentence describing the "pinch gesture" control is overly long. Consider breaking it into two separate sentences for better readability.
Response 2: Agree. For better readability, I have broken the long sentence describing the pinch gesture's technical definition into three separate, clearer sentences in Section 2.4.
[Updated text in Section 2.4]
... This single gesture enables simultaneous control over both rotation and zoom. Moving the pinched hand left, right, up, or down rotates the 3D model around its axes. At the same time, varying the distance between the pinched thumb and index finger smoothly zooms the model in or out.
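The combined rotation and zoom control described above can be sketched as follows. This is an illustrative approximation, not the TouchDesigner network used in PETRA: the names `ViewState` and `update_view`, the gain, the distance and zoom ranges, the smoothing factor, and the choice that a wider pinch zooms in are all assumptions. Coordinates are taken to be normalized to the range 0 to 1, as hand-tracking libraries commonly report them.

```python
from dataclasses import dataclass


@dataclass
class ViewState:
    yaw_deg: float = 0.0    # rotation about the vertical axis
    pitch_deg: float = 0.0  # rotation about the horizontal axis
    zoom: float = 1.0       # scale factor applied to the 3D model


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def update_view(state, prev_xy, curr_xy, pinch_dist,
                rot_gain_deg=180.0, dist_range=(0.02, 0.08),
                zoom_range=(0.5, 2.0), smoothing=0.2):
    """Return the next view state for one frame of a held pinch."""
    # Hand displacement since the previous frame drives rotation.
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    yaw = state.yaw_deg + rot_gain_deg * dx
    pitch = clamp(state.pitch_deg + rot_gain_deg * dy, -90.0, 90.0)

    # Thumb-index distance is mapped linearly onto the zoom range.
    d_lo, d_hi = dist_range
    t = clamp((pinch_dist - d_lo) / (d_hi - d_lo), 0.0, 1.0)
    target = zoom_range[0] + t * (zoom_range[1] - zoom_range[0])

    # Exponential smoothing avoids abrupt jumps in zoom between frames.
    zoom = state.zoom + smoothing * (target - state.zoom)
    return ViewState(yaw, pitch, zoom)
```

Because both quantities are read from the same hand in the same frame, rotation and zoom update together, which is the "simultaneous control" the revised text describes.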
Comments 3: Line 392 The word “framework” is repeated. Reword the sentence to avoid redundancy.
Response 3: Agree. I have used the term "paradigm" instead of "framework".
Comments 4: Line 300 “Metaphorical nature” could be made clearer as “intuitive metaphorical mapping”, which better reflects the human-object cognitive analogy.
Response 4: Agree. I have updated the wording in Section 4.1 to be more precise, as suggested.
Comments 5: Lines 179–216 Add visual arrows or motion indicators to illustrate finger movements more clearly.
Response 5: Agree. Figure 2 has been updated with visual arrows and motion indicators to more clearly illustrate the rotational "hinge" gesture and the "pinch" action.
Comments 6: Lines 253–256 Enrich the figure caption by describing the actions shown (e.g., “Child using pinch gesture to rotate the 3D model”).
Response 6: Thank you for this valuable suggestion. I agree that a more descriptive caption enhances the figure's impact. I have revised the caption for Figure 5 to specifically describe the types of engagement and the actions shown in the photos, as recommended.
Comments 7: Multiple Headings Standardise the capitalisation in all section and subsection titles for consistency.
Response 7: Agree. I have reviewed all section and subsection titles throughout the manuscript and standardized them to title case for consistency.
Comments 8: Lines 418–527 Ensure all DOI links are correctly hyperlinked and uniformly formatted in the references.
Response 8: Agree. The entire reference list has been checked to ensure all DOIs are correctly formatted and hyperlinked as per the journal's guidelines. The references were formatted with the MDPI EndNote template.
Comments 9: Lines 150–157 In the licensing discussion, explicitly mention how the open licence facilitates access for low-resource educational institutions.
Response 9: This is an excellent point. I have added a sentence to the licensing paragraph in Section 2.2 to explicitly highlight this benefit.
Comments 10: Lines 161–165 Consider adding a figure or inset that visually labels the colour-coded functional zones in the TouchDesigner node layout.
Response 10: Thank you for this valuable suggestion. I agree completely that a visual guide to the project's organization is very helpful. To address this, and in response to a similar comment from another reviewer, I have updated Figure 4a. The figure now includes clear annotations that label the key color-coded functional zones of the TouchDesigner network. The figure's caption has also been updated to describe these zones, enhancing the clarity for readers who may not be familiar with the software.
For the CONTENT section:
Comments 11: Lines 347–350 The evaluation is entirely qualitative. Adding basic metrics like average interaction time or model switch frequency would strengthen the results.
Response 11: Agree. As noted in my response to another reviewer, I acknowledge this as a limitation of the current manuscript and have added the collection of these specific metrics as a key goal for the formal user studies mentioned in the "Future Work" section (Section 4.4).
Comments 12: Lines 266–273 and 347–350 Consider the inclusion of validated usability tools (e.g., SUS or NASA-TLX) to quantify user experience in future work.
Response 12: Thank you for this excellent and specific recommendation. I agree that using validated usability tools is a critical part of rigorous quantitative evaluation. I have updated the manuscript to explicitly include the use of instruments like the System Usability Scale (SUS) as a key component of planned future work. This information can be found in the "Evaluation Limitations" subsection of Section 4.3.
Comments 13: Lines 197–200 Clarify how the gesture thresholds were chosen—through testing, empirical adjustment, or estimation?
Response 13: Thank you for highlighting this missing detail. The thresholds were determined empirically. I have added a sentence to Section 2.4 to clarify this.
[Updated text in Section 2.4]
...the manipulation mode is activated when the initial Euclidean distance between the thumb tip (keypoint 4) and index finger tip (keypoint 8) falls below a set threshold. This threshold was determined empirically through iterative testing to find a balance between responsive activation and preventing accidental triggers.
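The activation rule in this passage can be sketched in a few lines. This is a hedged illustration rather than the PETRA implementation: the function names are hypothetical, the default threshold of 0.05 (in normalized image coordinates) is a placeholder and not the empirically tuned value, and `landmarks` is assumed to be a sequence of 21 (x, y) points indexed as in the MediaPipe hand model.

```python
import math

THUMB_TIP = 4   # keypoint index of the thumb tip
INDEX_TIP = 8   # keypoint index of the index finger tip


def pinch_distance(landmarks):
    """Euclidean distance between the thumb tip and index finger tip."""
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP])


def manipulation_active(landmarks, threshold=0.05):
    """True when the fingertips are closer together than the threshold.

    The threshold is a placeholder: a lower value reduces accidental
    triggers, a higher value makes activation more responsive.
    """
    return pinch_distance(landmarks) < threshold
```

In practice the threshold would be adjusted through iterative testing with the installed camera and typical viewing distance, as the response describes.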
Comments 14: Line 339 Since lighting affects hand tracking, recommend minimum lighting levels (e.g., in lux or practical conditions) to guide replication.
Response 14: This is a very practical point. While I did not measure lux levels, I can provide a more practical description of the required conditions. I have updated the "Technical Limitations" point in Section 4.3.
Comments 15: Lines 344–346 The lack of gesture-based access to mineral metadata is a limitation. A suggestion: use a simple additional gesture to trigger this function.
Response 15: Agree. I acknowledge this limitation and your suggestion is an excellent one. I have updated the "Future Work" section (Section 4.4) to include this specific idea.
Comments 16: Lines 301–326 A comparison table between PETRA, Kinect, Leap Motion, and VR systems (cost, complexity, hygiene, etc.) would improve contextualisation.
Response 16: Agree. This is a fantastic way to visually summarize the benefits of the PETRA system. I have added a new table in the Discussion section (Section 4.3) comparing these systems across several key attributes (see Table 1).
Comments 17: Lines 122–129 Add minimum system requirements (processor, RAM, webcam resolution) to help others replicate or adopt the system.
Response 17: Thank you for this important suggestion for reproducibility. I have revised Section 2.1 (System architecture and hardware) to integrate a sentence detailing the recommended minimum system requirements for the processor, RAM, graphics card, and webcam.
Comments 18: Lines 351–372 The authors should briefly discuss PETRA’s potential use in remote or hybrid teaching environments (e.g., online geology labs).
Response 18: Thank you for this suggestion. I have added a sentence to Section 4.4 discussing this specific application.
Comments 19: Lines 363–373 While the potential in other domains is mentioned, one concrete example (e.g., protein structure exploration in biology) would improve the argument.
Response 19: Agree. This example can be found in the third paragraph of Section 4.4.
Comments 20: Lines 374–387 Suggest exploring ML-based gesture adaptation to personalise the interface to different users and enhance accessibility.
Response 20: This is an excellent suggestion. I agree that personalized gesture adaptation is a very promising research direction for this work. I have added this concept as a more advanced point in the "Future Work" section (Section 4.4) to highlight it as a key avenue for enhancing the system's accessibility and inclusivity.
Reviewer 3 Report
Comments and Suggestions for Authors
The article presents an innovative and highly relevant solution for interactivity in geological exhibitions. The PETRA system, based on gesture control using only a webcam and open-source software, offers a low-cost and highly accessible alternative for situations where interactivity makes all the difference.
Overall, the text is clear, well-organized, and well-founded. The language is both scientific and technical, yet also accessible to a broader audience. The technical implementation of the system is well described, with clear specifications regarding hardware, software, and interaction logic.
The work provides direct contributions to the fields of “Mineralogy 4.0” and science education. I am personally very pleased to see initiatives like this, especially as a university professor working in an institution that is increasingly facing funding challenges for innovative teaching solutions. Tools like this are essential for advancing teaching methodologies.
The author himself points out some limitations of the tool, such as the single-user support constraint. However, since this system is still in development and open-source, the dissemination of this tool becomes essential for its own technical and scientific advancement.
Author Response
Dear colleague,
Dear reviewer,
I would like to extend my sincere thanks for your exceptionally positive and encouraging review of my manuscript.
I am particularly grateful for your recognition of the project's core goals -- creating a low-cost and accessible solution for interactivity in geological exhibitions -- and for your kind words about the clarity and organization of the paper.
Your comments regarding the challenges faced by educational institutions with limited funding and your view that tools like PETRA are essential are significant to me. This perspective validates the entire motivation behind this work.
Thank you once again for your time, your supportive feedback, and your thoughtful engagement with this research.
Warm regards!
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The current version of this paper does not require further revisions.