Co-Design of Smartphone- and Smartwatch-Based Occupational Health Visualisations in Office Environments
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The document proposes a co-design approach to develop occupational health visualisations based on data acquired through smartphones and smartwatches. The multi-stakeholder workshop identifies relevant themes and implements visualisations related to physical activity, posture, noise, and heart rate. The validation is limited to a qualitative follow-up session. The contribution is primarily methodological and, as currently presented, lacks robust quantitative validation. The authors are kindly requested to revise and supplement the document and to respond to the questions posed below. Thank you.
1) Could the authors detail the reason why the monitored sample is limited to a single public organization? How could the generalisability of the system be ensured? It should be noted that the issue of occupational health varies depending on the activity performed by the employee. A more generalised analysis would allow the model to be implemented in other work contexts as well.
2) What is the total number of participants, and why is it not statistically justified?
3) How do the authors justify the use of consumer-grade sensors without validation compared to certified instruments? Recent studies on validation and advanced biomedical monitoring, such as doi: 10.3390/electronics15040833, demonstrate the importance of validated architectures for biomedical data acquisition. It is appropriate for the authors to validate their document with these studies and similar ones to make the model more scientifically robust.
4) The paper presents a qualitative validation rather than a quantitative one. Could the authors elaborate on the reason for this choice?
5) How did you control bias in the co-design workshops?
6) Why didn't you consider advanced signal analysis approaches (e.g., AI or predictive models)? I am not asking the authors to implement a model from scratch but only to integrate the document with the study doi: 10.3390/signals6030038 which highlights how CNN techniques improve the interpretation of sensory data.
7) How can the study be replicated in other contexts?
8) What explanation do the authors provide for managing sensor errors (e.g., uncalibrated noise)?
9) How do the authors intend to overcome the limitations of the current platform, which lacks EMG signals and environmental sensors? I suggest the integration of recent studies doi: 10.3390/app15052439 that have designed and implemented advanced sensor systems for complex monitoring.
10) Why was no statistical analysis of the results performed?
11) What evidence demonstrates that visualisations improve workers' health?
12) Why is there no direct comparison with existing systems (e.g., SWELL)?
Comments on the Quality of English Language
The paper makes excessive use of long and complex sentences that reduce readability. The style is redundant and highly descriptive, particularly in the discussion sections. The methodological sections lack conciseness: they are verbose but provide little information. Some passages are ambiguous (for example, the distinction between 'illustrative' and 'evaluative' validation).
Author Response
Dear Reviewer, we thank you for your detailed review and have addressed your suggestions as fully as possible. Below, for each comment, we describe the changes made and/or provide further context for clarification.
Comment 1: Could the authors detail the reason why the monitored sample is limited to a single public organization? How could the generalisability of the system be ensured? It should be noted that the issue of occupational health varies depending on the activity performed by the employee. A more generalised analysis would allow the model to be implemented in other work contexts as well.
Response 1: The study was conducted within a single partner organisation (CML) as part of the PrevOccupAI+ project, which was funded for this particular purpose. This is consistent with the nature of generative co-design research, which is by definition context-specific and organisation-embedded. The depth of contextual understanding that generative co-design requires necessitates close collaboration with a single organisation over an extended period. This approach is fully consistent with the co-design literature cited in the manuscript. The restriction to a single organisation is acknowledged as a limitation in Section 5.4.
Comment 2: What is the total number and why is it not statistically justified?
Response 2: The total number of participants and their full profile are reported in Table 1 (Section 3.3.1). A total of 12 participants took part in the co-design workshop.
With respect to the absence of statistical justification for the number of participants, we respectfully note that statistical power analysis is not an applicable criterion for co-design research. Co-design is a qualitative, participatory methodology in which sample size is determined by two considerations: (1) the pragmatic constraint of stakeholder availability within the partner organisation, and (2) the principle of role diversity, which ensures that the resulting design insights reflect the perspectives of all relevant stakeholder groups. This is consistent with established practice in the co-design literature cited in the manuscript, none of which applies a statistical sampling rationale to determine participant numbers.
Comment 3: How do the authors justify the use of consumer-grade sensors without validation compared to certified instruments? Recent studies on validation and advanced biomedical monitoring, such as doi: 10.3390/electronics15040833, demonstrate the importance of validated architectures for biomedical data acquisition. It is appropriate for the authors to validate their document with these studies and similar ones to make the model more scientifically robust.
Response 3: With respect to the justification for using consumer-grade sensors, the manuscript grounds this choice in an established body of literature demonstrating the feasibility of smartphone and smartwatch sensors for (occupational) health monitoring. Multiple cited studies have demonstrated the validity of smartphone accelerometers for activity classification, postural load estimation, and sedentary behaviour detection in occupational contexts (see Introduction and Related Work). Furthermore, the use of consumer-grade devices is a deliberate and defining characteristic of the PrevOccupAI+ platform, which aims to deliver occupational health monitoring without requiring dedicated research-grade instrumentation. The limitations of consumer-grade sensing are explicitly acknowledged in Section 5.4, which states that the resulting data streams cannot be considered equivalent to measurements obtained with dedicated, properly calibrated instrumentation. For the noise assessment specifically, the manuscript frames the dBA values as estimates suitable for identifying patterns and tendencies, and recommends follow-up with dedicated calibrated equipment when consistently elevated noise levels are identified.
Regarding the suggested reference (doi:10.3390/electronics15040833), we have carefully reviewed this paper and respectfully note that it is not relevant to the present study. The reference presents the design and simulation-based validation of an embedded acquisition circuit for PCB integrity monitoring in biomedical devices, using Hall-effect current sensors and a CNN-based classifier. This work addresses a different application domain and sensing modality, with no connection to occupational health monitoring, wearable consumer devices, or data visualisation. Following MDPI's guidance on reviewer-suggested references, we have concluded that its inclusion would not enhance the present manuscript and have therefore chosen not to cite it. Should the reviewer nevertheless consider this reference essential, we kindly ask for a clear rationale identifying how it relates to our study.
Comment 4: The paper presents a qualitative validation rather than a quantitative one. Could the authors elaborate on the reason for this choice?
Response 4: The follow-up session was designed as a qualitative interpretability check, which is the standard closing step within the applied generative co-design framework by Bird et al. Within this framework, the post-design phase serves to validate that the implemented outputs are interpretable and aligned with stakeholder expectations, and to gather structured feedback for targeted refinement. It does not constitute, nor does it claim to constitute, a formal usability or efficacy evaluation.
The present study is a co-design and early-stage development study. Its primary contributions are the structured elicitation of occupational health visualisation requirements through a participatory process, the feasibility assessment of those requirements against the available sensing platform, and the implementation of an initial set of visualisations grounded in those requirements. A population-level quantitative evaluation represents the natural next step in the research programme and is explicitly identified as a priority for future work in Section 5.5 of the revised manuscript.
We further note that the manuscript is fully transparent about this limitation. The Limitations section (Section 5.4) explicitly states that the primary focus of the present study is the co-design process and the development of the resulting visualisations, and that a population-level quantitative assessment has yet to be carried out.
Comment 5: How did you control bias in the co-design workshops?
Response 5: Bias control in the co-design workshop was addressed at two levels, both of which are described in the manuscript. With respect to facilitator bias, the manuscript explicitly states that the facilitators adopted a non-directive role during both the brainstorming and co-design activities, remaining available exclusively to answer participant questions regarding the functionality and data capabilities of the utilised sensors (Section 3.3.2). This is the standard procedural safeguard against facilitator-induced bias in generative co-design workshops, ensuring that the design directions emerged from participant knowledge and experience rather than from researcher guidance.
With respect to participant selection bias, the manuscript acknowledges in the Limitations section (Section 5.4) that the co-design workshop involved a relatively small number of participants recruited from a single public administration organisation, which limits the generalisability of the identified themes and design requirements to other occupational sectors or cultural contexts. This limitation is explicitly identified and directions for future work including larger and more occupationally diverse participant groups are proposed accordingly.
If the reviewer is referring to other forms of bias, we kindly ask that these be specified.
Comment 6: Why didn't you consider advanced signal analysis approaches (e.g., AI or predictive models)? The authors should integrate the study doi:10.3390/signals6030038 which highlights how CNN techniques improve the interpretation of sensory data.
Response 6: With respect to the absence of AI or predictive models, the present study is a co-design and visualisation study, not a signal classification study. Its objective is to obtain occupational health visualisation requirements from a multi-stakeholder group and to implement interpretable visualisations grounded in those requirements. The introduction of AI-based predictive models would serve an entirely different purpose which is neither the stated objective of the present study nor a requirement that emerged from the co-design process. The workshop participants did not request automated classification. They requested interpretable visualisations that could support self-monitoring, structured reflection, and evidence-based communication between workers and occupational health professionals.
Concerning the suggested reference (doi:10.3390/signals6030038), we have carefully reviewed this paper and respectfully note that we do not see how it relates to our study. The reference presents a CNN-LSTM hybrid model for classifying bioactive edible oils based on their infrared thermographic signatures, in the context of food quality assessment and nutritional traceability. This work has no connection to occupational health monitoring, wearable consumer-grade sensing, data visualisation, or co-design methodology. Following MDPI's guidance on reviewer-suggested references, we have concluded that its inclusion would not enhance the present manuscript and have therefore chosen not to cite it. Should the reviewer nevertheless consider this reference essential, we kindly ask for a clear rationale identifying how it relates to our study.
Comment 7: How can the study be replicated in other contexts?
Response 7: The study is replicable in other contexts through direct application of the utilised generative co-design framework by Bird et al. The manuscript provides a detailed description of all phases of this framework as applied in the present study, including the field study and questionnaire design, the workshop structure and facilitation approach, the transcription and inductive coding procedures, the feasibility assessment criteria, and the visualisation implementation pipeline. This level of methodological transparency is precisely what enables replication.
With respect to contextual transferability, the framework should be generally applicable to the majority of traditional office work environments. The risk factors associated with office work are broadly consistent across office settings regardless of the specific organisation. For non-office occupational settings, the same methodological pipeline would apply, with appropriate adaptation of the participant composition and the risk factor scope. This direction is explicitly identified in the Limitations section (Section 5.4) as a priority for future work.
Comment 8: What explanation do the authors provide for managing sensor errors (e.g., uncalibrated noise)?
Response 8: Sensor limitations and their management are addressed at multiple levels in the manuscript.
With respect to the general use of consumer-grade devices, the manuscript explicitly acknowledges in the Limitations section (Section 5.4) that the data streams derived from consumer-grade sensors cannot be considered equivalent to measurements obtained with dedicated, properly calibrated instrumentation. This limitation is inherent to the study design, which deliberately employs widely accessible consumer devices rather than research-grade equipment. This is consistent with the growing body of literature demonstrating the feasibility of such platforms for (occupational) health monitoring (as cited in the manuscript).
Concerning the noise sensor, the smartphone microphone was calibrated against a silent room reference prior to data collection to establish a baseline offset. This detail has been added to Section 4.4.3 of the revised manuscript. Furthermore, the manuscript explicitly states that the resulting dBA values should be interpreted as general indicators of noise exposure rather than as definitive exposure values. If elevated noise levels are consistently registered, they should be followed up with dedicated, properly calibrated sound level measurement equipment.
With respect to the remaining sensor modalities, the IMU-based activity recognition and postural load derivations are grounded in a body of cited literature demonstrating their feasibility and validity in occupational contexts, which constitutes the methodological justification for their use in the present study.
Comment 9: How do the authors intend to overcome the limitations of the current platform, which lacks EMG signals and environmental sensors? I suggest the integration of recent studies doi: 10.3390/app15052439 that have designed and implemented advanced sensor systems for complex monitoring.
Response 9: The manuscript already addresses the path forward for both sensor modalities in the Future Work section (Section 5.5).
Regarding EMG, the PrevOccupAI+ platform is application-based and designed for extensibility. Surface EMG sensors can be integrated into the existing acquisition framework through established biomedical sensor APIs (e.g., the muscleBAN by PLUX Biosignals: https://www.pluxbiosignals.com/products/muscleban-ble).
With respect to dedicated environmental sensors, namely lux meters and thermometers, this extension is likewise explicitly identified in Section 5.5 as a priority. If integrating these devices is not possible, conventional measurements with dedicated instruments can be carried out by the occupational safety and health specialists at CML.
Regarding the suggested reference (doi:10.3390/app15052439), we have carefully reviewed this paper and respectfully note that it is not relevant to the present study. The reference presents a monitoring system for specific absorption rate and temperature variation due to electromagnetic field (EMF) exposure in indoor environments. This work addresses a different risk domain, which was not identified as an occupational health risk factor by the co-design participants and is not within the scope of the present study. Furthermore, EMF exposure as a non-thermal biological risk factor remains under active scientific investigation, as the suggested paper itself acknowledges. It has not been established as a primary occupational health concern in office environments in the way the risk domains addressed in the present manuscript have. We do believe that research into this topic is relevant and should be pursued; however, we consider it out of scope for the present study. Following MDPI's guidance on reviewer-suggested references, we have concluded that its inclusion would not enhance the present manuscript and have therefore chosen not to cite it. Should the reviewer nevertheless consider this reference essential, we kindly ask for a clear rationale identifying how it relates to our study.
Comment 10: Why was no statistical analysis of the results performed?
Response 10: The present study is a qualitative co-design study, and the analytical approach was selected accordingly. The primary outputs of the co-design process, such as the inductive codes, thematic categories, and low-fidelity visualisation prototypes, are qualitative artefacts that are not statistically evaluated. These were analysed using thematic analysis following the established framework of Clarke and Braun. This is the standard and appropriate analytical method for this type of data.
With respect to the implemented visualisations, a quantitative evaluation encompassing structured usability testing, measures of cognitive burden, and assessment of impact on occupational health behaviour is explicitly identified as a priority for future work in Section 5.5. Such an evaluation requires a longitudinal intervention study with a sufficiently large and diverse user population, and represents the natural next phase of this research programme rather than a component of the present study.
Comment 11: What evidence demonstrates that visualisations improve workers' health?
Response 11: The present study does not claim that the developed visualisations improve workers' health. The manuscript explicitly frames the visualisations as the output of a co-design process, confirmed as interpretable and aligned with stakeholder requirements through a qualitative follow-up session. Demonstrating health impact requires longitudinal intervention studies with sufficient statistical power, which constitute a distinct and subsequent phase of the research programme. This is explicitly acknowledged in the Limitations section (Section 5.4) and the Future Work section (Section 5.5).
Comment 12: Why is there no direct comparison with existing systems (e.g., SWELL)?
Response 12: A direct quantitative comparison with existing systems such as SWELL is not appropriate at the current stage of the research for two reasons.
First, the present system and existing systems such as SWELL differ fundamentally in scope, sensing configuration, study design, and intended use context. SWELL is a laboratory-based multimodal system employing research-grade sensors and a controlled data collection environment, whereas the present work employs consumer-grade devices in a real-world occupational setting and is grounded in a participatory co-design methodology. A meaningful direct comparison would require equivalent evaluation conditions, which do not exist at this stage of the research.
Second, a direct system comparison should only be carried out after the visualisations have been evaluated quantitatively.
Comments on the Quality of English Language: The paper presents an excessive use of long and complex sentences that reduce readability. The style is redundant and highly descriptive, particularly in the discussion sections. The document lacks conciseness in the methodological sections, which are verbose but provide little information. Some passages are ambiguous (for example, the distinction between 'illustrative' and 'evaluative' validation).
Response: Regarding sentence length and redundancy, the manuscript has been thoroughly revised to improve readability, with long and complex sentences broken into shorter ones. Redundant phrasing has been removed as well.
Regarding the level of detail in the methodological sections, we respectfully maintain that the reported detail is both necessary and deliberate. Co-design studies require transparent reporting of participant selection criteria, workshop structure, transcription procedures, coding methodology, and feasibility assessment to allow critical appraisal and replication. The current level of detail is consistent with established reporting standards in co-design research in digital health and follows the applied co-design approach.
Regarding the distinction between illustrative and evaluative validation, we acknowledge that the original phrasing was ambiguous. The relevant passage in the Limitations section (Section 5.4) has been revised to clarify that the primary focus of the present study is the co-design process and the development of the resulting visualisations.
Reviewer 2 Report
Comments and Suggestions for Authors
The paper proposes a technique to monitor occupational health with common communication devices. The authors combine participatory design with consumer-grade sensing and visualization for office environments. The report includes 10 questionnaire respondents, a 12-person multi-stakeholder co-design workshop, 17 inductive themes, 27 visualization elements, and an implemented set of visualizations for activity, heart rate, noise, and posture, followed by a participant validation session.
The paper is clearly structured, with a feasibility assessment, an implementation, and coherent content. However, it reads as an early design-and-development report rather than a fully validated occupational health visualization study. Some issues need to be addressed.
- The evaluation of the developed visualizations lacks rigor. The assessment of “interpretability and clarity” was qualitative and light, lacking quantitative criteria, so the reader finds it difficult to identify a robust assessment of usability or effectiveness. Please provide a structured usability instrument for cognitive burden. In short, the manuscript only presents the co-designed and implemented visualizations, but does not provide evidence that they support decision-making or behavior change.
- The description of the available sensing platform is limited. For example, smartwatch heart rate is sampled only in four 20-minute windows, noise is estimated with an uncalibrated smartphone microphone, and muscular tension is inferred indirectly. More information is needed to support these statements.
- The paper describes that WhisperX was used for transcription, and ChatGPT (GPT-4.1) was used both to merge transcripts and to generate preliminary codes. Please provide more information about the methodological precision, such as which prompts or coding rules were used. How were disagreements between researchers resolved? How was bias from LLM-assisted coding mitigated?
- The questionnaire results are clearly described, but they remain descriptive, and the results are weak. Only 10 participants responded. Please clarify that the questionnaire was used primarily to prepare the workshop and reduce the interpretive weight placed on its findings.
- The linkage between raw data, processing pipeline, and final figures needs to be enhanced. Some figures for activity, heart rate, noise, and posture are presented. However, more precise methodological details on how the visual outputs were generated are needed. For example, how were heart-rate classes defined? How were noise categories mapped from microphone-derived values? What preprocessing, smoothing, or artifact removal steps were applied?
- The manuscript claims that this is the first structured co-design study of multimodal occupational health visualizations in office work using smartphone/smartwatch sensing. Such “first” claims are hard to defend. Please use the phrasing “to the best of our knowledge, among the first…”
- For Figure 1 (“Device placement within the PrevOccupAI+ setup as presented to the co-design workshop participants. (a) Smartphone (Xiaomi Redmi Note 9, Beijing, China); (b) Smartwatch (OPPO watch 41mm, DongGuan City, China).”) and Table 2 (“Information extractable from the utilized devices, as presented to the co-design workshop participants.”), please illustrate the sensing performance of these sensors, such as the accelerometer, gyroscope, magnetometer, and rotation vector sensor.
- The quality of Figures 2 and 3 is poor; please improve them.
- In Figures 4-10, some labels are not in English, which makes them difficult to understand.
- The English grammar and syntax contain many errors and require detailed improvement.
- The format of the document does not conform to the journal's requirements.
Comments on the Quality of English Language
The English should be improved to more clearly express the research.
Author Response
Dear Reviewer, we thank you for your detailed review and have addressed your suggestions as fully as possible. Below, for each comment, we describe the changes made and/or provide further context for clarification.
Comment 1: The evaluation of the developed visualizations lacks rigor. The assessment of “interpretability and clarity” was qualitative and light, lacking quantitative criteria, so the reader finds it difficult to identify a robust assessment of usability or effectiveness. Please provide a structured usability instrument for cognitive burden. In short, the manuscript only presents the co-designed and implemented visualizations, but does not provide evidence that they support decision-making or behavior change.
Response 1: With respect to the qualitative nature of the follow-up session, the present study is a co-design and early-stage development study grounded in the generative co-design framework by Bird et al. Within this framework, the post-design phase serves to validate that the implemented outputs are interpretable and aligned with stakeholder expectations, and to gather structured feedback for targeted refinement. A qualitative follow-up session is the methodologically appropriate closing step at this stage of the research. The manuscript is fully transparent about this: the Limitations section (Section 5.4) explicitly states that the primary focus of the present study is the co-design process and the development of the resulting visualisations, and that a population-level quantitative assessment remains outstanding.
With respect to the request for a structured usability instrument assessing cognitive burden, the application of such instruments is explicitly planned as a priority for future work, as stated in Section 5.5 of the revised manuscript.
Concerning the evidence for decision-making or behaviour change support, demonstrating such effects requires longitudinal intervention studies with sufficient statistical power, which represent a distinct and subsequent phase of the research programme. The present study establishes the design foundation upon which such an evaluation can and should be built.
Comment 2: The description of the available sensing platform is limited. For example, smartwatch heart rate is sampled only in four 20-minute windows, noise is estimated with an uncalibrated smartphone microphone, and muscular tension is inferred indirectly. More information is needed to support these statements.
Response 2: We respectfully note that each of the three points raised is already addressed in the manuscript at the appropriate location. With respect to heart rate sampling, the four 20-minute acquisition windows and their rationale (the battery limitations of the smartwatch) are explicitly described in Sections 3.3.2 and 4.4.2. This constraint is further acknowledged in the Limitations section (Section 5.4), where it is noted that it may result in the under-detection of transient physiological events occurring between acquisition periods.
With respect to noise estimation, the smartphone microphone was calibrated against a silent room reference prior to data collection, as now explicitly stated in Section 4.4.3. The manuscript further clarifies that the resulting dBA values represent estimates of the acoustic exposure profile rather than precision measurements, and that consistently elevated noise levels should be followed up with dedicated calibrated equipment.
Regarding muscular tension, the manuscript explicitly states in both the feasibility assessment (Section 4.3.1) and the Limitations section (Section 5.4) that direct quantification of muscular tension requires surface electromyography, which is not part of the current platform. Section 4.3.1 clearly states: "While the smartphone IMU provides trunk inclination and lateral tilt data that can indicate awkward or sustained static postures, these do not constitute a direct tension measure and should not be treated as equivalent." In other words, muscular tension could be inferred indirectly through the IMU measures, but only true EMG can give a proper assessment.
Finally, we note that the present study is a co-design and visualisation study rather than a sensing platform paper. The sensing platform is described as contextual background necessary to understand the data available for the visualisations. We believe that expanding the platform description beyond its current scope would shift the manuscript's focus away from the co-design process. However, if the reviewer sees the need for a more detailed description, then we kindly ask to specify what exactly should be described.
Comment 3: The paper described that WhisperX was used for transcription, and ChatGPT (GPT-4.1) was used both to merge transcripts and generate preliminary codes. Please provide more information about the methodological precision, such as which prompts or coding rules were used. How were disagreements between researchers resolved? How was bias from LLM-assisted coding mitigated?
Response 3: Regarding the prompts used with ChatGPT, we respectfully note that reporting exact prompts would not guarantee methodological reproducibility, given the non-deterministic nature of LLM outputs. More importantly, the LLM was never used as a primary analytical instrument. In both the transcript merging and the preliminary coding steps, the LLM output served exclusively as a starting point to accelerate an otherwise time-intensive process. All outputs were independently reviewed, critically evaluated, and refined by both researchers before being used in any further analysis. This is now stated more explicitly in the revised manuscript. We further note that this approach is consistent with emerging practices for LLM-assisted qualitative analysis, as supported by the cited literature.
Regarding disagreement resolution, the revised text now explicitly states that any discrepancies between the two researchers were resolved through discussion until consensus was reached, which is the standard procedure in qualitative research.
Regarding LLM bias mitigation, the critical verification role of the researchers at every stage of the process constitutes the primary safeguard against uncritical adoption of LLM-generated outputs. The thematic analysis followed the established framework of Clarke and Braun (as cited in the manuscript), with the LLM serving only a supportive and subordinate function within that framework.
Comment 4: The questionnaire results are clearly described, but they remain descriptive, and the results are weak. Only 10 participants responded. Please clarify that the questionnaire was used primarily to prepare the workshop and reduce the interpretive weight placed on its findings.
Response 4: The questionnaire was never intended as a standalone research instrument. We therefore agree that placing its results in the Results section was not optimal. The manuscript has been revised accordingly in two ways. First, the Methods section (Section 3.2.2) has been updated to explicitly state that the questionnaire served a preparatory role for the co-design workshop. Second, the questionnaire results have been removed from the Results section and moved to the appendix (Appendix A.2), where they are reported only for transparency and completeness.
Comment 5: The linkage between raw data, processing pipeline, and final figures needs to be enhanced. Some figures for activity, heart rate, noise, and posture are presented. However, more precise methodological details on how the visual outputs were generated are needed. For example, how were heart-rate classes defined? How were noise categories mapped from microphone-derived values? What preprocessing, smoothing, or artifact removal steps were applied?
Response 5: We respectfully note that the present study is a co-design and visualisation study rather than a signal processing paper. The main focus of the study is the co-design process and the visualisations that resulted from it. While writing the article, we tried to strike a balance with regard to the methodological detail provided for each visualisation domain. We focused on providing the rationale behind each design decision rather than describing the full signal processing pipelines, because doing so would divert the focus from the co-design and make the article even longer than it already is. Detailed descriptions of the processing pipelines are deferred to a future publication that will present a dataset collected within the project, together with the processing pipeline for each sensor.
With respect to heart rate classification, the classification scheme is described in Section 4.4.2 of the manuscript. Heart rate was expressed as the heart rate ratio, and the three classification levels are defined with explicit thresholds. The reasoning behind each threshold is stated directly in the text, with corresponding citations.
Regarding the noise category mapping, the smartphone microphone acquires ambient sound levels in dBA, as described in the manuscript. We deliberately programmed the data acquisition application this way because recording raw microphone data (i.e., audio that may capture conversations) would not comply with the GDPR. The four noise categories were defined in consultation with the occupational health specialists during the follow-up co-design session.
Comment 6: The manuscript claims that this is the first structured co-design study of multimodal occupational health visualizations in office work using smartphone/smartwatch sensing. The “first” claims are hard to defend. Please use the sentence “to the best of our knowledge, among the first…”
Response 6: The absolute "first" framing has been softened at both locations in the manuscript where such claims appeared.
- Contributions list: [...] To the best of our knowledge, this work is among the first structured co-design studies of this kind in the context of office work.
- Related work (last paragraph): Thus, the present study positions itself among the first to address this gap by applying the generative co-design framework~\cite{bird2021generative} to [...].
Comment 7: Please illustrate the sensing performance of these sensors, such as Accelerometer, Gyroscope, Magnetometer, Rotation Vector, etc.
Response 7: The sensing performance of the smartphone and smartwatch sensors used in this study is supported by an established body of literature demonstrating their suitability for the specified tasks. The feasibility of smartphone accelerometers for human activity classification, postural load estimation, and sedentary behaviour detection in occupational contexts has been demonstrated in multiple studies cited in the manuscript. The use of smartwatch optical heart rate monitors for cardiovascular monitoring in daily and occupational settings is similarly well established. The use of smartphone microphones for ambient noise estimation has also been explored in occupational and public health contexts (this is now stated more clearly in the Introduction). The present study therefore builds on an existing evidence base rather than introducing these sensor modalities for the first time. We consider identifying the devices used, alongside this body of supporting literature, to be consistent with standard reporting practice in the field (e.g., mHealth). If the reviewer has specific performance metrics in mind beyond those addressed by the cited literature, we would welcome clarification and would be happy to provide additional information accordingly.
Comment 8: The quality of Figures 2 and 3 is very poor; please improve them.
Response 8: The image quality of Figures 2 and 3 has been improved in the revised manuscript, with enhanced lighting and sufficient resolution to ensure that all text and visual elements are clearly readable and can be examined in detail upon zooming.
With respect to the visual design of the content depicted in these figures, we respectfully note that Figures 2 and 3 are direct photographic records of the low-fidelity visualisation prototypes produced by the two participant groups during the co-design workshop, as explicitly stated in the manuscript (Section 3.3.2 and Section 4.2). These artefacts are the authentic outputs of the co-design process and their appearance reflects the materials and methods available to the participants during the session. Altering or redrawing their content would misrepresent the co-design outputs and undermine the authenticity of the research record. We therefore consider the current depiction of these figures appropriate and consistent with standard practice in co-design research reporting.
Comment 9: In Figures 4–10, some labels are not in English, which makes them difficult to understand.
Response 9: All figures have been regenerated as vector-based .eps files to ensure optimal rendering quality at any resolution. Furthermore, all labels, annotations, and titles appearing within the figures that were previously in Portuguese have been translated to English.
Comment 10: The English grammar and syntax contain many errors and require detailed improvement.
Response 10: The manuscript has been thoroughly revised to address grammatical and syntactic issues throughout. Specifically, the following changes were applied: (1) British English spelling has been standardised across the manuscript (e.g., -ise endings, behaviour, colour); (2) overly long and complex sentences have been broken into shorter ones; (3) missing articles have been inserted; and (4) redundant phrasing has been removed to improve conciseness and readability.
Comment 11: The format of the document does not conform to the journal's requirements.
Response 11: We believe that this has now been thoroughly addressed through the revisions made in response to all previous comments.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have given excellent answers to the review comments and have revised the paper well.
Reviewer 2 Report
Comments and Suggestions for Authors
All problems have been addressed adequately.

