An Eye-Tracking-Driven Evaluation Framework for Age-Friendly Smart Home Interface
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Framework Overview
3.2. Metric System Based on WCAG
3.3. Cognitive Load Modeling via Eye-Tracking
3.4. LLM-Based Evaluation Agent
3.5. Validation Design
4. Results
4.1. Cognitive Load Analysis and Weight Derivation
4.2. Interface Evaluation Results
4.3. Method Validation
5. Discussion
5.1. Cognitive Load Differences and Design Implications
5.2. Methodological Contribution: Behavior-Grounded Weighting
5.3. Reliability of LLM-Assisted Evaluation
5.4. External Validity and Scope
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References





| ID | Dimension | WCAG Criterion | Level | Judgment Type | Core Judgment Condition |
|---|---|---|---|---|---|
| V1 | Visual | SC 1.4.3 Contrast | AA | Rule-based | Foreground-to-background luminance ratio ≥ 4.5:1 (regular text) or ≥ 3:1 (large text) |
| V2 | Visual | SC 1.4.4 Text scaling | AA | Rule-based | Supports 200% lossless scaling of base font size |
| V3 | Visual | SC 1.4.11 Non-text contrast | AA | Semantic | UI components and icons with visual distinguishability ≥ 3:1 |
| O1 | Operational | SC 2.5.8 Target size | AA | Rule-based | Interactive elements not smaller than 24 × 24 CSS pixels |
| O2 | Operational | SC 2.5.2 Pointer cancelation | A | Rule-based | Provides one of four mechanisms: up-event trigger, abort, undo, or essential exception |
| O3 | Operational | SC 2.4.3 Focus order | A | Semantic | Focus order maintains consistency with visual layout in meaning and operability |
| A1 | Accessibility | SC 4.1.3 Status messages | AA | Semantic | Status changes are programmatically perceivable by assistive technology |
| A2 | Accessibility | SC 2.1.1 Keyboard accessible | A | Semantic | All functions are reachable through non-pointer methods |
| A3 | Accessibility | SC 3.3.4 Error prevention | AA | Rule-based | High-risk operations provide one of: reversible, checkable, or confirmation mechanism |

| Indicator | Type | Score = 100 | Score = 75 | Score = 50 | Score = 25 |
|---|---|---|---|---|---|
| V1 Contrast | Rule | Regular text ≥ 7:1, large text ≥ 4.5:1 | Regular text ≥ 4.5:1, large text ≥ 3:1 | Regular text 3:1 to 4.5:1 | Regular text < 3:1 |
| V2 Text scaling | Rule | Supports 200% lossless scaling and base font size ≥ 14 pt | Supports 200% lossless scaling | Supports 200% scaling but with truncation or functional loss | Does not support 200% scaling |
| V3 Non-text contrast | Semantic | UI component contrast ≥ 4.5:1 | UI component contrast ≥ 3:1 | Some UI components reach 3:1 | UI component contrast < 3:1 |
| O1 Target size | Rule | All elements ≥ 48 × 48 CSS pixels | All elements ≥ 24 × 24 CSS pixels | Some elements between 16 × 16 and 24 × 24 CSS pixels | Elements < 24 × 24 CSS pixels present |
| O2 Pointer cancelation | Rule | Provides both up-event trigger and undo mechanism, plus at least two other mechanisms | Satisfies one of the four SC 2.5.2 mechanisms | Does not satisfy SC 2.5.2 but provides error feedback | Does not satisfy SC 2.5.2 and no error feedback |
| O3 Focus order | Semantic | Focus order fully consistent with visual layout and skips decorative elements | Focus order consistent with visual layout in core interaction areas | Some areas inconsistent but do not affect core tasks | Focus order disorganized or skips key elements |
| A1 Status messages | Semantic | Status conveyed through three channels: visual, screen reader, and audio | Status programmatically perceivable by screen reader plus visual feedback | Visual feedback only, with textual description | Visual feedback only, without textual description |
| A2 Keyboard accessible | Semantic | All functions support two non-pointer paths: voice and assistive technology | All functions support at least one non-pointer path | Some functions support non-pointer paths | Pointer input only |
| A3 Error prevention | Rule | Satisfies all three mechanisms: reversible, checkable, and confirmation | Satisfies one of the three mechanisms | Visual warning only | No safeguard |
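
As an illustration of how a rule-based indicator could be scored, the sketch below applies the V1 thresholds for regular text to a foreground/background colour pair. The luminance and contrast formulas follow the WCAG definitions; the function names and the restriction to regular text are assumptions made for this example, not the authors' implementation.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG relative-luminance formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) colour with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    lum_a, lum_b = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def v1_tier_regular_text(ratio: float) -> int:
    """Map a contrast ratio to the four-tier V1 score (regular text only)."""
    if ratio >= 7.0:
        return 100
    if ratio >= 4.5:
        return 75
    if ratio >= 3.0:
        return 50
    return 25


# Mid-grey text (#767676) on a white background: roughly 4.54:1.
ratio = contrast_ratio((118, 118, 118), (255, 255, 255))
print(round(ratio, 2), v1_tier_regular_text(ratio))
```

Large text would need its own branch with the 4.5:1 and 3:1 thresholds; it is omitted here because the table does not define the two lower tiers for large text.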

| Module | Functional Role | Key Design Features |
|---|---|---|
| Role setting | Anchor the reasoning perspective | Uses physiological decline features of older adults (reduced contrast sensitivity, tremor, reduced working memory) as the basis for judgment; excludes general esthetic preferences and younger-user interaction habits |
| Evaluation scope | Declare input constraints | States the static limitation of Figma-exported CSS; for indicators involving runtime behavior (V2, O2, A3), applies fallback rules when judgment evidence is insufficient |
| Task objective | Define the execution action | Produces four-tier scores for each of the nine indicators based on combined screenshot (global visual) and code (element-level parameter) input |
| Evaluation criteria | Constrain output latitude | Encodes tier thresholds as “if…then…” rules; includes WCAG exemptions (large text, decorative text, placeholders); specifies lowest-tier rule for multi-element conflicts and code–screenshot discrepancy |
| Evidence constraint | Exclude speculative output | Requires the evidence field to contain verifiable evidence (code snippets, numerical calculations, or specific element counts); prohibits uncertainty terms such as “probably”, “possibly”, or “presumably” |
| Output format | Standardize the data interface | Returns the tier score, weighted_score, evidence, and suggestion for each indicator in JSON, plus scenario-level total_score, compliance_level, and summary |
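
The evidence constraint and the output format lend themselves to an automatic check on each returned indicator record. The following is a minimal sketch under stated assumptions: the key name `score` is hypothetical (the table does not preserve the name of the tier-score field), and the check covers only the tier value and the prohibited uncertainty terms listed above.

```python
import json
import re

PROHIBITED_TERMS = ("probably", "possibly", "presumably")
VALID_TIERS = {25, 50, 75, 100}


def check_indicator(record: dict) -> list:
    """Return a list of problems found in one per-indicator record."""
    problems = []
    # "score" is a hypothetical key name for the four-tier score.
    if record.get("score") not in VALID_TIERS:
        problems.append("score is not one of the four tiers")
    evidence = str(record.get("evidence", "")).strip()
    if not evidence:
        problems.append("evidence is empty")
    for term in PROHIBITED_TERMS:
        # Whole-word, case-insensitive match on the uncertainty term.
        if re.search(rf"\b{term}\b", evidence, flags=re.IGNORECASE):
            problems.append(f"evidence contains uncertainty term '{term}'")
    return problems


reply = json.loads(
    '{"score": 75, "evidence": "contrast 4.54:1 for #767676 on #FFFFFF", '
    '"suggestion": "Darken body text"}'
)
print(check_indicator(reply))  # prints []
```

A record that fails the check could be re-queried or flagged for manual review; which of the two is appropriate depends on the evaluation protocol and is not specified here.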

| Indicator | Scenario A, M (SD) | Scenario V, M (SD) | Scenario O, M (SD) |
|---|---|---|---|
| Fixation duration (ms) | 409.26 (85.04) | 364.77 (71.64) | 344.34 (88.62) |
| Blink rate (per s) | 0.40 (0.15) | 0.38 (0.15) | 0.35 (0.19) |
| Saccade rate (per s) | 2.20 (0.52) | 2.26 (0.51) | 2.26 (0.50) |
| Pupil change rate (%) | 5.23 (1.23) | 4.65 (1.21) | 4.01 (0.96) |

| Contrast | t (34) | p (raw) | p (Bonferroni) | Cohen’s d | Significance |
|---|---|---|---|---|---|
| A vs. O | −5.060 | <0.001 | <0.001 | 0.855 | *** |
| A vs. V | −2.961 | 0.0056 | 0.0167 | 0.501 | * |
| V vs. O | 2.043 | 0.0488 | 0.1465 | 0.345 | ns |
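
The effect sizes in this table are numerically consistent with d = |t| / √n for n = 35 (implied by the 34 degrees of freedom), and the adjusted p-values with a Bonferroni correction over three comparisons. The sketch below reproduces both relationships; that the authors used exactly these formulas is an assumption, and the function names are illustrative.

```python
import math

N_PARTICIPANTS = 35   # assumption: implied by t (34) for a within-subject comparison
N_COMPARISONS = 3     # A vs. O, A vs. V, V vs. O


def cohens_d_paired(t: float, n: int) -> float:
    """Effect size for a paired comparison, recovered from the t statistic."""
    return abs(t) / math.sqrt(n)


def bonferroni(p: float, k: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * k)


# (t statistic, raw p) as reported; raw p for A vs. O is only given as < 0.001.
contrasts = {
    "A vs. O": (-5.060, None),
    "A vs. V": (-2.961, 0.0056),
    "V vs. O": (2.043, 0.0488),
}

for name, (t, p_raw) in contrasts.items():
    d = cohens_d_paired(t, N_PARTICIPANTS)
    p_adj = None if p_raw is None else bonferroni(p_raw, N_COMPARISONS)
    print(name, round(d, 3), p_adj)
```

Because the raw p-values are rounded in the table, adjusted values recomputed this way can differ from the reported ones in the last digit.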

| Scenario | μ = 1.0 | μ = 1.5 | μ = 1.855 |
|---|---|---|---|
| Emergency call page | 91.67 | 90.48 | 89.82 |
| Lighting control page | 80.56 | 79.76 | 79.32 |
| Air conditioning control page | 80.56 | 79.76 | 79.32 |
| Smart assistive chair detail page | 77.78 | 77.38 | 77.16 |
| Bedroom control page | 69.44 | 69.05 | 68.83 |
| Home page | 69.44 | 69.05 | 68.83 |

| Indicator | Weight | Home | Emergency | Lighting | AC | Chair | Bedroom |
|---|---|---|---|---|---|---|---|
| V1 Contrast (rule) | 9.52% | 75 | 75 | 100 | 100 | 100 | 75 |
| V2 Text scaling (rule) | 9.52% | 75 | 100 | 100 | 100 | 75 | 75 |
| V3 Non-text contrast (semantic) | 9.52% | 50 | 100 | 50 | 50 | 75 | 50 |
| O1 Target size (rule) | 9.52% | 75 | 100 | 75 | 75 | 75 | 75 |
| O2 Pointer cancelation (rule) | 9.52% | 100 | 100 | 100 | 100 | 100 | 100 |
| O3 Focus order (semantic) | 9.52% | 50 | 100 | 75 | 75 | 50 | 50 |
| A1 Status messages (semantic) | 14.29% | 50 | 75 | 50 | 50 | 75 | 25 |
| A2 Keyboard accessible (semantic) | 14.29% | 75 | 75 | 75 | 75 | 75 | 75 |
| A3 Error prevention (rule) | 14.29% | 75 | 100 | 100 | 100 | 75 | 100 |
| Composite score (5-run mean) | | 69.04 | 90.47 | 79.75 | 79.75 | 77.38 | 68.57 |
| Compliance level | | <75 | ≥90 | 75–80 | 75–80 | 75–80 | <75 |
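
The per-indicator scores and the μ sensitivity values reported above are consistent with a scheme in which each of the three Accessibility indicators carries μ times the weight of each of the other six, with the weights normalised to sum to one (for μ = 1.5, 1/10.5 ≈ 9.52% and 1.5/10.5 ≈ 14.29%). The sketch below recomputes the composite score under that reading; the aggregation rule is inferred from the tables rather than stated in them, and the names are illustrative.

```python
# Per-indicator tier scores taken from the table, grouped by dimension.
SCORES = {
    "Emergency call page": {"V": [75, 100, 100], "O": [100, 100, 100], "A": [75, 75, 100]},
    "Home page": {"V": [75, 75, 50], "O": [75, 100, 50], "A": [50, 75, 75]},
}


def composite(scores: dict, mu: float) -> float:
    """Weighted mean in which Accessibility indicators count mu times as much.

    Assumption: V and O indicators have unit weight, A indicators weight mu,
    and the weights are normalised by their sum.
    """
    base = scores["V"] + scores["O"]
    boosted = scores["A"]
    total_weight = len(base) + mu * len(boosted)
    return (sum(base) + mu * sum(boosted)) / total_weight


for page, scores in SCORES.items():
    print(page, [round(composite(scores, mu), 2) for mu in (1.0, 1.5, 1.855)])
# Emergency call page [91.67, 90.48, 89.82]
# Home page [69.44, 69.05, 68.83]
```

These values match the corresponding rows of the sensitivity table. The 5-run means in the table above differ slightly (for example 69.04 for the home page), which is expected if individual runs did not always return identical tiers.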

| Scenario | Kendall’s W (Experts) | Significance | Expert Mean vs. LLM (Spearman’s ρ) | Weighted Cohen’s κ | Significance |
|---|---|---|---|---|---|
| Emergency call page | 0.92 | < 0.001 | 0.93 | 0.88 | < 0.001 |
| Air conditioning control page | 0.85 | < 0.001 | 0.88 | 0.81 | < 0.001 |
| Lighting control page | 0.84 | < 0.001 | 0.86 | 0.80 | < 0.001 |
| Smart assistive chair detail page | 0.78 | < 0.001 | 0.82 | 0.74 | < 0.001 |
| Bedroom control page | 0.74 | < 0.01 | 0.79 | 0.68 | < 0.01 |
| Home page | 0.66 | < 0.01 | 0.71 | 0.58 | < 0.01 |
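
For readers unfamiliar with Kendall’s W, the sketch below shows how inter-rater concordance can be computed from a raters × items score matrix, including the tie correction that four-tier scores require. The example ratings are hypothetical and are not the experts’ data; the number of raters is likewise an assumption.

```python
from collections import Counter


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W with tie correction; `ratings` holds one list of item scores per rater.

    Raises ZeroDivisionError if every rater gives all items the same score.
    """
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    ties = sum(c ** 3 - c for row in ratings for c in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical scores from three raters on four indicators.
hypothetical = [
    [100, 75, 75, 50],
    [100, 75, 50, 50],
    [75, 75, 50, 25],
]
print(round(kendalls_w(hypothetical), 3))
```

W ranges from 0 (no agreement) to 1 (identical rankings across raters).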

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, Z.; Chen, Y. An Eye-Tracking-Driven Evaluation Framework for Age-Friendly Smart Home Interface. Appl. Sci. 2026, 16, 5454. https://doi.org/10.3390/app16115454

