The Tabular Accessibility Dataset: A Benchmark for LLM-Based Web Accessibility Auditing
Abstract
1. Summary
2. Data Description
- $framework indicates the JavaScript framework (i.e., “angular”, “react”, or “vue”) or the “php” language.
- $validity indicates whether the generated table will be valid according to the WCAG (“accessible”) or not (“invalid”).
- angular-table-accessible.js;
- angular-table-invalid.js;
- php-table-accessible.php;
- php-table-invalid.php;
- react-table-accessible.js;
- react-table-invalid.js;
- vue-table-accessible.html;
- vue-table-invalid.html.
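The naming pattern above can be read mechanically when loading the examples. The following is a minimal sketch, assuming the eight file names listed here are the only ones to be parsed; the function name `parse_example_name`, the `ExampleFile` tuple, and the regular expression are illustrative and are not part of the dataset.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for "<framework>-table-<validity>.<extension>",
# built only from the values that appear in the file list above.
_NAME_RE = re.compile(
    r"^(?P<framework>angular|php|react|vue)-table-"
    r"(?P<validity>accessible|invalid)\.(?P<ext>js|php|html)$"
)


class ExampleFile(NamedTuple):
    framework: str
    validity: str
    extension: str


def parse_example_name(filename: str) -> ExampleFile:
    """Split an example file name into framework, validity, and extension."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return ExampleFile(match["framework"], match["validity"], match["ext"])
```

For instance, `parse_example_name("vue-table-invalid.html")` yields the framework `vue` and the validity `invalid`, which can then serve as the ground-truth label for that file.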
3. Methods
3.1. Languages and Frameworks Covered
3.2. Details on Development
- A minimal table using the Options API;
- A minimal table using the Composition API;
- An accessible table using the Options API;
- An accessible table using the Composition API.
- Each student was required to select a unique dataset of at least 50 rows and 5 attributes, representing real-world data such as songs, movies, environmental measurements, etc.
- Students were encouraged to modularize their code using multiple components where appropriate and to leverage native Vue features (e.g., props, slots, reactivity).
- All components had to remain autonomous, meaning no shared logic or global imports were allowed across implementations. Code duplication was explicitly permitted to ensure isolation.
- Deliverables were submitted as compressed .zip archives, with the source code and data included, but build artifacts (e.g., node_modules, dist) were excluded for portability and reproducibility.
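The size requirement in the first rule (at least 50 rows and 5 attributes) lends itself to a simple automated check. The sketch below assumes the data is available as CSV text with a header row; the source does not state which file formats the students used or whether such a check was automated, so the function `meets_size_rule` is purely illustrative.

```python
import csv
import io

MIN_ROWS = 50        # minimum number of data rows required by the rules
MIN_ATTRIBUTES = 5   # minimum number of attributes (columns)


def meets_size_rule(csv_text: str) -> bool:
    """Return True if the CSV text has enough data rows and columns.

    Assumption: the first non-empty line is a header naming the attributes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None or len(header) < MIN_ATTRIBUTES:
        return False
    # csv.reader yields an empty list for blank lines; those are not counted.
    row_count = sum(1 for row in reader if row)
    return row_count >= MIN_ROWS
```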
- A set of rules and consistency guidelines for code submission was defined.
- Each student proposed a unique dataset through a dedicated online form, ensuring diversity and avoiding duplication across the cohort.
- Students implemented their Vue components according to the predefined rules and submitted their initial code.
- The professor performed a first review of all submissions, providing individual feedback.
- Students revised and corrected their code based on the professor’s comments and submitted the final version.
- The professor collected all the submitted works and performed manual quality checks, including:
- Normalization of directory structures;
- Removal of unnecessary or temporary files (e.g., node_modules, dist, editor configuration);
- Verification of code readability, indentation, and consistency;
- Ensuring presence of both accessible and non-accessible versions.
- The entire dataset was anonymized to remove any personally identifiable information or metadata.
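Although the checks above were performed manually, the removal of build artifacts can be approximated in code. The following is a minimal sketch, assuming each delivery is available as a list of relative file paths; the helper `keep_source_files` and the example paths are hypothetical, and only the two directory names mentioned in the text are filtered.

```python
from pathlib import PurePosixPath

# Directory names the text lists as build artifacts to exclude.
EXCLUDED_DIRS = {"node_modules", "dist"}


def keep_source_files(paths):
    """Return the paths that do not lie inside an excluded directory."""
    kept = []
    for raw in paths:
        # Inspect only the directory components, not the file name itself.
        directories = PurePosixPath(raw).parts[:-1]
        if EXCLUDED_DIRS.isdisjoint(directories):
            kept.append(raw)
    return kept
```

Editor configuration files are not handled here because the text does not name them.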
3.3. Implementation Diversity Within Table Structures
3.4. Selected Datasets
3.5. Usage and Application
- Source Code Analysis Only: Researchers may focus solely on the raw HTML, Vue, React, Angular, or PHP source code to test the ability of LLMs or static analyzers to detect accessibility issues without relying on runtime context. This is especially relevant for pre-deployment code review tools.
- Rendered Output Analysis: By executing the components and inspecting the resulting DOM, researchers can simulate what a user’s browser would process. This method is suitable for testing tools that analyze accessibility post-rendering (e.g., using Axe-core or Lighthouse).
- Combined Static and Dynamic Analysis: The most comprehensive use involves combining code-level insights with runtime behaviors, allowing for a comparison of LLM predictions and traditional tool outputs. This approach is particularly useful for validating whether LLMs can correctly infer the final structure and accessibility state of dynamic components.
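As an illustration of the rendered-output scenario, the sketch below scans an HTML string for a few structural cues that accessible data tables commonly carry. The choice of cues (a `caption` element, `th` header cells, and `scope` attributes) is an assumption made for this example; it is not the labeling criterion of the dataset, it is not a WCAG conformance check, and it does not replace tools such as Axe-core or Lighthouse.

```python
from html.parser import HTMLParser


class TableFeatureScanner(HTMLParser):
    """Count a few table-related elements and attributes in an HTML string."""

    def __init__(self):
        super().__init__()
        self.has_caption = False
        self.header_cells = 0
        self.scoped_header_cells = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "caption":
            self.has_caption = True
        elif tag == "th":
            self.header_cells += 1
            if any(name == "scope" for name, _ in attrs):
                self.scoped_header_cells += 1


def scan_table(html: str) -> dict:
    """Return the structural cues found in the given markup."""
    scanner = TableFeatureScanner()
    scanner.feed(html)
    scanner.close()
    return {
        "has_caption": scanner.has_caption,
        "header_cells": scanner.header_cells,
        "scoped_header_cells": scanner.scoped_header_cells,
    }
```

Such a summary could be stored next to an LLM's verdict for the same component, so that disagreements between the two are easy to inspect.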
4. User Notes
4.1. Compliance with FAIR Principles
- Findability: The dataset is uniquely identified by a Digital Object Identifier (DOI). This permanent reference acts like a digital fingerprint, ensuring it can always be found and is cited accurately.
- Accessibility: The dataset is hosted on Zenodo, a repository dedicated to open science that supports long-term preservation.
- Interoperability: The included examples are composed of plain text files. This makes the data highly flexible and easy to work with, as the source code can be analyzed directly with any standard text editor or programming tool.
- Reusability: The content is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. This is a very permissive license that allows the use, modification, and distribution of the dataset, provided that the original creators are credited.
4.2. Dataset Limitations
5. Conclusions and Future Work
- Extend the dataset with additional User Interface (UI) components such as modals, forms, and menus that pose more advanced accessibility challenges.
- Incorporate runtime accessibility audit results (e.g., from Axe-core, Lighthouse) as labels to enable supervised learning approaches.
- Develop benchmark tasks for evaluating the accessibility-awareness of LLMs through code completion, explanation, or repair prompts.
- Expand to additional frameworks such as Svelte and Nuxt.js, and potentially to server-rendered environments such as Django or ASP.NET.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
ML | Machine Learning |
LLM | Large Language Model |
SDGs | Sustainable Development Goals |
WCAG | Web Content Accessibility Guidelines |
HCI | Human–Computer Interaction |
DOI | Digital Object Identifier |
SFC | Single-File Component |
UI | User Interface |
References
- Ferri, D.; Favalli, S. Web Accessibility for People with Disabilities in the European Union: Paving the Road to Social Inclusion. Societies 2018, 8, 40.
- Bricout, J.; Baker, P.M.A.; Moon, N.W.; Sharma, B. Exploring the Smart Future of Participation: Community, Inclusivity, and People With Disabilities. Int. J. E-Plan. Res. 2021, 10, 94–108.
- Teixeira, P.; Eusébio, C.; Teixeira, L. Understanding the integration of accessibility requirements in the development process of information systems: A systematic literature review. Requir. Eng. 2024, 29, 143–176.
- Abou-Zahra, S.; Brewer, J. Standards, Guidelines, and Trends. In Web Accessibility; Springer: London, UK, 2019; pp. 225–246.
- WebAIM-Web Accessibility in Mind. WAVE Web Accessibility Evaluation Tools. 2025. Available online: https://wave.webaim.org/ (accessed on 6 August 2025).
- Deque Systems Inc. Axe: The Accessibility Engine. 2024. Available online: https://www.deque.com/axe/ (accessed on 6 August 2025).
- Chrome for Developers. Introduction to Lighthouse. 2025. Available online: https://developer.chrome.com/docs/lighthouse/overview?hl=en (accessed on 6 August 2025).
- Ara, J.; Sik-Lanyi, C. Automated evaluation of accessibility issues of webpage content: Tool and evaluation. Sci. Rep. 2025, 15, 9516.
- Alarcon, R.; Moreno, L.; Martinez, P. Lexical Simplification System to Improve Web Accessibility. IEEE Access 2021, 9, 58755–58767.
- Roumeliotis, K.I.; Tselikas, N.D. Evaluating Progressive Web App Accessibility for People with Disabilities. Network 2022, 2, 350–369.
- Seixas Pereira, L.; Duarte, C. Evaluating and monitoring digital accessibility: Practitioners’ perspectives on challenges and opportunities. Univers. Access Inf. Soc. 2025, 24, 2553–2571.
- Abou-Zahra, S.; Brewer, J.; Cooper, M. Artificial Intelligence (AI) for Web Accessibility: Is Conformance Evaluation a Way Forward? In Proceedings of the 15th International Web for All Conference, W4A ’18, Lyon, France, 23–25 April 2018; ACM: New York, NY, USA, 2018; pp. 1–4.
- Makati, T. Machine learning for accessible web navigation. In Proceedings of the 19th International Web for All Conference, W4A ’22, Lyon, France, 25–26 April 2022; ACM: New York, NY, USA, 2022; pp. 1–3.
- Ara, J.; Sik-Lanyi, C. Webpage Accessibility Evaluation Using Machine Learning Technique. In Proceedings of the 2023 14th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Budapest, Hungary, 22–23 September 2023; IEEE: New York, NY, USA, 2023; Volume 9, pp. 000069–000074.
- Pedemonte, G.; Leotta, M.; Ribaudo, M. Improving Web Accessibility With an LLM-Based Tool: A Preliminary Evaluation for STEM Images. IEEE Access 2025, 13, 107566–107582.
- Oswal, S.K.; Oswal, H.K. Conversational AI for Accessible Website Design: Integrating LLM Assistants in Website Builders. In New Frontiers for Inclusion; Springer Nature: Cham, Switzerland, 2025; pp. 251–261.
- Othman, A.; Dhouib, A.; Nasser Al Jabor, A. Fostering websites accessibility: A case study on the use of the Large Language Models ChatGPT for automatic remediation. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, PETRA ’23, Corfu, Greece, 5–7 July 2023; ACM: New York, NY, USA, 2023; pp. 707–713.
- Delnevo, G.; Andruccioli, M.; Mirri, S. On the Interaction with Large Language Models for Web Accessibility: Implications and Challenges. In Proceedings of the 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 6–9 January 2024; IEEE: New York, NY, USA, 2024; pp. 1–6.
- López-Gil, J.M.; Pereira, J. Turning manual web accessibility success criteria into automatic: An LLM-based approach. Univers. Access Inf. Soc. 2024, 24, 837–852.
- Andruccioli, M.; Bassi, B.; Delnevo, G.; Salomoni, P. Leveraging Large Language Models for Sustainable and Inclusive Web Accessibility. Preprints 2025.
- Amazon Music Reviews Dataset. Available online: https://www.kaggle.com/datasets/eswarchandt/amazon-music-reviews (accessed on 15 May 2025).
- NASA. Near-Earth Asteroids and Comets. Available online: https://data.nasa.gov/resource/2vr3-k9wn.json (accessed on 6 August 2025).
- NVIDIA Stock Data. Available online: https://www.kaggle.com/datasets/muhammaddawood42/nvidia-stock-data (accessed on 15 May 2025).
- Real World Smartphones Dataset. Available online: https://www.kaggle.com/datasets/abhijitdahatonde/real-world-smartphones-dataset (accessed on 15 May 2025).
- Used Car Dataset. Available online: https://www.kaggle.com/datasets/mohitkumar282/used-car-dataset?select=used_car_dataset.csv (accessed on 15 May 2025).
- FPS in Video Games Dataset. Available online: https://www.kaggle.com/datasets/kritikseth/achieved-frames-per-second-fps-in-video-games (accessed on 15 May 2025).
- Pokémon Dataset. Available online: https://www.kaggle.com/datasets/jaidalmotra/pokemon-dataset (accessed on 15 May 2025).
- Footballers with 50+ International Goals. Available online: https://www.kaggle.com/datasets/whisperingkahuna/footballers-with-50-international-goals-men (accessed on 15 May 2025).
- EPL Dataset 2022-2023. Available online: https://www.kaggle.com/datasets/acothaha/epl-dataset-20222023-update-every-week (accessed on 15 May 2025).
- Global Health Statistics. Available online: https://www.kaggle.com/datasets/malaiarasugraj/global-health-statistics (accessed on 15 May 2025).
- 2023 World Population by Country. Available online: https://www.kaggle.com/datasets/rajkumarpandey02/2023-world-population-by-country?resource=download&select=countries-table.json (accessed on 15 May 2025).
- Air Quality and Pollution Assessment. Available online: https://www.kaggle.com/datasets/mujtabamatin/air-quality-and-pollution-assessment (accessed on 15 May 2025).
- Crime Rate by Country 2024. Available online: https://www.kaggle.com/datasets/shahriarkabir/crime-rate-by-country-2024 (accessed on 15 May 2025).
- NBA Players Data. Available online: https://www.kaggle.com/datasets/justinas/nba-players-data (accessed on 15 May 2025).
- Fortnite Players Stats. Available online: https://www.kaggle.com/datasets/iyadali/fortnite-players-stats?select=Fortnite_players_stats.csv (accessed on 15 May 2025).
- League of Legends Master Players. Available online: https://www.kaggle.com/datasets/jasperan/league-of-legends-master-players (accessed on 15 May 2025).
- Top 100 Most Streamed Songs on Spotify. Available online: https://www.kaggle.com/datasets/pavan9065/top-100-most-streamed-songs-on-spotify/data (accessed on 15 May 2025).
- The Office Dataset. Available online: https://www.kaggle.com/datasets/nehaprabhavalkar/the-office-dataset (accessed on 15 May 2025).
- Recipes3k Dataset. Available online: https://www.kaggle.com/datasets/crispen5gar/recipes3k (accessed on 15 May 2025).
- NASA Meteorite Landings (Y77D-TH95). Available online: https://data.nasa.gov/resource/y77d-th95.json (accessed on 15 May 2025).
- Google Play Store Dataset. Available online: https://www.kaggle.com/datasets/arnikaer/googleplaystore (accessed on 15 May 2025).
- Erasmus Mobility Statistics (2014–2019). Available online: https://www.kaggle.com/datasets/donjoeml/erasmus-mobility-statistics-2014-2019 (accessed on 15 May 2025).
- Bank Marketing Dataset. Available online: https://www.kaggle.com/datasets/mahdiehhajian/bank-marketing (accessed on 15 May 2025).
- Steam Game Recommendations. Available online: https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam (accessed on 15 May 2025).
- Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
Split Approach | Deliveries |
---|---|
Single file | 3, 4, 10, 13, 17, 18, 19, 23 |
Body-only | 1, 8, 12 |
Header and body | 2, 5, 6, 7, 9, 11, 14, 15, 16, 20, 21, 22, 24, 25 |
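For analyses that group results by split approach, the table above can be inverted into a per-delivery lookup. This is a small sketch that copies the values from the table; the names `SPLIT_APPROACH` and `approach_by_delivery` are illustrative and do not appear in the dataset.

```python
# Values copied from the table above: split approach -> delivery numbers.
SPLIT_APPROACH = {
    "Single file": [3, 4, 10, 13, 17, 18, 19, 23],
    "Body-only": [1, 8, 12],
    "Header and body": [2, 5, 6, 7, 9, 11, 14, 15, 16, 20, 21, 22, 24, 25],
}


def approach_by_delivery(table):
    """Invert the table so that each delivery number maps to its approach."""
    lookup = {}
    for approach, deliveries in table.items():
        for number in deliveries:
            if number in lookup:
                raise ValueError(f"delivery {number} is listed twice")
            lookup[number] = approach
    return lookup
```

Inverting the table also confirms that the three groups cover deliveries 1 to 25 without overlap.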
Delivery | Name | Reference |
---|---|---|
delivery-01 | Cinema anomalies | N/A |
delivery-02 | Amazon Musical Instruments Reviews | [21] |
delivery-03 | NASA - Near-Earth Asteroids and Comets | [22] |
delivery-04 | NVIDIA-STOCK-DATA | [23] |
delivery-05 | Real World Smartphone’s | [24] |
delivery-06 | Used Car | [25] |
delivery-07 | Achieved Frames per Second (FPS) in Video Games | [26] |
delivery-08 | Pokemon | [27] |
delivery-09 | Footballers with 50+ International Goals [men] | [28] |
delivery-10 | EPL Dataset 2022/2023 | [29] |
delivery-11 | Global Health Statistics | [30] |
delivery-12 | World Population by Country | [31] |
delivery-13 | Air Quality and Pollution Assessment | [32] |
delivery-14 | Crime Rate by Country 2024 | [33] |
delivery-15 | NBA Players | [34] |
delivery-16 | Fortnite Players Stats | [35] |
delivery-17 | League of Legends Master+ Players | [36] |
delivery-18 | Top 100 Most Streamed Songs on Spotify | [37] |
delivery-19 | The Office Dataset | [38] |
delivery-20 | Food Recipes | [39] |
delivery-21 | NASA - Earth Meteorite Landings | [40] |
delivery-22 | Google Playstore | [41] |
delivery-23 | Erasmus mobility statistics 2014_2019 | [42] |
delivery-24 | Bank marketing | [43] |
delivery-25 | Game Recommendations on Steam | [44] |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Andruccioli, M.; Bassi, B.; Delnevo, G.; Salomoni, P. The Tabular Accessibility Dataset: A Benchmark for LLM-Based Web Accessibility Auditing. Data 2025, 10, 149. https://doi.org/10.3390/data10090149