1. Introduction
Technology evolves rapidly, but the basic principles of human–computer interaction stay consistent. Software that neglects fundamental usability concerns, even when technically sophisticated, risks frustrating users and ultimately losing market share. When developers incorporate usability principles into their designs, they are better equipped to create systems that align naturally with users’ thought processes and workflows, resulting in products that are both more appealing and more effective. Despite usability’s well-known importance, a considerable gap exists in the research [1]. While previous studies have examined usability evaluation from user perspectives and reviews [2,3], and recent advances have introduced AI-powered tools for enhancing UX evaluation methodologies [4], the developer community’s engagement with usability topics, the evolution of these discussions, and their prominence relative to other software development concerns remain unexplored. Recent work has highlighted ongoing challenges in developer usability practices [1] and collaboration issues [5], yet a systematic analysis of how developers discuss and prioritize usability within their communities is still lacking.
In this paper, our goal is to fill this gap by exploring usability discussions on Stack Overflow, one of the largest online Q&A platforms where developers find answers to their technical questions [6]. While developers engage across multiple platforms, including GitHub for collaborative development [7], Reddit for conversational discussions [8], and Discord for real-time community interaction [9], Stack Overflow remains the predominant platform for technical Q&A and problem-solving discussions [10]. Research demonstrates that Stack Overflow serves as the primary platform for developers seeking answers about programming languages, libraries, and technical implementation challenges [11], making it the most representative platform for analyzing developer engagement with usability topics. Unlike GitHub’s code-centric discussions or Reddit’s general programming discourse, Stack Overflow’s structured Q&A format and comprehensive tagging system provide optimal conditions for the systematic analysis of specific technical concerns like usability [12].
Analyses of Stack Overflow have provided valuable insights into how different software quality concerns are discussed and prioritized [13]. We extracted 894 Stack Overflow posts tagged with usability, examining the temporal changes in discussion volume and engagement and comparing these patterns to those of other non-functional requirements (NFRs). We also investigated how usability challenges occur across different programming languages and platforms. Specifically, we address the following research questions:
RQ1: How have usability discussions on Stack Overflow changed over the years?
This question examines the evolution of usability posts on Stack Overflow from 2008 to 2024. The goal is to identify patterns in engagement, including the post frequency, view counts, and acceptance rates, to understand how developers’ interactions with usability topics have changed over time. We also consider possible external factors that may influence these patterns, such as the introduction of new technologies and the impact of other platforms.
RQ2: To what extent do engagement patterns differ between usability discussions and other non-functional requirements on Stack Overflow?
In this research question, we examine the impact and role of usability among the non-functional requirements discussed on Stack Overflow. We used engagement metrics and thematic analysis to compare usability with other NFRs from ISO/IEC 25010 to understand its priority among developers’ design considerations and to identify possible trade-offs with other quality attributes and their potential effects on user experience.
RQ3: What usability issues are most frequently discussed in relation to specific programming languages and platforms on Stack Overflow?
This research question aims to identify the most frequently mentioned usability problems associated with various platforms (e.g., web, mobile, desktop) and specific programming languages (e.g., Java, Python, JavaScript) on Stack Overflow. We also examine which languages are most commonly associated with usability issues and what types of challenges appear to be unique to specific contexts, thereby providing a better understanding of practical usability difficulties in different development environments.
The main contributions of this study are summarized as follows:
We perform the first comprehensive longitudinal analysis (2008–2024) of usability discussions on Stack Overflow, detailing their evolution over time.
We provide novel insights into how usability engagement and prioritization compare to other key non-functional requirements within developer discourse.
We apply a mixed-methods approach, yielding quantitative trends in usability discussions and qualitative insights into common usability issues associated with specific programming languages and platforms.
Our findings offer actionable implications for software developers, UX/UI designers, platform communities, and educators seeking to better integrate usability considerations throughout the development lifecycle.
This paper is structured as follows: Section 2 provides background on usability and NFRs in the context of Stack Overflow. Section 3 reviews the related work on mining discussions in Stack Overflow and empirical studies on usability. Section 4 describes the research methodology, including data collection and analysis techniques. Section 5 presents the study results, detailing the findings for each research question. Section 6 discusses the key takeaways and their broader implications. Section 7 addresses potential threats to validity. Finally, Section 8 summarizes the main findings and concludes the paper.
4. Study Design
In this study, we used a mixed-methods research methodology that combines quantitative analysis of engagement metrics with qualitative content analysis of posts, as described by Tashakkori et al. [64]. Figure 1 shows an overview of our study’s methodology, which consists of three main steps: (1) extracting the Stack Overflow dataset, (2) cleaning and manually labeling the dataset, and (3) validating the coding results and addressing each research question. The statistical significance of temporal trends and group differences was assessed using non-parametric tests (Spearman correlation, Mann–Whitney U test) and effect sizes (Cohen’s d) to account for non-normal distributions in engagement metrics [65].
4.1. Data Collection
We searched Stack Overflow to collect all posts tagged with usability and, to answer research question 2, all posts tagged with the other seven non-functional requirements defined in ISO/IEC 25010: accessibility, performance, compatibility, reliability, security, maintainability, and portability. We selected Stack Overflow as our primary data source based on several methodological and practical considerations. Unlike conversational platforms such as Reddit or Discord, Stack Overflow’s structured question–answer format enables the systematic extraction and analysis of specific technical discussions [66], facilitating reliable identification of usability-related content and engagement metrics. Stack Overflow’s extensive tagging system allows precise identification of usability discussions through validated tags, providing higher accuracy than the keyword-based approaches required on unstructured platforms [67]. With over 22 million questions spanning 16 years (2008–2024), Stack Overflow provides the largest longitudinal dataset of developer technical discussions available [9]. In contrast, alternative platforms either lack sufficient historical data or focus on different interaction types, such as GitHub’s code-centric discussions [7]. The 2024 Stack Overflow Developer Survey demonstrates broad representation across experience levels, with over 65,000 responses from developers worldwide [9]. Extensive academic research validates Stack Overflow as a reliable source for understanding developer behavior and technical challenges [10,12].
We used the Stack Exchange Data Explorer tool (https://data.stackexchange.com/stackoverflow/query/edit/1914805, accessed on 17 July 2025) to query posts from Stack Overflow that contained these tags, disregarding the creation date of the posts. In our queries, we used SQL wildcard pattern matching with the LIKE operator and the pattern “Usabilit%” to ensure that we collected all relevant posts. This allowed us to gather all variations of usability-related tags, such as “usability”, “usability-testing”, and other potential derivatives, in a single query. However, this method also collected some false-positive tags, such as “reusability” and “focusability”, which we manually removed from our dataset to maintain focus on genuine usability discussions. Stack Overflow uses a supervised tagging system in which tags are standardized and limited to specific options; users must select from existing tags rather than creating arbitrary variations such as “usabilitytesting” or “usability-testings”. This standardization made our data collection more consistent. At the same time, manual verification was necessary to exclude irrelevant tags that shared the same word root.
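To make the collection step concrete, the sketch below shows the shape of a Stack Exchange Data Explorer-style query embedded in a Python string. It is an illustrative reconstruction, not the exact query behind the link above: table and column names follow the public SEDE schema, and the wildcard shown here matches any tag containing the word root, which reproduces the false positives described in the text.

```python
# Illustrative reconstruction of the tag query (not the authors' exact SQL).
# Table and column names follow the public Stack Exchange Data Explorer schema.
SEDE_QUERY = """
SELECT p.Id, p.CreationDate, p.Title, p.Tags,
       p.ViewCount, p.Score, p.AnswerCount, p.AcceptedAnswerId
FROM Posts p
WHERE p.PostTypeId = 1            -- questions only
  AND p.Tags LIKE '%usabilit%'    -- catches usability, usability-testing, ...
"""

# The wildcard also matches unrelated word roots; such tags were removed
# manually, as described above.
FALSE_POSITIVE_TAGS = {"reusability", "focusability"}
```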
We observed that some tags, such as “usability-evaluation”, “heuristic-evaluation”, and “interface-evaluation”, did not have any posts. We also removed posts with the tags “reusability” and “focusability” that were collected in the “usability” query.
Table 3 presents the collected posts from September 2008, when Stack Overflow was founded, through December 2024, the end of our collection window.
4.2. Data Preprocessing
Following our initial data collection, we needed to verify the relevance of the collected posts, since keyword-based searching alone is insufficient for identifying genuine usability discussions. The textual nature of usability posts adds a further layer of difficulty arising from semantic ambiguity. To illustrate this problem, consider the examples in Box 1 and Box 2.
In the first example, the user is seeking a library or framework that can effectively explain the interface to new users. This post raises the issue of presenting information in a user-friendly way: the question is related to usability, but the term “usability” is never used. In Box 2, on the other hand, the post contains the keyword “usability”, but it is not related to usability in the context of software design; it is instead a coding question about the performance of different ways of storing matrices when applying transformations. This shows that relying entirely on keywords to identify usability-related posts can be misleading, and it is a challenging task due to the inherent ambiguity of natural language. More sophisticated approaches that go beyond keyword-based analysis are therefore required: a post that addresses usability issues may not contain the keyword “usability,” while a post that contains the keyword may be unrelated to usability in the software design sense.
Two researchers with complementary expertise conducted the preprocessing. Both inspectors individually analyzed the 894 usability posts to identify and eliminate false positives and posts that mentioned usability only superficially. They read through each post to determine whether it was genuinely related to software usability and contained substantive discussion, considering the post’s context, the language used, and specific references to usability concepts. Through discussion and consensus, they identified and eliminated 222 posts, including those that were incorrectly labeled as usability (such as posts about code “usability”, meaning ease of programming) and those that contained only brief or tangential references to usability without meaningful content for analysis. This brought our final dataset to 672 usability posts.
4.3. Data Labeling
After we completed preprocessing, we labeled 672 usability posts based on different aspects needed for our analysis. We extracted temporal information, including creation dates and engagement metrics such as view counts, scores, answer rates, and accepted answer percentages. We classified each post by platform type, categorizing them as web development, desktop applications, mobile development, or platform-agnostic discussions. We identified the programming languages mentioned in each post, including JavaScript, HTML, CSS, C#, Java, Python, and others. We also categorized the types of usability issues discussed, such as layout and responsiveness problems, form design challenges, navigation difficulties, and performance perception concerns.
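As a concrete picture of the labeling dimensions just described, the following sketch defines one possible record structure for a labeled post; all field names and the example values are ours, not taken from the study’s dataset.

```python
# Hypothetical record structure for one labeled post; field names are ours.
from dataclasses import dataclass, field

@dataclass
class LabeledPost:
    post_id: int
    created: str                 # ISO date, e.g., "2010-06-02"
    view_count: int
    score: int
    answer_count: int
    has_accepted_answer: bool
    platform: str                # "web" | "desktop" | "mobile" | "agnostic"
    languages: list[str] = field(default_factory=list)  # e.g., ["javascript", "css"]
    issue_type: str = ""         # e.g., "form-design", "navigation"

# A toy example, not a real post from the dataset:
example = LabeledPost(
    post_id=123456, created="2010-06-02", view_count=742, score=2,
    answer_count=3, has_accepted_answer=True, platform="web",
    languages=["javascript", "html"], issue_type="form-design",
)
```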
Additional Validation. To ensure the validity and reliability of our preprocessing and labeling process, we implemented validation procedures. Following Aljedaani et al. [68,69], we randomly selected 9% of our dataset (81 out of 894 posts) for analysis by an auxiliary expert, a Master’s student in Computer Science with more than one year of experience in software engineering and HCI research who is thoroughly familiar with usability and accessibility evaluation methodologies. The auxiliary expert independently reviewed and labeled the selected posts to validate the consistency of our classification. This sample size corresponds to a 95% confidence level with a 6% margin of error. We calculated inter-rater agreement using Cohen’s Kappa coefficient [70], achieving 0.87, which indicates almost perfect agreement [71]. We also cross-verified our findings by comparing post metadata with textual content to ensure consistency and accuracy across our labeling dimensions.
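For reference, the agreement check can be reproduced in a few lines of Python; the sketch below assumes the primary and auxiliary labels for the 81 sampled posts are available as two parallel lists (the toy lists here are ours).

```python
# Minimal sketch of the inter-rater agreement computation.
from sklearn.metrics import cohen_kappa_score

primary_labels   = ["web", "desktop", "web", "mobile"]     # toy data
auxiliary_labels = ["web", "desktop", "mobile", "mobile"]  # toy data

kappa = cohen_kappa_score(primary_labels, auxiliary_labels)
print(f"Cohen's kappa = {kappa:.2f}")  # the study reports 0.87 on its 81-post sample
```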
5. Study Results
This section presents the findings of our analysis, organized by research question.
5.1. RQ1: How Have Usability Discussions on Stack Overflow Changed over the Years?
We analyzed the 672 usability posts from 2008 to 2024 to understand their temporal patterns, examining creation dates, post frequencies, and engagement metrics, including view counts, scores, and accepted answer rates, to trace changes in community interaction. We also related the number of usability posts to overall activity on Stack Overflow to contextualize our findings.
Our examination of usability-related posts on Stack Overflow, from its inception in 2008 to 2024, reveals that discussions among developers regarding usability have undergone significant evolution.
Figure 2 shows that usability discussions peaked in 2010 with 189 posts, preceded by substantial activity in 2009 (175 posts) and followed by 131 posts in 2011. From 2012 onward, usability discussions experienced a pronounced decline, decreasing to 91 posts in 2012, 54 posts in 2013, and continuing to decline steadily. By 2014, the number of posts per year had fallen below 40, and recent years have shown low activity, with only 5 posts in 2023 and 1 post in 2024.
To determine whether this decline is domain-specific or reflects general Stack Overflow activity patterns, we examined usability posts in relation to overall platform posts. As shown in Figure 2, Stack Overflow’s total post volume grew from 57.2 K posts in 2008 to a peak of approximately 2.2 M posts during 2015–2016. Meanwhile, the decline in usability posts began in 2012, while overall platform activity continued to increase. This difference in timing and the uneven rate of decline indicate that usability discussions were influenced by factors beyond general platform trends.
We also analyzed engagement metrics for usability posts across this period.
Table 4 presents the average view count, score, and answer acceptance rates for usability posts by year. The average view count for usability posts declined by approximately 87% from 2008 to 2024. The average score, representing the community’s perceived value of a post, decreased from 6.4 in 2008 to 0.7 in 2024. The accepted answer rate also shows a general decline from the high rates observed in the initial years (consistently above 80% in 2008–2009), becoming much more variable and often lower in the latter half of the study period. (Note: data from 2022–2024 should be interpreted with caution due to the small number of posts in these years, n < 15.)
Content analysis shows a change in the focus of usability discussions. Early posts (2008–2012) frequently addressed fundamental interface design principles, while later posts tended to focus more on specific implementation challenges. We also observed an increasing overlap with accessibility discussions over time, with more recent usability posts often addressing both concerns simultaneously.
Statistical analysis confirms that these temporal trends are significant. Spearman correlation analysis reveals significant negative correlations between year and engagement metrics: view count (ρ = −0.356, p < 0.001) and score (ρ = −0.410, p < 0.001). To quantify the magnitude of change, we compared the early period (2008–2011, n = 563) with the recent period (2020–2024, n = 42) using the Mann–Whitney U test. Early-period posts exhibited significantly higher engagement across all metrics: the median view count declined from 742 to 166 (U = 5447, p < 0.001), and the median score declined from 2.0 to 0.0 (U = 5828, p < 0.001). Effect size analysis using Cohen’s d indicated small but meaningful effects (d = 0.29 for view count, d = 0.42 for score), confirming the practical significance of the observed decline. Additionally, chi-square analysis revealed a significant decrease in accepted answer rates, from 79.4% in the early period to 44.2% in the recent period (χ² = 27.88, df = 1, p < 0.001).
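The tests reported above can be reproduced along the following lines. This is a sketch under the assumption that the 672 labeled posts sit in a pandas DataFrame with year, view_count, score, and accepted columns; the column and file names are ours, not the study’s.

```python
# Sketch of the RQ1 statistics; assumes a hypothetical CSV export of the
# 672 labeled posts with columns: year, view_count, score, accepted.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, mannwhitneyu, chi2_contingency

df = pd.read_csv("usability_posts.csv")

# Temporal trend: Spearman rank correlation between year and engagement.
rho_views, p_views = spearmanr(df["year"], df["view_count"])
rho_score, p_score = spearmanr(df["year"], df["score"])

# Early vs. recent periods: Mann-Whitney U test on view counts.
early  = df[df["year"].between(2008, 2011)]
recent = df[df["year"].between(2020, 2024)]
u_stat, p_u = mannwhitneyu(early["view_count"], recent["view_count"])

def cohens_d(a, b):
    """Effect size between two samples using the pooled standard deviation."""
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

d_views = cohens_d(early["view_count"], recent["view_count"])

# Accepted-answer rates: chi-square on a 2x2 period-by-acceptance table.
sub = pd.concat([early.assign(period="early"), recent.assign(period="recent")])
chi2, p_chi, dof, _ = chi2_contingency(pd.crosstab(sub["period"], sub["accepted"]))
```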
To address the potential bias that newer posts have had less time to accumulate views and engagement, we structured our temporal analysis to compare periods rather than individual years, ensuring sufficient accumulation time across all comparison groups. The early period (2008–2011) and recent period (2020–2024) both represent posts with at least 4–5 years of potential engagement accumulation as of our data collection in January 2025. Furthermore, the consistent decline pattern observed across the intermediate years (2012–2019) confirms this is not merely an artifact of differential accumulation time but reflects a genuine shift in how developers engage with usability discussions on Stack Overflow.
5.2. RQ2: To What Extent Do Engagement Patterns Differ Between Usability Discussions and Other Non-Functional Requirements on Stack Overflow?
To investigate how usability discussions differ from other non-functional requirements (NFRs), we collected and compared questions tagged usability, accessibility, performance, security, reliability, compatibility, maintainability, and portability. We focused on multiple engagement metrics: total post volume, mean question score, fraction of questions answered, fraction of questions with an accepted answer, average number of answers, mean view count, comment count, and time until the first response. For this research question, we used the unfiltered usability set to ensure identical inclusion criteria across all NFR categories, which were likewise unfiltered; using the filtered usability subset would change the denominator only for usability and yield non-comparable base rates.
Table 5 summarizes the basic engagement metrics across the eight NFR tags.
The data reveal several notable patterns. Performance (102,821 posts) and security (56,443 posts) dominate the discussion space, while usability (894 posts) and reliability (294 posts) have far fewer posts. Usability posts show the highest answer rate (94.5%), the highest acceptance rate (68.6%), and the highest average answer count (3.3). Portability questions receive the highest average scores (8.27) and the highest average view counts (6263).
We further analyzed the time to first answer for questions in each category, as shown in Table 6.
These data show wide variation in community response times across topics. Questions related to maintainability receive the fastest answers on average (8.1 days), while questions related to reliability take the longest (48.3 days).
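A sketch of how such per-tag engagement metrics can be aggregated is given below, assuming one row per question across all eight NFR tags; the column and file names are ours, not the study’s.

```python
# Sketch of the RQ2 per-tag aggregation; file and column names are hypothetical.
import pandas as pd

posts = pd.read_csv("nfr_posts.csv", parse_dates=["created", "first_answer_at"])

metrics = posts.groupby("nfr_tag").agg(
    post_volume=("post_id", "count"),
    mean_score=("score", "mean"),
    answered_rate=("answer_count", lambda s: (s > 0).mean()),
    acceptance_rate=("has_accepted", "mean"),
    mean_answers=("answer_count", "mean"),
    mean_views=("view_count", "mean"),
)

# Time to first answer, in days, averaged per tag (unanswered posts stay NaN).
posts["days_to_first_answer"] = (
    (posts["first_answer_at"] - posts["created"]).dt.total_seconds() / 86400
)
metrics["mean_days_to_first_answer"] = (
    posts.groupby("nfr_tag")["days_to_first_answer"].mean()
)
```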
5.3. RQ3: What Usability Issues Are Most Frequently Discussed in Relation to Specific Programming Languages and Platforms on Stack Overflow?
To identify the most frequently discussed usability issues related to specific programming languages and platforms, we conducted a detailed analysis of the usability posts in our dataset. We first categorized each post based on the programming languages and platforms mentioned. We then analyzed the content of these posts to identify common usability challenges within each programming context. This analysis helps us understand how usability concerns vary across different development environments and technologies.
We begin by presenting a comprehensive taxonomy of usability issue types identified across all posts. Table 7 shows the distribution of primary usability issue categories in our dataset.
Interactive elements and controls dominate usability discussions, representing nearly one-quarter of all posts (24.0%). This category encompasses questions about custom widgets, dropdown menus, button design, and modal dialogs. Navigation and information architecture (12.8%) addresses how users find and access content, while form design and input (11.2%) and error handling and validation (9.7%) together account for over one-fifth of discussions. Accessibility and inclusion (7.9%) demonstrates a growing awareness of inclusive design needs. Mobile and touch interactions constitute 3.1% of posts.
Our analysis revealed significant variations in the frequency and types of usability issues discussed across different programming languages and platforms. Table 8 presents the distribution of programming languages mentioned in usability posts. JavaScript emerged as the most frequently mentioned programming language in usability discussions, with 166 posts (24.7% of all usability posts). This was closely followed by HTML with 160 posts (23.8%) and CSS with 88 posts (13.1%). Other commonly mentioned languages included C# (48 posts, 7.1%), PHP (37 posts, 5.5%), Java (29 posts, 4.3%), and Python (24 posts, 3.6%).
When examining the platforms discussed in usability posts, we found that web development dominated the discussions, as shown in Table 9. Web platforms accounted for 303 posts (45.1% of all usability posts), followed by desktop applications with 148 posts (22.0%) and mobile platforms with 83 posts (12.4%). Interestingly, 138 posts (20.5%) were platform-agnostic, discussing general usability principles applicable across multiple environments.
Our analysis also identified distinct patterns in the types of usability issues discussed for different programming languages and platforms:
5.3.1. Web Development
Among the 303 web platform posts, layout and responsiveness issues featured prominently. These posts focused on creating adaptable interfaces across various screen sizes and devices. Common questions included handling complex layouts, implementing responsive design patterns, and solving CSS layout challenges. One representative example asked: “How to maintain usability of complex data tables on mobile screens without horizontal scrolling?”
Form design and validation represented another major theme in web usability discussions. These focused on creating intuitive forms, providing appropriate feedback during validation, and striking a balance between security and usability in form submissions. An illustrative post asked: “What’s the most user-friendly way to indicate required fields in a complex form without cluttering the interface?”
Navigation and information architecture also emerged as a significant concern, addressing menu design, navigation patterns, and organizing content to facilitate intuitive user journeys. A typical post asked: “Best practices for implementing breadcrumb navigation for deep website hierarchies?”
Performance perception discussions featured regularly in web usability posts. These questions focused on loading indicators, perceived performance, and techniques to make interfaces feel responsive even when processing is occurring. An example post asked: “How to implement skeleton screens in React to improve perceived loading performance?”
Accessibility and usability integration comprised another important area of web usability posts. These discussions explored the intersection of accessibility requirements and usability best practices, particularly in complex interactive components. An example post asked about “Making custom JavaScript dropdown components both accessible and usable for all users.”
5.3.2. Desktop Applications
Among the 148 desktop platform posts, controls and widgets dominated the discussions. These questions addressed designing and implementing custom controls, modifying standard controls for better usability, and creating consistent control behaviors. A representative post asked about “Creating a more usable multi-selection tree view in WPF.”
Window management and layout represented another major theme in desktop usability posts. These discussions explored the differences between MDI (Multiple Document Interface) and SDI (Single Document Interface), managing multiple windows, and organizing complex interfaces. An example post asked about the “Best approach for organizing multiple document editing in a Java Swing application.”
Keyboard navigation and shortcuts also played a prominent role in desktop usability discussions. These focused on implementing intuitive keyboard shortcuts, handling key conflicts, and designing for power users. A typical post asked about “Implementing customizable keyboard shortcuts in a C# desktop application.”
Application feedback and status discussions comprised another important area of desktop usability posts. These questions addressed progress indication, status bars, and communicating system state to users. An example post asked about “How to effectively communicate background task progress in a WinForms application.”
Installation and configuration usability represented an additional concern in desktop usability posts. These addressed first-run experiences, configuration interfaces, and simplified setup procedures. A typical post asked about “Designing a user-friendly configuration interface for a complex Java application.”
5.3.3. Mobile Development
Among the 83 mobile platform posts, several key themes emerged. Touch target size and spacing dominated these discussions, with questions addressing appropriate button sizes, handling small targets, and designing for different finger sizes. A representative post asked about “Best practices for touch target sizes in a space-constrained Android interface.”
Gesture recognition and feedback represented another significant area of discussion, centered on implementing intuitive gestures, providing appropriate feedback for gesture interactions, and handling gesture conflicts. An example post asked, “How to indicate the availability of swipe gestures in a mobile app?”
On-screen keyboards and input methods also featured prominently, addressing challenges with text input, keyboard appearances, and optimizing forms for mobile input. A typical post inquired about “Optimizing form input fields to work well with iOS predictive keyboard.”
Navigation patterns for small screens comprised another major theme, with questions focused on implementing hamburger menus, tab bars, and other navigation paradigms effectively. An example post asked about “Bottom tab bar vs hamburger menu for a content-heavy Android application.”
Performance and battery considerations rounded out the mobile discussions, exploring how to balance interactive features with performance and battery life. A representative post asked about “Implementing smooth animations in React Native without impacting battery life.”
The evolution of usability discussions within specific programming contexts also reveals shifting priorities over time. Early discussions (2008–2012) often focused on fundamental implementation challenges, such as creating responsive layouts and custom controls. More recent discussions increasingly address higher-level usability concerns, including information architecture, interaction design patterns, and the integration of usability with other quality attributes, like accessibility and performance.
Additionally, we observed that certain programming languages showed stronger associations with particular usability concerns. JavaScript discussions frequently addressed dynamic interface updates and interactive behaviors, while C# posts often focused on enterprise application usability patterns and complex form design. The significant presence of posts that mention multiple languages (particularly JavaScript/HTML/CSS combinations) highlights the interconnected nature of modern development and the need for usability solutions that work across technology stacks.
6. Discussion
This section interprets our findings in the context of the existing literature, explains the theoretical and practical significance of our results, and discusses the broader implications for software development practice and research.
6.1. Decline and Transformation of Usability Discussions
The 87% decline in average view counts and 89% decline in average scores from 2008 to 2024 suggest that the broader developer community has become far less inclined to seek out or engage with usability discussions on this platform. Statistical analysis confirms that these patterns are significant (Spearman correlation, p < 0.001; Mann–Whitney U tests, p < 0.001) and not due to random variation. Our period-based temporal comparison methodology addresses potential accumulation bias, as detailed in the Results section. The decrease in scores indicates that newer usability questions are either considered less valuable or attract less positive interaction from users. The declining and increasingly variable accepted answer rates may suggest that more recent usability questions are harder to resolve to the asker’s satisfaction or that the expertise needed to answer them is less active on the platform.
This transformation in usability discussions can be attributed to several factors that mirror the broader evolution of software development. The early peak period (2009–2011) overlaps with Stack Overflow’s initial growth phase and the widespread adoption of web technologies, suggesting that fundamental usability questions may have been comprehensively addressed during this formative period.
The subsequent decline may reflect the maturation of development tools and frameworks that have increasingly integrated usability considerations into their core functionality. Modern UI frameworks and design systems have standardized many interface patterns that previously required explicit discussion, potentially reducing the need for developers to seek guidance on basic usability implementation [72,73]. Additionally, other Q&A communities, such as Reddit, and specialized communities focused on user experience, such as the User Experience Stack Exchange, may have provided alternative venues for usability discussions, leading to a migration of expertise and questions away from Stack Overflow.
The dramatic decline in usability discussions on Stack Overflow coincides with the rise of specialized UX communities, suggesting a redistribution rather than a disappearance of usability knowledge exchange. This pattern reflects broader specialization trends in software development, where cross-cutting concerns increasingly find dedicated platforms [32]. As design systems and component libraries have standardized common patterns, the focus has shifted from basic implementation questions to more nuanced, context-specific challenges.
The temporal pattern also suggests a transformation in how usability knowledge is acquired and applied within development workflows. The integration of usability principles into development education, the availability of comprehensive design resources, and the establishment of organizational design systems may have reduced reliance on ad hoc community support for usability challenges. This evolution reflects the increasing professionalization and specialization within the broader field of software development. This transformation signals a maturing field where fundamental knowledge has been codified and incorporated into development frameworks. The integration of usability considerations into standard development practices could indicate that certain fundamental principles have become established knowledge that does not require frequent discussion. Educators and practitioners should recognize that usability knowledge now flows through multiple channels: formal design systems, specialized communities such as UX Stack Exchange, and integrated documentation, rather than primarily through general programming forums.
The arrival of Large Language Models (LLMs) such as OpenAI ChatGPT, which became publicly available in late 2022, represents a technological development that may influence future patterns of developer knowledge-seeking behavior [74]. However, the temporal characteristics of our data provide important context for interpreting this potential influence. The pronounced decline in usability discussions began in 2012 and continued consistently through 2022, preceding the widespread availability of conversational AI systems by a full decade. Therefore, while LLMs may affect future trends in how developers seek usability guidance, they cannot account for the historical decline patterns observed in our analysis. The potential impact of LLMs on developer communities and traditional Q&A platforms needs investigation in future studies that extend beyond our analysis period.
6.2. Usability as a Specialized Knowledge Domain
These findings reveal a complex picture of usability discussions on Stack Overflow. While usability is a niche topic in terms of question volume, it stands out for its high resolution rates. The high answer and acceptance percentages suggest that when usability questions are asked, the community is exceptionally effective at providing satisfactory answers. This contrasts with other niche topics, such as portability, which, despite being even less frequent, garner the highest community value in terms of scores and views. This indicates that different types of specialized questions are valued in different ways by the community.
The varying response times across NFRs also provide insight into the platform’s knowledge landscape. The long delays in answering reliability questions suggest a scarcity of active experts in that domain, whereas the quicker responses for topics like usability and maintainability point to a more responsive and available community of specialists. These patterns suggest that usability discussions on Stack Overflow represent a highly efficient, albeit small, corner of the platform where specialized questions are resolved effectively.
While the volume of usability questions is low compared to areas like performance, the community’s engagement with them is uniquely effective. This specialized community exhibits distinctive engagement patterns: usability posts receive the highest answer rates, the highest acceptance rates, and attract more answers per question than other non-functional requirements. This suggests that when usability questions are asked, the community is highly efficient at resolving them. This aligns with findings from Ahmad et al. [19], who noted that specialized knowledge in non-functional requirements often clusters within distinct subcommunities.
This phenomenon presents both challenges and opportunities. While the small number of questions may suggest a knowledge silo, the high-quality engagement and relatively fast response times indicate that the available expertise is highly responsive. Organizations seeking to build usability expertise might benefit from cultivating connections to these specialized communities rather than expecting such knowledge to naturally develop within general development teams.
6.3. Platform-Specific Usability Patterns
Our analysis reveals fundamentally different usability concerns across web, desktop, and mobile contexts—beyond mere implementation differences. Each platform has evolved distinct interaction paradigms: web developers prioritize responsive layouts and progressive enhancement; desktop application developers focus on workflow efficiency and keyboard control; mobile developers emphasize touch targets and gesture recognition. The dominance of web technologies (JavaScript, HTML, CSS) in usability discussions reflects both the prevalence of web development and the unique usability challenges posed by web interfaces. Web developers must ensure an interface works well on everything from a small smartphone to a large desktop monitor, accommodating both touch and mouse interactions—a challenge not faced to the same degree by developers working in more controlled environments. Additionally, web interfaces are often customer-facing, making their usability directly tied to business success.
The different focus areas across platforms highlight how usability concerns are contextual and environment-specific. These different environments effectively represent distinct design languages and constraints. Desktop applications emphasize efficient interaction for professional users, with significant attention to keyboard shortcuts and complex controls. Mobile applications focus on touch interaction, space constraints, and simplified navigation patterns. Web applications must strike a balance between responsiveness across devices and rich interaction capabilities. These fundamental differences in interaction models and user expectations result in distinct usability challenges that necessitate specialized solutions.
The evolution of usability discussions within specific programming contexts mirrors the maturation of web and mobile technologies, as developers have transitioned from implementing basic interfaces to refining and optimizing more sophisticated user experiences. Different development ecosystems may cultivate distinct usability perspectives and priorities, influenced by both the technical capabilities of the language and the typical application domains where it is used.
Overall, usability discussions on Stack Overflow are highly contextualized by programming language and platform. The differences in usability concerns across these contexts suggest that usability knowledge and best practices must be tailored to specific development ecosystems rather than applied uniformly. This finding is consistent with research by Alexandrini et al. [62], who observed significant differences in usability requirements across development environments. This pattern challenges the notion of universal usability principles, suggesting instead that effective usability guidance must be contextualized within specific technological environments. The dominance of web technologies in usability discussions reflects both the prevalence of web development and the inherently complex usability challenges of creating interfaces that work across diverse devices. Developers transitioning between platforms may require explicit education in the different usability paradigms, rather than assuming the transferability of knowledge.
This finding has important implications for usability education and documentation. Rather than teaching general usability principles in isolation, more effective approaches might integrate usability guidance directly into programming language and platform documentation, addressing the specific challenges and patterns relevant to each context. Similarly, usability tools and evaluation methods might be more effective when tailored to specific development environments.
6.4. The Value of Longitudinal Perspective
Our longitudinal findings reveal patterns that cross-sectional studies cannot capture. All the studies summarized in Table 2 employ cross-sectional designs, analyzing usability issues at single time points or across a collection of applications without examining temporal evolution. While these studies provide valuable insights into prevalent usability problems at specific moments, they cannot reveal how developer engagement with usability has changed over time, how usability discussions compare to other non-functional requirements across different periods, or how the nature of usability challenges has evolved alongside technological advances.
For example, Hedegaard et al. [51] quantified usability information in online reviews from 2013, and Diniz et al. [2] identified usability heuristic violations in 200 mobile reviews from 2022. However, neither study was able to observe the temporal shift we document, from fundamental design principles (2008–2012) to framework-specific implementation challenges (2013–2024). Similarly, while studies such as Morgan et al. [3] and Alsanousi et al. [57] provide snapshots of usability concerns in specific application domains, our work uniquely demonstrates how these concerns have evolved over 16 years and how they differ across different development platforms.
The cross-sectional studies in Table 2 analyze user reviews to identify usability problems from the end-user perspective, whereas our study examines developer discussions seeking solutions to usability implementation challenges. This difference in perspective is significant: user reviews capture what users find problematic after release, while developer discussions on Stack Overflow reveal what challenges developers face during implementation. Our longitudinal approach to developer discussions thus provides a complementary and previously missing perspective on how usability knowledge is sought, shared, and applied within the development community over time.
6.5. Research-Practice Gap: The Absent User Voice
Our analysis reveals a striking disconnect between developers asking usability questions and actual user research practices. Among the 672 usability posts, 619 posts (92.1%) made no mention of any form of user research or testing. Only 21 posts (3.1%) cited user feedback, 18 posts (2.7%) mentioned user testing, and merely 9 posts (1.3%) referenced analytics or data to support their usability concerns. A negligible number mentioned A/B testing (2 posts, 0.3%) or accessibility testing (3 posts, 0.4%).
This finding suggests that developers are seeking usability guidance largely in the absence of empirical user validation—relying instead on community expertise, established heuristics, or personal judgment. While usability literature strongly emphasizes user-centered design and empirical validation as fundamental to effective usability practice, our findings indicate these practices remain largely disconnected from day-to-day development questions posted on Stack Overflow.
The high acceptance rate (68.6%) of usability answers on Stack Overflow, despite the overwhelming absence of user research mentions, suggests that community consensus and expert opinion may be substituting for empirical validation. This raises important questions about how usability knowledge is validated and propagated within developer communities. Are developers making usability decisions based primarily on peer validation rather than user testing? Does the community’s effectiveness at resolving usability questions (94.5% answer rate) indicate that experiential knowledge can effectively substitute for formal user research in many contexts, or does it suggest a systemic gap in connecting usability decisions to actual user needs?
This research-practice gap has significant implications for both usability education and tool development. It suggests that while developers recognize usability as important enough to ask questions about it, the integration of user research methods into their workflow remains limited. Educational efforts may need to focus not only on usability principles, but also on practical, lightweight methods for incorporating user feedback into development processes. Development tools and platforms could better support this integration by making user research methods more accessible and integrated into existing workflows.
6.6. Novel Contributions to Usability Research
This study makes several novel contributions to understanding usability in software development practice that distinguish it from all prior work:
First Longitudinal Analysis of Developer Usability Discussions. Unlike all previous studies that analyze usability at single time points, our 16-year temporal analysis reveals a significant decline pattern (from 189 posts in 2010 to 1 post in 2024) that fundamentally changes our understanding of how developers engage with usability knowledge. This is not merely a quantitative difference but a qualitative shift in research approach: cross-sectional studies can identify what usability issues exist, but only longitudinal analysis can reveal how these issues, and the community’s engagement with them, evolve over time. The sustained decline beginning in 2012 and persisting through 2022—a full decade before the emergence of conversational AI—demonstrates that this transformation in usability knowledge sharing reflects deep structural changes in software development practice rather than recent technological disruptions.
Cross-NFR Comparative Framework. We are the first to systematically compare usability engagement with seven other ISO/IEC 25010 non-functional requirements over an extended timeframe. This reveals that usability’s decline is unique—performance and security discussions remained stable or grew during the same period, indicating domain-specific factors rather than general platform trends. This comparative framework allows us to position usability not in isolation but within the broader landscape of software quality concerns, revealing that while usability questions are infrequent (894 posts vs. 102,821 for performance), they exhibit exceptional resolution efficiency (94.5% answer rate, 68.6% acceptance rate) unmatched by other NFRs. This finding suggests that usability represents a specialized knowledge domain with a highly effective subcommunity rather than simply being undervalued or overlooked.
Granular Taxonomy with Temporal Dimension. Our detailed classification (10 primary categories) combined with 16-year evolution tracking provides unprecedented insight into how specific usability concerns have shifted over time. Previous studies identified usability issue types but were unable to track their evolution. We demonstrate that interactive elements and controls consistently dominate discussions (24.0% of posts), while mobile/touch interactions represent only 3.1%—possibly because mobile usability patterns have become well-established through platform guidelines. This granular categorization, validated through systematic coding with high inter-rater reliability (Cohen’s Kappa = 0.87), demonstrates sample adequacy through breadth and depth rather than volume alone, enabling future researchers to build upon a standardized framework for analyzing usability discussions across platforms and time periods.
6.7. Future Directions and Practical Implications
The positioning of usability discussions on Stack Overflow represents a significant intersection between the development and design disciplines. The complexity of these discussions, as evidenced by the nuanced and platform-specific challenges identified in our analysis, suggests that integrating technical and design considerations creates particularly challenging problems that resist straightforward solutions.
Educational approaches that integrate usability principles directly into programming courses and documentation may prove more effective than treating them as separate domains. Development tools and frameworks could better support this integration by providing not just implementation patterns but also contextual guidance on when and why particular approaches improve usability. This integration would help address the gap between usability theory and practical implementation that our study identified, similar to challenges noted by Diniz et al. [2] in their analysis of usability issues in mobile applications.
The patterns we identified point to fertile ground for future research on how usability knowledge is transferred and applied within development communities. As usability discussions have declined on Stack Overflow, understanding where and how this knowledge now flows would provide valuable insights into effective knowledge transfer mechanisms. This could build upon the work by Zou et al. [20], who analyzed how different non-functional requirements are prioritized and discussed in technical communities.
Additionally, the context-specific nature of usability challenges suggests opportunities for research on how development platforms might better support appropriate usability patterns through specialized guidance, templates, and evaluation tools tailored to specific technological contexts.
Our findings have important implications for multiple stakeholders in the software development ecosystem. For educators, this highlights the need to integrate usability principles into technical curricula in ways that acknowledge platform-specific interaction paradigms, rather than treating usability as a separate context-independent domain. For development platform creators, our results suggest opportunities to better support appropriate usability patterns through specialized guidance, templates, and evaluation tools tailored to specific technological contexts. For research communities, the observed transformation in how usability knowledge is shared points to fertile ground for investigating knowledge transfer mechanisms across increasingly specialized development domains.
The research-practice gap we uncovered—with over 90% of usability discussions proceeding without any mention of user research—suggests that efforts to promote user-centered design may need to focus on making user research methods more accessible and integrated into existing developer workflows, rather than simply advocating for their importance. The success of community-based usability problem-solving on Stack Overflow, despite the absence of user research mentions, raises important questions about when and how peer expertise can effectively substitute for empirical validation and when it cannot.
These takeaways collectively suggest that usability knowledge is undergoing a transformation in how it is shared, structured, and applied within software development—evolving from general forum discussions to more specialized, contextualized, and integrated approaches that reflect the maturing nature of both software development and usability as interrelated fields.
7. Threats to Validity
This section examines potential threats that could limit the validity of our findings and discusses our mitigation strategies. We classify these threats into construct, internal, conclusion, external, and reliability validity categories.
7.1. Construct Validity
Construct validity concerns the accuracy with which our operational measures represent the concepts under study. Our dataset construction methodology presents a key threat, as we primarily identified posts through tags like “usability” and “usability-testing.” Despite using pattern matching with the “Usabilit%” wildcard to capture variations, we may have missed relevant discussions that use alternative terminology. Similarly, the inherent semantic ambiguity in natural language created challenges in post classification. As demonstrated in our methodology section, some posts containing “usability” addressed unrelated topics, while others discussed usability concepts without the explicit term. Through our manual validation process, we identified and eliminated 222 false positives from the initial 894 posts (24.8%), including posts where “usability” referred to code reusability or API usability rather than software user interface usability. However, our keyword-based approach may have also missed relevant usability discussions that used alternative terminology such as “user experience,” “UI/UX,” or “user-friendly,” representing potential false negatives that we cannot fully quantify. Our manual analysis with multiple inspectors and the auxiliary expert validation (yielding a Cohen’s Kappa of 0.87) helped mitigate these limitations, though perfect classification remains challenging. Future research could employ advanced natural language processing techniques, such as semantic embeddings (e.g., BERT [75]), topic modeling [76], or transformer-based classification models [77], to improve relevance detection and reduce both false positives and false negatives in usability discussion identification.
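As one illustration of the direction suggested above, the sketch below pairs sentence embeddings with a lightweight classifier to separate genuine usability posts from false positives; the model name, labels, and training examples are illustrative, not part of our pipeline.

```python
# Illustrative sketch only: embedding-based relevance classification.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

train_texts = [
    "How do I make this form less confusing for first-time users?",  # usability
    "Reusability of a C++ template across multiple projects",        # false positive
]
train_labels = [1, 0]

clf = LogisticRegression().fit(encoder.encode(train_texts), train_labels)
pred = clf.predict(encoder.encode(["Improving menu navigation for new users"]))
```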
The selection of engagement metrics (view counts, scores, and answer rates) as proxies for community interest and question complexity represents another construct validity concern. To strengthen our analysis, we employed multiple complementary metrics rather than relying on any single measure, providing a more holistic view of engagement patterns across non-functional requirements.
7.2. Internal Validity
Regarding internal validity, the temporal trends we observed might be influenced by external factors beyond those explicitly identified. Changes in Stack Overflow’s platform mechanics, algorithm adjustments affecting post visibility, shifts in user demographics, or broader trends in software development practices could all contribute to the patterns we observed. While we contextualized our findings within industry trends, definitively attributing causality remains challenging. To partially mitigate this, we compared the usability trend against the overall platform activity trend, which helped isolate topic-specific factors from general platform dynamics.
While view counts and scores naturally accumulate over time, potentially creating bias favoring older posts, several factors indicate our observed trends reflect genuine changes rather than mere temporal artifacts. First, the pronounced decline began in 2012, providing over a decade for even middle-period posts to accumulate engagement metrics by our 2025 data collection. Second, our statistical analysis in RQ1 explicitly addressed this concern by comparing early-period posts (2008–2011, n = 563) with recent-period posts (2020–2024, n = 42) using methods that account for the temporal dimension. The Mann–Whitney U tests revealed significantly higher engagement in the early period across all metrics (p < 0.001). Effect size analysis (Cohen’s d = 0.29 for view count, d = 0.42 for score) confirmed the practical significance of these differences. These statistical findings demonstrate that the decline represents a genuine shift in community engagement patterns rather than insufficient maturation time for recent posts. Additionally, our content analysis reveals that 628 of 672 posts (93.5%) address basic implementation questions rather than advanced or framework-specific challenges, and 362 posts (53.9%) involve best practice inquiries seeking established usability standards. The predominance of questions about fundamental usability principles suggests that the observed decline represents a genuine shift in community interest and knowledge-seeking behavior, rather than merely an artifact of insufficient time for recent posts to accumulate engagement metrics. The decline also coincides with documented trends in framework maturation and the rise of specialized UX communities [32], providing a plausible mechanism for the observed pattern beyond simple post-age effects.
Our manual classification process inevitably involves subjective judgment when categorizing posts by programming language, platform, or issue type. To address this threat, we implemented a systematic iterative approach with multiple validation steps and calculated an inter-rater reliability (Cohen’s Kappa) score of 0.87, indicating “almost perfect agreement” according to Fleiss et al. [71]. This rigorous process helps ensure consistency and reliability in our manual classifications.
The final dataset of 672 usability posts (reduced from the initial 894 posts with usability-related tags) introduces potential selection bias where certain types of usability discussions might be systematically excluded. While necessary for ensuring dataset quality, this filtering process could affect our understanding of the full spectrum of usability concerns on Stack Overflow.
7.3. Conclusion Validity
Conclusion validity concerns whether our statistical inferences are reasonable given the data and methods employed. Our use of non-parametric statistical tests (Spearman's rank correlation and the Mann–Whitney U test) is appropriate for the non-normal distribution of Stack Overflow engagement metrics. The observed correlations between time and engagement (Spearman's ρ = −0.356 for view count, ρ = −0.410 for score, both p < 0.001) represent moderate effect sizes. While statistically significant, we acknowledge that these correlations establish association rather than causation, and factors beyond temporal trends may contribute to the observed patterns.
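A minimal sketch of this correlation analysis, assuming the posts are available as a CSV export with CreationDate, ViewCount, and Score columns (the file and column names are hypothetical):

import pandas as pd
from scipy.stats import spearmanr

posts = pd.read_csv("usability_posts.csv")  # hypothetical export of the usability dataset

# Spearman's rank correlation makes no normality assumption, matching the skewed metrics.
t = pd.to_datetime(posts["CreationDate"]).astype("int64")  # epoch nanoseconds; preserves ordering
rho_views, p_views = spearmanr(t, posts["ViewCount"])
rho_score, p_score = spearmanr(t, posts["Score"])
print(f"views: rho = {rho_views:.3f} (p = {p_views:.3g})")
print(f"score: rho = {rho_score:.3f} (p = {p_score:.3g})")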
Comparing usability (894 posts) with NFRs of substantially larger volume (e.g., performance, with 102,821 posts) complicates direct statistical comparison. We addressed this by focusing on relative engagement metrics (percentages, answer rates, and acceptance rates) rather than absolute counts, and by employing rank-based statistical tests that are robust to scale differences, as sketched below. Our sample size provides adequate statistical power for detecting medium to large effects in our primary analyses, though power is limited for rare subcategories (e.g., the 21 mobile/touch interaction posts).
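The following sketch illustrates the normalization step, assuming per-tag totals of posts, answered posts, and posts with accepted answers; the post totals are from the study, but the answered/accepted counts here are invented placeholders.

# Relative rates allow tags of very different sizes to be compared directly.
def engagement_rates(n_posts, n_answered, n_accepted):
    return {
        "answer_rate": n_answered / n_posts,
        "acceptance_rate": n_accepted / n_posts,
    }

usability = engagement_rates(894, 750, 480)        # hypothetical answered/accepted counts
performance = engagement_rates(102821, 80000, 45000)
print(usability, performance, sep="\n")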
7.4. External Validity
Our exclusive focus on Stack Overflow limits the generalizability of our findings to other developer communities and platforms. Usability discussions may unfold differently on collaborative platforms like GitHub, conversational platforms like Reddit, or real-time communities like Discord, as developer communities can develop platform-specific cultural norms that influence discussion patterns [11]. However, Stack Overflow's role as a robust platform for developer technical discussions [10] and its structured Q&A format make it a suitable venue for systematic longitudinal analysis.
Our study covers posts from 2008 to 2024, but the software development landscape continues to evolve rapidly, so current findings may not predict future trends, particularly given the accelerating pace of change in development tools, frameworks, and practices. Moreover, while we analyzed the most frequently mentioned programming languages and platforms, our findings may not generalize equally well to less common languages or specialized domains that are underrepresented in our dataset. Future work should explore cross-platform analysis to validate our findings across different developer community contexts. Finally, because our study focuses on non-functional requirements (specifically usability), the findings and methodology may have limited applicability to functional requirements discussions, which exhibit different characteristics and discussion patterns on Stack Overflow.
7.5. Reliability
Our reliance on manual content analysis introduces challenges for reproducibility despite detailed documentation of our methodology and strong inter-rater agreement. This limitation is inherent to qualitative content analysis in natural language contexts [78]. We mitigated this threat through methodological transparency, multiple validation procedures, and a representative sampling approach that ensured a sufficiently large sample (9% of the dataset) at a 95% confidence level with a 6% margin of error, following guidelines from Aljedaani et al. [68].
Stack Overflow’s content volatility presents another reliability concern, as posts can be edited, deleted, or re-tagged over time. Our analysis represents a snapshot taken at the time of data collection (January 2025), and future replication attempts may encounter slightly different data even when using identical query parameters. We addressed this by providing detailed documentation of our data collection timeframe and methodology to contextualize our findings appropriately.
Throughout our study, we maintained appropriate scientific caution about the generalizability and scope of our findings while providing valuable insights into the evolution and characteristics of usability discussions in developer communities.
8. Conclusions
This study presents the first comprehensive longitudinal analysis of usability discussions on Stack Overflow, spanning from 2008 to 2024, and reveals significant insights into how developers engage with usability concerns in technical communities. Our findings demonstrate a pronounced and sustained decline in usability discussion frequency since its peak in 2010, contrasting sharply with the more stable patterns observed in other non-functional requirements such as performance and security. This decline, however, does not necessarily indicate that usability has become less important in software development. Rather, it suggests a transformation in how usability knowledge is shared and acquired, likely reflecting the maturation of the field, the integration of usability patterns into development frameworks, and the migration of specialized discussions to dedicated UX/UI communities.
Despite their declining frequency, usability discussions on Stack Overflow exhibit unique engagement patterns. These posts achieve the highest answer and acceptance rates of all non-functional requirements, indicating that while the topic is niche, the community resolves these questions with exceptional efficiency. This contrasts with other specialized topics, such as portability, which garner the highest average scores and view counts, suggesting that different types of niche questions are valued by the community in varying ways. This highlights a complex dynamic where usability discussions are defined not by broad popularity but by the effectiveness of a specialized subcommunity in resolving its distinct challenges.
Our analysis of programming language and platform contexts reveals that usability challenges are highly contextualized by development environment, with fundamentally different concerns emerging across web, desktop, and mobile platforms. Web development discussions predominantly address responsive layouts and form design, desktop application posts focus on keyboard navigation and complex controls, while mobile development conversations center around touch interactions and small-screen constraints. This context specificity challenges the notion of universal usability principles, suggesting instead that effective usability guidance must be tailored to specific technological environments and incorporated directly into technical documentation and education.
These findings have important implications for multiple stakeholders in the software development ecosystem. For educators, our findings highlight the need to integrate usability principles into technical curricula in ways that acknowledge platform-specific interaction paradigms, rather than treating usability as a separate, context-independent domain. For development platform creators, our results suggest opportunities to better support appropriate usability patterns through specialized guidance, templates, and evaluation tools tailored to specific technological contexts. For research communities, the observed transformation in how usability knowledge is shared points to fertile ground for investigating knowledge transfer mechanisms across increasingly specialized development domains.
Future research should investigate where usability discussions have migrated and how usability knowledge is now disseminated within development teams. Additional work is needed to develop and validate platform-specific usability guidelines that bridge technical implementation considerations with human-centered design principles. As software continues to permeate all aspects of society, the effective integration of usability considerations into development processes remains critically important, even as the forums for these discussions evolve. Our work contributes to understanding this evolution, offering insights into how the software development community approaches the essential challenge of creating not just functional but truly usable technologies.