Article

Deriving Empirically Grounded NFR Specifications from Practitioner Discourse: A Validated Methodology Applied to Trustworthy APIs in the AI Era

by
Apitchaka Singjai
Software Engineering Department, College of Arts Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
Information 2026, 17(3), 304; https://doi.org/10.3390/info17030304
Submission received: 23 January 2026 / Revised: 12 March 2026 / Accepted: 16 March 2026 / Published: 22 March 2026

Abstract

Specifying non-functional requirements (NFRs) for rapidly evolving domains such as trustworthy APIs in the AI era is challenging because best practices emerge through practitioner discourse faster than traditional requirements engineering can capture them. We present a systematic methodology for deriving prioritized NFR specifications from multimedia practitioner discourse, combining AI-assisted transcript analysis, grounded theory principles, and Theme Coverage Score (TCS) validation. Our five-task approach integrates purposive sampling, automated transcription with speaker diarization, grounded theory coding extracting stakeholder-specific themes with TCS quantification, MoSCoW prioritization using empirically derived thresholds (Must Have ≥85%, Should Have 65–84%, Could Have 45–64%, and Won’t Have <45%), and NFR specification consistent with ISO/IEC 25010:2023 principles of stakeholder perspective, measurable quality criteria, and explicit rationale. Applying this methodology to 22 expert presentations on trustworthy APIs yields a Weighted Coverage Score of 0.71 and 30 prioritized NFR specifications across five trustworthiness dimensions. MoSCoW classification produces 11 Must Have requirements (Robustness and Transparency), 9 Should Have, 6 Could Have, and 4 Won’t Have. The analysis reveals systematic disparities: Fairness contributes zero Must Have or Should Have requirements due to insufficient practitioner consensus. Each NFR specifies a stakeholder perspective, measurable quality criteria, and explicit rationale, enabling systematic verification. The validated methodology, with a complete replication package, enables empirically grounded, prioritized NFR derivation from practitioner discourse in any rapidly evolving domain.


1. Introduction

Non-functional requirements (NFRs) specify the constraints and restrictions under which functional requirements must be implemented [1,2]. These persistent quality constraints govern how well systems perform their intended functions, encompassing performance, security, usability, and increasingly critical concerns such as Fairness, Transparency, and Privacy [3]. While functional requirements specify what systems must do, NFRs constrain how systems achieve those functions [2].
In rapidly evolving domains like trustworthy APIs in the AI era, practitioners share emerging best practices primarily through gray literature channels such as conferences, webinars, and technical talks rather than peer-reviewed publications [4,5], meaning that NFR specifications must be grounded in practitioner discourse to reflect actual implementation priorities [2].
Traditional NFR elicitation approaches each face distinct limitations [6,7]: survey-based methods capture stated preferences but are susceptible to response bias; expert interviews provide rich contextual data but face scalability constraints; implementation analysis reveals outcomes but obscures decision rationale; and documentation analysis is hampered by publication lag relative to practice evolution. No single technique comprehensively addresses all elicitation challenges [7].
We propose an alternative: the systematic analysis of practitioner discourse, where professionals voluntarily explain their experiences to peers through presentations, technical talks, and webinars. This approach reveals actual priorities through authentic emphasis patterns rather than stated preferences, captures explicit reasoning and trade-offs through practitioners’ own explanations, scales to larger corpora via public video platforms without researcher intervention bias, and operates in authentic professional contexts. The growth of user-generated educational content on platforms like YouTube [8] and the increasing availability of conference presentation videos have made this approach newly viable. Conferences worldwide publish presentation videos openly, companies host webinars sharing architectural experiences, and expert podcasts feature technical discussions. However, this rich ecosystem of multimedia practitioner discourse remains largely untapped as an empirical data source for software engineering research, even though it contains extensive evidence of how practitioners conceptualize, prioritize, and operationalize quality requirements, and even though text-based practitioner platforms such as Stack Overflow, Reddit, Hacker News, and GitHub Discussions have been extensively studied as developer knowledge-sharing channels [9,10,11,12].
Despite this potential, no systematic methodology exists for deriving validated NFR specifications from multimedia practitioner sources [13,14]. Key challenges include ensuring empirical grounding—that requirements reflect practitioner consensus rather than researcher interpretation [15]—enabling replicability through explicit procedures [16], and providing evidence-based prioritization. Traditional qualitative analysis provides rich insights but often lacks the rigor and reproducibility demanded by software engineering [17]. Furthermore, derived requirements must follow established specification standards such as ISO/IEC 25010:2023 [3] to ensure clarity, measurability, and actionability.
This motivates our research question: How can the systematic analysis of practitioner discourse enable the derivation of empirically grounded, prioritized NFR specifications for rapidly evolving software domains?
The remainder of this paper is organized as follows. Section 2 presents our five-task methodology, including goal definition, corpus collection, grounded theory coding procedures, Theme Coverage Score calculation, and MoSCoW prioritization. Section 3 presents the application to trustworthy APIs in the AI era, including domain specification, 30 derived NFR specifications, MoSCoW distribution, and validation results revealing systematic disparities across trustworthiness dimensions. Section 4 discusses implications for practitioners and researchers, methodological insights, threats to validity, and future research directions. Section 5 concludes the paper.

2. Materials and Methods

This section presents a systematic methodology for deriving empirically grounded NFR specifications from multimedia practitioner discourse. The methodology is domain agnostic and applicable to any rapidly evolving field where practitioner knowledge advances faster than formal documentation. As illustrated in Figure 1, the methodology follows a BPMN 2.0 process model comprising an initial goal definition step followed by five sequential tasks: (1) data collection through purposive video sampling, (2) automated transcription with speaker diarization, (3) Grounded Theory Coding including open coding, axial coding, selective coding, and the Theme Coverage Score (TCS) validation, (4) MoSCoW prioritization using empirically derived thresholds, and (5) NFR specification with coverage score evaluation. Each component is described with sufficient procedural detail to enable independent replication in any target domain.

2.1. Research Domain and Goal Definition (Initial Step)

The methodology begins with explicit definition of the research domain and analytical goals, represented by the start event in Figure 1. This initial step requires researchers to specify four critical elements: (1) the target domain for analysis, identifying the specific technological area or problem space; (2) the quality framework or dimensions to be investigated, either by adopting existing standards such as ISO/IEC 25010:2023 or developing domain-appropriate dimensions; (3) stakeholder perspectives of interest (e.g., service providers versus consumers), recognizing that different groups often have distinct concerns; and (4) the intended use of derived NFR specifications, whether for system design guidance, research insights, or policy recommendations. The methodology is most appropriate for domains exhibiting rapid evolution where best practices emerge faster than formal documentation, with active practitioner communities sharing experiences through public presentations, and where there is a clear need for systematic requirements derivation. This explicit goal definition ensures focused data collection and provides clear criteria for evaluating methodology effectiveness, with the specific domain, framework, and goals applied in this study presented in Section 3.

2.2. Task 1: Data Collection

Purpose: Systematically collect video presentations capturing authentic practitioner discourse on the research domain.
Input: Video platform (e.g., YouTube) and domain-specific search queries.
Output: Corpus of selected video presentations with complete metadata.
Methodology Foundation: Purposive sampling with thematic saturation criterion.

2.2.1. Video Source and Selection Strategy

Video presentations are systematically collected from public platforms (e.g., YouTube), focusing on practitioners who voluntarily share implementation experiences with peer audiences. The selection strategy prioritizes authentic practitioner discourse where speakers explicitly discuss design decisions, challenges encountered, and lessons learned from real-world deployments. Purposive sampling is employed to maximize information richness, selecting videos from diverse sources including conference presentations (technical talks at industry/academic events), company technical blogs and engineering channels, expert webinars and practitioner workshops, and authorized technical podcasts with practitioner interviews.

2.2.2. Inclusion and Exclusion Criteria

Videos must meet specific inclusion criteria: (1) practitioner authorship, demonstrated through speaker credentials, organizational affiliations, or presentation context showing genuine implementation experience rather than purely theoretical perspectives; (2) domain relevance with substantive technical content addressing the target research domain; (3) an appropriate duration of 15–90 min (shorter videos lack depth; longer videos contain excessive peripheral content); (4) clear audio quality, enabling accurate transcription; (5) English-language content for consistency (acknowledged as a scope limitation); and (6) public accessibility without authentication requirements. Videos are excluded if they are AI-generated or synthetic presentations lacking authentic practitioner perspective, purely academic presentations without implementation experience, product demonstrations without substantive technical discussion, recordings with poor audio quality preventing accurate transcription, duplicate presentations (identical content at multiple venues), or introductory tutorials aimed at beginners. This focused selection ensures the corpus captures authentic practitioner discourse within defined scope boundaries.
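For screening at scale, these criteria can be encoded as a predicate over candidate metadata. The following is a minimal Python sketch; the record fields are illustrative assumptions, not the study’s actual screening schema:

```python
from dataclasses import dataclass

@dataclass
class VideoCandidate:
    # Illustrative fields mirroring the Section 2.2.2 criteria.
    practitioner_author: bool  # credentials show implementation experience
    duration_min: float        # total length in minutes
    language: str              # e.g., "en"
    is_public: bool            # accessible without authentication
    audio_quality_ok: bool     # clear enough for accurate transcription
    is_ai_generated: bool      # synthetic presentation, excluded
    is_duplicate: bool         # identical content already in corpus

def meets_inclusion_criteria(v: VideoCandidate) -> bool:
    """Apply the inclusion/exclusion rules to one candidate video."""
    return (
        v.practitioner_author
        and 15 <= v.duration_min <= 90  # depth vs. peripheral-content bound
        and v.language == "en"
        and v.is_public
        and v.audio_quality_ok
        and not v.is_ai_generated
        and not v.is_duplicate
    )
```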

2.2.3. Sampling Strategy and Saturation Criterion

Purposive sampling continues until thematic saturation is achieved, defined as the point at which successive sources yield no new themes [18]. This criterion balances comprehensiveness with practical constraints, ensuring adequate coverage without indefinite expansion. The saturation procedure involves collecting and analyzing videos iteratively, tracking the emergence of new themes per video, continuing sampling while new themes emerge, stopping when three consecutive videos contribute no new themes, and collecting 2–3 additional videos beyond saturation to confirm stability. These additional videos confirm that coverage of practitioner perspectives is adequate rather than a sampling artifact.
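The stopping rule lends itself to a compact check over the ordered corpus. A minimal Python sketch, assuming each analyzed video has already been reduced to a set of theme labels:

```python
def reached_saturation(theme_sets: list[set[str]], window: int = 3):
    """Return the number of videos analyzed when saturation holds
    (`window` consecutive videos contribute no new themes), else None."""
    seen: set[str] = set()
    run_without_new = 0
    for i, themes in enumerate(theme_sets):
        new = themes - seen
        seen |= themes
        run_without_new = 0 if new else run_without_new + 1
        if run_without_new >= window:
            return i + 1
    return None  # saturation not yet reached; keep sampling

# Hypothetical corpus: the last three videos add no new themes.
print(reached_saturation([{"A", "B"}, {"B", "C"}, {"C"}, {"A"}, {"B"}]))  # -> 5
```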

2.2.4. Metadata Documentation

For each selected video, complete metadata is systematically documented including the unique identifier (sequential numbering for reference), title and speakers (full presentation title and speaker names), organizational affiliations (speaker companies/institutions), the publication date and venue (upload date and presentation context), the duration (total video length in minutes), and the platform URL (permanent link for verification and replication). This comprehensive metadata enables independent verification and supports traceability throughout the analysis process.

2.3. Task 2: Automated Transcription with Speaker Diarization

Purpose: Transform video audio into structured, analyzable text transcripts with accurate speaker attribution.
Input: Selected video presentations (Task 1 output).
Output: High-accuracy transcripts with speaker identification, timestamps, and domain-specific terminology.
Methodology Foundation: Human–machine teaming combining automated tools with expert verification.
Human–machine teaming is employed for efficient, accurate transcription. For single-speaker videos, automated transcription services (https://youtube-transcript.io (accessed on 19 August 2025)) extract initial transcripts, followed by AI-assisted post-processing via Google AI Studio (https://aistudio.google.com (accessed on 19 August 2025)) to generate structured scripts with speaker names, timestamps, paragraph breaks, and domain-specific terminology correction. This process requires approximately 5 min per video.
For multi-speaker videos, audio files are extracted from YouTube videos and uploaded to Otter.ai (https://otter.ai (accessed on 19 August 2025)) for automated transcription with speaker diarization. Otter.ai generates initial transcripts with generic speaker labels (Speaker 1, Speaker 2, etc.). AI-assisted speaker identification is then performed using Google AI Studio by providing video metadata and on-screen speaker information to map generic labels to actual speaker names. Human verification confirms speaker attribution accuracy by cross-referencing the transcript against video content. This process requires approximately 10–15 min per video, including the processing time and verification.
Transcription accuracy for domain-specific technical terms is verified by spot-checking sample content against the source audio. Claude 3.5 Sonnet (Anthropic, accessed via Google AI Studio) was used for transcript post-processing, including speaker identification, timestamp formatting, paragraph structuring, and domain-specific terminology correction. All AI-generated outputs were human-verified for accuracy.
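For readers replicating the single-speaker path with open tooling, the initial extraction step can be scripted. A minimal sketch using the open-source youtube-transcript-api Python package as a stand-in for the youtube-transcript.io service used in the study; the call follows the package’s long-standing interface (newer releases may differ), and the AI-assisted structuring and human verification steps are deliberately omitted:

```python
# pip install youtube-transcript-api
from youtube_transcript_api import YouTubeTranscriptApi

def fetch_raw_transcript(video_id: str) -> str:
    """Pull the English transcript for one video and join its timed
    segments into plain text ready for post-processing."""
    segments = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
    # Each segment is a dict with "text", "start" (seconds), and "duration".
    return " ".join(seg["text"] for seg in segments)
```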

2.4. Task 3: Grounded Theory Coding with TCS Validation

Purpose: Systematically analyze transcripts to extract empirically grounded themes with quantified practitioner consensus.
Input: High-quality transcripts (Task 2 output).
Output: Validated themes with Theme Coverage Scores (TCS), mapped to quality framework.
Methodology Foundation: Grounded theory three-phase coding with quantitative validation.

2.4.1. Grounded Theory Foundation

The coding approach follows grounded theory (GT), a systematic qualitative research method for inductively developing theory from empirical data [13,14,15]. GT originated in sociology but has been successfully applied in software engineering research to understand complex socio-technical phenomena [19,20,21]. GT is particularly appropriate for this methodology because it develops theory directly from practitioner discourse rather than by testing predetermined hypotheses, allowing requirements to emerge from actual practice rather than researcher assumptions.
GT prescribes three systematic coding phases that directly inform this methodology: open coding for breaking down data into discrete concepts without imposing predetermined categories [22], axial coding for identifying relationships between concepts and mapping to theoretical dimensions [23], and selective coding for integrating validated themes into coherent theory [15,23]. GT’s iterative sampling continues until new data yields no new theoretical insights [15,23], providing systematic stopping criteria that balance thoroughness with practical constraints.

2.4.2. Step 1: Open Coding

For each video transcript, systematic open coding is conducted to extract structured information without imposing predetermined categories. Three categories of information are systematically extracted: (1) the main topics discussed, capturing the speaker’s core message and presentation context; (2) coverage assessment, evaluating the degree to which each quality dimension from the research framework is discussed; and (3) the key takeaways by stakeholder, extracting actionable insights separately for relevant stakeholder groups as defined in the goal specification.
Each key takeaway must be actionable (specifying what to do or consider), grounded (directly stated or clearly implied by the speaker), and attributable (quotable with timestamp reference). All extracted information is documented with supporting evidence quotes (verbatim from transcript), timestamps (enabling verification against source video), implementation context (speaker’s organizational setting), and confidence ratings (high/medium/low based on explicitness). This per-video coding creates a structured database enabling subsequent cross-source analysis while maintaining complete traceability to the source material.
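One way to keep coded takeaways traceable is to store them as structured records. A sketch of a possible schema in Python; the field names are illustrative, and the canonical coding sheets are those in the replication package:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class KeyTakeaway:
    """One open-coded takeaway (Section 2.4.2); fields are illustrative."""
    video_id: str                                 # e.g., "TWAI-7"
    stakeholder: Literal["provider", "consumer"]  # group from goal definition
    statement: str                                # actionable insight
    evidence_quote: str                           # verbatim from transcript (grounded)
    timestamp: str                                # e.g., "00:23:41" (attributable)
    context: str                                  # speaker's organizational setting
    confidence: Literal["high", "medium", "low"]  # based on explicitness
```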

2.4.3. Step 2: Axial Coding

Coded takeaways from open coding are systematically mapped to the quality dimensions specified in the research framework. The framework’s coverage for each video is assessed using a five-level qualitative scale: None (0%), Minimal (25%), Moderate (50%), Substantial (75%), and Comprehensive (100%). The coverage is assessed separately for each stakeholder group defined in the goal specification, as practitioners often address different audiences with distinct concerns. This stakeholder-aware coding enables the identification of perspective-specific requirements and systematic gaps.
Patterns across videos are analyzed by examining frequency (how many sources discuss each dimension), stakeholder differences (variation in emphasis across stakeholder groups), cross-cutting themes (themes appearing across multiple dimensions), explicit trade-offs (tensions mentioned by practitioners), and systematic gaps (dimensions consistently underemphasized). This pattern analysis reveals both common concerns and systematic blind spots in practitioner discourse.

2.4.4. Step 3: Selective Coding and TCS Calculation

Recurring themes representing coherent practitioner concerns across presentations are identified through iterative refinement. Themes are validated by assessing internal consistency (theme captures coherent practitioner concern), external distinction (theme is clearly distinguishable from other themes), empirical grounding (theme is supported by explicit practitioner statements), and actionability (theme translates to concrete NFR specification).
$$\text{TCS}_{\text{theme}} = \frac{\sum_{s \in S_{\text{theme}}} C_{s,d}}{|S_{\text{theme}}|} \tag{1}$$
The Theme Coverage Score (TCS) quantifies practitioner consensus for each validated theme using Equation (1), where $S_{\text{theme}}$ is the set of sources discussing the theme, $C_{s,d}$ is the coverage percentage (0, 25, 50, 75, or 100%) from axial coding for dimension $d$ in source $s$, and $|S_{\text{theme}}|$ is the number of sources discussing the theme. TCS ranges from 0 to 100%, with higher values indicating broader and deeper coverage; it represents the average depth of discussion across all sources addressing the theme.
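In code, the score is the arithmetic mean of the axial-coding coverage values over the sources that discuss a theme. A minimal Python sketch; the source identifiers and values in the example are hypothetical:

```python
def theme_coverage_score(coverage_by_source: dict[str, int]) -> float:
    """Equation (1): mean coverage (0, 25, 50, 75, or 100) across the
    sources that discuss the theme; returns a percentage in [0, 100]."""
    if not coverage_by_source:
        return 0.0
    return sum(coverage_by_source.values()) / len(coverage_by_source)

# Hypothetical theme discussed in three sources at 100/75/100 coverage:
print(theme_coverage_score({"TWAI-1": 100, "TWAI-4": 75, "TWAI-9": 100}))  # ~91.7
```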

2.5. Task 4: MoSCoW Prioritization

Purpose: Transform Theme Coverage Scores into prioritized requirement classifications using empirically derived thresholds.
Input: Validated themes with TCS values (Task 3 output).
Output: Themes classified into MoSCoW categories (Must/Should/Could/Won’t Have).
Methodology Foundation: Empirically derived thresholds from TCS distribution analysis.

2.5.1. MoSCoW Framework and Threshold Derivation

The MoSCoW method [24,25] classifies requirements into four priority levels: Must Have (critical requirements essential for success), Should Have (important requirements that should be included if possible), Could Have (desirable enhancements that could improve the solution), and Won’t Have (requirements with insufficient justification for current inclusion). Traditional MoSCoW relies on expert judgment; this methodology derives thresholds empirically from TCS distributions, eliminating subjective bias.
Priority assignment uses TCS-based thresholds derived from natural clustering in the data. The threshold derivation process involves calculating TCS for all themes, analyzing the TCS distribution (histogram and quartiles), identifying natural breaks in the distribution, and assigning thresholds based on gaps in coverage patterns. The resulting thresholds are Must Have (TCS ≥ 85%; strong practitioner consensus with comprehensive coverage), Should Have (TCS 65–84%; significant discussion with substantial coverage), Could Have (TCS 45–64%; emerging themes with moderate coverage), and Won’t Have (TCS < 45%; insufficient practitioner consensus for current inclusion). These thresholds emerge from natural clustering observed across themes, ensuring priorities reflect actual practitioner emphasis patterns rather than arbitrary cutoffs.
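Applied per theme, the classification rule reduces to a threshold lookup, sketched minimally in Python below:

```python
def moscow_category(tcs: float) -> str:
    """Map a Theme Coverage Score (0-100%) to its MoSCoW class using
    the empirically derived thresholds (85 / 65 / 45)."""
    if tcs >= 85:
        return "Must Have"
    if tcs >= 65:
        return "Should Have"
    if tcs >= 45:
        return "Could Have"
    return "Won't Have"
```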

2.5.2. Priority Assignment Procedure

For each validated theme, the priority assignment procedure involves retrieving the TCS value from selective coding, applying the threshold ranges to classify the theme, documenting the rationale for why the threshold was met or not met, and validating consistency to ensure the classification aligns with the qualitative assessment. This systematic procedure eliminates expert rating requirements while maintaining methodological rigor, enabling objective prioritization based on the empirical evidence of practitioner consensus.

2.6. Task 5: NFR Specification with Coverage Score Evaluation

Purpose: Transform prioritized themes into structured NFR specifications with validation through Weighted Coverage Score.
Input: Prioritized themes (Task 4 output).
Output: Complete NFR specifications with evaluation metrics.
Methodology Foundation: ISO/IEC 25010:2023 quality model structure with multi-faceted validation.

2.6.1. NFR Specification Template

Each NFR follows a structured template providing clarity through active voice, measurable conditions, and explicit rationale, consistent with ISO/IEC 25010:2023 principles of stakeholder perspective, measurable quality criteria, and explicit quality characteristic identification. The template format is as follows: “The [stakeholder] [must/should/could/won’t] [action/capability] [measurable quality measure] for [quality dimension].” The template’s components include the stakeholder (specific stakeholder group from goal definition), priority modal (Must, Should, Could, or Won’t from MoSCoW classification), action/capability (empirically grounded theme from selective coding), measurable quality measure (quantitative or observable criterion derived from practitioner discourse), and quality dimension (dimension from research framework).
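Instantiating the template is mechanical once the coded components are available. A sketch using NFR-T-003 from Section 3.6.2 as the worked example; the function and parameter names are illustrative assumptions rather than part of the methodology:

```python
def render_nfr(stakeholder: str, modal: str, capability: str,
               quality_measure: str, dimension: str) -> str:
    """Instantiate the Section 2.6.1 template: 'The [stakeholder]
    [must/should/could/won't] [action/capability] [measure] for [dimension].'"""
    return f"The {stakeholder} {modal} {capability} {quality_measure} for {dimension}."

# Worked example reproducing NFR-T-003:
print(render_nfr(
    "API consumer", "MUST",
    "implement token tracking systems",
    "monitoring the input and output token consumption per request, per user, and per application feature",
    "Transparency",
))
```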
Each NFR maintains comprehensive evidence documentation, including the TCS score and source count (TCS value 0–100%, number of sources discussing theme, and distribution across coverage levels), priority classification (MoSCoW category with threshold justification and rationale), stakeholder assignment (stakeholder group, role-specific considerations, and implementation context), measurement rationale (explaining why this specific measure, how practitioners verify compliance, real-world constraints and success criteria, and the lessons learned from practitioners’ discourse), and source evidence (video presentation identifiers, specific quotes with timestamps, and cross-references to source metadata). This comprehensive documentation enables the systematic verification of requirements against practitioner discourse while maintaining actionability for software development teams.

2.6.2. Weighted Coverage Score Validation

The Weighted Coverage Score (WCS) is the normalized average of all Theme Coverage Scores, as shown in Equation (2), where $N$ is the total number of themes and $\text{TCS}_i$ is the Theme Coverage Score for theme $i$ expressed as a percentage (0–100%). WCS quantifies how thoroughly the derived NFR specifications capture practitioner discourse, with values approaching 1.0 indicating comprehensive coverage, values between 0.70 and 0.85 indicating acceptable coverage by analogy with inter-rater reliability standards in content analysis [16], and values below 0.70 suggesting inadequate sampling. Importantly, the WCS preserves discriminatory power to reveal systematic gaps where certain dimensions receive minimal practitioner attention, representing important empirical findings rather than methodological failures.
$$\text{WCS} = \frac{\sum_{i=1}^{N} \text{TCS}_i}{N \times 100} \tag{2}$$
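Computed over the theme-level scores from Task 3, the metric is a one-line normalization. A minimal Python sketch, with the study’s own totals from Section 3.6.6 as a consistency check:

```python
def weighted_coverage_score(tcs_values: list[float]) -> float:
    """Equation (2): normalize the sum of all Theme Coverage Scores
    (each 0-100%) to a corpus-level score in [0, 1]."""
    n = len(tcs_values)
    return sum(tcs_values) / (n * 100)

# Consistency check against Section 3.6.6: the 30 themes' TCS values
# sum to 2125.55, so WCS = 2125.55 / 3000 = 0.7085 (reported as 0.71).
```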
The WCS is supplemented with established qualitative quality criteria from GT tradition [14,17]: credibility through transparent coding procedures with documented decision trails (inter-coder reliability if multiple coders, explicit coding rules, and an audit trail of analytical decisions), dependability through explicit replication artifacts enabling independent verification (complete methodology documentation, publicly available transcripts, and coding rules and examples), transferability through detailed documentation supporting application to other domains (domain-independent procedures, generalization guidelines, and adaptation criteria for new domains), and confirmability through evidence traceability linking each requirement to specific practitioner statements (timestamp references, direct quotes, and source attribution). This multi-faceted validation ensures NFR specifications are both empirically grounded and methodologically rigorous.

3. Results

We present the results from applying our methodology (Section 2) to trustworthy APIs in the AI era. This section first specifies the research domain and goals (Section 3.1), and then presents results organized by the five methodology tasks: data collection outcomes (Section 3.2), transcription outcomes (Section 3.3), grounded theory coding findings (Section 3.4), prioritization results (Section 3.5), and derived NFR specifications with validation (Section 3.6). This case study both demonstrates methodological validity and reveals critical insights about practitioner priorities in trustworthy API design.

3.1. Research Domain and Goal Specification

Following the methodology’s initial step (Section 2.1), we explicitly define the research domain, quality framework, stakeholder perspectives, and analytical goals for this case study application.

3.1.1. Domain Selection: Trustworthy APIs in the AI Era

Trustworthy APIs in the AI era represent a rapidly evolving domain where best practices emerge through practitioner discourse faster than formal documentation can capture them. The domain exhibits all three characteristics identified in Section 2.1: (1) rapid evolution driven by AI/ML technology advancement, (2) an active practitioner community sharing experiences through conferences, webinars, and technical talks, and (3) a critical need for systematic requirements to guide API design in high-stakes AI applications. The domain scope encompasses application programming interfaces (APIs) that expose, integrate, or support AI/ML capabilities, including APIs providing AI/ML services (e.g., inference endpoints, model-as-a-service), APIs integrating third-party AI/ML capabilities, APIs with AI-enhanced functionality (e.g., intelligent routing and adaptive rate limiting), and infrastructure APIs supporting AI/ML workloads.

3.1.2. Quality Framework: Five Trustworthiness Dimensions

We adopt a quality framework based on established trustworthy AI principles, specifying five trustworthiness dimensions: (1) Explainability: the ability to provide understandable explanations of API behavior, decisions, and AI model reasoning; (2) Fairness: the freedom from bias and discrimination in API outputs, ensuring equitable treatment across user groups; (3) Robustness: resilience against attacks, errors, and unexpected inputs with reliable performance under varying conditions; (4) Data Privacy: the protection of sensitive information, compliance with privacy regulations, and secure data handling; and (5) Transparency: visibility into API operations, costs, limitations, and changes with clear documentation and communication. These dimensions align with ISO/IEC 25010:2023 quality characteristics while addressing AI-specific concerns.

3.1.3. Stakeholder Perspectives

Two primary stakeholder groups are analyzed: API Providers (organizations and developers responsible for designing, implementing, deploying, and maintaining APIs), whose focus spans system design, architecture, security frameworks, monitoring, and governance, with concerns about scalability, cost management, regulatory compliance, and service reliability; and API Consumers (organizations and developers who integrate and utilize third-party APIs in their applications), whose focus spans integration patterns, error handling, vendor evaluation, and cost optimization, with concerns about service availability, API changes, debugging support, and pricing predictability. This dual perspective enables identification of requirements from both supply-side and demand-side viewpoints.

3.1.4. Research Goals

The specific analytical goals are (1) derive prioritized NFRs systematically from practitioner discourse, reflecting actual practitioner consensus rather than prescriptive frameworks; (2) quantify dimension coverage to measure the extent to which practitioners discuss each trustworthiness dimension, revealing emphasis patterns and systematic gaps; (3) identify stakeholder-specific requirements distinguishing concerns relevant to API providers versus consumers; (4) validate methodology effectiveness through the Weighted Coverage Score and qualitative validation criteria; and (5) reveal practitioner priorities by exposing disparities between prescriptive frameworks (which often present all dimensions as equally important) and actual practitioner emphasis. These goals guide the data collection, analysis, and interpretation throughout the study.

3.2. Task 1 Results: Data Collection Outcomes

Table 1 presents the complete corpus of 22 expert presentations from YouTube spanning April 2019 to July 2025. The corpus represents diverse sources including conference talks (TWAI-3, TWAI-4, and TWAI-11), expert webinars (TWAI-1, TWAI-5, and TWAI-7), podcast episodes (TWAI-13, TWAI-15, and TWAI-20), and technical deep dives (TWAI-16 and TWAI-19). The speaker diversity includes API platform architects, security researchers, governance specialists, and industry thought leaders from organizations ranging from startups to major technology companies. Thematic saturation was achieved after 22 presentations, with the final three videos contributing no new themes, confirming the adequate coverage of practitioner perspectives across the domain.

3.3. Task 2 Results: Transcription Outcomes

All 22 presentations were successfully transcribed using the human–machine teaming approach described in Section 2.3. Single-speaker videos (14 presentations) required an average of 5 min of processing time per video, while multi-speaker videos (eight presentations) required 10–15 min, including speaker diarization and verification. Spot-checking confirmed high transcription accuracy for domain-specific technical terminology across all transcripts. The resulting transcript corpus, spanning presentations ranging from conference talks to podcast discussions, provides rich empirical grounding for subsequent thematic coding.

3.4. Task 3 Results: Grounded Theory Coding Findings

Systematic coding across open coding, axial coding, and selective coding (Section 2.4) yielded 30 empirically grounded themes organized across five trustworthiness dimensions and two stakeholder perspectives. Table 2 presents the dimension-level coverage assessment showing how thoroughly each dimension was discussed across the 22 presentations.
The coverage analysis reveals striking disparities across dimensions. Robustness dominates practitioner discourse with 88.64% average coverage, discussed substantively or comprehensively (score ≥75%) in 20 of 22 presentations and with at least moderate coverage in all 22. Transparency achieves 84.09% average coverage with 20 of 22 presentations providing at least substantial discussion. Data Privacy receives moderate attention at 54.55% average coverage, while Explainability achieves 65.91%. Critically, Fairness achieves only 27.27% average coverage with 16 of 22 presentations (73%) providing no substantive discussion (score ≤25%). This systematic gap between prescriptive frameworks emphasizing fairness and actual practitioner priorities represents a critical finding for trustworthy API development.

3.5. Task 4 Results: MoSCoW Prioritization Outcomes

Applying the empirically derived MoSCoW thresholds (Section 2.5) to the 30 themes produces 11 Must Have requirements (37%), 9 Should Have requirements (30%), 6 Could Have requirements (20%), and 4 Won’t Have classifications (13%). All six Robustness themes achieve Must Have status with Theme Coverage Scores ranging from 87.5 to 100%, demonstrating strong practitioner consensus on security, resilience, and reliability concerns. Four of six Transparency themes are classified as Must Have, reflecting practitioners’ emphasis on observability, documentation, and cost monitoring. Two consumer-perspective Explainability themes (E1: documentation quality signal, TCS 62.5%; E2: error message clarity, TCS 62.5%) are classified as Could Have, reflecting emerging but not yet consensus practitioner guidance on consumer-side Explainability evaluation. In contrast, Fairness contributes zero Must Have or Should Have requirements, with all six themes classified as either Could Have or Won’t Have due to insufficient practitioner consensus.
One noteworthy edge case is Data Privacy P2 (Encryption), which achieves a TCS of 100% from a single source (TWAI-7). Although this score nominally satisfies the Must Have threshold (≥85%), the classification is downgraded to Should Have to reflect the insufficient breadth of practitioner consensus: a single comprehensive discussion, however authoritative, does not constitute the multi-source agreement required for a Must Have requirement. This illustrates an important methodological nuance—TCS captures both coverage intensity and source breadth, but when n = 1, high intensity alone should not override the consensus criterion that underpins Must Have prioritization.
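This breadth check can be layered on top of the threshold classifier from Section 2.5.1, as sketched below in Python. The two-source floor is an illustrative reading of the rule, which the paper states only for the n = 1 case:

```python
def moscow_with_breadth(tcs: float, n_sources: int) -> str:
    """TCS thresholds (Section 2.5.1) plus a consensus-breadth check:
    a theme reaching the Must Have band from a single source is downgraded."""
    if tcs >= 85:
        category = "Must Have"
    elif tcs >= 65:
        category = "Should Have"
    elif tcs >= 45:
        category = "Could Have"
    else:
        category = "Won't Have"
    if category == "Must Have" and n_sources < 2:
        return "Should Have"  # intensity without multi-source agreement
    return category

# Data Privacy P2 (Encryption): TCS 100% from one source (TWAI-7).
print(moscow_with_breadth(100.0, n_sources=1))  # -> "Should Have"
```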

3.6. Task 5 Results: NFR Specifications and Validation

Table 3 presents the complete set of 30 derived themes with Theme Coverage Scores, MoSCoW classifications, and stakeholder-level coverage. TCS values quantify both the breadth (number of sources) and depth (coverage intensity) of practitioner consensus for each theme. Stakeholder-level analysis reveals meaningful patterns: for Robustness, providers (93.3%) and consumers (91.7%) achieve near-parity, while Data Privacy shows substantial divergence with providers (73.3%) exceeding consumers (51.4%). Both stakeholder groups exhibit Fairness underrepresentation, with providers (33.3%) and consumers (41.7%) both falling below the 45% threshold for the Could Have classification.

3.6.1. Derived Non-Functional Requirements

The 30 empirically grounded NFRs are presented below, organized by the MoSCoW priority classification based on the Theme Coverage Score (TCS) thresholds (Section 2.5).

3.6.2. Must Have Requirements (TCS ≥ 85%)

These 11 requirements represent a strong practitioner consensus with comprehensive coverage across presentations.
  • Robustness (six requirements)
  • NFR-R-001: The API provider MUST implement comprehensive security frameworks with AI-specific threat modeling achieving the detection and mitigation of prompt injection, model extraction, data poisoning, and adversarial attacks for Robustness.
  • NFR-R-002: The API provider MUST design modular API architectures enabling independent scaling, updating, and the replacement of AI model components without system-wide disruption for Robustness.
  • NFR-R-003: The API provider MUST implement behavioral anomaly detection systems identifying model extraction attempts, membership inference attacks, and systematic probing patterns for Robustness.
  • NFR-R-004: The API consumer MUST establish infrastructure readiness including adequate computational resources, network capacity, and failover mechanisms achieving high availability before production AI API integration for Robustness.
  • NFR-R-005: The API consumer MUST implement hard budget caps and automated spending controls preventing runaway costs from unpredictable AI API token consumption for Robustness.
  • NFR-R-006: The API consumer MUST conduct comprehensive integration testing validating edge cases, error handling, rate limit scenarios, and cost predictions before production deployment for Robustness.
  • Transparency (four requirements)
  • NFR-T-001: The API provider MUST implement progressive observability systems enabling drill down from high-level metrics to detailed distributed traces for operational investigation for Transparency.
  • NFR-T-002: The API provider MUST publish comprehensive OpenAPI 3.0+ specifications with semantic annotations enabling both human developers and AI agents to understand the capabilities and constraints for Transparency.
  • NFR-T-003: The API consumer MUST implement token tracking systems monitoring the input and output token consumption per request, per user, and per application feature for Transparency.
  • NFR-T-004: The API consumer MUST provide cost monitoring dashboards displaying cumulative spend, cost trends, and predictive alerts before budget thresholds are exceeded for Transparency.
  • Data Privacy (one requirement)
  • NFR-P-001: The API consumer MUST validate data access permissions ensuring zero unauthorized data exposure by verifying authorization policies before every API call containing user data for Data Privacy.

3.6.3. Should Have Requirements (65% ≤ TCS < 85%)

These nine requirements represent significant practitioner discussion with substantial coverage, important for trustworthiness but not achieving a universal consensus.
  • Transparency (two requirements)
  • NFR-T-005: The API provider SHOULD maintain audit trails logging all API requests, configuration changes, and access control modifications with tamper-evident storage for Transparency.
  • NFR-T-006: The API consumer SHOULD implement usage analytics tracking API call patterns, performance metrics, and feature adoption for Transparency.
  • Data Privacy (three requirements)
  • NFR-P-002: The API provider SHOULD enforce logical data separation ensuring that different customers’ data, requests, and model interactions remain isolated for Data Privacy.
  • NFR-P-003: The API provider SHOULD implement end-to-end encryption using TLS 1.3+ for data in transit and AES-256 for data at rest for Data Privacy.
  • NFR-P-004: The API consumer SHOULD implement data minimization sending only necessary information by removing or redacting sensitive fields for Data Privacy.
  • Explainability (four requirements)
  • NFR-E-001: The API provider SHOULD provide OpenAPI specifications with semantic annotations describing parameter meanings, usage contexts, and constraints for Explainability.
  • NFR-E-002: The API provider SHOULD implement automated documentation generation maintaining synchronization between API specifications, code examples, and implementation for Explainability.
  • NFR-E-003: The API provider SHOULD provide natural language descriptions explaining API behavior, parameter effects, and common use cases complementing technical specifications for Explainability.
  • NFR-E-006: The API consumer SHOULD provide code examples covering common use cases, error scenarios, and integration patterns in multiple programming languages for Explainability.

3.6.4. Could Have Requirements (45% ≤ TCS < 65%)

These six requirements represent emerging themes with moderate coverage, which are desirable but not yet consensus best practices.
  • Data Privacy (one requirement)
  • NFR-P-005: The API provider COULD implement row-level security mechanisms enforcing fine-grained access control at data element level for Data Privacy.
  • Explainability (two requirements)
  • NFR-E-004: The API consumer COULD evaluate API documentation quality by assessing completeness, accuracy, example coverage, and maintenance frequency as a trust signal for Explainability.
  • NFR-E-005: The API consumer COULD implement error handling translating API error responses into actionable guidance for developers and end users for Explainability.
  • Fairness (three requirements)
  • NFR-F-001: The API provider COULD offer special pricing tiers for verified educational institutions and non-profit organizations for Fairness.
  • NFR-F-002: The API consumer COULD advocate for pricing models considering regional economic differences and organizational size for Fairness.
  • NFR-F-003: The API provider COULD provide free sandbox environments with realistic capabilities enabling evaluation without financial commitment for Fairness.

3.6.5. Won’t Have Classifications (TCS < 45%)

These four items are termed classifications rather than requirements, reflecting that insufficient practitioner consensus prevents their elevation to actionable NFR status. They remain as potential future considerations pending increased practitioner consensus.
  • Data Privacy (one classification)
  • NFR-P-006: Advanced Privacy-Preserving Patterns—Won’t Have (TCS: 0%; one source with no substantive discussion). Insufficient practitioner attention to extract an actionable requirement for privacy-preserving patterns such as federated learning or differential privacy.
  • Fairness (three classifications)
  • NFR-F-004: Free Tier Access Standardization—Won’t Have (TCS: 25.0%; two sources with brief mentions). Discussed as a business strategy rather than a fairness requirement, with insufficient consensus.
  • NFR-F-005: Equal Access Guarantees—Won’t Have (TCS: 25.0%; four sources with minimal discussion). Scattered conceptual mentions without concrete implementation guidance.
  • NFR-F-006: Proportional Rate Limiting by Organization Size—Won’t Have (TCS: 25.0%; two sources with brief mentions). Theoretical consideration without technical implementation guidance.

3.6.6. Distribution Summary

The 30 NFRs distribute as follows:
  • Must Have: eleven requirements (36.7%)—six Robustness, four Transparency, and one Data Privacy.
  • Should Have: nine requirements (30.0%)—two Transparency, three Data Privacy, and four Explainability.
  • Could Have: six requirements (20.0%)—one Data Privacy, two Explainability, and three Fairness.
  • Won’t Have: four classifications (13.3%)—one Data Privacy and three Fairness.
Key Finding: Fairness contributes zero Must Have or Should Have requirements, with all six Fairness themes classified as either Could Have or Won’t Have due to insufficient practitioner consensus. This represents a 2.4:1 disparity between the highest-coverage dimension (Robustness; 93.2% average TCS) and the lowest-coverage dimension (Fairness; 38.5% average TCS).
The per-dimension TCS subtotals are Robustness (559.10), Transparency (542.20), Data Privacy (374.20), Explainability (418.80), and Fairness (231.25), yielding a grand total of 2125.55. Applying Equation (2) yields a WCS of 2125.55 / 3000 = 0.7085 , which falls within the acceptable range (0.70–0.85) established for grounded theory saturation and content analysis reliability, indicating that the derived NFR specifications adequately capture practitioners’ discourse while preserving the discriminatory power to reveal systematic disparities across dimensions.

3.7. Summary of Findings

Our research question asked the following: How can the systematic analysis of practitioner discourse enable the derivation of empirically grounded, prioritized NFR specifications for rapidly evolving software domains? Figure 2 illustrates the integrated answer: how the methodology transforms multimedia practitioner discourse (input) through five systematic tasks (process) into prioritized NFR specifications (output).
Input: The methodology begins with 22 expert presentations collected through purposive sampling until thematic saturation (final three videos contributing zero new themes), encompassing conference talks (TWAI-3, TWAI-4, and TWAI-11), webinars (TWAI-1, TWAI-5, and TWAI-7), podcast episodes (TWAI-13, TWAI-15, and TWAI-20), and technical deep dives (TWAI-16 and TWAI-19) across five trustworthiness dimensions and two stakeholder perspectives.
Process: Five sequential tasks transform raw discourse into validated NFR specifications. Task 1 applied purposive sampling with explicit inclusion/exclusion criteria targeting practitioners with genuine implementation experience. Task 2 achieved high transcriptional accuracy through human–machine teaming (single-speaker: ~5 min; multi-speaker with diarization: 10–15 min). Task 3 applied grounded theory coding through open, axial (five-level coverage assessment: 0/25/50/75/100%), and selective coding, yielding 30 themes with Theme Coverage Scores. Task 4 applied MoSCoW prioritization using empirically derived thresholds (Must Have ≥85%, Should Have 65–84%, Could Have 45–64%, and Won’t Have <45%). Task 5 transformed themes into NFR specifications consistent with ISO/IEC 25010:2023 principles, validated by a Weighted Coverage Score of 0.71 within the acceptable range (0.70–0.85).
Output: The 30 NFRs reveal systematic practitioner priorities. Eleven Must Have requirements (37%) concentrate in Robustness (six themes) and Transparency (four themes), with one Data Privacy theme (consumer-side permission validation), reflecting strong consensus on operational and security concerns. Nine Should Have requirements (30%) span Transparency (2), Data Privacy (3), and Explainability (four themes: machine-readable documentation, automated generation, natural language descriptions, and code examples). Six Could Have requirements (20%) cover Data Privacy (1), Explainability (2: documentation quality signal and error message clarity), and Fairness (3: educational pricing, pricing equity, and sandbox access). Four Won’t Have classifications (13%) reflect insufficient consensus for Data Privacy (1: advanced privacy-preserving patterns) and Fairness (3: tiered access, access fairness, and proportional limiting).
Critical Finding: The most significant finding is a 2.4:1 disparity between Robustness (93.2% average TCS; 20 of 22 presentations substantive) and Fairness (38.5% average TCS; 73% of presentations with no substantive discussion, and coverage score ≤25%). Fairness contributes zero Must Have or Should Have requirements, while Robustness and Transparency dominate because operational failures provide immediate, visible consequences. This gap suggests that Fairness remains an immature practice area lacking concrete patterns, measurable metrics, and operational tooling comparable to the security or monitoring domains.

4. Discussion

This section interprets the empirical findings from Section 3 through the lens of trustworthy API design, examining what the data means for practitioners, framework designers, and policymakers navigating NFR specification in rapidly evolving domains. Figure 3 frames the complete narrative: from methodological validation through the operational reality gap to coordinated action.

4.1. Methodological Validation: Viability of Practitioner Discourse Analysis

The methodology’s validity rests on three interlocking checks that collectively confirm the evidence base is fit for deriving actionable NFR specifications. First, quantitative validity: The Weighted Coverage Score of 0.71 falls within the acceptable range (0.70–0.85) for grounded theory saturation and content analysis reliability. Importantly, WCS preserves discriminatory power: the 38.5% average TCS for Fairness is not a sign of inadequate sampling. Rather, it signals genuine practitioner underattention—precisely the kind of finding a requirements analyst needs to surface. Second, qualitative validity: The methodology satisfies established grounded theory criteria: credibility through transparent coding with documented decision trails; dependability through a publicly available replication package; transferability through domain-agnostic procedures; and confirmability through timestamp-linked evidence tracing each requirement to specific practitioner statements. Third, empirical validity: Thematic saturation was reached after 22 presentations, with the final three contributing zero new themes. All 11 Must Have requirements were corroborated by 4–10 independent sources, providing the cross-source consensus that distinguishes robust signal from noise. Together, these three forms of validation establish that the derived requirements reflect what practitioners actually know and prioritize, not what researchers assume they should.

4.2. Trustworthy API Priorities: Gap Between Frameworks and Practice

The 2.4:1 disparity between Robustness (93.2% average TCS) and Fairness (38.5%) is best understood not as practitioner negligence but as a consequence of asymmetric feedback architecture. In domain-driven terms, Robustness and Transparency benefit from first-class domain events: security incidents trigger incident management workflows, cost overruns fire budget alerts, and API outages page on-call engineers. Fairness has no equivalent mechanism. Biased outputs go undetected, inequitable pricing affects parties other than the implementing team, and access disparities accumulate silently until they escalate into crises. This feedback asymmetry has three observable consequences. First, all 11 Must Have requirements concentrate in Robustness and Transparency. Second, 20 of 22 presentations discussed Robustness substantively (score ≥75%). Third, 73% of presentations provided no substantive Fairness discussion (coverage score ≤25%); the six presentations that did address Fairness focused on pricing and access equity rather than algorithmic bias detection—the subdomain where tooling and metrics remain most immature. Prescriptive frameworks that present all trustworthiness dimensions as equally implementable set expectations that current practice cannot meet. A more effective approach would acknowledge this maturity gradient and direct early investment toward creating the monitoring hooks, audit triggers, and compliance signals that would make Fairness violations as operationally visible as a failed health check.

4.3. Stakeholder Perspective Differences: Providers Versus Consumers

Two stakeholder perspectives are considered: API providers, who design and implement APIs, and API consumers, who integrate and use APIs. Separating provider and consumer perspectives reveals a pattern familiar to architects working across bounded contexts: shared concerns produce convergence, while asymmetric obligations produce divergence. On Robustness and Transparency, both stakeholder groups show near-parity (Robustness: providers 93.3% vs. consumers 93.0%; Transparency: 93.2% vs. 87.5%). This convergence reflects a shared reality: security, resilience, and observability create operational feedback loops that are equally salient on both sides of the API boundary. Data Privacy, by contrast, shows a 21.9 percentage point gap (providers 73.3% vs. consumers 51.4%). This divergence directly reflects asymmetric regulatory exposure: providers own the data model and bear direct GDPR and CCPA liability, while consumers often treat privacy as a provider responsibility and lack visibility into upstream data handling. This delegation assumption constitutes a genuine integration risk that requires explicit validation patterns rather than implicit trust. Most revealing is the Fairness result: both groups fall below 45% (providers 33.3%; consumers 43.75%). This symmetry confirms that the fairness gap is a domain-wide developmental deficit, not a stakeholder-specific blind spot. Closing it requires shared infrastructure—standardized metrics, open tooling, and domain events for bias detection—rather than guidance targeted at only one side of the API contract.

4.4. Implications for Practice and Policy

The findings translate into differentiated guidance for three audiences whose coordinated action is needed to close the gap between trustworthy API theory and practice. API practitioners should treat the 11 Must Have requirements (TCS ≥ 85%) as a non-negotiable architectural baseline. This baseline encompasses security frameworks, observability infrastructure, OpenAPI specifications, cost monitoring, and token tracking. Data Privacy demands particular stakeholder-specific attention: providers must establish compliance architectures, while consumers must actively validate provider claims rather than delegating responsibility. Assuming the other party has resolved privacy obligations is precisely how integration incidents occur. Framework designers should retire equal-weight multi-dimension models. These should be replaced with maturity-differentiated guidance that provides concrete implementation patterns for Robustness and Transparency while focusing Fairness guidance on three near-term objectives: defining measurable proxies, identifying early monitoring signals, and building the ubiquitous language of fairness that domain practitioners currently lack. Policymakers should sequence compliance timelines according to practice maturity. Immediate enforcement is appropriate for dimensions with established tooling. For Fairness, the more productive investment is funding shared infrastructure—bias metrics, open-source detection libraries, and certification frameworks. The goal is to replicate, for Fairness, what decades of security standards achieved for Robustness: making violations operationally visible, measurable, and consequential enough to sustain practitioner attention without ongoing regulatory coercion.

4.5. Threats to Validity

Validity threats are addressed following Runeson and Höst’s framework [48], with mitigations built into the methodological design. Concerning construct validity, the risk of coding subjectivity is reduced through explicit coding rules, TCS quantification replacing judgment-based importance scores, and a complete replication package on Zenodo (DOI: https://doi.org/10.5281/zenodo.18341067 (accessed on 23 January 2026)) enabling independent re-coding. The WCS of 0.71, falling within the acceptable range, provides quantitative confirmation of adequate construct validity. Concerning internal validity, confounding factors such as the presentation venue, speaker expertise, and audience composition are mitigated through diverse source selection spanning conferences, webinars, podcasts, and technical talks. A residual risk is that practitioners in highly regulated industries may be underrepresented in public multimedia discourse. Concerning external validity, three acknowledged scope limitations bound the generalizability of the findings. First, the English-language corpus may exclude communities with different priorities. Second, YouTube-based collection may overrepresent certain organizational types. Third, the 2019–2025 collection window captures a specific technological moment. The findings should therefore be read as characterizing English-language public practitioner discourse on trustworthy APIs during this period, not as universal generalizations. The fundamental constraint across all validity dimensions is that discourse analysis captures stated rather than enacted priorities. What practitioners emphasize in public presentations may differ from their private design decisions. Future work should triangulate these findings with implementation audits and practitioner interviews to validate the alignment between stated and enacted priorities.

5. Conclusions

This research demonstrates that systematic analysis of practitioner discourse enables the derivation of empirically grounded, prioritized NFR specifications for rapidly evolving domains. The five-task methodology combining AI-assisted transcription, grounded theory coding, and Theme Coverage Score validation produces requirements reflecting actual practitioner consensus (WCS = 0.71) rather than researcher speculation, with a complete replication package enabling independent verification and adaptation to other domains. Applied to trustworthy APIs, the methodology yields 30 NFR specifications distributed as 11 Must Have, 9 Should Have, 6 Could Have, and 4 Won’t Have, consistent with ISO/IEC 25010:2023 principles. The primary empirical finding is a systematic 2.4:1 disparity: Robustness (93.2% TCS) and Transparency (90.4% TCS) dominate practitioner discourse, while Fairness (38.5% TCS) receives no substantive attention in 73% of presentations (coverage score ≤25%). This gap challenges prescriptive frameworks that present all trustworthiness dimensions as equally important, and it reflects fundamental differences in operational feedback mechanisms rather than practitioner indifference. For practitioners, the Must Have requirements provide an evidence-based implementation baseline; Fairness warrants investment as an emerging practice area rather than deferral. For framework designers and policymakers, the findings indicate that maturity-differentiated guidance and artificial feedback mechanisms, whether through auditing, certification, or mandatory reporting, are needed to close the gap between theoretical emphasis and operational reality.
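As a concrete check on the two headline figures, the short sketch below recomputes them from the values reported in Tables 2 and 3. One assumption is ours: that the 73% figure is derived from the provider-perspective Fairness row of Table 2.

```python
# Recomputing the conclusion's headline figures from Tables 2 and 3.
robustness_tcs = 93.2   # Robustness dimension TCS (Table 3)
fairness_tcs = 38.5     # Fairness dimension TCS (Table 3)
print(f"disparity ratio: {robustness_tcs / fairness_tcs:.1f}:1")   # 2.4:1

# Provider-perspective Fairness coverage per presentation (Table 2).
fairness_coverage = [25, 25, 25, 25, 50, 25, 0, 0, 25, 25, 50,
                     0, 0, 50, 25, 0, 0, 25, 25, 75, 50, 75]
low = sum(1 for c in fairness_coverage if c <= 25)
share = 100 * low / len(fairness_coverage)
print(f"coverage <= 25%: {low}/{len(fairness_coverage)} = {share:.0f}%")  # 73%
```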

Funding

This research was supported by Chiang Mai University under research project No. R67IN00546.

Institutional Review Board Statement

Ethical review and approval were waived for this study because the research involved only analysis of publicly available YouTube video presentations without human subject interaction or intervention.

Informed Consent Statement

Not applicable. This study analyzed publicly available video presentations in which speakers voluntarily shared professional knowledge with the technical community. The research involved no human subjects beyond secondary analysis of their public presentations, and included no direct interaction with participants, no collection of private information, and no intervention. All sources are properly cited, with complete attributions and URLs provided in Table 1.

Data Availability Statement

The complete dataset is openly available in Zenodo at https://doi.org/10.5281/zenodo.18341067 (accessed on 23 January 2026). The repository includes the following: (1) 30-NFR-Specifications-Complete.md: complete NFR specifications with MoSCoW classifications and source traceability; (2) open coding Video folder: per-video coding sheets with dimension coverage assessments and extracted key takeaways; (3) Transcription.docx: full transcripts for all 22 presentations; and (4) Video folder: video metadata with titles, speakers, dates, and YouTube URLs. All original videos remain publicly accessible on YouTube at the URLs in Table 1. Raw video files are not redistributed but remain accessible via original YouTube URLs to respect content creators’ rights.

Acknowledgments

The author thanks all practitioners whose expert presentations formed the corpus of this study. During the preparation of this manuscript, the author used the following AI tools: Claude 3.5 Sonnet (Anthropic, via Google AI Studio https://aistudio.google.com/ (accessed on 19 August 2025)) for transcript post-processing, including speaker identification and terminology correction; Claude Sonnet 4.5 (Anthropic, via claude.ai https://claude.ai (accessed on 19 August 2025)) as a writing assistant for manuscript drafting and revision; Gemini 3 (Google, https://gemini.google.com (accessed on 23 January 2026)) and NotebookLM (Google, https://notebooklm.google.com (accessed on 23 January 2026)) for figure generation; Otter.ai (https://otter.ai (accessed on 19 August 2025)) for automated transcription with speaker diarization; and YouTube Transcript API (https://youtube-transcript.io (accessed on 19 August 2025)) for transcript retrieval. The author has reviewed and edited all AI-generated outputs and takes full responsibility for the content of this publication.

Conflicts of Interest

The author declares no conflicts of interest. The funding institution provided only institutional support for the research project without influencing research directions or outcomes.

References

  1. Scaled Agile, Inc. Nonfunctional Requirements. Scaled Agile Framework. 2025. Updated November 2025. Available online: https://framework.scaledagile.com/nonfunctional-requirements (accessed on 20 February 2026).
  2. Dongmo, C. A Review of Non-Functional Requirements Analysis Throughout the SDLC. Computers 2024, 13, 308.
  3. ISO/IEC 25010:2023; Systems and Software Engineering—Systems and Software Quality Requirements and Evaluation (SQuaRE)—Product Quality Model. International Organization for Standardization: Geneva, Switzerland, 2023.
  4. Garousi, V.; Felderer, M.; Mäntylä, M.V.; Rainer, A. Benefitting from the Grey Literature in Software Engineering Research. In Contemporary Empirical Methods in Software Engineering; Felderer, M., Travassos, G.H., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 385–413.
  5. Garousi, V.; Felderer, M.; Mäntylä, M.V. Guidelines for including grey literature and conducting multivocal literature reviews in software engineering. Inf. Softw. Technol. 2019, 106, 101–121.
  6. Dieste, O.; Juristo, N. Systematic review and aggregation of empirical studies on elicitation techniques. IEEE Trans. Softw. Eng. 2011, 37, 283–304.
  7. Pacheco, C.; García, I.; Reyes, M. Requirements elicitation techniques: A systematic literature review based on the maturity of the techniques. IET Softw. 2018, 12, 365–378.
  8. MacLeod, L.; Bergen, A.; Storey, M.A. Documenting and sharing software knowledge using screencasts. Empir. Softw. Eng. 2017, 22, 1478–1507.
  9. Treude, C.; Barzilay, O.; Storey, M.A. How do programmers ask and answer questions on the web?: NIER track. In Proceedings of the 2011 33rd International Conference on Software Engineering (ICSE); Association for Computing Machinery: New York, NY, USA, 2011; pp. 804–807.
  10. Aniche, M.; Treude, C.; Steinmacher, I.; Wiese, I.; Pinto, G.; Storey, M.A.; Gerosa, M.A. How Modern News Aggregators Help Development Communities Shape and Share Knowledge. In Proceedings of the 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE); Association for Computing Machinery: New York, NY, USA, 2018; pp. 499–510.
  11. Hata, H.; Novielli, N.; Baltes, S.; Kula, R.G.; Treude, C. GitHub Discussions: An exploratory study of early adoption. Empir. Softw. Eng. 2022, 27, 3.
  12. Fang, H.; Vasilescu, B.; Herbsleb, J. Understanding information diffusion about open-source projects on Twitter, HackerNews, and Reddit. In Proceedings of the 2023 IEEE/ACM 16th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE); Association for Computing Machinery: New York, NY, USA, 2023; pp. 56–67.
  13. Stol, K.J.; Ralph, P.; Fitzgerald, B. Grounded theory in software engineering research: A critical review and guidelines. In ICSE ’16: Proceedings of the 38th International Conference on Software Engineering; Association for Computing Machinery: New York, NY, USA, 2016; pp. 120–131.
  14. Hoda, R. Socio-Technical Grounded Theory for Software Engineering. IEEE Trans. Softw. Eng. 2022, 48, 3808–3832.
  15. Charmaz, K. Grounded theory in global perspective: Reviews by international researchers. Qual. Inq. 2014, 20, 1074–1084.
  16. Krippendorff, K. Content Analysis: An Introduction to Its Methodology, 4th ed.; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2019.
  17. Seaman, C.B.; Hoda, R.; Feldt, R. Qualitative Research Methods in Software Engineering: Past, Present, and Future. IEEE Trans. Softw. Eng. 2025, 51, 783–788.
  18. Wutich, A.; Beresford, M.; Bernard, H.R. Sample Sizes for 10 Types of Qualitative Data Analysis: An Integrative Review, Empirical Guidance, and Next Steps. Int. J. Qual. Methods 2024, 23, 16094069241296206.
  19. Singjai, A.; Simhandl, G.; Zdun, U. On the practitioners’ understanding of coupling smells — A grey literature based Grounded-Theory study. Inf. Softw. Technol. 2021, 134, 106539.
  20. Singjai, A.; Zdun, U.; Zimmermann, O. Practitioner Views on the Interrelation of Microservice APIs and Domain-Driven Design: A Grey Literature Study Based on Grounded Theory. In Proceedings of the 2021 IEEE 18th International Conference on Software Architecture (ICSA); IEEE: Piscataway, NJ, USA, 2021; pp. 25–35.
  21. Rodríguez, P. Grounded Theory in Software Engineering: Challenges and Lessons Learned from the Trenches. In WSESE ’24: Proceedings of the 1st IEEE/ACM International Workshop on Methodological Issues with Empirical Studies in Software Engineering; Association for Computing Machinery: New York, NY, USA, 2024; pp. 21–26.
  22. Saldaña, J. The Coding Manual for Qualitative Researchers, 5th ed.; SAGE Publications: London, UK, 2025.
  23. Corbin, J.; Strauss, A. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory, 4th ed.; SAGE Publications: Thousand Oaks, CA, USA, 2015.
  24. Clegg, D.; Barker, R. CASE Method Fast-Track: A RAD Approach; Addison-Wesley: Reading, MA, USA, 1994.
  25. Vijayakumar, S.; Prasad, K.K.; Holla, M.R. Assessing the Effectiveness of MoSCoW Prioritization in Software Development: A Holistic Analysis across Methodologies. EAI Endorsed Trans. Internet Things 2024, 10, 1.
  26. Treblle Webinars. Rethinking API Architecture for the AI Era|Treblle Webinars. YouTube. 2025. Available online: https://www.youtube.com/watch?v=-dSKp-saIt0 (accessed on 19 August 2025).
  27. Permit. API Development in the AI Era. YouTube. 2024. Available online: https://www.youtube.com/watch?v=5sglo4e1SLs (accessed on 19 August 2025).
  28. Wilde, E. How Does AI Affect APIs? Expert Opinions from Erik Wilde. YouTube. 2025. Available online: https://www.youtube.com/watch?v=6ekEQR45IIo (accessed on 19 August 2025).
  29. Sharma, A.; Freeman, J. Securing APIs in the Age of AI: New Risks and Opportunities. YouTube. 2025. Available online: https://www.youtube.com/watch?v=tpsq3LoP67Y (accessed on 19 August 2025).
  30. Budzynski, M.; Kasper, J. Effective API Governance in the Era of AI. YouTube. 2024. Available online: https://www.youtube.com/watch?v=SfFe2e-9u5M (accessed on 19 August 2025).
  31. Cindric, V. 7 Key Lessons to Make Your APIs Work. YouTube. 2025. Available online: https://www.youtube.com/watch?v=1iw5Ywz0TLE (accessed on 19 August 2025).
  32. Pilarinos, D. Building Trust in the AI Era. YouTube. 2025. Available online: https://www.youtube.com/watch?v=j9zzX-dO-x0 (accessed on 19 August 2025).
  33. Sitaraman, M. AI and APIs, A Powerful Duo! YouTube. 2025. Available online: https://www.youtube.com/watch?v=EajW4HuS7zQ (accessed on 19 August 2025).
  34. Sitbon, R. Securely Boosting Any Product with Generative AI. YouTube. 2024. Available online: https://www.youtube.com/watch?v=FZTq_8Iwj2A (accessed on 19 August 2025).
  35. Harmon, J. API-as-a-Product: The Key to a Successful API Strategy. YouTube. 2024. Available online: https://www.youtube.com/watch?v=G3UZ_oiIw6I (accessed on 19 August 2025).
  36. Gunatilaka, P.; Wijesekara, N. AI Driven API Design, Development, and Consumption. YouTube. 2025. Available online: https://www.youtube.com/watch?v=9WkuTw9NFcg (accessed on 19 August 2025).
  37. Bhartiya, S.; Websbecher, A. How Traceable AI Is Approaching API Security. YouTube. 2022. Available online: https://www.youtube.com/watch?v=R59zciw679c (accessed on 19 August 2025).
  38. Wilhelm, A.; Deng, S. Merge’s Unified API Bet in the AI Era. YouTube. 2024. Available online: https://www.youtube.com/watch?v=7owvVCVEDXk (accessed on 19 August 2025).
  39. Park, S.; Segal, T. Five Ways AI-Assisted API Automation Can Supercharge Integration. YouTube. 2024. Available online: https://www.youtube.com/watch?v=gNWd6tlrhcI (accessed on 19 August 2025).
  40. a16z. The API Economy—The Why, What and How. YouTube. 2019. Available online: https://www.youtube.com/watch?v=HNBDxRhc9PU (accessed on 19 August 2025).
  41. Bouchard, L.F. APIs 101: From Concept to Deployment for AI Engineers. YouTube. 2025. Available online: https://www.youtube.com/watch?v=5atR70lV1fs (accessed on 19 August 2025).
  42. Dalley, A. Why API Architecture Is the Missing Key to AI Success. YouTube. 2025. Available online: https://www.youtube.com/watch?v=vZEN9PZf2cw (accessed on 19 August 2025).
  43. Amir; Brid, A. The Brutal Truth About Enterprise AI Adoption. YouTube. 2025. Available online: https://www.youtube.com/watch?v=VzNmynLu0e0 (accessed on 19 August 2025).
  44. Karpathy, A. Software Is Changing (Again). YouTube. 2025. Available online: https://www.youtube.com/watch?v=LCEmiRjPEtQ (accessed on 19 August 2025).
  45. Petruzzelli, V.; Rivera, G. The Rise of Agentic Checkout and AI Agents. YouTube. 2025. Available online: https://www.youtube.com/watch?v=_7SkxySh2tY (accessed on 19 August 2025).
  46. Grover, K. The Price of Intelligence—AI Agent Pricing in 2025. YouTube. 2025. Available online: https://www.youtube.com/watch?v=In7K-4JZKR4 (accessed on 19 August 2025).
  47. Sinha, S.; Fronza, E.M.; Gajula, S. Transparent and Trustworthy AI Governance. YouTube. 2025. Available online: https://www.youtube.com/watch?v=bbXqqmvppnI (accessed on 19 August 2025).
  48. Runeson, P.; Höst, M.; Rainer, A.; Regnell, B. Case Study Research in Software Engineering: Guidelines and Examples; John Wiley & Sons: Hoboken, NJ, USA, 2012.
Figure 1. Five-task research methodology modeled using BPMN 2.0 notation.
Figure 2. Empirically grounded NFR derivation methodology.
Figure 3. Trustworthy API development encounters four stations: station 1 (methodology), station 2 (reality check), station 3 (stakeholder map), and station 4 (action).
Table 1. Collected video dataset: 22 expert presentations on trustworthy APIs in the AI era.

| ID | Title | Speaker(s) | Date | Ref. |
|---|---|---|---|---|
| TWAI-1 | Rethinking API Architecture for the AI Era | Treblle Webinars | 17 April 2025 | [26] |
| TWAI-2 | API Development in the AI Era | Permit | 22 May 2024 | [27] |
| TWAI-3 | How Does AI Affect APIs? Expert Opinions from API Days NY 2025 | Erik Wilde | 22 May 2025 | [28] |
| TWAI-4 | Securing APIs in the Age of AI: New Risks and Threat Models | Anubhav Sharma, J. Freeman | 28 May 2025 | [29] |
| TWAI-5 | Effective API Governance in the Era of AI with Azure API Management | Mike Budzynski, Julia Kasper | 25 September 2024 | [30] |
| TWAI-6 | 7 Key Lessons to Make Your APIs Work Efficiently | Vedran Cindric | 15 March 2025 | [31] |
| TWAI-7 | Building Trust in the AI Era | Dennis Pilarinos | 10 July 2025 | [32] |
| TWAI-8 | AI & APIs: A Powerful Duo | Murali Sitaraman | 5 March 2025 | [33] |
| TWAI-9 | Securely Boosting Any Product with Generative AI APIs | Ruben Sitbon | 30 October 2024 | [34] |
| TWAI-10 | API-as-a-Product: The Key to a Successful API Program | Jason Harmon | 13 November 2024 | [35] |
| TWAI-11 | AI Driven API Design, Development, and Consumption | P. Gunatilaka, N. Wijesekara | 4 April 2025 | [36] |
| TWAI-12 | How Traceable AI Is Approaching API Security Differently | Weichao Li | 18 August 2022 | [37] |
| TWAI-13 | Merge’s Unified API Bet in the AI Era | Alex Wilhelm, Shensi Deng | 25 October 2024 | [38] |
| TWAI-14 | Five Ways AI-Assisted API Automation Can Supercharge Platform Engineering | Sujin Park, Todd Segal | 1 July 2024 | [39] |
| TWAI-15 | The API Economy: The Why, What, and How | a16z Podcast | 2 January 2019 | [40] |
| TWAI-16 | APIs 101: From Concept to Deployment for AI Engineers | Louis-Francois Bouchard | 11 January 2025 | [41] |
| TWAI-17 | Why API Architecture Is the Missing Key to AI Success | Alan Dalley | 27 May 2025 | [42] |
| TWAI-18 | The Brutal Truth About Enterprise AI Adoption | Amir, Ameya Brid | 29 May 2025 | [43] |
| TWAI-19 | Andrej Karpathy: Software Is Changing (Again) | Andrej Karpathy | 19 June 2025 | [44] |
| TWAI-20 | The Rise of Agentic Checkout and AI Agents in Ecommerce | Vito Petruzzelli, Gena Rivera | 26 June 2025 | [45] |
| TWAI-21 | The Price of Intelligence: AI Agent Pricing in 2025 | Kashif Grover | 23 February 2025 | [46] |
| TWAI-22 | Transparent and Trustworthy AI Governance with watsonx | S. Sinha, E. M. Fronza, S. Gajula | 18 February 2025 | [47] |
Table 2. Trustworthiness dimension coverage by video source: API provider and consumer perspectives.

Provider perspective:

| Dimension | TWAI-1 | TWAI-2 | TWAI-3 | TWAI-4 | TWAI-5 | TWAI-6 | TWAI-7 | TWAI-8 | TWAI-9 | TWAI-10 | TWAI-11 | TWAI-12 | TWAI-13 | TWAI-14 | TWAI-15 | TWAI-16 | TWAI-17 | TWAI-18 | TWAI-19 | TWAI-20 | TWAI-21 | TWAI-22 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Explainability | 75 | 75 | 75 | 75 | 75 | 50 | 50 | 50 | 50 | 75 | 75 | 50 | 50 | 75 | 75 | 50 | 50 | 50 | 75 | 100 | 50 | 100 | 65.91% |
| Fairness | 25 | 25 | 25 | 25 | 50 | 25 | 0 | 0 | 25 | 25 | 50 | 0 | 0 | 50 | 25 | 0 | 0 | 25 | 25 | 75 | 50 | 75 | 27.27% |
| Robustness | 100 | 75 | 100 | 100 | 100 | 100 | 100 | 75 | 100 | 50 | 100 | 100 | 100 | 100 | 50 | 75 | 100 | 75 | 100 | 100 | 75 | 75 | 88.64% |
| Data Privacy | 50 | 75 | 25 | 100 | 75 | 75 | 100 | 75 | 25 | 25 | 25 | 50 | 50 | 75 | 25 | 25 | 75 | 50 | 50 | 50 | 0 | 100 | 54.55% |
| Transparency | 100 | 100 | 100 | 75 | 100 | 75 | 75 | 100 | 75 | 75 | 100 | 75 | 75 | 100 | 100 | 50 | 75 | 50 | 75 | 100 | 75 | 100 | 84.09% |

Consumer perspective:

| Dimension | TWAI-1 | TWAI-2 | TWAI-3 | TWAI-4 | TWAI-5 | TWAI-6 | TWAI-7 | TWAI-8 | TWAI-9 | TWAI-10 | TWAI-11 | TWAI-12 | TWAI-13 | TWAI-14 | TWAI-15 | TWAI-16 | TWAI-17 | TWAI-18 | TWAI-19 | TWAI-20 | TWAI-21 | TWAI-22 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Explainability | 75 | 75 | 75 | 50 | 50 | 50 | 50 | 50 | 25 | 75 | 75 | 25 | 25 | 75 | 25 | 50 | 50 | 25 | 25 | 75 | 50 | 75 | 52.27% |
| Fairness | 25 | 25 | 25 | 25 | 25 | 50 | 0 | 0 | 25 | 25 | 25 | 0 | 0 | 25 | 25 | 0 | 0 | 25 | 25 | 50 | 50 | 75 | 23.86% |
| Robustness | 100 | 100 | 100 | 100 | 100 | 75 | 100 | 75 | 100 | 25 | 50 | 100 | 75 | 75 | 50 | 50 | 75 | 50 | 100 | 100 | 75 | 100 | 80.68% |
| Data Privacy | 25 | 100 | 25 | 75 | 50 | 50 | 100 | 50 | 25 | 25 | 0 | 100 | 50 | 50 | 25 | 25 | 50 | 25 | 25 | 0 | 0 | 75 | 43.18% |
| Transparency | 75 | 100 | 75 | 100 | 100 | 75 | 75 | 100 | 50 | 75 | 75 | 50 | 75 | 75 | 75 | 25 | 75 | 25 | 50 | 75 | 100 | 100 | 73.86% |

Coverage assessed from the API provider/consumer perspective. 0% = no coverage, 25% = minimal mention, 50% = moderate discussion, 75% = substantial discussion, and 100% = comprehensive focus.
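The Avg column can be reproduced with a one-line computation. The sketch below is our reading, assuming Avg is the arithmetic mean of the 22 ordinal coverage scores; it checks the provider-perspective Explainability row.

```python
# Provider-perspective Explainability coverage scores (Table 2, first row).
explainability_provider = [75, 75, 75, 75, 75, 50, 50, 50, 50, 75, 75,
                           50, 50, 75, 75, 50, 50, 50, 75, 100, 50, 100]
avg = sum(explainability_provider) / len(explainability_provider)
print(f"Avg = {avg:.2f}%")  # 65.91%
```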
Table 3. Theme Coverage Scores (TCS) and MoSCoW classification.

| Dimension | Stakeholder | Theme | Source IDs | n | TCS | Classification |
|---|---|---|---|---|---|---|
| Robustness | Provider | R1: Security frameworks | 2, 3, 4, 5, 6, 7, 11, 14, 18, 22 | 10 | 92.5% | Must Have |
| | | R2: Modular architecture | 1, 2, 17, 18 | 4 | 87.5% | Must Have |
| | | R3: Behavioral detection | 4, 12, 14 | 3 | 100% | Must Have |
| | | Provider Average | | 3 | 93.3% | |
| | Consumer | R1: Infrastructure readiness | 3, 5, 9, 12, 13, 19 | 6 | 95.8% | Must Have |
| | | R2: Budget caps | 2, 5, 8, 21 | 4 | 87.5% | Must Have |
| | | R3: Integration testing | 2, 4, 7, 13, 20, 22 | 6 | 95.8% | Must Have |
| | | Consumer Average | | 3 | 93.0% | |
| | | Dimension Average | | 6 | 93.2% | |
| Transparency | Provider | T1: Progressive observability | 1, 5, 8, 22 | 4 | 100% | Must Have |
| | | T2: OpenAPI specifications | 1, 2, 3, 11, 14, 15, 19 | 7 | 96.4% | Must Have |
| | | T3: Audit trails | 10, 13, 15 | 3 | 83.3% | Should Have |
| | | Provider Average | | 3 | 93.2% | |
| | Consumer | T1: Token tracking | 2, 5, 8, 21 | 4 | 100% | Must Have |
| | | T2: Cost monitoring | 2, 4, 7, 13, 20, 22 | 6 | 87.5% | Must Have |
| | | T3: Usage analytics | 6, 10, 11, 17 | 4 | 75.0% | Should Have |
| | | Consumer Average | | 3 | 87.5% | |
| | | Dimension Average | | 6 | 90.4% | |
| Data Privacy | Provider | P1: Logical separation | 2, 3, 4, 5, 6, 7, 11, 14, 18, 22 | 10 | 70.0% | Should Have |
| | | P2: Encryption | 7 | 1 | 100% * | Should Have |
| | | P3: Row-level security | 12, 20 | 2 | 50.0% | Could Have |
| | | Provider Average | | 3 | 73.3% | |
| | Consumer | P1: Validate permissions | 4, 7, 12, 22 | 4 | 87.5% | Must Have |
| | | P2: Data minimization | 2, 4, 7, 13, 20, 22 | 6 | 66.7% | Should Have |
| | | P3: Privacy patterns | 20 | 1 | 0% | Won’t Have |
| | | Consumer Average | | 3 | 51.4% | |
| | | Dimension Average | | 6 | 62.4% | |
| Explainability | Provider | E1: Machine-readable docs | 1, 2, 3, 11, 14, 15, 19 | 7 | 75.0% | Should Have |
| | | E2: Automated generation | 10, 15, 19 | 3 | 75.0% | Should Have |
| | | E3: Natural language | 11 | 1 | 75.0% | Should Have |
| | | Provider Average | | 3 | 75.0% | |
| | Consumer | E1: Doc quality signal | 6, 10, 11, 17 | 4 | 62.5% | Could Have |
| | | E2: Error messages | 1, 16 | 2 | 62.5% | Could Have |
| | | E3: Examples | 2, 3, 8, 14 | 4 | 68.8% | Should Have |
| | | Consumer Average | | 3 | 64.6% | |
| | | Dimension Average | | 6 | 69.8% | |
| Fairness | Provider | F1: Tiered access | 13, 21 | 2 | 25.0% | Won’t Have |
| | | F2: Educational pricing | 11 | 1 | 50.0% | Could Have |
| | | F3: Proportional limiting | 13, 21 | 2 | 25.0% | Won’t Have |
| | | Provider Average | | 3 | 33.3% | |
| | Consumer | F1: Pricing equity | 6 | 1 | 50.0% | Could Have |
| | | F2: Access fairness | 9, 15, 18, 20 | 4 | 31.25% | Won’t Have |
| | | F3: Sandbox access | 6 | 1 | 50.0% | Could Have |
| | | Consumer Average | | 3 | 43.75% | |
| | | Dimension Average | | 6 | 38.5% | |

Source IDs: numbers refer to TWAI-XX video sources. MoSCoW thresholds: Must Have (TCS ≥ 85%), Should Have (65–84%), Could Have (45–64%), and Won’t Have (<45%). * (P2 Encryption): single-source comprehensive coverage classified as Should Have to reflect insufficient breadth of practitioner consensus.
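The threshold rules in the footnote, together with the two-level averaging visible in the table, are simple enough to express directly. The sketch below (ours, not part of the replication package) classifies the Robustness theme TCS values and reproduces the stakeholder and dimension averages; note that the single-source override marked * for P2 Encryption is a manual judgment applied on top of these thresholds.

```python
def moscow(tcs: float) -> str:
    """Classify a Theme Coverage Score (%) using the footnote thresholds."""
    if tcs >= 85:
        return "Must Have"
    if tcs >= 65:
        return "Should Have"
    if tcs >= 45:
        return "Could Have"
    return "Won't Have"

provider = [92.5, 87.5, 100.0]   # Robustness R1-R3, provider perspective
consumer = [95.8, 87.5, 95.8]    # Robustness R1-R3, consumer perspective

# All six Robustness themes clear the Must Have threshold, as in Table 3.
assert all(moscow(t) == "Must Have" for t in provider + consumer)

provider_avg = sum(provider) / len(provider)    # 93.3%
consumer_avg = sum(consumer) / len(consumer)    # 93.0%
dimension_avg = (provider_avg + consumer_avg) / 2
print(f"Robustness dimension TCS: {dimension_avg:.1f}%")  # 93.2%
```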