1. Introduction
Today, software has become an indispensable element of almost every aspect of our lives, making daily life more convenient and efficient and supporting people's comfortable lifestyles. In 2001, 17 software developers who had adopted methods different from traditional software development techniques met to discuss their principles and practices and published the results. Agile software development methodologies (derived from the Agile Software Development Manifesto [1]) and DevOps [2] emerged, enabling development approaches that keep pace with the speed of business and the convenience of software. Full-stack engineers, who have a wide range of skills and can handle all stages of the development process—from defining system requirements to design, development, operations, and maintenance—have become active by integrating development and operations, prioritizing automation and monitoring, and improving development efficiency. Technologists keeping pace with these new development technologies support the systems that connect our lives, and each engineer continues to challenge themselves to improve quality.
On 18 July 2024, a system failure caused by a defect in security software from CrowdStrike Holdings, Inc. [3] resulted in blue screen errors on Windows devices worldwide. The incident is estimated to have affected approximately 8.5 million Windows devices, making it the largest outage of its kind in history. The global economic damage is estimated at no less than USD 10 billion, and the outage disrupted numerous systems, including medical facilities and public transportation, affecting people's lives around the world.
In Japan, frequent reports of system outages at banks and securities firms have prevented users from accessing systems or conducting transactions, causing widespread disruptions to business operations.
Given that such a critical outage occurred in systems that require high reliability and had a broad social impact, improving development productivity alone is not enough to reduce outages. It is essential to advance quality management technologies without compromising development productivity.
The latest quality-in-use model in the ISO 25000 series [4] of international standards for software quality requirements and evaluation, known as SQuaRE (Software Product Quality Requirements and Evaluation) and widely used by software quality management engineers, has added "environmental and societal risk" as a sub-characteristic of "freedom from risk", which requires consideration of measures to mitigate the social impact of systems (Figure 1). Thus, when considering software quality, it is essential for software developers to consider the social impact of failures and to ensure that users can use the software safely and with confidence.
Definition of environmental and societal risk:
The extent to which a product or system mitigates the potential risk to the environment and society at large in the intended contexts of use.
Source:
ISO/IEC 25019:2023 [5], Systems and software engineering—Systems and software Quality Requirements and Evaluation (SQuaRE)—Quality-in-use model, 3.2.2.2 freedom from environmental and societal risk.
Agile development methodologies are being adopted by companies developing SaaS and web systems to rapidly develop systems while maintaining high quality without sacrificing productivity. Various quality management and testing methodologies suitable for agile development have emerged, and to maintain development productivity, it is necessary to focus on the efficiency and automation of software testing, which is said to account for the majority of development work.
To effectively implement quality management, it is necessary to use a project Kanban board to visualize the pass rate of automated tests, the number of bugs identified, and the number of fixes, thereby clarifying the status of development productivity and quality.
Mugridge et al. [6] addressed the redundancy, repetition, and high maintenance costs associated with automation using capture-replay tools by developing a custom tool to achieve efficient automation of Acceptance Test-Driven Development (ATDD). Regarding the impact of automation, Berłowski et al. [7] automated over 70% of functional tests in a Java EE/JSF-based project using Selenium WebDriver and JUnit, established a process for continuous execution in Scrum sprints, and demonstrated a 48% reduction in the defect leakage rate and a reduction in customer acceptance test duration from five days to three days after implementation.
Jussi et al. [8] studied barriers to test automation by conducting interviews and observing data from 12 teams in six Indian IT companies. They inductively theorized the barriers and success factors for adopting and scaling test automation in agile projects, finding that skill shortages and fear of fragile tests lead to a passive attitude, whereas management's demand for quality metrics and a team's experimental culture act as drivers.
Furthermore, Kato et al. [9] proposed a method for applying the quality characteristics defined in international software quality standards to agile software development using DevOps. They demonstrated that using quality characteristics to clarify target quality in Scrum development and sprint plans can significantly reduce the number of defects detected, even as test density increases. Using quality characteristics as project quality KPIs makes it possible to visualize DoR (Deliverable/Required/Acceptable), and using metrics for each quality characteristic as quality gates for each sprint makes it possible to achieve built-in quality.
So how can we maximize the effectiveness of test automation to facilitate the collection of quality data? In general, organizational maturity in software development is commonly assessed using CMMI® [10]. Gibson et al. [11] systematically examined the extent to which process improvement using CMMI actually works in practice, using publicly available data and 13 case studies from 10 companies. A cross-sectional analysis of six indicators (cost, schedule, productivity, quality, customer satisfaction, and ROI) showed a median cost reduction of 34%, an average schedule reduction of 22%, and a productivity improvement of up to 62%.
Meanwhile, the maturity of software testing activities can be assessed with TMMi® [12]. Garousi et al. [13] conducted a multivocal literature review covering 181 articles (130 academic, 51 industry) on software testing maturity assessment and improvement (TMA/TPI) and extracted 58 maturity models, of which TMMi was the most frequently used, appearing in 34 articles. The reported business benefits of TMMi include up to a 63% reduction in defect costs, ROI payback within 12 months, and improved customer satisfaction; the operational benefits include prevention of release delays, shorter test cycles, improved risk management, and more accurate estimates based on measurements; and the technical benefits include fewer critical defects, improved traceability, test automation, and improved design techniques. The review concludes that TMMi is effective regardless of organization size or development process. It also identifies factors that inhibit the adoption of TMMi, such as lack of resources, resistance, and ROI uncertainty, and organizes measures to overcome them, providing practical guidance for practitioners considering test process improvement. In addition, Eldh et al. [14] proposed the Test Automation Improvement Model (TAIM) as a new approach to improving the quality and efficiency of test automation, complementing TPI and TMMi, which do not address test automation.
The effectiveness of test automation depends on an organization's adherence to automation principles and the guidelines for achieving them. Objectively evaluating and analyzing this adherence allows us to measure the maturity of test automation within a development organization. However, no existing maturity model explicitly maps its practices to ISO 9001 [15] clauses, leaving compliance-driven organizations without clear guidance.
We have devised the following research questions (Table 1) and elected to develop a quantifiable test automation maturity model (TAMM) employing a checklist to fill this gap, and to examine the validity of the following hypotheses.
H1: Lower maturity is associated with greater individual differences among automation engineers, greater variation in design and code, and a larger number of issues identified during reviews.
H2: A consensus-based, multi-reviewer assessment approach yields consistent checklist scores, with no material divergence after group discussion among three evaluators (κ ≥ 0.75).
Research Gap: None of the existing maturity models explicitly link E2E test-automation maturity with an ISO 9001-based QMS.
Research Goal: To devise a TAMM that bridges E2E automation and an ISO 9001-based QMS, and to validate it in three industrial projects.
Figure 2 positions TAMM against CMMI, TMMi, and TAIM along two strategic axes (ISO 9001 alignment and automation scope). Through this research, we provide methodologies for improving test automation maturity, maximizing the productivity of test work—which accounts for the majority of development tasks—through automated testing, and raising development productivity and software quality control to a higher level.
2. Related Work for Test Automation
2.1. Study of Test Automation Effectiveness
Early empirical work on test automation showed that where you focus automation is as important as whether you automate. Elbaum et al. [17] showed that prioritizing regression tests so that the cases most likely to reveal faults run first increases early defect detection and provides developers with faster feedback loops. Rothermel et al. [18] examined whether test-case prioritization accelerates fault detection in regression testing. Leveraging 10,000+ test suites and >150,000 mutants, the study provides strong quantitative evidence that inexpensive structural heuristics can nearly double early fault-detection effectiveness in practical regression-testing scenarios. Building on this, Elbaum et al. compared multiple prioritization strategies across program types and release cadences and concluded that history-based strategies outperform structural ones when change volatility is high. Together, these three studies established test-case ordering as an inexpensive automation strategy that improves quality without requiring new test artifacts. Also, Kato [19] demonstrated that scenario testing—which incorporates use cases, such as those employed in performance and load testing—can improve the effectiveness of automated system testing by including use cases involving errors made by users, such as operational mistakes.
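To make the prioritization idea concrete, the following is a minimal sketch of a history-based ordering heuristic in Python; the function, data structures, and sample values are hypothetical illustrations and are not taken from the cited studies.

```python
def prioritize_by_history(test_cases, fault_history, coverage, recent_changes):
    """Order regression tests so that tests touching changed modules and with a strong
    fault-detection history run first (a simple history-based heuristic, illustrative only).

    test_cases:     iterable of test identifiers
    fault_history:  dict test id -> faults revealed in past runs
    coverage:       dict test id -> set of modules the test exercises
    recent_changes: set of modules changed in this release
    """
    def score(test_id):
        change_overlap = len(coverage.get(test_id, set()) & recent_changes)
        return (change_overlap, fault_history.get(test_id, 0))

    return sorted(test_cases, key=score, reverse=True)


# Hypothetical example: T2 touches a changed module and has found faults before, so it runs first.
order = prioritize_by_history(
    ["T1", "T2", "T3"],
    fault_history={"T1": 0, "T2": 3, "T3": 1},
    coverage={"T1": {"billing"}, "T2": {"auth"}, "T3": {"billing"}},
    recent_changes={"auth"},
)
print(order)  # ['T2', 'T3', 'T1']
```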
Automation then moved beyond unit-level suites. Memon et al. [20] analyzed 12 rapidly evolving GUI applications and found that GUI-specific test cases expose faults that code-centric suites miss. Memon et al. also introduced "GUI smoke tests"—short, automatically generated event sequences that detected 64% of field faults in just 10% of the original execution time. These papers validated the feasibility of high-level automation for user-interface defect prevention.
Bertolino et al. [21] quantified the economic dimension with an ROI model that balanced scripting and maintenance costs against fault-removal savings. Organizations that adopted their guidelines reported payback periods under nine months. Nagappan et al. [22] linked Test-Driven Development (TDD)—often automated via continuous unit testing—to 40–90% lower defect density across four industrial projects. This signals that workflow-embedded automation multiplies the ROI predicted by earlier static models.
The scope and maturity of automation became the next order of business. The Test Automation Improvement Model (TAIM) outlines capability stages that mirror the Capability Maturity Model Integration (CMMI) but emphasize the maintainability of scripts and the analysis of results. By mapping 63 empirical studies, its authors demonstrated that organizations reaching TAIM Level 3 reduced release lead times by ~30%. Garousi and Mäntylä [23] complemented this with a multi-vocal literature review (MLR) that distilled practical "when/what to automate" heuristics—e.g., automate stable regression checks first and postpone flaky integration steps—grounded in the literature and practitioner interviews.
The practitioner voice is further amplified by Rafi et al. [24], who surveyed 115 engineers. The main issues raised were fragile scripts that caused false alarms and made maintenance difficult; the survey questions were designed based on a literature review suggesting that false alarms, or false detections, are a widely recognized issue in the industry. Parry et al. [25] conducted a systematic analysis of the fragility issues that often arise in end-to-end (E2E) testing, classified their root causes as timing, test order, and environment, and recommended isolation and retry mechanisms. They reiterated that the effectiveness of automation hinges on the reliability of the tests themselves.
The evolution of software tools has paralleled methodological insight. Berłowski et al. documented an SME's year-long Selenium WebDriver adoption, recording a 48% drop in production escapes and an 18% decline in customer-reported issues after only 320 h of automation effort. This is evidence that modern open-source frameworks lower entry barriers outside Fortune 500 contexts. Liu et al. [26] benchmarked 11 automated test-generation tools for Java and revealed that large language model–powered generators achieve 14% higher branch coverage than search-based tools, yet still lag behind developers in terms of oracle quality, indicating the current limits of AI-assisted automation.
Continuous integration (CI) systems, such as Jenkins [27], operationalize these techniques, enabling fast feedback and automatic enforcement of quality gates. The ubiquitous adoption of Jenkins thus acts as a sociotechnical enabler for the gains observed in many empirical studies.
Pérez-Verdejo et al. [28] took a quality-model perspective, mapping characteristics to automation effects. They found that maintainability and reliability improve most consistently, whereas portability shows mixed outcomes. Mariani et al. [29] synthesized these findings into a manifesto that positions test automation as the "central pillar" of modern QA and urges researchers to treat automation debt as seriously as code debt. Bridging the gap between research and DevOps, Kato et al. [30] introduced a quality classification rubric that maps testing tasks to DevOps pipeline stages. Their field data showed that categorizing tests by business risk rather than layer improved stakeholder buy-in for automation investments. Kato further argued that designing tests "automation-first" increases script reuse by 22% across sprints, aligning with behavior-driven design philosophies advocated in the Agile literature.
Readability also matters. Buse et al. [31] developed a learned metric for code readability and correlated it with post-release defects. When integrated with static-analysis-triggered test creation, readable test code was associated with 16% fewer production bugs, suggesting that the way automated tests are written influences their long-term defect-shielding power.
Pressman and Maxim [32] summarize foundational process wisdom in their practitioner text, which integrates decades of automation research into lifecycle guidance. Their chapters underpin many industry training programs, ensuring that academic findings propagate into practice.
In this way, the effects of test automation are being analyzed both academically and commercially, and research is continuing toward the realization of even better automation.
2.2. Considering Test Automation from a Quality Control Perspective
Test automation is widely used to improve the efficiency of functional testing. However, organizing automation according to international software quality standards can yield benefits beyond functional conformity. It is therefore important for those involved in quality management to have a thorough understanding of the quality characteristics that underpin the quality model.
In general, the effects of software test automation by quality characteristic are represented in Figure 3 for product quality and in Figure 4 for quality-in-use. Ensuring "functional suitability" is a fundamental aspect of testing functionality, but measuring the processing time and load status of each function during testing can also provide data to ensure "performance efficiency". Moreover, aggregating and analyzing test result reports can reveal the status and trends of weak functions, and repeatedly executing the tests can contribute to ensuring "reliability".
In the case of scenario testing, whether the product quality is to be confirmed as a functional combination test or quality-in-use is to be confirmed as a business adaptation test depends on how the scenario is implemented.
This approach is the same for both manual and automated testing. In automated testing, however, omitting the preparatory test cases from the result report makes the test results applicable to both test patterns. If a problem occurs in a preparatory test case, marking the result as "NG" instead of "Fail" clearly indicates that the problem occurred before test execution and records that the test could not be executed.
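As an illustration of this reporting rule, the following is a minimal sketch of a result-recording helper in Python; the class and function names are hypothetical, not part of the authors' tooling, and the captured duration is the kind of data that can feed the "performance efficiency" view mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
import time


class Status(Enum):
    PASS = "Pass"  # test executed and met its expectations
    FAIL = "Fail"  # test executed and revealed a product defect
    NG = "NG"      # a preparatory step failed; the test itself never ran


@dataclass
class TestRecord:
    case_id: str
    status: Status
    duration_s: float  # execution time, reusable as performance-efficiency data
    note: str = ""


def run_case(case_id, prepare, execute):
    """Run one automated case, separating preparation failures (NG) from test failures (Fail)."""
    start = time.perf_counter()
    try:
        prepare()
    except Exception as exc:
        return TestRecord(case_id, Status.NG, time.perf_counter() - start, f"preparation failed: {exc}")
    try:
        execute()
    except AssertionError as exc:
        return TestRecord(case_id, Status.FAIL, time.perf_counter() - start, str(exc))
    return TestRecord(case_id, Status.PASS, time.perf_counter() - start)
```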
As shown in Figure 5, it is possible to manage both product and application quality based on the results of automated testing.
In agile development, using automated unit test pass rates and use-case automation as sprint completion criteria helps apply test automation to quality control. Plotting pass rates and reliability growth curves for each test further clarifies the prevailing quality level. Moreover, by verifying the quality assured for each quality characteristic, test automation can further improve quality visibility.
To perform quality control more accurately, it is important to quantify testing work through test automation. In other words, measuring the maturity of test automation makes it possible to measure the maturity of testing and understand its contribution to quality control. Furthermore, integrating test automation with an ISO 9001-based QMS can improve its maturity throughout the entire development organization.
2.3. Research Methodology for Test Automation
Wang et al. [34] developed a self-assessment tool to evaluate the maturity of test automation in the software industry. In this study:
- 15 key areas (KAs) were identified through a literature review.
- An initial tool (including 77 assessment items) was developed.
- The tool was improved based on the feedback collected.
The novelty of this study is as follows:
- It identifies 15 key areas for evaluating test automation maturity.
- It develops and evaluates an assessment tool based on scientific methods.
- It discusses the importance of addressing bias (especially response bias) in self-assessment tools.
- It uses the Content Validity Index (CVI) and cognitive interviews to confirm the validity of the assessment tool.
- It addresses response bias by identifying problems in the design of self-assessment tools and suggesting improvements.
It also provides the following two benefits:
- Industry practicality: the tool helps companies identify problem areas and create improvement plans, providing an objective and quantitative assessment of test automation maturity.
- Academic contribution: the study demonstrates a process for developing assessment tools that can be applied to other areas of software engineering.
However, the initial 77-item tool takes time to complete, raising practicality issues in the field, and it lacks specific guidelines for comparing results with the industry as a whole and with each maturity level.
TAIM provides an objective improvement process that removes the subjectivity of the TPI and TMMi models, defining 10 KAs and one general area (GA) on the basis of scientifically valid measures and providing a framework for evaluating and implementing improvements in each KA in a stepwise fashion. It comprehensively evaluates the entire testing process, including test code design, environment configuration, and tool integration, and uses measurements in each KA to quantify the quality and efficiency of the test suite and make the improvement process transparent. The results reveal high test code maintenance costs and a lack of architecture as areas for improvement. TAIM is more a capability model for full automation than a maturity model.
This study employed a mixed-methods design comprising (a) checklist-based maturity assessment, (b) statistical agreement analysis among three assessors, and (c) short-term remediation followed by re-assessment.
Sample: three industrial teams (capture-replay, RPA/Power Automate, and generative AI) selected via maximum-variation sampling.
Assessment procedure: interviews and audits of the test suite, guidelines, other documents for test execution and maintenance, and test results.
Reliability: Inter-rater agreement quantified by Cohen’s κ (target ≥ 0.75).
Hypothesis testing: H1 using Spearman's ρ between the maturity score and (i) design variance σ² and (ii) review comment count; H2 using Cohen's κ (a minimal computation sketch is given below).
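To make these two statistics concrete, the following is a minimal computation sketch using SciPy and scikit-learn; the scores and ratings are hypothetical illustrations, not data from this study.

```python
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# H1: correlation between maturity score and review comment count (hypothetical per-team data).
maturity_scores = [2.0, 2.5, 3.5]
review_comments = [48, 30, 12]

rho, p_value = spearmanr(maturity_scores, review_comments)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")

# H2: agreement between two assessors over checklist items rated 1-5 (hypothetical ratings).
# Cohen's kappa is pairwise; with three assessors, compute each pair (or use Fleiss' kappa).
assessor_a = [3, 2, 4, 3, 5, 2, 3, 4]
assessor_b = [3, 2, 4, 2, 5, 2, 3, 4]

kappa = cohen_kappa_score(assessor_a, assessor_b)
print(f"Cohen's kappa = {kappa:.2f} (target >= 0.75)")
```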
Development organizations increasingly add new features and enhance the quality of existing systems or services through derivative development, and they manage the resulting incidents through a Quality Management System (QMS). The enhancement of test automation, which has been shown to contribute to development productivity and quality improvement, can be carried out concurrently with development process improvement if it is linked to the QMS. Consequently, TAMM must be examined carefully in relation to ISO 9001, alongside the assessment of a maturity model.
3. Effectiveness of Guidelines for Test Automation
In general, software development work tends to depend heavily on individual skill. Consequently, within software engineering, there has been substantial research on coding conventions, review processes, and related subjects. Code readability has a substantial influence on code quality, so when developing software we consistently adhere to coding conventions and verify compliance through reviews.
In agile development, the project charter is a document that provides a comprehensive overview of the project, encompassing its purpose, scope, budget, schedule, and risks. It is created at the inception of the project and functions as a framework for project progression (Table 2). Establishing a project charter helps share the project objectives among its members.
As with general software development, to increase the effectiveness of test automation, the development of automated tests must not become dependent on specific individuals. As shown in Table 3, the actors involved in test automation can naturally be categorized, and this categorization is the same as that used for general development work. According to their skills, designers and developers will also serve as reviewers. Based on this classification, a guideline structured as in Listing 1 is often considered appropriate. However, such guidelines only list activities and standardize the tasks to be performed; they do not improve the quality of each task. Therefore, it is necessary to establish criteria, and clearly setting and managing compliance with these criteria makes it easier to link with the QMS.
Listing 1. Draft table of contents for test automation guidelines.
1. Preparation (Common)
   - Installing the test tool
   - Building a test environment
   - Checking the operation of test tool samples
2. Test design/script implementation (Test script designers)
   - Test scenario design
   - Design/implementation of use cases
   - Design/implementation of test data
3. Test script development (Test script developers)
   - Script development
   - Script debugging
   - Test suite creation
4. Test execution (Test executors)
   - Script execution
   - Test result analysis
5. Test maintenance/Test operations (Maintenance developers)
   - Development of a general-purpose test environment (CI environment)
   - Maintenance and updating of the test environment
   - Maintenance and updating of test tools
   - Script maintenance
Based on the key points of the identified guidelines, we created guidelines for projects using Ranorex [35] and Power Automate [36].
For projects using Power Automate, we created a separate document outlining coding conventions for flows that function as automation scripts. This difference stems from the fundamental differences between Power Automate and Ranorex. Power Automate is a GUI-based, no-code development tool, whereas Ranorex requires coding. This results in lower update frequencies for coding conventions compared to guidelines, hence the separate document.
Ranorex also supports GUI-based development; however, user code can be implemented in C#, and user coding is essential to improving the quality of automated testing. In other words, logic recorded and edited on a GUI basis can be converted to user code and then converted into reusable components. As with general development, these components increase the reusability of test code.
Listing 2 is the table of contents for the Power Automate guidelines, which span over 150 pages.
The guidelines provide step-by-step explanations of development and operational procedures, including installing Power Automate and reusing scenario tests created for existing manual tests as automated tests. The guidelines also explain the necessity of subflows and how to create them. The review section describes the criteria for implementing flows and offers tips to prevent those flows from becoming person-dependent. The operation section describes how to perform build verification tests (BVT) within the project's shared environment and provides points to consider when executing BVT in the shared environment and in the implementer's local environment. It also describes the criteria for continuously upgrading Power Automate to ensure long-term stable operation.
In contrast, for projects that adopted Ranorex, the guidelines were developed using Confluence [37], a wiki-based development document management tool. Confluence was chosen because the project uses Jira [38], a ticket management tool, for task and bug management. Furthermore, Bitbucket [39], a source code management tool that can be integrated with Ranorex, was adopted to manage the test suites created with Ranorex. The guidelines are documented in the Confluence space and integrated with Jira.
The main differences between the Power Automate and Ranorex guidelines stem from the tools' inherent design differences; the rules governing test operations vary little. Because Ranorex can convert code created in its integrated development environment into user code and transform it further into components, the Ranorex guidelines focus primarily on reusability.
Listing 2. Table of contents of the guideline for the project using Power Automate.
1. About this guideline
   1.1 Purpose of the Guidelines
   1.2 Subject of the Guideline
2. Test Environment construction
   2.1 Execution environment
   2.2 Power Automate License
   2.3 How to install Power Automate for Desktop
   2.4 How to connect from Cloud Flow to Desktop Flow
3. Test Design Documentation
   3.1 Positioning of Test Design Documentation
   3.2 Template files for test design
   3.3 How to design
4. Backup data creation
5. Test files creation
   5.1 Test Case sheets
   5.2 Test Data sheet
   5.3 Generic Data Conversion Tool
   5.4 Global Variables
6. Flow creation
   6.1 Cloud Flow Configuration
   6.2 Desktop Flow Configuration
   6.3 Creation rules
   6.4 Exception Handling
   6.5 Precautions
   6.6 Confirmation of Operation after Creation
   6.7 Merge into shared environment
7. Review
   7.1 Purpose of review
   7.2 Review target
   7.3 Reviewer
   7.4 Example of review perspectives
   7.5 Checklist for review
   7.6 Record of review
8. Operation
   8.1 Operational flow
   8.2 Frequency and timing of test execution
   8.3 Test execution method
   8.4 Test result management
   8.5 Operation for daily build testing
   8.6 Test Result Reporting
   8.7 Revision up Scenario
   8.8 Conversion Scenario
   8.9 Setup test
   8.10 Past trouble collection
9. Maintenance
   9.1 Product Revision Upgrade (without scenario update)
   9.2 Product Revision Upgrade (with scenario update)
   9.3 PAD Version Upgrade
   9.4 Script management
For this project, we used Jira to oversee task execution and track software defects, and managing the test automation documentation in Confluence spaces has proven effective in improving efficiency.
The Ranorex guidelines clearly define the use of test suites and test modules for managing Ranorex test cases and provide important considerations for implementing user code. When used together with coding conventions (Figure 6), they prevent inconsistencies between developers and reduce dependency on specific individuals. Furthermore, they clearly explain the concept and implementation of script verification, clarifying test design and implementation rules and enabling measurement of test coverage.
Ranorex includes a data binding feature that allows tests to be executed repeatedly using different data sets. Utilizing this feature increases test coverage and enables the seamless automation of screen operation tests incorporating diverse data sets. To use the data binding feature without creating dependency on individuals, it is necessary to establish rules. Additionally, to distinguish between software and script issues, it is essential to clearly define test result reporting rules. Depending on the test execution order or environment, “flaky tests” may arise, causing test results to become unstable. Therefore, test reports are important for reducing these issues, and report rules play a crucial role in this process.
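Ranorex's data binding is proprietary, but the underlying data-driven idea can be sketched in pytest terms; the following is an illustrative sketch only, with hypothetical data values and a stand-in for the real UI driver, not the project's actual scripts.

```python
import pytest

# External data set; in Ranorex this would come from a data connector (CSV, Excel, database).
LOGIN_CASES = [
    ("standard_user", "correct-password", True),
    ("standard_user", "wrong-password", False),
    ("locked_user", "correct-password", False),
]


@pytest.mark.parametrize("user, password, should_succeed", LOGIN_CASES)
def test_login(user, password, should_succeed):
    """One screen-operation scenario executed repeatedly against different data rows."""
    result = fake_login(user, password)  # placeholder for the real UI driver call
    assert result == should_succeed


def fake_login(user, password):
    # Stand-in for the automated screen operations; replace with the real driver.
    return user == "standard_user" and password == "correct-password"
```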
Ranorex can be integrated with Jenkins to build a continuous integration (CI) environment. Recovery scripts and error handling for test failures improve test continuity by enabling repeated execution of automated tests and early detection of degradation.
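The recovery-and-retry idea mentioned here can likewise be sketched generically; the decorator below is a hypothetical helper, not Ranorex or Jenkins functionality, and in practice the caught exceptions and the recovery action would be tool-specific.

```python
import time
from functools import wraps


def with_recovery(retries=2, delay_s=5, recover=None):
    """Re-run an automated test step after a failure, optionally running a recovery action first."""
    def decorator(step):
        @wraps(step)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return step(*args, **kwargs)
                except Exception:          # in practice, catch tool-specific errors only
                    if attempt == retries:
                        raise              # out of retries: report the failure as-is
                    if recover is not None:
                        recover()          # e.g., restart the browser or reset test data
                    time.sleep(delay_s)
        return wrapper
    return decorator


# Usage: decorate a fragile E2E step, e.g. @with_recovery(retries=1, recover=restart_browser)
# where restart_browser is a hypothetical recovery routine supplied by the test project.
```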
Establishing guidelines eliminates human dependency, ensures data quality, improves test accuracy, and contributes to the development of test automation personnel.
Test automation’s effectiveness in agile development methods can be enhanced by combining it with ideas and guidelines related to quality characteristics in agile process sprints. This combination ensures quality checks in each sprint are performed effectively and that each quality characteristic is met when adding new features.
Standardizing test automation provides the same benefits as the software engineering development process. For example, it improves the accuracy of development man-hour estimates and contributes to quality control. In other words, test automation guidelines should be consistent with development rules and their effects on the software engineering development process from the beginning. It is also desirable to apply general development process rules to test automation.
Similar to organizational project management evaluation processes, such as CMMI, developing a test automation maturity model and establishing assessment methods enables objective measurement of guideline effectiveness. Furthermore, integrating with a QMS based on ISO 9001 clarifies the level of test automation adoption and the challenges to quality improvement within the development organization.
4. Developing TAMM
4.1. Preparation for Maturity Model Building
Because TAIM does not consider compatibility with ISO 9001, we needed to examine methods that can be linked to quality management without compromising the usefulness of previous studies.
In constructing TAMM, we followed the common practice of using CMMI as a guideline and reference framework for enhancing test processes. Furthermore, the international ISO/IEC 33000 family of standards was considered for the evaluation of TMMi and for assessment methodologies. Since the TMMi maturity model, a testing counterpart of CMMI, is widely used by test teams, we decided to base TAMM on the TMMi model.
The constructed model places significant emphasis on E2E testing, with an underlying focus on continuous improvement. It uses the level of compliance with established policies and rules as a metric for evaluating maturity, enabling a comprehensive assessment of strategic positioning.
4.2. Development of TAMM
CMMI and TMMi are both five-level models. However, since we excluded organizations that do not implement test automation at all, we defined TAMM with four levels. Each level of the test automation process is represented in Figure 7.
Table 4 presents an overview of the status of each level. It also indicates the status that should be aimed for in order to reach the next level.
The model that was developed demonstrates the necessary steps to achieve the subsequent level. The model enables comprehension of the present state of test automation, even in the absence of an assessment. The model was initially presented at a seminar on test automation, where it was met with favorable feedback from numerous attendees.
This model facilitates comprehension of the levels of test automation and elucidates the issues that must be addressed to achieve the subsequent level.
Level 1 is a state in which engineers automate processes without any established rules, relying exclusively on their own methods. In essence, the efficacy of automation is contingent upon the actions of individuals. In general, E2E test automation code and flows created in this state are not only difficult to share with other engineers but also challenging to maintain and operate in the long term. Even unit test code, which is frequently developed by software code implementers, must be designed to avoid becoming dependent on individuals. To advance beyond this stage, it is imperative to elucidate the purpose of test automation, enhance teamwork awareness, and establish and disseminate minimum rules, such as coding conventions and test pass/fail criteria, as guidelines.
Level 2 signifies the ability to design and execute automated tests with a focus on quality enhancement and maintaining a certain level of quality. This is distinct from the mere automation of manual test scenarios. The team begins to recognize the benefits of test automation, and guidelines and rules for test automation are established. In order to progress to the subsequent level, it is imperative to establish a reproducible operational environment and employ build verification testing (BVT) to expeditiously identify degradation. Furthermore, the integration of continuous integration (CI) environments is crucial.
Level 3 is the state where automated tests can be repeatedly executed using external data, i.e., the state where automated testing is utilized as data-driven testing. In order to achieve data-driven testing, it is imperative to design and implement tests with automation in mind. When establishing automation goals, it is imperative to consider the potential benefits of test automation for the entire development organization or project. As an organization reaches this state, it can enjoy more benefits of automation. As the level of automation increases, the organization can leverage the results of automation to improve quality visibility and productivity.
The final level 4 indicates a state in which the organization maintains the level 3 status quo while continuously improving the test automation process by addressing various issues that arise. At this level, test automation is considered a standard component of software development, and the purpose and objectives of test design, implementation, and operation are not dependent on specific individuals. Instead, the implementation of test automation is based on a common understanding within the organization. Automated test results are utilized for the purpose of quality control and are represented visually in real time. Furthermore, various issues that arise in the lifecycle of automated test design, implementation, execution, and operation are addressed within the organization, and activities are carried out to further enhance the effectiveness of automation.
However, the application of this maturity model alone is insufficient to achieve Level 4. Therefore, an assessment checklist was developed, drawing upon previous examples and CMMI and TMMi. This initiative was undertaken in recognition of the imperative to identify issues across the entire test automation lifecycle and to contemplate countermeasures. The objective of conducting evaluations using the checklist is to establish a long-term roadmap for visualizing quality. This will enable the organization to reach the next maturity level and contribute to the reorganization of its development processes and improvements in development productivity.
5. Methodology of Assessment Checklist
5.1. Consideration of Assessment Methods
The checklists employed in the assessment are designed to align with quality improvement through quality management principles. In addition to CMMI and TMMi, which were referenced during the maturity model development, we have drawn extensively on ISO 9004 [40], an international standard created as a self-assessment tool for ISO 9001, which defines the requirements for quality management. TAMM's requirements 7 and 8 refer to ISO 9004, allowing them to be linked to clause 9 of ISO 9001, Performance evaluation. Additionally, all automation-related issues are linked to improvement activities through management reviews.
First, the requirements for test automation are grouped into eight categories. The rationale is that test automation is effective for quantitatively assessing software quality. In quality management that adheres to ISO 9001 and fosters continuous improvement, it is essential to evaluate the maturity of the test automation lifecycle process and to promote improvement activities. By referring to established guidelines and rules, it is possible to determine the level of compliance within projects and organizations, measure the maturity of test automation, and understand the capability to meet requirements. Little research has examined the maturity of test automation in the context of a QMS, particularly with regard to promoting improvement. Given the prevalence of organizations that add features and enhance software quality through derivative development, it is reasonable to expect a link between advances in test automation and the QMS.
Each requirement comprises four to five items and focuses on more than the mere presence or absence of guidelines or policies. Additional considerations include the modularization and reusability of test scripts and flows, as well as parallel execution during test runs. The requirements in Table 5 were formulated with consideration for the use, implementation, and operation of design documents and scripts.
5.2. Development of Assessment Checklists
For each requirement, we prepared assessment questions that could be scored. Referring to Annex A of ISO 9004 and TMMi, we prepared 11 questions that can be used to determine the compliance status of each requirement. By answering the questions, it is possible to assess each requirement (Table 6).
Given the four levels of the maturity model and the process and improvement categories, which are based on ISO 9001, the checklist scores range from 1 to 5, similar to ISO 9004. Note that a single score is assigned for each requirement, with a maximum score of five indicating that all of its items are fulfilled.
The checklist is to be completed by the assessor; however, an assessor with a high level of experience in automation will need to conduct interviews and make corrections to ensure that there are no problems with the descriptions.
Ultimately, a cumulative score is determined for each category, and the scores from the eight categories are presented in a radar chart to illustrate the variation in scores for each requirement. When determining the score for each item, we made sure to discuss and decide among multiple assessors to avoid inconsistencies.
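As an illustration of this aggregation and visualization step, the following is a minimal sketch using matplotlib; the category names and scores are hypothetical placeholders, since the actual requirement categories are those defined in Table 5.

```python
import math
import matplotlib.pyplot as plt

# Hypothetical category scores (1-5); the actual categories are defined in Table 5.
scores = {
    "Policy": 3.0,
    "Design": 2.5,
    "Implementation": 3.5,
    "Execution": 3.0,
    "Results": 2.5,
    "Maintenance": 2.0,
    "Process": 2.0,
    "Improvement": 2.0,
}

labels = list(scores)
values = list(scores.values())
# Close the polygon by repeating the first point.
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))] + [0.0]
values += values[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)  # checklist scores range from 1 to 5
ax.set_title("TAMM assessment radar chart (hypothetical scores)")
plt.show()
```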
6. Result
6.1. Assessment Implement Policy
The assessment of the test automation project was conducted in accordance with a predetermined policy: checklist responses were collected within the organization to be assessed, followed by interviews with members of the target organization, and various pieces of actual evidence were also confirmed. In particular, the sufficiency of each checklist item was determined through a combination of methods, including sampling-based confirmation of test scripts and flows and checking of test result reports.
The final assessment implementation policy is illustrated in Figure 8.
6.2. Result of Assessment
We considered simply assigning a score to each question on the checklist, but concluded that the assessor should first score all requirements and then review the checklist again, because it is important to re-examine the answers to related questions in light of the existence of policies and the status of processes. To ensure the consistency and reliability of the assessment, a thorough procedure was implemented: the item scores were determined by a panel of multiple assessors to avoid inconsistencies or variations in interpretation.
For example, even if a project or organization has a policy, it may be that only part of the policy is being used or that some people are not following it, so it is necessary to review the overall score.
It is difficult to determine where to place the assessment results on the maturity model, but we believe it is best to make a level judgment based on the presence or absence of policies and audit rules, as well as the implementation status.
Rather than simply presenting the results of the analysis, we believe that adding short- and long-term improvement points makes it easier to target the next level.
Short-term improvement activities focus on identifying issues in the current process and improving them to increase maturity. Long-term improvement items, on the other hand, include issues that are likely to arise after implementing short-term improvements, examples of issues that will become apparent through repeated improvements, and other items necessary to increase test automation maturity over the long term. Finally, we decided to present a roadmap for reaching Level 4, which is an expandable level, through repeated improvements.
The evaluation results report is based not only on a summary of the evaluation results, but also on the analysis results for each question item of the requirements. We have clearly stated the short-term improvement actions for each requirement and identified the issues for each item.
- (1)
Example of a team using the capture replay tool
The team performing E2E testing with a capture-replay tool, OpenText Functional Testing [41], received a low overall score (Figure 9).
While the team has documented automation rules in a wiki, these are essentially notes from the script writers, and third parties cannot maintain the system based on these records. There is no documentation of the purpose, goals, or value of automation, making it difficult to share the rationale for automation within the team.
Test design for automation has not been performed, and existing manual tests are being automated. Priorities for automation also appear to be set on an ad hoc basis. As a result, even when reviewing test results, it is unclear what level of quality is being ensured by automated tests, making the effectiveness of testing unclear.
Despite progress in automating E2E testing, the effectiveness of automation remains unclear due to the high degree of task specialization. By clarifying the guidelines for tests to be automated, it is believed that the effectiveness of automated testing can be shared across the team.
- (2)
Examples of test automation with RPA tools
Figure 10 shows the evaluation results for E2E testing using Power Automate, the project for which the guidelines described above were created. The guidelines are well established, and the processes of test design, test flow implementation, review, and operation are performed without reliance on specific individuals, resulting in a high overall score. However, the automated tests primarily automate the E2E test scenarios used in manual testing from a business perspective, so the project obtains only partial benefits from test automation. Although the suite is used as BVT, its effectiveness as a degradation check is low because the operational phase is significantly delayed in the project timeline. In addition, the guidelines and review processes rely heavily on external consulting teams, making it difficult to establish an internal improvement process.
- (3)
Example of E2E test automation using generative AI
Finally, we present the evaluation results of a team that uses a proprietary generative AI tool to generate pytest [42] scripts with Playwright from manual scenario tests for E2E testing, thereby automating the process (Figure 11).
The generative AI tool enables the creation of automated tests immediately after test design is complete, resulting in high automation coverage for E2E testing. It supports automated scenario testing not only for functional combinations but also for business verification, contributing significantly to development productivity. However, because the tool is used to automate existing manual test scenarios, its data-driven testing capability is currently underutilized, which is a key challenge.
It is considered necessary to establish processes, guidelines, and improvement plans for test design rules to address the need for script regeneration due to tool version updates and to further drive automation.
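For reference, an automated E2E case of the kind such a tool emits can be sketched with pytest and the pytest-playwright plugin; the URL, selectors, and scenario below are hypothetical illustrations, not output from the team's proprietary generator.

```python
# Requires: pip install pytest-playwright && playwright install
from playwright.sync_api import Page, expect

BASE_URL = "https://app.example.test"  # hypothetical application under test


def test_login_scenario(page: Page):
    """A manual login scenario rewritten as an automated E2E check (hypothetical URL and selectors)."""
    page.goto(f"{BASE_URL}/login")                       # step 1: open the login screen
    page.get_by_label("User ID").fill("demo-user")       # step 2: enter credentials
    page.get_by_label("Password").fill("demo-password")
    page.get_by_role("button", name="Sign in").click()   # step 3: submit the form
    expect(page).to_have_url(f"{BASE_URL}/dashboard")    # step 4: verify the expected transition
    expect(page.get_by_role("heading", name="Dashboard")).to_be_visible()
```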
6.3. Statistical Tests
Hypothesis 1: A quantitative approach is recommended for analyzing fluctuations in the number of review comments; however, data collected with the same testing tool are not yet available. In the case of the RPA tool, review comments are concentrated in the automated test designs because those designs reuse existing manual scenario tests without modification, and the existing scenario tests lack design rules and criteria. Quality improvement through reviews is therefore necessary. Currently, multiple evaluators perform a qualitative analysis of the results, and the effectiveness of automated testing in identifying software defects is likewise evaluated qualitatively.
For Hypothesis 2, if the scores given by the evaluators differ, the quality of the target software is classified based on quality characteristics, discussions are held based on the qualitative analysis results, and the final decision is made by consensus among the three evaluators. Hypothesis 2 requires that Cohen's kappa (κ) among the three evaluators across the 39 requirement items be 0.75 or higher; the current approach meets this criterion.
Going forward, we plan to introduce a mechanism to calculate Spearman’s rho (ρ) in Hypothesis 1 and Cohen’s kappa (κ) in Hypothesis 2.
7. Discussion
Based on the findings of the TAMM assessment, we recommend that the team responsible for implementing automation with Power Automate promptly revise its guidelines. The revised guidelines classify each chapter by policy and stakeholder and organize the content into the following categories: automation objectives and improvements, preparation, automation design, implementation, design/implementation review, operation, and maintenance.
Implementing automation in accordance with the revised guidelines is expected to increase the target values of TAMM’s “8. Improvement” and “7. Process” requirements from 2.0 to 2.5. Clarifying the processes in the guidelines enables a more thorough evaluation of each task’s compliance with the guidelines and improves automation quality by clarifying the criteria for each process. Improving the test automation process improves test automation activities under QMS management. TAMM increases maturity by integrating test automation issues with QMS, thereby improving overall development quality.
However, revising the guidelines takes two to three months. Implementing and operating test automation in accordance with the revised guidelines and conducting another TAMM evaluation after the product is released takes at least six months to a year.
8. Conclusions
This study has two limitations: sample size (n = 3) and follow-up period (six months). Nevertheless, the applicability of the proposed method has been demonstrated with three types of test tools. It has also been shown that implementing short-term measures based on assessment results can raise the maturity level immediately from 2.75 to 2.85—an increase of 0.1 points.
To increase the effectiveness of test automation, organizations must comprehensively review their policies and development processes. However, the absence of a mechanism to analyze the current state poses a significant challenge, hindering the identification of areas for improvement and the determination of necessary actions. Furthermore, even if such improvements are successful, it is difficult to apply unified policies and enhanced development processes to different organizations within the same company because of differences in maturity levels.
With the advent of agile development methods, scenario testing using use cases has expanded, and evaluation results are used as various criteria. Consequently, the number of organizations adopting end-to-end (E2E) test automation is expected to increase in the future. Therefore, measuring the maturity of E2E test automation will make it easier to identify areas for improvement based on the current situation and consider countermeasures, contributing to the improvement of the development process. To address this issue, we developed TAMM, which is compatible with QMS. We defined maturity levels and requirements and devised an assessment checklist and evaluation method. This method includes sample surveys and interviews.
The effectiveness of this evaluation method is as follows:
- First, we conduct empirical evaluations to obtain quantitative measurements targeting organizations that utilize multiple evaluation tools.
- Three evaluators determine scores and agree upon them to ensure κ ≥ 0.75.
- When combined with a QMS based on ISO 9001, TAMM can contribute to the maturity of development organizations and improve productivity by addressing automation issues, particularly in processes and improvement measures.
- Test automation significantly improves software development productivity and enables quality visualization through automated test results.
From a quality management perspective, conducting test automation maturity assessments to evaluate development organization maturity and identify future challenges is a major benefit. Additionally, the consulting services provided by the author’s organization include identifying support measures for customers based on assessment results and developing countermeasures jointly. Deploying this method to various development organizations and combining it with data accumulation for improvement is expected to achieve quality visualization and productivity improvements through test automation.
To improve organizational maturity, it is essential to incorporate human resource development elements, such as accumulating test automation knowledge and establishing training programs, into the evaluation framework in addition to technical capabilities. Therefore, formulating test automation guidelines that consider the four Ps of ITIL [43] (people, process, products, and partners) is a prerequisite for considering long-term improvement measures. Furthermore, strengthening the evaluation of test automation by utilizing generative AI and promoting the evolution of evaluation methods is essential.
Future improvements to TAMM will collect additional evaluation result data and prepare a score determination process based on quantitative analysis. Additionally, we will develop a mechanism for setting target scores based on the software domain and required quality. We will also consider analyzing evaluation results that take development productivity into account.
Our goal is to promote the adoption of the proposed methodology and disseminate TAMM guidelines and evaluation checklists to support self-assessment and self-improvement processes within a broader range of organizations, alongside their quality management systems (QMS).