Article

Exploring the Impact of AI Tools on Cognitive Skills: A Comparative Analysis

Department of Business and Economics, and Law, Faculty of Human and Social Sciences, Åbo Akademi University, 20500 Turku, Finland
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(10), 631; https://doi.org/10.3390/a18100631
Submission received: 23 August 2025 / Revised: 3 October 2025 / Accepted: 4 October 2025 / Published: 7 October 2025
(This article belongs to the Special Issue Evolution of Algorithms in the Era of Generative AI)

Abstract

This study evaluates the impact of Generative AI (Artificial Intelligence) algorithms on human decision making in complex problem-solving tasks. Rather than assessing the algorithms in isolation, we focus on how their use shapes three critical cognitive components of decision making: analytical thinking, creative thinking, and systems thinking. In an experimental setting, student participants were randomly assigned to solve management consulting cases either with or without access to an AI algorithm. Their solutions were evaluated using a structured rubric that captures sub-skills across the three cognitive dimensions. The results of this exploratory study reveal that AI-supported participants achieved stronger outcomes in logical reasoning, structuring, and problem definition, but showed weaknesses in novel idea generation, multidisciplinary integration, and critical rejection of unsupported conclusions. These findings highlight how algorithmic support influences the configuration of human cognitive processes in decision making.

1. Introduction

Recent studies and reports indicate the presence of a skill gap across different professions [1,2], with technological advancements considered one of the main causes of changing demand [3,4]. For example, findings from a study focused on the Fintech (Financial Technology) sector [5] show that recruiters expect an optimal combination of both soft and hard skills from candidates. The same study highlights that problem-solving skills are among the core skills in emerging Fintech positions [5]. An overemphasis on hard skills may address relevant gaps on one side of this issue, while soft skills remain in high demand [4].
The emergence of new technologies, e.g., big data, data analytics and data science, and the Internet of Things (IoT), evidently requires new or enhanced hard skills to apply and utilize these tools effectively. Meanwhile, the extent to which new technologies change the priority of soft skills in demand is less evident and can be considered a secondary effect. In other words, operating and managing a particular new tool is the initial step that enables basic applications, whereas the skills required to operate and manage successfully in more complex situations are secondary and determine the quality of an application’s output.
A review of the literature [1] highlights that teamwork, communication, and domain knowledge are some of the key skills for emerging data-related professions, even though hard skills, such as programming languages and tools, remain essential for functionality. Furthermore, while tools like ChatGPT (based on the Generative Pre-trained Transformer, GPT) can be used for programming tasks, the generated code must be reviewed [6], which indicates the important role of corresponding soft skills.
In this study, we aim to understand the role of several cognitive skills identified as critical in various sources: creative, analytical, and systems thinking. Based on the 2023 report of the World Economic Forum (WEF) [7], two of these skills, creative and analytical thinking, are ranked as the most important skills overall, based on what companies value in employees. The third skill, systems thinking, is ranked 11th overall, while being the third most important cognitive skill. The survey also shows that problem-solving-related cognitive skills, such as creativity and analytical thinking, are expected to experience the highest increase in demand. In other words, while these skills are already critical, creativity and analytical thinking are expected to strengthen their position even more. Furthermore, systems thinking is also among the top-ranked skills in terms of its expected rise in significance for businesses.
Furthermore, the same report notes that the current results, when compared to similar WEF studies from earlier years (e.g., 2018, 2020), suggest that automation drives the rise of creativity as an essential skill more strongly than it drives the rise of analytical thinking. In other words, although both skills are ranked highly and their significance is expected to rise, creativity, currently listed second, is increasing in importance at a faster rate. These observations indicate the rising importance of soft skills, yet the shift in their relative importance is an important matter that warrants further research.
Therefore, the objective of this research is to study the effect of one of the most important emerging technologies, Generative AI (Artificial Intelligence), on cognitive skills. In particular, this study focuses on how the involvement of the current top three cognitive skills in business and consulting problem solving changes with the incorporation of Generative AI. For this purpose, the following research questions have been formulated:
  • What cognitive skills are expected to become most critical with an increased reliance on AI in problem solving?
  • What are the most commonly observed problem-solving-related benefits and challenges of Generative AI?
To answer these questions, we conducted an experiment with the involvement of university master’s and doctoral students, as well as recent graduates. The problem-solving tasks were designed to resemble typical cases frequently encountered in the management consulting profession, a domain that has been a focus of some of the earliest studies assessing the impact of Generative AI [8]. The experiments comprised two tasks, with each participant solving one of the tasks. Participants were randomly assigned to the treatment and control groups. The treatment group, in addition to available online resources and tools, was granted access to using Generative AI while solving the case. The performance, i.e., solutions to the cases with and without Generative AI, was evaluated separately, and a comparative analysis was conducted. The evaluation was based on a carefully designed rubric.
The rest of this paper is structured as follows. Section 2 reviews prior research on Generative AI and cognitive skills, identifying gaps this study addresses. Section 3, the Methodology, describes the experimental design, data collection, and evaluation criteria. The findings from the comparative analysis of AI-assisted and non-AI-assisted participants are presented in Section 4. Section 5, the Discussion, interprets the results in the context of the existing literature. Finally, key insights, contributions, limitations, and future research directions are presented in Section 6.

2. Literature Review

This section explores the existing understanding of soft skills, in the context of Generative AI, focusing on their role in enhancing or complementing problem-solving capabilities. It draws on recent studies to discuss the interplay between AI tools and cognitive skills and identifies areas where AI may augment or challenge traditional approaches to skill development.

2.1. Soft Skills

Although the term “soft skill” has been prevalent since the second half of the 20th century, its meaning has been continuously changing [9]. As no single definition of the term “soft skills” exists, the choice of definition can greatly shift the focus of any study. In this research, we rely on the contribution of Marin-Zapata et al. [10], who review the relevant literature on soft skill-related studies. Based on their findings, soft skills can be characterized as a combination of intra- and interpersonal skills, which aligns with the findings of, e.g., Matteson et al. [11]. However, the study also shows that soft skills are approached from uncertain and broad perspectives that encompass not only skills but also related concepts, such as values and attitudes [11]. For example, as highlighted by Robles [12], business executives define a “positive attitude” as one of the top soft skills they expect employees to embody.
Matteson et al. [11] discuss definitions of soft skills, distinguishing them from related terms and concepts. They show that the components that comprise the term may vary. Unlike hard skills, soft skills are intangible and domain-independent [11,13], and their use does not depend on particular tools [14]. Furthermore, soft skills are challenging to quantify and assess [11,13]. At the same time, Hendarman [14] states that, based on the literature, classifying a skill as hard or soft is not always clear; for example, conceptual thinking can be measured by the Intelligence Quotient (IQ). Therefore, in this research, we do not attempt to cover all possible soft skills but narrow the focus to a specific subset motivated in the introduction: cognitive skills.
A key source for identifying the most important cognitive skills is the Global Skills Taxonomy interactive tool (https://www1.reskillingrevolution2030.org/skills-taxonomy/index.html, accessed on 26 January 2025) of the WEF. The objective of the Global Skills Taxonomy is to serve as a universal and integrative framework that can also align with other taxonomies [15]. Moreover, the interactive tool presents the definitions of various skills and related categories. The focus of this study falls within the category defined as Cognitive (Analytical) skills. The framework also includes other categories, such as physical, technological, management and customer-related, and engagement. The Cognitive (Analytical) skills category encompasses “...learning, thinking, reasoning, remembering, problem solving, decision making, and attention”. Within this category, our focus is on “creativity and problem solving”, defined as “qualitative reasoning and ideation” and including the “systems thinking”, “analytical thinking”, and “creative thinking” skills.
Focusing on cognitive skills is particularly essential in the context of integrating AI into management consulting. Tools like ChatGPT (https://chat.openai.com/), built on the Large Language Model (LLM) Generative Pre-trained Transformer (GPT), have the potential to enhance consultants’ efficiency and output quality by automating data analysis and generating insights, allowing consultants to focus on strategic tasks [16]. At the same time, productivity gains from AI are not guaranteed and require substantial efforts at both the organizational level (e.g., optimal resource allocation) and the individual level (e.g., appropriate skills).

2.2. Analytical Thinking

According to the Global Skills Taxonomy, analytical thinking is defined as the “capacity to break down concepts and complex ideas into basic or fundamental principles. This includes critical thinking, whereby judgments are made by analyzing and interpreting facts and information.” In the academic literature, however, there is no consensus on a unique definition of the term “critical thinking”.
In a recent study, Altun and Yildirim [17] examine teachers’ perspectives on and definitions of critical thinking based on a review of the relevant literature. Their findings show that numerous skills are considered critical parts of, or at least associated with, critical and analytical thinking: argument evaluation, (critical) questioning (and inquiry), investigating reliable sources, problem solving, decision making, information evaluation, defining terms, acquisition and implementation of knowledge, understanding causation, criticism, analyzing, interpreting and explaining, inference making, and recognizing the complexities of a different perspective. This list also corresponds to most of the concepts associated with critical thinking in various other definitions.
Based on a review of relevant studies [18], analytical thinking skills can be considered a crucial indicator of critical thinking. Furthermore, the study by Mayarni and Nopiyanti [19] shows that critical thinking skills have a high correlation with and a potential to predict analytical skills. Critical thinking is defined as having five indicators, including easy explanation, inference, strategy, elaborating, and building basic skills. Analytical thinking skills, by contrast, consist of skills to distinguish, organize, and connect. However, both thinking skills are intended to assist in problem solving and involve logical thinking [19].
Similarly, Adams [20] defines critical thinking skills as an important contributor to competence in problem solving. Adams [20] identifies several abilities that critical thinking encompasses, including strategy formulation, collecting and usage of applicable data and information, argument assessment, identifying logical interrelations, generalizations and inference verification, and judgment-based decision making.
In line with the WEF taxonomy, Johnson [21] identifies analytical, creative, and critical thinking as enablers of problem-solving capabilities. However, Johnson [21] distinguishes between critical and analytical thinking as separate skills, while also noting that both terms, as well as problem solving, are often used interchangeably. Both are categorized as “Higher-Order Thinking Skills” that demand “deep” thinking. Starting from a review of the terms’ etymology, the author defines “to analyze” as disassembling something into its key components. Furthermore, Johnson [21] refers to people with high levels of analytical thinking skills as inspectors, identifiers, and categorizers, who are capable of using various strategies and methods to comprehend interrelationships, operating principles, and linkages between parts, as well as comparisons and dissimilarities, and to deconstruct and reassemble elements and concepts. From the critical thinking perspective, by contrast, “to criticize” refers to the evaluation of value, worth, quality, and deficiencies. Critical thinking can thus be defined as involving skepticism and posing challenging questions, and it requires an assessment and fundamental knowledge of the field that can be acquired via analytical thinking [21].
In their study of critical thinking in students’ essays, Marni et al. [22] identify four components of critical thinking: inference, interpretation, analysis, and evaluation. Furthermore, the authors note that analytical skills reflect learners’ critical thinking skills, and the analysis component has been found to be the central part in developing argumentative writing. The analysis of the differing patterns in the writing and related literature also highlights several subcomponents, including trying, investigating, solving problems, and making decisions (inference subcomponents), analyzing the mistakes, considering alternative approaches, forming a hypothesis and assumptions, developing possible results, and rejecting unsupported conclusions. These are all part of the critical thinking components.

2.3. Creative Thinking

According to the Global Skills Taxonomy, creative thinking is defined as the “capacity to bring a new idea or concept into existence through imagination and to imagine something that does not exist”. Many existing definitions, however, do not necessarily imply formulating concepts, ideas, or products from nothing. Instead, creativity is often described as a synthesis of the different components, parts, or ideas that are present by that time. For example, Fabian [23], p. 17 proposes the following definition: “reasoning that uses imagination to substitute, expand, modify, or transform the symbols, images, ideas, patterns, conditions, or elements in the world around us.”
For creative thinking, analytical thinking is also an important enabling skill [24]. Similarly, reflecting the established connection between critical and analytical thinking, critical and creative thinking skills can be considered complementary, with creativity enabling the development of alternative solutions and explanations for problems [25].
Based on the literature, creativity involves two main features: novelty and appropriateness, both of which must be present [26]. The degree of novelty and the range of appropriateness may vary. There are also additional features, such as significance, quality, and production history that affect how creative a particular product or idea has been judged as being, as well as the extent to which the main features are valued. However, “there is no absolute standard for creativity”; it is comparative and can differ depending on the audience as well as evolve over time [26], p. 292.
Moreover, different components of creativity are widely considered in the literature, such as intellectual skills. The starting point for creativity lies in the ability to identify problems. This involves questioning existing solutions and approaching challenges with the focus on optimizing and improving existing solutions. Furthermore, the identified problem and its goals need to be defined, specified, and represented (e.g., through “visual thinking”) in a clear and accurate manner. Selecting a strategy, including the choice between divergent and convergent thinking, is the next ability that may impact creativity. The literature shows that using both types of thinking and shifting between them is an effective strategy. Finally, being able to evaluate and select the most suitable idea is another essential “high-level skill”. There are also several basic information-related abilities that contribute to creativity, e.g., recognizing new information, analogies, information consolidation, and comparison. Other enabling factors for creativity include knowledge (e.g., facts, heuristics), thinking style (e.g., ability to initially see the big picture), personality (e.g., openness, willingness), motivation (e.g., intrinsic and task-focused, extrinsic and goal-focused), and environment (e.g., in the workplace, social). These components can be interrelated and must reach a minimum required level to support creativity [26].

2.4. Systems Thinking

According to the Global Skills Taxonomy, systems thinking is defined as the “capacity to understand how concepts work together, embrace a multidisciplinary approach, and identify patterns over time.”
The skill set typically associated with systems thinking, as well as definitions of the systems thinking concept, varies in the literature [27]. The comprehensive study of systems thinking by Arnold and Wade [28] identifies a group of facilitating skills and their measurement. Based on this study, there are two main high-level aspects of systems thinking: obtaining insights and using insights. Obtaining insights primarily involves “mindset skills” (e.g., studying a range of perspectives, holistically and in parts, and mental modeling), whereas using insights encompasses skills related to the “content” (e.g., recognizing issues as systemic, preserving systems boundaries, recognizing distinctions among components, and quantifying), “structure” (e.g., identifying interconnections and their characterization, and “feedback loops” and their characterization), and “behavior” (e.g., past behavior, future behavior forecast, timely changes, and response, such as “re-evaluate one’s strategy”).
Zanella [29] conducted a study with the aim of facilitating access to a validated tool for evaluating students’ systems thinking. In that study, the author also referred to four main categories: systems definition, interaction, flows, and balance. The skills within these categories show similarities with the study of Arnold and Wade [28], such as defining and identifying systems and boundaries, components and their impact, and feedback loops.

2.5. GPT, Cognitive Skills, and Problem Solving

Academic studies are increasingly focusing on applications of Generative AI beyond its technical capabilities (e.g., coding). For example, Rick et al. [30] explore an LLM’s capacity to assist with creative problem solving. The researchers developed GPT-3.5-based systems that incorporate an interface, fine-tuned models, and techniques designed to elicit ideas. In the research, participants were asked to present a problem description and then consider and react to the results. The results show that using the system while solving problems can support creativity by facilitating new perspectives.
Goldstein et al. [31] compared the performance of four models, including GPT-4, with humans in problem-solving tasks. The questions in the study were designed as riddles in which multiple steps are extraneous. The riddles were associated with creative thinking and intuition rather than straightforward thinking. The results show that GPT-4 and GPT-3.5-Turbo outperformed humans in generating responses. However, the research also involved verification of the responses among given choices, in which humans were more accurate. It is also noteworthy that the Davinci-3 and -2 models performed better in generating answers with prompting compared to other models.
Zollman et al. [32] compared GPT-generated results with students’ responses in physics-related tasks. The study focused on identifying sense-making processes (including mechanistic reasoning) and the representations used in the solutions. From a quantitative perspective, the use of diagrams was more prevalent in students’ responses, whereas other elements of sensemaking were more consistently present in GPT’s responses. However, all GPT responses lacked the critical “noticing gap” component, which is essential in problem solving. Moreover, the accuracy of the responses generated by the language model was deficient, despite the use of expert vocabulary in the generated responses.
The research conducted by Qawqzeh [33] studies several cognitive skills, including those considered in the current research: problem solving, critical thinking, and creativity. However, Qawqzeh [33] specifically focuses on the role of GPT in education and academic assignments. Moreover, the research uses surveys of different student groups and researchers, as well as parents and teachers. The results demonstrate that perceptions regarding the effect of the LLM on skills vary. Participants who had used GPT previously reported positive effects on their problem-solving, creative, and critical thinking skills. Furthermore, the study found a correlation between these skills and respondents’ positive perceptions.
Zhai et al. [34] compared the performance of GPT on cognitively demanding scientific assignments with that of school students. The research assessed GPT’s performance through rubric scoring, placing it within the students’ average score levels. Furthermore, the research also analyzed the effect of cognitive load on the performance of both LLMs and students. Overall, the study indicates that the LLMs outperformed students on scores. Moreover, unlike students, the performance of the AI models was largely unaffected by increasing cognitive demand.
Similarly, Dhingra et al. [35] evaluated the performance of GPT-4 from a cognitive psychology perspective. The study involved several datasets covering mathematics, common-sense reasoning, linguistic comprehension, and heuristics hypothesis testing. The results indicate that in all of the datasets, except geometry (35%), an accuracy above 80% was reached by the GPT model. Furthermore, the researchers show that the accuracy of GPT-4 significantly outperforms earlier models from which the datasets were originally obtained.
Urban et al. [36] conducted an experiment similar in design to this study but with a different focus. The experiment involved two groups of students, with one group using GPT and the other not. The study focuses on the uniqueness, level of detail, and quality of the answers, rather than specific skills. The findings indicate that students’ performance, as well as their self-efficacy, improved with the usage of GPT, although a more intensive application of metacognitive skills may be needed.
Bellettini et al. [37] tested the capacity of the GPT-3 model in solving problems designed for school students participating in an international competition. The results underline the need for critical analysis and adjustments to the output of GPT. While the model was successful linguistically, similar to the findings of Dhingra et al. [35], the number of correct responses was insufficient. However, in tasks requiring single-step logical deductive reasoning or applying an outlined process to a specific part of the problem, GPT-3 produced comparatively better results than in tasks demanding sophisticated, coherent logic.
Orrù et al. [38] examined GPT’s problem-solving ability in comparison with human performance, in particular, problem solving that makes use of insights. The results indicate that the average human output can be achieved with an LLM, which highlights the potential of GPT in problem solving. The findings of Avci [39] show that a creative mindset positively influences the acceptance of Generative AI.
In summary, recent studies have increasingly explored the applications of Generative AI beyond technical tasks, focusing on skills such as problem solving, creativity, and cognitive skills. While some studies highlight the models’ strength in generating creative solutions and, in some cases, outperforming humans, other studies reveal GPT’s limitations in identifying gaps in solutions and maintaining accuracy in complex logical reasoning tasks. Overall, GPT models have a great potential to enhance problem-solving performance. While these prior studies have explored specific dimensions of cognitive performance, there remains a gap in comprehensively assessing how Generative AI impacts cognitive skills within realistic professional contexts, such as management consulting. To address this gap, the present study aims to investigate the interplay of creativity, analytical thinking, and systems thinking in problem-solving tasks supported by Generative AI, offering insights into its role in developing critical cognitive competencies.

3. Methodology

This section outlines the research design and approach used to investigate the impact of Generative AI on cognitive skills. We will discuss details of the experimental setup, evaluation methods, participant selection, and selected tasks.

3.1. Skill Selection and Rubric Formulation

As discussed in the literature review, and based on the World Economic Forum’s taxonomy, we identified analytical thinking, creativity, and systems thinking as the three most critical cognitive skills for this study. These skills have been examined in several previous studies, although often measured through different components. To ensure an appropriate and consistent evaluation, we defined the key components of each skill, identified suitable questions to capture these components, and established robust evaluation criteria. This process is illustrated in Figure 1.
First, an assessment rubric was developed to evaluate the targeted cognitive skills. As shown in Table 1, sub-skills were identified for each of the three cognitive skills, corresponding questions were formulated to capture these sub-skills, and three levels of evaluation criteria were established. For example, within the Analytical Thinking domain, the Logical Thinking sub-skill was assessed across three levels based on the formulated question: “How well could the participant identify the relationship between provided information, perform data processing and analytical computations, and draw a conclusion?” At Level 1, a participant can identify few or no relationships between numbers, information, and solutions. At Level 2, they can recognize some relationships, though these may be incomplete or only partially accurate. At Level 3, participants successfully identify all or most of the expected relationships between numbers, information, solutions, and conclusions. The sub-skills were derived from the definitions and descriptions of the cognitive skills presented in the literature review. Each cognitive skill includes sub-skills as follows:
  • Analytical thinking (11 sub-skills): breaking down concepts and ideas; information evaluation; argument evaluation; questioning; decision making; inductive and deductive inference; interpretation; logical thinking; identifying alternatives; formulating hypotheses; rejecting unsupported conclusions.
  • Creative thinking (8 sub-skills): bringing up new ideas; novelty; appropriateness; goal and problem definition and representation; divergent thinking strategy; finding analogies; combining information; seeing the big picture.
  • Systems thinking (10 sub-skills): relationships between concepts; multidisciplinary perspective; identifying patterns over time; applying multiple perspectives; holistic approach; mental modeling; recognizing system boundaries; identifying and quantifying system components; recognizing and characterizing interconnectedness; predicting future behavior.
The complete rubric used in this study is provided in Appendix A. Each sub-skill was evaluated according to three levels specifically defined for that sub-skill and, when necessary, tailored to the task at hand. For instance, when evaluating a participant’s ability to break down concepts and ideas, one of the following three levels was assigned: (i) the participant did not break down the problem and considered a single perspective, (ii) the participant listed some perspectives, (iii) a considerable number of sub-components of the task were considered. In this study, the three co-authors independently scored the performance of each participant with respect to each sub-skill. The final score was determined by a majority vote. In cases when there was a difference between evaluations, the scores were discussed until a consensus was reached.
Furthermore, during the evaluation, when at least one of the evaluators concluded, with supporting reasoning, that a sub-skill could not feasibly be evaluated in the context of this experiment, that sub-skill was removed from this study to minimize bias in the skill-level evaluation. In particular, the “A9. Field knowledge”, “C11. Knowledge”, “C4. Identifying problem (via questioning or optimizing)”, “C7. Evaluating and selecting most suitable idea”, “C10. Recognize new information”, “C13. Motivation”, “S11. Feedback loops and their characterization”, “S12. Past Behavior”, and “S14. Changes over time” sub-skills were removed from the analysis. As our sub-skill formulation process involved extensive inclusion of skill elements from multiple definitions and sources, we believe such removals ensured minimal skill-level bias.
Finally, empty cells for the sub-skills represent cases in which none of the evaluators agreed on a common score; each assigned a different grade, supported by a rationale justifying the assessment. Given the small number of such cases, these sub-skills were retained in the overall evaluation for the other participants, whereas in the skill-level evaluation for the affected participant, and in the sub-skill-level averaging for the experimental groups, the empty values were disregarded so that the averages represent unbiased performance per participant or per sub-skill.
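To make the scoring and aggregation procedure concrete, the following minimal Python sketch illustrates how three evaluators’ 1–3 ratings per sub-skill could be combined by majority vote, how cells with no agreement remain empty, and how skill-level averages disregard those empty values. The data, function names, and sub-skill labels are hypothetical and only illustrate the logic described above; in the actual study, disagreements were also resolved through discussion.

```python
from collections import Counter
from statistics import mean

def aggregate_subskill(ratings):
    """Combine three evaluators' 1-3 ratings for one sub-skill.

    Returns the majority score, or None when all three evaluators
    assign different grades (the 'empty cell' case described above).
    """
    value, freq = Counter(ratings).most_common(1)[0]
    return value if freq >= 2 else None

def skill_average(subskill_scores):
    """Average the aggregated sub-skill scores for one cognitive skill,
    disregarding empty (None) values."""
    valid = [s for s in subskill_scores if s is not None]
    return mean(valid) if valid else None

# Hypothetical participant: three evaluators rated four analytical sub-skills.
ratings = {
    "A1 Breaking down concepts": (2, 2, 3),   # majority -> 2
    "A8 Logical thinking":       (3, 3, 3),   # unanimous -> 3
    "A10 Alternative approach":  (1, 2, 3),   # no agreement -> empty (None)
    "A12 Rejecting unsupported": (2, 3, 2),   # majority -> 2
}
aggregated = {name: aggregate_subskill(r) for name, r in ratings.items()}
print(aggregated)
print(skill_average(aggregated.values()))    # (2 + 3 + 2) / 3 ≈ 2.33
```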

3.2. Experimental Design and Tasks

An experimental approach was selected as the most appropriate method for this research, as it allows for controlled observation of the relationship between Generative AI use and the application of cognitive skills in problem-solving tasks. Experiments are particularly effective in isolating variables and testing specific interventions, such as the inclusion of AI tools [38]. This method enables a systematic comparison of AI-assisted and non-AI-assisted groups, thereby enhancing the reliability and validity of the findings [39]. Furthermore, experiments are well-suited for studying behavioral patterns and skill application in realistic yet replicable scenarios, such as management consulting cases, thereby providing insights into the effects of Generative AI on cognitive skills.
To observe behavioral patterns in the use of AI, we designed an experiment involving typical management consulting tasks. These tasks required financial analysis combined with strategic considerations and data requirement assessments. Participants were randomly divided into two groups: one group was allowed to use the ChatGPT tool (with GPT-4 model) during the tasks, while the other group completed the tasks without AI support. Each participant worked on a business consulting case adapted from Cosentino [40] and modified to address the specific research objective of this study. The tasks closely resembled cases commonly used in the recruitment processes of management consulting firms, which aligns with the primary research objective: identifying the skills required for real-world problem solving that should be emphasized in university curricula.
To minimize task-specific bias, participants were randomly assigned one of two different, complex tasks that required multi-perspective reasoning, data interpretation, and high-level cognitive capabilities. Each participant was given up to 1.5 h to complete the task on a computer. Screen recordings were collected for further behavioral analysis while ensuring anonymity and privacy. In addition to the experimental tasks, participants were asked to complete pre- and post-experiment surveys with questions focusing on their background, expectations/evaluations regarding their performance, and perceptions of ChatGPT.

3.3. Participants

The experiment was conducted with master’s and doctoral students, as well as recent graduates, from business administration (BA) and information technology (IT) disciplines at a university in Finland, with 16 participants recruited in total. The number of participants reflects both the availability of suitable volunteers and the intensive nature of the experimental design. Although the sample size is modest, it is consistent with prior exploratory studies employing controlled experiments with cognitively demanding tasks. As shown in recent literature reviews [41,42], the number of participants in similar studies can range from five to several hundred. Moreover, the primary objective of this research was not to provide statistically generalizable conclusions but to generate in-depth insights into how Generative AI may influence the application of key cognitive skills in problem solving.
Considering the typical applicant profile and required knowledge in the management consulting profession, master’s and doctoral students and recent graduates were considered appropriate representatives in this study. The study participants’ fields of study or specialization were not restricted to a specific discipline, as the research focuses on general cognitive skills rather than domain-specific expertise. However, to minimize potential variability and ensure relevance to tasks resembling management consulting, where cognitive skills like problem solving, analytical thinking, and creativity are critical, participants were primarily recruited from business, information systems, and technology-related fields.
The average age of the respondents was 31.6 years, with an equal number (8) of female and male participants. The highest completed level of education was a bachelor’s degree (44%), a master’s degree (50%), or a doctorate (6%). While all the participants had some prior experience with ChatGPT, their usage varied. Half of the participants reported limited use focused on tasks such as improving the quality of texts and text summarization, i.e., tasks typically associated with traditional Natural Language Processing (NLP) applications. On the other hand, several participants reported more advanced use cases: utilizing ChatGPT at work for marketing tasks, designing and creating reports, idea generation, or generating Python code. However, none of the participants reported previous use related to data analysis and problem solving in a context comparable to the experiment tasks.
Before the experiment, participants expected ChatGPT to be very easy to use (average 6), somewhat reliable (average 4.8), and to somewhat enhance the quality of task completion (average 4.46). The participants also indicated that they were confident in their cognitive skills (averages on a 0–10 scale): 7.56 for analytical thinking, 7.25 for creative thinking, and 6.93 for systems thinking.

3.4. Analysis Methods

The analysis employed a mixed-methods approach. Initially, a quantitative analysis based on descriptive statistics was conducted, with a particular emphasis on group-level differences. Given the limited number of participants, this analysis provides an exploratory overview rather than definitive conclusions. A more detailed examination followed, reviewing individual scores for each sub-skill to compare and summarize the performance of participants, with special attention to those who benefited most from AI usage. To further understand participants’ problem-solving processes and their interaction with the ChatGPT tool, screen recordings were analyzed to trace their approaches, tool usage, and decision-making patterns. The results were summarized to highlight the most evident trends in the sub-skills under study. Additionally, participants’ problem-solving processes were observed, and their notes prepared while solving the tasks were reviewed to identify any common challenges encountered. Finally, post-experiment survey results were incorporated to capture participants’ experiences with AI and their problem-solving approaches, as well as the perspectives of those who completed tasks without access to AI tools.
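As an illustration of the group-level descriptive comparison, the sketch below shows one plausible way to compute mean sub-skill scores per group and task and their relative differences, similar in spirit to the percentage differences reported in Section 4.1. The data frame contents are invented for illustration, and the relative-difference formula is an assumption rather than the exact calculation used in the study.

```python
import pandas as pd

# Hypothetical aggregated scores (1-3) per participant, in long format.
df = pd.DataFrame({
    "participant": ["P1", "P2", "P3", "P4"],
    "group": ["GPT user", "GPT user", "GPT non-user", "GPT non-user"],
    "task": [1, 1, 1, 1],
    "A8 Logical thinking": [3, 2, 2, 2],
    "C2 Novelty": [2, 1, 1, 2],
})

# Mean score per sub-skill, within each task and group.
means = df.groupby(["task", "group"])[["A8 Logical thinking", "C2 Novelty"]].mean()

# Relative difference of GPT users vs. non-users per task (assumed formula).
users = means.xs("GPT user", level="group")
non_users = means.xs("GPT non-user", level="group")
relative_diff_pct = (users - non_users) / non_users * 100
print(relative_diff_pct.round(1))
```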

4. Results

This section presents the findings of this study, focusing on the comparative analysis of participants’ performance in AI-assisted and non-AI-assisted problem-solving tasks. The results highlight key trends and differences across the three cognitive skills—analytical thinking, creativity, and systems thinking—providing insights into how Generative AI influences cognitive processes in problem-solving contexts. Particular attention is given to participants who were assisted by ChatGPT, with an emphasis on identifying behavioral patterns and their relationship to the targeted cognitive skills.

4.1. Comparing GPT Users vs. Non-Users

This section provides an overview of the differences between GPT users and GPT non-users based on general descriptive statistics, focusing on group averages and observed variations. One participant’s data was excluded from the calculations to avoid skewing the results, as this individual belonged to the GPT user group but opted not to utilize the tool and received the lowest possible scores across all criteria. It is important to note that, given the small sample size and varying levels of GPT usage, this initial analysis is exploratory and aims to identify noticeable trends that are further examined in the subsequent, more detailed analysis. Furthermore, in addition to the general skill-level analysis, the use of two different tasks with similar skill requirements enabled us to differentiate between performance variations arising from ChatGPT usage and task-sensitive variations.
A comparative analysis of GPT user and GPT non-user performance across tasks reveals notable differences in certain sub-skills of analytical thinking. For example, in Task 1, GPT users scored lower on Argument Evaluation (−25%) and Questioning (−20%) compared to GPT non-users. In Task 2, the scores for Questioning remained unchanged, but Argument Evaluation was 25% higher for GPT users. Similarly, GPT users demonstrated higher performance in Logical Thinking (+29%) and Decision Making (+13%) in Task 1, whereas these scores showed no difference between the groups in Task 2. Additionally, significant differences were observed in Task 1 for Alternative Approach (+25%) and Hypothesis Formulation (+13%) for GPT users; however, these patterns reversed in Task 2, with Alternative Approach and Hypothesis Formulation 21% and 50% lower, respectively, for GPT users. Despite these variations, the general average difference in analytical thinking scores across the two tasks remained relatively consistent between the groups, with only minor deviations (−3% in Task 1 and +3% in Task 2).
The analysis reveals two major differences between GPT users and GPT non-users in creative thinking sub-skills. First, in Task 2, GPT users scored approximately 60% higher in Problem and Goals Definition and Representation compared to non-users, whereas no significant difference was observed in Task 1. Second, the See the Big Picture sub-skill showed a 38% higher score among GPT users in Task 1, but a 13% lower score in Task 2. Smaller but consistent differences were observed in Novelty of Ideas and Appropriateness, with GPT users scoring 13% higher in novelty and 13% lower in Appropriateness compared to non-users. Overall, GPT users exhibited a slight advantage in creativity-related sub-skills, with an average difference of approximately 7% across both tasks.
Several sub-skills within the Systems Thinking domain demonstrated higher scores among GPT users in Task 1. Notably, GPT users outperformed non-users in Holistically and Parts (+50%), How Concepts Work Together (+25%), and Identify Patterns Over Time (+21%). However, in Task 2, these same sub-skills exhibited lower or no significant differences, with Holistically and Parts and How Concepts Work Together both scoring 13% lower for GPT users and Identify Patterns Over Time showing no difference. Conversely, Systems Boundaries Definition was 38% higher for GPT users in Task 2, while in Task 1, it was 13% lower. Mental Modeling, primarily assessed through the format and organization of responses, was 20% lower for GPT users in Task 2 but 13% higher in Task 1. The overall Systems Thinking score was comparable between groups in Task 2, whereas in Task 1, GPT users demonstrated an 11% advantage.
Overall, GPT users exhibited slightly higher performance across Creativity, Analytical Thinking, and Systems Thinking in Task 1, while in Task 2, GPT non-users showed a marginal (3%) advantage in the Creativity category. While notable differences between groups were observed in several sub-skills, these patterns were not consistently observed across both tasks. Given the complexity and subjective nature of evaluating soft skills, the significant variations identified in specific sub-skills suggest the need for further examination.

4.2. Basic Configurational Analysis

In order to provide a more holistic perspective on the performance comparison and to analyze the interrelatedness of the three main skills, as a first step, we calculated overall skill scores for each participant. For each skill, the average of the corresponding sub-skill scores was computed, resulting in values ranging from 1 to 3. We then classified a participant’s performance on a skill as acceptable/positive if this average score was at least 2, and as not acceptable/negative otherwise. In other words, we created a binary version of the overall performance, so that each participant is represented by a triplet of binary values (a brief sketch of this construction is provided at the end of this subsection). With three skills and a binary evaluation for each, in theory, we could observe 2^3 = 8 possible combinations/configurations of skill scores. Table 2 presents the frequency of observed configurations, where 1 indicates a positive skill performance and 0 a negative one. Several notable observations emerge from these results. First, the most frequent configuration, 0-0-0, consists of participants with a negative performance in Analytical Thinking, Creative Thinking, and Systems Thinking simultaneously; there were five such participants, two GPT users and three GPT non-users. As we will show later, insufficient problem comprehension or minimal AI engagement often translates into lower performance across all domains, consistent with this result.
Second, a prominent group includes those who achieved a positive rating in Analytical Thinking but remained negative on Creative and Systems skills. This aligns with the observation that certain participants effectively leveraged GPT for core analysis but struggled to extend that success to more open-ended or integrative tasks.
Third, several participants demonstrated proficiency in two of the three domains. For example, positive results on Analytical and Systems thinking reflect the literature’s emphasis on how structured thinking can complement understanding interconnections.
Finally, while the sample is not sufficient for definite conclusions, the results suggest that GPT users were somewhat more evenly distributed across both high and low performance categories than non-users. At the same time, GPT users also appear in the group that did poorly in all three skills, indicating that mere access to AI does not guarantee better outcomes; collaborative and critical engagement remains essential.
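As a brief illustration of the binarization and configuration counting described at the beginning of this subsection, the following sketch reproduces the logic with invented skill averages; the participant scores are hypothetical, and the threshold of 2 is taken from the text.

```python
from collections import Counter

# Hypothetical skill-level averages (1-3) per participant:
# (Analytical, Creative, Systems).
skill_averages = {
    "P1": (2.4, 1.8, 2.1),
    "P2": (1.7, 1.5, 1.6),
    "P3": (2.2, 2.1, 1.9),
    "P4": (1.6, 1.4, 1.8),
}

def to_configuration(averages, threshold=2.0):
    """Binarize each skill average: 1 = acceptable/positive (>= 2), 0 = negative."""
    return tuple(int(a >= threshold) for a in averages)

# Frequencies over the 2^3 = 8 possible triplets, analogous to Table 2.
configurations = Counter(to_configuration(v) for v in skill_averages.values())
for config, freq in configurations.most_common():
    print(config, freq)
```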

4.3. Behavioral Types and Diverse Sub-Skill Scores

As the next step of the analysis, we focused on usage behavioral types, identified from the screen recordings of participants’ interactions with ChatGPT, and their relationships with sub-skill performance. This is relevant for the participants who were allowed to utilize ChatGPT in solving the assigned task. The behaviors are mapped to a five-level scale based on the level of collaboration with (i.e., “C-level”) and exploitation (i.e., copy–pasting, “E-level”) of the language model, as presented in Table 3. Each participant’s average performance (i.e., “Mean Score”), with participants denoted as “P_n” in the table, is calculated as the average across all three skills, where average scores of “1” and “3” correspond to 0% and 100%, respectively. These scores were obtained through consensual evaluation by the three co-authors. As the table illustrates, similar usage behaviors were observed in both tasks. Importantly, the most collaborative behavior consistently resulted in at least average and, in some cases, the highest scores in the second task. For analysis purposes, we generally exclude participants whose GPT usage does not directly or indirectly relate to the studied sub-skills. For instance, if a participant used ChatGPT only once during the whole experiment, to obtain the definition of a concept, this interaction is not considered when analyzing the Logical Thinking sub-skill. Instead, the focus was placed on the collaborator and copy–paste behavior groups.
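The mapping from the 1–3 rubric scale to the percentage “Mean Score” reported in Table 3 is only specified at its endpoints; assuming a simple linear rescaling between them, it can be sketched as follows.

```python
def score_to_percent(mean_score):
    """Linearly rescale a 1-3 rubric average so that 1 maps to 0% and 3 to 100%."""
    return (mean_score - 1) / 2 * 100

# For example, a participant whose three-skill average is 2.2 would be reported as 60%.
print(score_to_percent(2.2))  # 60.0
```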
The complete scores for each evaluated sub-skill are presented in Appendix B, and corresponding usage behavior is described in Table 3 for each participant. While some of the sub-skills show no clear patterns, others reveal noticeable differences linked to usage behavior. For instance, in the Analytical Thinking sub-skill Logical Thinking (A8), six participants identified some of the relationships, while one collaborative participant identified most of the expected relationships between numbers, information, solutions, and conclusions. Similarly, for an Alternative Approach (A10) in both tasks, the highest performance was attained by the participants with the most cooperative behavior.
However, the observations vary across sub-skills, depending on the task and participant behavior. For instance, in Rejecting Unsupported Conclusion (A12), the only below-average score belongs to the participant in task 1 who relied solely on copy–pasting from ChatGPT. In Interpret and Explain (A7), all participants achieved average scores, except a copy–paster in task 2 and the most collaborative participant in task 1, both of whom received the highest scores. Similarly, in Decision Making (A5), in task 1, the lowest score belongs to the participant who copied the answer from ChatGPT, and the highest to the participants who collaborated most with the AI tool. Similarly, in Information Evaluation and Judgment (A2), all participants received average scores in task 2, whereas in task 1, the lowest score was assigned to a copy–paste participant, and the highest score was shared between a collaborative participant and another who used AI only once, for a term definition.
For the Creative Thinking sub-skills, several notable findings can be observed. In Novelty (C2), the lowest score in task 1 was observed in a GPT-generated response, while the highest score was achieved by the participant who collaborated most with the AI tool. Similarly, in Appropriateness (C3), the lowest score in task 1 was associated with a GPT-generated answer, while the highest score was obtained by the most collaborative participant. Interestingly, in Problem and Goals Definition and Representation (C5), the highest scores in both tasks belong to the participants who copy–pasted the results. In See First the Big Picture (C12), the participant who copy–pasted in task 2 received the highest grade. In Divergent and Convergent Thinking (C6), all participants scored the minimum in task 2, whereas in task 1, the highest score belonged to the collaborative participant and the lowest to the participant who copy–pasted. Furthermore, in Analogies (C8), all participants received the minimum score. In Combining Information (C9), all participants received an average score, except the most collaborative participants in task 1 and one of the average participants in task 2.
Finally, for the Systems Thinking sub-skills, several notable differences were observed. In How Concepts Work Together (S1), most participants scored at an average level except for the most collaborative participant in task 1 and the participant with GPT-generated response in task 2. In Identify Patterns Over Time (S3), all of the participants received the minimum score, except the most collaborative participant in task 1, who received the highest score. Similarly, in Holistically and Parts (S5), all participants scored on an average level, except two participants, who submitted GPT-generated responses and received the highest scores. In Systems Boundaries (S8), a participant who used a GPT-generated response had the highest score in task 2, while in task 1, the participant with the GPT-generated response received the lowest score. For Multidisciplinary Approach (S2) and Multiple Perspectives (S4), there is no clear trend; one of the copy–pasters had the highest score while another had the lowest. In Consider Issue as Systematic (S7), all participants received average scores, except those who used GPT-generated responses. Lastly, in Components of the System, Distinction and Quantity (S9), participants who used GPT-generated responses had the lowest and average scores, whereas those who collaborated the most scored at the average and highest levels.
Overall, the findings indicate that higher levels of collaboration with GPT often correlate with stronger performance in sub-skills such as Logical Thinking, Alternative Approaches, and certain Creative Thinking dimensions. By contrast, participants who relied solely on copy–pasting GPT-generated responses tended to exhibit lower or at best average performance on several sub-skills, though notable exceptions occurred in selected creative or systems thinking tasks. These results indicate that the impact of AI use is not uniform across analytical, creative, and systems thinking sub-skills. While collaborative use frequently yields higher scores, certain tasks or sub-skills showed no significant benefit from increased AI involvement. These findings suggest that fostering a more dialogic use of AI tools may enhance both learning outcomes and skill development.

5. Discussion

To discuss the results presented in the previous section, in the following, we look into the performance differences observed in Analytical, Creative, and Systems Thinking tasks, comparing both with-AI and without-AI groups. We further explore the main barriers to accurate solutions, offer insights from participant behaviors and post-experiment survey data, and highlight where AI integration shows the most promise or remains limited.

5.1. Performance in Analytical Thinking

In line with the literature on definitions and sub-skills of Analytical Thinking [17,18], we have assessed the interplay of the sub-skills and the use of AI support in problem solving. We identified several of the sub-skills, such as an Alternative Approach, and to some extent, “To Break Down Concepts”, “Logical Thinking”, “Decision Making”, and “Information Evaluation and Judgment”, that appear to benefit from cooperative use of AI. Reflecting on the findings of Adams [20] related to the use of data, the results suggest that participants who iteratively engaged with ChatGPT generally performed better on tasks requiring logical reasoning and decision making.
When reviewing the scores of participants in the GPT non-user group, the Alternative Approach sub-skill has the lowest average score (1.59) in the Analytical Thinking skill domain. Logical Thinking and Decision Making also had relatively low scores, highlighting the complexity of these sub-skills and supporting the findings of Altun and Yildirim [17]. The Hypothesis Formulation and Rejecting Unsupported Conclusions sub-skills also received some of the lowest scores in the GPT non-user group. These two sub-skills, in addition to Alternative Approach, Questioning, and Decision Making, are among the lowest scores in the GPT user group as well. This mirrors the findings by Bellettini et al. [37] and Dhingra et al. [35], who note that while Generative AI can excel in language-based tasks, it does not universally guarantee accuracy in multi-step or more nuanced logical reasoning. At the same time, in the Information Evaluation and Breaking Down Concepts sub-skills, the GPT non-user group has an above-average score.
Overall, these findings suggest that an Alternative Approach, Logical Thinking, and Decision Making could gain the most from more intentional, collaborative AI usage, whereas sub-skills requiring deeper critical judgment (e.g., formulating hypotheses, rejecting unsupported conclusions) may need additional pedagogical or training interventions to realize the full potential of AI support. This aligns with a literature review [43], which indicates that Generative AI strengthens Analytical Thinking while requiring analytical assessment of the AI use. Furthermore, the implementation of AI determines its influence [43]. In other words, higher achievements of the GPT users require critical thinking to maximize AI’s positive impact, as we observed with regard to the collaborative participants in our experiment.

5.2. Performance in Creative Thinking

Aligned with prior research [23,26], we considered creative thinking through sub-skills that encompass the capacity to introduce novel and appropriate ideas by synthesizing existing knowledge and perspectives. Overall, participants’ performance on Creative Thinking was comparatively low in both the GPT user and GPT non-user groups, matching the notion that creativity requires multifaceted abilities and may not be easily triggered by a single prompt. In particular, Novelty and Appropriateness stood out as sub-skills that benefit most from a collaborative rather than purely generative AI interaction—an observation consistent with Rick et al. [30], who found that human–AI co-creation fosters more creative solutions. In other words, creativity both affects AI acceptance and can be enhanced via AI, which further strengthens creative thinking’s importance. At the same time, collaborative usage can possibly enhance the Divergent and Convergent Thinking and Combining Information sub-skills. Interestingly, Problem and Goals Definition garnered the highest AI-related scores, indicating that generative models can assist in clarifying initial problem statements—a result echoing earlier evidence that AI tools often excel at producing well-structured outlines.
In general, the performance of both the GPT user and GPT non-user groups is comparatively low in the creativity-related sub-skills and below the average score (i.e., 2, or 50%), except for the See First the Big Picture (C12) and Combining Information (C9) sub-skills, which are around average, and Appropriateness for the GPT non-user group. Overall, these patterns suggest that Divergent and Convergent Thinking, Novelty, and Appropriateness show the greatest potential for improvement through human–AI collaboration rather than passive usage. This adds another dimension to the importance of the collaboration perspective presented in [44], where the authors demonstrate that human-generated ideas can be further improved by AI. The findings also align with [45], indicating the high potential of human–AI collaboration in creativity while distinguishing different dimensions of creativity. According to a literature review on creativity and AI [41], human–AI collaboration provides a small but consistent advantage in creativity; however, the heterogeneity of ideas tends to diminish compared to human-only approaches. Our study builds on the research directions suggested in [41] and addresses limitations of existing studies, such as the need for realistic task designs requiring tacit and strategic knowledge, as well as more context-specific investigations. Research in this field has largely been dominated by direct comparisons of human and AI performance [41]; as highlighted there, comprehensive study designs and systematic exploration of creativity’s dimensions are essential for advancing the domain, which is confirmed by the variations observed in our results across creativity sub-skill dimensions.

5.3. Performance in Systems Thinking

Based on established frameworks of Systems Thinking [27,28], our study highlights the interplay between AI usage and participants’ abilities to recognize interconnections or define boundaries. Notably, GPT-generated responses appeared to improve performance in specific aspects of Systems Thinking—such as conceptualizing issues as systemic and considering holistic perspectives—more than in the other two main cognitive skills considered in this study. In particular, the Consider Issue as Systematic sub-skill score is highest for the participants who simply copied GPT’s responses. It has to be noted, however, that this sub-skill has one of the highest averages in both the GPT user and GPT non-user groups.
AI-generated answers may offer advantages in sub-skills such as Holistically and Parts, Mental Modeling, and Interconnections and Their Characterizations. Among these three sub-skills, performance in Mental Modeling is lower in both the GPT user and GPT non-user groups, whereas performance in the other two is around average.
In contrast, collaborative usage of AI seems beneficial in How Concepts Work Together and Identify Patterns Over Time, even though the latter sub-skill had the lowest average. Interestingly, How Concepts Work Together emerged as one of the stronger-performing sub-skills on average. The findings align with the exploratory study by [46], the only study identified in our literature review examining ChatGPT and systems thinking, where prompts from different disciplinary perspectives were used. In that study, ChatGPT demonstrated knowledge of systems thinking and successfully explained its components. Additionally, [46] highlights the potential of ChatGPT to enhance systems thinking when guided by appropriate prompting, including its strong ability to identify and elaborate on system elements and their interrelationships, as well as to generate visualizations (e.g., Causal Loop Diagrams).
By comparison, AI responses may not be reliable with respect to the Systems Boundaries, Multidisciplinary Approach, and Multiple Perspectives sub-skills. In all three, the average performance of the 16 participants is below average. No consistent trends were observed for Predicting Future Behavior or Components of the System, Distinction, and Quantity, suggesting that deeper collaboration or more specialized AI interventions may be necessary for these multifaceted tasks.
Overall, sub-skills such as Mental Modeling—centered on structuring and organizing information—and Identify Patterns Over Time appear to hold the greatest potential for improvement with AI support. In contrast, defining system boundaries, incorporating multiple perspectives, and effectively quantifying system components remain key developmental targets for cultivating a more holistic systems thinking approach.

5.4. Main Observed Barriers to Problem Solving

Consistent with prior research underscoring the importance of critical engagement with AI outputs [37], the participants who relied on simple copy–paste strategies, Participants 1 and 4 (i.e., P_01 and P_04 in Appendix B), encountered multiple challenges. One of them entered the task sentences one by one as individual prompts, whereas the other copy–pasted the material in two steps. Although both participants eventually detected faulty instructions in ChatGPT’s answers and attempted to correct them, they devoted minimal time (often 5–10 min) to reviewing the tasks before starting to use the AI tool. By contrast, the participant who combined all relevant business-case instructions in a small number of prompts achieved notably higher scores across Analytical, Creative, and Systems Thinking skills, underscoring the value of a holistic perspective in problem solving [28].
Other user behaviors further illustrate how misunderstanding the provided information can negatively affect performance. For example, the participant who declined to use AI altogether (P_13) produced an answer too irrelevant to be accurately scored, suggesting that a lack of clarity in problem comprehension can severely hinder output quality regardless of AI availability.
Regarding the collaborative participants, P_09 and P_12 leveraged AI prompts more strategically but still exhibited varying degrees of success. P_09 turned to AI, alongside other tools such as an internet browser, after 8 min, primarily for brainstorming. This participant scored the highest on Analytical and Creative Thinking among the GPT user group, and second highest in Systems Thinking, behind only one copy–pasting user. Participant 12 (P_12) similarly tailored and refined multiple rounds of AI-generated text, illustrating the iterative co-creation process advocated by Urban et al. [36], though the final solution still required further validation.
In the GPT non-user group, the general behavior is similar: internet searches were observed in the solutions of only a few participants, who looked up definitions, company information, and pros and cons. Moreover, incorrect solutions can be attributed to similar reasons, such as misunderstanding the provided information, misplacing numbers in the calculations, dismissing the provided information and data, and misunderstanding expectations.
Notably, the task completion time differed significantly between the two groups, with GPT users finishing 15–20 min earlier on average. For GPT non-users, including the participants in the GPT user group who did not use AI or only used it for a definition, an extended task duration correlated positively with overall skill performance, aligning with the notion that reflective thinking often demands sustained cognitive effort. In contrast, the average performance among GPT users showed no strong correlation with task duration, suggesting that effective AI collaboration can offset the advantages of extended time availability.
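As a companion to this observation, the sketch below illustrates how such a duration–performance relationship could be quantified with a rank correlation. It is not part of the study’s reported analysis: the durations and mean rubric levels are hypothetical placeholders (per-participant completion times are not published here), and Spearman’s coefficient is chosen only as an example suited to small, possibly non-linear samples.

from scipy.stats import spearmanr

# Hypothetical (completion time in minutes, mean rubric level) pairs for one group.
duration_min = [55, 62, 70, 75, 80, 85, 90, 95]
mean_level = [1.3, 1.5, 1.6, 1.8, 1.7, 1.9, 2.1, 2.2]

# With n = 8 per group, any such estimate is indicative at best.
rho, p_value = spearmanr(duration_min, mean_level)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")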

5.5. Post-Experiment Survey Results

In the post-experiment survey, participants rated the usefulness of the AI for the three studied cognitive skills at roughly the same level, between 60% and 65%, with Creative Thinking rated slightly lower than the other two. The main challenges mentioned by participants in the GPT user group include limited business knowledge, limited case and background information, case comprehension, and time. In the GPT non-user group, case comprehension and understanding, business analytics knowledge, background information, and language are among the mentioned challenges.
For participants in the GPT user group, the most time-consuming tasks involved understanding the content and conducting quantitative analyses. By contrast, the single participant who declined to use GPT identified writing a high-quality recommendation as the most significant time investment. In the GPT non-user group, the most frequently cited time-intensive activities included quantitative calculations, information organization, case comprehension, and, for some participants, writing. Notably, quantitative analysis and conceptual understanding were the chief hurdles for GPT users, whereas writing emerged as a parallel dominant challenge for GPT non-users, alongside numerical tasks.
Across both groups, participants generally valued the prospect of AI for its capacity to structure content, clarify terminology, facilitate conceptualization and summarization, and perform rapid quantitative analyses, all consistent with reported benefits in the recent literature [30,36]. Similarly, participants lacking AI access expressed that, given the option, they would primarily employ AI-driven tools for brainstorming and idea generation, numerical computations, and language or terminology support, reflecting the broad spectrum of potential applications for Generative AI in problem-solving contexts.
These findings can be interpreted in light of recent discussions about how large language models and agentic AI reshape human cognition in problem-solving contexts. A recent survey emphasizes that LLMs alter the way people process information, evaluate trust, and engage with decision making, which parallels our observation that AI use systematically shifts the balance between analytical structuring and creative or integrative reasoning [47]. Furthermore, frameworks such as nomological deductive reasoning stress the importance of combining transparent logic with probabilistic inference to ensure that AI assistance supports rather than constrains human cognitive processes [48]. These perspectives underscore that the impact of Generative AI cannot be understood merely by measuring performance differences but must also be framed as part of broader algorithmic design questions about how AI affords or limits different modes of human reasoning. Finally, the findings underline the importance of incorporating insights from relevant studies [49,50], which show that integrating ChatGPT into the learning process can have a considerable positive impact on students’ computational thinking capabilities, including problem solving, critical thinking, and creativity, while also emphasizing the importance of AI literacy and the use of AI as a cognitive tool.

6. Conclusions

This study explored the impact of Generative AI on cognitive skills in problem solving, focusing on analytical thinking, creativity, and systems thinking. The analysis indicates that while AI assistance can support certain higher-order thinking processes, its benefits are uneven and depend on how the technology is applied. The results suggest considerable benefits of AI in enhancing cognitive performance in problem solving, but only when the AI is used collaboratively and deliberately. However, Generative AI does not, on its own, foster creativity or originality, underscoring the irreplaceable role of human input in generating novel insights.
With respect to Research Question 1, focusing on the cognitive skills expected to become most critical with increased reliance on AI, this study suggests that analytical and systems thinking remain indispensable. These skills are essential for critically evaluating AI outputs, ensuring logical consistency, and integrating multiple perspectives into problem solving. At the same time, creativity, particularly in generating novel ideas and analogies, becomes even more significant as AI tools currently lack independent creative reasoning.
Concerning Research Question 2, focusing on the benefits and challenges of Generative AI in problem solving, the findings highlight that AI offers clear advantages in efficiency, structure, and support for information processing. Yet these benefits are limited when AI is treated as an unquestioned authority. This study demonstrates that critical engagement and reflective collaboration with AI are key to realizing its potential. Without such engagement, there is a risk of over-reliance, reduced accuracy, and diminished cognitive skill development.
Despite these insights, this study acknowledges several limitations. First, while this study provides a structured approach to evaluating soft skills, the definitions and measurement criteria should be refined in future research. Second, some cognitive sub-skills, such as Analogical Thinking and Mental Modeling, may require different assessment methods to be fully captured. Third, the sample size is small, limiting the generalizability of the results and the robustness of statistical comparisons, though the in-depth analysis offers valuable initial insights. Future research should extend the analysis to larger and more diverse participant groups across different academic disciplines and professional contexts to validate and strengthen the findings. In particular, studies with larger samples could explore subgroup differences, longitudinal effects of AI-assisted skill development, and variations across professional domains.

Author Contributions

Conceptualization, N.M., J.M. and X.W.; methodology, N.M.; validation, N.M., J.M. and X.W.; formal analysis, N.M.; investigation, N.M., J.M. and X.W.; writing—original draft preparation, N.M.; writing—review and editing, N.M., J.M. and X.W.; supervision, J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The task specification can be accessed at https://figshare.com/s/0abd16a4f303e0d3ba72, accessed on 3 October 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Complete evaluation rubrics.
Sub-Skill | Question | Level 1 | Level 2 | Level 3
A1. To break down concepts or ideas | Did the participant consider the new idea (i.e., outsource) from the main business principles (e.g., profit, sales, logistics, brand recognition) perspective? | The participant did not break down the concept, and considered it from a single perspective (e.g., logistics only) | Some perspectives were listed | A considerable number of sub-components of the case were considered
A2. Information evaluation and judgment (New: Judgments by fact) | How did the participant interpret and make a judgment about costs and revenues, or about warehouse income? | The participant did not consider the numbers nor warehouse income | The participant considered the numbers as a fact, but made an inconsistent judgment | The participant interpreted numbers as a fact, as well as warehouse income, and made an appropriate judgment
A3. Argument evaluation | How did the participant evaluate any of the (counter-)arguments present in Cosentino [40] and the task? | The participant did not consider assessing any argument or potential issues | The participant considered at least one argument, while incomplete and skipping some crucial issues | The participant evaluated at least two potential arguments
A4. Questioning | Did the participant question, e.g., different pros/cons, the less busy 10 months, “great interview” points, or other critical points? | The participant did not ask or formulate any critical question nor did they question the ideas or issues | The participant formulated and asked at least one critical question, or questioned at least one idea or issue | The participant formulated and asked more than one critical question, or questioned more than one idea or issue
A5. Decision making | Was the decision made by the participant optimal considering the arguments and other information provided in Cosentino [40] and the task? | The decision made by the participant did not match any analyses or provided information | The decision made by the participant missed some considerations or issues mentioned in the description | The participant made the best or one of the best possible decisions, after calculations and analyses, e.g., “great interview”, dual vendors, free shipping (Cosentino [40])
A6. Inductive–deductive inference | Did the participant draw a conclusion based on the facts, previous knowledge, and calculations? | None of the information, facts, or knowledge supported the participant’s conclusion | Some of the information, facts, or knowledge supported the participant’s conclusion | The participant’s conclusions were completely based on the inductive or deductive inference
A7. Interpret and explain | Was the participant able to interpret, e.g., argument parts of the text or holiday information, as well as structure and report the information in an explainable way? | The participant could not interpret the provided information nor explain the conclusion and other analysis | The participant interpreted and provided information correctly, but did not explain the conclusion and other analysis, or vice versa, or both interpretation and explanations were incomplete | The participant interpreted and explained all or most of the information
A8. Logical thinking | How well could the participant identify the relationship between the provided information, perform data processing and analytical computations, and then draw a conclusion? | The participant did not identify relationships, e.g., between revenue and cost, (non-)holidays and costs, arguments and solution | Some of the relationships were identified | All the relationships between numbers or information and conclusions and solutions were identified
A10. Alternative approach | Did the participant think about alternative approaches to the problem or alternative solutions? | The participant could not provide or did not mention alternative solutions | The participant mentioned at least one alternative solution | The participant considered two or more alternative solutions
A11. Hypothesis | Did the participant formulate any hypothesis when reaching the solution? | Not at all | One or incomplete | Yes
A12. Rejecting unsupported conclusions | Did the participant reject other ideas (i.e., possible solutions) based on their nonsupport from the task description (i.e., arguments in the description) or calculations? | The participant did not consider if the conclusion was supported by the information and data provided | The participant skipped some of the unsupported parts, and they rejected only one or a few unsupported conclusions | The participant analyzed information and facts, and they rejected unsupported solutions
C1. To bring new idea | Did the participant bring an idea about the possible solution? | The participant could not find any idea or found an incomplete solution or idea | The participant brought at least one complete idea or solution | The participant brought more than one complete solution or idea
C2. Novelty | Were the ideas that the participant brought novel? | None of the ideas the participant brought are novel | At least one idea that the participant brought has some sign of novelty | The ideas or part of them have considerable signs of novelty, e.g., compared with the argument provided in Cosentino [40]
C3. Appropriateness | How appropriate are the ideas that the participant brought to the described problem and objective? | Does not address the problem at all or match with the objective | The idea addresses the problem partly, while skipping some parts of the problem or objective | The idea is completely appropriate to the problem and objectives
C5. Problem and goals’ definition and representation | Did the participant ask goals and problem-related questions, such as in Cosentino [40] or any other? | Not at all | One | More than one
C6. Divergent/convergent thinking strategy | Did the participant generate many ideas and then narrow down the options? | Not at all | The participant generated some ideas, but did not analyze and identify the best one | The participant listed many possible ideas and selected the best one
C8. Analogies | Did the participant’s solutions report show any sign of analogies? | Not at all | At least one, but unclear or irrelevant | At least one and a clear analogous case (e.g., product, industry) in the problem definition or solution
C9. Combining information | Did the participant combine the cost results with the holiday revenue, and with argument-related information pieces? | There is not any sign of “combining” presented information, previously known information, and information gained during the process | There is some information that is combined and is useful, but clear signs of gaps exist | There are many signs of information combinations that have generated valuable information
C12. See first the big picture | Did the participant see the two (high sales) months’ holiday fact, put 22800 into perspective, pros and cons, or anything else? | There is not any sign of the big picture. Instead, all the analysis and observations are about the details | There are some signs of or a mention of the big picture. Yet, in most of the analysis and observations, the participant has a narrow focus | The participant has a considerable number of (e.g., at least two) big picture observations, notes, or perspectives
S1. How concepts work together | Did the participant understand how related concepts, e.g., profit, cost, transport, work together? | Did not consider at all how the concepts interrelated | Considered a few of them, but to some degree and incomplete | Understood and considered at a satisfactory level at least three concepts
S2. Multidisciplinary approach | Did the participant think from the multidisciplinary perspective, e.g., psychology, marketing, finance? | There is not any sign of a multidisciplinary approach (e.g., only the financial discipline) | The participant considered from at least two disciplinary perspectives | More than two disciplines have been considered in the approach
S3. Identify patterns over time | Could the participant see the pattern in the historical data of the last two years? | No pattern analysis or observations | The pattern identification was inaccurate | The participant analyzed and identified the pattern accurately
S4. Multiple perspectives | Did the participant consider any perspective other than the business perspective, or within the business, such as different perspectives on cost reduction? | The participant had only the business perspective, e.g., profit | The participant approached from at least two perspectives, possibly vague and missing crucial perspectives | The participant has multiple perspectives in the solution, e.g., considering other stakeholders, societal implications
S5. Holistically and parts | Did the participant consider the organization or business environment as a system, and consider it as a whole and its detailed parts (e.g., departments mentioned in Cosentino [40])? | The participant considers only a particular part of the system, e.g., financial, sales, or logistics | The participant considers the whole business or organization or any other perspective, but also considers the parts to some degree | The participant considers systems as a whole and its or their parts
S6. Mental modeling | Are there any signs of implementation of the mental modeling? | The participant failed to realize any mental modeling of the case | The mental model visualized by the participant lacked clarity or sufficient analyses, and organization | The mental model has a sufficient level of clarity and analyses, and the elements are organized
S7. Consider issue as systematic | Did the participant identify the issue as systematic? | Failed to identify and define the issue as systemic, and no knowledge that such a concept exists | Defined the issue as systemic, whereas could not accurately identify the systems | Clearly identified the issue as systemic, as well as the related systems
S8. Systems boundaries | Did the participants succeed in defining the boundary of the system and its elements? | Fails to define elements or boundary at all | Misses some key elements or includes a considerable number of non- or less relevant elements | Clearly defines the boundaries, including the key elements within that boundary, and excluding irrelevant elements
S9. Components of the system, distinction, and quantify | Did the participant succeed in defining what the components are, and differentiate between and quantify them? | The participant failed to define | Succeeded in recognizing differences, while lacking clear or accurate quantification and descriptions | Accurately defined the system elements, their properties, and distinguished between them
S10. Interconnections and their characterization | Did the participant succeed in identifying the existence of the relationships between the elements, as well as characterize them? | The participant failed to identify any interconnection | Identified interconnections, but their characterizations are unclear or lacking | Clearly identified and characterized most of the relationships
S13. Predicting future behavior | Does the analysis or report show any predictions regarding future behavior? | Did not consider any estimation regarding future system behavior in analysis or response | The analysis or response has some elements of the estimations regarding future behavior | Made detailed and comprehensive future behavior estimations

Appendix B

Table A2. Evaluation scores for GPT users.
Sub-Skills Task 1 | P_01 | P_05 | P_09 | P_13 | Sub-Skills Task 2 | P_04 | P_08 | P_12 | P_16 | Sub-Skill Average
A1 | 2 | 3 | 3 | 1 | A1 | 3 | 2 | 3 | 3 | 2.50
A2 | 1 | 3 | 3 | 1 | A2 | 2 | 2 | 2 | 2 | 2.00
A3 | 2 | 2 |  | 1 | A3 | 3 | 2 | 3 | 3 | 2.29
A4 | 1 | 3 | 3 | 1 | A4 | 2 | 1 |  |  | 1.83
A5 | 1 | 2 | 3 | 1 | A5 | 2 | 2 | 2 | 2 | 1.88
A6 | 1 | 3 | 3 | 1 | A6 | 3 | 2 | 2 | 2 | 2.13
A7 | 2 | 2 | 3 | 1 | A7 | 3 | 2 | 2 | 2 | 2.13
A8 | 2 | 2 | 3 | 1 | A8 | 2 | 2 | 2 | 2 | 2.00
A10 | 1 | 2 | 3 | 1 | A10 | 1 | 1 | 2 | 1 | 1.50
A11 | 2 | 2 | 2 | 1 | A11 | 1 | 1 | 1 | 1 | 1.38
A12 | 1 | 2 | 2 | 1 | A12 | 2 | 2 | 2 | 2 | 1.75
Average A | 1.45 | 2.36 | 2.8 | 1 |  | 2.18 | 1.72 | 2.12 |  | 1.95
C1 | 2 | 2 | 2 | 1 | C1 | 2 | 2 | 2 | 2 | 1.88
C2 | 1 |  | 3 | 1 | C2 | 2 | 2 | 2 | 2 | 1.86
C3 | 1 | 2 | 3 | 1 | C3 | 2 |  | 2 | 2 | 1.86
C5 | 2 | 1 | 1 | 1 | C5 | 3 | 2 | 2 | 2 | 1.75
C6 | 1 | 2 | 3 | 1 | C6 | 1 | 1 | 1 | 1 | 1.38
C8 | 1 | 1 | 1 | 1 | C8 | 1 | 1 | 1 | 1 | 1.00
C9 | 2 | 2 | 3 | 1 | C9 | 2 | 2 | 2 | 3 | 2.13
C12 |  | 3 | 2 | 1 | C12 | 3 | 1 | 2 | 2 | 2.00
Average C | 1.42 | 1.85 | 2.25 | 1 |  | 2 | 1.57 | 1.75 | 1.875 | 1.71
S1 |  | 2 | 3 | 1 | S1 | 3 | 2 | 2 | 2 | 2.14
S2 | 1 | 2 | 2 | 1 | S2 | 3 | 1 | 2 | 2 | 1.75
S3 | 1 | 1 | 3 | 1 | S3 | 1 | 1 |  | 1 | 1.29
S4 | 1 | 2 | 1 | 1 | S4 | 3 | 1 | 2 | 2 | 1.63
S5 | 3 | 2 |  | 1 | S5 | 3 | 2 | 2 | 2 | 2.14
S6 | 2 | 2 | 2 | 1 | S6 | 2 | 1 | 1 |  | 1.57
S7 | 3 | 2 | 2 | 1 | S7 | 3 | 2 | 2 | 2 | 2.13
S8 | 1 | 2 |  | 1 | S8 | 3 | 2 | 2 | 2 | 1.86
S9 | 1 | 2 | 3 | 1 | S9 | 2 | 1 | 2 | 2 | 1.75
S10 | 2 | 2 | 2 | 1 | S10 | 3 | 2 | 2 | 2 | 2.00
S13 | 2 | 2 | 2 | 1 | S13 | 2 | 2 | 2 | 2 | 1.88
Average S | 1.7 | 1.90 | 2.222 | 1 |  | 2.54 | 1.54 | 1.9 | 1.9 | 1.84
Table A3. Evaluation scores for GPT non-users.
Sub-Skills Task 1 | P_02 | P_06 | P_10 | P_14 | Sub-Skills Task 2 | P_03 | P_07 | P_11 | P_15 | Sub-Skill Average
A1 | 3 | 3 | 3 | 2 | A1 | 3 | 3 | 2 | 2 | 2.63
A2 | 2 | 3 | 3 | 1 | A2 | 2 | 2 | 2 | 2 | 2.13
A3 | 3 | 3 | 2 | 2 | A3 | 2 | 3 | 2 | 2 | 2.38
A4 | 3 | 3 | 3 | 2 | A4 | 2 | 2 | 1 | 1 | 2.13
A5 | 2 | 2 | 2 | 1 | A5 | 2 | 2 | 2 | 2 | 1.88
A6 | 3 | 3 | 2 | 1 | A6 | 2 | 3 | 2 | 2 | 2.25
A7 | 2 | 2 | 3 | 2 | A7 | 2 | 3 | 2 | 2 | 2.25
A8 | 2 | 2 | 2 | 1 | A8 | 2 | 2 | 2 | 2 | 1.88
A10 | 2 | 2 | 1 | 1 | A10 | 1 |  | 2 | 2 | 1.57
A11 | 1 | 3 | 2 | 1 | A11 | 2 | 3 | 1 | 2 | 1.88
A12 | 2 | 2 | 2 | 1 | A12 | 2 | 2 | 2 | 2 | 1.88
Average A | 2.27 | 2.55 | 2.27 | 1.36 |  | 2.00 | 2.50 | 1.82 | 1.91 | 2.09
C1 | 2 | 2 | 2 | 1 | C1 | 2 | 2 | 2 | 2 | 1.88
C2 | 2 | 2 | 2 | 1 | C2 | 2 | 2 | 1 | 2 | 1.75
C3 | 2 | 3 | 2 | 2 | C3 | 2 | 2 | 2 | 3 | 2.25
C5 | 1 |  | 2 | 1 | C5 | 1 |  | 1 | 1 | 1.17
C6 | 2 | 3 | 2 | 1 | C6 | 1 |  | 1 |  | 1.67
C8 | 1 | 1 | 1 | 1 | C8 | 1 | 1 | 1 | 1 | 1.00
C9 | 3 | 3 | 2 | 1 | C9 | 2 | 2 | 2 | 2 | 2.13
C12 | 2 | 2 | 1 | 2 | C12 | 2 | 3 | 2 | 2 | 2.00
Average C | 1.87 | 2.28 | 1.75 | 1.25 |  | 1.62 | 2 | 1.5 | 1.85 | 1.77
S1 | 2 | 2 | 2 | 2 | S1 | 3 | 3 | 2 | 2 | 2.25
S2 | 2 | 1 | 2 | 2 | S2 | 2 | 2 | 1 |  | 1.71
S3 | 1 | 1 | 2 | 1 | S3 | 1 | 1 | 1 | 1 | 1.13
S4 | 2 | 1 | 2 | 1 | S4 | 2 | 3 | 1 | 2 | 1.75
S5 | 1 | 2 | 2 | 1 | S5 | 3 | 3 | 2 | 2 | 2.00
S6 | 2 | 2 | 2 | 1 | S6 | 2 | 2 | 1 | 2 | 1.75
S7 | 2 | 2 | 2 | 2 | S7 | 3 | 3 | 1 |  | 2.14
S8 | 2 | 2 | 2 | 1 | S8 | 2 | 2 | 1 | 1 | 1.63
S9 | 2 | 2 | 2 | 1 | S9 | 2 | 2 | 1 | 2 | 1.75
S10 | 2 | 2 | 2 | 1 | S10 | 2 | 3 | 2 | 2 | 2.00
S13 | 2 | 2 | 2 | 2 | S13 | 2 | 3 | 2 | 2 | 2.13
Average S | 1.81 | 1.72 | 2 | 1.36 |  | 2.18 | 2.45 | 1.36 | 1.77 | 1.83

References

  1. Musazade, N. Tools and technologies utilized in data-related positions: An empirical study of job advertisements. In Proceedings of the 36th Bled eConference—Digital Economy and Society: The Balancing Act for Digital Innovation in Times of Instability, Bled, Slovenia, 25–28 June 2023; University of Maribor, University Press: Maribor, Slovenia, 2023; pp. 155–170. [Google Scholar] [CrossRef]
  2. Do, H.D.; Tsai, K.T.; Wen, J.M.; Huang, S.K. Hard Skill Gap between University Education and the Robotic Industry. J. Comput. Inf. Syst. 2022, 63, 24–36. [Google Scholar] [CrossRef]
  3. Cukier, W. Disruptive processes and skills mismatches in the new economy: Theorizing social inclusion and innovation as solutions. J. Glob. Responsib. 2019, 10, 211–225. [Google Scholar] [CrossRef]
  4. Bukartaite, R.; Hooper, D. Automation, artificial intelligence and future skills needs: An Irish perspective. Eur. J. Train. Dev. 2023, 47, 163–185. [Google Scholar] [CrossRef]
  5. Doherty, O.; Stephens, S. Hard and soft skill needs: Higher education and the Fintech sector. J. Educ. Work. 2023, 36, 186–201. [Google Scholar] [CrossRef]
  6. Musazade, N.; Mezei, J.; Wang, X. Exploring the Performance of Large Language Models for Data Analysis Tasks Through the CRISP-DM Framework. In Good Practices and New Perspectives in Information Systems and Technologies; Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A., Eds.; WorldCIST 2024; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2024; Volume 989. [Google Scholar] [CrossRef]
  7. World Economic Forum. The Future of Jobs Report 2023. 2023. ISBN-13: 978-2-940631-96-4. Available online: https://www3.weforum.org/docs/WEF_Future_of_Jobs_2023.pdf (accessed on 3 October 2025).
  8. Dell’Acqua, F.; McFowland, E., III; Mollick, E.R.; Lifshitz-Assaf, H.; Kellogg, K.; Rajendran, S.; Krayer, L.; Candelon, F.; Lakhani, K.R. Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality; Working Paper 24-013; Harvard Business School Technology & Operations Mgt. Unit: Boston, MA, USA, 2023. [Google Scholar]
  9. Kamin, M. Soft Skills Revolution: A Guide for Connecting with Compassion for Trainers, Teams, and Leaders; Pfeiffer: San Francisco, CA, USA, 2013. [Google Scholar]
  10. Marin-Zapata, S.I.; Román-Calderón, J.P.; Robledo-Ardila, C.; Jaramillo-Serna, M.A. Soft skills, do we know what we are talking about? Rev. Manag. Sci. 2022, 16, 969–1000. [Google Scholar] [CrossRef]
  11. Matteson, M.L.; Anderson, L.; Boyden, C. “Soft Skills”: A Phrase in Search of Meaning. Portal Libr. Acad. 2016, 16, 71–88. [Google Scholar] [CrossRef]
  12. Robles, M.M. Executive Perceptions of the Top 10 Soft Skills Needed in Today’s Workplace. Bus. Commun. Q. 2012, 75, 453–465. [Google Scholar] [CrossRef]
  13. Rao, M.S. Soft skills: Toward a sanctimonious discipline. Horizon 2018, 26, 215–224. [Google Scholar] [CrossRef]
  14. Hendarman, A.F.; Cantner, U. Soft skills, hard skills, and individual innovativeness. Eurasian Bus. Rev. 2018, 8, 139–169. [Google Scholar] [CrossRef]
  15. World Economic Forum. Global Skills Taxonomy Adoption Toolkit: Defining a Common Skills Language for a Future-Ready Workforce; WEF Insight Report; World Economic Forum: Geneva, Switzerland, 2025. [Google Scholar]
  16. Tronnier, F.; Bernet, R.; Löbner, S.; Rannenberg, K. Consulting in the Age of AI: A Qualitative Study on the Impact of Generative AI on Management Consultancy Services. In Proceedings of the 58th Hawaii International Conference on System Sciences, Big Island, HI, USA, 7–10 January 2025; pp. 132–141. [Google Scholar]
  17. Altun, E.; Yildirim, N. What does critical thinking mean? Examination of pre-service teachers’ cognitive structures and definitions for critical thinking. Think. Skills Creat. 2023, 49, 101367. [Google Scholar] [CrossRef]
  18. Demir, E. An examination of high school students’ critical thinking dispositions and analytical thinking skills. J. Pedagog. Res. 2022, 6, 190–200. [Google Scholar] [CrossRef]
  19. Mayarni, M.; Nopiyanti, E. Critical and analytical thinking skill in ecology learning: A correlational study. JPBI (J. Pendidik. Biol. Indones.) 2021, 7, 63–70. [Google Scholar] [CrossRef]
  20. Adams, D.J.; (with UK Centre for Bioscience). Effective Learning in the Life Sciences: How Students Can Achieve Their Full Potential; John Wiley & Sons: Oxford, UK, 2011. [Google Scholar]
  21. Johnson, B. Teaching Students to Dig Deeper: Ten Essential Skills for College and Career Readiness, 2nd ed.; Routledge: New York, NY, USA, 2017. [Google Scholar]
  22. Marni, S.; Suyono, S.; Roekhan, R.; Harsiati, T. Critical Thinking Patterns of First-Year Students in Argumentative Essay. J. Educ. Gift. Young Sci. 2019, 7, 683–697. [Google Scholar] [CrossRef]
  23. Fabian, J. Creative Thinking and Problem Solving, 1st ed.; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  24. Adair, J.E. The Art of Creative Thinking: How to Be Innovative and Develop Great Ideas; Kogan Page: London, UK; Philadelphia, PA, USA, 2007. [Google Scholar]
  25. Black, B. A to Z of Critical Thinking; Continuum International Publishing: London, UK; New York, NY, USA, 2012. [Google Scholar]
  26. Lubart, T.I. Creativity. In Thinking and Problem Solving, 2nd ed.; Sternberg, R.J., Ed.; Elsevier: Amsterdam, The Netherlands, 1994; pp. 289–332. [Google Scholar] [CrossRef]
  27. Dugan, K.E.; Mosyjowski, E.A.; Daly, S.R.; Lattuca, L.R. Systems thinking assessments in engineering: A systematic literature review. Syst. Res. Behav. Sci. 2022, 39, 840–866. [Google Scholar] [CrossRef]
  28. Arnold, R.D.; Wade, J.P. A complete set of systems thinking skills. Insight 2017, 20, 9–17. [Google Scholar] [CrossRef]
  29. Zanella, S. System thinking skills: A questionnaire to investigate them. J. Phys. Conf. Ser. 2022, 2297, 012023. [Google Scholar] [CrossRef]
  30. Rick, S.R.; Giacomelli, G.; Wen, H.; Laubacher, R.J.; Taubenslag, N.; Heyman, J.L.; Knicker, M.S.; Jeddi, Y.; Maier, H.; Dwyer, S.; et al. Supermind Ideator: Exploring generative AI to support creative problem-solving. arXiv 2023, arXiv:2311.01937. [Google Scholar] [CrossRef]
  31. Goldstein, A.; Havin, M.; Reichart, R.; Goldstein, A. Decoding Stumpers: Large Language Models vs. Human Problem-Solvers. arXiv 2023, arXiv:2310.16411. [Google Scholar] [CrossRef]
  32. Zollman, D.A.; Sirnoorkar, A.; Laverty, J.T. Analyzing AI and student responses through the lens of sensemaking and mechanistic reasoning. In Proceedings of the 2023 Physics Education Research Conference Proceedings, Sacramento, CA, USA, 19–20 July 2023; pp. 415–420. [Google Scholar] [CrossRef]
  33. Qawqzeh, Y. Exploring the Influence of Student Interaction with ChatGPT on Critical Thinking, Problem Solving, and Creativity. Int. J. Inf. Educ. Technol. 2024, 14, 596–601. [Google Scholar] [CrossRef]
  34. Zhai, X.; Nyaaba, M.; Ma, W. Can Generative AI and ChatGPT Outperform Humans on Cognitive-demanding Problem-Solving Tasks in Science? Sci. Educ. 2025, 34, 649–670. [Google Scholar] [CrossRef]
  35. Dhingra, S.; Singh, M.; S.B., V.; Malviya, N.; Gill, S.S. Mind meets machine: Unravelling GPT-4’s cognitive psychology. BenchCouncil Trans. Benchmarks Stand. Eval. 2023, 3, 100139. [Google Scholar] [CrossRef]
  36. Urban, M.; Děchtěrenko, F.; Lukavský, J.; Hrabalová, V.; Svacha, F.; Brom, C.; Urban, K. ChatGPT improves creative problem-solving performance in university students: An experimental study. Comput. Educ. 2024, 215, 105031. [Google Scholar] [CrossRef]
  37. Bellettini, C.; Lodi, M.; Lonati, V.; Monga, M.; Morpurgo, A. Davinci Goes to Bebras: A Study on the Problem Solving Ability of GPT-3. In Proceedings of the 15th International Conference on Computer Supported Education, Prague, Czech Republic, 21–23 April 2023; pp. 59–69. [Google Scholar] [CrossRef]
  38. Orrù, G.; Piarulli, A.; Conversano, C.; Gemignani, A. Human-like problem-solving abilities in large language models using ChatGPT. Front. Artif. Intell. 2023, 6, 1199350. [Google Scholar] [CrossRef]
  39. Avcı, Ü. Students’ GAI Acceptance: Role of Demographics, Creative Mindsets, Anxiety, Attitudes. J. Comput. Inf. Syst. 2024, 1–15. [Google Scholar] [CrossRef]
  40. Cosentino, M. Case in Point: Complete Case Interview Preparation; Burgee Press: Santa Barbara, CA, USA, 2013. [Google Scholar]
  41. Holzner, N.; Maier, S.; Feuerriegel, S. Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis. arXiv 2025, arXiv:2505.17241. [Google Scholar] [CrossRef]
  42. Umer, H. Generative Artificial Intelligence’s Impact on Education: A Selected Review of Experimental/Interventional Studies. 2025. Available online: http://dx.doi.org/10.2139/ssrn.5084325 (accessed on 3 October 2025).
  43. Hikmawati, A.; Mohammad, N.K. Enhancing Critical Thinking with Gen AI: A Literature Review. Bul. Edukasi Indones. 2025, 4, 40–46. [Google Scholar] [CrossRef]
  44. Castelo, N.; Katona, Z.; Li, P.; Sarvary, M. How AI Outperforms Humans at Creative Idea Generation. 2024. Available online: http://dx.doi.org/10.2139/ssrn.4751779 (accessed on 3 October 2025).
  45. Hitsuwari, J.; Ueda, Y.; Yun, W.; Nomura, M. Does human–AI collaboration lead to more creative art? Aesthetic evaluation of human-made and AI-generated haiku poetry. Comput. Hum. Behav. 2023, 139, 107502. [Google Scholar] [CrossRef]
  46. Arndt, H. AI and education: An investigation into the use of ChatGPT for systems thinking. arXiv 2023, arXiv:2307.14206. [Google Scholar] [CrossRef]
  47. Brohi, S.; Mastoi, Q.-U.-A.; Jhanjhi, N.Z.; Pillai, T.R. A Research Landscape of Agentic AI and Large Language Models: Applications, Challenges and Future Directions. Algorithms 2025, 18, 499. [Google Scholar] [CrossRef]
  48. Hakizimana, G.; Ledezma Espino, A. Nomological Deductive Reasoning for Trustworthy, Human-Readable, and Actionable AI Outputs. Algorithms 2025, 18, 306. [Google Scholar] [CrossRef]
  49. Sivasakthi, M.; Meenakshi, A. Generative AI in Programming Education: Evaluating ChatGPT’s Effect on Computational Thinking. SN Comput. Sci. 2025, 6, 541. [Google Scholar] [CrossRef]
  50. Daniel, K.; Msambwa, M.M.; Wen, Z. Can Generative AI Revolutionise Academic Skills Development in Higher Education? A Systematic Literature Review. Eur. J. Educ. 2025, 60, e70036. [Google Scholar] [CrossRef]
Figure 1. Evaluation rubric generation process.
Table 1. Assessment rubric.
Sub-Skills | Related Questions/Sub-Skills | Level 1 | Level 2 | Level 3
Analytical thinking | Question_A | Criteria_A1 | Criteria_A2 | Criteria_A3
Creativity | Question_C | Criteria_C1 | Criteria_C2 | Criteria_C3
Systems thinking | Question_S | Criteria_S1 | Criteria_S2 | Criteria_S3
Table 2. Combinations of skill performance assessments.
Combinations | Analytical Thinking | Creative Thinking | Systems Thinking | With GPT | Without GPT | Total
1 | 0 | 0 | 0 | 2 | 3 | 5
2 | 1 | 0 | 0 | 3 | 1 | 4
3 | 0 | 1 | 0 |  |  | 
4 | 0 | 0 | 1 | 1 |  | 1
5 | 1 | 1 | 0 |  | 1 | 1
6 | 1 | 0 | 1 |  | 2 | 2
7 | 0 | 1 | 1 |  |  | 
8 | 1 | 1 | 1 | 2 | 1 | 3
Table 3. User behavioral types of participants’ access to ChatGPT in the experiment (C-level measures collaboration, while E-level measures to what extent the participant was copy–pasting the output of the AI tool). Mean score in the performance across skills of interest converted to %.
Interaction-Based Grouping: Behavior | C-Level | E-Level | P_n | Mean Score | Task
Copy–paster: Copy–pasted everything | 2 | 5 | P_01 | 26% | 1
Copy–paster: Copy–pasted everything | 2 | 5 | P_04 | 62% | 2
Minimal user: Asked 1 definition | 3 | 2 | P_05 | 52% | 1
Moderate user: Language and 1 fact | 4 | 3 | P_08 | 31% | 2
Collaborator: For brainstorming, language, formatting | 5 | 4 | P_09 | 71% | 1
Collaborator: For cons and pros, reviewing GPT results and retrieving some parts. Calculations copy–pasted. Conclusion copy–pasted, but some adjustments made. | 5 | 4 | P_12 | 46% | 2
Non-user: Did not use at all | 1 | 1 | P_13 | 0% | 1
Moderate user: Asked reasoning based on the calculation the student performed, and format. The participant added one risk factor manually. | 4 | 3 | P_16 | 46% | 2
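Note on the Mean Score column: one reading consistent with the reported values (an assumption, since the conversion is not spelled out in the caption) is that each participant’s mean rubric level m over the three skill averages in Appendix B is rescaled linearly, score(%) = (m − 1)/(3 − 1) × 100. For example, P_01’s averages in Table A2 (1.45, 1.42, 1.7) give m ≈ 1.52 and hence approximately 26%, matching the table.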
