2. Material and Methods
2.1. Design and Methodology
The researcher utilized a quantitative, quasi-experimental, explanatory methodology for this study, using survey research to better understand related phenomena. Quantitative methods are used to measure behavior, knowledge, opinions, or attitudes in business research, as is pertinent when the Technology Acceptance Model (TAM) is the utilized instrument. An online survey was used to test for statistically significant differences in the level of acceptance of alert output between those choosing VAO in all scenarios and those having some or complete preference for TAO, with VAO and TAO being generated via data science/machine learning methods, as predicted by the TAM. In pursuit of further insights into potential differences in security analysts’ perceptions of visual and text analytics, the research question that guided this study was:
RQ1: Is there a difference in the level of acceptance of security alert output between those with a preference for VAO and those with a preference for TAO, with VAO and TAO generated via data science/machine learning methods, as predicted by the TAM?
Sub-questions were:
SQ1: Does the adoption of VAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
SQ2: Does the adoption of TAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
The online survey utilized for this study incorporated visual images as part of the questioning process, to create clarity and encourage complete responses. To further minimize non-response, and to prepare the data for testing, the following were included:
- As part of this quantitative, quasi-experimental, explanatory study, the online survey for data collection utilized a 7-point Likert scale.
- Given that this research was specifically focused on visualization versus text, the online survey questionnaire and survey experiment incorporated visual elements, which lead to higher response quality and generate interesting interaction effects [19].
The target population for this study was global information security analysts working in a blue team (defender) capacity, analyzing security monitoring data and alerts. This is an appropriate population given the significant challenges the industry faces due to the sheer scale of security data, and the resulting difficulties security analysts face seeking precise and efficient answers to alert-related questions. Participants were solicited from this population via social media, including LinkedIn and Twitter, mailing lists, industry partners, and contact lists. The researcher ensured prequalification with a job- and role-specific question. Survey participants who did not meet population requirements were disqualified.
Data analysis for this study utilized a mixed ANOVA. Given the within-subjects component of this study, where all participants undertook the same three scenarios, a mixed ANOVA allowed variability arising from individual differences to be partitioned out. A mixed ANOVA also offered efficiency while keeping variability low, thereby preserving the validity of the results while allowing for smaller subject groups [20].
2.2. Data Collection
SurveyMonkey was utilized to create survey hyperlinks for social media and e-mail dissemination to prospective participants and solicit their responses. The criteria for inclusion in the sample were as follows: (a) information security analysts, (b) working in a security monitoring role as part of a security operations center or fusion center, and (c) responding to security alert data. Participants were prequalified to meet these criteria and those who did not were excluded. Any survey results received from participants determined not to meet the criteria for inclusion were eliminated. Participants were required to provide their informed consent before responding to the survey. An opt-out option was available for participants while taking the survey.
The defined variables, related constructs, applied scale, and data types for each variable are listed in
Table 1.
2.3. Instrumentation
The TAM implies that positive perception of usefulness and ease of use (perceived usability) influence intention to use, which in turn influences the actual likelihood of use [
21]. Original construction of the TAM for measurement of PU and PEU resulted in a 12-item instrument that was shown to be reliable [
22]. It consisted of the two factors PU and PEU and was correlated with intentions to use and self-report usage [
17]. This quantitative, quasi-experimental, explanatory study utilized a 7-point Likert scale to assess the level of acceptance and the perceived ease of use and perceived usefulness of alerts in three scenarios (the within-subjects independent variable). The preferred alert output (VAO or TAO) forms the basis of the between-subjects independent variable. Likert-type scale response anchors set the range between agreement and disagreement; as an example, 1 indicated strong disagreement and 7 indicated strong agreement with a statement.
2.4. Hypotheses
The following research questions served to determine if a relationship exists between the dependent variable, which is the level of acceptance of alert output, and the two independent variables, which are Scenario (1, 2, or 3) and Maximum Visual. Maximum Visual had two levels: one where VAO was chosen for all scenarios and one where TAO was chosen for some or all scenarios.
The following research hypotheses explored the research questions for a relationship between the independent variable of Maximum Visual (a preference for VAO in all scenarios versus a preference for TAO in some or all scenarios), and the dependent variable, which is the level of acceptance of alert outputs. The dependent variable is specific to security analysts’ perception of machine learning (ML)- and data science (DS)-generated alert output.
The null and alternative hypotheses are stated as:
H1: There is no significant difference in the level of acceptance of alert outputs between those preferring VAO in all scenarios and those preferring TAO in some or all scenarios, as predicted by the TAM.
H2: There is a significant difference in the level of acceptance of alert outputs between those preferring VAO in all scenarios and those preferring TAO in some or all scenarios, as predicted by the TAM.
Omnibus tests are applicable to these hypotheses, where H1: R-squared is equal to 0 and H2: R-squared is greater than 0.
Table 2 highlights the relationship between the research questions and the hypotheses.
2.5. Data Analysis
The data collected for analysis from the results of a SurveyMonkey online questionnaire were processed with IBM SPSS software and R, a programming language for statistical computing, machine learning, and graphics. The analysis focused on data exploration of dependent and independent variables. The main dependent variable was the level of acceptance of the security alert output and was based on the four individual TAM components: PU, PEU, AU, and IU. Each component was derived from responses to groups of Likert-style statements (scored 1 through 7, with 7 representing the most favorable response). PU and PEU each comprised six statements, and AU and IU each comprised three statements. The level of acceptance of the alert output was calculated by adding all 18 scores together, with a maximum score of 126 and a minimum score of 18. The sub-scores for PU, PEU, AU, and IU represent secondary dependent variables. The within-subjects independent variable was Scenario. It had three levels, Scenario 1, Scenario 2, and Scenario 3, with all participants being subject to all scenarios. The between-subjects independent variable was Maximum Visual. This had two levels: a preference for VAO in all three scenarios, and a preference for TAO in at least one of the scenarios.
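The scoring described above can be sketched as follows. This is an illustrative reconstruction rather than the study's actual processing code; the item ordering is an assumption, with PU and PEU read as six statements each and AU and IU as three each, consistent with the 18-item total.

```python
# Sum 18 Likert items (scored 1-7) into TAM sub-scores and a total
# acceptance score ranging from 18 to 126, as described in the text.
SUBSCALES = {"PU": 6, "PEU": 6, "AU": 3, "IU": 3}  # item counts (assumed order)

def score_scenario(responses):
    """responses: list of 18 Likert scores (1-7), ordered PU, PEU, AU, IU."""
    assert len(responses) == 18 and all(1 <= r <= 7 for r in responses)
    scores, i = {}, 0
    for name, n_items in SUBSCALES.items():
        scores[name] = sum(responses[i:i + n_items])  # secondary dependent variables
        i += n_items
    scores["total"] = sum(responses)  # main dependent variable (18-126)
    return scores

print(score_scenario([7] * 18)["total"])  # 126, the stated maximum
```

An all-minimum response sheet produces the stated floor of 18, and an all-maximum sheet the ceiling of 126.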
Both parametric and non-parametric tests were performed. Mixed ANOVA tested whether the level of acceptance of alert outputs is influenced by the within-subjects variable Scenario and the between-subjects variable Maximum Visual. Mixed ANOVA was also repeated for the four sub-scales of PU, PEU, AU, and IU, with Bonferroni corrections for multiple comparisons. Additionally, a Mann–Whitney U test was performed, comparing the level of acceptance of alert outputs of the two levels of Maximum Visual, and a Friedman test compared the level of acceptance across the three scenarios.
2.6. Validity and Reliability
The study’s dependent variables are derived from the TAM. As such, the validity and reliability of TAM are paramount. Davis developed and validated scales for two variables, perceived usefulness (PU) and perceived ease of use (PEU), as basic determinants of user acceptance. Davis used definitions for PU and PEU to develop scale markers pretested for content validity, as well as tested for reliability and construct validity [
17].
Davis found that the PU scale attained a Cronbach’s alpha reliability of 0.97 for both systems tested, while PEU achieved a reliability of 0.86 for one system tested and 0.93 for the other. Upon pooling observations for the two systems, Cronbach’s alpha was found to be 0.97 for usefulness and 0.91 for ease of use [
17].
Davis tested for convergent and discriminant validity using multi-trait–multimethod (MTMM) analysis, where the MTMM matrix contained the intercorrelations of items (methods) applied to the two different test systems (traits). Davis indicated that convergent validity determines if items making up a scale behave as if measuring a common underlying construct. Convergent validity is demonstrated when items that measure the same trait correlate highly with one another [
17]. Davis’ study found that 90 mono-trait–hetero-method correlations for PU were all significant at the 0.05 level, while for PEU, 86 out of 90, or 95.56%, of the mono-trait–hetero-method correlations were significant. These data support the convergent validity of TAM’s two scales: PU and PEU [
17].
3. Results
3.1. Background
The specific business problem that oriented this study is that organizations risk data breach and loss of valuable human resources, reputation, and revenue due to excessive security alert volume and a lack of fidelity in security event data. To determine means of supporting security analysts experiencing these security event-specific challenges, the study asked whether there is a difference in the level of acceptance of security alert outputs between those preferring VAO in all scenarios and those preferring TAO in some or all scenarios, as predicted by the TAM. The dependent variable was participants’ level of acceptance of security alert output; the within-subjects independent variable was Scenario, and the between-subjects independent variable was Maximum Visual (preference for VAO in all scenarios versus preference for TAO in some or all scenarios). SurveyMonkey was utilized to deliver an online survey to participants, from which the collected data were analyzed. The survey queried a population of cybersecurity analysts and managers in SOC, DFIR, DART, and TI roles, targeted for participation via social media: Twitter and LinkedIn. The LinkedIn campaign used Linked Helper to create a list of potential participants whose profiles matched the desired role descriptions, drawn from the researcher’s network of 1411 connections at the time of writing. The final filtered list resulted in 234 potential participants, to whom an invitation to participate was sent. A 7-point Likert scale survey queried participants regarding their perspectives on the perceived ease of use and perceived usefulness of ML- and DS-generated alert output across three scenarios with TAO and VAO results [
23]. Of 119 respondents, 24 disqualified themselves and 95 identified themselves as qualified, 81 of whom completed all 3 scenarios.
3.2. Description of the Sample
Data collected from cybersecurity analysts and managers in SOC, DFIR, DART, and TI roles resulted in 95 qualified respondents, which is in keeping with estimates of an appropriate sample size. The 2018 Bureau of Labor Statistics data indicate that there were 112,300 information security analysts; this specific target population is a subpopulation of that group, and applying 5% of the 112,300 yields a target population of 5615 [
24]. With a 95% confidence level, and 10% confidence interval (margin of error), then the ideal sample size is 94 [
25]. Of the 95 respondents to this survey, 81 completed all 3 scenarios presented in the survey. The 14 incomplete survey results were discarded, resulting in an 85.26% completion rate; discarding them both addressed missing data and enabled analysis of two complete and distinct groups, namely respondents who chose VAO across all three scenarios, and those who selected a mix of VAO and TAO, or all TAO results, across all three scenarios. The 81 respondents, as broken down into their 2 distinct groups, are defined under the Maximum Visual variable (Vis_max), where the participants who said yes to VAO in all three scenarios were labeled Yes (N = 59), and the participants who selected a mix of VAO and TAO or all TAO results across all three scenarios were labeled No (N = 22).
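The sample-size figure above can be reproduced with a standard calculation. The exact formula behind the cited calculator is not stated, so the Cochran formula with a finite-population correction shown here is an assumption that happens to match the stated result.

```python
# Required sample size for a proportion at 95% confidence (z = 1.96),
# 10% margin of error, and maximum variability (p = 0.5), with a
# finite-population correction for the target population N = 5615.
def sample_size(N, z=1.96, margin=0.10, p=0.5):
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population estimate
    return n0 * N / (n0 + N - 1)               # finite-population correction

print(round(sample_size(5615)))  # 94, matching the stated ideal sample size
```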
3.3. Hypothesis Testing
Given that the data collected for this study did not meet the standard for normality, both parametric and non-parametric tests were performed. Parametric statistical procedures depend on assumptions about the shape of the distribution (assume a normal distribution) in the population and the form or parameters (means and standard deviations) of the assumed distribution [
26]. On the other hand, nonparametric statistical procedures depend on few or no assumptions about the shape (normality) or parameters of the population distribution from which the sample was taken [
26]. Nonparametric tests include the Mann–Whitney U test and the Friedman test, while parametric tests can be conducted via a mixed analysis of variance (ANOVA) with a Bonferroni correction. The mixed ANOVA tests treated the dependent variable, security analysts’ level of acceptance of the alert output, in two ways. First, mixed ANOVA was performed across the TAM-based questionnaire categories, namely perceived usefulness (PU), perceived ease of use (PEU), attitude towards using (AU), and intent to use (IU), where the scores for all sub-scales were summed. Second, mixed ANOVA was performed on each sub-scale. For the individual sub-scales, statistical significance was set at α/4, or 0.0125.
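The non-parametric battery and the Bonferroni-adjusted threshold described above can be sketched with SciPy. The numbers below are synthetic stand-ins, not the study's data, and the mixed ANOVA itself (run in SPSS/R for the study) is omitted from this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative synthetic acceptance totals only -- not the study's data.
vao_group = rng.integers(60, 127, size=59)  # Maximum Visual = Yes
tao_group = rng.integers(40, 110, size=22)  # Maximum Visual = No

# Mann-Whitney U: compares the two independent Maximum Visual groups.
u_stat, p_u = stats.mannwhitneyu(vao_group, tao_group, alternative="two-sided")

# Friedman: compares the three repeated scenarios within subjects.
s1, s2, s3 = rng.integers(18, 127, size=(3, 81))
chi2, p_f = stats.friedmanchisquare(s1, s2, s3)

# Bonferroni-adjusted significance threshold for the four sub-scale ANOVAs:
alpha_sub = 0.05 / 4
print(alpha_sub)  # 0.0125
```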
3.4. Validating Assumptions
Assessment of normality indicated that the distributions were not normally distributed; the standardized residuals for each of the three scenarios do not appear normally distributed, as seen in the histograms in
Figure 2.
Given that the residuals are skewed, Friedman’s test was also conducted, as a non-parametric equivalent of a within-subjects one-way ANOVA. It only considers the impact of the within-subjects variable Scenario.
Finally, reliability was assessed via Cronbach’s alpha, which measures the internal consistency of questions related to the same issues across each of the three scenarios. Cronbach’s alpha ranges from 0 to 1, with scores between 0.7 and 0.9 generally considered to represent good consistency; the results for this study meet or exceed that standard [
27]. Using a scale comprised of 18 TAM questions for each scenario, and 81 valid cases, with 14 excluded (
n = 95), the reliability statistic for each scenario as indicated by Cronbach’s alpha was 0.958 for Scenario 1, 0.971 for Scenario 2, and 0.986 for Scenario 3.
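The reliability statistic reported above can be computed directly. This is a generic implementation of the standard Cronbach's alpha formula, not the study's SPSS output.

```python
import numpy as np

# Cronbach's alpha: internal consistency of a multi-item scale.
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of totals)
def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of respondents' totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Perfectly consistent items (identical columns) yield alpha of 1.0
# (up to floating-point rounding).
col = np.array([1, 4, 7, 2, 6])
print(cronbach_alpha(np.column_stack([col, col, col])))
```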
3.5. Descriptive Statistics
Survey respondents were categorized as follows:
For each of the three scenarios, a scenario variable:
- 0 = no response
- 1 = text response
- 2 = visual response
A scenario product variable (product of all scenario variables):
- All visual responses: 2 × 2 × 2 = 8
- 2 visual responses, 1 text response: 2 × 2 × 1 = 4
- 1 visual response, 2 text responses: 2 × 1 × 1 = 2
- All text responses: 1 × 1 × 1 = 1
The results using these variables are seen in
Table 3.
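The coding scheme above can be expressed directly. This sketch uses assumed label strings for illustration; it is not the study's actual data-processing code.

```python
# Scenario responses coded 0 = no response, 1 = text, 2 = visual;
# the product over the three scenarios collapses a response pattern
# into a single variable (8 = all visual ... 1 = all text).
CODES = {"none": 0, "text": 1, "visual": 2}

def scenario_product(responses):
    """responses: three labels, one per scenario, e.g. ('visual', 'visual', 'text')."""
    product = 1
    for r in responses:
        product *= CODES[r]
    return product

print(scenario_product(("visual", "visual", "visual")))  # 8: all VAO
print(scenario_product(("visual", "visual", "text")))    # 4
print(scenario_product(("visual", "text", "text")))      # 2
print(scenario_product(("text", "text", "text")))        # 1: all TAO
```

Any missing response zeroes the product, which cleanly flags incomplete cases.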
The dependent variable is represented by survey scenario question response totals as summed from Likert-scale responses ranging from 1 (strongly disagree) to 7 (strongly agree). These are represented for each scenario presented to participants as
S1_tot for Scenario 1,
S2_tot for Scenario 2, and
S3_tot for Scenario 3. For the mixed ANOVA, these represent the within-subjects factors seen in
Table 4.
The Maximum Visual variable (
Vis_max) defined the participants who said yes to VAO in all three scenarios, labeled
Yes (N = 59), and the participants who selected a mix of VAO and TAO or all TAO results across all three scenarios, labeled
No (N = 22). Maximum Visual is the study’s between-subjects independent variable. It was one of the main factors in the mixed ANOVA, as can be seen in
Table 5.
3.6. Mann–Whitney U Test
A Mann–Whitney U test of independent samples had participants’ level of acceptance of alert output as its dependent variable, which is the ranked, summed scores across all scenarios (
S_tot). The independent variable is Maximum Visual (
Vis_max). The test determines whether the group who prefer VAO across all scenarios have a significantly different acceptance score than those who prefer TAO in some or all scenarios. Score totals are noted in
Figure 3, while
Table 6 provides a statistical summary.
The Mann–Whitney U test indicates that there is a significant difference (U = 863.5, p = 0.023) in the level of acceptance of alert output between the respondents who selected visual output across all scenarios (n = 59) as compared to the respondents who provided mixed responses (n = 22). As such, the null hypothesis, that there is no statistically significant difference in the level of acceptance of alert output between those who preferred VAO in all scenarios and those preferring TAO in some or all scenarios, is rejected.
The effect size is calculated by dividing the standardized test statistic, Z, by the square root of the number of pairs: r = Z/√N. The effect size, according to Cohen’s classification of effect, is moderate, given 0.1 (small effect), 0.3 (moderate effect), and 0.5 and above (large effect).
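Since the standardized statistic Z is not reproduced in the text, the sketch below reconstructs it from the reported U via the normal approximation without tie correction, so the resulting value approximates the study's figure rather than reproducing it exactly.

```python
import math

# Effect size r = Z / sqrt(N) for a Mann-Whitney U test, with Z taken
# from the normal approximation (no tie correction -- an assumption).
def mann_whitney_r(u, n1, n2):
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)

# Reported values: U = 863.5, n1 = 59 (VAO), n2 = 22 (TAO).
r = mann_whitney_r(863.5, 59, 22)
print(round(r, 2))  # ≈ 0.25 under this approximation
```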
3.7. Friedman Test
A related samples Friedman test was conducted to assess the measurements of the same dependent variable under different conditions for each participant, namely the three scenarios for this study defined by the variables
S1_tot,
S2_tot, and
S3_tot. Rank frequencies are shown in
Figure 4 and the statistical summary is represented in
Table 7.
The Friedman test carried out to compare the score ranks for the three scenarios found no significant difference between scenarios. The result indicates that scenario mean ranks did not differ significantly from scenario to scenario when not also factoring for responses based on output preference (Maximum Visual).
Effect size was not applicable as no measurable significance was found.
3.8. Mixed ANOVA—All Measures (PU, PEU, AU, IU Combined)
A two-way mixed ANOVA was conducted, with a Bonferroni correction for the within-subjects variable. The dependent variable was the level of acceptance of alert output, with all items of all TAM sub-scales summed.
As indicated in Table 8, sphericity cannot be assumed, as p < 0.001. Most authorities suggest the Greenhouse–Geisser correction, though considered more conservative, when the epsilon (ε) estimate is below 0.75; given ε = 0.727 (Table 8), the Greenhouse–Geisser correction was applied.
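The corrected degrees of freedom follow directly from ε. This sketch assumes k = 3 scenarios, N = 81 participants, and 2 between-subjects groups, and reproduces the reported F-test degrees of freedom up to the rounding of ε = 0.727.

```python
# Greenhouse-Geisser correction scales both within-subjects dfs by epsilon:
# df_effect = eps * (k - 1);  df_error = eps * (k - 1) * (N - g)
def gg_dfs(eps, k, n, groups):
    return eps * (k - 1), eps * (k - 1) * (n - groups)

df1, df2 = gg_dfs(0.727, k=3, n=81, groups=2)
print(round(df1, 3), round(df2, 2))  # 1.454 114.87 -- cf. the reported F(1.455, 114.915)
```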
The within-subjects variable, equating to score totals for each of the three study scenarios, is represented by Scenarios (S1_tot, S2_tot, and S3_tot). The between-subjects variable was Maximum Visual (Vis_max), which differentiates between the participants who said yes to VAO in all three scenarios, labeled Yes (n = 59), and the participants who selected a mix of VAO and TAO, or all TAO results, across all three scenarios, labeled No (n = 22). Maximum Visual is the statistical representation of the study’s between-subjects independent variable, specifically (a) ML/DS-generated TAO and (b) ML/DS-generated VAO.
Participants were presented with three scenarios exhibiting security alert output for the results of applied models, where the output was both VAO and TAO. A mixed ANOVA using α = 0.05 with a Greenhouse–Geisser correction showed that scores varied significantly across Scenarios in tests of within-subject effects, and there was also a significant interaction with Maximum Visual:
Scenarios: (F (1.455, 114.915) = 19.925, p < 0.001, ηp2 = 0.201)
Scenarios∗Vis_max: (F (1.455, 114.915) = 5.634, p = 0.010, ηp2 = 0.067)
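Partial eta squared can be recovered from an F statistic and its degrees of freedom via ηp² = F·df1 / (F·df1 + df2); plugging in the reported statistics reproduces the stated effect sizes, which serves as a consistency check on the results above.

```python
# Partial eta squared from an F statistic and its degrees of freedom.
def partial_eta_sq(f, df1, df2):
    return f * df1 / (f * df1 + df2)

# Reported combined-measures mixed ANOVA results:
print(round(partial_eta_sq(19.925, 1.455, 114.915), 3))  # 0.201 (Scenarios)
print(round(partial_eta_sq(5.634, 1.455, 114.915), 3))   # 0.067 (Scenarios x Vis_max)
```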
The impact of Maximum Visual (Vis_max) on the level of acceptance of output was moderated by Scenarios; the difference in the level of acceptance was larger for Scenario 3, for example. Post hoc tests using the Bonferroni correction revealed that favorable scores declined non-significantly from Scenario 1 to Scenario 2 by an average of 1.596 points (p = 0.702) but declined significantly from Scenario 1 to Scenario 3 by 12.342 points (p < 0.001). Scenario 2 to Scenario 3 saw an additional significant decrease of 10.746 points (p < 0.001). The differences in scores were not particularly meaningful between or within Scenarios 1 and 2 (S1_tot and S2_tot) and Maximum Visual (Vis_max) = Yes or No. However, a significant difference was noted in Scenario 3 (S3_tot) compared to Scenarios 1 and 2, as well as for Maximum Visual = Yes versus Maximum Visual = No. Most noteworthy is a 15% decrease in mean score for Maximum Visual = No in Scenario 3 as compared to Scenario 2, indicating a notable decrease in PU, PEU, AU, and IU for participants selecting TAO.
Via estimated marginal means between-subjects, where Maximum Visual = Yes or Maximum Visual = No, inclusive of all TAM components with α = 0.05 and Bonferroni correction, pairwise comparisons yielded a 7.881 point mean difference in favor of VAO, significant at
p = 0.046. As such, there was a significant main effect of Maximum Visual scores (
F (1, 79) = 4.111,
p = 0.046, ηp2 = 0.049) on the level of acceptance of alert output, as indicated by the sum of participants’ scores for all TAM components (PU, PEU, AU, and IU). These results are represented visually in
Figure 5.
3.9. Mixed ANOVA—Perceived Usefulness (PU)
Two-way mixed ANOVA with Bonferroni correction, computed using α = 0.0125, was performed for PU in isolation. α = 0.0125 was appropriate to control the family-wise error rate: with four tests of related measures, a Bonferroni adjustment of α = 0.05/4 yields α = 0.0125. The measures related to PU represented one of four TAM-specific comparisons, and thus a conservative but accurate method to compensate for multiple tests was required.
Mixed ANOVA was again applied, where the within-subjects variables equating to score totals for each of the three study scenarios were represented by Perceived_Usefulness (PUS1_tot, PUS2_tot, and PUS3_tot), and between-subjects factors were again represented by Maximum Visual (Vis_max), labeled as Yes (n = 59) and No (n = 22).
Participants were presented with three scenarios exhibiting security alert output for the results of applied models, where the output was both VAO and TAO. A mixed ANOVA computed using α = 0.0125 with a Greenhouse–Geisser correction showed that scores varied significantly across scenarios specific to Perceived_Usefulness (PUS1_tot, PUS2_tot, and PUS3_tot) in tests of within-subject effects, and less significantly when differentiated for Maximum Visual:
Scenarios: (F (1.637, 129.311) = 16.999, p < 0.001, ηp2 = 0.177)
Scenarios∗Vis_max: (F (1.637, 129.311) = 4.017, p = 0.028, ηp2 = 0.048)
Post hoc tests using the Bonferroni correction revealed that favorable scores for PU declined non-significantly from Scenario 1 to Scenario 2 by an average of 0.076 points (p = 1.000), but then declined significantly from Scenario 1 to Scenario 3 by 3.999 points (p < 0.001) and from Scenario 2 to Scenario 3 by an additional 3.924 points (p < 0.001). The differences in scores were not particularly meaningful between or within Scenarios 1 and 2 (PUS1_tot and PUS2_tot) and Maximum Visual (Vis_max) = Yes or No. A significant difference was, however, noted in Scenario 3 (PUS3_tot) compared to Scenarios 1 and 2, as well as for Maximum Visual = Yes versus Maximum Visual = No. Again, a 15% decrease in mean score for Maximum Visual = No was noted in Scenario 3 as compared to Scenario 2, indicating a significant decrease in PU for participants selecting TAO. Interestingly, there was a 1% increase in PU for participants selecting TAO for Scenario 2 as compared to Scenario 1.
Via estimated marginal means between-subjects, where Maximum Visual = Yes or Maximum Visual = No, inclusive only of PU data with α = 0.0125 and Bonferroni correction, pairwise comparisons yielded a 3.642 point mean difference in favor of VAO, significant at
p = 0.007. As such, there was a significant main effect of Maximum Visual scores (
F (1, 79) = 7.643,
p = 0.007, ηp2 = 0.088) on the level of acceptance of alert output, as indicated by sum of participants’ scores for PU. These results are best represented visually, as noted in
Figure 6.
3.10. Mixed ANOVA—Perceived Ease of Use (PEU)
Two-way mixed ANOVA with Bonferroni correction, computed using α = 0.0125, was performed for PEU in isolation. α = 0.0125 was applicable as one quarter of α = 0.05 given that the TAM components related to PEU represent one of four tests of related measures.
Mixed ANOVA was again applied, where the within-subjects variables equating to score totals for each of the three study scenarios are represented by Perceived_EaseOfUse (PEUS1_tot, PEUS2_tot, and PEUS3_tot), and between-subjects factors were again represented by Maximum Visual (Vis_max), labeled as Yes (n = 59) and No (n = 22).
Participants were presented with three scenarios exhibiting security alert output for the results of applied models, where the output was both VAO and TAO. A mixed ANOVA computed using α = 0.0125 with a Greenhouse–Geisser correction showed that scores varied significantly across scenarios specific to perceived ease of use (PEUS1_tot, PEUS2_tot, and PEUS3_tot) in tests of within-subject effects, and insignificantly when differentiated for Maximum Visual:
Scenarios: (F (1.658, 130.988) = 8.752, p = 0.001, ηp2 = 0.100)
Scenarios∗Vis_max: (F (1.658, 130.988) = 3.548, p = 0.040, ηp2 = 0.043)
Post hoc tests using the Bonferroni correction revealed that favorable scores for PEU decreased non-significantly from Scenario 1 to Scenario 2 by an average of 1.020 points (p = 0.294) but declined significantly from Scenario 1 to Scenario 3 by an average of 3.357 points (p = 0.002). A non-significant decrease of an additional 2.337 points was noted from Scenario 2 to Scenario 3 (p = 0.033, above the adjusted α of 0.0125). The differences in scores were meaningful between Scenarios 1 and 2 (PEUS1_tot and PEUS2_tot) for Maximum Visual (Vis_max) = No, and again between Scenarios 2 and 3 (PEUS2_tot and PEUS3_tot) for Maximum Visual (Vis_max) = No. A significant difference was, however, noted in Scenario 3 (PEUS3_tot) compared to Scenarios 1 and 2, as well as for Maximum Visual = Yes versus Maximum Visual = No. Again, a 10% decrease in mean score for Maximum Visual = No was noted in Scenario 3 as compared to Scenario 2, indicating a significant decrease in PEU for participants selecting TAO. Interestingly, there was a 1% increase in PEU for participants selecting VAO for Scenario 2 as compared to Scenario 1. Additionally, for the first time in this analysis, within Scenario 1, TAO outscored VAO within a specific TAM component (PEU).
Via estimated marginal means between-subjects, where Maximum Visual = Yes or Maximum Visual = No, inclusive only of PEU data with α = 0.0125 and Bonferroni correction, pairwise comparisons yielded only a 1.229 point mean difference in favor of VAO, insignificant at
p = 0.362. As such, there was not a significant main effect of Maximum Visual scores (
F (1, 79) = 0.842,
p = 0.362, ηp2 = 0.011) on the level of acceptance of alert output, as indicated by the sum of participants’ scores for PEU. These results are best represented visually, as noted in
Figure 7.
3.11. Mixed ANOVA—Attitude toward Using (AU)
Two-way mixed ANOVA with Bonferroni correction, computed using α = 0.0125, was performed for AU in isolation. α = 0.0125 was applicable as one quarter of α = 0.05 given that the TAM measures related to AU represented one of four tests of related measures.
Mixed ANOVA was again applied, where the within-subjects variables equating to score totals for each of the three study scenarios were represented by Attitude2Use (AUS1_tot, AUS2_tot, and AUS3_tot), and between-subjects factors were again represented by Maximum Visual (Vis_max), labeled as Yes (n = 59) and No (n = 22).
Participants were presented with three scenarios exhibiting security alert output for the results of applied models, where the output was both VAO and TAO. A mixed ANOVA computed using α = 0.0125 with a Greenhouse–Geisser correction showed that scores varied significantly across scenarios specific to attitude toward using (AUS1_tot, AUS2_tot, and AUS3_tot) in tests of within-subject effects, and significantly again when differentiated for Maximum Visual:
Scenarios: (F (1.669, 131.861) = 20.605, p < 0.001, ηp2 = 0.207)
Scenarios∗Vis_max: (F (1.669, 131.861) = 8.159, p = 0.001, ηp2 = 0.094)
Post hoc tests using the Bonferroni correction revealed that favorable scores for AU decreased non-significantly from Scenario 1 to Scenario 2 by an average of 0.196 points (p = 1.000) but declined significantly from Scenario 1 to Scenario 3 by an average of 2.293 points (p < 0.001). A significant decrease of an additional 2.097 points was noted from Scenario 2 to Scenario 3 (p < 0.001). The differences in scores were not meaningful between Scenarios 1 and 2 (AUS1_tot and AUS2_tot) for Maximum Visual (Vis_max) = No, but were substantial between Scenarios 2 and 3 (AUS2_tot and AUS3_tot) for Maximum Visual (Vis_max) = No. As is consistent throughout this analysis, there was a significant difference noted in Scenario 3 (AUS3_tot) compared to Scenarios 1 and 2, as well as for Maximum Visual = Yes versus Maximum Visual = No. A stark 19% decrease in mean score for Maximum Visual = No was noted in Scenario 3 as compared to Scenario 2, indicating a significant decrease in AU for participants selecting TAO. No change in AU was noted for participants selecting VAO for Scenario 2 as compared to Scenario 1. Also noteworthy were the lowest mean scores of all results recorded, specifically for TAO in Scenario 3, indicating a particularly poor attitude toward using TAO.
Via estimated marginal means between-subjects, where Maximum Visual = Yes or Maximum Visual = No, inclusive only of AU data with α = 0.0125 and Bonferroni correction, pairwise comparisons yielded a small 1.587 point mean difference in favor of VAO, insignificant at
p = 0.036. As such, there was not a significant main effect of Maximum Visual scores (
F (1, 79) = 4.566,
p = 0.036, ηp2 = 0.055) on the level of acceptance of alert output, as indicated by the sum of participants’ scores for AU. These results are best represented visually, as noted in
Figure 8.
3.12. Mixed ANOVA—Intention to Use (IU)
Two-way mixed ANOVA with Bonferroni correction, computed using α = 0.0125, was performed for IU in isolation. α = 0.0125 was applicable as one quarter of α = 0.05, given that the TAM measures related to IU represent one of four tests of related measures.
Mixed ANOVA was again applied, where the within-subjects variable comprised the score totals for each of the three study scenarios, represented by Intention2Use (IUS1_tot, IUS2_tot, and IUS3_tot), and the between-subjects factor was again represented by Maximum Visual (Vis_max), labeled as Yes (n = 59) and No (n = 22).
Participants were presented with three scenarios exhibiting security alert output for the results of applied models, where the output was both VAO and TAO. A mixed ANOVA computed using α = 0.0125 with a Greenhouse–Geisser correction showed that scores varied significantly across scenarios specific to Intention to Use (IUS1_tot, IUS2_tot, and IUS3_tot) in tests of within-subject effects, and significantly again when differentiated for Maximum Visual:
Scenarios: (F (1.447, 114.327) = 24.493, p < 0.001, ηp2 = 0.237)
Scenarios∗Vis_max: (F (1.447, 114.327) = 5.728, p = 0.009, ηp2 = 0.068)
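The Greenhouse–Geisser correction applied above scales the within-subjects degrees of freedom by an estimated sphericity factor ε, which is why fractional degrees of freedom such as 1.447 appear. A minimal numpy sketch of the ε estimate, computed from the covariance matrix of the repeated measures, follows; the function name and example matrix are assumptions for illustration, not part of this study's tooling.

```python
import numpy as np

def gg_epsilon(cov: np.ndarray) -> float:
    """Greenhouse-Geisser epsilon from the k x k covariance matrix of the
    repeated measures; ranges from 1/(k-1) (maximal violation of
    sphericity) up to 1.0 (sphericity holds, no correction needed)."""
    k = cov.shape[0]
    # Double-center the covariance matrix.
    row = cov.mean(axis=1, keepdims=True)
    col = cov.mean(axis=0, keepdims=True)
    centered = cov - row - col + cov.mean()
    return float(np.trace(centered) ** 2 / ((k - 1) * np.sum(centered ** 2)))

# Under compound symmetry (equal variances, equal covariances) epsilon = 1.
cs = 2.0 * np.eye(3) + 0.5 * np.ones((3, 3))
print(round(gg_epsilon(cs), 3))  # 1.0

# Unequal variances violate sphericity, so epsilon drops below 1; the
# corrected within-subjects df is then (k - 1) * epsilon.
print(round(gg_epsilon(np.diag([1.0, 2.0, 5.0])), 3))
```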
Post hoc tests using the Bonferroni correction revealed that favorable scores for IU decreased non-significantly from Scenario 1 to Scenario 2 by an average of 0.304 points (p = 0.758) but declined significantly from Scenario 1 to Scenario 3 by an average of 2.692 points (p < 0.001). A significant decrease was also noted from Scenario 2 to Scenario 3, by an additional 2.388 points (p < 0.001). The differences in scores between Scenarios 1 and 2 (IUS1_tot and IUS2_tot) were not meaningful for Maximum Visual (Vis_max) = No, but were substantial between Scenarios 2 and 3 (IUS2_tot and IUS3_tot) for Maximum Visual (Vis_max) = No. Consistent with the rest of this analysis, a significant difference was noted in Scenario 3 (IUS3_tot) compared to Scenarios 1 and 2, as well as for Maximum Visual = Yes versus Maximum Visual = No. Again, a substantial 19% decrease in mean score for Maximum Visual = No was noted in Scenario 3 compared to Scenario 2, indicating a significant decrease in IU for participants selecting TAO. As with AU, no change in IU was noted for participants selecting VAO in Scenario 2 compared to Scenario 1. Also noteworthy was the largest percentage decrease in mean scores of all results recorded, specifically in Scenario 3, indicating that intention to use was low for every aspect of Scenario 3, whether TAO or VAO.
Via estimated marginal means between-subjects, where Maximum Visual = Yes or Maximum Visual = No, inclusive only of IU data with α = 0.0125 and Bonferroni correction, pairwise comparisons yielded a small 1.423-point mean difference in favor of VAO, non-significant at p = 0.040. As such, there was not a significant main effect of Maximum Visual scores (F (1, 79) = 4.378, p = 0.040, ηp2 = 0.053) on the level of acceptance of alert output, as indicated by the sum of participants’ scores for IU. These results are represented visually in Figure 9.
3.13. Summary of Hypothesis Testing
The null hypothesis states that there is no statistically significant difference in the level of acceptance of alert output between those choosing VAO and those having some or complete preference for TAO, with VAO and TAO being generated via data science/machine learning methods as predicted by the TAM. The null hypothesis was rejected via non-parametric and parametric methods.
Table 9 represents non-parametric outcomes per an independent samples Mann–Whitney U test.
The Mann–Whitney U test indicates that there was a significant difference (U = 863.5, p = 0.023) between the respondents who selected visual output across all scenarios (n = 59) as compared to the respondents who provided mixed responses (n = 22).
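The group comparison above can be reproduced in outline with scipy's Mann–Whitney U implementation. The scores below are synthetic placeholders; only the group sizes mirror those reported (n = 59 all-visual, n = 22 mixed).

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic total-acceptance scores: the all-visual group (n = 59) scoring
# somewhat higher on average than the mixed-preference group (n = 22).
all_visual = rng.normal(60.0, 8.0, 59)
mixed = rng.normal(54.0, 8.0, 22)

# Two-sided test of whether the two groups' score distributions differ.
u_stat, p_value = mannwhitneyu(all_visual, mixed, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```

The U statistic is bounded by the product of the two group sizes (here 59 × 22 = 1298), which gives a quick sanity check on any reported value.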
Table 10 represents the outcomes for parametric tests of within-subjects effects.
The mixed ANOVA using α = 0.05 with a Greenhouse–Geisser correction was significant when differentiated for Maximum Visual: F (1.455, 114.915) = 5.634, p = 0.010.
Table 11 represents the outcomes for parametric tests of between-subjects effects.
The mixed ANOVA using α = 0.05 with Bonferroni adjustment was significant: F (1, 79) = 4.111, p = 0.046.
In summary, the null hypothesis was rejected. The answer to RQ1 (Is there a difference in the level of acceptance of security alert output between those with a preference for VAO and those with a preference for TAO, with VAO and TAO generated via data science/machine learning methods, as predicted by the TAM?) is therefore yes.
Additional sub-questions were examined in this analysis. Specifically, the sub-questions are stated as:
SQ1: Does the adoption of VAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
SQ2: Does the adoption of TAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
Outcomes indicate mixed results in answering the sub-questions.
Table 12 states the results of within-subjects effects per individual TAM components.
The within-subjects findings indicated that PU and PEU were not significantly influenced by the adoption of VAO or TAO, while AU and IU were significantly influenced by the adoption of VAO.
Table 13 states the results of between-subjects effects per individual TAM components.
The between-subjects findings indicate that PU was the only TAM component to be significantly influenced by the adoption of VAO.
As a result, the answer to SQ1 is yes, in part:
The TAM components PU and PEU were not significantly influenced by the adoption of VAO within-subjects, while AU and IU were significantly influenced by the adoption of VAO within-subjects.
The TAM component PU was significantly influenced by the adoption of VAO between-subjects.
The answer to SQ2 is universally no. No individual TAM component was significantly influenced by TAO adoption, and TAO trailed VAO in nearly all comparisons.
3.14. Summary
The results indicate that there was a difference in acceptance as predicted by TAM. The dependent variable, security analysts’ level of acceptance of security alert output, and the two independent variables, Scenario and ML/DS-generated alert output (TAO and VAO), were assessed with non-parametric and parametric methods. Both the Mann–Whitney U test and the mixed ANOVA determined that there was a difference between the acceptance of VAO and TAO in favor of VAO. The mixed ANOVA also demonstrated that two of the TAM factors, AU and IU, were influenced by the adoption of VAO and TAO.