A Scalable and Automated Framework for Tracking the likely Adoption of Emerging Technologies

While new technologies are expected to revolutionise and become game-changers in improving the efficiencies and practices of our daily lives, it is also critical to investigate and understand the barriers and opportunities faced by their adopters. Such findings can serve as an additional input to the decision-making process when analysing the risks, costs, and benefits of adopting an emerging technology in a particular setting. Although several studies have attempted to perform such investigations, they adopt qualitative data collection methodologies, which are limited in the size of the targeted participant group and carry a significant manual overhead when transcribing and inferring results. This paper presents a scalable and automated framework for tracking the likely adoption and/or rejection of new technologies from a large landscape of adopters. In particular, a large corpus of social media texts containing references to emerging technologies was compiled. Text mining techniques were applied to extract sentiments expressed towards technology aspects. In the context of the problem definition herein, we hypothesise that the expression of positive sentiment implies an increased likelihood that a technology user will accept, adopt, integrate, and/or use the technology, and that the expression of negative sentiment implies an increased likelihood that adopters will reject an emerging technology. To quantitatively test our hypothesis, a ground truth analysis was performed to validate that the sentiment captured by the text mining approach is comparable to the results given by human annotators when asked to label whether such texts positively or negatively impact their outlook towards adopting an emerging technology.


Introduction
Technological change is revolutionising the way we lead our daily lives. From the way we work to our own homes, the integration of new technologies, such as 5G, Internet of Things (IoT) devices, and Artificial Intelligence (AI), enhances our productivity and efficiencies [1]. However, while new technologies evolve and are expected to revolutionise practices, the industry is facing various barriers in terms of their adoption and implementation. The technology adoption process is affected by aspects such as the availability and quality of hardware/software, organisational role models, available financial resources and funding, organisational support, staff development, attitudes, technical support, and time to learn new technology [2].
Understanding the barriers and advantages faced by technology adopters can be a key feature which impacts essential business decision-making components, such as recognising shifts in user behaviours, new demands, and emerging uncertainties. As such, this information can significantly aid in the development of an adequate response, such as developing a technology for addressing the new requirements in an efficient way [3] or developing new organisational strategies.
Several studies have focused on investigating the current positive experiences and barriers of adopting emerging technologies in various settings, such as in educational institutions (e.g. [4]), healthcare (e.g. [5]), small-medium enterprises (e.g. [6]), and by older adults (e.g. [7]). Such studies often rely on qualitative data collection methods, such as focus groups and interviews. However, this approach is associated with several limitations, including collecting data from a small and targeted sample, which limits the variance of responses and may bias them, and the significant overhead associated with recruiting participants, organising interviews, and manually transcribing and inferring results. To gain an understanding of the landscape surrounding emerging technologies from a larger sample of adopters and across different settings, a wider analysis is needed.
To extract such data at scale, automated techniques are needed to collect and programmatically extract relevant information from publicly available sources. Such sources may include online social media platforms (e.g. Twitter) in which users can publish their content, presenting a wealth of information surrounding their opinions and experiences [8]. To automatically extract and process large volumes of texts originating from diverse sources, text mining techniques may be used [9]. In particular, sentiment analysis, often referred to as opinion mining, aims to automatically extract and classify sentiments and/or emotions expressed in text [10,11]. In the context of the aforementioned problem definition, we hypothesise that the expression of positive sentiment reflects opportunities and positive experiences surrounding emerging technologies and therefore implies an increased likelihood that a technology user will accept, adopt, integrate, and/or use the technology. Conversely, we hypothesise that the expression of negative sentiment reflects the obstacles and barriers caused and faced by such technologies and therefore implies an increased likelihood that adopters will reject them.
To the best of our knowledge, this paper presents the first scalable and automated framework towards tracking the likely adoption of emerging technologies. The framework is powered by the automatic collection and analysis of social media discourse containing references to emerging technologies from a large landscape of adopters. The main contributions of the work presented herein are as follows:
• The extraction of aspects relating to a range of emerging technologies from social media discourse over a period of time.
• The classification of sentiment expressed towards such technologies, indicating the positive and negative outlook of users towards adopting them.
• A ground truth analysis to validate the hypothesis that the sentiment captured by the text mining approach is comparable to the results given by human annotators when asked to label whether such texts positively or negatively impact their outlook towards adopting an emerging technology.
• A scalable and automated framework for tracking the likely adoption and/or rejection of new technologies. This information serves as an important decision-making component when, for example, recognising shifts in user behaviours, new demands, and emerging uncertainties.
• Resources that can further support research into narratives surrounding emerging technologies, such as a large corpus of social media discourse covering five years' worth of data.
The study was designed as follows: 1) compile a corpus of texts containing references to emerging technologies over a time period, 2) pre-process text responses using traditional Natural Language Processing (NLP) techniques, 3) divide texts based on their publication date, 4) for each dataset in 3), automatically extract technology aspects from text segments, 5) for each dataset in 3), apply a sentiment analysis approach to automatically extract the sentiment expressed towards the identified aspects, and 6) visualise and analyse the results.
The remainder of this paper is structured as follows: Section 2 presents the related work, Section 3 discusses the collection of texts used to support the experiments herein and the techniques used to prepare the data for such experiments, Section 4 discusses aspect-based sentiment analysis and how it was applied to the datasets, Section 5 presents and discusses the results, Section 6 quantitatively evaluates our hypothesis by comparing the automated text mining method against the impact such texts have on technology adopters, Section 7 concludes the paper, and finally, Section 8 discusses future work.

Related Work
Several studies have explored the factors influencing technology adoption in several different settings such as healthcare, education, small-medium enterprises, and by older adults. Such studies have often adopted theoretical frameworks, such as the Technology Acceptance Model (TAM) [12] and the Unified Theory of Acceptance and Use of Technology (UTAUT) [13], to understand and predict the acceptance and adoption of new technologies based on factors such as their perceived usefulness and ease of use. For example, by using such frameworks, Alalwan et al. [14] and the authors of [15] explore factors that influence mobile banking uptake and customer adoption of mobile payment technologies, respectively. Likewise, Bhattacherjee & Park [16] investigate factors that influence end-user migration to cloud computing services.
To collect customer and user data, the aforementioned studies use a survey-based methodology.
To derive the barriers faced within a healthcare setting, Sun & Medaglia [17] investigate the perceived challenges of AI adoption in the public healthcare sector in China. Their study relies on data collected from semi-structured interviews asking a sample group of seven key stakeholder groups open-ended questions focusing on the challenges of AI adoption in healthcare. Similarly, Al-Hadban et al. [18] explore the opinions of healthcare professionals using semi-structured interviews to highlight the important factors and issues that influence the adoption of new technologies in the public healthcare sector in Iraq. Their study relies on data collected from a sample group of eight interviewees. They describe their data collection approach, which includes producing transcriptions of the audio recordings, interpreting and understanding the general sense of the text to form themes, and validating the accuracy of their findings, as a time-consuming process. Poon et al. [19] assess the level of healthcare information technology adoption in the United States by also implementing semi-structured interviews with 52 participants from eight key stakeholder groups. They report that one of the key limitations of their study is the responder bias caused by selecting participants based on their access to contacts.
To derive barriers and positive experiences of adopting emerging technologies in an educational setting, Jin et al. [20] also conduct semi-structured interviews, with nine instructors and nine students, to understand their perceptions of educational Virtual Reality (VR) technologies. Dequanter et al. [21] examine the factors underlying technology use in older adults with mild cognitive impairments. In their study, over the course of two years, they conducted semi-structured interviews with 16 adults aged 60 and over from a single area in Belgium. Similar to Poon et al. [19], both Jin et al. [20] and Dequanter et al. [21] report that one of the key limitations of their studies is responder bias, as they believe they recruited participants who were more interested in using VR or novel technologies and who came from one specific geographical area, which limits the transferability of the results.
The aforementioned works focus on applying qualitative approaches towards understanding factors which surround emerging technology adopters in several different settings. However, it is evident that such approaches are faced with significant limitations such as recruiting few participants to partake in their studies and the bias in their responses. As a response to such limitations, recent studies have turned to text mining approaches to automatically analyse, and ultimately help understand, the factors influencing consumer adoption or rejection of emerging technologies from a large landscape of adopters. For example, Kwarteng et al. [22] investigate applying sentiment analysis to Twitter data to provide insights into consumer perceptions, emotions, and attitudes towards autonomous vehicles. Efuwape et al. [23] investigate the acceptance and adoption of digital collaborative tools for academic planning using sentiment analysis of responses gathered in a poll. Hizam et al. [24] employ sentiment analysis to examine the correlations between numerous factors of technology adoption behaviour, such as perceived ease of use, perceived utility, and social impact. The research aims to understand the underlying variables driving Web 3.0 adoption and offers insight on how these factors influence users' decisions to accept or reject these emerging technologies by analysing user-generated content on social media sites. Mardjo & Choksuchat [25] and Caviggioli et al. [26] investigate using sentiment analysis to examine the public's perception of adopting Bitcoin. The goal of such studies is to forecast the sentiment of Bitcoin-related tweets, which could influence the cryptocurrency's market behaviour, as well as providing insights into how the public reacts to the adoption of Bitcoin, and how it affects the perception of the adopting companies. Ikram et al. [27] investigate how potential adopters perceive specific features of open-source software by examining the sentiment expressed on Twitter.
While the aforementioned studies investigate sentiment analysis as an approach towards understanding the factors influencing consumer adoption or rejection of emerging technologies, they have primarily focused on specific technologies. Additionally, they lack a ground truth analysis to corroborate the sentiment gathered by the text mining approach, raising concerns about the robustness of their findings. As a result, there is an opportunity to expand on existing research by creating a framework that allows for the examination of a greater range of technologies, resulting in a more comprehensive knowledge of the factors influencing their adoption. Furthermore, this framework can be customised to focus on specific sectors or technologies, supporting decision-making processes by identifying shifts in user behaviour, new expectations, and growing uncertainties. This study will not only add to the existing body of knowledge by broadening the scope of analysis and allowing for greater customisation, but it will also provide valuable insights that can inform strategic decisions across industries as they navigate the challenges and opportunities associated with technology adoption.

Data Collection and Preparation
To explore online narratives surrounding emerging technologies, textual data was collected from Twitter, a social networking service that enables users to send and read tweets, i.e. text messages consisting of up to 280 characters. Snscrape, a Python scraper for social networking services, was used to scrape English tweets. To demonstrate the framework presented herein, five years' worth of tweets published between 01/01/2016 and 31/12/2021 were collected, as this period aligns with the increase in the adoption of one of the most popular emerging technologies, the Internet of Things (IoT) [28].
The IoT refers to the collection of smart devices which have ubiquitous connectivity, allowing them to communicate and exchange information with other technologies [29]. As more devices connect to the internet and to one another, the IoT is an emerging technology which is considered as being amongst the biggest disruptors, particularly for companies across industries, due to its ability to support innovation and the development of new products and services, increase productivity with higher levels of performance, improve inventory management, and allow greater access to consumer data to observe patterns and behaviours for continued product and service enhancements [30]. In this paper, tweets published between the aforementioned dates were collected based on the presence of the hashtags "IoT" or "Internet of Things". A total of 4,520,934 tweets containing the aforementioned keywords were collected and divided into datasets based on the month and year they were published. The largest monthly dataset (92,290 tweets) was that of November 2017, and the smallest (29,793 tweets) was that of December 2021. No retweets or quote retweets were collected; only self-authored tweets were retained, to avoid duplicated data.
The dataset is available on GitHub and is released in compliance with Twitter's Terms and Conditions, under which we are unable to publicly release the text of the collected tweets. We are, therefore, releasing the tweet IDs, which are unique identifiers tied to specific tweets. The tweet IDs can be used by researchers to query Twitter's API and obtain the complete tweet object, including tweet content (text, URLs, hashtags, etc.) and authors' metadata.
The data preparation and analysis in this study was conducted using Python (version 3.7.2). For text pre-processing, the following standard NLP techniques were applied:
• Converting text to lowercase.
• To remove bias from the analysis, the keywords (i.e. "IoT" and "Internet of Things") used to scrape tweets were also removed.
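The two pre-processing steps listed above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name preprocess, the regular expression, and the choice to drop a leading "#" and collapse whitespace are assumptions.

```python
import re

# Keywords used to collect the tweets (taken from the text above).
SCRAPE_KEYWORDS = ("internet of things", "iot")

# Assumption: a keyword is removed only when it appears as a whole word,
# optionally preceded by '#', so that words such as "riot" are left intact.
_KEYWORD_RE = re.compile(
    r"#?\b(?:" + "|".join(re.escape(k) for k in SCRAPE_KEYWORDS) + r")\b"
)


def preprocess(text: str) -> str:
    """Lowercase a tweet and remove the scraping keywords."""
    lowered = text.lower()
    without_keywords = _KEYWORD_RE.sub(" ", lowered)
    # Collapse the whitespace left behind by the removed keywords.
    return " ".join(without_keywords.split())


if __name__ == "__main__":
    print(preprocess("Securing #IoT devices with 5G"))
```

Matching on word boundaries is a design choice made here to avoid deleting the letters "iot" from inside unrelated words.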

Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis is a text mining technique which aims to identify aspects (e.g. foods, sports, countries) and the sentiment (the subjective part of an opinion) and/or emotion (the projections or display of a feeling) expressed towards them. This technique is often achieved by performing:
• Aspect extraction - aims to automatically identify and extract specific entities and/or properties of entities in text [31].
• Sentiment analysis - often referred to as opinion mining, sentiment analysis aims to automatically extract and classify sentiments and/or emotions expressed in text [10,11].
The following Sections present in more detail how aspects relating to emerging technologies and the sentiment expressed towards them were extracted from text, as well as the results following the application of such techniques on the dataset presented in Section 3.

Aspect Extraction
There are various methods by which aspects can be extracted from text. For example, aspect extraction may be achieved using topic modelling, a text mining technique used to identify and extract salient concepts or themes referred to as "topics" distributed across a collection of texts [32]. The output from applying topic modelling is commonly a set of the top most co-occurring terms appearing in each topic [33,34]. However, some of the issues with applying topic modelling methods (e.g. spaCy [35], Gensim [36]) to achieve aspect extraction are that the pre-trained models provided by these libraries are not specific to emerging technologies and may not be able to recognise or accurately identify new or specialised terms related to this field. In addition, there is often manual overhead associated with interpreting aspects extracted by such methods. For example, "car, power, light, drive, engine, turn" may infer topics surrounding Vehicles, and "game, team, play, win, run, score" may infer Sports. Another similar method for extracting aspects is named entity recognition, a technique for extracting named entities, such as names, geographic locations, ages, addresses, phone numbers, etc. from the text. However, both topic modelling and named entity recognition methods may over-generalise the aspects extracted from texts, in turn, losing finer-grained entities. In addition, challenges may occur when topic modelling outputs present irrelevant terms, such as "car, power, light, cake, baking, chocolate", where the overall aspect cannot be defined.
In this case, for each pre-processed dataset described in Section 3, a simple direct string matching approach was applied to automatically extract aspects that could be mapped against the mapping reference of the Cybersecurity Body of Knowledge (CyBOK) [37], a resource which provides an index of cybersecurity referenced terms, including emerging technology terms. Of the 13,037 terms available in CyBOK v1.3.0, 3,911 were extracted from the corpus, with one tweet containing a maximum of 20 terms, 514,458 tweets containing a minimum of 1 term, and 3,472,358 tweets containing no terms. Table 1 reports examples of the CyBOK aspects extracted from tweets.
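A direct string matching step of this kind could look like the sketch below. It is not the implementation used in the study. The term list is a small illustrative stand-in for the CyBOK index, and the whole-word matching, the longest-term-first ordering, and the function names are assumptions.

```python
import re
from collections import Counter


def build_matcher(terms):
    """Compile one regular expression that matches any reference term."""
    unique_terms = {t.lower() for t in terms if t.strip()}
    if not unique_terms:
        raise ValueError("at least one reference term is required")
    # Longest terms first, so "5g network" is preferred over a shorter
    # term that is a prefix of it.
    ordered = sorted(unique_terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    # Assumption: terms must match as whole words, not inside other words.
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)")


def extract_aspects(text, matcher):
    """Return the distinct reference terms found in one tweet."""
    return sorted(set(matcher.findall(text.lower())))


def count_aspects(tweets, matcher):
    """Count, for each term, the number of tweets that mention it."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_aspects(tweet, matcher))
    return counts


if __name__ == "__main__":
    # Hypothetical sample terms; the real index holds 13,037 entries.
    matcher = build_matcher(["5G network", "malware", "DDoS", "router"])
    print(extract_aspects("25,000 CCTV cameras hacked to launch #DDoS attack", matcher))
```

Counting each term at most once per tweet keeps the totals comparable with per-tweet statistics such as those reported above.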
Having removed the keywords used to scrape the tweets, Figure 1 reports the distribution of extracted terms from CyBOK across the dataset.

Sentiment Analysis
Sentiment analysis, often referred to as opinion mining, aims to automatically extract and classify sentiments and/or emotions expressed in text [38,11]. Most research activities focus on sentiment classification, which classifies a text segment (e.g. phrase, sentence or paragraph) in terms of its polarity: positive, negative or neutral. Various techniques and methodologies have been developed to address the automatic identification and extraction of sentiment expressed within free text. The two main approaches are the rule-based approach, which relies on predefined lexicons, and the machine learning approach, which learns a classifier from labelled examples. While there exists a variety of sentiment analysis methods, in the work herein, Valence Aware Dictionary and Sentiment Reasoner (VADER) [39], a lexicon-based sentiment analysis tool, was employed. VADER not only aligns with other relevant studies in the field (e.g. [22,23,24,25]), and therefore ensures consistency and comparability with existing research, but it is also specifically tuned to classify sentiment expressed in social media language, such as the dataset collated in Section 3. VADER takes into account various features of social media language, such as the use of exclamation marks, capitalisation, degree modifiers, conjunctions, emojis, slang words, and acronyms, which can all impact the sentiment intensity and polarity of a tweet. For example, the use of an exclamation mark increases the magnitude of the sentiment intensity without modifying the semantic orientation, while capitalising a sentiment-relevant word in the presence of non-capitalised words increases the magnitude of the sentiment intensity. Given the complexity of social media language and the various features that can impact sentiment analysis, using VADER to extract the sentiment expressed in the dataset collated herein allows for more accurate and nuanced sentiment analysis of the tweets.
VADER provides a percentage score, which represents the proportion of the text which falls in the positive, negative, or neutral categories. To represent a single uni-dimensional measure of sentiment, VADER also provides a compound score which is computed by summing the valence scores of each word in the lexicon and then normalising the scores to be between -1 (most extreme negative) and 1 (most extreme positive). In the work herein, text segments with a compound score <= -0.05 were considered as expressing a negative sentiment, those with a score > -0.05 and < 0.05 were considered as expressing neutral sentiment, and those with a score >= 0.05 were considered to express positive sentiment. For example, the tweet 'No fear that a hacker can get access to your camera or thermostat or other electronic devices. Your privacy is 100% protected because the technology is inside your electronics and not located on any server across the world.' achieved a compound score of 0.6734 and was therefore assigned a positive sentiment. For each extracted aspect, the final sentiment class was assigned by taking the polarity with the highest average compound score. For example, for the aspect 4G network, in December 2018, the average compound scores were as follows: positive = 0.1027, negative = 0.58, and neutral = 0. In this case, the overall sentiment assigned for 4G network during this time was negative.
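The thresholding and per-aspect aggregation described above can be sketched as follows. The compound scores are assumed to have been produced by VADER beforehand. Comparing the polarity groups by the magnitude of their average compound score is an interpretation of the 4G network example, where the negative average is reported as a positive number, and the function names are hypothetical.

```python
from statistics import mean


def polarity(compound: float) -> str:
    """Map a VADER compound score to a class using the 0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def aspect_sentiment(compounds):
    """Assign one sentiment class to an aspect for a given month.

    `compounds` holds the compound scores of the tweets that mention the
    aspect. Returns the winning class and the per-class averages.
    """
    if not compounds:
        raise ValueError("no tweets mention this aspect")
    groups = {"positive": [], "negative": [], "neutral": []}
    for score in compounds:
        # Assumption: averages are compared by magnitude.
        groups[polarity(score)].append(abs(score))
    averages = {label: mean(vals) for label, vals in groups.items() if vals}
    # Ties fall back to dictionary order (positive, negative, neutral).
    return max(averages, key=averages.get), averages


if __name__ == "__main__":
    label, averages = aspect_sentiment([0.1027, -0.58])
    print(label, averages)
```

With one tweet at 0.1027 and one at -0.58, the sketch reproduces the outcome of the example above, in which the negative class wins.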

Results and Discussion
Figure 2 reports chronological aspect-based analysis results across an excerpt of emerging technologies across the whole dataset. The figure depicts chronological monthly data, presenting changes in sentiment over the timeline. It is possible to refine the data to show daily results to gain more granular information; nevertheless, the monthly data is useful for monitoring outputs from a broader perspective. While the figure may not capture every detail, it provides an overview of trends and changes over time that can be used to inform decision-making and identify potential areas for improvement.
By observing sentiment expressed towards emerging technologies alongside the stages of technology adoption presented by Rogers [40] and the various adopter categories (i.e. innovators, early adopters, early majority, late majority, and laggards) presented by Moore [41], it is possible to gain a better understanding of how shifts in sentiment may influence the adoption or rejection of emerging technologies. During the early stages of 5G network adoption in April 2020, for instance, negative sentiment was expressed in the form of conspiracy theories and misinformation (e.g., 'Rumours of 5G as the true cause behind COVID-19, communication towers burned...'). Such data may have a greater effect on laggards, who are typically more risk-averse and resistant to new technologies [40]. In contrast, early adopters, who are typically more receptive to innovation and risk [41], may be more interested in the potential advantages and opportunities of the 5G network. For example, in November 2021, several tweets (e.g. '5G network compatibility will make IoT devices better suited for the future as the industry continues to see how the speed 5G provides can make IoT devices preform better' and 'Microsoft and AT&T are accelerating the enterprise customer's journey to the edge with 5G') expressed an overall positive sentiment. As the 5G network matures and progresses through the adoption stages, users in the early and late majority stages may become more familiar with the benefits of the technology and their attitudes may change. This change may be indicative of a broader acceptance of the technology, resulting in its increased adoption.
Similarly, when analysing cybersecurity issues such as cyber attacks, various adopter categories may be influenced differently by the sentiment conveyed. In the case of malware, for example, negative sentiment was expressed in July 2017 regarding the WORM-RETADUP attack (e.g. 'Information-stealing malware discovered targeting Israeli hospitals'). Innovators and early adopters may view this as a learning opportunity and work to develop more robust security measures. In contrast, the late majority and laggards may be discouraged from adopting these technologies due to cybersecurity concerns. Likewise, the negative sentiment expressed in June 2016 related to the Distributed Denial-of-Service (DDoS) attack involving compromised CCTV cameras could impact adoption patterns across adopter categories in a similar manner (e.g. '25,000 CCTV cameras hacked to launch #DDoS attack' and 'What a way to cause a distraction - 25,000 CCTV cameras hacked to launch DDoS attack').
By refining the data, the framework can be customised to concentrate on specific industries, which enhances its flexibility and makes it a dynamic solution that can adapt to the unique needs of different sectors. For example, Figures 3 and 4 report chronological aspect-based analysis results across an excerpt of emerging technologies from discourse relating to healthcare and education respectively. In the health sector, in August 2017, an overall negative sentiment was expressed towards Siemens' medical molecular imaging systems (e.g. 'This type of vulnerability in healthcare is not unique to Siemens') as an alert warning was issued when publicly available exploits were identified that could allow an attacker to remotely execute damaging code or compromise the safety of their systems [42]. Innovators and early adopters might view the security vulnerability as an opportunity to improve upon existing systems and develop more secure solutions. In contrast, the late majority and laggards may perceive the security vulnerability as a reason to delay or reject the adoption of such technology in healthcare. An interesting observation is the variation in technological aspects in each sector, as well as the expression of sentiment towards them. The difference in sentiment between industries may be useful in highlighting the distinct adoption patterns prevalent in each sector. By analysing these patterns, valuable insights into the factors that influence the adoption of new technologies can be obtained and used to tailor strategies accordingly. Understanding the nuances of sector-specific adoption patterns enables stakeholders to make better-informed decisions, thereby facilitating the successful integration of emerging technologies across multiple domains.

Evaluation
To quantitatively test our hypothesis, a ground truth analysis was performed to validate that the sentiment captured by the text mining approach is comparable to the results given by independent human annotators when asked to label whether such texts positively or negatively impact their outlook towards adopting an emerging technology. We measured impact using three labels:
• Positive - The text has a positive impact on the reader. Given this information, they are now more likely to accept, integrate, and/or use the technology in their business or personal life.
• Negative - The text has a negative impact on the reader. Given this information, they now feel against integrating and using the technology in their business or personal life.
• Neutral - The text has no impact on the reader and they feel indifferent about the technology.
To facilitate the annotation task, a bespoke web-based annotation platform accessible via a web browser was implemented. This eliminated any installation overhead and widened the reach of annotators. Annotators were presented with instructions explaining the task's requirements and then with the platform's interface consisting of two panes.
The crowdsourcing of labelling natural language often uses a limited number of annotators with the expectation that they are perceived to be experts [43]. However, annotation is a highly subjective task that varies with age, gender, experience, cultural location, and individual psychological differences [44]. For example, Snow et al. [45] investigate collecting annotations from a broad base of non-expert annotators over the Web. They show high agreement between the annotations provided by non-experts found on social media and those provided by experts. In this study, a crowdsourcing approach was therefore adopted to annotate a randomly sampled set of 150 tweets (50 samples each of positive, negative, and neutral tweets) by developing and disseminating an annotation platform on Twitter, enabling users to participate in the annotation process and contribute to the assessment of sentiment in the dataset. A total of 750 annotations were collected, with five annotations per sample. A total of 20 independent annotators participated in the study.
To quantitatively measure the reliability of the collected annotations, we measured inter-annotator agreement using Krippendorff's alpha coefficient [46]. As a generalisation of known reliability indices, it was used because it (1) applies to any number of annotators, not just two, (2) applies to any number of categories, and (3) corrects for chance-expected agreement. A Krippendorff's alpha coefficient of 1 indicates perfect agreement, 0 indicates no agreement beyond chance, and negative values indicate systematic disagreement. The values for Krippendorff's alpha coefficient were obtained using Python's computation of Krippendorff's alpha measure [47].
Krippendorff suggests α = 0.667 as the lowest acceptable value when considering the reliability of a dataset [48]. The inter-annotator agreement of the annotated dataset in this study was calculated as α = 0.769, with a total of 89 samples out of 150 (59.3%) achieving full agreement. The relatively high agreement (α = 0.769) illustrates the relative reliability of the annotations which delineate the impact of the presented texts on technology users.
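For readers unfamiliar with the coefficient, the sketch below computes Krippendorff's alpha for nominal labels from a coincidence matrix. It is a from-scratch illustration, not the package cited as [47], and the function name and input layout (one list of labels per annotated tweet) are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list with one inner list of labels per annotated item.
    Items may have different numbers of labels.
    """
    # Coincidence matrix: each ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of labels on the item.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected


if __name__ == "__main__":
    sample = [
        ["negative"] * 4 + ["neutral"],
        ["positive"] * 5,
        ["positive", "positive", "neutral", "neutral", "negative"],
    ]
    print(round(krippendorff_alpha_nominal(sample), 3))
```

The nominal variant treats any two different labels as equally distant, which suits an unordered label set. Whether the study treated positive, neutral, and negative as nominal or ordinal is not stated.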
To evaluate our proposed text mining approach against a human annotator perspective, annotated tweets were used to create a gold standard. For each tweet in the sample, an annotation agreed by the relative majority of at least 50% was assumed to be the ground truth. For example, the tweet 'Cyber attacks on the rise how secure is your router network' was annotated with negative four times and once with neutral, thus negative was accepted as the ground truth. When no majority annotation could be identified, a new independent annotator resolved the disagreement. For example, the tweet 'We may have soon pills or grain size sensors in US reporting in real time' was annotated twice with positive, twice with neutral, and once with negative. Thus, the independent annotator accepted neutral as the ground truth. A total of three samples suffered from disagreement and were resolved by the independent annotator. The confusion matrix given in Table 2 shows how the sentiment categories are re-distributed when comparing the sampled dataset generated by the text mining approach with the gold standard formed using the collected annotations.
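The gold standard construction can be sketched as a majority vote with an explicit fallback for ties. The function name and the tie_breaker callback, which stands in for the independent annotator, are hypothetical. With five annotations over three labels, the only split without a unique top label is 2-2-1, and any unique top label has at least three votes, so the relative majority always exceeds 50%.

```python
from collections import Counter


def gold_label(annotations, tie_breaker=None):
    """Return the majority label, or defer to `tie_breaker` on a tie.

    `tie_breaker` is a callable that receives the tied labels and returns
    one of them. If it is omitted, ties yield None.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations).most_common()
    top_count = counts[0][1]
    tied = [label for label, count in counts if count == top_count]
    if len(tied) == 1:
        return tied[0]
    if tie_breaker is None:
        return None
    return tie_breaker(tied)


if __name__ == "__main__":
    # The two examples given in the text above.
    print(gold_label(["negative"] * 4 + ["neutral"]))
    print(gold_label(["positive", "positive", "neutral", "neutral", "negative"]))
```

Returning None for unresolved ties makes it straightforward to collect the samples that need an additional annotator.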
Overall, 123 out of 150 samples (82%) were in agreement. When considering the positive sentiment, 37 of the samples in the gold standard were in agreement with the text mining approach. The remaining 13 samples were in disagreement and were categorised as neutral. No positive instance was in disagreement with the negative category. Of the 50 samples of negative tweets, 45 samples of the gold standard agreed, with 2 and 3 samples being in disagreement and annotated as positive and neutral respectively. Likewise, for neutral tweets, 41 samples were in agreement, with 8 and 1 instances being annotated as positive and negative respectively. Such disagreements illustrate the inherently subjective nature of the task. Overall, the relatively high agreement between the impact of such texts on human annotators and the results generated by the text mining approach implies that the proposed automated method generates reliable results towards understanding the barriers and opportunities faced by technology adopters from large online corpora.
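The figures reported above can be cross-checked with a short calculation. The sketch below rebuilds the confusion matrix from the counts stated in the text, assuming that rows hold the label assigned by the text mining approach and columns hold the gold standard label; this orientation is our reading of the description, and the variable names are illustrative.

```python
LABELS = ("positive", "neutral", "negative")

# Rows: label from the text mining approach (50 sampled tweets each).
# Columns: gold standard label derived from the human annotations.
confusion = {
    "positive": {"positive": 37, "neutral": 13, "negative": 0},
    "neutral": {"positive": 8, "neutral": 41, "negative": 1},
    "negative": {"positive": 2, "neutral": 3, "negative": 45},
}

total = sum(sum(row.values()) for row in confusion.values())
agreed = sum(confusion[label][label] for label in LABELS)
overall_agreement = agreed / total

# Share of each text mining category that the gold standard confirms.
row_agreement = {
    label: confusion[label][label] / sum(confusion[label].values())
    for label in LABELS
}

print(f"{agreed} of {total} samples agree ({overall_agreement:.0%})")
for label in LABELS:
    print(f"{label}: {row_agreement[label]:.0%}")
```

Under this reading, agreement is highest for the negative category (45 of 50) and lowest for the positive category (37 of 50), where all disagreements fall into the neutral class.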

Conclusion
This paper presents a scalable and automated framework for tracking the likely adoption of emerging technologies. The framework is powered by the automatic collection and analysis of social media discourse containing references to emerging technologies from a large landscape of adopters. In particular, to support the experiments presented herein, and subsequently remove the dependence on manual qualitative data collection and analysis, an automated text mining approach was adopted to compile a large corpus of over four million tweets covering five years' worth of data. Once pre-processed, the corpus was divided into datasets based on publication month and year. To extract references to emerging technologies from text, a simple string-matching approach was applied to automatically identify tweets containing references to technologies that could be mapped to CyBOK's cybersecurity index. Under the hypothesis that the expression of positive sentiment implies an increase in the likelihood of impacting a technology user's acceptance to adopt, integrate, and/or use the technology, and negative sentiment implies an increase in the likelihood of impacting the rejection of emerging technologies by adopters, sentiment analysis was applied to extract the sentiment expressed towards the identified technology. For each technology, the sentiment polarity with the highest average score was used to determine the overall sentiment expressed during the specific month and year.
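As a minimal sketch of the final aggregation step, the snippet below averages per-tweet polarity scores for each technology and month and keeps the polarity with the highest average. The record layout, the score keys ('pos', 'neu', 'neg'), and the example values are assumptions made for illustration and are not taken from the study's data.

```python
from collections import defaultdict

POLARITIES = ("pos", "neu", "neg")


def dominant_sentiment(records):
    """Map each (technology, month) pair to its dominant polarity.

    `records` is an iterable of (technology, "YYYY-MM", scores) tuples,
    where `scores` holds one value per polarity (hypothetical layout).
    """
    sums = defaultdict(lambda: dict.fromkeys(POLARITIES, 0.0))
    counts = defaultdict(int)
    for technology, month, scores in records:
        key = (technology, month)
        for polarity in POLARITIES:
            sums[key][polarity] += scores[polarity]
        counts[key] += 1

    # Pick the polarity whose average score is highest for each key.
    return {
        key: max(POLARITIES, key=lambda p: totals[p] / counts[key])
        for key, totals in sums.items()
    }


# Illustrative scores only.
example_records = [
    ("iot", "2020-01", {"pos": 0.1, "neu": 0.6, "neg": 0.3}),
    ("iot", "2020-01", {"pos": 0.0, "neu": 0.3, "neg": 0.7}),
    ("5g", "2020-01", {"pos": 0.5, "neu": 0.4, "neg": 0.1}),
]
monthly = dominant_sentiment(example_records)
```

Because all tweets in a given month share the same count, comparing averages is equivalent to comparing sums; the division is kept so that the code follows the wording of the text.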
Notably, this study reports that instances of negative sentiment disclose the obstacles faced by technologies as a result of the dissemination of false information or their involvement in malicious activities. In the context of risk assessment, a crucial aspect of a company's decision-making process, this information can serve as an additional factor in assessing the risks, costs, and benefits an organisation may face upon deploying such technologies, including the overall security of their technology systems and data. In the context of the stages of technology adoption, these obstacles may contribute to delays in progressing through the adoption stages or even lead to the rejection of the technology altogether. On the other hand, the expression of positive sentiment is useful for recognising the benefits and advantages of adopting particular technologies, as it provides insight into how other organisations with similar structures have successfully integrated them. By refining the data, the framework can also be customised to concentrate on specific industries, such as education and healthcare, which makes it flexible enough to adapt to the needs of different sectors.
To quantitatively test our hypothesis, a ground truth analysis was performed to validate that the sentiment captured by the text mining approach is comparable to the results given by human annotators when asked to label whether such texts positively or negatively impact their outlook towards adopting an emerging technology. The collected annotations demonstrated comparable results to those of the text mining approach, illustrating that automatically extracted sentiment expressed towards technologies is a useful feature in understanding the landscape faced by technology adopters across various stages of adoption.
Given the positive results of this preliminary study, the next step is to investigate the use of context-aware sentiment extractors to reduce false positives in emerging technology sentiment analysis. While the current method has presented useful insights, it may occasionally misinterpret sentiments due to a lack of context awareness. For example, whereas one would classify the tweet 'cyber attack quick response guide' as expressing a neutral sentiment, VADER reports that the tweet expresses a negative one due to the presence of the word 'attack'. By employing more advanced techniques, such as sentiment analysis models based on deep learning, the accuracy and reliability of the findings presented here can be enhanced, allowing for a more nuanced comprehension of the factors influencing technology adoption.
In addition, the results presented herein may be used to support event prediction. In particular, narratives surrounding the cybersecurity threats and attacks faced by emerging technologies, particularly dialogues that express a negative sentiment or contain other linguistic elements that describe the intent to disrupt or cause harm, may be monitored in real time and subsequently aid the identification and prediction of activities such as the launch of a malicious cyber attack aimed at a particular technology.

Figure 1: Top 30 most frequently referenced CyBOK terms used across the dataset

Figure 2: Chronological aspect-based analysis results across an excerpt of emerging technologies from the whole dataset (positive sentiment = green, neutral sentiment = orange, negative sentiment = red)

Figure 3: Chronological aspect-based analysis results across an excerpt of emerging technologies from discourse relating to healthcare (positive sentiment = green, neutral sentiment = orange, negative sentiment = red)

Figure 4: Chronological aspect-based analysis results across an excerpt of emerging technologies from discourse relating to education (positive sentiment = green, neutral sentiment = orange, negative sentiment = red)

Table 1: Examples of tweets mapped to CyBOK terms

Table 2: Confusion matrix comparing text mining outputs with human annotations