Applying Machine Learning in Marketing: An Analysis Using the NMF and k-Means Algorithms

Abstract: The integration of machine learning (ML) techniques into marketing strategies has become increasingly relevant in modern business. Utilizing scientific manuscripts indexed in the Scopus database, this article explores how this integration is being carried out. Initially, a focused search is undertaken for academic articles containing both the terms “machine learning” and “marketing” in their titles, which yields a pool of papers. These papers have been processed using the Supabase platform. The process has included steps like text refinement and feature extraction. In addition, our study uses two key ML methodologies: topic modeling through NMF and a comparative analysis utilizing the k-means clustering algorithm. Through this analysis, three distinct clusters emerged, thus clarifying how ML techniques are influencing marketing strategies, from enhancing customer segmentation practices to optimizing the effectiveness of advertising campaigns.


Introduction
The impact of marketing on businesses is crucial to ensure their growth and development [1]. Marketing enables brand awareness and recognition, customer acquisition and loyalty, differentiation from competitors, and increased sales, thus contributing to the sustainability of business profitability [2]. Systematic data management and processing are essential to obtain timely and accurate information about the market, products, and customers, as well as the business environment, as they facilitate decision making in the different marketing processes. With the advancement of information technologies, companies need to apply new methods for data processing and analysis [3]. Machine learning (ML) can be used to gain new insights into consumer behavior, improve the performance of marketing actions [4], formulate strategic marketing decisions [5], identify competitors, and effectively manage products and prices, among other uses [6]. Customer reviews are considered essential for both strategy development and consumer attraction. Likewise, topic modeling algorithms have emerged as a popular analytical tool, enabling companies to make decisions on (i) product enhancement and development; (ii) service recovery and improvements; (iii) pricing; and (iv) customer recruitment and retention [7]. The use of ML offers great possibilities for companies, marketing professionals, and academics [8]. However, it has become a challenge to identify the methods, techniques, and tools being applied to different marketing processes. Additionally, determining the level of application and technological adaptation by companies is complex [9]. This includes marketing activities, commercial management, improving customer experience, optimizing operations, and creating new business models.
The increasing application of ML in marketing strategies not only shows technological advancements in this field, but also brings to light the diverse academic perspectives that investigate both the impacts and the potential of ML in marketing. Studies such as the one by Herhausen et al. [4] highlight the integration of ML in targeting consumers and customizing offers, while also addressing the challenges of data privacy concerns and the opacity of algorithms. Likewise, Arasu et al. [10] explore how data analytics can predict customer purchase preferences by proposing an integrated ML approach to social media marketing. Their approach employs text mining and analysis techniques to process and understand large volumes of social media data, thereby enhancing marketing effectiveness. Similarly, Bonab et al. [11] focus on the challenge of adapting product recommendations across diverse markets by developing multiple models. They introduce FOREC, a novel approach designed to optimize market adaptation through structured steps of pre-training, forking, and fine-tuning of market data. The study also considers models such as the Generalized Matrix Factorization, Multi-Layer Perceptron, and Neural Matrix Factorization. Tirunillai and Tellis [12] introduce Latent Dirichlet Allocation (LDA) to marketing research, utilizing it to analyze online product reviews. Their application of LDA revealed significant dimensions of product quality, referred to as topics, which captured diverse aspects of consumer satisfaction. Another application of ML in marketing is studied by Kalinić et al. [13]. Specifically, they use Artificial Neural Networks and Structural Equation Modeling to model consumer satisfaction, thus identifying the factors that influence it.
Building on the analysis of existing research, our study contributes to the literature by examining both academic publications and case studies to further explore the impact and potential of ML techniques on marketing practices [14]. To this end, the following objectives have been established: (i) to identify the ML methods, techniques, and tools that are most frequently employed in marketing practices; (ii) to identify the marketing processes in which ML techniques are being applied intensively; and (iii) to establish successful cases of the use of ML techniques in marketing. To achieve these research objectives, the study queries the Scopus database using two precise keywords, "machine learning" and "marketing", in order to obtain relevant data. After identifying some of the most representative scientific manuscripts on the subject, they are systematically processed. Two analysis methodologies are employed: (i) topic modeling using the non-negative matrix factorization (NMF) algorithm [15] and (ii) a comparative analysis using the k-means clustering algorithm. NMF has allowed us to discover and understand the predominant themes in the dataset of selected articles. Meanwhile, the purpose of using k-means clustering is to analyze and validate the results of NMF. Hence, one methodological novelty of our work is the combined use of NMF and the k-means clustering algorithm to analyze academic literature at the intersection of ML and marketing. While both algorithms are already well known, their combined application to review and compare existing papers on marketing represents an innovative approach.
The rest of the document is structured as follows. Section 2 describes the research methodology, explaining the process of obtaining and systematizing the information based on the main keywords of the study. Section 3 introduces the first cluster of papers obtained, which describes the applications of ML algorithms in marketing. Section 4 presents a second cluster, which concerns the application of big data and ML techniques in marketing processes and their impact on consumer behavior. Section 5 describes the results obtained in a third cluster, which concerns the applications of ML in digital marketing. Section 6 introduces future trends related to the application of ML in marketing practices. Finally, Section 7 summarizes the main findings of the article.

Methodology
This section describes the methodology employed in our study, from the data collection process to data management and preprocessing, feature extraction, topic modeling, comparative analysis, and evaluation.

Data Collection Process: Literature Search Procedure
The first step of this study is the identification of academic articles in which ML has been employed to develop marketing applications. We utilized the Scopus database, known for its wide range of peer-reviewed literature, to conduct the search. In order to ensure that only the most relevant studies were included, the search criteria were precise. Hence, we searched for articles including both terms, "machine learning" and "marketing", in their title, and refined our search with several filters: (i) document type was set to 'Article', thus excluding non-peer-reviewed materials; (ii) source type was restricted to 'Journal', thus ensuring sources were reputable academic journals; (iii) language was set to 'English', hence ensuring the findings are widely accessible; and (iv) the period considered was from 2018 to 2023. This approach resulted in a collection of 66 papers, a significant corpus of work at the intersection of ML and marketing. The time period was determined based on the availability of relevant articles rather than a preconceived timeframe.
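The inclusion criteria above can be expressed as a simple predicate. The following is a minimal sketch, assuming each search result is available as a local record; the `Record` class, its field names, and the sample records are hypothetical and do not reflect the Scopus export format.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names chosen for illustration only.
    title: str
    doc_type: str
    source_type: str
    language: str
    year: int


def matches_criteria(rec: Record) -> bool:
    """Return True if a record satisfies the title keywords and the four filters."""
    title = rec.title.lower()
    return (
        "machine learning" in title
        and "marketing" in title
        and rec.doc_type == "Article"
        and rec.source_type == "Journal"
        and rec.language == "English"
        and 2018 <= rec.year <= 2023
    )


# Invented sample records; only the first one meets every criterion.
records = [
    Record("Machine learning in marketing", "Article", "Journal", "English", 2020),
    Record("Machine learning for marketing analytics", "Conference Paper",
           "Conference Proceeding", "English", 2021),
    Record("Deep learning in finance", "Article", "Journal", "English", 2019),
    Record("Marketing with machine learning", "Article", "Journal", "English", 2016),
]
selected = [r for r in records if matches_criteria(r)]
print(len(selected))
```

Keeping the criteria in one function makes it straightforward to rerun the selection if the time window or the keywords change.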

Data Management and Storage: Database Development and Record Storage
After identifying relevant literature, we organized and stored these data systematically. We chose Supabase (https://supabase.com, accessed on 19 June 2024), an open-source platform for database storage, to manage our collection. We created an SQL database table to catalog each article, including fields for (i) ID (a unique identifier for each paper); (ii) created-at (the date and time of the record's database entry); (iii) title (the article's title as it appears in the source); (iv) abstract (a summary of the research and its outcomes); (v) year (the publication year of the article); (vi) cited (the citation count of the article, indicating its influence); (vii) title-clean (a formatted version of the title for analysis); and (viii) abstract-cleaned (a standardized version of the abstract for uniform analysis). Here, 'cleaned title' refers to a title that has been refined and prepared by removing unnecessary or extraneous information, such as special characters, formatting issues, or irrelevant words. This ensures that the titles are standardized and ready for analysis. Similarly, 'cleaned abstract' refers to an abstract that has undergone the same kind of refinement, in which noise or irrelevant information, such as filler words, repetitive phrases, or formatting inconsistencies, is removed.
Figure 1 shows a bar chart of journals including at least 2 articles with the terms "machine learning" and "marketing" in their title. Additionally, Figure 2 presents the same analysis with the search criteria expanded to include articles that contain the terms "machine learning" and "marketing" either in the title or in the abstract. The blue line refers to articles in which both "machine learning" and "marketing" appear in the title, while the orange line also includes articles in which these terms appear in the abstract.
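The eight-field record layout described above can be sketched as a table definition. This is an illustration only: it uses Python's built-in `sqlite3` module as a local stand-in for the Supabase database, and the table name `papers`, the column names, and the column types are assumptions rather than the schema actually used.

```python
import sqlite3

# Assumed column names, normalized from the fields listed in the text.
SCHEMA = """
CREATE TABLE papers (
    id               INTEGER PRIMARY KEY,
    created_at       TEXT DEFAULT CURRENT_TIMESTAMP,
    title            TEXT NOT NULL,
    abstract         TEXT,
    year             INTEGER,
    cited            INTEGER DEFAULT 0,
    title_clean      TEXT,
    abstract_cleaned TEXT
)
"""

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
conn.execute(SCHEMA)

# Insert one invented record; id and created_at are filled in automatically.
conn.execute(
    "INSERT INTO papers (title, abstract, year, cited, title_clean, abstract_cleaned) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("An Example Title", "An example abstract.", 2021, 5, "exampl titl", "exampl abstract"),
)
conn.commit()

columns = [info[1] for info in conn.execute("PRAGMA table_info(papers)")]
row = conn.execute("SELECT id, year, cited FROM papers").fetchone()
print(columns)
print(row)
```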

Data Preprocessing: Text Data Cleaning
The subsequent phase of our study was dedicated to text data cleaning, a key process aimed at enhancing the quality and consistency of the data extracted from the article titles and abstracts. This stage involved multiple processes to refine the text for subsequent analysis:
(i) Inclusion of custom stop words: the text cleaning procedure was augmented by incorporating a custom list of stop words, which included domain-specific terms such as "ai" and "digit", along with the standard stop words typically used in text preprocessing.
(ii) Tokenization: we segmented the text into discrete words or tokens to facilitate detailed thematic analysis.
(iii) Case normalization: all textual data were converted to lowercase to ensure uniformity across the dataset.
(iv) Numerical character removal: numerical characters were removed as they were considered irrelevant to our topic analysis.
(v) Contraction expansion: we expanded contractions to their full forms to enhance the clarity and consistency of the text.
(vi) Stemming: words were reduced to their root form using stemming algorithms, which simplified the analysis by reducing the complexity of the dataset.
(vii) Punctuation and enhanced stop word removal: punctuation marks were removed, and the text was further refined using our expanded stop word list to eliminate unnecessary tokens.
(viii) Exclusion of short words and spaced tokens: we excluded words shorter than two characters and tokens containing spaces, as their contribution to our analysis was small.
These preprocessing steps transformed the raw text into a structured format, ready for deeper analysis, while ensuring the integrity and relevance of the data. To illustrate the effect of these procedures, Tables 1 and 2 show examples of cleaned titles and abstracts, respectively, from our dataset.

Abstract: The widespread impacts of artificial intelligence (AI) and machine learning (ML) in many segments of society have not yet been felt strongly in the marketing field. Despite such shortfall, ML offers a variety of potential benefits, including the opportunity to apply more robust methods for the generalization of scientific discoveries. Trying to reduce this shortfall, this monograph has four goals. First, to provide marketing. . .
Abstract Cleaned: widespread impact artifici intellig machin learn ml mani segment societi felt strong market field despit shortfal ml offer varieti potenti benefit includ opportun appli robust method general scientif discoveri tri reduc shortfal monograph goal provid market. . .
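The cleaning steps listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the code used in the study: the stop word and contraction lists are tiny invented samples, the order of operations is one plausible choice, and `naive_stem` is a toy suffix stripper standing in for a real stemming algorithm.

```python
import re
import string

# Small illustrative lists; a real run would use much fuller ones.
STANDARD_STOP_WORDS = {"the", "of", "and", "in", "a", "to", "have", "not", "been", "it", "is", "yet"}
CUSTOM_STOP_WORDS = {"ai", "digit"}  # domain-specific terms named in the text
STOP_WORDS = STANDARD_STOP_WORDS | CUSTOM_STOP_WORDS
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def naive_stem(word):
    """Toy suffix stripper; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    text = text.lower()                               # case normalization
    for short, full in CONTRACTIONS.items():          # contraction expansion
        text = text.replace(short, full)
    text = re.sub(r"\d+", " ", text)                  # numerical character removal
    text = text.translate(PUNCT_TO_SPACE)             # punctuation removal
    tokens = text.split()                             # tokenization
    tokens = [naive_stem(t) for t in tokens]          # stemming
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    tokens = [t for t in tokens if len(t) >= 2]       # drop one-character tokens
    return " ".join(tokens)


print(clean_text("It's 2023: AI impacts marketing fields!"))  # impact market field
```

Applying the stop word filter after stemming is what allows stemmed entries such as "digit" to match.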

Feature Extraction: Generation and Storage of Vector Embeddings
In the next step of our methodology, the cleaned textual data are transformed into high-dimensional vector embeddings. These embeddings will be used by a k-means clustering algorithm, which will allow us to categorize the papers into groups based on their intrinsic textual similarities. We use the 'text-embedding-ada-002' model to convert text into vectors that efficiently encapsulate semantic content. To ensure optimal representation while maintaining semantic integrity across large datasets, we pair this model with the 'cl100k_base' encoding scheme. The process adheres to a maximum token limit of 8000 per text segment, a threshold determined by the model's capacity. The initial steps for generating vector embeddings are outlined in the Python code snippet shown in Listing 1. After generation, the vector embeddings are systematically stored within a Supabase database. This integration supports NLP tasks within our ongoing analytical processes. Listing 2 shows the integration process and data storage. To illustrate the application of these methods, Tables 3 and 4 display examples of the embedding vectors generated from the cleaned title and abstract, respectively.
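The token limit described above can be enforced before any text reaches the embedding model. The sketch below is not a reconstruction of Listing 1: the tokenizer and the embedding call are passed in as functions, and the `toy_*` stand-ins are invented so that the example runs without network access. In a real run they would presumably be replaced by the 'cl100k_base' encoding and a request to the 'text-embedding-ada-002' model.

```python
from typing import Callable, List, Sequence

MAX_TOKENS = 8000  # per-segment limit stated in the text


def truncate_to_limit(token_ids: Sequence[int], max_tokens: int = MAX_TOKENS) -> List[int]:
    """Keep at most max_tokens token ids, dropping the tail."""
    return list(token_ids[:max_tokens])


def embed_corpus(
    texts: Sequence[str],
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    embed: Callable[[str], List[float]],
) -> List[List[float]]:
    """Tokenize, truncate, and embed each text with the supplied functions."""
    vectors = []
    for text in texts:
        token_ids = truncate_to_limit(encode(text))
        vectors.append(embed(decode(token_ids)))
    return vectors


# Hypothetical stand-ins: one token per character, and a two-number "embedding".
def toy_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def toy_decode(token_ids: List[int]) -> str:
    return "".join(chr(i) for i in token_ids)


def toy_embed(text: str) -> List[float]:
    return [float(len(text)), float(text.count(" "))]


vectors = embed_corpus(["machin learn market", "x" * 9000], toy_encode, toy_decode, toy_embed)
print(vectors)
```

The second input exceeds the limit, so only its first 8000 tokens reach the embedding function.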

Topic Modeling Using Non-Negative Matrix Factorization
Topic extraction is an effective unsupervised technique for exploring the relationships among texts. Although there are many different approaches, this paper focuses on NMF. The next stage in our methodological framework involved the application of NMF to discover thematic topics within the cleaned abstracts. The following steps defined this process:
1. Data preparation: We retrieved the cleaned text data from our Supabase database to ensure a robust analysis foundation. The first row of this database is shown in Table 5.
2. Word count analysis and common word identification: The initial analysis focused on the distribution of word usage in the abstracts, identifying the most frequently occurring words in the dataset to guide the topic modeling process. The mean abstract length is approximately 120.59 words, with a standard deviation of approximately 44.63 words, indicating a wide range of lengths. The 25th percentile (Q1) is 94 words, the median (Q2) is 109.5 words, and the 75th percentile (Q3) is 142.75 words.
3. Vectorization of text: NMF is an unsupervised technique, so the topics on which the model is trained are not labeled. It decomposes (or factorizes) high-dimensional vectors into low-dimensional representations. These lower-dimensional vectors are non-negative, which means that their coefficients are also non-negative. In our case, the high-dimensional vectors are tf-idf weights [16], but they could be anything, including word vectors or raw word counts. By converting the abstract text of each document into numerical form, we can use it to create features (Listing 3). After processing the documents, we have a little more than 1553 unique words, so we set max_features to keep only the top 750 terms by term frequency across all documents, further reducing the number of features.
4. Model fitting and topic extraction: The only parameter required is the number of components, i.e., the number of topics that we want (Listing 4). From this point on, the model is run to obtain the topics. Table 6 shows the three topics extracted with the NMF algorithm.
5. Topic assignment and dataset compilation: Documents were systematically assigned to the extracted topics and a comprehensive dataset encapsulating these associations was compiled.
6. Output documentation: We documented the results of the analysis for subsequent review and reference, ensuring that findings were accessible and well organized.
Table 5. First row of our database.
As shown in Table 7, the documents were mapped into topics and a dataset with these associations was compiled to support the analysis.

Abstract: The widespread impacts of artificial intelligence (AI) and machine learning (ML) in many segments of society have not yet been felt strongly in the marketing field. Despite such shortfall, ML offers a variety of potential benefits, . . .
Topic: ml learn research review tool intellig applic method analyt machin mak decision focus manag artifici adopt potenti

Abstract: This article introduces algorithmic bias in machine learning (ML) based marketing models. Although the dramatic growth of algorithmic decision making continues to gain momentum in marketing, research in this stream is still inadequate despite the devastating, asymmetric and oppressive impacts of algorithmic bias on various customer groups. . .
Topic No.: 3
Topic: model algorithm data use consum predict method respons custom research machin learn accuraci databas direct campaign network classif decis perform
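The vectorization, model fitting, and topic assignment steps described in this subsection can be illustrated with a dependency-free sketch. It is not a reconstruction of Listings 3 and 4, which presumably rely on a library implementation. Here tf-idf uses one common smoothed variant, NMF is fitted with multiplicative updates, and the four-document corpus and all function names are invented for illustration.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs, max_features):
    """Dense tf-idf matrix restricted to the most frequent terms in the corpus."""
    tokenized = [doc.split() for doc in docs]
    corpus_counts = Counter(tok for doc in tokenized for tok in doc)
    vocab = sorted(term for term, _ in corpus_counts.most_common(max_features))
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    n_docs = len(docs)
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        # Smoothed idf (one common variant): ln((1 + N) / (1 + df)) + 1
        rows.append([counts[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                     for t in vocab])
    return rows, vocab


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, n_components, n_iter=200, seed=0, eps=1e-9):
    """Factorize v (docs x terms) into non-negative w (docs x topics), h (topics x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(n_components)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(n_components)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(n_components)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_components)]
             for i in range(n)]
    return w, h


# Invented corpus of already-cleaned abstracts.
docs = [
    "custom segment market campaign custom",
    "market campaign advertis custom",
    "model algorithm predict accuraci model",
    "algorithm predict classif model",
]
tfidf, vocab = tfidf_matrix(docs, max_features=750)
w, h = nmf(tfidf, n_components=2)
doc_topics = [row.index(max(row)) for row in w]  # dominant topic per document
top_terms = [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -topic[j])[:3]]
             for topic in h]
print(doc_topics)
print(top_terms)
```

Each row of `w` gives a document's weight on every topic, and each row of `h` gives a topic's weight on every term, which is how both the topic keywords and the document-to-topic mapping are read off the factorization.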

Comparative Analysis Using K-Means Clustering
In order to extend the previous analysis and verify the findings derived from the NMF algorithm, we employed k-means clustering on the vector embeddings. This method facilitated the identification of natural groupings within our data. The k-means algorithm was executed through the following procedural steps:
1. Data retrieval and preparation: Initially, we accessed the vector embeddings stored in the Supabase database, preparing them for subsequent analysis (Listing 5).
2. Matrix formation and k-means clustering: A matrix was constructed from these embeddings, upon which k-means clustering was applied. In order to construct the matrix, string embeddings were transformed into NumPy arrays (Listing 6).
3. Cluster assignment and analysis output: Documents were systematically assigned to their respective clusters. The clustering results were then summarized to reflect the natural groupings and their characteristics (Listing 7).
4. Data export: The dataset, inclusive of cluster assignments, was exported for extended analysis and future reference (Listing 8).
Listing 5. Python code for creating a Supabase client and loading data.
This methodological step allowed for an additional layer of insight into the dataset's thematic organization. The application of k-means clustering provided a comparative perspective that validated and enhanced the thematic structures identified by NMF. Using the cleaned abstract data for an example involving three clusters, we obtained the following results: number of clusters = 3, inertia = 7.4452. Then, we assigned each document to a cluster.
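The matrix formation, clustering, and inertia computation described above can be sketched without external libraries. This is an illustration rather than a reconstruction of Listings 5 to 8: the string format of the stored embeddings, the function names, the optional `init` argument, and the two-dimensional sample vectors are all assumptions.

```python
import json
import random


def parse_embedding(raw):
    """Turn a stored string such as "[0.1, 0.2]" into a list of floats."""
    return [float(x) for x in json.loads(raw)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in (init if init is not None else rng.sample(points, k))]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
    # Inertia: sum of squared distances from each point to its assigned centroid.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Invented embeddings stored as strings, forming three well-separated groups.
raw_rows = ["[0.0, 0.0]", "[0.0, 1.0]", "[10.0, 10.0]",
            "[10.0, 11.0]", "[20.0, 0.0]", "[20.0, 1.0]"]
matrix = [parse_embedding(raw) for raw in raw_rows]
labels, centroids, inertia = kmeans(matrix, k=3, init=matrix[0::2])
print(labels)   # [0, 0, 1, 1, 2, 2]
print(inertia)  # 1.5
```

Lower inertia means tighter clusters, but inertia always decreases as the number of clusters grows, so it is most informative when compared across several values of k.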

Evaluation of Topic Modeling and Clustering Results
To conclude the analysis, a comprehensive evaluation of the outcomes derived from the NMF algorithm and from k-means clustering was conducted. The evaluation process included the following steps: (i) comparative framework setup and iterative trials, in which we established a comparative framework to systematically evaluate the two methodologies; (ii) optimal category determination and qualitative and quantitative analysis; and (iii) consolidation of findings, which allowed us to confirm the thematic coherence across the dataset (Figure 5).
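One simple way to compare the two sets of assignments quantitatively is to cross-tabulate topic labels against cluster labels and measure how often documents in the same cluster share a topic. The sketch below illustrates that idea and is not necessarily the comparison used in this study; the function names and the sample labels are invented.

```python
from collections import Counter


def cross_tabulate(topic_labels, cluster_labels):
    """Count documents for every (topic, cluster) pair."""
    return Counter(zip(topic_labels, cluster_labels))


def cluster_purity(topic_labels, cluster_labels):
    """Share of documents that fall in the most frequent topic of their cluster."""
    table = cross_tabulate(topic_labels, cluster_labels)
    best_per_cluster = {}
    for (_topic, cluster), count in table.items():
        best_per_cluster[cluster] = max(best_per_cluster.get(cluster, 0), count)
    return sum(best_per_cluster.values()) / len(topic_labels)


# Invented labels for six documents.
topics = [1, 1, 2, 2, 3, 3]
clusters = [0, 0, 1, 1, 2, 1]
table = cross_tabulate(topics, clusters)
purity = cluster_purity(topics, clusters)
print(dict(table))
print(round(purity, 3))  # 0.833
```

A purity close to 1 indicates that the clusters and the topics partition the documents in nearly the same way.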

Cluster 1: Machine Learning Algorithms in Marketing
The first cluster includes four research articles focusing on the application of ML algorithms within marketing. In the first article, El Koufi et al. [17] explore precision marketing through big data and ML, with a specific focus on Morocco. This research introduces an algorithm designed to forecast potential clients by employing big data analysis and ML techniques. The study comprises five key steps: (i) data selection and understanding, using a dataset from Attawfiq Micro-Finance, a Moroccan bank (the dataset comprises 6000 records and 14 variables, including customer demographic and behavioral information spanning from 2018 to 2021); (ii) data cleaning and filtering, conducted to eliminate missing values, outliers, and noise, thereby favoring dataset dimensionality and computational efficiency; (iii) feature selection, which extracts critical features and eliminates non-significant attributes from the dataset (univariate selection followed by the grid search method is employed to determine optimal parameters); (iv) implementation of various ML algorithms, including XGBoost, random forest, gradient boosting, support vector machines, logistic regression, and decision trees; and (v) evaluation of results, which employs a confusion matrix to assess predictive model performance (metrics such as precision, recall, accuracy, and F1 score are utilized, alongside a detailed description of true positives, true negatives, false positives, and false negatives). The study concludes that gradient boosting outperforms the other ML methods in terms of accuracy, F1 score, recall, precision, and cross-validation score.
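The evaluation metrics mentioned above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are invented and are not taken from the study by El Koufi et al.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1 score from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Invented counts: 40 true positives, 30 true negatives,
# 10 false positives, and 20 false negatives.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because F1 is the harmonic mean of precision and recall, it stays low whenever either of the two is low, which makes it more informative than accuracy on imbalanced data.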
In the second article, Saengthongrattanachot et al. [18] investigate the performance of ML techniques on bank marketing datasets. This study encompasses four primary steps: (i) understanding the dataset characteristics (the dataset details direct marketing campaigns of a Portuguese banking institution and comprises 36,548 records, with twenty input attributes and one output attribute (yes/no) indicating term deposit subscription); (ii) data preparation, which involves addressing class imbalance through subsampling of the majority class, resulting in a balanced dataset of 9280 records; (iii) modeling, which employs k-fold cross-validation to construct predictive models, utilizing algorithms such as J48, random forest, naive Bayes, and k-nearest neighbor (the models are initially built with all twenty input attributes, followed by models constructed with the five most relevant attributes identified by J48); and (iv) model evaluation, which employs criteria such as accuracy, precision, recall, and F1 score. The study concludes that reducing the attribute count to five enhances predictive model simplicity without significant accuracy loss.
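The class balancing step described above, subsampling the majority class, can be sketched as follows. This is a generic illustration rather than the procedure used by Saengthongrattanachot et al.; the function name, the record layout, and the class sizes are invented.

```python
import random
from collections import Counter


def undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes until all classes have equal size."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))  # sampling without replacement
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Invented imbalanced dataset: 90 non-subscribers and 10 subscribers.
rows = [{"age": 30 + i, "subscribed": "no"} for i in range(90)]
rows += [{"age": 50 + i, "subscribed": "yes"} for i in range(10)]

balanced = undersample(rows, "subscribed")
counts = Counter(row["subscribed"] for row in balanced)
print(counts["yes"], counts["no"])  # 10 10
```

Undersampling discards data from the majority class, so it trades some information for a classifier that is not dominated by the more frequent outcome.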
In the third article, Zhang [19] focuses on predicting consumer loan processing utilizing ML algorithms and extracted characteristic data. The study proceeds as follows: (i) loan prediction is treated as a classification problem, utilizing a neural network, a support vector machine, and a random forest algorithm; (ii) the support vector machine, known as a powerful classification method, seeks the most appropriate hyperplane as a decision surface by training on sample data, and the study highlights the use of kernel functions to map the data to a high-dimensional feature space; and (iii) the random forest algorithm constructs a forest of independent decision trees, with the final classification determined by voting. The study's results, based on a dataset of 400 groups, reveal that random forest exhibits the highest accuracy, averaging over 94% and identifying elderly customers as more responsive to subscribing to term deposits.
Lastly, Sarker [20] integrates data mining algorithms into a bank's database system to predict customer behavior and improve marketing strategies. This study utilizes unsupervised ML algorithms, such as k-means clustering, as well as supervised algorithms, including multi-class decision forest and multi-class logistic regression, among others. The study emphasizes the successful prediction of customer behavior regarding term deposit subscriptions, particularly highlighting the efficacy of the multi-class decision jungle algorithm. These algorithms enable targeted marketing efforts towards customers likely to accept term deposits, thereby enhancing subscription rates. All in all, these articles collectively show the effectiveness of ML algorithms in enhancing marketing strategies within the banking sector.

Cluster 2: Application of Big Data and ML in Marketing and Consumer Behavior
The second cluster consists of twenty-two articles that analyze the use of big data and ML in marketing and their impact on consumer behavior. The main objective here is to determine the potential of these technologies in marketing processes, as well as the challenges and opportunities that marketers face in connecting with consumers. Marketing is a dynamic and constantly changing business function. To achieve its goal of delivering the right product to the right customers at the right time, it must manage information about the market, products, and customers, as well as the business environment, systematically, accurately, and in a timely manner [21]. The adoption of these technologies in marketing shows their full potential in different aspects:
• Direct marketing: Predicting customer behavior helps tailor messages and improve campaign effectiveness [23]. ML helps companies understand customers better and run more targeted campaigns.
• Customer experience: Technology enhances marketing value and creates better customer experiences. Meeting customer needs and expectations requires using technology to understand behavior and assess marketing impact. Big data aids in decision making and understanding consumer behavior in real time.
• E-commerce: By using machine learning models to search for similar items on competitive online platforms, sellers can easily compare prices and offers to optimize their own marketing strategy. This can help sellers attract new customers and increase sales by providing better value propositions [24].
• Dynamic prices: ML offers the potential to automate pricing information and to target marketing messages [25].
• Brand personality: By applying ML to consumer data, firms can assess how consumers perceive brand personality and study the effects of brand-consumer congruence in personality on social media [26].
All in all, the combination of big data and ML enhances customer relationship management and supports the development of effective marketing strategies. In that way, businesses can understand customer needs, segment audiences, predict behavior, and personalize marketing efforts. Big data also plays a key role in identifying target audiences for advertising campaigns, ensuring maximum impact and engagement. Table 8 displays several cases of companies that are applying ML and AI in their marketing and customer management processes.

Cluster 3: Machine Learning Applied to Digital Marketing
The third cluster encompasses a collection of thirty-seven articles. These articles are focused on two primary themes: general applications of ML in digital marketing and innovation and technological advancements [31]. One significant application of ML in digital marketing is the identification and recommendation of key indicators within brand communities and public behavioral data. Techniques such as random forest, AdaBoost, and XGBoost are utilized to distinguish between active and passive behaviors on Facebook pages of digital product brands and platforms. Additionally, text mining and ML methods such as content and sentiment analysis are employed to identify patterns within public and community behavior data of brands. The integration of AI further strengthens the processing of unstructured information, enhancing data quality through sampling methods like oversampling and undersampling. Furthermore, ML aids in analyzing consumer demand for information and uncovering hidden unstructured elements within data available on social networks, thereby improving marketing effectiveness. AI-based data analysis also helps in understanding the impact of community content on public engagement and how participatory posts enhance interactive experiences.
In digital marketing, ML finds many applications in analyzing the vast amount of data generated by social media and online platforms. NLP techniques enable AI systems to analyze human language across various mediums such as blogs, product reviews, and social media posts. This analytical capability is key in understanding and responding to customer perceptions online. Moreover, ML techniques aid in specific applications such as fraud detection, stock market prediction, customer relationship management, and automated generation of content summaries. Integration of ML tools with marketing strategies enhances performance by leveraging statistics and AI to make informed decisions. Arasu et al. [10] propose a framework comprising three key steps: (i) text mining, which involves the examination of unstructured information to identify meaningful patterns; (ii) use of ML tools to better understand and interpret data; and (iii) performing a data analysis that provides efficient marketing insights. Beyond social media, ML applications extend to various domains within digital marketing, including chatbots, advertisement refinement, email marketing, customer behavioral targeting, lead generation, and automated content creation. These applications enable enhanced customization, deeper understanding of user behavior, and optimization of marketing strategies for improved efficiency [32].
Innovation and technological development in digital marketing are underscored by intelligent marketing strategies based on simulation with ML and mobile computing. Studies reveal significant improvements in company performance metrics, including efficiency gains of approximately 20% and revenue increases exceeding 30% compared to traditional strategies [33]. Additionally, advancements in ML facilitate automatic content classification in marketing, with neural networks demonstrating superior performance in multi-label classification of news articles compared to other algorithms like random forest and k-nearest neighbors [34]. Their ability to handle high-dimensional data without dimensionality reduction techniques highlights their efficacy in enhancing marketing strategies.

Future Trends
Marketing, as can be understood from the integration of AI and ML, is reshaping innovation [35], moving towards a future dominated by personalized customer experiences that dynamically adapt in real time based on customers' interactions, contexts, and evolving preferences [36]. This is driven by the convergence of data proliferation, computational power, and algorithmic sophistication, allowing marketers to understand consumer behaviors and preferences with remarkable precision. On the consumer-facing side, it allows for (i) improvement of shopping fundamentals and (ii) enhancement of the consumption experience. On the business-facing side, it enables (i) better decision making and (ii) improved financial applications [37]. For example, in digital contexts, AI-driven features like "recommended for you" sections on platforms such as Amazon, Alibaba, and Netflix increase user engagement through tailored suggestions [38]. Another example is Airbnb, which uses ML algorithms to analyze user preferences and behaviors. This enables the platform to provide personalized accommodation choices, adjusting the results to the individual preferences of the clients [39]. In non-digital contexts, such as Starbucks' use of predictive analytics, the technology is applied to tailor marketing offers and product recommendations at an individual level [40]. By analyzing customer data, Starbucks can predict what products customers might be interested in, even before they themselves know, optimizing marketing efforts and increasing the likelihood of sales [41].
With the continuous refinement of ML models, the emphasis is not only on real-time personalization of services and experiences. Predictive analytics are also experiencing a significant boost, enabling marketers to forecast market trends [42], customer behaviors [43], and campaign outcomes with greater accuracy [44]. This foresight will also facilitate more strategic decision making, resource allocation, and timely adjustments to marketing campaigns [45]. The integration of AI in marketing tools is expected to generate systems capable of autonomous operation, where marketing campaigns are automatically created, managed, and optimized without extensive human intervention [46]. These systems will identify opportunities, execute A/B testing, and refine strategies based on real-time data analysis. The proliferation of voice-assisted devices and conversational AI will transform search marketing [47] and customer service [48]. Marketing strategies will need to adapt to voice search optimization [49], and conversational AI will play a central role in customer interactions [50], providing personalized assistance and enhancing engagement. Blockchain technology will also bring transparency to marketing practices, particularly in digital advertising. By facilitating secure and transparent transactions, blockchain can help mitigate fraud, ensure fair compensation in ad networks, and build trust among stakeholders [51]. Extended and virtual reality technologies are set to redefine experiential marketing, offering immersive experiences that engage customers in novel ways [52]. The Internet of Things will provide marketers with a wealth of data from connected devices, enabling a better understanding of consumer habits and preferences [53]. This will open up new possibilities for personalized marketing, predictive maintenance, and enhanced customer experiences [54].

Conclusions
This paper combines the NMF and k-means algorithms to analyze the existing literature on the use of machine learning in marketing. The application of these two algorithms has facilitated the segmentation of the literature into three distinct clusters. Cluster 1 focuses on the integration of ML in the finance sector, highlighting advancements in customer segmentation and targeted marketing campaigns. Cluster 2 explores the synergy between big data and ML, emphasizing their role in enhancing direct marketing and customer relationship management. Cluster 3 focuses on the application of ML in digital marketing, particularly in optimizing content delivery and personalizing user experiences on digital platforms. The results of the analysis rest on the premise that companies can use ML tools in their marketing processes, where their applicability demonstrates their relevance in (i) gathering data from various marketing activities; (ii) processing, recording, and identifying patterns in the data; (iii) transforming data into valuable insights and opportunities; and (iv) predicting customer behavior and aiding in marketing decisions. The pursuit of more precise and adaptable marketing strategies has led to significant changes through the integration of AI, ML, and business analytics. This shift is driven by several factors. Firstly, today's consumers are more demanding and seek personalized products and services. Secondly, in our data-abundant world, businesses face both numerous opportunities and challenges. When used together, AI, ML, and data analytics provide the capabilities to handle large datasets, uncover patterns, and predict customer behavior with remarkable precision.
Furthermore, the use of ML significantly improves the marketing mix model. ML is used in product development, pricing, distribution, and promotion. Common applications include personalizing products and services, enhancing customer service, dynamic pricing, optimizing e-commerce, predicting conversion success, forecasting advance purchases, managing social media engagement, automating image analysis, optimizing search engines, and improving customer experiences, among others. ML techniques are also key for understanding customer feedback, enhancing products, and improving customer experiences. The importance of product reviews in e-commerce is also highlighted, underscoring their role in assisting customers with their buying decisions and guiding companies in their marketing strategies. All in all, the application of ML in marketing ensures that decisions are data-driven and aligned with business objectives. While our analysis provides valuable insights, it is based on existing literature and may be subject to publication bias. Additionally, the rapid evolution of ML technologies and marketing practices means that our findings may need to be updated periodically. Finally, although the study focused on the global analysis of marketing processes, we must highlight the opening of new lines of research focused on each of the elements of the marketing mix model. Therefore, future research should include specific variables to provide more details on the processes that apply to product development, pricing strategies, distribution channels, and brand communication.

Figure 1. Journals with at least 2 articles including 'machine learning' and 'marketing' in their titles.

Figure 2. Journals with articles including 'machine learning' and 'marketing' in their titles or abstracts.

Figure 3 displays the number of publications per year during the period 2018 to 2023. The blue line refers to articles in which both "machine learning" and "marketing" appear in their title, while the orange line also includes the possibility that these terms appear in the articles' abstracts.

Figure 3. Annual evolution of Scopus-indexed journal articles with ML and marketing in their title (blue line) or in their title or abstract (orange line).

Listing 1. Python code for embedding model and encoding parameters.

    import tiktoken

    # Define embedding model and encoding parameters
    embedding_model = "text-embedding-ada-002"
    embedding_encoding = "cl100k_base"

    # Load data set and apply encoding
    df = get_data_to_supabase(supabase=supabase, name_table='web_results')
    encoding = tiktoken.get_encoding(embedding_encoding)

Listing 2. Python code for calculating tokens, filtering data, and generating embeddings.

    # Calculate number of tokens and filter data
    df["n_tokens_title"] = df["title_clean"].apply(lambda x: len(encoding.encode(x)))
    df = df[df.n_tokens_title <= max_tokens]
    df["n_tokens_abstract"] = df["abstract_clean"].apply(lambda x: len(encoding.encode(x)))
    df = df[df.n_tokens_abstract <= max_tokens]

    # Generate vector embeddings for titles and abstracts
    df["embedding_title"] = df["title_clean"].apply(lambda x: get_embedding(x, engine=embedding_model))
    df["embedding_abstract"] = df["abstract_clean"].apply(lambda x: get_embedding(x, engine=embedding_model))

    # Save embeddings to database
    new_df = df[['id', 'embedding_title', 'embedding_abstract']]
    stored = save_embeddings_to_supabase(supabase=supabase, df=new_df)

Figure 4. Top 20 most frequent words in abstracts of ML in marketing papers.

Listing 3. Python code for TF-IDF vectorization of text data.

    # Create client supabase
    supabase = create_supabase_client()
    df = get_data_to_supabase(supabase=supabase, name_table='web_embeddings')

Listing 6. Python code for converting string embeddings to numpy arrays.

    from ast import literal_eval
    import numpy as np

    # Field to work
    field_name = 'embedding_abstract'
    # Convert string to numpy array
    df[field_name] = df[field_name].apply(literal_eval).apply(np.array)
    matrix = np.vstack(df[field_name].values)

Listing 7. Python code for KMeans clustering.

Listing 8. Python code for printing KMeans clustering results and saving to CSV.

    print(f"Number of clusters: {n_clusters}, Inertia: {kmeans.inertia_}\n")
    df.to_csv("csv_Results_Clustering_KMEANS3.csv", sep="#", columns=['id', 'cluster'])
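As an illustration of the clustering step captioned in Listing 7, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes scikit-learn's KMeans and replaces the real embedding matrix with synthetic data; the blob centers, the dimensionality, and the `init`, `n_init`, and `random_state` settings are hypothetical. Only `n_clusters = 3` and the reporting of `kmeans.inertia_` follow the text.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the embedding matrix: three well-separated
# groups of 20 vectors each in 8 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 8, [5.0] * 8, [-5.0] * 8])
matrix = np.vstack([c + 0.1 * rng.standard_normal((20, 8)) for c in centers])

# Fit k-means with three clusters, the number reported in the paper.
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(matrix)

# Inertia is the sum of squared distances from each point to its centroid.
print(f"Number of clusters: {n_clusters}, Inertia: {kmeans.inertia_}")
print("Cluster sizes:", np.bincount(labels).tolist())
```

In a pipeline like the one in the listings, `labels` could be stored in a `cluster` column of the data frame before exporting it.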

Figure 5. The best result is obtained with 3 clusters and 3 topics.
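This section does not state how the 3-cluster and 3-topic setting was judged best. One plausible way to compare k-means cluster labels with NMF topic assignments is a contingency table together with the adjusted Rand index. The sketch below is purely illustrative: the label arrays are invented, and the choice of `pandas.crosstab` and `adjusted_rand_score` is an assumption rather than the authors' procedure.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical assignments for ten documents (not the paper's data).
cluster = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])  # k-means labels
topic = np.array([1, 1, 1, 0, 0, 2, 2, 2, 2, 2])    # dominant NMF topics

# Contingency table: rows are clusters, columns are topics.
table = pd.crosstab(
    pd.Series(cluster, name="cluster"),
    pd.Series(topic, name="topic"),
)

# The adjusted Rand index ignores label names and equals 1.0 only when
# the two partitions group the documents identically.
ari = adjusted_rand_score(cluster, topic)

print(table)
print(f"Adjusted Rand index: {ari:.3f}")
```

A table in which each cluster concentrates in a single topic column suggests that the two methods recover similar groupings.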

Table 1. Examples of cleaned titles from our dataset.

Table 2. Examples of cleaned abstracts from our dataset.

Table 3. Example of embedding vectors of a cleaned title.

Table 4. Example of embedding vectors of a cleaned abstract.

Table 6. Extraction of topics using the NMF algorithm.

Table 7. Topic assignment and dataset compilation.

Table 8. Examples of use of ML and AI for enhancing customer relationship management.