A Comprehensive Survey on Privacy-Preserving Techniques in Federated Recommendation Systems

Abstract: Big data is a rapidly growing field, and new developments are constantly emerging to address its challenges. One such development is the use of federated learning for recommendation systems (FRSs). An FRS protects user privacy by training recommendation models on intermediate parameters instead of real user data. This approach allows data platforms to cooperate while still complying with privacy regulations. In this paper, we explore the current state of research on FRSs, highlighting open research issues and possible solutions. Specifically, we look at how FRSs can protect user privacy while still allowing organizations to benefit from the data they share. Additionally, we examine potential applications of FRSs in the context of big data, exploring how these systems can facilitate secure data sharing and collaboration. Finally, we discuss the challenges associated with developing and deploying FRSs in the real world and how these challenges can be addressed.


Introduction
In recent years, recommendation systems have become a popular tool to address information overload in many real-world fields such as news, E-commerce, and healthcare [1][2][3][4]. Such systems require collecting a large amount of sensitive information about users, such as user attributes, social relations, contextual information, and behaviors [5]. Unfortunately, a central server is needed to store these data, which creates risks of privacy leakage, such as user data being sold to a third party without consent or stolen by malicious attackers [6]. Additionally, due to privacy concerns and regulatory restrictions, it is difficult to integrate data from other platforms to improve the performance of the recommendation system [7]. For example, the General Data Protection Regulation (GDPR) sets strict rules for collecting user data and sharing data between different platforms, which can leave the recommendation system with too little data and thus decrease its performance [8].
To overcome this challenge, Google introduced federated learning, a privacy-preserving distributed learning scheme that allows participants to collaboratively train a machine learning model without exchanging real data [9]. Instead, intermediate parameters such as model parameters and gradients are exchanged between participants. This has opened up a new field of application known as federated recommendation systems (FRSs), which combine federated learning with recommendation systems to provide privacy-preserving recommendations [10]. In Figure 1, we present the generic training procedure of FRSs.
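The exchange of intermediate parameters described above can be illustrated with a minimal sketch in the style of federated averaging. It is not the procedure of any specific FRS in this survey: the linear model, the learning rate, and all function names (local_update, federated_average, train, make_synthetic_clients) are assumptions chosen for brevity. Each client trains on data that never leaves it, and the server only sees and averages model parameters.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Client step: a few epochs of gradient descent on squared error.
    `data` is a list of (features, target) pairs that stays on the client."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w  # only parameters are sent to the server, never `data`

def federated_average(client_weights, client_sizes):
    """Server step: average client parameters, weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

def train(clients, dim, rounds=50):
    """Run several communication rounds and return the global parameters."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w

def make_synthetic_clients(true_w, n_clients=3, n_samples=20, seed=0):
    """Hypothetical noise-free data so the result can be checked against true_w."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        data = []
        for _ in range(n_samples):
            x = [rng.uniform(-1, 1) for _ in true_w]
            data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
        clients.append(data)
    return clients
```

On the synthetic data, calling train(make_synthetic_clients([2.0, -1.0]), dim=2) should recover parameters close to [2.0, -1.0] even though the server never receives a single training example.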

Privacy Concerns
The FRS is an innovative approach that seeks to provide a privacy-aware paradigm for model training by avoiding direct exposure of real user data [11,12]. However, there are still some core concerns that need to be addressed to make this approach fully effective. These include identifying a privacy-preserving technique that not only keeps the data trustworthy, but also achieves high accuracy. Moreover, the scalability of the approach must be further explored to make it applicable to a wide range of use cases. The main concerns are as follows:

• User-side concerns: The primary privacy concern of a user in an FRS is that personal data are shared across multiple entities [13,14]. These data may include sensitive information such as name, address, age, gender, and financial information. If these data are not properly secured during storage and sharing, an individual's privacy could be violated [15]. In addition, the user may be unaware of how his/her data are being used, which can itself lead to a breach of privacy. An FRS can also pose privacy risks through profiling: the user's data are used to build a profile of his/her interests and activities, which can then be used for targeted advertising or other manipulative tactics. This can violate an individual's right to privacy.

• Server-side concerns: An FRS is vulnerable to attacks by malicious actors, who can disrupt the federated system by manipulating user ratings, injecting false information, and taking over the federated system [16,17]. Attackers can also employ various techniques, such as data poisoning, distributed denial-of-service attacks, and brute-force attacks, to disrupt the system [18]. Furthermore, attackers may be able to access user data through unencrypted channels or modify the recommendations generated by the system to suit their interests.

Related Work
Several surveys have focused on FRSs, but they do not provide comprehensive solutions to the privacy and security issues that arise in big data applications. In [19], the study investigated the characteristics and challenges of federated learning in detail, but did not provide enough detail on FRSs. In [20], the authors gave an accurate definition of federated learning and its different architectures and applications, such as the FRS. In [21], the authors provided a comprehensive categorization of recommendation methods and discussed their respective limitations. In addition, several surveys focused on the privacy and security of federated learning. In [22], the study identified and analyzed the potential privacy threats and security vulnerabilities in federated learning.
Similarly, in [23], the authors elaborated on the assumptions, reasons, principles, and differences between the various attacks and defenses in the privacy and robustness fields of federated learning, but none of these works focused specifically on FRSs. Overall, most of these surveys covered either recommendation systems or federated learning separately, and few surveyed problems specific to FRSs. For example, in [10], the study proposed a classification for FRSs from the perspective of federated learning and investigated the algorithm-level and system-level challenges for FRSs. These surveys may provide insight into the capabilities of FRSs, but they did not offer solutions that effectively deal with privacy and security within the framework of the system.

Our Contribution
Compared to the existing surveys, this survey examines privacy-preserving techniques for FRSs in the context of big data. It presents a comprehensive review of the main challenges and existing approaches to enable data sharing while preserving user privacy. It also compares various privacy-preserving techniques, including differential privacy, secure multi-party computation, homomorphic encryption, tokenization, anonymization, and pseudonymization. Furthermore, this survey provides an overview of the targeted applications and industries of FRSs. Moreover, a detailed description of the public datasets used in FRSs is provided, highlighting the unique challenges posed by big data sharing. Finally, this survey outlines real-world challenges and future research directions.
The rest of the paper is organized to provide an in-depth exploration of the field of FRSs. Section 2 provides the necessary overview of the research. Section 3 delves into the details of privacy-preserving techniques used in this area. Section 4 examines the applications and industries that are targeted with FRSs. Section 5 looks at the publicly available datasets for FRSs. Section 6 discusses the future directions and challenges faced in real-world applications. Lastly, Section 7 concludes the survey and provides an overall summary of the research.

Overview of the FRS
An FRS is a distributed system for providing personalized product and content recommendations across multiple platforms [24]. It is designed to leverage the collective intelligence of the network of participants, providing a more comprehensive view of the user's preferences and needs [25]. Below, we explain the main entities in FRSs. Figure 2 presents the architectural overview of the FRS.

• User: The user entity in an FRS is an individual identified by his/her unique ID. The user is the primary person interacting with the system and provides information to generate personalized recommendations [26]. This includes user preferences, interests, purchase history, and other data that can be used to suggest relevant items. The user entity is also used to store information related to the user's interactions with the system, such as ratings, reviews, and feedback. This information can be used to help improve the accuracy of future recommendations.

• Item: The item entity in the FRS refers to the items recommended to the users. These could include products, services, books, movies, music, etc. [27]. The items can come from different sources, such as websites, stores, or catalogs. The recommendation system uses the item entity to suggest items to users based on their preferences and interests. The item entity also contains information about the items, such as the price, description, and other relevant details.

• Cross-system: The cross-system entity in an FRS is a type of user profile created by combining user data from multiple systems [28]. This allows more accurate recommendation algorithms to be developed and deployed across multiple systems [29]. For example, if a system collects user data from multiple websites and platforms, the cross-system entity can be used to create a unified profile of a user across all of these sources. This unified profile can then be used to generate more accurate recommendations, such as suggesting a product that a user might like based on his/her preferences across all of the different systems.

• Recommendation: A recommendation entity in an FRS allows individual users to access personalized recommendations from multiple sources [30,31]. This type of system is useful for organizations that have multiple data sources and need to provide personalized recommendations to individual users based on their historical experience [32]. The recommendation entity acts as a central hub that can aggregate data from multiple sources and provide tailored recommendations to individual users. This helps organizations make decisions more quickly and efficiently, allowing them to increase customer satisfaction and loyalty.

• Cloud server: A cloud server is an entity in an FRS that stores user information and makes recommendations for other users [33]. The cloud server is responsible for collecting data from multiple sources, processing them, and making recommendations. It is also a hub for other participating entities, such as individual users, content providers, and recommendation systems. By connecting these entities, the cloud server can provide more accurate and personalized recommendations to users [34].

Architecture of FRS
The architecture of an FRS can be divided into the following four components, as shown in Figure 3:

1. Data aggregator: This component is responsible for collecting and processing data from multiple sources (e.g., websites and mobile apps). The data aggregator then stores the data in a unified format, allowing the recommendation system to access the data from all sources in one place [35]. It can be used to create user profiles and determine user preferences, as well as to enable recommendations to be made across multiple platforms [36].

2. Feature extractor: This component is a critical part of the FRS architecture. It is responsible for extracting key features from data sources in order to create a unified representation of the data [37]. This unified representation enables the system to compare data from different sources and make recommendations accurately. Furthermore, it enables the system to generalize the data, reducing the amount of data that need to be processed [38]. The feature extractor component can extract features such as user preferences, user demographics, item attributes, and item ratings.

3. Recommendation system: A recommendation system is a key component of an FRS. It generates user recommendations based on the data gathered from the different federated sources [39]. The system uses algorithms to analyze the data and create personalized recommendations for the users. The goal is to provide the most relevant and accurate recommendations tailored to users' interests and preferences [40]. The recommendation system is also responsible for updating the recommendations as new data are added to the federated sources. This helps ensure that the recommendations remain relevant and up-to-date [41].

4. Evaluation engine: This component measures the system's performance using metrics such as user engagement and user satisfaction [42]. The evaluation engine collects and analyzes data from different sources and uses machine learning algorithms to identify patterns in the data and evaluate the system's performance. It also provides feedback to the other components in the system, such as the content providers, to enable further optimization [43,44]. The evaluation engine helps ensure fairness and accuracy in the system.

Categories of FRS
In this section, we explore the different categories of FRSs and discuss how FRSs can be used to improve the accuracy and efficiency of recommendations. We also describe how these categories can pose serious privacy concerns for users' sensitive data. In Figure 4, we present the categories of FRSs. In Table 1, we briefly highlight the pros and cons of each FRS category discussed below.

• Content-based filtering: This type of FRS uses content-based information to provide personalized recommendations to users without needing to collect and store personal user data [45,46]. FRSs work by aggregating data from multiple sources, such as online stores, websites, and other applications, and using algorithms to detect patterns in the data. To this end, content-based filtering uses information about the content of the recommended items, such as genre, author, or keywords, to generate recommendations [47]. Content-based filtering has several advantages over other forms of FRSs. It does not require collecting and storing sensitive personal information, which helps protect user privacy. Additionally, it is less resource-intensive than other forms of FRSs, since it does not require creating a centralized database to store user data. Moreover, it is easy to implement since it does not require a complex algorithm to generate the recommendations [48]. Content-based filtering is often used in conjunction with other forms of FRSs, such as collaborative filtering, in order to generate more accurate and personalized recommendations [49]. This category is becoming increasingly popular among online retailers and other businesses due to its effectiveness and privacy-preserving benefits. However, content-based filtering still requires access to the user's interaction data to make recommendations, which means that the system can observe the user's preferences and behavior, which could be sensitive information. Therefore, without proper privacy measures in place, content-based filtering systems can pose a risk to the user's privacy.

• Knowledge-based systems: This type of system uses a "knowledge base" to create recommendations based on user preferences and behavior, rather than relying solely on data collected from an individual user [50,51]. The system therefore relies on a pre-existing collection of information instead of collecting behavioral data from the user. The knowledge base is typically a collection of data about different products, services, or experiences [52]. It contains information about the characteristics of each item, such as its features, price, popularity, and ratings. Using these data, the system can generate recommendations tailored to the user's interests without collecting personal behavioral data from the user. However, knowledge-based systems require access to the user's explicit knowledge about items and preferences to make recommendations. This means that the system has access to the user's sensitive information, which can be a serious concern from the user's perspective [53,54].

• Collaborative filtering: This type of FRS prioritizes user privacy while providing effective and accurate recommendations [55]. It uses the user's data to generate personalized recommendations [56,57]. Moreover, the system uses data from similar users to make recommendations to the user. This means that the system can make accurate recommendations by accessing only limited user data and without having to share them with other users or systems. The collaborative filtering approach works by first identifying similar users who have the same demographic characteristics, tastes, or preferences [16,58]. Next, the system takes the data from these similar users, which might include past purchases, ratings, or reviews, and uses them to develop recommendations for the user [59]. One major privacy issue with collaborative filtering is that it requires access to sensitive user behavior data to make recommendations. These data can include information about the user's past purchases, searches, and interactions with other users. Therefore, without proper privacy measures in place, these data can be vulnerable to unauthorized access, misuse, or disclosure [60].

• Context-aware recommendation: This type of FRS uses context-aware techniques to tailor the recommendations to the users' interests and needs [61,62]. Context-aware recommendation systems are designed to consider the user's current context when making recommendations, such as his/her location, time, and surrounding environment [63,64]. By taking this context into account, the system can provide recommendations that are more accurate, personalized, and relevant to the user's needs and preferences [65][66,67]. This type of system can help reduce the amount of data that need to be collected and stored by the system, as the data used to generate the recommendations are already present in the user's context [68]. The privacy concern with context-aware recommendations is that they require access to a wide range of user data, including location, device information, and social media activity, to make personalized recommendations. These data can be sensitive and reveal private information about the user's behavior, interests, and preferences. Therefore, it is important for context-aware recommendation systems to implement robust privacy-preserving techniques to protect user privacy while still providing effective recommendations.

• Hybrid recommendation systems: Hybrid recommendation systems are a type of FRS that combine the capabilities of traditional centralized and distributed recommendation systems to provide users with the best of both worlds [3]. Hybrid recommendation systems enable users to make use of personalized recommendations while also protecting their privacy. In a hybrid system, the data are stored in a central repository, and the recommendation algorithm is executed by a federated learning system [69]. The central repository stores all of the user's profile data, and the federated learning system executes the recommendation algorithm without any user data being shared between the two. This allows the system to remain secure and the user data to remain private. In addition, the hybrid system allows different algorithms to be combined. For example, collaborative filtering can be used in conjunction with content-based filtering to provide a more accurate recommendation [70]. Furthermore, the system can be adapted to different data sources, such as social media and web analytics. This type of system also has the advantage of scalability. As new users join the system, the recommendation algorithm can be updated without requiring additional user data to be shared [71]. This makes the system more efficient and effective and allows more users to be accommodated. The primary privacy concern with hybrid recommendation systems is the extensive collection of user data from various sources required to make accurate and diverse recommendations. These data can include sensitive user behavior, profile, and contextual data, which, if not properly protected, can lead to unauthorized access and unintended disclosure of personal information. Thus, such systems must prioritize the implementation of robust privacy-preserving techniques, such as data anonymization, homomorphic encryption, and differential privacy, to mitigate the risks of data breaches and ensure that user privacy is not compromised.

Privacy Techniques
FRSs pose a unique privacy concern due to their distributed nature. Since the training data are processed and stored across multiple nodes, it is more difficult to ensure data privacy than with a centralized approach. Existing research has shown that the central server can infer sensitive information from the intermediate parameters. For instance, a server can identify items the user has interacted with based on the non-zero gradients sent by the client [72]. Moreover, the server can also infer ratings from the gradients uploaded by the user in two consecutive rounds [73]. To address these concerns, privacy-preserving techniques have been developed to protect user data privacy while still allowing FRSs to operate efficiently, as shown in Figure 5. In this section, we present these privacy-preserving techniques of FRSs and compare their advantages and limitations.
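The first leakage mentioned above, identifying interacted items from non-zero gradients, can be sketched in a few lines. The sketch assumes a matrix-factorization model with a squared-error loss, which is an assumption made for illustration and not necessarily the setting of [72]; the names item_gradients and infer_interacted_items are hypothetical.

```python
def item_gradients(user_vec, item_vecs, ratings):
    """Client side: gradient of the squared error w.r.t. every item embedding.
    `ratings` maps item index -> rating; unrated items get an all-zero row."""
    grads = []
    for i, q in enumerate(item_vecs):
        if i in ratings:
            err = sum(p * qi for p, qi in zip(user_vec, q)) - ratings[i]
            grads.append([2 * err * p for p in user_vec])
        else:
            grads.append([0.0] * len(user_vec))
    return grads

def infer_interacted_items(grads):
    """Honest-but-curious server: any non-zero row reveals an interaction.
    (An item whose prediction error is exactly zero would go unnoticed.)"""
    return {i for i, g in enumerate(grads) if any(v != 0.0 for v in g)}
```

Although the client uploads no ratings, the pattern of zero and non-zero rows already discloses which items the user rated, which motivates the techniques discussed in the following subsections.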

Differential Privacy
Differential privacy (DP) is a privacy-preserving technique that provides a mathematical guarantee of privacy [74,75]. It works by adding random noise to the data, making it difficult to identify individual records within the dataset [76]. This technique has been used in various contexts, including recommendation systems. For example, in [77], the study used DP to protect the privacy of users in a collaborative filtering-based recommendation system. The authors added Laplace noise to the user ratings to ensure privacy preservation. Similarly, in [78], the study used DP to protect the privacy of user ratings in a matrix factorization approach to recommendations.
Local differential privacy (LDP), in turn, is a variant of DP that is used for federated learning. Instead of adding noise to the entire dataset, LDP adds noise to each user's data before sending them to the server [79]. This ensures that the server never has access to the raw data, but can still obtain useful information. Several studies have been conducted in this context; for example, in [80], the study used LDP to protect the privacy of users in a matrix-factorization-based recommendation system. Similarly, in [81], the authors improved matrix factorization using LDP for recommendation systems.
The benefits of using DP and LDP in FRSs include the ability to protect user privacy while still allowing the server to obtain useful information from the data [82][83][84]. Moreover, LDP ensures that the raw data are never exposed to the server, which is important for protecting user privacy. The limitations of using DP and LDP in FRSs include the fact that the noise added to the data can lead to inaccurate results, as well as increased computation time due to the need to add the noise [85]. In addition, it is difficult to determine the optimal noise level for a given dataset, which can further reduce accuracy. Therefore, it is essential to design a system that can effectively trade off recommendation accuracy against privacy.
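As a minimal sketch of the local variant, the code below perturbs a single rating on the client with the Laplace mechanism before it is sent to the server. The rating range [1, 5], the privacy budget epsilon, and the function names are assumptions made for illustration; the code is not the mechanism of [77] or [80]. The noise scale is the sensitivity (the width of the rating range) divided by epsilon.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials
    that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_rating(rating, epsilon, rng, low=1.0, high=5.0):
    """Client side: clip the rating to [low, high], then add Laplace noise
    with scale sensitivity / epsilon, where sensitivity = high - low."""
    clipped = min(max(rating, low), high)
    sensitivity = high - low
    return clipped + laplace_noise(sensitivity / epsilon, rng)

def server_mean(noisy_ratings):
    """Server side: only noisy values are seen; averaging cancels much of the noise."""
    return sum(noisy_ratings) / len(noisy_ratings)
```

A smaller epsilon gives stronger privacy but a larger noise scale, so each individual report is less accurate; aggregates over many users remain usable. This is the accuracy and privacy trade-off described above.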

Secure Multi-Party Computation
Secure multi-party computation (SMPC) is a cryptographic technique that allows two or more parties to securely compute a function over their private data without any of the parties being able to access the data of the other parties [86,87]. In other words, SMPC ensures that the private data of each party remain secure while still allowing the parties to obtain the result of the computation over all the data without ever having to share the data themselves.
SMPC has been used in FRSs to protect user data and enable collaborative recommendations across multiple parties. For example, in [88], the study used SMPC to create a privacy-preserving collaborative filtering system that allows multiple parties to generate personalized recommendations without sharing the user data. Similarly, in [89], the study used SMPC in blockchain for privacy preservation.
The benefits of using SMPC in FRSs include the ability to protect user data privacy, the ability to generate personalized recommendations from multiple parties collaboratively, and the ability to create a decentralized recommendation system [90,91]. However, the use of SMPC also has some limitations, such as the high computational cost associated with secure computation, as well as the complexity of designing and implementing secure multi-party protocols.
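One simple building block that fits the description above is additive secret sharing, sketched below for computing a sum of private integer values. This is an illustrative assumption, not the protocol of [88] or [89]; the modulus PRIME and the function names share, reconstruct, and secure_sum are hypothetical, and real protocols must additionally handle malicious parties and dropouts.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus; must exceed the largest possible sum

def share(value, n_parties):
    """Split `value` into n random shares that sum to it modulo PRIME.
    Any subset of fewer than n shares looks uniformly random."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares by adding them modulo PRIME."""
    return sum(shares) % PRIME

def secure_sum(private_values):
    """Each party shares its value with all parties; party j adds the shares
    it received and publishes only that partial sum."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)
```

For example, three parties holding the values 3, 5, and 4 obtain the total 12 from secure_sum([3, 5, 4]) while no party sees another party's input. Every input requires a message to every other party, which illustrates the communication overhead of this family of techniques.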

Homomorphic Encryption
Homomorphic encryption is a type of encryption that allows computations to be performed on encrypted data without having to decrypt them [92]. In an FRS, homomorphic encryption allows data to be encrypted and shared between multiple parties while still allowing recommendations to be computed on the encrypted data [93]. There are two main types of homomorphic encryption used in FRSs: fully homomorphic encryption (FHE) and partially homomorphic encryption (PHE). FHE allows the computation of arbitrary functions on encrypted data, while PHE allows only specific computations, such as addition or multiplication [94].
Several studies have used homomorphic encryption in recommendation systems with promising results. For example, a recent study [95] developed a privacy-preserving system for matrix factorization using FHE. Another study [96] proposed a privacy-preserving user-based recommendation system using homomorphic encryption. The main benefit of homomorphic encryption in FRSs is that it provides a high level of privacy and security for users, as their data are encrypted and remain confidential [97,98]. However, the main limitation of homomorphic encryption is that it is computationally expensive, which can reduce the system's overall performance.
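To illustrate what a partially homomorphic scheme offers, the sketch below implements a toy version of the Paillier cryptosystem, which supports addition of plaintexts by multiplying ciphertexts. Paillier is chosen here as a well-known additive scheme and is not claimed to be the scheme used in [95] or [96]. The primes are tiny, publicly known Mersenne primes picked only so the example runs quickly; they provide no security, and all function names are assumptions.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation with generator g = n + 1. INSECURE demo primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2

def decrypt(n, secret_key, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = secret_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)
```

With these helpers, a server could add up encrypted ratings or gradients using add_encrypted without ever holding the secret key; only the key holder can decrypt the aggregate. The large modular exponentiations, even at this toy size, hint at the computational cost noted above.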

Tokenization
Tokenization in FRSs breaks the data down into smaller components (tokens) to make them easier to analyze and process. This includes breaking text down into words, sentences, and phrases and breaking numerical data down into numbers and other values [99]. Tokenization also helps to ensure privacy and data security by preventing the leakage of sensitive user data, such as personal information and preferences [100,101]. Several studies have used tokenization in FRSs. For example, in [102], the authors used tokenization to improve the performance of an FRS. They used a tokenization method based on the federated learning framework, which allowed the recommendation model to learn from distributed data sources without compromising user privacy. In [103], the authors used tokenization to improve the accuracy of an FRS by splitting the data into smaller parts.
The benefits of using tokenization in FRSs include improved accuracy, privacy, and scalability. In particular, tokenization allows the system to process data more quickly and accurately while ensuring user privacy [104,105]. In addition, it allows the system to scale to larger datasets without compromising performance. On the other hand, the limitations of using tokenization in FRSs include the complexity of the tokenization process and the potential for data loss [106]. Tokenization can be computationally intensive and time-consuming, and it can lead to data loss if the tokens are not generated correctly. In addition, some tokens may be more difficult to process than others, leading to potential accuracy issues.

Anonymization
Anonymization is a privacy-preserving technique in an FRS that is used to protect users' data privacy by concealing their identities [107,108]. The main idea of this technique is to mask users' identity information, such as user ID, username, and other user-related information, while still maintaining the utility of their data. This technique can protect users' identity information in the FRS while allowing the system to generate useful and accurate recommendations.
Several studies have utilized the anonymization technique in FRSs, showing the effectiveness of the technique [109][110][111]. Anonymization is easy to implement, as it only requires masking the user's identity information. The limitations of using anonymization in FRSs include the potential for data leakage, as well as the risk of data distortion due to the masking of user identity information [112].

Pseudonymization
Pseudonymization is a privacy-preserving technique used in FRSs to protect the privacy of users by replacing their identifiable attributes with a pseudonym [113,114]. It is a method of masking the true identity of a user while allowing him/her to be identified by his/her pseudonym. Pseudonymization can protect a user's sensitive attributes, such as age, gender, and location, as well as his/her explicit and implicit preferences. Several studies have been conducted on this matter [115][116][117].
The benefits of using the pseudonymization technique in FRSs include improved privacy for users, as their identifying data are not stored, and only their pseudonym is used to identify them [118]. Furthermore, pseudonymization also allows for better scalability of an FRS, as the amount of data stored is reduced and the data can be divided into smaller chunks [119]. However, there are also some limitations to using pseudonymization in FRSs. One of the main drawbacks is that it can be difficult to accurately link a user's pseudonym with his/her true identity and preferences. Furthermore, pseudonymized data can also be vulnerable to dictionary attacks and re-identification attacks.
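The dictionary attack mentioned above can be made concrete with a short sketch. It contrasts a pseudonym derived from an unkeyed hash, which anyone can recompute for a list of candidate identifiers, with one derived from a keyed HMAC, which cannot be recomputed without the secret key. The function names, the 16-character truncation, and the example identifiers are assumptions for illustration and do not come from [113]-[119].

```python
import hashlib
import hmac

def unkeyed_pseudonym(user_id):
    """Weak pseudonym: a plain hash that anyone can recompute."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]

def keyed_pseudonym(user_id, secret_key):
    """Stronger pseudonym: HMAC-SHA256 under a key held by the data controller."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def dictionary_attack(pseudonym, candidate_ids):
    """Attacker without the key: hash every candidate ID and compare."""
    for candidate in candidate_ids:
        if unkeyed_pseudonym(candidate) == pseudonym:
            return candidate
    return None
```

The same user always maps to the same keyed pseudonym, so records can still be linked for recommendation purposes, while an attacker who only knows a list of plausible user IDs cannot reverse the mapping without the key. Re-identification through the remaining attributes is a separate risk that this sketch does not address.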

Comparison
The aforementioned privacy mechanisms have been widely utilized in FRSs to offer stronger privacy protection. Table 2 details the comparison between these mechanisms. Similarly, Table 3 overviews privacy-preserving techniques, including their descriptions, domains, platforms, and environments. Firstly, the main objects of protection vary between the mechanisms. The tokenization mechanism secures user interaction behaviors, while the remaining mechanisms protect user ratings. Moreover, homomorphic encryption can also integrate data from other participants in a secure manner. Secondly, homomorphic encryption and secure multi-party computation are both encryption-based mechanisms, which protect privacy while maintaining accuracy, but the high computational cost of homomorphic encryption restricts its application in large-scale industrial settings. Secure multi-party computation reduces the computation cost, but increases the communication cost. On the other hand, differential privacy and local differential privacy mechanisms protect privacy by adding random noise, which has a low computational cost and does not add any communication cost. Nonetheless, adding random noise will unavoidably degrade model performance to a certain degree.

Applications and Target Industry
The FRS is a powerful tool that can be utilized in various applications and target industries. It enables secure and privacy-preserving data sharing for big data and provides real-time recommendations across multiple sources [136]. However, sharing data means that users must reveal their private information, such as name and address, which can easily be taken advantage of by third-party companies [137,138]. In particular, this private information can be used in the retail industry to provide personalized product recommendations to customers, in the healthcare industry to identify potential treatments for patients [139], in the finance industry to suggest financial products to customers [140], and in content recommendation for E-commerce websites and for recommending movies and shows on streaming services [141]. In Figure 6, we present the potential applications and target industries of FRSs and outline their details in the following subsections. In Table 4, we present potential real-time applications and targeted industries of FRSs.

Online Shopping
Currently, online shopping services are becoming increasingly popular and are involved in various aspects of our lives [14]. A large amount of private information of users is collected and stored centrally by online shopping service providers, which poses a serious risk of privacy leakage [142][143][144]. User data may be sold to third parties by service providers or stolen by external hackers. FRSs can help users enjoy personalized recommendation services while maintaining personal privacy, make the service providers more trusted by users, and ensure the recommendation service complies with the regulations [145]. For example, an FRS can be designed to implement various popular recommendation algorithms to support many online shopping services and deploy them on a real-world content recommendation application.

Social Media
Social media platforms allow users to create networks, post content such as comments, photos, videos, and links, and engage with one another in various ways [135,146]. Examples include Facebook, Twitter, LinkedIn, Instagram, Pinterest, and YouTube [147]. Studies have found that recommender systems can increase the accuracy and relevancy of recommendations by leveraging the vast amount of user data from social media platforms [148][149][150][151][152]. Therefore, an FRS can personalize recommendations by leveraging user preferences and interests from social media profiles and can help reduce bias by incorporating the collective wisdom of users from different social media platforms. FRSs can benefit social media in a number of ways. They can provide more accurate and relevant recommendations, which can increase user engagement and satisfaction. They can also help reduce bias and personalize recommendations, which can improve the user experience [153]. Moreover, they can provide better insights into user preferences and interests, which can be used to inform marketing campaigns and product development.

Healthcare
The healthcare industry is one of the most heavily regulated industries in the world, and using FRSs to share data can help reduce the burden of compliance [154,155]. In particular, FRSs can allow various healthcare providers to share patient data, medical records, and other relevant information while maintaining the privacy of individual data. Researchers have studied the potential of FRSs for healthcare data sharing [156,157]. The consensus is that FRSs can provide a secure, efficient, and cost-effective way to share sensitive patient data between organizations. Studies have also shown that FRSs can reduce the cost of healthcare data storage, improve the quality of care, and reduce the burden of compliance with data privacy regulations [158,159]. FRSs can benefit healthcare in a number of ways. For one, they can allow for more secure and efficient data sharing between healthcare providers. By using FRSs, organizations can share data without sacrificing the privacy of individual data [160]. Moreover, FRSs can reduce the cost of healthcare data storage and improve the quality of care by allowing a more comprehensive view of each patient [161]. Finally, FRSs can reduce the burden of compliance with data privacy regulations, as data can be stored in a secure and encrypted manner [162].

Education
Education can be defined as the process of preparing individuals for life, work, and citizenship through the acquisition of knowledge, skills, values, beliefs, and habits. In an FRS, education uses computers and other electronic technologies to facilitate those acquisitions [163,164]. In particular, FRSs allow for a more personalized and tailored educational experience, allowing students to progress at their own pace and according to their interests [165]. There has been increasing research into using recommendations in the education industry [166]. Several studies have focused on the use of a recommendation system to provide personalized and tailored educational experiences through computer-based learning [167][168][169]. Such research has highlighted the potential of recommendations to increase student engagement, improve student performance, and reduce the need for teachers to provide instruction. To this end, FRSs can provide personalized and tailored educational experiences to students of all ages and backgrounds. By leveraging the power of computers and other electronic technologies, FRSs can provide students with personalized feedback and guidance tailored to their learning needs.

Advertising
Advertising in an FRS identifies and targets potential customers by delivering customized messages and offers. This is usually accomplished through personalization and contextualization, leveraging user data and preferences, geography, and other demographic characteristics [170][171][172]. In FRSs, the main goal of advertising is to increase the visibility of products and services and to increase the likelihood of conversion [173]. Recent research has highlighted the potential of FRSs to facilitate better advertisement targeting. In [174], it was found that a recommendation system using a machine learning approach can improve the accuracy of ads by up to 20%. This improvement was attributed to the ability of recommendations to model user preferences and characteristics more accurately, allowing for better targeting of ads. Moreover, FRSs can help reduce the cost associated with advertising campaigns, as well as the amount of effort needed to build and maintain the campaigns. On the other hand, privacy preservation is an essential component of any recommendation system. This is due to the fact that recommendation relies on the collection and analysis of user data [175]. Without proper privacy measures, user data can be easily exposed to malicious actors, leading to privacy violations and potential data misuse [176]. Therefore, FRSs must employ privacy measures such as data anonymization, data encryption, and access control in order to ensure the privacy of users.

E-Commerce
E-commerce is the buying and selling of products and services over the Internet [125]. It enables customers to access a wide range of products, services, and information from businesses worldwide. The increase in E-commerce activity has led to an increased usage of recommendation systems [177,178]. To this end, FRSs can allow different data sources to collaborate in order to provide personalized recommendations to customers. A literature review on the application of FRSs in E-commerce revealed various advantages [179][180][181]. First, FRSs can help reduce the cost of data access and storage. By sharing data across different sources, FRSs can provide a more efficient and cost-effective way of accessing and storing data. Furthermore, FRSs can improve the accuracy of recommendations by providing more relevant and personalized recommendations [182]. This can increase customers' satisfaction and loyalty by providing customers with better recommendations that are tailored to their individual needs.

Publicly Available Datasets
This section provides a list of publicly available datasets for FRSs. These datasets cover a wide range of topics, including movie ratings, TV show ratings, music tastes, consumer preferences, and more. Each dataset has been carefully curated to contain relevant information for developing FRSs. Those datasets are available for free download and can be used to train and test the performance of FRSs. In Table 5, we show the list of those datasets and discuss the details in the following subsections.

Amazon Reviews Dataset
The Amazon Reviews dataset is a set of data collected from customers who have purchased products from the Amazon platform. These data can be used to create an FRS that can accurately predict what products a user might be interested in based on his/her past purchase and browsing behaviors. The Amazon Reviews dataset provides the necessary data needed to create a robust FRS that can recommend relevant products to users.

MovieLens Dataset
The MovieLens dataset is a popular dataset used for collaborative filtering. The dataset comprises over 20 million ratings for more than 10,000 movies. It contains ratings from 0.5 to 5, with 0.5 being the lowest and 5 being the highest. In terms of FRSs, the MovieLens dataset is used to provide users with personalized movie recommendations based on the ratings of other users. It does this by considering the similarity between users and the ratings they have given to the movies. For example, if User A and User B both rated the same movies highly, then the system is likely to recommend to User A other movies that User B rated highly, and vice versa. This is performed by calculating the similarity between users and their ratings and then identifying movies that are popular among users with similar tastes.
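The similarity-based procedure described above can be sketched as follows. This is a minimal, non-federated illustration that assumes cosine similarity and a similarity-weighted sum as the scoring rule; the toy ratings and the function names are invented for the example and are not MovieLens data.

```python
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two users' {movie: rating} dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target: str, ratings: dict, top_n: int = 3) -> list:
    """Rank movies the target has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(ratings[target], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[target]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda m: (-scores[m], m))[:top_n]

# Invented toy ratings on the 0.5 to 5 scale.
toy_ratings = {
    "A": {"Alien": 5.0, "Heat": 4.5},
    "B": {"Alien": 5.0, "Heat": 4.0, "Solaris": 4.5},
    "C": {"Alien": 1.0, "Amelie": 5.0},
}
suggestions_for_a = recommend("A", toy_ratings)
```

User B agrees with User A on the movies they share, so B's remaining movie is ranked ahead of the movie contributed by the dissimilar User C.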

Yelp Dataset
The Yelp dataset is a publicly available dataset that contains reviews, ratings, and other metadata related to businesses and services. It is a popular data source for recommendation systems, both in the context of federated learning and non-federated learning. In an FRS, the Yelp dataset can be used to build personalized user recommendations. Specifically, the dataset can be used to build a collaborative filtering model that considers user preferences and item ratings. The collaborative filtering model can then create personalized recommendations for each user based on his/her preferences and other related factors.

Film Trust Dataset
The Film Trust dataset is a collection of data about user ratings and preferences for movies. It was developed as part of the Netflix Prize competition and can be used to build an FRS. The dataset contains movie ratings from over 500,000 users. Each user rated at least 20 movies, and the ratings range from 1 to 5 stars. The dataset also includes user movie preferences and demographic information. The demographic information includes gender, age, zip code, and occupation. In the case of FRSs, the system allows users to share their ratings and preferences, and the system can then use these data to make recommendations. For example, if a user shares his/her ratings and preferences with his/her peers, the system can recommend other movies that the user may be interested in.

Goodreads Dataset
The Goodreads dataset is a dataset of book ratings and reviews from the popular book recommendation website, Goodreads. The dataset contains over 6 million user-generated ratings of and reviews on books, authors, and other related items, as well as over 1 million user-generated book reviews. This dataset can be useful for FRSs because it allows for the creation of personalized recommendations that are tailored to the interests of each user. The dataset also includes information about the books, authors, and reviews so that the FRS can provide personalized recommendations based on the user's interests.

Netflix Dataset
The Netflix dataset is a collection of user ratings of movies in the Netflix library. It can be used for creating an FRS, which allows users to make recommendations to one another without revealing their individual preferences. The dataset comprises over 20 million ratings from over 480,000 users for over 17,000 movies. The data are provided in a matrix containing users, movies, and their ratings. This allows for creating a collaborative filtering system, where users can make recommendations to one another based on the ratings they have given to movies. Moreover, this dataset provides additional information, such as release year, genre, and user age, which can be used to build a more personalized recommendation system.

LastFM Dataset
The LastFM dataset is a collection of user-generated music-listening data from the Last.fm online music service. The dataset is well suited for FRSs, as it contains large amounts of user-specific music-listening data from various users. These data can be used to create personalized recommendations for users, as they are more likely to be interested in music similar to what they have already listened to. The dataset includes user IDs, the artist name, the release name, the song title, the album name, the number of plays, and the timestamp when the song was played. These data can be used to create more detailed recommendations for each user. By analyzing the number of plays and the timestamp of each song, the system can identify which songs are most popular among users and make recommendations accordingly.
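As a rough illustration of how play counts and timestamps can be combined, the sketch below counts plays per artist and optionally ignores plays older than a cutoff. The event tuples, the integer timestamps, and the function name `popular_artists` are assumptions made for the example and do not reflect the actual LastFM file layout.

```python
from collections import Counter

def popular_artists(events, since: int = 0, n: int = 2) -> list:
    """Return the n most-played artists among plays with timestamp >= since.

    events: iterable of (user_id, artist, timestamp) tuples (assumed layout).
    """
    counts = Counter(artist for _user, artist, ts in events if ts >= since)
    return [artist for artist, _plays in counts.most_common(n)]

# Invented listening log; timestamps are plain integers for brevity.
listening_log = [
    ("u1", "Radiohead", 100), ("u1", "Radiohead", 200), ("u1", "Bjork", 300),
    ("u2", "Radiohead", 250), ("u2", "Bjork", 400), ("u2", "Bjork", 500),
    ("u2", "Bjork", 600), ("u3", "Moby", 50),
]
all_time = popular_artists(listening_log)
recent_only = popular_artists(listening_log, since=300)
```

In a federated setting, each client could compute such counts locally and share only the aggregated or perturbed totals rather than the raw log.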

Douban Dataset
The Douban dataset is a large collection of user-generated content from the popular Chinese social networking site Douban. It is a useful resource for FRSs, providing a rich data source for understanding user preferences and behaviors. The dataset contains user ratings, reviews, and tags for movies, books, and music. It also contains user profiles, including age, gender, location, and occupation. All this information can be used to build a robust FRS.

BookCrossing Dataset
The BookCrossing dataset is a collection of user ratings of books from a website called BookCrossing. It is composed of three tables: BX-Books, BX-Users, and BX-Book-Ratings. The BX-Books table contains the ISBN, title, author, and publisher information for the books that have been rated by the users. The BX-Users table contains the users' information, including their age and location. The BX-Book-Ratings table contains the ratings that the users have given to each book. The BookCrossing dataset is a great resource for training an FRS, as the system can use the ratings from multiple users to create a personalized recommendation for each user.
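The three-table layout can be joined into per-user rating profiles, which is roughly the shape of the local data each client would hold in an FRS. The sketch below assumes semicolon-separated files with the column names used in the commonly distributed CSV dump (these should be checked against the actual download); the rows are invented placeholders.

```python
import csv
import io

# Invented sample rows; column names are assumed from the usual CSV dump.
BOOKS = '''"ISBN";"Book-Title";"Book-Author";"Publisher"
"111";"Sample Book One";"Author A";"Publisher X"
"222";"Sample Book Two";"Author B";"Publisher Y"
'''
USERS = '''"User-ID";"Location";"Age"
"1";"somewhere, xx";"30"
"2";"elsewhere, yy";"41"
'''
RATINGS = '''"User-ID";"ISBN";"Book-Rating"
"1";"111";"8"
"2";"111";"5"
"2";"222";"9"
'''

def read_table(text: str) -> list:
    """Parse one semicolon-separated table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"'))

def ratings_by_user(users: list, books: list, ratings: list) -> dict:
    """Join the three tables into {user_id: {book_title: rating}}."""
    titles = {b["ISBN"]: b["Book-Title"] for b in books}
    profiles = {u["User-ID"]: {} for u in users}
    for row in ratings:
        uid, isbn = row["User-ID"], row["ISBN"]
        if uid in profiles and isbn in titles:  # skip dangling references
            profiles[uid][titles[isbn]] = int(row["Book-Rating"])
    return profiles

profiles = ratings_by_user(read_table(USERS), read_table(BOOKS), read_table(RATINGS))
```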

Future Directions and Real-World Challenges
In this section, we discuss the need for further research and development in order to optimize the effectiveness of FRSs. We first show the existing limitations, issues, and challenges in Table 6. Then, we look at the potential challenges that need to be overcome for FRSs to be implemented successfully. In Figure 7, we show the overview of future directions, and we discuss their details in the following subsections.

Data Privacy
• Future directions: Data privacy is an increasingly important issue in FRSs. In order to ensure that all users' data are secure, FRSs must have security protocols in place for exchanging data between different organizations. These protocols should be designed to protect user privacy by preventing unauthorized access to user data and by allowing users to control with whom their data are shared. In addition, the FRS must take steps to ensure that user data are not misused or abused. This could include implementing policies and procedures for data security and monitoring, as well as developing technologies to detect any unauthorized access or misuse of user data.

• Real-world challenges: A major challenge for FRSs is ensuring that user data are secure, both in terms of the data exchanged between different organizations and the data stored on each organization's servers. To address this challenge, FRSs must ensure that all data exchanged between different organizations are encrypted, secure, and adequately protected. Besides, the system must ensure that each organization's servers are secure and that all user data are stored in a secure manner that prevents unauthorized access or misuse.
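One way to protect values exchanged between organizations is additive secret sharing, a building block of the secure multi-party computation mechanisms compared earlier in this survey. The sketch below is a simplified single-process simulation under stated assumptions: values are non-negative integers smaller than a fixed modulus, all parties are honest, and the names `share` and `secure_sum` are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # assumed field size; inputs must be smaller than this

def share(value: int, n_parties: int) -> list:
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values: list) -> int:
    """Simulate a secure sum: no party ever sees another party's raw value."""
    n = len(private_values)
    # Party i sends its j-th share to party j (n * n messages in total).
    all_shares = [share(v, n) for v in private_values]
    # Party j publishes only the sum of the shares it received.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    return sum(partial_sums) % MODULUS

total = secure_sum([3, 5, 10])
```

Each individual share looks uniformly random, so only the final sum is revealed; the quadratic number of messages reflects the communication overhead noted for this family of mechanisms.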

Cold Start Problem
The cold start problem is a challenge faced by FRSs when providing accurate and personalized recommendations. This problem arises when there are no existing data available to predict user preferences accurately. Without these data, the system cannot make accurate recommendations and may default to providing generic suggestions. As a result, users may be less likely to engage with the system, leading to a decrease in customer satisfaction:

• Future directions: These may include collecting more user data, such as preferences, likes, and dislikes, and implementing techniques such as collaborative filtering that allow the system to draw connections between similar users and their preferences. Moreover, techniques such as deep learning may be used to understand user behavior better. The FRS may also benefit from integrating external data sources, such as social media, to personalize the user experience further.

• Real-world challenges: These include protecting user privacy, balancing accuracy and complexity, and ensuring scalability. To ensure user privacy, FRSs must be designed with privacy-preserving techniques in mind. The complexity of the model must be balanced with its accuracy, as complex models may be difficult to interpret and may not be able to predict user preferences accurately. Finally, the system must be able to handle large amounts of data when scaling, as more data will allow for more accurate predictions.
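The fallback to generic suggestions described in this subsection can be sketched as a simple rule: use the personalized model only when the user has enough history, and otherwise return globally popular items. The threshold, the `popularity` counter (assumed to be aggregated in a privacy-preserving way), and the function name are illustrative assumptions.

```python
from collections import Counter

def recommend_with_fallback(history: list, popularity: Counter, personalized,
                            min_interactions: int = 3, top_n: int = 2) -> list:
    """Return personalized items for known users, popular items otherwise.

    personalized: callable mapping a history to a ranked item list
    (a stand-in for whatever model the FRS has trained).
    """
    if len(history) >= min_interactions:
        return personalized(history)[:top_n]
    # Cold start: rank by global popularity, skipping items already seen.
    ranked = [item for item, _count in popularity.most_common()
              if item not in history]
    return ranked[:top_n]

global_popularity = Counter({"a": 5, "b": 3, "c": 1})

def toy_model(_history):
    return ["x", "y", "z"]

new_user = recommend_with_fallback(["b"], global_popularity, toy_model)
known_user = recommend_with_fallback(["a", "b", "c"], global_popularity, toy_model)
```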

Data Fragmentation
The use of FRSs is becoming increasingly popular due to the fact that they enable organizations to manage and store data in a distributed manner without compromising privacy. However, when data are stored in a distributed manner, there is a risk of data fragmentation. Data fragmentation occurs when the data are split into pieces and stored in different locations. This can cause problems with accessing and utilizing the data, as well as with data consistency:

• Future directions: To address the issue of data fragmentation, research should focus on approaches to facilitate seamless data transfer between different locations and to ensure that the data are consistent and up-to-date. This could include the development of distributed databases, distributed query processing, and distributed data analytics. Moreover, research should focus on developing methods for federated learning with fragmented data. These methods should aim to reduce the computational cost of training models and improve the accuracy of the models.

• Real-world challenges: Data fragmentation can be a significant challenge in FRSs, as it can lead to inconsistent data and can make it difficult to access and utilize the data. In addition, privacy concerns may prevent the data from being shared across different locations, further exacerbating the problem of data fragmentation. As such, organizations will need to develop policies and technologies to facilitate data transfer and to ensure that the data are consistent and up-to-date. Moreover, organizations will need to ensure the security of the data and protect the privacy of users.
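When data stay fragmented across locations, a common baseline is to train locally on each fragment and combine the resulting models by a sample-weighted average (federated averaging). The sketch below shows only the aggregation step, with model parameters represented as plain lists of floats; the function name and data layout are assumptions for illustration.

```python
def federated_average(client_updates: list) -> list:
    """Average client parameter vectors, weighted by local sample counts.

    client_updates: list of (parameters, n_samples) pairs, where parameters
    is a list of floats of equal length for every client.
    """
    total_samples = sum(n for _params, n in client_updates)
    if total_samples == 0:
        raise ValueError("no training samples reported by any client")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for params, n in client_updates:
        for k, value in enumerate(params):
            averaged[k] += value * n / total_samples
    return averaged

# Two fragments of unequal size: the larger one dominates the average.
global_model = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
```

Weighting by sample count keeps a small fragment from pulling the global model as strongly as a large one, although it does not by itself address inconsistency between fragments.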

Scalability Issues
FRSs hold great promise for improving the quality and accuracy of personalized recommendations, but they can also be difficult to scale. As the number of sources and users grows, the complexity of managing data across multiple sources and ensuring the accuracy of recommendations can become increasingly difficult:

• Future directions: In order to address the scalability issues associated with FRSs, researchers are developing improved methods for managing and processing data across multiple sources. This includes optimizing algorithms to reduce the number of data transfers, as well as utilizing distributed computing architectures and artificial intelligence to manage data better. In addition, researchers must continue to explore methods of improving the accuracy of federated recommendations, such as using transfer learning, ensemble learning, and other recent machine learning techniques.

• Real-world challenges: There are scalability concerns about data privacy and security, as well as the need for robust methods for managing user profiles and managing conflicts between multiple sources. Moreover, as federated systems become more complex and involve more stakeholders, it will be important to develop governance models and mechanisms for cooperation among the different parties involved.
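One possible way to reduce the amount of data transferred per round is to send only the largest entries of each model update, a technique often called top-k sparsification. It is used here purely as an illustration and is not a method attributed to the works surveyed; the function names are hypothetical.

```python
def sparsify_top_k(update: list, k: int) -> dict:
    """Keep the k entries with the largest magnitude as {index: value}."""
    largest = sorted(range(len(update)), key=lambda i: abs(update[i]),
                     reverse=True)[:k]
    return {i: update[i] for i in sorted(largest)}

def densify(sparse: dict, dim: int) -> list:
    """Rebuild a dense vector on the server, filling dropped entries with 0."""
    dense = [0.0] * dim
    for index, value in sparse.items():
        dense[index] = value
    return dense

local_update = [0.1, -2.0, 0.05, 1.5]
message = sparsify_top_k(local_update, k=2)   # only 2 of 4 values are sent
restored = densify(message, dim=len(local_update))
```

The dropped entries introduce an approximation error, so the saving in communication has to be weighed against recommendation accuracy.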

Lack of Interoperability
Interoperability is the ability of different systems to work together and exchange information. Without interoperability, FRSs would be unable to share information, limiting their effectiveness. This could lead to a lack of personalization and accuracy in recommendations. Moreover, this would create a barrier between users and their data, making it difficult for them to find the most relevant content.

• Future directions: These may include developing standards and protocols for data exchange between systems, as well as utilizing open-source solutions that allow different systems to communicate with each other. In addition, developing technologies such as blockchain could provide solutions to ensure data privacy and security while allowing interoperability.

• Real-world challenges: The main challenge in achieving interoperability in FRSs is the need for systems to be able to share data without compromising privacy or security. Moreover, different systems may have different data formats, making it difficult for them to communicate with each other. Developing standards and protocols for data exchange could help address this challenge, as could the development of open-source solutions. In addition, ensuring data privacy and security in federated systems could be a difficult task, as data may need to be shared between different systems. Utilizing blockchain technology could help address this challenge, allowing for secure data sharing while ensuring privacy.
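The data-format mismatch mentioned above is often handled by agreeing on a shared record schema and writing one adapter per platform. The sketch below is a minimal illustration; the `Interaction` schema, the two platform formats, and the normalization of ratings to the range 0 to 1 are all hypothetical choices, not an existing standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    """Hypothetical shared schema that every participant agrees to emit."""
    user_id: str
    item_id: str
    rating: float  # normalized to the range 0.0 to 1.0

def from_platform_a(record: dict) -> Interaction:
    """Adapter for an assumed format {"uid", "sku", "stars"} with 1 to 5 stars."""
    return Interaction(str(record["uid"]), str(record["sku"]),
                       (record["stars"] - 1) / 4)

def from_platform_b(record: tuple) -> Interaction:
    """Adapter for an assumed format (user, item, liked) with a boolean flag."""
    user, item, liked = record
    return Interaction(str(user), str(item), 1.0 if liked else 0.0)

unified = [
    from_platform_a({"uid": 7, "sku": "x1", "stars": 5}),
    from_platform_b(("u9", "x1", False)),
]
```

Once every source emits the same record type, the downstream training code does not need to know which platform a record came from.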

Conclusions
This paper discussed the potential of federated recommendation systems for protecting user privacy while allowing organizations to benefit from the data they share. We explored how federated recommendation systems allow for secure data sharing and collaboration in the context of big data. In addition, we identified the challenges associated with developing and deploying these systems in the real world. With the increasing demand for privacy, federated recommendation systems provide a powerful tool for protecting user data while allowing organizations to benefit from the data they share. With the right tools and practices in place, federated recommendation systems can help organizations navigate the complex landscape of data privacy.

Figure 1. This figure represents the generic training procedure of FRSs, where a set of IoT devices is connected to the cloud server. Firstly, the cloud server synchronizes with the IoT devices and creates personalized recommendations; secondly, those IoT devices train the local learning models individually and then upload those trained learning models to the cloud server. Finally, the cloud server aggregates those local models to generate new recommendations for the next round.

Figure 2. Overview of the FRS entities.

Figure 3. Architecture of the FRS, divided into four components used for user's data sources.

Figure 4. Categories of federated recommendation systems.

Figure 6. Applications and target industries of FRSs.

Figure 7. Directions that should be followed in FRSs for future developments.

Table 1. Comparison of categories of FRSs.

Table 2. Comparison between several privacy-preserving techniques. Under "Privacy", we compare privacy with respect to accuracy. High means a good tradeoff between privacy and accuracy. Moderate means high privacy, but lower accuracy. Low means poor privacy and poor accuracy. C/C stands for communication/computation costs.

Table 3. Privacy-preserving techniques in various domains, platforms, and environments.

Table 4. Real-time applications and targeted industries of FRSs.

Table 5. Publicly available datasets that can be used for the development of FRSs.

Table 6. Current security protocols, limitations, issues, and challenges in FRSs.