Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy

Self-Sovereign Identity (SSI) is an identity model centered on the user. The user maintains and controls their data in this model. When a service provider requests data from the user, the user sends it directly to the service provider, bypassing third-party intermediaries. Thus, SSI reduces identity providers’ involvement in the identification, authentication, and authorization, thereby increasing user privacy. Additionally, users can share portions of their personal information with service providers, significantly improving user privacy. This identity model has drawn the attention of researchers and organizations worldwide, resulting in an increase in both scientific and non-scientific literature on the subject. This study conducts a comprehensive and rigorous systematic review of the literature and a systematic mapping of theoretical and practical advances in SSI. We identified and analyzed evidence from reviewed materials to address four research questions, resulting in a novel SSI taxonomy used to categorize and review publications. Additionally, open challenges are discussed along with recommendations for future work.


Introduction
The ability to prove that individuals are who they claim to be is critical to human interactions in society, whether in the physical world or online. The proof is typically presented in the form of a credential that enables the identification and authentication of a person. This credential, which consists of a collection of attributes, is referred to as an identity document or simply identity [1,2]. In today's digital world, large corporations such as Google and Facebook issue electronic identities. They created these identities to facilitate user identification, authentication, authorization, and provision of user attributes for their internal services. These identities have developed into a powerful tool for identifying users who wish to access the companies' services and those of a variety of other Service Providers (SPs). As a result, these businesses serve as Identity Providers (IdPs). Numerous companies have outsourced their customer registration, identification, and authentication to IdPs.
Using IdPs has a number of benefits and drawbacks. The user benefits from having a single identity to authenticate with multiple SPs. One disadvantage may be that a single IdP manages data for many users. Storing people's electronic identities in a few IdPs has been a source of contention because these few data silos hold the data of a large number of people [3]. These massive data silos have become attractive targets for hackers [4] because they contain high-value assets that can be misused [5] or even traded [6] with institutions that users have not authorized.
Although the vast majority of users trust IdPs naively, many users and businesses are uneasy with the requirement to use and trust these entities. Self-Sovereign Identity (SSI) [3] has garnered attention in this context because it prevents IdPs from tracking their users' activities. It also enhances people's privacy by enabling them to store and manage their data and specify the granularity of the information they can share.
Although SSI provides sovereignty over the digital presence, it introduces new challenges that must be overcome before widespread adoption can occur. The difficulties are conceptual and pragmatic in nature. The primary conceptual problems are defining SSI and defining what constitutes a self-sovereign system. The pragmatic challenges include, but are not limited to, how to coexist with and migrate existing IdPs' identities to the new model, how to trust data from other self-sovereign identities, and how to assist users with managing, backing up, and recovering private data.
The advantages of this new identity paradigm over traditional models have attracted researchers' and professionals' attention in recent years, resulting in an increasing number of publications on the subject. Some initiatives aim to review and condense the body of knowledge thus far. However, current reviews do not address all facets of SSI. For instance, they omit publications that contribute to the conceptual debate over the meaning of the term "self-sovereign identity" and efforts that present novel problems and solutions in specific areas of SSI. Existing reviews are primarily concerned with applications and research papers that propose SSI systems such as Sovrin [7] and uPort [8].
This article conducts a comprehensive systematic review and mapping of the scientific and non-scientific literature that contributes to the debate over what SSI is, as well as works that address practical issues related to SSI. We searched for, selected, and reviewed publications in a systematic manner, guided by four research questions. Due to the systematic nature of our work, it may be reproduced and updated in the future to reflect new activity. The results include: (i) a taxonomy that enables hierarchical classification of the SSI literature; (ii) an in-depth and systematic analysis of the surveyed materials using our novel taxonomy; and (iii) analyses and maps of publication frequency, venues, co-references, and co-authorships, which provide a global view of the state of the art of SSI literature to the reader. Finally, open issues and recommendations for researchers and practitioners working with SSI are discussed.

Novelty and Research Contributions
In summary, we make the following three main research contributions to the field.
• Our survey examines both conceptual and practical advances in SSI, highlighting philosophical contributions to the definition of SSI, novel problems and proposed solutions, and promising directions for future research. The manuscript conducts an analysis of the body of knowledge established by over 80 research papers, scientific reports, patents, technological standards, and theses.
• Through a proposed taxonomy, we provide the reader with a comprehensive and organized understanding of the SSI literature. Additionally, the manuscript presents and discusses maps of authors' relationships, publication venues, and the shift in the focus of research in the area over time. To our knowledge, this is the first survey of SSI to include a systematic literature review, a systematic mapping, and a taxonomy, all of which are based on rigorous criteria and reproducible methodology.
The remainder of this article is structured in the following manner. Section 2 provides an introduction to electronic identity and a detailed description of SSI. Section 3 outlines the existing secondary studies that review the SSI literature and their shortcomings. Section 4 defines the method used in this study and how it was carried out. Section 5 presents the reader with the proposed taxonomy. In Section 6, we describe the practical research surveyed. Section 7 identifies and discusses mathematical and cryptographic tools used in applied research. In Section 8, we detail philosophical discussions regarding understanding what [...]
The electronic identity document is the third format. This is the identity that is used in the virtual world to authenticate users and enable them to consume electronic services on the web. Unlike a digital document, which is a visual representation, an electronic document is built from the ground up to be used electronically, removing the need for visual verification of its integrity. Multi-factor authentication [21] and cryptographic techniques such as digital signatures and public-key cryptography [22] are used to carry out these processes. For instance, authentication may combine a password known only to the identity holder with a key displayed by a time-based one-time password service [23,24].
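To make the password-plus-one-time-key combination concrete, the following is a minimal sketch of a time-based one-time password check in the style of RFC 6238 (HMAC-SHA-1, 30-second steps). The function names, the `password_ok` flag, and the shared secret are assumptions introduced for illustration; the text does not prescribe a particular implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password: HMAC-SHA-1 followed by dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte selects a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based variant: the counter is the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def authenticate(password_ok: bool, secret: bytes, submitted_code: str,
                 now: Optional[float] = None) -> bool:
    """Two factors: something the holder knows and the current one-time key."""
    expected = totp(secret, now)
    return password_ok and hmac.compare_digest(expected, submitted_code)
```

A production verifier would typically also tolerate small clock drift and rate-limit attempts; both are omitted here for brevity.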
These three forms of identity must be impervious to forgery, fraud, and data leakage. As a result, the collection, storage, and processing of identity-related data must be handled with extreme caution, with an emphasis on the use of appropriate data protection mechanisms. While each of the three types of identity listed above is vulnerable to fraud, the electronic version requires the most oversight. Numerous instances of fraud involving the misuse of electronic identities have been reported [5,6].
While business cards and curriculum vitae are examples of self-issued identity documents, the vast majority of identities in use are issued by trusted third parties. For instance, national-level identification documents such as driver's licenses and passports are frequently issued by the government [25] or by private companies authorized to do so [26].

Electronic Identity
In the physical world, establishing trust in relationships between various entities requires identifying the communicating parties. Proof of identity is accomplished through pre-agreed upon authentication factors or with the assistance of trusted third parties. Physical devices are frequently used as authentication factors. For instance, it is not uncommon for individuals to be identified visually through their identification documents, followed by a facial badge verification. Similarly, in the electronic world, communicating parties must have a certain Level of Assurance (LoA) regarding the other party's identity. This assurance is accomplished through the use of electronic identities on data communication networks such as the Internet.
As with a physical identity, an electronic identity is typically defined as a set of attributes that help in the description or qualification of an entity [1]. Some authors prefer to limit this definition to a set of attributes in a specific context in order to improve its accuracy [27][28][29]. As a result, electronic identities are not simply digital representations of physical identities such as a passport or driver's license. They are created, used, and destroyed in accordance with the user's desires, frequently containing only the attributes necessary to accomplish the task at hand. For instance, a seller on eBay [30] may have an electronic identity that conceals their name, age, and country of residence, as others are only concerned about whether or not this seller has a track record of successful transactions [31].
All identities, whether physical or electronic, are subject to ownership verification. That is, they require mechanisms for properly identifying and authenticating users [32].
The identification process begins with the holder of an electronic identity presenting a unique attribute in a given context, i.e., an identifier that differentiates it from all other electronic identities in that context [33].
The most common example is providing an email address when signing up for a subscription service. The subsequent stage is to authenticate the identified entity by verifying a security proof, which is traditionally accomplished via a secret password or digital signature, thereby ensuring that the holder of the identity is, in fact, its owner. In the above-mentioned subscription service example, providing a code or clicking a link received via email proves that the email address belongs to the holder.
Identification and authentication are critical in our digital society because they enable citizens to access services electronically. As a result, the identification and authentication processes are carried out by specialized services trusted by the parties involved. These services are provided by systems that manage electronic identity and are referred to as Identity and Access Management (IAM) systems.

The evolution of IAM models
In the early days of the web, SPs had to implement their own IAM solutions to identify and authenticate clients to offer personalized products and services. As a result, these services are referred to as centralized authorities. This model presented a number of usability issues for users. Most users ended up using similar low-entropy passwords on different systems, making room for numerous vulnerabilities. This model has sparked numerous initiatives aimed at educating users about the dangers of using simple passwords and reusing them across multiple services [34,35].
The next logical evolution was to replace the centralized authorities with third-party IAM solutions, i.e., IdPs. With this new paradigm, users only need to be registered with a few IdPs in order to access the web's plethora of services. By contrast, SPs must be registered with the desired IdPs or IdP federations to work with the IdPs' identified and authenticated users. Through token exchange protocols such as SAML [36], OAuth 2.0 [37], and OpenID Connect [38], the interactions between the IdP, SP, and end user were standardized. Even though this identity model significantly simplified the management of multiple identifiers and passwords for users, it resulted in the creation of a few large silos of valuable private information.
The user-centric model was the next evolutionary step [39]. It was designed with the idea that users could use Personal Authentication Devices (PADs), such as smartphones and smartcards, to store and present authentication credentials from SPs, bypassing the need for third-party IdPs. However, as noted in [3], this model has not gained traction and is currently viewed as an extension of the IdP model with greater user control. According to [3], the current interpretation of this model is that the user is aware of and must authorize or deny her IdP sharing specific personal attributes requested by an SP. As a result, the current model of user-centric identity faces the same issues as the previous model.
Figures 1(a), 1(b) and 1(c) illustrate these three identity models, providing an overview of the interactions between the user, IdP, and SP. The emergence of specialized IAM services, i.e., third-party IdPs, resulted in the formation of electronic identity oligopolies [40]. Long-term users of IdPs are effectively imprisoned by them, as IdPs do not support portability. These companies promote their own rules, which can result in a user being removed from their platforms if those rules are violated. This can be devastating for individuals who have spent years developing trusting relationships with SPs. They will lose their transaction history and become completely unknown if they are banned. This issue is particularly noticeable for IdPs that double as social media platforms, such as Facebook, LinkedIn, and Twitter, where violations of social network rules are often questionable [41].

Self-Sovereign Identity
In the early days of the web, the conception of the client-server model shaped the idea that in the digital world, people are users of online systems rather than human beings, i.e., entities that need identification, authentication, and authorization to access and perform tasks online [42]. This digital model assumes administrative precedence because it was built on the foundation that servers (companies, online businesses) are more important than clients (individuals) and, therefore, dictate the rights of clients [43]. This web fabric holds to this day and is exacerbated by the need for the creation of legislation, such as the European Union's General Data Protection Regulation (GDPR) [44] and the California Consumer Privacy Act (CCPA) [45], to specify the rights of individuals and their digital data in a society increasingly dependent on digital interactions.
The fundamental premise of SSI is that individuals have sovereignty over their digital selves and thus control over their data. This concept fundamentally distinguishes SSI from previous identity models, which viewed individuals as users. In this new model, sovereign individuals store and manage their data, thereby controlling with whom their private data are shared and to what extent.
Allen's ten principles of self-sovereign identity [3] describe what such a model must provide. First, individuals must have an existence independent of their digital selves, i.e., they cannot exist only virtually. A (self-sovereign) identity works by sharing the desired (digital) aspects of the individual. Second, people must control their identities by owning and managing their attributes, which does not prohibit them from making claims about other people. Third, people must have access to their data and claims, either by storing them or by having them readily available if they are outsourced. Fourth, all systems must be transparent and the underlying algorithms must be free and open-source, thus allowing detailed examination by anyone. Fifth, identities must persist forever, or as long as individuals wish. Sixth and seventh, identities and their claims must be portable across different systems and technologies, which requires interoperability between standards and implementations. Eighth and ninth, people need to consent to the use and sharing of their data, while data disclosure must be kept to the absolute minimum. For instance, to find out if a person can buy an alcoholic beverage, it is unnecessary to share their date of birth. Tenth, individuals' rights must be protected, which means that systems must be designed to avoid censorship and to protect individuals' rights, even at the expense of the system.
In SSI, any assertion about a subject is referred to as a claim. A credential is a collection of one or more assertions made about a subject by an entity. It could be, for example, a government-issued driver's license that contains a person's date of birth, name, and address. A Verifiable Credential (VC) is a credential that includes a revocation list or another method of revocation and contains cryptographic material that ensures the credential's integrity, as well as the issuer's identification and non-repudiation [57]. Additionally, a tamper-resistant claim derived from a verifiable credential is referred to as a verifiable claim or Verifiable Presentation (VP). Although we use these terms interchangeably throughout this paper, we always mean tamper-proof claims and tamper-proof credentials.
In the same way that entities issue physical credentials to holders in the form of paper or plastic cards in the physical world, entities issue VCs to holders in SSI. However, unlike physical and digital identities, these electronic documents enable individuals to select which attributes (claims) to share. This is impossible with physical or digital credentials, which require the holder to present the identity document in its entirety, revealing all of its attributes.
Suppose that you are asked to prove that you have reached the age of majority. With a physical document, showing the paper or plastic card will reveal the birthdate and all other attributes to the Relying Party (RP). The same is true for digital identity documents, which are commonly implemented using X.509 attribute certificates [60]. With traditional X.509 certificates, the whole certificate has to be shared with the RP to verify the document's integrity. However, in the context of SSI, you would construct a VP stating that: (i) a credential was issued to you by a trusted party; (ii) this credential has your birthdate in it; (iii) your birthdate was more than 18 years ago; and (iv) this credential has not been revoked by the government body. Hence, whoever receives this VP does not learn your name, birthdate, or any other information in the credential, only that you have reached the age of majority.
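The selective disclosure idea in the age-of-majority example can be illustrated with a deliberately simple sketch. Note that this is not the predicate-proof construction described above: it swaps in salted hash commitments, a simpler technique in which the issuer pre-computes a derived claim (here the hypothetical attribute `age_over_18`) and the holder reveals only that claim. All names are assumptions, and the issuer's signature over the digests, which a real credential would need, is omitted.

```python
import hashlib
import json
import secrets


def commit(name: str, value, salt: str) -> str:
    """Salted hash commitment to a single attribute."""
    payload = json.dumps([salt, name, value]).encode()
    return hashlib.sha256(payload).hexdigest()


def issue_credential(attributes: dict) -> tuple:
    """Issuer side: returns (digests to be signed, disclosures kept by the holder)."""
    disclosures = {n: (secrets.token_hex(16), v) for n, v in attributes.items()}
    digests = {n: commit(n, v, s) for n, (s, v) in disclosures.items()}
    return digests, disclosures


def present(disclosures: dict, names: list) -> dict:
    """Holder side: reveal only the selected attributes and their salts."""
    return {n: disclosures[n] for n in names}


def verify(digests: dict, presentation: dict) -> bool:
    """Verifier side: recompute each commitment and compare with the digests."""
    return all(
        n in digests and commit(n, v, s) == digests[n]
        for n, (s, v) in presentation.items()
    )


# Usage: the verifier learns that the holder is over 18, not the birthdate.
digests, disclosures = issue_credential(
    {"name": "Alice", "birthdate": "1990-01-01", "age_over_18": True}
)
vp = present(disclosures, ["age_over_18"])
print(verify(digests, vp), sorted(vp))
```

In this sketch the attribute names remain visible in the digest table; only the values are hidden behind the salted hashes.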
The recipient of a VP (i.e., the RP) verifies the following: (i) who signed the credential that supports this VP; (ii) whether the VP is constructed correctly (i.e., it contains the required information and is not corrupted or counterfeited); and (iii) whether the credential that supports this VP is valid (i.e., whether the credential was revoked or not). It is important to note that once the issuer of the credential has been verified in step (i), the RP is free to decide whether or not to trust the issuer. Moreover, step (iii) does not require the RP to query the IdP in any particular manner. Revocation registries are publicly available, and the verification is done anonymously [61,62], that is, without disclosing the credential's unique identifier.
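The three checks performed by the RP can be outlined as a small routine. This is only a structural sketch under stated assumptions: `Presentation`, `check_proof`, and the plain `revoked_ids` set are hypothetical, the cryptographic proof check is injected as a callable rather than implemented, and, unlike the anonymous revocation check described above, this simplified lookup does reveal a credential identifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Presentation:
    issuer: str          # identifier of the issuer of the underlying credential
    claims: dict         # disclosed claims, e.g. {"age_over_18": True}
    credential_id: str   # hypothetical handle for the revocation lookup
    proof: bytes         # opaque proof material checked by `check_proof`


def verify_presentation(
    vp: Presentation,
    required_claims: Set[str],
    trusted_issuers: Set[str],
    revoked_ids: Set[str],
    check_proof: Callable[[Presentation], bool],
) -> List[str]:
    """Return the list of failed checks; an empty list means the VP is accepted."""
    problems = []
    # (i) Who issued the credential, and does this RP choose to trust them?
    if vp.issuer not in trusted_issuers:
        problems.append("issuer not trusted")
    # (ii) Is the VP well formed and is its proof intact?
    if not required_claims <= set(vp.claims) or not check_proof(vp):
        problems.append("presentation malformed or proof invalid")
    # (iii) Has the supporting credential been revoked?
    if vp.credential_id in revoked_ids:
        problems.append("credential revoked")
    return problems
```

Returning the failed checks rather than a single boolean keeps the trust decision of step (i) visibly separate from the integrity and revocation checks.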
While individuals in SSI have the autonomy to issue their own credentials, others are free to distrust them. For example, a bank is unlikely to accept the VP of a self-issued credential that contains a person's name and birthdate. This is true in both the real and virtual worlds. The diagram in Figure 1(d) depicts a high-level overview of SSI in which the user (i.e., the holder) can interact with the SP using either self-issued or third-party-issued credentials. In either case, the SP is free to decide whether or not to trust the issuer.
Despite the SSI literature's use of the term VP, this concept predates SSI by many years. Prior to SSI, more than a decade of research had been conducted on how to share portions of a credential, as well as predicates over one or more attributes, without losing integrity and authenticity [61,63]. Zero-Knowledge Proof (ZKP) is the primary technique underlying VP [64][65][66]. In short, a ZKP enables a prover to convince a verifier that she knows a value without disclosing the value [67]. By combining ZKP and credentials, a credential holder can establish the validity and content of one or more credentials without disclosing the entire credential [65]. The same is true for a VC's status. It is possible to demonstrate that a VC has not been revoked without disclosing the credential to the RP and without informing the issuer that a query for a specific credential was made [62].
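As an illustration of proving knowledge of a value without disclosing it, the sketch below implements a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. It is one classic ZKP construction, not necessarily the one used by the systems cited above. The group parameters (p = 2039, q = 1019, g = 4) are toy values chosen for readability and offer no security; the function names are likewise assumptions.

```python
import hashlib
import secrets

# Toy parameters: P = 2*Q + 1 with P and Q prime; G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def challenge(*values: int) -> int:
    """Fiat-Shamir: derive the verifier's challenge by hashing the transcript."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen() -> tuple:
    """Secret x and public value y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def prove(x: int, y: int) -> tuple:
    """Prove knowledge of x such that y = G^x mod P, without revealing x."""
    r = secrets.randbelow(Q - 1) + 1   # one-time random nonce
    t = pow(G, r, P)                   # commitment
    c = challenge(G, y, t)
    s = (r + c * x) % Q                # response; r masks x
    return t, s


def verify(y: int, proof: tuple) -> bool:
    """Accept if G^s == t * y^c (mod P), which holds when s = r + c*x."""
    t, s = proof
    c = challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

The verifier sees only `y`, `t`, and `s`; because `r` is uniformly random and used once, `s` reveals nothing about `x` on its own.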
In Figure 2, we illustrate the end-to-end process of issuing a VC and presenting a VP in a simplified three-actor model. In this example, three individuals own and control their electronic identities, each of which is appropriate for a particular situation. Each electronic identity is linked to a database of issued and received credentials, as well as a revocation registry for expired or revoked credentials. One of Alice's electronic identities issues a credential to one of Bob's electronic identities, such as a declaration that he is a reputable seller of fine wines. Bob then creates and sends a VP to Carl, proving that he possesses a credential attesting to his good reputation. Carl has trust in the issuer of the credential from which that VP was derived, Alice, an internationally renowned winemaker. Carl then begins negotiating with Bob. It should be noted that, in reality, the majority of people will not host revocation registries because they do not issue credentials, which is also the case for physical and digital identification documents.
Figure 2: The actors, their electronic identities, and the interactions to issue a VC and present a VP.
This simplified example demonstrates the trust mechanics of SSI. However, it lacks the depth and complexities of real-life scenarios. For instance, a user may create a VP using two credentials, one of which is deemed trustworthy while the other is not. Deriving trust in non-trivial scenarios is one of the open challenges in SSI.
After discussing electronic identities and the evolution of IAMs, we introduced the reader to SSI. Following that, in Section 3, we present other SSI surveys and their shortcomings, followed by the method used in this systematic review and systematic mapping in Section 4.
Next, we present existing secondary research in SSI.
Kuperberg [13] conducted a survey in which forty-three blockchain-based SSI market offerings were evaluated against seventy-five criteria, including compliance with applicable legislation, market availability, and cost. He concluded that no reviewed application meets all criteria, and that no SSI solution possesses the following characteristics: (i) the maturity of traditional IAM offerings; (ii) a production-level integration standard (such as OAuth 2.0 [37] or SAML [36]); and (iii) OS-level integration.
Although Liu et al. [9] presented their search string, they did not provide any information about their review method. They reviewed thirty-six research efforts and patents introducing SSI applications in total, examining these works from the standpoints of authentication, privacy, and trust. They argued that, despite blockchain-related innovations, issues and implications remain, namely: (i) users may lose their blockchain-based identities (wallets); (ii) users may need to change their identities, which is trivial in traditional IAM but might be challenging in distributed ledgers; and (iii) the cost of integrating existing systems into the new paradigm.
Zhu and Badr [14] conducted a review of works that use distributed ledgers to implement SSI in the context of IoT devices. They expanded on Liu et al.'s [9] focus on authentication, privacy, and trust, adding a fourth dimension: performance. They argued that the trustless environments in which IoT devices operate necessitate SSI solutions. Nonetheless, blockchain technology should be thoroughly investigated, as storing and maintaining public blockchains in IoT devices is prohibitively resource-intensive. As a result, forming small groups of private blockchains may be an option. According to the literature, one possible solution is for IoT devices to inherit the peer-to-peer trust established between their owner entities (humans, businesses, and governments) [74].
Three surveys that do not specify a search method compared the underlying infrastructure of blockchain-based SSI offerings and produced similar results [15][16][17]. They all mentioned the blockchain framework that the surveyed papers use, as well as the type of blockchain network (private, permissioned, permissionless, or other). Lim et al. [15] conducted a review of 15 for-profit and non-profit company-developed, government-related, and open-source applications, concluding that SSI is the optimal solution for user-centric, secure, and cost-effective IAM. Kaneriya and Patel [16] conducted a review of six SSI systems, identifying future enhancements that each system, according to the authors, should prioritize. Finally, Gilani et al. [17] reviewed eight SSI offerings, noting which support selective disclosure of personal information, how cryptographic keys are managed, and blockchain-specific details such as whether credentials are stored on or off the ledger, as well as the use of smart contracts. Smart contracts are software that executes automatically and transparently on the ledger, allowing anyone to verify it [75].
The authors of [18] described ten SSI systems that utilize blockchain technology but did not specify how they were chosen. They did, however, conduct an analysis of these works in terms of their adherence to the SSI's ten principles, detailing which principle each reviewed paper satisfies.
In contrast to previous surveys, Mühle et al. [19] examined what they refer to as the "four basic components of SSI": identification; authentication; verifiable claims; and attribute storage. They discussed how various research studies and market offerings attempt to address each of the four components.
Čučko et al. [10] presented a systematic map of decentralized identity. They mapped one hundred and twenty papers in total, but only eighty were determined to be SSI-related. While they established a category for conceptual contributions, it was filled with surveys and research articles highlighting SSI's challenges and opportunities. In contrast, we consider conceptual contributions that refute or include new philosophical perspectives on what SSI is. Their map encompasses information technology fields and the various domains to which SSI is applied, whereas our maps depict the relationship between authors and publications.
Taxonomies for SSI are introduced by both [11] and [73]. The former proposes a four-tiered taxonomy encompassing registration, authentication, data management, and verifiable claims. These tiers were used to categorize twenty-one blockchain-based solutions. The latter's taxonomy includes the facets member, interaction, ambition, and technology stack, which are used to classify one hundred forty-seven results from a gray literature review of the SSI ecosystem culled from DuckDuckGo, GitHub, Reddit, and arXiv. Both taxonomies fall short of incorporating philosophical debates about the meaning of SSI.
Finally, the authors of [12] created a meta-synthesis of SSI based on blockchain technology. Meta-synthesis is a qualitative method for aggregating knowledge derived from quantitative, qualitative, empirical, conceptual, and review studies [76]. They evaluated sixty-nine works from an enterprise adoption perspective, summarizing the state of the art's technological and business challenges.
Secondary research has already revealed an increasing number of studies in this field. However, a rigorous systematic review of SSI studies is lacking. Earlier studies have examined both the practical and technical aspects of SSI systems. However, they do not evaluate conceptual debates about SSI or works that present and attempt to resolve particular pragmatic issues. In contrast, we are interested in discovering and examining research materials that extend or refute Allen's ten principles of self-sovereign identity [3] or present and resolve practical problems in the SSI ecosystem. Table 1 summarizes the major differences between previous surveys and ours.

Method
Secondary studies are necessary to keep track of advancements and developments as primary research efforts on a given topic evolve. Two types of secondary studies have gained popularity in recent years in computer science [77]: systematic mapping [78] and systematic literature review [79]. Although both are systematic and thus employ rigorous methods for identifying and interpreting relevant research, the former is intended to provide a broad overview and identify research trends, whereas the latter is intended to aggregate evidence in order to summarize and answer more specific Research Questions (RQs).
In this study, we conducted a systematic review of the literature and a systematic mapping.

Planning
We followed Petersen et al.'s method [77], which provides detailed guidelines based on a systematic review of mapping studies. These guidelines require the following: (i) the definition of objectives and RQs; (ii) a strategy for identifying relevant studies; (iii) objective inclusion and exclusion criteria to ensure that only relevant material is reviewed; (iv) an extraction process for objectively obtaining evidence from papers relevant to the RQs; (v) a classification method; and (vi) a discussion of potential threats to the study's validity. Our research protocol, which is detailed in the following sections, complies with the aforementioned stipulations.

Research Questions
The objective of this systematic study is fourfold: (i) to examine practical challenges associated with SSI and potential solutions; (ii) to investigate mathematical formalism and cryptographic tools (primitives) used to solve these problems; (iii) to investigate conceptual advancements made to the informal definition of SSI [3]; and (iv) to map SSI publications and authors. These goals result in the following RQs:
• RQ-1: What practical problems have been introduced and solved?
• RQ-2: What properties, formal definitions and cryptographic tools have been used?
• RQ-3: What conceptual ideas have been introduced or refuted?
• RQ-4: When, where, and by whom were SSI studies published?

Search Strategy
Our investigation began by specifying a search string that was pertinent to the RQs previously mentioned. Rather than creating a potentially restrictive search query using PICOC [79] or another method of query framing, we searched for "self-sovereign identity" and variants in the title, author keywords, and abstract. Our search string is broad by design in order to encompass as many relevant articles, patents, and research materials as possible. Additionally, we placed no restrictions on the publication year, page count, conference, or journal. The following is the entirety of our query string.
self-sovereign identity OR self sovereign identity OR self-sovereignty OR self sovereignty
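For readers who wish to re-apply the query to locally exported records, the following sketch shows one way the four OR-ed terms could be matched against the title, author keywords, and abstract. It is an assumption about post-processing, not a description of how any bibliographic database evaluates the query; the record field names and the `matches_query` helper are hypothetical.

```python
import re

# The four terms of the query string, collapsed into one case-insensitive pattern:
# "self" + hyphen or whitespace + "sovereign" + ("ty" | whitespace + "identity").
PATTERN = re.compile(r"self[-\s]sovereign(?:ty|\s+identity)", re.IGNORECASE)


def matches_query(record: dict) -> bool:
    """True if any searched field contains one of the query terms."""
    fields = (
        record.get("title", ""),
        record.get("keywords", ""),
        record.get("abstract", ""),
    )
    return any(PATTERN.search(field) for field in fields)


# Usage with two made-up records.
records = [
    {"title": "A Survey on Self-Sovereign Identity"},
    {"title": "Federated identity management", "abstract": "SAML and OAuth."},
]
print([matches_query(r) for r in records])
```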

Study Selection
Our study selection process is divided into three stages. The first phase eliminates duplicate results and articles that have been republished in extended formats. Mendeley [80] was used to evaluate the results and eliminate duplicates.
After a preliminary screening of the search results, it was determined that several papers do not belong in the field of computer science or are not relevant to our review. We then narrowed our search by developing two inclusion criteria and one exclusion criterion. These criteria are detailed in Table 2. In short, the exclusion criterion eliminates research that is not computer science-related, whereas the inclusion criteria prioritize papers that contribute to SSI in response to our RQs. Articles had to meet at least one inclusion criterion.

Exclusion Criterion
EC-1 The research work is not in the area of computer science.
We are not reviewing and mapping standalone SSI solutions, despite the fact that they may incorporate practical progress (such as Sovrin [7] and uPort [8]). Multiple surveys have been conducted on these works [9, 13-15]. As a result, when it comes to practical progress, we prioritize works that raise specific pragmatic concerns about any aspect of the SSI ecosystem and propose solutions. Consider, for example, a piece that discusses the difficulty of recovering SSI keys that have been lost and offers a new solution to the problem. This work would comply with IC-2. Assume, however, that a research paper is published describing an implementation of SSI for IoT. While this work may make a significant contribution to the IoT literature, it does not satisfy IC-2 if it does not present a problem concerning SSI in general and a solution to that problem.
EC-1 is applied to the title, author keywords, and abstract in the second stage of our study selection process, effectively eliminating articles that are not related to computer science. The third phase involves obtaining and reading the remaining studies in their entirety, ensuring that they comply with IC-1, IC-2, or both. Articles that satisfy neither IC-1 nor IC-2 are then removed as well.

Data Extraction
To extract data from primary studies, we adapted Petersen's template [77]. It is composed of three components: (i) a data item; (ii) a description; and (iii) the RQ to which the data item corresponds, as illustrated in Table 3. Except for the Study ID, which was generated manually, the General items were obtained from articles or their online metadata. Following the reading of a pilot set of articles, two Conceptual and two Practical data items were created to gather evidence and address the RQs.

Taxonomy
To develop a taxonomy to categorize SSI research, we used the three-step keywording method [78]: (i) the researcher reads the abstracts (and, if the abstract is of low quality, the introduction and conclusion as well), extracting keywords and concepts that indicate the article's contribution and the context of the research; (ii) the set of keywords is combined to create a high-level understanding of the research contribution; and (iii) the final set of keywords is clustered to create categories. The last step is the result of the process of making, updating, and merging categories, as well as classifying articles into the new categories that were made.

Search Execution
On February 15, 2022, this search string was entered into the ACM Digital Library [81], IEEE Xplore [82], ScienceDirect [83], and Springer Link [84] databases, which host popular computer science conferences and journals. To supplement our database search, we performed additional searches on Scopus Preview [85], Web of Science [86], and Google Scholar [87]. Additionally, we queried Google Patents [88] on the same day and applied the search string to the title and abstract of patents, yielding seventeen results. Table 4 displays the number of search results returned by the queries.

Table 4 (excerpt). Patent search results: Google Patents, 17; total, 17.

Study Selection and Data Extraction
Our three-phase study selection process was executed five times, as presented in Figure 3. We applied the first execution to the outputs of the database search and the second to the patent search results. The combined output was a set of fifty-nine works which formed the input set for both forward and backward snowballing [89]. In short, backward snowballing consists of reviewing all references in a document, while forward snowballing finds other works that reference it. The snowballing was repeated until no new work was found that satisfied our selection process, which required three runs. The remaining eighty-two works constitute our result set. We should point out that two researchers independently assessed each paper at every stage of the selection process, and a conflict resolution meeting was organized. We point the interested reader elsewhere [90] for the complete list of papers, our evaluation regarding their inclusion or exclusion for all five runs of the study selection process, and the data extracted with our data collection form.
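The repeat-until-no-new-work procedure can be read as a fixed-point iteration over the citation graph. The sketch below is only an illustration under stated assumptions: `references`, `citations`, and `accept` are hypothetical stand-ins for the reference lists, the citing works, and the three-phase study selection process, respectively.

```python
def snowball(seed, references, citations, accept):
    """Repeat backward and forward snowballing until no new accepted work appears.

    references[w] lists the works cited by w (backward snowballing);
    citations[w] lists the works that cite w (forward snowballing);
    accept(w) is a stand-in for the study selection process.
    Returns the accepted set and the number of snowballing runs executed.
    """
    accepted = set(seed)
    frontier = set(seed)
    runs = 0
    while frontier:
        runs += 1
        candidates = set()
        for work in frontier:
            candidates |= set(references.get(work, ()))
            candidates |= set(citations.get(work, ()))
        # Only works not seen before and passing selection feed the next run.
        frontier = {w for w in candidates - accepted if accept(w)}
        accepted |= frontier
    return accepted, runs
```

In this formulation the last run is the one that yields no new accepted work, so the run counter includes it.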

Threats to Validity
The following validity threats are critical and must be highlighted [77]: (i) descriptive; (ii) study identification; and (iii) data extraction and classification.
To mitigate the risk of collecting observations inaccurately from research papers, i.e., the descriptive validity threat, we developed and used the data collection form described above to collect relevant evidence. The first author used the data collection form, and the second author evaluated the results.
Following that, to minimize the possibility of overlooking relevant work, i.e., the study identification validity threat, we did not restrict our database search by publication year or venue. Backward and forward snowballing was also used to supplement the database search.
Concerning the last threat to validity, namely the data extraction and classification threat, it should be noted that researcher bias and human error cannot be completely eliminated because these processes involve human judgment. To reduce this risk, the second author examined the first author's data extraction and classification.
Furthermore, it is worth stressing that identity management has been extensively studied for decades. Thus, despite the fact that numerous research efforts were conducted before the term "self-sovereign identity" was coined, a large number of research efforts can arguably contribute to the many facets of SSI. Ultimately, deciding which work makes a significant contribution to SSI is entirely dependent on the researcher's interpretation. To avoid this interpretation bias, we reviewed and mapped works that explicitly mention the term self-sovereign identity or any synonym from our search string.

Findings
The next five sections present our findings. First, the proposed taxonomy is introduced. Then, the following four sections answer our RQs respectively.

Taxonomy of Self-Sovereign Identity
We used the keywording method [78] to identify distinguishing characteristics of the reviewed work. These characteristics were combined into a proposed taxonomy with two facets: conceptual and practical, as illustrated in Figure 4. These two facets are further subdivided into additional facets, forming a tree-like hierarchy. Concepts, sometimes referred to as terms, are the leaves of this hierarchical tree. The number of existing concepts under the facets of our proposed taxonomy, i.e., the leaves, is likely to grow in the future. New research, for example, may introduce new pragmatic challenges. Future work can build on our proposed taxonomy and include new initiatives.
We present and discuss the state-of-the-art of SSI in the following sections through the lens of the proposed taxonomy. These sections are arranged in accordance with the taxonomy's facets and concepts. Because the majority of surveyed works exhibit a variety of characteristics and are therefore classified under multiple facets, we discuss each work under its most defining facet, namely the objective or problem it is attempting to solve. We begin with the practical facet.

RQ-1: What practical problems have been introduced and solved?
Our taxonomy enabled us to classify surveyed materials and generate visualizations to help answer our research questions.The data items in our data extraction form pertaining to our first research question are organized in Table 5 according to the facets and terms of our taxonomy under the practical facet, which were fulfilled by sixty-nine of the eighty-two reviewed materials.

Management
The management facet encompasses five characteristics that deal with the governance of credentials and claims presentation in SSI: (i) metadata search; (ii) protocol integration; (iii) identity derivation; (iv) wallet security; and (v) credential as a service. These concepts and the works that explore them are presented next.

Metadata Search
The authors of [91] introduced the problem of metadata search in blockchain-based SSI systems. Due to the unstructured nature in which data is stored in blockchain, it becomes a challenge to look for credential metadata stored on the ledger. The authors argued that creating new types of credentials comes at a monetary cost in Sovrin, and thus it is worth reusing existing credential metadata. Hence, effectively tackling the challenge of finding metadata in blockchain-based SSI results in reducing monetary cost for issuers. To attack this problem, the authors of [91] used Apache Solr [153] to build a search application that allows users to find credential metadata stored in Hyperledger Indy [154], which is the open-source SSI platform that powers Sovrin [7].
Similarly, in [92] the problem of searching metadata is also explored. The authors employed a natural language processing technique [155] and pre-trained word vectors [156] to enable users to query the Sovrin network's credential metadata using natural language. The reported results outperform [91] for queries with synonyms rather than exact terms.

Protocol Integration
Another area of study in SSI is protocol integration with production-level protocols such as SAML [36], OAuth 2.0 [37] and OpenID Connect [157]. Failure to successfully address this challenge may jeopardize the adoption of SSI, as billions of users have electronic identities in IdPs that can only communicate using the aforementioned protocols. This challenge was presented as the driving problem in eight research papers [99-104, 106] and was also mentioned in three other works [105, 107, 108]. Three articles aim to integrate SSI with OpenID Connect [99, 101, 102], two works focus on OAuth 2.0 [103, 106], one on SAML [104], and one paper on these three protocols [100].
Using the OpenID Connect protocol, [99] constructs a gateway between two SSI solutions (uPort [8] and Jolocom [158]) and web applications. Users can compose their identities by selecting claims, which are verified by the gateway and then transferred to the destination application for authentication via the OpenID Connect protocol. Similarly, [102] implements an OpenID Connect gateway between Hyperledger Indy [154] and other applications, from which users of any instance of Hyperledger Indy (such as Sovrin [7]) can benefit. In contrast to [99], a wallet application is designed to store credentials on the user's smartphone. Claims, which the user must present, are used to implement application-level authorization. The work in [101] authenticates the issuer and holder and transfers VCs using OpenID Connect. These VCs include an advanced or qualified signature or seal, which confirms the natural or legal person's identity. A bridge ensures that DID methods and signatures are interoperable among issuers, holders, and verifiers.
Hong et al. [106] used OAuth 2.0 for authorization, making it easier to integrate their solution with existing web services. In contrast to [99, 102], authentication in [106] uses a custom mechanism rather than OpenID Connect. Lagutin et al. [103] were concerned about the burden of issuing and verifying VPs in resource-constrained devices such as IoT sensors and actuators. A bridge protocol is proposed in which a server receives and processes VPs before distributing modified OAuth 2.0 access tokens to authorized entities. These tokens are given to resource-limited devices, which authorize access to the resource or service.
The authors of [104] proposed an integration with SAML, which allows SSI-based identities to authenticate with SPs via SAML. Gruner et al. [100] presented a more comprehensive architecture that enables users to integrate various SSI offerings with SAML, OpenID Connect, and OAuth 2.0. Additionally, they accomplished identity derivation, which is described below, as well as the evaluation of trust models used to accept or deny interactions.

Identity Derivation
Allowing users of SSI solutions to access web applications via the OpenID Connect protocol resulted in the implementation of identity derivation mechanisms, that is, methods for deriving SSI identities from non-SSI identities. This is the primary goal of [97, 98], but it was also accomplished in [95, 96].
The authors of [98] proposed an electronic identity derivation protocol in which user attributes from various IdPs are collected and transformed into VCs. The transformed VCs can be presented using VPs. In contrast, [97] employs X.509 digital certificates [159] with high LoA to generate VCs with high LoA. Digital certificates achieve high LoA through a rigorous enrollment process in which the certificate subject must present government-issued documents in person. Both a digital wallet running on a device with a secure enclave and a FIDO2-compatible token [160] equipped with a biometric fingerprint reader generate a key pair after authenticating the owner of an X.509 certificate. The VC includes the two public keys. When this VC is used to generate VPs, the private keys of both the digital wallet and the FIDO2 token are accessed. Because the latter requires biometric authentication to perform operations on the private key, the VC holder must be its owner.
Because biometric data can be used to create SSI identities, Bathen et al. [95] explored the possibility of replay attacks when an attacker has access to biometric templates. They contended that user-managed cancelable biometrics is the solution to this problem. A person's self-image, i.e., a selfie, is passed through one-way functions to mask the original data, and the resulting data is then stored on a blockchain and managed as a credential. Mishra et al. [96] claimed that the underlying techniques used in [95], namely Bloom filters [161], are vulnerable to invertibility and linkability attacks [162]. To address these issues, their proposal uses OpenCV [163] to extract feature vectors from selfies, which are then subjected to a one-way transformation [164]. Both methods generate revocable biometric credentials suitable for two-factor authentication.

Wallet Security
One patent [129] is concerned with wallet security. Its authors proposed a hardware-based wallet that stores cryptographic keys and credentials. It can connect to mobile devices when necessary and disconnect when not.

Auditability
When compared to other identity models, SSI provides more privacy. Nonetheless, some use cases necessitate the auditability of credentials or presentations. According to Lemieux et al. [132], there are use cases that require the collection of evidence that a VC was issued and sent to its holder, or that a VP was performed, in order to comply with legal, audit, and accountability standards. They proposed using Shamir's Secret Sharing (SSS) [165] to generate a group key capable of encoding and decoding Personally Identifiable Information (PII), such as VCs or VPs, and storing it in a proof registry, i.e., a persistent storage for auditing. This group includes the issuer, the trusted audit service, and the holder. The group key can be generated by any two of the three members.
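The two-of-three property of such a group key can be illustrated with a textbook Shamir scheme over a prime field. This is a minimal sketch and not the construction of [132]: the field size, the function names `split` and `combine`, and the representation of the key as an integer are all assumptions made for illustration.

```python
import secrets

# Field modulus: the Mersenne prime 2^127 - 1 (an illustrative choice).
PRIME = 2**127 - 1


def split(secret: int, threshold: int, shares: int):
    """Split `secret` into points of a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner evaluation of the polynomial modulo PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points) -> int:
    """Recover the secret (the polynomial at x = 0) by Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: one share each for the issuer, the audit service, and the holder.
group_key = secrets.randbelow(PRIME)
issuer_share, auditor_share, holder_share = split(group_key, threshold=2, shares=3)
recovered = combine([issuer_share, auditor_share])
```

Any two of the three shares reconstruct the same key, whereas a single share is a point on a random line and reveals nothing about the value at zero.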

Credential as a Service
Three papers discuss the drawbacks of local credential storage and issuance [119-121]. We classify them as credential as a service because their solutions involve outsourcing the storage or processing of credentials.
Samir et al. [120] affirmed that storing VCs in a single location is a potential point of failure in SSI implementations because wallets can be lost. Furthermore, they noted that digital wallets confined to a single mobile device might not remain online at all times. To address these concerns, an anonymous multi-party computation solution based on smart contracts and SSS is proposed. It uses SSS to divide a VC into multiple shares, which are then stored on online platforms. Then, smart contracts use multi-party computation to process requests to the VC shares.
In the same way, in [119], holders do not keep their credentials. Credentials are instead stored on a storage service and protected by a two-party protocol. Furthermore, holders do not have direct access to their data. Instead, the VC holder has control over an agent that runs on the storage service and contacts the user to request permission to share information. Users never receive their credentials in this manner, and thus do not have to worry about storing them securely. Because the credentials are encrypted using a two-party encryption protocol, the storage service cannot misuse them.
The authors of [121] postulated that having the infrastructure to issue credentials is a barrier to SSI adoption. As a result, they proposed using a cloud-based Trusted Execution Environment (TEE) [166] to issue and distribute VCs to holders.

Operational
The operational facet is divided into two facets: VC and VP. They are a collection of concepts related to the functional aspects of verifiable credentials and verifiable presentations.

Revocation
Credential revocation and status verification are long-standing problems in IAM research. The Online Certificate Status Protocol (OCSP) [167] of traditional public key infrastructure (PKI), for example, allows users to query the status of a certificate. However, the query sends the serial number to the Certificate Authority (CA), revealing to the CA where the certificates it issued are being used and thus infringing on user privacy. The revocation verification of VCs in a privacy-preserving manner is an active area of research in SSI. Seven works present new approaches to addressing this challenge [57, 98, 113-116].
The Verifiable Credentials standard from the World Wide Web Consortium (W3C) defines the metastructure and lifecycle of VCs and VPs [57]. Both VCs and VPs must have the following: (i) metadata describing the data; (ii) the data; and (iii) cryptographic proof of integrity and authenticity. Aside from the roles of issuer, holder, and verifier, a fourth role is the verifiable data registry, which incorporates credential metadata, revocation registries, issuer public keys, and other information. When a model instantiates this metamodel, it must specify the syntax, cryptographic algorithms, and proof format that will be used to construct VCs and VPs. For example, in Hyperledger Indy [154], a VC's metadata is stored in a distributed ledger, whereas the data and proof are stored in a JSON file.
In [113], an approach is detailed in which social media platforms such as Facebook and LinkedIn are used to request, generate, and revoke credentials, as well as present and revoke presented claims. Predicates over credential attributes, on the other hand, are not supported; only attribute disclosure is.
The authors of [115] designed a VC that can be issued and revoked by two parties. They argued that this is useful in the financial context. A financial company issues credit scores as VCs together with clients, but these can only be revoked by the financial company with the credit bureau's permission. Their VC includes two digital signatures, one for each entity. A protocol for revocation and status verification using ZKP is proposed.
Chotkan and Pouwelse [116] created a mechanism for propagating revocation information using a gossip-based algorithm. Users save the revocation information of their trusted authorities and broadcast it to random peers at predetermined intervals. As a result, issuers are not required to remain online in order to provide revocation data, nor are clients required to contact them in order to obtain such data. The authors provided a threat model as well as a thorough examination of various efficiency metrics.
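The general idea of gossip-based propagation can be illustrated with a small push-gossip simulation. The sketch below is not the algorithm of [116]: it omits the restriction to trusted authorities, signatures, and timing, and the names `gossip_round`, `rounds_until_converged`, and `fanout` are hypothetical.

```python
import random


def gossip_round(peers, fanout=1, rng=random):
    """One round: every peer pushes its known revocations to `fanout` random peers."""
    names = list(peers)
    updates = []
    for name in names:
        others = [n for n in names if n != name]
        for target in rng.sample(others, min(fanout, len(others))):
            # Snapshot the sender's state so all pushes in a round are simultaneous.
            updates.append((target, set(peers[name])))
    for target, revoked in updates:
        peers[target] |= revoked


def rounds_until_converged(peers, fanout=1, rng=random, max_rounds=1000):
    """Count the rounds needed until every peer knows every revocation."""
    everything = set().union(*peers.values())
    rounds = 0
    while any(known != everything for known in peers.values()):
        if rounds == max_rounds:
            raise RuntimeError("gossip did not converge")
        gossip_round(peers, fanout, rng)
        rounds += 1
    return rounds
```

For example, starting from eight peers of which only one knows that a credential was revoked, repeated rounds spread the entry to all peers without any of them contacting the issuer.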
Abraham et al. [114] also addressed the issue of offline credential status verification. Their approach is to implement the verifiable data registry as a blockchain, which generates attestation of the validity of requested certificates with a timestamp. When there is no connectivity to the revocation registry, this attestation is presented, and the relying party determines whether it is recent enough to be accepted.
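The relying party's recency decision amounts to a simple policy check. The following sketch assumes a hypothetical attestation record and a hypothetical `max_age_seconds` policy parameter; verifying the registry's signature over the attestation is omitted here and would be required in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusAttestation:
    credential_id: str
    valid: bool        # status reported by the registry
    issued_at: float   # seconds since the epoch, set by the registry


def accept_offline(att: StatusAttestation, now: float, max_age_seconds: float) -> bool:
    """Accept only a positive attestation that is recent enough for the relying party."""
    age = now - att.issued_at
    return att.valid and 0 <= age <= max_age_seconds
```

A shorter `max_age_seconds` narrows the window in which a revoked credential could still be accepted offline, at the cost of requiring holders to refresh attestations more often.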

Decentralized Identifiers
On the internet, entities are identified in a variety of ways. Identification occurs at all levels, from the application to the network. Identifiers are typically issued or controlled by a regulatory agency and assigned to users and machines. IP addresses, for example, are managed by IANA [168], while e-mail providers manage e-mail addresses. A research trend in SSI is to create and improve decentralized identifiers from the machine to the human level. Four research articles [148-151], two protocols [146, 147], and one W3C standard [58] have been written in response to various challenges associated with decentralized identifiers.
The Decentralized IDentifiers (DID) standard defines a metamodel to create identifiers that are issued and controlled by their owners [58]. A DID method is an instance of this metamodel, which sets specific details such as the underlying encryption algorithms and the mechanism by which the method's identifiers are guaranteed to be unique. Each DID is a Uniform Resource Identifier (URI) [169] with three parts separated by colons: (i) the did scheme identifier; (ii) the DID method identifier; and (iii) the DID method-specific identifier. For instance, did:key:z6MkpTHR8VNsBxYAAWHut2Geadd9jSwuBV8xRoAnwWsdvktH is a valid DID identifier that uses the DID method key [170]. In this method, the first character of the method-specific identifier is always z, and the following three characters represent the public-key algorithm used. In this case, the characters 6Mk indicate that Ed25519 [171] was used, and the subsequent characters are the multibase [172] encoded public key. Other DID methods rely on blockchain and other technologies to preserve the user-generated DID and its associated DID document, a JSON-based document with communication endpoints and cryptographic keys to ensure that the holder of a DID is its owner.
Although W3C's DID standard [58] provides a foundation for self-sovereign identifiers and the authentication of their owners, it does not define how two (or more) DIDs can interact. The authors of [146] proposed DIDComm, a two-party protocol for establishing a secure communication channel between the holders of two DIDs. It allows messages to be sent via traditional protocols such as HTTP, Bluetooth, and NFC, and via out-of-band channels such as QR code and e-mail [173]. Nonetheless, entities must first exchange DIDs before they can communicate. This is the driving problem of the DID Exchange protocol, which allows DID documents to be exchanged online or offline [147].
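The three-part DID syntax described above can be illustrated with a short parser. This is a minimal sketch rather than a conforming implementation: the function name `parse_did` is hypothetical, and the checks on the did:key identifier only mirror the textual description given here (a leading z followed by 6Mk for Ed25519).

```python
def parse_did(did: str):
    """Split a DID into (scheme, method, method-specific identifier)."""
    # Split on the first two colons only, so that method-specific
    # identifiers containing further colons stay intact.
    scheme, method, identifier = did.split(":", 2)
    if scheme != "did" or not method or not identifier:
        raise ValueError(f"not a DID: {did!r}")
    return scheme, method, identifier


scheme, method, identifier = parse_did(
    "did:key:z6MkpTHR8VNsBxYAAWHut2Geadd9jSwuBV8xRoAnwWsdvktH"
)
starts_with_z = identifier.startswith("z")   # first character of the identifier
looks_like_ed25519 = identifier[1:4] == "6Mk"  # prefix described in the text
```

Decoding the remaining characters into the raw public key requires a multibase decoder, which is outside the scope of this sketch.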
According to the authors of [149], transporting DID documents, which contain identifiers, keys, and communication endpoints, adds a significant overhead to IoT devices. They addressed this issue through three innovations: (i) a new DID method called DID:SW that has a smaller footprint than others; (ii) the use of Concise Binary Object Representation (CBOR) [174] to encode DID Documents; and (iii) an extension of DIDComm [146] to DID-based IoT Communication (DIoTComm), which reduces communication parameters and is based on CBOR. The DIoTComm protocol has a five-fold lower overhead than DIDComm.
According to Kim et al. [151], endpoint URLs in DID documents have an anonymity issue. They claimed that URLs could expose personal information such as country of origin and other affiliations. They proposed two countermeasures: (i) removing URLs and replacing them with other forms of communication; and (ii) using gateway URLs that only redirect authorized entities to the correct address.
From another angle, Smith [148] focused on self-certifying identifiers as a means of establishing trust. In this work, user-generated identifiers are coupled to public-key cryptography and explicitly disclose the hash of their next public key in their transactions. This proactive key rotation results in an auditable chain of digital identifier key transfers. To store the history of digital identifiers, a distributed ledger is presented as a root-of-trust.
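The commitment to the next public key can be illustrated as a chain of events in which each event reveals the current key and the hash of its successor. The sketch below is a simplification under stated assumptions: keys are treated as opaque byte strings, signatures are omitted, SHA-256 is an illustrative choice, and the event layout and function names are hypothetical rather than taken from [148].

```python
import hashlib


def digest(public_key: bytes) -> str:
    """Commitment to a public key (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(public_key).hexdigest()


def rotation_event(current_key: bytes, next_key: bytes) -> dict:
    """An event reveals the current key and commits to the next one by hash."""
    return {"key": current_key.hex(), "next_digest": digest(next_key)}


def verify_log(events) -> bool:
    """Each revealed key must match the commitment made by the previous event."""
    for previous, current in zip(events, events[1:]):
        if digest(bytes.fromhex(current["key"])) != previous["next_digest"]:
            return False
    return True
```

An auditor replaying such a log can detect a rotation to a key that was never committed to, which is what makes the chain of key transfers auditable.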
The key rotation challenge was also addressed in [150] using Lamport's one-way hash chain [175]. This technique exploits the pre-image resistance of cryptographic hash functions by constructing a chain of hash operations on a secret seed and revealing hash values in reverse order. Public-key cryptography is added to this scheme so that only the DID creator, i.e., the person who knows the secret seed, can rotate to the next key pair [150].
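The reverse-order reveal of a one-way hash chain can be sketched as follows. This illustrates only the hash chain itself, not the full scheme of [150], which also binds key pairs to the chain; SHA-256 and the function names are assumptions.

```python
import hashlib


def build_chain(seed: bytes, length: int):
    """Return [H(seed), H(H(seed)), ..., H^length(seed)]."""
    chain, value = [], seed
    for _ in range(length):
        value = hashlib.sha256(value).digest()
        chain.append(value)
    return chain


def verify_reveal(anchor: bytes, revealed: bytes) -> bool:
    """A revealed value is accepted if hashing it once yields the current anchor."""
    return hashlib.sha256(revealed).digest() == anchor


# Hypothetical usage: the last chain element is published as the initial anchor;
# earlier elements are revealed one at a time, each becoming the new anchor.
chain = build_chain(b"secret seed", 4)
anchor = chain[-1]
first_reveal_ok = verify_reveal(anchor, chain[-2])
```

Because inverting the hash function is infeasible, an observer who knows the current anchor cannot compute the next value to be revealed, whereas the holder of the seed can recompute the entire chain.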

Issuer Authorization
Three works present concepts for implementing issuer authorization [109-111], which entails issuers creating hierarchies akin to those found in traditional PKI.
Schanzenbach's Ph.D. thesis [109] describes a structure based on name systems (such as the Domain Name System (DNS) [176] and the GNU Name System (GNS) [177]) that enables an issuer to delegate authorization to other issuers to issue credentials with specific attributes. Additionally, these secondary issuers have the ability to delegate authorization to other issuers, and so on.
With the same objective in mind, but a different approach, the authors of [110] formalized a model that utilizes the RSA cryptographic accumulator [62] to enable authorized issuers to issue credentials without disclosing their identity.The authors argued that this addresses a gap in the Hyperledger Indy framework [154], in which an issuer A cannot prevent another issuer B from issuing credentials in the same format as A.
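The membership check at the core of an RSA accumulator can be illustrated with toy parameters. This sketch is not the formal model of [110]: it shows only plain membership witnesses, without the zero-knowledge layer that hides which issuer is being proven. The modulus is far too small for real use, and representing each authorized issuer by a distinct prime is an assumption made for illustration.

```python
from math import prod

# Toy parameters: N = 61 * 53 offers no security and is for illustration only.
N = 61 * 53
G = 2


def accumulate(primes) -> int:
    """Accumulator value G^(p1 * p2 * ...) mod N for a set of prime representatives."""
    return pow(G, prod(primes), N)


def witness(primes, member: int) -> int:
    """Membership witness: the accumulator of all elements except `member`."""
    return pow(G, prod(p for p in primes if p != member), N)


def verify(acc: int, member: int, wit: int) -> bool:
    """Check that wit^member equals the accumulator modulo N."""
    return pow(wit, member, N) == acc


# Hypothetical usage: four authorized issuers, each represented by a prime.
authorized = [3, 5, 7, 11]
acc = accumulate(authorized)
wit_5 = witness(authorized, 5)
```

The accumulator is a single short value regardless of how many issuers are authorized, which is what makes it attractive as a compact, publishable list of authorized issuers.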
According to the authors of [111], VCs issued in SSI today are assumed to be from trusted issuers, such as government agencies. Their work proposes an issuer authorization scheme based on policies, in which an issuer is only authorized to issue VCs if its policy allows it to. The root of authority serves as the policy authority, defining policies for issuers.

Delegation
Three research papers propose methods for achieving credential delegation, which refers to an individual's or group's ability to delegate some of their identity data to another individual or group of individuals. Two of them [72, 130] are discussed later in this manuscript (Section 6.2.1.5 and Section 6.3.1), as delegation is not their primary goal.
Lim et al. [131] proposed a system for VC delegation that requires the VC subject to confirm or deny the delegatee's use of the VC. A VP constructed by delegatees is limited in their method, as they only have the VC in an encrypted format. As a result, any VP presented by a delegatee induces communication with the VC subject in order to obtain authorization and complete the VP with the required data.

Backup and Recovery
Another trend of research in SSI is the backup and recovery of keys and certificates. Empowering users with the ability to control their credentials currently comes with many burdens that were previously the tasks of IdPs. At this point, the backup and recovery of identity-associated materials are significant burdens.
Soltani et al. [134] used a decentralized protocol to handle key recovery. They created a wallet application in which users define their trusted peers and the recoverable keys. In a protocol based on SSS [165], key pieces are distributed to trusted users and can be recovered by the owner if a minimum number of parts can be retrieved from peers.
The authors of [137] presented a trade-off between security (storing an encrypted form of the private key in lower security environments) and usability (recovering the original private key without the need for long passwords or Hardware Security Modules (HSMs)). The private key is divided using SSS [165] to achieve this trade-off. The user must correctly answer a minimum number of previously registered questions, with each response constituting a component of SSS. To improve security, the minimum number of correct answers might be increased.
The work in [135] also addresses the issue of identity recovery. Its authors suggested that a suitable solution would be to use another device in the identity owner's possession as a storage provider. To improve usability, it has been recommended that protocols could be developed and integrated with routers, resulting in a seamless user experience.
In [72], a self-signed root certificate acts as a CA that creates short-lived certificates for the users. The authors concluded that because certificates are rotated on a predetermined schedule, the key recovery issue is resolved as long as the CA's private key remains intact.
In [118], ZKP allows the creation of a VP to mathematically prove that a VC was created by an issuer who is a member of a group of authorized issuers without revealing any unique identifier, such as the issuer's public key. Finally, the authors of [64], [65] and [66] enable credential holders to explore the full expressive power of zk-SNARK, i.e., to produce proofs in any language in NP.

Reuse Prevention
Nothing stops the RP from copying what it learns from the user after receiving a VP. Preventing the reuse of acquired knowledge is one of the most challenging aspects of SSI.
The creators of [94] attempt to solve this challenge. They proposed an architecture that allows holders to charge RPs to access their attributes while preventing reuse. Instead of selective disclosure or proofs over private data, Fully Homomorphic Encryption (FHE) [186] is used. FHE is a method for processing encrypted data and producing valid results without decryption. Their proposal uses FHE to process user data in a secure third-party environment that both the user and the RP trust. According to the authors, this technique prevents private information from being leaked. Although it is unlikely that FHE will reveal user attributes, information about the computation over private data can be revealed.

System Design
The facet system design encompasses four concepts related to the conceptualization of SSI: design/architecture, Human-Computer Interactions (HCI), risk assessment and security model.

SSI Design/Architecture
Five articles discuss various aspects of what we refer to as SSI Design or SSI Architecture [57, 112, 126-128, 130]. Rather than addressing specific issues or proposing SSI systems, these publications explore and analyze the planning, design, and construction of SSI systems. Previously, the W3C's VC metamodel [57] and Stokkink et al.'s [112] VP metamodel were examined. This section discusses the remaining three research papers in this category.
In [130], design patterns are presented to assist in the development of new SSI applications on the blockchain. The lifecycles of key management, identity management, and credential management are discussed. Then, twelve patterns are proposed within these three groups, following Martin et al.'s [187] format, which includes a pattern name, summary, context of use, problem statement, discussion, solution, and its consequences.
On the other hand, the authors of [127] asserted that identity management systems could be reduced to two mappings: (i) digital identifier and its owner, and (ii) digital identifier and its credentials. Furthermore, for both mappings, the following operations are required: create, read, update, delete, and verify. The system's chosen trust model determines the manner in which they are built. If the goal is SSI, all of them should be completed independently of any authority.
Barclay and colleagues [126] demonstrated a modeling technique that enables non-technical stakeholders to specify and comprehend SSI entities and their relationships. They used iStar 2.0 [188], an actor-based modeling language that enables the representation of actors and the interdependence of their goals. In an SSI system, the actors are the users who issue credentials and present claims.
Finally, Ferdous et al. [128] created a detailed mathematical model of SSI. This formalization includes a feature that is unique in the SSI literature reviewed: user de-registration.

HCI
There are five research materials [137, 138, 140-142] and one patent [139] that look into usability and human perception issues in SSI systems. They are grouped under the HCI concept of our taxonomy. Section 6.2.1.5 already introduced the work of Sign et al. [137].
Toth et al. [140] claimed that biometrics and other forms of two-factor authentication only marginally improved identity security. They then introduced a software agent to manage user data. It helps users decide which credentials to use and which private information to reveal, improving security through improved human-computer interactions.
With a different emphasis, the authors of [139] submitted a patent for an authentication method based on a user's interactions with their personal device. To determine if the person holding the device is the owner, the device monitors application usage patterns, browser history, location history, and other measurements.
Regarding HCI and trust, the authors of [142] suggest that deciding whether or not to trust an identity and its claims is too risky a decision to leave to an algorithm alone. They put forward a proposal in which the user must actively decide whether electronic identities can be trusted. The user is empowered to make that decision by viewing a graph of the proponent's previous interactions with other electronic identities, which is generated from the history stored in a distributed ledger.
The authors of [138] presented an extensive study of SSI usability and discovered that interactions with current SSI systems require extensive prior knowledge and place considerable responsibility on participants. The authors investigated the SSI interface layer using the human data interaction theory [189], which says that humans interact with data rather than computers. To increase the likelihood of adoption, the conclusion emphasizes the need for standardization and design thinking of interfaces and interactions.
Shanmugarasa et al. [141] addressed the issue of users managing VPs. Users without technical expertise, for instance, may agree to submit more information than the RP actually needs. The proposed solution to this problem is a privacy preference recommendation system that employs machine learning algorithms and pre-trained models based on survey data on privacy preferences. This system assists the user by suggesting which attributes can be shared.

Risk Assessment and Threat/Attack Model
In relation to the design of SSI, two concepts related to computer security were observed in the reviewed literature, namely risk assessment and threat/attack model. The latter entails two activities: (i) identifying and analyzing potential threats; and (ii) comprehending how an attacker can exploit them. These two tasks are part of the risk assessment, which also includes calculating the potential loss if a vulnerability is exploited.
Eighteen works described in the other sections incorporated one or both of these activities to improve their schemes. While three articles discussed risk assessment, only one makes a novel contribution by tying risk assessment and SSI together [152].
Naik et al. [152] developed a tree-based risk analysis method for SSI. The attack tree approach represents the attack goal as the root of a tree, and the methods and actions to achieve the goal as the leaves [190]. In this work, important assets in an SSI system are identified first. Then, the attack tree is used to generate input for their risk analysis, which concludes with appropriate mitigations for the identified risks.
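To make the attack tree idea concrete, the following is a minimal sketch and not the method of [152]. It assumes the common AND/OR gate convention for intermediate nodes, independent leaf events, and the simplification that risk equals success probability times potential loss. The node names, likelihoods, and the impact value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttackNode:
    name: str
    gate: str = "LEAF"         # "LEAF", "OR", or "AND"
    likelihood: float = 0.0    # used only by leaves (hypothetical estimates)
    children: List["AttackNode"] = field(default_factory=list)

    def probability(self) -> float:
        """Probability that this (sub)goal is achieved, assuming independence."""
        if self.gate == "LEAF":
            return self.likelihood
        probs = [child.probability() for child in self.children]
        if self.gate == "AND":
            # Every child step must succeed.
            result = 1.0
            for p in probs:
                result *= p
            return result
        # OR gate: at least one child succeeds.
        miss = 1.0
        for p in probs:
            miss *= 1.0 - p
        return 1.0 - miss


# Hypothetical tree: the root is the attacker's goal, the leaves are actions.
root = AttackNode("Impersonate holder", "OR", children=[
    AttackNode("Steal wallet key", "AND", children=[
        AttackNode("Phish the holder", likelihood=0.2),
        AttackNode("Bypass device lock", likelihood=0.5),
    ]),
    AttackNode("Abuse recovery process", likelihood=0.1),
])

IMPACT = 100  # hypothetical potential loss, in arbitrary units
risk = root.probability() * IMPACT
print(f"success probability: {root.probability():.2f}, risk: {risk:.1f}")
```

In such a sketch, the branches that contribute most to the root probability are the natural candidates for mitigation.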

Trust
The final practical facet of our taxonomy is trust. Entities in any IAM model must decide whether they trust other entities and, as a result, the data they generate. Since the inception of SSI, a strong emphasis has been placed on the use of verifiable credentials in order for RPs to be certain about the origin of the credentials [57].
SSI promotes the decentralization of identity management. Furthermore, the majority of SSI offerings endorse the deconstruction of centralized sources of trust (e.g., IANA [168] and Certification Authority Browser Forum [191]). Most SSI platforms allow anyone to issue VCs in anyone's name. As a result, reputation models that allow RPs to quantitatively assess whether a VP (and thus a VC) is trustworthy or not have been an active topic of study. Another topic of interest is the development of trust policy evaluation techniques for evaluating policy-based reputation models.
Grüner et al. [125] used graph theory to model trust in blockchain-based SSI systems. In their proposal, system participants endorse the originator of VPs on a blockchain. This enables the creation of an endorsement graph. They proposed an algorithm that navigates the graph and calculates a trust factor for the system's participants. This trust factor can be used to determine whether a participant can be trusted or if they are a malicious user.
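The idea of deriving a trust factor from an endorsement graph can be sketched as follows. This is an illustrative stand-in and not the algorithm of [125]: it assumes a set of fully trusted root participants and lets trust decay by a fixed factor per endorsement hop. The participant names and the `decay` parameter are hypothetical.

```python
from collections import deque


def trust_factors(endorsements, roots, decay=0.5):
    """Breadth-first walk over an endorsement graph.

    endorsements maps an endorser to the participants it endorses.
    Roots start with trust 1.0; each hop multiplies trust by `decay`.
    Participants that no trusted path reaches are absent from the result.
    """
    trust = {root: 1.0 for root in roots}
    queue = deque(roots)
    while queue:
        endorser = queue.popleft()
        for endorsed in endorsements.get(endorser, []):
            if endorsed not in trust:  # BFS reaches each node by a shortest path first
                trust[endorsed] = trust[endorser] * decay
                queue.append(endorsed)
    return trust


# Hypothetical endorsement graph.
graph = {
    "gov": ["uni", "bank"],
    "uni": ["alice"],
    "mallory": ["mallory2"],  # endorsed by nobody that the roots reach
}
factors = trust_factors(graph, roots=["gov"])
print(factors)
print(factors.get("mallory2", 0.0))
```

Participants that are only endorsed by other unendorsed participants receive no trust in this sketch, which is one simple way to limit self-endorsing groups of malicious users.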
Bhattacharya et al. [123] expanded on [125] by including time as a variable in their reputation model. They hypothesized that, in the context of Sovrin, the initial reputation of issuers could be influenced by Sovrin's onboarding process, which could be biased or falsified.
The authors of [122], on the other hand, developed a probabilistic model of trust. They applied probability theory to determine whether claims about the same information from different issuers could be combined to generate trust about it.
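One simple way to see how several claims about the same information could raise confidence is the following sketch. It is not the model of [122]: it assumes each issuer is independently reliable with a known probability, and computes the probability that at least one of the agreeing claims comes from a reliable issuer. The reliability values are hypothetical.

```python
import math


def combined_confidence(issuer_reliabilities):
    """Probability that at least one agreeing issuer is reliable.

    Assumes issuers err independently, which is a strong simplification.
    """
    return 1.0 - math.prod(1.0 - p for p in issuer_reliabilities)


# Two issuers attest the same attribute with hypothetical reliabilities.
print(combined_confidence([0.6, 0.5]))
```

Under this assumption, adding an agreeing issuer never lowers the combined confidence, which matches the intuition that corroborated claims deserve more trust.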
Zhong et al. [105] raised the problem of current SSI offerings' lack of interoperability and how this restricts the evaluation of VC credibility. Their solution to this problem employs cross-chain smart contracts to compute a credibility score based on the boolean evaluation (either support or refuse) of all verifiers who verify the VC, taking into account each verifier's credibility.
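A credibility score built from boolean verifier evaluations can be illustrated with a weighted vote. The formula below is an assumption made for illustration and is not taken from [105]: each verifier's support or refusal is weighted by that verifier's own credibility, and the score is the supporting share of the total weight.

```python
def credibility_score(evaluations):
    """evaluations: list of (supports, verifier_credibility) pairs.

    Returns the credibility-weighted fraction of verifiers that support the VC.
    """
    total = sum(weight for _, weight in evaluations)
    if total == 0:
        return 0.0  # no verifier with any credibility has evaluated the VC
    return sum(weight for supports, weight in evaluations if supports) / total


# Hypothetical evaluations: two supporters and one refuser.
print(credibility_score([(True, 0.9), (True, 0.6), (False, 0.5)]))
```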
Finally, Abramson et al. [124] described the different user roles and transaction types stored in the Hyperledger Indy blockchain, including the steps a verifier can take to gain confidence when receiving a presentation. For example, they argued that if multiple entities issue credentials of a given format (credential schema), this provides more assurance than a schema that is only endorsed by a single issuer.

Trust Policy Evaluation
Trust policy evaluation is covered in eight papers [99-101, 110, 142-145]. Five of them were introduced previously: three [99-101] are concerned with protocol integration and identity derivation, one addresses issuer authorization [110], and one addresses HCI [142]. The remaining three papers [143-145], which focus on trust policy evaluation itself, are discussed next.
The authors of [144] proposed that entities define trust policies through lists of authorities they trust. These trusted entities, in turn, also publish which entities they recognize as trustworthy. For instance, one could trust a bank federation that periodically reports which banks it recognizes as credible. Thus, when receiving the VP of a person stating that she has an account at a bank the verifier does not recognize, a query to the bank federation's list of trusted banks is enough to decide if the VP can be trusted or not.
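The bank federation example can be sketched as a bounded lookup over published trust lists. This is a minimal illustration under stated assumptions and not the design of [144]: the entity names, the dictionary representation of published lists, and the `max_depth` limit are hypothetical.

```python
def is_trusted(issuer, verifier_trusts, published_lists, max_depth=3):
    """Return True if `issuer` is reachable from the verifier's trusted
    authorities by following published trust lists for at most `max_depth` hops.
    """
    frontier = set(verifier_trusts)
    seen = set()
    for _ in range(max_depth + 1):
        if issuer in frontier:
            return True
        seen |= frontier
        # Follow the lists published by every authority in the current frontier.
        frontier = {
            entity
            for authority in frontier
            for entity in published_lists.get(authority, ())
        } - seen
        if not frontier:
            return False
    return False


# The verifier only knows the federation; the federation vouches for banks.
verifier_trusts = {"bank_federation"}
published_lists = {"bank_federation": {"bank_a", "bank_b"}}

print(is_trusted("bank_a", verifier_trusts, published_lists))
print(is_trusted("bank_x", verifier_trusts, published_lists))
```

Bounding the number of hops is a design choice of this sketch: it keeps the verifier from extending trust through arbitrarily long chains of recognition.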
Inoue et al. [143] considered the task of updating an individual's information across multiple issuers and RPs, each with its own trust policy. This challenge was modeled as an Integer Linear Programming (ILP) problem, with trust policies defined as credibility requirements for incoming update requests. Updating a person's information in an issuer or RP increases its credibility. The ILP is then transformed into a graph problem, and an approximate solution is found using a heuristic based on Dijkstra's algorithm. This article is the only one in the survey that provides a formal description of the problem.
The Trust Policy Language (TPL) [192], a declarative language for specifying trust rules without concern for low-level details, was adapted to work in SSI in [145]. The TPL has been enhanced with SSI-related concepts such as DID and VC, allowing the specification of rules to validate VPs.

RQ-2: What properties, formal definitions and cryptographic tools have been used?
The first two years of examined papers were mostly focused on conceptual contributions to SSI. From 2018 onward, the works evaluated began to provide mathematical structures to help properly represent concepts. There are twenty-seven articles in total that include some type of formalism. Table 6 shows these articles and the building blocks they utilized. We divide the formal definitions into two categories: cryptographic tools and non-cryptographic tools. Cryptographic tools are well-known, low-level cryptographic algorithms that are often employed in computer systems to develop secure protocols and systems [193,194].
Inoue et al. [143] modeled trust policy evaluation using Integer Linear Programming (ILP). ILP is a mathematical optimization formulation in which all variables are integers and the objective function and constraints are linear [195]. It may be used with other formulations, such as graph theory, to express graph-related problems, for instance, finding the shortest path between two nodes. In addition to [143], two other papers used graph models to generate reputation models [123,125].
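As a reminder of the shortest-path building block mentioned above, the following sketch implements plain Dijkstra's algorithm on a small weighted graph. It is not the heuristic of [143]; the node names and edge weights are hypothetical and only stand in for the cost of propagating an update between parties.

```python
import heapq


def shortest_path_cost(graph, source, target):
    """Dijkstra's algorithm for non-negative edge weights.

    graph maps a node to a dict of {neighbor: weight}.
    Returns the minimum total weight from source to target, or None.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None


# Hypothetical graph of parties and update costs.
parties = {
    "holder": {"issuer_a": 2, "issuer_b": 5},
    "issuer_a": {"issuer_b": 1, "rp": 7},
    "issuer_b": {"rp": 2},
}
print(shortest_path_cost(parties, "holder", "rp"))
```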
Two works led by Martin Schanzenbach [107,109] used a Name System (NS) (e.g., the Domain Name System (DNS) [176] and the GNU Name System (GNS) [177]) as building blocks for addressing the revocation and issuer authorization challenges. These systems are coupled with Attribute-Based Encryption (ABE), which allows users to selectively grant and revoke access to some of their attributes. Another work that models a solution based on ABE is [93]. The last entry in the non-cryptographic tools category is probability theory. Both Grüner et al. [122] and Jakubeit et al. [133] base their contributions on this branch of mathematics.
We mapped nine cryptographic techniques formally defined in the examined literature. Most of the practical research we surveyed discussed how cryptographic primitives like public-key cryptography and hash functions are used. Nevertheless, in this study we only included those that did so with more than simple textual explanations.
Multi-Party Computation (MPC) is formally described and used in [120]. This field of research investigates methods for parties to compute a function together over their inputs without revealing them to the other parties [196]. In [120], MPC was used in conjunction with Shamir's Secret Sharing (SSS) [165]. This technique was used in two other articles to achieve backup and recovery of credentials [134,137]. The SSS algorithm breaks a secret into shares. The original secret is reconstructed using a predetermined number of shares, generally fewer than the total number of shares.
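The threshold behavior of SSS can be illustrated with a short sketch. It is a textbook construction over a prime field and not the implementation of any surveyed work; the prime, the function names, and the example secret are choices made for this illustration, and the code is not hardened for production use.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo PRIME


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points on a random polynomial of
    degree threshold - 1 whose constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner evaluation
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 using at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
print(recover_secret(shares[:3]) == 123456789)   # any three shares suffice
print(recover_secret(shares[2:]) == 123456789)
```

Here five shares are created and any three of them reconstruct the secret, which is the property that makes the scheme attractive for distributing credential backups among several custodians.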
Another technique that was precisely described in the SSI literature was Proxy Re-Encryption (PRE) [178]. This technique allows data encrypted with a person's key to be decrypted using someone else's key without revealing anyone's data or key to the proxy. It was used by Kim et al. [136] to recover private data.
The authors of [108] implemented VP revocation with Chameleon Hashing (CH). This family of one-way functions employs a trapdoor to find collisions for a given input [179].
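The role of the trapdoor can be seen in a toy discrete-logarithm chameleon hash. This is a classic textbook-style construction chosen for illustration; it is not claimed to be the scheme used in [108] or [179], and the tiny parameters below are insecure and only keep the arithmetic readable.

```python
# Toy group: P = 2*Q + 1, and G generates the subgroup of prime order Q.
P, Q, G = 23, 11, 4

TRAPDOOR = 7                  # secret key x (hypothetical value)
H = pow(G, TRAPDOOR, P)       # public key h = g^x mod p


def chameleon_hash(message, randomness):
    """CH(m, r) = g^m * h^r mod p, with m and r taken modulo Q."""
    return (pow(G, message % Q, P) * pow(H, randomness % Q, P)) % P


def find_collision(message, randomness, new_message):
    """With the trapdoor, solve m + x*r = m' + x*r' (mod Q) for r'."""
    inv_x = pow(TRAPDOOR, -1, Q)
    return (randomness + (message - new_message) * inv_x) % Q


m, r = 3, 5
m_new = 9
r_new = find_collision(m, r, m_new)
print(chameleon_hash(m, r) == chameleon_hash(m_new, r_new))
```

Without the trapdoor, finding such a pair would require solving a discrete logarithm in the group; with it, the key holder can replace the hashed content while the hash value stays unchanged.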
User privacy is a paramount goal of SSI, and the most popular technique used to increase privacy is to use Zero-Knowledge Proof (ZKP) to convince the RP of statements regarding the user's private information. Five articles that mainly propose data minimization techniques formally defined their approaches [64-66, 98, 117]; four of them use zk-SNARK to achieve ZKP [64-66, 117], and the other [98] uses Multi-Signature (MS), which is also employed in [114]. MS allows a set of participants to jointly sign a document or message. Two papers formally describe and use Cryptographic Accumulator (CAcc) as part of their solutions [97,110]. CAcc is a data structure that enables the accumulation of a large set of values into one short accumulator. One of the characteristics of CAcc is that values can be added and set membership verified in constant time. The authors of [97] use it as part of the process of creating SSI identities from traditional PKI-based identities, and the authors of [110] use it to achieve issuer authorization.
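The accumulator idea can be illustrated with a toy RSA-style accumulator. This is one well-known construction and is not necessarily the one used in [97] or [110]; the modulus, the base, and the requirement that accumulated values be primes are assumptions of this sketch, and the parameters are far too small to be secure.

```python
from math import prod

N = 61 * 53   # toy RSA modulus; its factorization must stay secret in practice
G = 3         # public base, coprime with N


def accumulate(primes):
    """Accumulator value g^(product of all members) mod N."""
    return pow(G, prod(primes), N)


def add(acc, prime):
    """Adding a member is one exponentiation of the current accumulator."""
    return pow(acc, prime, N)


def witness(primes, member):
    """Membership witness: the accumulator of all other members."""
    return pow(G, prod(p for p in primes if p != member), N)


def verify(acc, member, wit):
    """Membership check is one exponentiation, independent of the set size."""
    return pow(wit, member, N) == acc


members = [5, 7, 11]          # hypothetical values, encoded as primes
acc = accumulate(members)
print(verify(acc, 7, witness(members, 7)))
print(verify(acc, 13, witness(members, 7)))
```

The verifier only needs the short accumulator value and a witness, not the full set, which is why accumulators are a natural fit for lists of authorized issuers or of non-revoked credentials.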
Lastly, Fully Homomorphic Encryption (FHE) [186] is used to prevent reuse of presented information in [94]. FHE allows encrypted data to be processed without decryption.
RQ-3: What conceptual ideas have been introduced or refuted?
Christopher Allen [3] stated that there is currently no agreement on a definition of SSI and then presented ten guiding principles as a starting point. Our third research question is answered by an examination of the literature's debates on the SSI definition, which is now presented to the reader.
Using our review process, we found seventeen works that contribute to Allen's discussion regarding the meaning of SSI. Table 7 summarizes these studies in accordance with our taxonomy, which has the facets add and refute under conceptual. Furthermore, the facet add is subdivided into functional and non-functional.

Functional
No central authority means that no single organization should be in charge of or own an SSI solution [198, 202-204, 206, 207]. The articles that define this property, as well as the articles that say that SSI should be free [198-200], make good arguments at first glance. However, upon closer examination, these characteristics may discourage businesses from investing in SSI. They would have to seek alternative sources of income and share control over their products. To some extent, this is what Evernym [209], a for-profit company, did when it split off Sovrin, a non-profit foundation that is supported by other organizations [210]. Sovrin, on the other hand, is not free. While end users can join the network, receive VCs, and issue VPs for free, companies or other entities that enroll their end users must pay fees to [211]: (i) join the network; (ii) register a credential format, i.e., a credential schema; (iii) begin issuing credentials using a registered schema; (iv) register a revocation registry; and (v) revoke VCs.
According to two studies, SSI systems must be compatible with legacy identity management systems and protocols [198,206]. According to the reviewed literature, this is a highly researched subject. The applied research focuses on two aspects of legacy compatibility: (i) protocol integration with prior standards such as SAML, OAuth 2.0, and OpenID Connect; and (ii) identity derivation in order to migrate identities from identity providers that adopt the aforementioned protocols to SSI systems.
According to [19, 112, 116, 197-199, 201, 207], the concept of verifiable presentation is an integral part of SSI such that, without it, we cannot achieve SSI.
Toth and Anderson-Priddy [59] defined four additional functional properties of SSI, two of which have not been accounted for by others: (i) counterfeit prevention, which involves the impossibility of producing fake identities from others; and (ii) identity verification, which requires interacting parties to be assured of the authenticity of the identity owner. According to the property identity assurance, which has been proposed elsewhere [206], entities that rely on (self-sovereign) identities should be able to see proof that the entities with whom they interact are who they claim to be. The fourth additional property proposed by [59] and others [199,206,207] is the impossibility of tampering with communications between identity owners, i.e., secure transactions.
Delegation is the final functional characteristic of SSI proposed in the literature [207]. It is the capacity of identity owners to delegate some of their identity data to other individuals or groups of individuals of their choosing. This is a developing field of study [72,130,131].

Non-Functional
According to the authors of [198,199,201,204,206], a critical component of SSI is ensuring that people's data are recoverable in the event of loss of a personal device. This theoretical proposition is also an active area of applied research, as evidenced by six recent articles [72, 133-137].
Six studies assert that usability is critical in SSI [59, 203-207]. These works affirm that: (i) interfaces and experience must be optimized [206,207]; (ii) users' needs and expectations must be met and consistent across all platforms and services [203]; (iii) users should not require prior knowledge of blockchain technology [204]; and (iv) users should likewise not require prior knowledge of other underlying technologies such as cryptographic operations, biometrics, databases, and protocols [59]. One way to accomplish these goals is to mimic physical identities and the interactions we have with them, thereby exposing the user to familiar workflows [59]. Ultimately, if the user does not comprehend what is occurring and is unable to reason about it, the user is not sovereign [205].
Accessibility is a concept related to usability but has a more specific focus. According to three research papers in the reviewed literature, identity-related solutions should be accessible to as many people as possible [199,206,207].
Two works claim that identities should always be available [128,199]. The challenge of having highly available identity-related information in SSI is being addressed on multiple fronts. For example, [116] and [114] propose ensuring the availability of issuers' revocation registries in a decentralized and offline fashion.
In terms of auditability, Schutte [208] argued that auditing requires not only access to the details, but also the ability to read and understand them.
Another significant factor to consider is the scalability of SSI systems [198,199,203]. While practical research observes and considers this aspect [92,100,109], it is not the norm in the surveyed literature.
Finally, a subset of articles argues for the importance of compliance with regulations [116,201] such as the GDPR [44] and the CCPA [45]. Chotkan et al. [116] argued for the importance of verification and legislation compliance, despite the fact that the latter may weaken other SSI principles (such as privacy). The author of [202] did not state that GDPR compliance was necessary, but discussed how SSI systems can use verifiable claims to meet the following articles of the GDPR: (i) consent; (ii) pseudonymization; (iii) the right to be forgotten; (iv) records of processing activities; (v) data portability; and (vi) data protection by design and by default.

Refute
There are three works [59,200,208] that add new properties to SSI while also refuting some of Allen's concepts [3]. They all refute the existence principle, which states that individuals cannot exist entirely in digital form, and that (self-sovereign) identities expose some aspects of the user. Toth and Anderson-Priddy [59] have also argued against transparency and protection, suggesting that more debate is needed on these topics. Similarly, the authors of [200] argued that previous discussions [202,212] about identity had failed to address the issue of existence.
Unlike the previous two studies, Schutte [208] examined Allen's principles through a more philosophical and less technical lens. He contended that an individual, or "self", is not an indivisible entity, but rather the result of constant interactions between various agents, both internal and external. He then criticized the principles of existence, control, access, and consent, claiming that an individual's identity is a "heuristic that simplifies information processing and decision making" [208], which is imprecise by nature and thus cannot fully anchor identity processes. Finally, he argued that claims are critical and can be viewed as signals broadcast by some actors and perceived by others, who must decide how to prioritize and interpret them.
RQ-4: When, where, and by whom were SSI studies published?
To address RQ-4, we aggregate the General data items gathered via our data extraction form. The following section discusses the findings.

Frequency of publication
In terms of publication frequency, Table 8 summarizes publications by year. Although it is a brief overview, it demonstrates the growing academic interest in SSI. Using Venn diagrams to represent the facets of our taxonomy, we can discern finer details regarding annual publication frequency. Figure 5 depicts the number of publications classified in this manner.
In response to Allen's introduction of the ten principles in 2016 [3], two publications were released in the same year [198,208]. Works published in 2016 and 2017 are mostly conceptual writings that expand on Allen's discussion, proposing new principles/requirements [197,198,201,202,204,208] for SSI as well as refuting some [208]. Since 2016, researchers have been conducting continuous conceptual research, indicating that the meaning of SSI is still being debated. Beginning in 2018, a significant number of articles started to introduce new pragmatic problems and solutions to the SSI ecosystem, as well as mathematical formalisms. Nonetheless, mathematical formalization and formal description of cryptographic tools in applied research, which help SSI grow into a well-defined field of study, account for no more than half of all applied research published each year.

Publishing Venues
In terms of publication venues, forty-two papers were presented at congresses, symposia, or forums, as shown in Table 9 under the category conference. The forty-two conference publications and six master's theses show research momentum. However, the field is still in its infancy, with just one Ph.D. thesis and fifteen journal articles.
The authors with the most publications in this survey are Andreas Grüner, Alexander Mühle, and Christoph Meinel. They have co-authored three research papers [99,100,122] and two more with Tatiana Gayvoronskaya [19,125]. As a result, the vertices and edges representing these three authors and their publications have the most weight in the co-authorship graph of Figure 6 (i.e., they are the thickest vertices and edges).
Andreas Abraham is the only author who has written four articles. Abraham's publications include a technical report [202], a research paper with Felix Hörandner, Olamide Omolola, and Sebastian Ramacher [98], a second paper with Felix Hörandner, Christof Rabensteiner, and Stefan More [114], and a third paper with the last two authors [97].
Three researchers are co-authors of three publications each: Stefan More, Martin Schanzenbach, and Hye-Young Paik. Apart from the two publications with Andreas Abraham, Stefan More also co-authored a research paper with Lukas Alber, Sebastian Mödersheim, and Anders Schlichtkrull [145]. Schanzenbach's publications include his doctoral dissertation [109] and two articles co-written with Julian Schütte, one with Georg Bramm [107], and one with Thomas Kilian and Christian Banse [66]. Hye-Young Paik and Liming Zhu co-authored an article with Yue Liu, Qinghua Lu, Xiwei Xu, and Shiping Chen [130], and Paik published another article with Yashothara Shanmugarasa and Salil S. Kanhere [141]. Paik also shares a third article with Rahma Mukta, Qinghua Lu, and Salil S. Kanhere [111].
We present in Figure 7 the co-reference network of the surveyed literature. The vertices in this directed graph represent publications. The edges represent references between articles: each edge points from the citing publication to the cited one. The number of received citations determines the diameter of the vertices, and the color of the vertices is determined by the year of publication.
This graph shows the significance of W3C standards DID [58] and VC [57] for SSI. They are the two most referenced works in this map, with twenty-nine and twenty-one references, respectively. The first survey of SSI [19], published in 2018, ranks third in terms of citations, with seventeen. It is followed by the fourth most cited article, a comprehensive mathematical formulation of SSI from 2019 [128].
In terms of cross-references, forty-seven works are not cited in any of the surveyed publications. Thirty-five of these unreferenced works are from 2021, nine from 2020, two from 2019, and one from 2018. Similarly, twenty-seven publications do not contain any references to mapped work. Eight of these are from 2021, three are from 2020, six are from 2019, three are from 2018, five are from 2017, and two are from 2016. The scope of our survey is one of the reasons for publications that do not include references to other mapped works. We excluded SSI platforms such as Sovrin, Uport, and Jolocom, which are mentioned in many of these essays.

Open Challenges
The surveyed materials detail developments in the field of SSI. New publications will advance the conceptual debate about what it means for an identity to be self-sovereign, while also introducing new and unexpected challenges to the SSI ecosystem. We identify future research challenges based on the evidence gathered to address our research questions. They are discussed in detail below, along with recommendations.
A definition of SSI that researchers and practitioners accept. We have gathered evidence (see Section 8) that the majority of articles on SSI fundamentals agree with Allen's principles [3], while also adding new ones. Promoting a thorough review and discussion is critical in order to develop a new set of rules for defining SSI. Furthermore, mathematical formalization can be used to define precise boundaries. Having an exact definition of SSI will benefit future efforts and, ultimately, users who will be able to transition between SSI systems with the confidence that they share the same fundamentals.
Fundamental research. The majority of materials surveyed that include a mathematical model do so by designing it to their particular context. Only one of the articles reviewed provides a comprehensive mathematical formulation of SSI [128], but it does not address SSI's inherent decentralized trust properties. Another article [130] discusses realistic considerations and provides design patterns for numerous facets of SSI, including trust. These publications serve as a valuable starting point. However, additional basic research is necessary to foster discussion about how to jointly represent identities, credentials, claims, and trust, which is critical for future pragmatic research. By addressing RQ-2 and RQ-3 (see Sections 7 and 8), we established a foundation for future fundamental research.
Special case attribute sharing. Reviewed publications allow VPs to: (i) selectively disclose attributes [98,114,117]; (ii) create Boolean predicates about attributes [118]; and (iii) produce general expressions over attributes [64-66]. Nonetheless, these methods are unsuitable when sharing characteristics that will likely stay unchanged for several years, for instance, the shipping address associated with an online purchase. As a result, additional research on VP is required to ensure that a diverse range of use cases is covered.

Sound trust models. Trust plays an essential role in SSI and will be of paramount importance for the adoption of SSI solutions. Without comprehensive testing, trust models will become attractive targets for hackers.
This open challenge is exacerbated by the current standardization effort [214], which specifies a Boolean trust model in which a verifier either trusts or distrusts the issuer. This model does not cover the fuzzy scenarios of the real world. For example, an entity may present multiple claims about the same attribute where some issuers are trusted and others are not. Can this claim be trusted? Quantifiable trust/reputation models are needed, but only five of the surveyed articles address this issue [105, 122-125]. Furthermore, trust models require strong security, so formal verification techniques must be employed [215].
Blockchainless SSI. In blockchain-based SSI systems, dependence on centralized authorities has been reduced but not eliminated entirely; instead, it has been replaced by a decentralized entity in which the user must place their trust in order to embrace SSI. To participate in an SSI ecosystem, the user should not be required to trust and rely on a blockchain consortium. However, the majority of publications operate under the erroneous assumption that blockchain is a necessary component of SSI. To be self-sovereign, the user should not have to trust anyone, not even a blockchain.
To facilitate the migration from other paradigms. In federated and user-centric models, the IdP bears the administrative burden. Users need only be concerned with their passwords. With SSI, users are also burdened with management tasks such as backing up their keys, identities, and credentials, as well as creating and presenting claims. We mapped publications that propose techniques for deriving (self-sovereign) identities from federated and user-centric identities [95,97,98], as well as those that discuss backup and recovery [72, 133-137]. As a result, academia is gaining momentum on this migration issue.
Usability. Humans will interact with SSI systems. It is critical to research interfaces and how people engage with them, as well as how users interact with one another. Meaningful interaction must occur between users and applications and, more importantly, between individuals in an SSI ecosystem. Otherwise, users are unlikely to leave the comfort of their current federated/user-centric identities. A common trend in usability research in SSI is to mimic physical wallets [140,205], thus presenting the user with everyday interactions.
Innovative solutions are necessary and can be decisive for the widespread adoption and success of SSI.

Final Remarks
SSI is a new and promising identity management paradigm that increases people's agency in the digital world. It is gaining popularity among academics and industry. We filled in the gaps left by existing surveys, which lack methodological rigor and present biased results in favor of blockchain, thus missing the bigger picture.
In this article, we systematically surveyed both peer-reviewed and non-peer-reviewed literature that: (i) expanded the conceptual discussion on what SSI is; (ii) used mathematical formulation to precisely define one or more SSI-related problems, together with the cryptographic and non-cryptographic tools used to solve them; and (iii) introduced a novel pragmatic problem related to the SSI ecosystem and presented a solution to it. After keywording the selected materials, a novel taxonomy of SSI was proposed.
To answer our four research questions, we conducted four separate investigations on the surveyed literature. The results were reported in accordance with the proposed taxonomy and summarized in tables. Maps and tables were also created to categorize the current state-of-the-art research in SSI. These resources, when combined, enable the reader to comprehend each contribution individually while also providing a broad understanding of the current state and maturity of research in SSI. The reported results of our systematic method serve as a foundation for researchers and entrepreneurs who wish to conceptually expand SSI or develop new SSI-related systems. Finally, we discussed unresolved issues and provided recommendations for future research.
Thanks go to the Federal Institute of Education, Science and Technology of Rio Grande do Sul (Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul, IFRS), which allowed doctoral studies for Frederico Schardong.

Figure 1: The IAM models. Solid lines represent interactions, and dashed lines represent trust.

Figure 3: Number of articles in each stage of our study selection.

Figure 5: The number of publications in each facet of our taxonomy over time.

Figure 6: Co-authorship network graph, where vertices represent authors and edges their co-authorship of one or more works.

Table 1: Comparison with other secondary studies in the literature.

Table 4: Number of studies.

Figure 4: Taxonomy of SSI. The conceptual facet categorizes the research efforts that, during our data extraction process, filled in the data items Add Concept or Refute Concept and thus help answer RQ-3. The new concepts are divided into two facets: functional, which refers to the well-defined functionalities of SSI systems; and non-functional, which refers to more generic behaviors. The practical facet is used to classify publications that make pragmatic contributions, i.e., those that contribute to the data items Novel Problem and Proposed Solutions, and thus related to RQ-1. It is divided into three facets that are used to analyze work that presents challenges and proposes solutions in the following areas: (i) management and operational aspects of credentials; (ii) system design; and (iii) trust. The operational facet is further subdivided into the VC and VP facets.

Table 5: Publications that introduced and solved novel problems in the SSI ecosystem.

Table 6: Publications that introduce mathematical formalism to SSI. Techniques are divided into cryptographic and non-cryptographic tools.

Table 7: Publications that add or refute philosophical views of SSI.

Table 8: Publications per year.

Table 9: Types of publishing venues over the years.

Table 11: Studies published in journals.