Identifying Consensus and Open Questions around Assessing or Predicting the Quality and Success of Cartilage Repair: A Delphi Study

A range of surgical techniques have been developed for the repair or regeneration of lesioned cartilage in the human knee and a corresponding array of scoring systems have been created to assess their outcomes. The published literature displays a wide range of opinions regarding the factors that influence the success of surgical cartilage repair and which parameters are the most useful for measuring the quality of the repair at follow-up. Our objective was to provide some clarity to the field by collating items that were agreed upon by a panel of experts to be important in these areas. A modified, three-round Delphi consensus study was carried out consisting of one idea-generating focus-group and two subsequent, self-completed questionnaire rounds. In each round, items were assessed for their importance and level of consensus against pre-determined threshold levels. In total, 31 items reached consensus, including a hierarchy of tissues in the joint based on their importance in cartilage repair, markers of repair cartilage quality and the implications of environmental and patient-related factors. Items were stratified into those that can be employed for predicting the success of cartilage repair and those that could be used for assessing the structural quality of the resulting repair cartilage. Items that did not reach consensus represent areas where dissent remains and could, therefore, be used to guide future clinical and fundamental scientific research.


Introduction
Articular cartilage (AC) is the specialised tissue covering the ends of the long bones and providing a low-friction, lubricated and wear-resistant surface for articulation of the synovial joints during locomotion [1][2][3]. Cartilage lesions are common in the human knee. Although the exact incidence is unknown, several studies report the presence of such lesions in 60-66% of knee arthroscopies of patients presenting with knee symptoms that require investigation (including unexplained knee pain and dysfunction) [4][5][6][7] and estimate a 12% incidence in the population as a whole [8]. Articular cartilage is characteristically avascular and aneural which, in combination with the low metabolic turnover of cartilage extracellular matrix components by the resident chondrocytes, results in a poor intrinsic capacity for self-repair [2,3,9]. Therefore, without intervention, AC lesions often fail to heal and can predispose the patient to further, progressive cartilage loss and eventually to secondary osteoarthritis (OA) [10].
The need to prevent the progression from cartilage lesion towards secondary OA has led to the development of numerous techniques that aim to repair, replace or regenerate lesioned AC [11][12][13]. These techniques range from palliative approaches (debridement, lavage), intrinsic reparative strategies (marrow stimulation through abrasion, subchondral drilling and microfracture), whole tissue transplantation (osteochondral auto-and Surgeries 2021, 2 287 allografting) and tissue engineering strategies (Autologous Chondrocyte Implantation, Matrix-induced Autologous Chondrocyte Implantation and Autologous Matrix-Induced Chondrogenesis) [11,12,[14][15][16]. There has also been an associated development of new scoring systems to assess the outcome of cartilage repair techniques. A multitude of histological [17], arthroscopic [18,19] and imaging-based scoring systems [20][21][22] are in use to assess the structural quality of repair cartilage, each containing a number of different parameters.
Thus, the published literature displays a wide range of opinions regarding the most important parameters and outcome measures in the structural assessment of cartilage repair. There is also a range of opinions on baseline factors that may influence the repair, including demographic factors such as gender [23] and age [24][25][26][27][28], and defect factors such as number [23,29], size [23,26,30] and location [26,28,29], with little agreement on which is the most important. We hypothesized that the combined expertise of a panel of experts could identify where consensus exists and areas where consensus is lacking on parameters that could be used to assess or to predict cartilage repair.
The Delphi technique is a method for acquiring group knowledge by turning individual opinions into a group consensus. The technique was developed primarily in the 1950s by Norman Dalkey and Olaf Helmer and was first published in 1963, following the declassification of some of the military projects for which it had been developed [31][32][33]. The Delphi technique aims to collate existing beliefs and ideas surrounding a specific topic, deduce which of these are the most important and determine the consensus among a group of relevant people on an issue where previously there was little agreement [32]. The Delphi technique is based on the premise that the opinion of a group is more valid than that of an individual, or that 'two heads are better than one' [32,34,35]. This is implemented through a series of iterative questionnaire rounds, between which the results are statistically analysed and fed back to participants in a controlled manner [36]. Our Delphi study sought to compile items that were deemed to be important in the field of cartilage repair and, subsequently, to determine the levels of consensus on these items amongst a panel of experts.

The Delphi Panel
Individuals attending the two-day Oswestry Cartilage Symposium (a UK ICRS meeting) were invited to participate in the study. The delegation of this meeting was made up mostly of research scientists, with several clinicians and industry representatives. All attendees had expertise in cartilage repair and represented a range of backgrounds, including rheumatology, regenerative medicine, orthopaedic surgery, biomedical engineering and stem cell biology. All participants in the study were based in the United Kingdom.
Attendees at the first day of the meeting were approached to participate in the idea-generating round of this Delphi study. Attendees at the second day of the meeting were approached to form the Delphi panel for the two subsequent self-completed questionnaire rounds.
Absolute anonymity was not possible in this Delphi study as both the idea-generating round and the first questionnaire round were carried out at a face-to-face meeting. However, quasi-anonymity was maintained throughout the process: panel members were aware of one another's identities, but the opinions and judgements of individual participants remained anonymous [37].

The Delphi Process
The present Delphi study consisted of three rounds:
• Idea-generating focus group (round 1): Attendees at the first day of the symposium were invited to anonymously submit free-text opinions on factors that they considered important in terms of influencing, or being used to assess, the structural quality of repair cartilage. Participants were also invited to submit opinions relating to the cartilage repair field more generally. The idea-generating focus group produced a mix of single statements and statement series, which were collated and structured into the first self-completed questionnaire. The wording of both single statements and statement series was edited only minimally, to correct spelling and improve uniformity.
• First self-completed questionnaire round (round 2): The first questionnaire was distributed to the Delphi panel. The panel was asked to complete the questionnaire to the best of their knowledge and to leave free-text comments to justify their answers. Completed questionnaires were collected, and the results were then reviewed and statistically analysed. Items that had reached the pre-determined threshold level of consensus (see 'Judging Consensus and Importance') were removed, and those that remained were used to create the second self-completed questionnaire. This questionnaire also included a summary of the group answers and directly quoted comments from the previous round.
• Second self-completed questionnaire round (round 3): The second questionnaire was distributed to the Delphi panel electronically. The panel was again asked to answer the questions to the best of their knowledge, this time considering the summary of the group answers and comments from the previous round. Again, the panel was invited to add free-text comments for each question to justify their answer, particularly if their answer lay outside the trend of group answers from the previous round. Completed questionnaires were returned electronically.

Self-Completed Questionnaires
The questionnaires in rounds 2 and 3 contained two types of items, single statements and statement series, both created from the ideas collected in round 1. These asked the panel:

• To rate their agreement with single statements on a 5-point Likert scale.
• To rank statement series in order of their perceived importance.
An example of the Likert-scale-rated single-statement-type items from the round 2 and round 3 questionnaires is shown in Figure 2A,B, respectively. An example of the statement-series-type item from the round 2 and round 3 questionnaires is shown in Figure 3A,B, respectively.

In the round 3 questionnaire, the Delphi panel was also provided with a summary of the group answers (a count of answers and the median) and directly quoted comments from the previous round. Note that the first statement, 'size and depth of lesion should both be measured', is no longer present on the round 3 questionnaire, as this statement reached the threshold consensus levels in round 2 and was removed from the subsequent questionnaire.

Judging Consensus and Importance

Likert-Scale-Rated Single Statements
The average percentage of majority opinions (APMO) and the interquartile range (IQR) were used as cut-off rates to examine the level of consensus achieved in the rating of single-statement items [36,39]. The APMO was calculated as

$$\mathrm{APMO} = \frac{\text{majority agreements} + \text{majority disagreements}}{\text{total opinions expressed}} \times 100\%$$

where "majority agreements" and "majority disagreements" are the numbers of responses, summed over all statements in a round, that represented a majority opinion (agreeing or disagreeing, respectively) on each statement. A majority was defined as more than 50% of respondents agreeing ("Largely" or "Completely") or disagreeing ("None" or "Somewhat") with a statement. The APMO was calculated separately for rounds 2 and 3 and was used as a cut-off rate to decide whether a statement had reached consensus [39,40]. The IQR threshold for consensus was pre-set at ≤1, which is deemed appropriate for a 5-point Likert scale [36,40,41].
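The APMO calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure (their analysis was carried out in Excel): it assumes the 5-point scale is coded 1 to 5 with 1 = "None", 2 = "Somewhat", 4 = "Largely" and 5 = "Completely"; the middle point's label is not given here, so code 3 is treated as neither agreeing nor disagreeing. The names `majority_count` and `apmo` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

# Assumed (hypothetical) coding of the 5-point scale:
# 1 = "None", 2 = "Somewhat", 3 = middle point (label not given in the text),
# 4 = "Largely", 5 = "Completely".
AGREE = {4, 5}
DISAGREE = {1, 2}


def majority_count(responses: Sequence[int]) -> int:
    """Number of responses forming a >50% agreeing or disagreeing majority (0 if none)."""
    n = len(responses)
    agree = sum(1 for r in responses if r in AGREE)
    disagree = sum(1 for r in responses if r in DISAGREE)
    if agree > n / 2:
        return agree
    if disagree > n / 2:
        return disagree
    return 0


def apmo(round_responses: Sequence[Sequence[int]]) -> float:
    """APMO (%) for one round: majority opinions / total opinions expressed x 100.

    `round_responses` holds one list of Likert codes per single statement.
    """
    majority_opinions = sum(majority_count(s) for s in round_responses)
    total_opinions = sum(len(s) for s in round_responses)
    return 100.0 * majority_opinions / total_opinions


if __name__ == "__main__":
    # Three hypothetical statements, each rated by four respondents.
    print(apmo([[5, 5, 4, 3], [1, 2, 2, 5], [4, 2, 3, 5]]))  # 50.0
```

In this hypothetical example the first two statements each have a three-response majority and the third has none, giving 6 majority opinions out of 12 and an APMO of 50%.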
The strength of the panel's support for each item was judged using the median and mode as measures of central tendency. The mean and standard deviation would be inappropriate in this case because the Likert scale is ordinal: its points are not delineated at regular intervals [36].
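A minimal sketch of these ordinal summary statistics, again assuming responses coded 1 to 5, is shown below. The quartile convention is an assumption: `method="inclusive"` in Python's `statistics.quantiles` interpolates in the same way as Excel's `QUARTILE.INC`, but the paper does not state which convention was applied, and IQR values for small panels can differ between conventions. The function names are hypothetical.

```python
from __future__ import annotations

import statistics
from typing import Sequence


def central_tendency(responses: Sequence[int]) -> tuple[float, int]:
    """Median and mode of ordinal Likert codes (mean and SD deliberately avoided)."""
    return statistics.median(responses), statistics.mode(responses)


def iqr(responses: Sequence[int]) -> float:
    """Interquartile range (Q3 - Q1) with linear interpolation between order statistics.

    Assumption: method="inclusive" matches Excel's QUARTILE.INC; the quartile
    convention actually used in the study is not stated.
    """
    q1, _median, q3 = statistics.quantiles(responses, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    ratings = [5, 5, 4, 4, 5, 3, 5, 4]  # hypothetical responses to one statement
    print(central_tendency(ratings), iqr(ratings))  # (4.5, 5) 1.0
```

For these hypothetical ratings the median is 4.5, the mode is 5 and the IQR is 1.0, so the IQR criterion (≤1) would be met.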

Ranked Series
Kendall's coefficient of concordance (W), a measure of the level of agreement between participants' rankings, was used to examine the degree of consensus in the ranked-series questions [39,42]. The consensus order of the items within each series was then derived from their mean ranks.
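Kendall's W and the mean-rank ordering can be sketched as follows, assuming each participant supplies a complete ranking without ties (no tie correction is applied). The sketch uses the textbook formula W = 12S / (m^2 (n^3 - n)), where m is the number of participants, n the number of items and S the sum of squared deviations of the items' rank sums from their mean; the function names and the example items are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m complete rankings of the same n items (ranks 1..n, no ties).

    rankings[j][i] is the rank participant j gave to item i.
    W = 12 * S / (m**2 * (n**3 - n)), where S is the sum of squared deviations
    of each item's rank sum from the mean rank sum m * (n + 1) / 2.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def consensus_order(items: Sequence[str], rankings: Sequence[Sequence[int]]) -> list[str]:
    """Order items by mean rank, lowest (most important) first."""
    m = len(rankings)
    mean_rank = {item: sum(r[i] for r in rankings) / m for i, item in enumerate(items)}
    return sorted(items, key=mean_rank.__getitem__)


if __name__ == "__main__":
    # Three hypothetical participants ranking hypothetical items A, B and C.
    ranks = [[2, 1, 3], [1, 2, 3], [2, 1, 3]]
    print(round(kendalls_w(ranks), 3), consensus_order(["A", "B", "C"], ranks))
    # 0.778 ['B', 'A', 'C']
```

W ranges from 0 (no agreement) to 1 (complete agreement): three identical rankings give W = 1.0, while two exactly reversed rankings of three items give W = 0.0.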

Statistical Analysis
All data were processed, organised and analysed using Microsoft® Office Excel 2013.

The Delphi Panel
The Delphi process in this study was carried out over three rounds. The Delphi panel size and associated response rate in each of the three rounds are shown in Table 1. There were 51 experts (37 research scientists, 8 clinicians and 6 industry representatives) in attendance on the first day of the Oswestry Cartilage Symposium, who therefore made up the Delphi panel for the first round, the idea-generating focus group. On the second day of the meeting, attendance fell to 38, of whom 24 completed the first self-completed questionnaire round. The composition of the panel in this round was unknown to the authors. The attendees of the second day were then approached by email to complete the second self-completed questionnaire round, and 15 of them (14 research scientists and 1 clinician) did so.

Responses
Round 1, the idea-generating focus group, resulted in the collection of 54 free-text statements and opinions in total. These statements were used to generate the first self-completed questionnaire (round 2), which comprised 46 single statements to be rated and 4 statement series to be ranked. In the second self-completed questionnaire round (round 3), the panel rated 30 single statements and ranked 3 statement series.
A full list of the statements included in the questionnaires and the responses for the two self-completed questionnaire rounds are provided in Supplementary Tables S1 and S2 (single statements) and Supplementary Tables S3 and S4 (ranked series).

Likert-Rated Single Statements
A total of 16 statements reached consensus in the first self-completed questionnaire round (round 2), each with a percentage of agreeing answers above the APMO cut-off rate for that round (80.4%) and an interquartile range (IQR) ≤ 1 (Table 2, Figure 4). The consensus opinion for all 16 of these statements was supportive, with the Delphi panel's modal and median responses being 'Largely' or 'Completely' agreeing with the statements on a 5-point Likert scale.
In the second self-completed questionnaire round (round 3), a further 14 statements reached consensus, demonstrating an APMO greater than the cut-off rate for this round (66.7%) and an IQR ≤ 1 (Table 3, Figure 5). Again, the consensus opinion for all of these statements was supportive.
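The two-criterion decision rule applied to each statement can be sketched as below, using the round-specific APMO cut-offs reported above (80.4% for round 2 and 66.7% for round 3). This is a hypothetical helper, not the authors' spreadsheet logic: it assumes each statement's percentage of agreeing answers and its IQR have already been computed.

```python
# APMO cut-offs reported for each self-completed round (per cent).
APMO_CUTOFF = {2: 80.4, 3: 66.7}
IQR_MAX = 1.0  # pre-set IQR threshold for a 5-point scale


def reaches_consensus(percent_agreeing: float, iqr: float, round_no: int) -> bool:
    """True if a statement's agreement exceeds the round's APMO cut-off and IQR <= 1.

    Hypothetical helper: both inputs are assumed to be computed beforehand.
    """
    return percent_agreeing > APMO_CUTOFF[round_no] and iqr <= IQR_MAX


if __name__ == "__main__":
    # Hypothetical statement: 75% agreeing answers, IQR of 1.
    print(reaches_consensus(75.0, 1.0, 2), reaches_consensus(75.0, 1.0, 3))  # False True
```

Because the cut-off is recalculated for each round, a hypothetical statement with 75% agreement and an IQR of 1 would fall short in round 2 but meet the criteria in round 3.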
The 30 single-statement items that reached consensus over the course of the study were separated into three distinct groups: those that are relevant to assessing the structural quality of repair cartilage, those that represent baseline factors that may influence a successful repair and those that relate broadly to the economic and ethical considerations surrounding the cartilage repair field (Table 4).

Ranked Series
Summaries of the consensus levels of the ranked series for rounds 2 and 3 are shown in Tables 5 and 6, respectively. One of the ranked statement series, 'Tissue type', reached consensus in round 2, with strong agreement as determined by Kendall's coefficient of concordance (W = 0.736). The resulting hierarchy of tissues, ordered by their importance in cartilage repair as perceived by the Delphi panel, is shown in Table 7. No further ranked series reached consensus in round 3.

Table 4. A summary of the Likert-scale-rated single statements that can be used as guidelines by which to assess the structural quality of repair cartilage, those factors that could influence a successful cartilage repair and those that consider the economic and ethical implications of cartilage repair.

Factors that Can Be Used to Assess the Structural Quality of Repair Cartilage
• An increase in collagen type II expression and aggrecan expression are key markers of cartilage production in pellets, animal models and humans.
• An increase in lubricin expression is a key marker of cartilage production in animal models and humans.
• The repair has an effect on other surrounding cells in humans and in animal models.
• A more extensive histology scoring system is required for human samples.
• Scoring systems for studies in humans and in animal models should include both structural and inflammatory features.
• It is important to assess all tissues of the joint in humans and in animal models.
• It is important to assess the identity and quality of the cartilage and the bone-cartilage interface in humans and in animal models.

Factors that may influence a structurally successful cartilage repair
• Size and depth of lesion should both be measured.
• Disease status (e.g., bone sclerosis, inflammation) should be taken into account.
• Determining the mechanism of damage repair is important in both humans and in animal models.
• Environmental factors (e.g., age, gender) are important in influencing repair.

Factors relating to research economics and ethics
• More investment in cell therapies is needed.
• Any technique or product for cartilage repair must be scalable.
• Treatment cost should be justifiable.
• Cell type should raise as few ethical and safety issues as possible.
• Lab research should push boundaries, not slow down for the clinic. There is a need for clinical innovation.
• Patients should be better stratified prior to clinical trial entry.
• Access to specialist rehabilitation programmes would be useful for all patients, pre- and post-repair.
• Non-invasive measures provide a way to reduce time and cost.

Table 6. A summary of the consensus results for the ranked series in round 3. None of the ranked series showed consensus in round 3. Note that the 'Tissue type' rank was removed as this series reached consensus in the previous round.

Round 3 ranked series: 'Treatment Outcome', 'Treatment Choice Basis' and 'Repair Quality Assessment'.

Of the 16 single statements that did not reach consensus, 6 showed an increase in consensus (increase in APMO), 8 showed a decrease in consensus (decrease in APMO) and 2 showed no change between rounds two and three (Figure 6).

Table 8. A summary of the Likert-scale-rated single statements that did not reach threshold consensus over the course of this Delphi study.

Round 3 Statement Number: Statement
1a: We should measure the lacunae (cysts) in the bone.
1b: We should measure the area of bone marrow oedema-like (BML) signal.
3a: An increase in lubricin expression is a key marker of cartilage quality in pellets.
4: Functional fibrocartilage is sufficient in the repair.
5: Hyaline cartilage is necessary in the repair.
6: Collagen type X expression should not be present in the repair at the mRNA level.
7: Collagen type X expression should not be present in the repair.
8: Measuring cartilage changes is irrelevant, the pathogenic mechanisms that lead to the changes are more important and may be largely cartilage dependent.
9: Collagen type VI is a useful measure of cartilage quality.
10a: A more extensive histology scoring system is needed for pellets.
10b: A more extensive histology scoring system is needed for animal models.
11a: A simpler scoring system is best for MRI in animals.
11b: A simpler scoring system is best for MRI in humans.
15: Advancement of the bone front is negative.
18a: MRI should be performed in short bursts to keep costs down in animal models.
18b: MRI should be performed in short bursts to keep costs down in human research.

Ranked Series
Three of the four ranked series in this Delphi study did not reach the threshold consensus. All three, however, demonstrated an increase in consensus levels between rounds two and three (Figure 7).

Discussion
The present Delphi study used a panel of experts to compile items deemed important mainly in assessing or predicting the outcome of cartilage repair, generating 46 single-statement items and 4 ranked series. The Delphi panel then determined the level of consensus on support for these items, of which 30 single statements and 1 statement series reached threshold consensus levels.
The items collected in the idea-generating focus group and, therefore, the content of the subsequent questionnaires varied widely in subtopic and scope within the cartilage repair field. This is not wholly surprising, as the study was designed to be broad, allowing panel members to raise and discuss, with minimal restrictions, issues from their own research that they considered important. We collated the 30 single-statement items that the panel supported as important factors in cartilage repair into three groups (Table 4). Although the novelty of these items lies mainly in their curation and grouping, several individual items merit closer appraisal to understand their utility as a collection.
Increases in collagen type II, aggrecan and lubricin expression were agreed by the panel to be important markers of repair cartilage quality in human and animal studies. Abundant collagen type II and proteoglycans such as aggrecan have long been considered markers of repair cartilage quality and longevity, making consensus on these items unsurprising [43][44][45]. Lubricin is less established as a marker of cartilage quality, but a recent study found lubricin in the superficial zone of 84% of biopsies taken from repair cartilage following ACI [43]. Lubricin is known to reduce friction [43] and to prevent abnormal cell adhesion and overgrowth [46,47] at the cartilage surface; its presence in repair cartilage may therefore indicate that the repair tissue resembles native articular cartilage and, by extension, that the repair is more likely to succeed.
The panel further agreed that all tissues in the joint should be considered when assessing the quality of cartilage repair in human or animal studies. These statements reflect the view of the knee as an organ whose constituent tissues work together to maintain function, and in which dysfunction affects multiple tissues [48]. The panel's statements suggest that the knee should also be regarded as an organ when determining the success of cartilage repair. The panel also agreed with the statement 'the repair has an effect on the other surrounding cells', which conveys a similar message: the success of cartilage repair should not be judged solely on the quality of the repair cartilage, as this is not the only tissue affected by the repair. These two statements, combined with the agreed statement that non-invasive measures would reduce time and cost, point to a need for imaging-based cartilage repair scoring systems that can assess all of the joint tissues. Such a scoring system could combine a whole-joint MRI scoring system, such as the Whole-Organ Magnetic Resonance Imaging Score (WORMS) or the Magnetic Resonance Imaging Osteoarthritis Knee Score (MOAKS), with a repair-specific system such as the 3-Dimensional Magnetic Resonance Observation of Cartilage Repair Tissue (3D-MOCART) score [20,21,49]. Although MOAKS assesses synovitis semi-quantitatively, Maksymowych and colleagues recently described a new scoring system, the OMERACT Knee Inflammation MRI Scoring System (KIMRISS), which quantifies synovitis-effusion more reliably [50]. Including a quantitative soft-tissue inflammation score such as this would further improve whole-joint assessment.
The panel agreed that the bone-cartilage interface is another important marker of the structural quality of repair cartilage. Cartilage repair strategies that lead to the formation of fibrocartilage have been reported to show little regeneration of the tidemark and calcified cartilage and, consequently, to develop a less stable tissue-bone interface [51]. Regeneration of the osteochondral interface, and thus integration of the repair cartilage with the bone, is necessary for a stable repair. The calcified cartilage layer contributes not only to mechanical functionality and stability, but also to cartilage-bone homeostasis [52]. Thus, the quality of the cartilage-bone interface could be indicative of the quality of the repair as a whole.
The panel also agreed on a number of baseline factors that could influence the success of cartilage repair. Although the precise nature of their influence may be unclear, the panel agreed that these factors were important. They included the disease status of the patient, as patients with chronic symptoms and related inflammation tend to have higher failure rates with cartilage repair techniques, or do not benefit at all [53]. A consensus was also reached on the influence of environmental factors, such as age, gender and BMI, on the success of cartilage repair. In elderly patients, for example, the chondrogenic potency of bone marrow-derived mesenchymal stem cells is inferior to that in younger patients, which could reduce the chance of success of marrow stimulation techniques such as microfracture [54,55]. The panel also agreed that the size and depth of the cartilage lesion are important factors that may influence the outcome of cartilage repair surgery. Not only do the size and depth of the lesion often determine which repair technique is employed, but most procedures also have a maximum recommended size beyond which the success rates of that technique worsen [56].
Only one of the four ranked series, 'Tissue Type', reached consensus in this Delphi study (Table 6), providing a hierarchy of joint tissues based on their importance in the cartilage repair process. This agreed hierarchy is particularly useful given the panel's view that cartilage degeneration and repair are processes that affect and involve all tissues in the joint, rather than the articular cartilage alone [48,57]. One of the difficulties in appraising cartilage repair techniques, and in finding ways to improve them, is determining how the other tissues of the joint contribute to the repair process. This hierarchy can, therefore, help to prioritise the other knee joint tissues for future research into their role in the structural quality of repair cartilage.
A number of the items put forward by the Delphi panel in the idea-generating focus group did not reach the consensus threshold in subsequent rounds, suggesting dissent amongst the panel and, by extension, within the field. Between rounds two and three, consensus increased for six of the 16 single-statement items that did not ultimately reach consensus, and for the three remaining ranked series. In theory, a Delphi process can have an unlimited number of rounds, and further rounds might have led to further convergence on these nine items. However, the rate of participant attrition suggests that further rounds were unlikely to yield enough returns to be viable. The remaining ten single statements showed either no change or a decrease in consensus between rounds two and three, suggesting that a genuine difference in opinion on these topics persists. This lack of consensus more likely reflects a corresponding lack of knowledge around the statements, which can, therefore, serve as a list of potential research topics within the cartilage repair field.
Of the items that did not reach consensus, the statements 'hyaline cartilage is necessary in the repair' and 'functional fibrocartilage is sufficient in the repair' are of particular interest, as they represent two of the major opposing positions in the field of cartilage repair. The ultimate aim of any cartilage repair technique is to (re)generate a tissue that is as close as possible to native hyaline articular cartilage, in order to achieve the best possible biomechanical properties and longevity of the repair. Fibrocartilage is considered biomechanically inferior to hyaline cartilage and, therefore, provides a more temporary repair that only slows the progression from cartilage lesion to OA [7,58]. The more closely the repair tissue resembles hyaline cartilage, the better the repair is considered to be [43]. However, the fact that neither statement reached consensus indicates that this view is not as firmly established as might be expected.
To our knowledge, no guideline criteria have been published for the selection of 'experts' to form a Delphi panel, and expertise itself is hard to define. In this Delphi study, we used the criteria put forward by Adler and Ziglio (1996) to define expertise in the Delphi context: 'Knowledge and experience with the issue under investigation', 'capacity and willingness to participate', 'sufficient time to participate' and 'effective communication skills' [59]. A proportion of those invited declined to take part (Table 1), and no attempt was made to persuade them otherwise, as voluntary participation ensured that the entire Delphi panel met these requirements. Throughout this Delphi process, the panel was composed entirely of individuals working in the United Kingdom. Although the study was, therefore, limited in its geographical scope, the results may nonetheless be applicable internationally.
An additional limitation of the present study was highlighted by appraising the composition of the panel. In both rounds one and three, the vast majority of panel members were research scientists, indicating that this group was over-represented throughout. Over-representation of one particular group is a commonly reported limitation of the Delphi technique [60][61][62][63][64]. Other studies have employed purposive sampling in the selection of their Delphi panels in an attempt to achieve a balance between backgrounds [63,65,66]. However, because of the high levels of participant attrition also commonly reported in Delphi studies, certain groups are more likely to complete the process and are, therefore, commonly over-represented in the final round, even in studies with purposive sampling [60,63]. In our study, the over-representation of research scientists, particularly in round 3, did not diminish the impact of the findings. Rather, the fact that a number of items still did not reach consensus, even when appraised by a largely homogeneous group (in terms of job title), further highlights the dissent within the field and the need both for studies such as this and for further basic research to improve clarity and convergence.
Previously published Delphi studies vary widely in panel size, from five participants to around 400 [36,42,67,68]. A larger panel increases the variety of expertise but is also likely to reduce response rates [69]. The first-round panel of 28 participants in this Delphi study allowed a range of comments and opinions from a variety of experts to be included, without making the subsequent questionnaires overly time-consuming. A recent systematic review, which aimed to evaluate previously published Delphi studies and produce guidance for future studies, examined 80 studies [68]. Of these, 76 reported the number of individuals invited to participate. The median number invited was 17 (IQR 11-31), suggesting that our panel was relatively large.
As shown in Table 1, there was some participant attrition over the course of this Delphi process, with the panel size decreasing in each subsequent round. However, the smaller panel did not impede the final round's ability to reach consensus, as a further 14 single-statement items reached consensus in this round.
The decrease in panel size was accompanied by a decrease in the response rate, presented in Table 1 as a percentage of the previous round's respondents. The lowest response rate was observed in round 3, likely because this questionnaire was distributed electronically rather than in person, as in the previous rounds. Once again, the lack of overarching methodological guidelines for the Delphi process made it difficult to appraise the response rates in this study. It is widely accepted that a 100% response rate is very rare in Delphi studies, particularly those carried out at least partly remotely, such as ours [32,37]. The systematic review mentioned above reported that, across the 80 Delphi studies it examined, the median round one response rate was 90% (IQR 80-100%) and the median final round response rate was 88% (IQR 69-96%) [68]. However, only 31 of the 80 studies (39%) reported their response rates, so these figures may be subject to reporting bias [68]. One handbook recommends that a response rate of 70% be maintained in each round, but also acknowledges that this is difficult to achieve [32]. This recommended response rate was achieved in the second round (85.7%) but not in the third (62.5%). A higher response rate is more easily obtained if all Delphi rounds are carried out face-to-face [32,37], which was not possible in this case.
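Because each response rate in Table 1 is expressed relative to the previous round's respondents rather than to the original panel, per-round rates compound. The minimal Python sketch below illustrates this, assuming the handbook's 70% per-round target [32]. The function names and the respondent counts are hypothetical and purely illustrative; they are not taken from Table 1.

```python
# Minimal sketch: per-round Delphi response rates relative to the previous
# round, checked against a per-round target. Counts used below are
# hypothetical and are NOT the figures reported in Table 1.

RECOMMENDED_MIN = 70.0  # per-round target (%) recommended by the handbook [32]


def response_rates(respondents):
    """Return response rates (%) for rounds 2..n, each relative to the
    number of respondents in the preceding round."""
    return [round(100 * curr / prev, 1)
            for prev, curr in zip(respondents, respondents[1:])]


def rounds_below_target(rates, threshold=RECOMMENDED_MIN):
    """Return the round numbers (starting at round 2) whose rate falls
    below the threshold."""
    return [i + 2 for i, rate in enumerate(rates) if rate < threshold]


def overall_retention(respondents):
    """Return the percentage of round-one respondents still responding
    in the final round."""
    return round(100 * respondents[-1] / respondents[0], 1)


if __name__ == "__main__":
    counts = [20, 16, 10]  # hypothetical respondents in rounds 1, 2 and 3
    rates = response_rates(counts)
    print(rates)                       # [80.0, 62.5]
    print(rounds_below_target(rates))  # [3]
    print(overall_retention(counts))   # 50.0
```

In this hypothetical example, only half of the original panel completes round 3, even though neither per-round rate is as low as 50%. Reporting both measures can make attrition easier to compare across Delphi studies.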

Conclusions
Measuring and assessing the structural quality of repair cartilage, and determining the important influencing factors, is imperative. Here, we reported a 3-round Delphi process which resulted in a set of guideline parameters by which to assess the structural quality of cartilage repair and a set of baseline factors that may influence structurally successful cartilage repair. The items that failed to reach a consensus in this study represent areas of incomplete knowledge and can, therefore, be used to formulate future clinical and fundamental science (laboratory) research questions with a view to filling gaps in knowledge, increasing the consensus and determining priorities for the assessment of cartilage repair in the clinic and laboratory.
Delphi studies such as ours are based on a set of comments, questions and opinions on a specific issue generated in round one, which are then converted into a set of items for the subsequent rounds. They, therefore, do not tend to generate completely new items; instead, their value lies in describing the current state of consensus on an issue through the curation and interrogation of these items by a diverse group of experts.