An Extensive Evaluation of Different Knee Stability Assessment Measures: A Systematic Review

Re-injury to a recently rehabilitated or operated knee is a common occurrence that can result in significant loss of function. Knee stability measures have been used to diagnose and assess knee stability before and after rehabilitation interventions. Here, we systematically review the literature and evaluate the different anterior-posterior and rotational knee stability measures currently in use. A computer-assisted literature search of the Medline, CINAHL, EMBASE, PubMed and Cochrane databases was conducted using keywords related to knee stability measures. In a second step, we conducted a manual search of the references cited in these articles to capture any studies that may have been missed in the searched databases. The literature search strategy identified a total of 574 potential studies. After revisiting the titles and abstracts, 34 full-text articles met the inclusion criteria and were included in this review. Most articles compared knee stability measures, whilst other studies assessed their sensitivity and specificity. Several techniques and devices used to measure knee stability are reported in the literature. However, there are only a limited number of quality studies where these techniques and/or devices have been evaluated. Further development and investigation with high quality study designs is necessary to robustly evaluate the existing devices/techniques.


Introduction
Knee stability is critical for many sports, and decreased stability is strongly associated with risk of injury [1,2]. The "giving way" phenomenon associated with knee joint instability has been shown to result from injury to mechanical constraints and associated neuromuscular impairment [3]. At present, an objective and universally accepted measure of knee joint stability does not exist. It is therefore difficult to determine objectively when an injured knee has recovered and when an individual may safely return to sport.
Despite the large number of preventative rehabilitation protocols proposed to reduce knee injuries [4,5], the incidence of knee injury remains high, with one study suggesting it accounts for nearly a quarter of all injuries sustained in professional football [6]. For example, Prodromos et al. [7] reviewed the incidence rates of injury to the Anterior Cruciate Ligament (ACL), which is the most commonly injured knee stabilizer. They noted that collegiate soccer players had an incidence of 0.32, basketball players 0.29 and recreational alpine skiers 0.63 per 1000 exposures.
A key element in reducing recurrent knee injuries following treatment is the integration of subjective examination techniques, objective instrumented devices and imaging techniques for diagnosis and guidance in return-to-play [8,9]. Several studies [10-13] have investigated the usefulness of different knee stability measures to diagnose knee injuries and to provide additional information on return-to-play or return-to-work decisions. Although the diagnostic accuracy of knee stability measures has previously been evaluated, no gold standard measure has yet been identified.
Surgery for the correction of knee instability has increased over the last two decades. Nevertheless, return to competitive sport has been reported as low as 55% [14]. This inconsistency, together with the high cost of such surgeries [15], highlights the need for better clinical pre- and post-surgical measures of knee instability. The use of better clinical assessments is likely to improve surgical case selection and post-surgical rehabilitation. Previous reviews suggested that further research is needed to truly understand the clinical relevance inherent in new device designs [9,16]. Therefore, the objective of this study was to conduct an extensive systematic review of the literature to describe and evaluate knee stability measures used for diagnosis and assessment for return-to-play after instability-related knee injury/surgery.

Search Strategy
Medline, CINAHL, EMBASE, PubMed and Cochrane databases were searched electronically for English-language studies published up to December 2015. All databases were searched using the Index Medicus Medical Subject Headings (MeSH), such as "anterior cruciate ligament" and "arthrometry" (see Table 1 for the full search details). A manual search of the reference list of each included article was also performed in order to capture articles that might not have been listed in the databases. Differences of opinion were resolved through discussion with the third author (MB). This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for the search and reporting phases of the study.

Data Extraction
One reviewer (JA) independently extracted the data and information regarding the examined knee stability measure, study population, age, sensitivity and specificity. Any possible disagreement was resolved during a scheduled meeting. Both quantitative and qualitative data were extracted from the included studies. The quantitative outcome measures extracted from each study were: (1) "test sensitivity", defined as the percentage of people who test positive for a specific pathology among a group of people who have the pathology; and (2) "test specificity", defined as the percentage of people who test negative for a specific pathology among a group of people who do not have the pathology. The qualitative data concerned the applicability of the measures.
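
The two percentages defined above can be computed from a 2x2 table of index-test results against the reference standard. The following is a minimal Python sketch only; the function name and the example counts are illustrative assumptions and are not taken from any of the included studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as percentages.

    tp, fn: people WITH the pathology who test positive / negative.
    tn, fp: people WITHOUT the pathology who test negative / positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one person with and one without the pathology")
    sensitivity = 100.0 * tp / (tp + fn)  # positives among those with the pathology
    specificity = 100.0 * tn / (tn + fp)  # negatives among those without it
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity = {sens:.1f}%, specificity = {spec:.1f}%")
```

Both values depend on how the reference standard classifies each participant, so any such calculation is only as good as the reference standard used.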

Risk of bias/Quality Assessment
Two authors (JA and CA) independently assessed the quality of the articles that met the inclusion criteria. Study quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [17]. QUADAS is a validated clinimetric tool used to assess the overall quality of diagnostic accuracy studies through individual quality component questions. Any disagreement was to be resolved during a scheduled meeting. Based on similar published reviews, any study with a QUADAS score ≥10 was stratified as "high quality/low risk of bias", and any study scoring <10 was considered "low quality/high risk of bias" [18].
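
The stratification rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the text gives only the cut-off of 10, so scoring a study as the number of "Y" answers across the 14 QUADAS items is an assumption here, and the function names and example answers are hypothetical.

```python
HIGH_QUALITY_CUTOFF = 10  # cut-off stated in the text


def quadas_score(answers):
    """Count the 'Y' answers (assumed scoring rule: one point per 'Y')."""
    return sum(1 for answer in answers if answer == "Y")


def stratify(score, cutoff=HIGH_QUALITY_CUTOFF):
    """Map a QUADAS score to the two strata used in this review."""
    if score >= cutoff:
        return "high quality/low risk of bias"
    return "low quality/high risk of bias"


# Hypothetical answers to the 14 items (Y = Yes, N = No, U = Unclear).
example = ["Y"] * 10 + ["N", "U", "U", "N"]
print(quadas_score(example), stratify(quadas_score(example)))
```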

Synthesis of Results
It was not appropriate to combine studies for meta-analysis due to the heterogeneity of the included studies and the variable reference standard. Therefore, the results were tabulated for semi-quantitative comparison of the sensitivity and specificity variables. The qualitative data were descriptively discussed.

Selection of Studies
The systematic literature search of the selected databases identified a total of 571 potential abstracts. Three additional abstracts were identified through the manual search. Duplicate entries were removed from the two databases, leaving 105 abstracts to be assessed for eligibility. After revisiting the titles, abstracts and full-text articles, 34 full-text articles met the criteria for inclusion in this review (Figure 1). The included studies involved a total of 2133 participants and investigated eight different knee stability measures. The sample size of the studies ranged from five to 401 participants.
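
As a simple consistency check, the counts reported above can be chained together. The sketch below uses only the figures stated in the text; the function and variable names are illustrative, and the two derived quantities (records removed as duplicates, records excluded during screening) are plain subtractions rather than values reported by the review.

```python
def flow_counts(database_hits, manual_hits, after_duplicates, included):
    """Derive the intermediate counts of a study-selection flow."""
    identified = database_hits + manual_hits
    return {
        "identified": identified,
        "removed_as_duplicates": identified - after_duplicates,
        "excluded_on_screening": after_duplicates - included,
        "included": included,
    }


# Figures taken from the text: 571 database abstracts, 3 from the manual
# search, 105 left after removing duplicates, 34 included.
print(flow_counts(571, 3, 105, 34))
```

The total of 574 identified records agrees with the figure given in the abstract.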

Quality Scores
Table 2 provides the overall risk of bias score. Fourteen studies demonstrated "high quality/low risk of bias" and 20 demonstrated "low quality/high risk of bias".

Quantitative Data
Fifteen studies investigated the KT-1000 arthrometer; three studies investigated the Lachman test; nine studies investigated the pivot shift test; seven studies investigated the anterior drawer test; three studies investigated a navigation system; five studies investigated the Genucom arthrometer; five studies investigated the rolimeter; two studies investigated Telos radiography; and five studies investigated the ACL-hamstring reflex arc. Table 3 reports which tests each study investigated and the sensitivity and specificity of each test, as well as the sample size and age of the study participants.

Y = Yes; N = No; U = Unclear; N/A = Not Applicable; grey highlight = high quality studies; 1 = was the spectrum of participants representative of the patients who will receive the test in practice; 2 = were selection criteria clearly described; 3 = was the reference standard likely to classify the target condition correctly; 4 = was the period between the performance of the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests; 5 = did the whole sample or a random selection of the sample receive verification using the reference standard; 6 = did participants receive the same reference standard regardless of the index test result; 7 = was the reference standard independent of the index test (that is, the index test did not form part of the reference standard); 8 = was the execution of the index test described in sufficient detail to permit its replication; 9 = was the execution of the reference standard described in sufficient detail to permit its replication; 10 = were the index test results interpreted without knowledge of the results of the reference standard; 11 = were the reference standard results interpreted without knowledge of the results of the index test; 12 = were the same clinical data available when the test results were interpreted as would be available when the test is used in practice; 13 = were uninterpretable, indeterminate or intermediate test results reported; 14 = were withdrawals from the study explained.

Qualitative Data

Lachman Test
The Lachman test is widely used in clinical settings to assess instability of the knee, as it is fast and easy to perform [47]. The test is performed with the patient supine and the knee relaxed at 20° to 30° of flexion. The examiner places one hand on the distal end of the thigh and the other hand behind the proximal end of the tibia. The tibia is then translated anteriorly on the femur, and the endpoint is assessed as firm (intact ACL) or soft (injured ACL). An injured ACL is graded as I (< 2 mm), II (2 to 5 mm) or III (> 5 mm).
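
The grading bands above can be written as a small lookup. The following Python sketch is illustrative only: it assumes the graded quantity is the anterior tibial translation in millimetres, and the function name is hypothetical.

```python
def lachman_grade(translation_mm):
    """Return the grade for an injured ACL: I (< 2 mm), II (2 to 5 mm), III (> 5 mm)."""
    if translation_mm < 0:
        raise ValueError("translation must be non-negative")
    if translation_mm < 2:
        return "I"
    if translation_mm <= 5:
        return "II"
    return "III"


for mm in (1.5, 2.0, 5.0, 6.5):
    print(mm, lachman_grade(mm))
```

In practice the translation is estimated by hand, so the boundaries between grades are far less sharp than this code suggests.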
The literature lacks consensus on the usefulness of the Lachman test as a measure of anterior knee stability. Its reliability and validity range from 87% to 97% and 91% to 97%, respectively [12,28,48]. One disadvantage of the Lachman test is that it is difficult for examiners with smaller hands to perform properly [49,50], as it requires a firm grip of the femur to displace the tibia anteriorly. As a result, conducting the test in a prone position has been proposed and found to be a useful alternative to the classic Lachman test [51]. Moreover, Muller et al. [52] compared proficiency in performing the prone Lachman test with the classic Lachman test. They showed that the prone Lachman test yielded a positive predictive value of 78%, compared with 28% for the classic Lachman test. The prone Lachman test uses gravity to pull down the femur, which lets the examiner grip and displace the tibia with both hands [53]. Consequently, the size of the knee may be an important factor in deciding which knee instability measure should be used to assess knee stability.

Pivot Shift Test
Galway, Beaupre and MacIntosh [54] initially described the pivot shift test as an examination tool of functional knee instability. This is performed with the patient supine and the examiner standing lateral to the patient, holding the knee and ankle in 20° of internal rotation, with the patient's hip flexed to 30°. A valgus force is applied to the proximal tibia to create impingement of the plateau on the femur. The knee is then flexed and assessed for a clunk due to the reduction of the displaced tibia on the femur, which normally occurs between 20° and 30°. The motion is then graded as: 0 = no clunk, I = glide, II = clunk and III = gross clunk with locking. A false negative may be obtained in patients with Iliotibial Band (ITB) pathology, medial collateral ligament injury, a bucket handle meniscus tear or a flexion contracture. A false positive pivot shift may be present in a patient with increased laxity. Comparison with the uninjured knee should always be undertaken.
There is controversy in the literature on the usefulness of the pivot shift test. The controversy surrounds the various techniques used by clinicians when performing the test. Variations exist particularly in the degree of knee flexion, hip flexion and tibial internal rotation [31]. It is difficult to assess the effect on the test outcome of associated injuries to the knee and of the limited range of motion in knees with an injured meniscus [55]. Similarly, the subjectivity in the amount of valgus force applied during the test makes it difficult to replicate the test for confirmation [33]. The specificity of the pivot shift test has been shown to depend on whether or not the patient is anaesthetized [56]. It ranges from 32% without to 85% with anaesthesia; this result was confirmed by Kuroda et al. [34], who theorised that muscular resistance can suppress the pivot shift manoeuvre.

Anterior Drawer Test
The anterior drawer test specifically assesses the anterior stability of the knee [57]. Several studies reported that clinicians use it widely in both clinics and operating theatres [16,38,54,58-60]. It is performed in a supine position, with the knee at 90° flexion and the hip at 45° flexion. The examiner sits on the patient's tested foot and, with one or both hands, grasps the proximal end of the leg, aligning the thumb(s) with the anterior joint line. The tibia is then pulled anteriorly, and an assessment is made of the relative translation of the tibia on the femur. The tibia should displace within a similar range to the sound knee. If an excessive displacement occurs in the injured knee compared to the sound knee and a soft endpoint is felt, an ACL injury is assumed, pending confirmation with an objective knee instability measure.
The literature is divided on the usefulness of the anterior drawer test [38,59]. Mitsou et al. [38] highlighted the difficulty in performing the anterior drawer test at the acute stage following a suspected ACL injury. In addition, they reported that specificity ranged from 78% to 99% when patients were examined under general anaesthesia. On the other hand, Scholten et al. [59] concluded that such a test is of unproven value. The failure to quantify the amount of displacement of the tibia on the femur and the inability to use the test in the acute stage of injury have been shown to be its weaknesses.

The Rolimeter
The rolimeter (Aircast Europa, Neubeuern, Germany) is a portable knee arthrometer used to measure anterior-posterior displacement of the tibia on the femur while performing the Lachman test [21]. The measurement is taken with the patient positioned supine and the tested knee in 30° of flexion. A proximal convex pad is placed over the patella and a distal pad is placed over the tibia with a strap. The two pads are connected a few inches above the limb by a steel bar. A feeler is placed over the tibial tubercle; the Lachman test is performed after the device has been zeroed. The anteroposterior displacement of the tibia on the femur is then read in increments of 2 mm from the marks on the feeler. A difference of 4 mm or greater, in comparison with the uninjured knee, is suggestive of an ACL injury [60].
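
The side-to-side comparison described above amounts to a subtraction and a threshold. The Python sketch below is illustrative only; the function names and the example readings are hypothetical, and the 4 mm threshold is the one quoted in the text.

```python
ACL_THRESHOLD_MM = 4.0  # side-to-side difference quoted in the text


def side_to_side_difference(injured_mm, uninjured_mm):
    """Anterior displacement of the injured knee minus that of the uninjured knee."""
    return injured_mm - uninjured_mm


def suggests_acl_injury(injured_mm, uninjured_mm, threshold_mm=ACL_THRESHOLD_MM):
    """True when the side-to-side difference is 4 mm or greater."""
    return side_to_side_difference(injured_mm, uninjured_mm) >= threshold_mm


# Hypothetical readings in millimetres, for illustration only.
print(suggests_acl_injury(10, 6))  # difference of 4 mm
print(suggests_acl_injury(8, 6))   # difference of 2 mm
```

Because the feeler is read in 2 mm increments, a true difference close to the threshold may fall on either side of it.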
The rolimeter provides an economic, exact and simple device for quantifying anterior knee joint instability [21]. Among 20 healthy participants and 18 patients with chronic ACL injury, Ganko et al. [27] assessed the reliability of the rolimeter against the KT-1000. For mean knee displacement, the two devices showed a strong correlation (r = 0.73, p < 0.001) in the injured knees, while there was no significant correlation in the uninjured knees (r = 0.32, p > 0.10). Hence, they concluded that, with experienced examiners, the rolimeter is a valid method to assess anterior knee instability. However, its specificity as a standalone measure of knee stability (84.3) was questioned when compared to its results alongside clinical examination (92.4) [40]. This was explained by the fact that the rolimeter measures only the anterior-posterior stability, rather than the rotational stability, of the knee [27]. The use of the rolimeter as a standalone measure can therefore give false negative results, given that knee stability is maintained by both anterior-posterior and rotational mechanical stability [61].

Navigation Systems
A navigation system is a computerized system designed to assist surgeons during knee ligament reconstruction and arthroplasty surgery [62]. It uses kinematic measurements along with bone-morphing technology to determine data on alignment, kinematics and morphologic characteristics of the knee [63]. Pins are placed within the tibia and femur; attached to these pins are markers, which are detected by the computer sensors and registered relative to predefined anatomical locations. Based on the movement of these markers relative to each other, small displacements can be detected and used to quantify knee joint stability during surgery [63,64].
The navigation system remains the gold standard for the measure of anterior-posterior knee laxity due to its precision, validity and accuracy [36,45,63-65]. Pearls et al. [65] investigated the reliability and repeatability of using a knee navigation system in knee instability examination by comparing the navigation system to a robotic testing system. Intra-class Correlation Coefficients (ICC) were used to assess the correlation between the two systems. The authors reported that the surgical navigation system is a precise intraoperative tool to quantify translational and rotational knee instability. The ICCs were all statistically significant at p < 0.01, and the overall ICC was 0.9976. Continuous developments to the knee navigation system have provided the ability to measure rotational knee stability in addition to translational stability [66]. Nevertheless, its use is limited to surgical procedures: it is expensive, invasive and requires surgical experience, due to the need for accurate fixation of sensors in the femur and tibia [63]. Thus, it is a good research and clinical tool; however, it cannot be used on-field or within a clinical setting to aid in decision making [58].

The Genucom Knee Analysis System
The Genucom knee analysis system (FARO Medical Technologies Inc., Montreal, QC, Canada) is a computerized device developed in the 1980s to objectively measure knee stability in different planes (e.g., sagittal and frontal planes) [60]. The participant's tested knee is positioned in 20° flexion and the thigh secured with restraints. An electro-goniometer is attached to the thigh, with anatomical markers placed on the medial and lateral femoral condyle, patella and tibial crest. The markers are digitized, and then, the relative displacement of the knee is recorded in addition to the distance between the markers [30].
The Genucom knee analysis system is the only objective instrument to provide a multiplanar measure of knee stability, but it is more complicated and time consuming to use compared to other measures [30,67]. Furthermore, it has poor sensitivity, and its cost-effectiveness has been questioned [42]. As a result, it has fallen out of common use.

The KT-1000/KT-2000 Arthrometer
The KT-1000 knee ligament arthrometer (MEDmetric Corp, San Diego, CA, USA) is the most commonly used arthrometer in both clinical and research settings [32]. It is an objective device that measures anterior-posterior translation of the tibia on the femur in millimetres [68]. The patient should remain in a supine position on the examination bed with the tested knee supported at 30° of flexion using a goniometer. The thigh strap, thigh support platform and foot support should be placed on and attached to the patient. The KT-1000 arthrometer is secured over the participant's leg in the ideal position with reference to the knee joint line. The Lachman and anterior drawer tests can then be performed with the KT-1000.
Its reliability has been tested in several studies (Table 3). The side-to-side difference is the recommended measure to use for assessment of anterior knee stability [69]. The experience of the examiners plays an important role in the result of the test [19,70]. The KT-2000 has the same method of use as the KT-1000 with the added feature of graphic documentation via an X-Y plotter. It produces data regarding the amount of knee displacement and the magnitude of the applied force.
Despite the large number of KT-1000/2000 studies in the literature, there is no consensus on its sensitivity and specificity in measuring anterior-posterior knee laxity. Its sensitivity ranged from 0.50 [23] to 0.97 [16], and its specificity ranged from 0.70 [42] to 0.93 [16]. The wide range in sensitivity and specificity has been attributed to the quality of the conducted studies, the experience of the examiner and the amount of force applied in each test [44]. Regardless of the controversy regarding the sensitivity and specificity of the KT-1000 arthrometer, it is commonly used in research rather than in a clinical setting [8]. However, the majority of the available literature supports the KT-1000 arthrometer as being at least equal to other available knee stability measures [16,23,29,42].

The Telos Stress Radiography Device
The Telos stress radiograph (Telos GmbH, Laubscher, Holstein, Switzerland) is a device that can measure knee stability by utilizing stress forces with high quality radiographic images [24]. It was originally described by Staubli and Jacob [71]. The test involves the application of an anterior stress to the injured knee; the subsequent displacement is then measured on a lateral X-ray. The displacement is described relative to the opposite "normal" side.
The two articles included in this review that investigated the usefulness of the Telos as a measure of knee stability showed that its sensitivity ranged from 0.72 [70] to 0.88 [40], and it had a specificity of 0.82 [40]. Similarly, Jardin et al. [70] compared the KT-1000 to Telos after anterior cruciate ligament reconstruction. They recommended Telos instead of KT-1000 to assess knee stability after ACL reconstruction. The wide range in sensitivity and specificity was attributed to the variation in the quality of the X-rays obtained, the experience of the radiographer and the experience of the radiologist in reading such radiographs [71,72]. Nevertheless, imaging techniques (e.g., Telos) are an established tool to confirm the diagnosis of suspected knee instability, to assess the ACL reconstruction outcome and to rule out injuries to other soft tissue structures [40,70].

ACL-Hamstring Stretch Reflex
This technique is designed to measure the onset latency of the biceps femoris muscle; hence, it is not designed per se as a test of knee stabilisation. It uses electromyography (EMG) to record muscle activity produced by a stretch reflex elicited by the application of anterior-posterior translation of the knee joint. The latency of the biceps femoris stretch reflex is then calculated and used as an indicator of knee stability and neuromuscular function [22].
It has been evaluated in several studies to assess muscle fatigue, knee stability [73] and knee proprioception [74]. It is used in operating theatres through a direct pull of the ACL to differentiate short and long latencies [75]. The reflex has been investigated intra-operatively by direct traction under arthroscopic visualization and in a research setting by instrumenting a laboratory-based rig [11]; thus, its clinical usefulness is doubtful. Three studies by Schoene et al. [11], Friemert et al. [26] and Melnyk et al. [73] revealed that the ACL-hamstring reflex measurement could be elicited, specifically for injured ACLs. Previous work by Friemert et al. [26] has shown that a prolonged reflex was present in patients with a ruptured ACL. The longer reflexes corresponded with patients who had instability symptoms even though mechanical testing with the KT-1000 showed no difference. On the other hand, Melnyk and Gollhofer [73] concluded that hamstring muscle fatigue during submaximal isometric exercises, and not the existing ACL injury, was the reason behind the longer latencies of the hamstring stretch reflex. Despite the debate in the literature on its usefulness for detecting the aforementioned variables, the authors suggested that this technique has room for improvement in terms of its applicability in a clinical setting to guide rehabilitation protocols [3,11,74].

Discussion
The present paper systematically reviewed a broad spectrum of knee measures designed to assess anterior and rotational stability. Similar to the tests used for hip [76], ankle [77] and hamstring injuries [78], the existing tests for knee stability assessment are deficient in relation to diagnosis, surgical outcome assessment and clinical decisions on return-to-play following injury or surgery.
The subjective tests demonstrated variability in sensitivity and specificity of each test (Table 3), thus questioning their clinical usefulness as stand-alone measures. As with any subjective test, comparing the outcome of tests is difficult due to the subjective nature of the grading system. On the other hand, objective tests can be quantitatively compared to each other in terms of their sensitivity and specificity. For example, the sensitivity and specificity of the Genucom knee analysis system have been reported to be low, at 60% and 65%, respectively [42]. On the other hand, the KT-1000 sensitivity at maximum manual force is 93%, and it has a specificity of approximately 93% [16]. Despite the existing studies on the use of the hamstring-stretch reflex [26,73-75], the literature lacks evidence on whether the reflex latency can be a valid objective clinical knee stability measure.
Mitsou and Vallianatos [38] showed that the anterior drawer test is reliable when used for chronic knee cases. On the contrary, a review by Van Eck et al. [16] suggests that the anterior drawer test is less sensitive (0.74) than the KT-1000 arthrometer (0.93). The disagreement in the literature stems from differences in the knee conditions being examined and in the quality of the studies conducted. However, the anterior drawer test has been used in a clinical setting and in-the-field as a quick and early assessment technique of ACL injury.
The specificity of the pivot shift test ranged from 32% without to 85% with anaesthesia. Muller et al. (2015) [52] reported an ICC for intra-tester reliability ranging from 0.913 to 0.999 (95% CI range: 0.319 to 1.000) and an ICC for inter-tester reliability of 0.949 (95% CI: 0.542 to 1.000) for iPad software (The PIVOT software, iOS, programming language Objective-C) designed specifically for quantifying the pivot shift test. Hoshino et al. (2013) [56] also quantified the pivot shift test using an iPad tablet (Apple Inc., Cupertino, CA, USA). They concluded that pivot shift measurements using an iPad did provide quantification of rotational stability for ACL-deficient knees. However, the limitation of their study was the use of subjective clinical grading as a reference standard for the quantified iPad results. This introduced bias, in that the same tester performed the subjective grading and the quantitative measurements. The quantification of the pivot shift test using the iPad technique needs to be investigated further to assess its robustness in a larger, cross-sectional population. Due to the intra-examiner variation in the technique being used for the assessment of knee stability, subjective tests need to be taught and applied in a standardized fashion. These differences make comparisons between the reported results in the literature difficult because of the inability to accurately and systematically compare two different techniques [79]. Hence, a better evaluation of each test needs to be conducted.
In spite of the accuracy of the navigation system when compared to a robotic testing system (ICC was 0.9976) [65], its use is limited to surgical procedures; it is expensive and requires surgical experience, thus limiting its use in clinical practice. Unlike the navigation and Genucom knee analysis systems, the rolimeter has the advantage of being a lightweight device that can be used in clinical, surgical and in-the-field settings. The inability to quantify the magnitude of the pulling force and the difficulty of assessing the functional instability of ACL reconstructed knees are disadvantages of this device [32]. Consequently, using the rolimeter alongside physical examination and imaging techniques would be preferable.
Telos stress radiography has the advantage of measuring knee stability in a number of planes (sagittal, frontal and horizontal) [40], unlike other devices, which only measure the laxity in one direction (anterior-posterior) or two directions (anterior-posterior and rotational movement) [8]. The Telos system is unable to measure rotational instability and also has the disadvantage of radiation exposure when participants/patients are being tested; thus, it should be used judiciously.
There are a number of reasons for the poor clinical usefulness of the reviewed knee stability measures and the challenges in reviewing them. Firstly, the pathomechanics differ between injured knees [79,80], and there are inter-individual variations in patient outcome during rehabilitation programs [80]. Secondly, the majority of the knee stability measures for anterior and rotational instability were highly sensitive, but had lower specificity (Table 3). Consequently, a positive test means little in the diagnosis of rotational instability, since the same test will also be positive in anterior instability. Thirdly, there is a risk that lower quality studies fail to fully discriminate the true usefulness of the various knee stability measures. Fourthly, limitations in experiment design affect the interpretability and generalizability of these measures.
The quality assessment of the studies included in this review indicates that many of the studies in this area lack scientific rigour. The median QUADAS score was nine, and 20 of the 34 studies reviewed had a score of less than 10. The studies we reviewed were typically underpowered; they failed to report important study details (see the QUADAS scoring details in Table 2) or failed to compare to a reference standard (see Question 5 in Table 2). Only twelve of the 34 studies used a reference standard. Whilst most studies had a sample size greater than 30 participants (range: five to 401), nearly half of the studies had a low effect size (d < 0.3), with five studies reporting a medium effect (0.3 < d < 0.8) and only three studies reporting a large effect size (d ≥ 0.8). As a result, it is difficult to draw strong conclusions from much of the reported data. In almost half of the reviewed studies (n = 15), the selection criteria of their samples were not clearly described (see Question 2 in Table 2), and some studies did not adequately describe participant withdrawals (see Question 14 in Table 2). In two studies, the sex of the recruited sample was not described [40,43], and several studies [35,44,46,63] recruited a mixture of males and females without accounting for the known increased knee instability of females compared to males [81].
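
The effect-size bands quoted above can be expressed as a small classifier. The Python sketch below is illustrative only: the function name is hypothetical, the magnitude of d is used, and because the text leaves a value of exactly 0.3 unassigned, placing it in the medium band is an assumption made here.

```python
def effect_size_band(d):
    """Classify an effect size d as 'low', 'medium' or 'large'."""
    magnitude = abs(d)
    if magnitude < 0.3:
        return "low"      # d < 0.3
    if magnitude < 0.8:
        return "medium"   # 0.3 < d < 0.8 (d == 0.3 assigned here by assumption)
    return "large"        # d >= 0.8


for d in (0.2, 0.5, 0.8):
    print(d, effect_size_band(d))
```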
The use of a reference standard when testing diagnostic apparatus is critical for an understanding of accuracy and reproducibility [17]. Whilst only twelve studies in the present review used a reference standard, eight studies did not use a reference standard, and the remaining thirteen studies did not clarify whether a reference standard was used (see Question 5 in Table 2). Of the studies that used a reference standard, two studies used the KT-2000 arthrometer [27,28]; one study used both the Genucom and the KT-2000 [20]; five studies used knee arthroscopy [19,22,36,40,42]; two studies used both physical examination and arthroscopy [23,41]; and two studies used Magnetic Resonance Imaging (MRI) [9,35]. Although knee arthroscopy was used in five studies, such a procedure is not cost effective [82]. Additionally, the ability to assess and diagnose patients in a clinic with simpler diagnostic tests allows rehabilitation to proceed rapidly and economically. Unlike the KT-2000 arthrometer, the magnetic resonance scan is not a knee stability measure. The MRI only shows the integrity of the knee structures in a static position [83], rather than measuring the displacement of the tibia relative to the femur. These results highlight the lack of a robust gold standard knee stability measure. Hence, the lack of an accepted reference standard leads to biased estimates of the tested instruments and reconstruction techniques.

Conclusions
We have reviewed a broad spectrum of knee stability measures designed to detect anterior-posterior and rotational knee instability. Whilst there is a wide variety in diagnostic accuracy, many of the studies lack scientific rigour. Despite the importance of such measures, there is no consensus in the literature on a single gold standard measure of knee instability. As a result, there is a need for high-quality, sufficiently powered randomized controlled trials in order to move closer to a gold standard knee stability measure. In the meantime, clinicians must consider the limited capacity of the reviewed knee stability measures in making a definite clinical decision on the severity of an injury and/or return-to-play.

Figure 1. Flow diagram of the search strategy and the study selection.


Table 2. Quality Assessment of Diagnostic Accuracy Studies (QUADAS) quality assessment scores of the included studies.

Table 3. Summary of articles reporting on the accuracy of different anterior-posterior and rotational knee laxity measures.
The KT-1000 was not capable of overcoming result variation and providing reliable and reproducible measurement of laxity of the ACL