Discrepancies in Embryonic Staging: Towards a Gold Standard

For over half a century, the Carnegie staging system has been used for the unification of chronology in human embryo development. Despite the system’s establishment as a “universal” system, Carnegie staging reference charts display a high level of variation. To establish a clear understanding for embryologists and medical professionals, we aimed to answer the following question: does a gold standard of Carnegie staging exist, and if so, which set of proposed measures/characteristics would it include? We aimed to provide a clear overview of the variations in published Carnegie staging charts to compare and analyze these differences and propose potential explanatory factors. A review of the literature was performed, wherein 113 publications were identified and screened based on title and abstract. Twenty-six relevant titles and abstracts were assessed based on the full text. After exclusion, nine remaining publications were critically appraised. We observed consistent variations in data sets, especially regarding embryonic age, which varied by as much as 11 days between publications. Similarly, for embryonic length, large variations were present. These large variations are possibly attributable to sampling differences, developing technology, and differences in data collection. Based on the reviewed studies, we propose the Carnegie staging system of Prof. Hill as a gold standard amongst the available data sets in the literature.


Introduction
Derived from the Greek "embryon", embryology is the understanding of how our bodies came into being. More specifically, it is the branch of biology that studies the formation, growth, and development of an embryo from a fertilized egg [1]. Findings within this field have helped to develop our understanding of congenital abnormalities and their respective solutions. From as early as 1969, the importance of establishing a chronological timeline within human embryonic development was understood as "The need for standardized stages in the embryonic development of various organisms for the purpose of accurate description of normal development and for utilization in experimental work has long been recognized" [2]. As such, a morphological scheme was devised to provide a standardized and unified staging system of embryonic development. Composed of 23 unique and detailed stages (Figure 1), the Carnegie staging system helps to distinguish the key structural developments of the vertebrate embryo [3]. For humans, this staging system provides an in-depth coverage of the first 60 days within embryonic development, otherwise known as the embryonic period. Despite its use as a universal staging system for ex vivo human embryos, the literature regarding the distinctions between the Carnegie stages is inconsistent and convoluted, with leading researchers supplying differing understandings and data on the respective internal and external embryonic features allocated to each individual stage. Furthermore, no verified explanation for these discrepancies amongst established researchers could be located, highlighting a prominent gap in the research and understanding regarding the most established embryological staging system. 
Therefore, the goal of this research can be broken down into several aims. The first was to provide a clear overview of the variations in commonly available Carnegie staging charts for embryologists and other medical professionals, wherein the differing data are compared and analyzed by means of a review of the literature. Secondly, we aimed to explain the presence of these differences by evaluating how these data were collected (e.g., post ovulatory days). In doing so, we aimed to research whether a gold standard for Carnegie staging charts exists, and if so, which chart it would be. Subsequently, we aimed to better standardize the staging system used across the field of embryology.
Carnegie collection: CS3-8794, CS4-0610, CS5-8020, CS6-7801, CS7-8752, CS8-8671, CS9-H712, CS10-6330, CS11-6344, CS12-8505A, CS13-0836, CS14-8314, CS15-3512, CS16-6517, CS17-6521, CS18-6524, CS19-2114, CS20-0462, CS21-4090, CS22-0895.

Background: Historical Beginnings of Embryonic Staging
Although embryonic staging was introduced as early as the 1800s, the use of such a staging system on humans was only employed early in the 20th century. Founded by Franklin P. Mall, the Carnegie collection is composed of numerous sectioned and serially conducted ex vivo human embryos. With its first designated human embryo cataloged in 1887, this detailed and quintessential collection would subsequently grow, lending valuable knowledge to its directors and international researchers alike. Furthermore, detailed reconstructions and elaborate drawings based on this collection were published and applied within academic writing from 1890 onwards, paving the way for detailed analyses and deeper understanding in a previously inaccessible scientific field.
Named after this detailed collection of human embryos, the Carnegie stages are based on a combination of several embryonic features. Beyond morphological features, the stages include age ranges, number of somites present, and embryonic length (mm). However, these factors are weighted less heavily than morphological changes, both because of their higher variability and because a single size value or somite count may span multiple stages [4].
The notion of developmental stages was first introduced in 1914 by Franklin P. Mall [8], who categorized 266 human embryos, splitting them amongst 14 separate "stages" during his time as director of the Carnegie collection. Shortly thereafter, Mall's position and proposition would be replaced by George L. Streeter, who further refined Mall's 14 embryonic stages into 23 "developmental horizons" [9]. The term "horizons", borrowed from archaeology and geology, was utilized by Streeter to stress the ever-increasing complexity of developing embryos. Despite initially planning on composing twenty-five distinct age groups, Streeter subsequently concluded that 23 stages could effectively encompass the embryonic period [9]. This use of 23 stages (Figure 1) [10] was applied, as "each stage is merely an arbitrary cut section through the time-axis of the life of an organism" [11]. Upon further research, these developmental horizons were better described and distinguished by Ronan O'Rahilly and his wife, Fabiola Müller, in 1987, who retained the use of 23 distinct stages, but proposed the term "stages" in place of "horizons", due to its simpler and more comprehensible nature [4]. During his time serving as the director of the Carnegie collection (since 1973), O'Rahilly's work on staging went through several iterations, becoming the first widely recognized staging system for human embryos. Since then, no major alterations have been made, as alternative systems and terminology for embryonic staging have never maintained a foothold within research and have ultimately been rendered obsolete.

Embryonic Age
The term "embryonic age" and what exactly it entails has always been a point of contention amongst naturalists and embryologists alike and has been laced with ambiguity and disagreements. To provide clarity, this section is included to situate current academic understanding and shine a light on areas of confusion. A range of challenges exist in attempting to determine the age of an embryo, the most important of which is the lack of a precise indicator of when fertilization occurs [12]. Hence, two primarily utilized measurements should be highlighted. The first of these measurements is gestational age. Gestational age can be defined as a measure of the age of a pregnancy that is taken from the beginning of the woman's last menstrual period (LMP). In general, the starting point of this measure is approximately two weeks before the actual fertilization. In contrast to this is the developmental or postovulatory age. This measure represents the actual age of the embryo by utilizing the time of fertilization as a starting point, as showcased within Figure 2. Consequently, the two measures differ by approximately two weeks, and it is therefore essential to state clearly which system is being applied when reporting the age of an embryo.
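The relationship between the two age measures can be illustrated with a minimal Python sketch. It assumes a fixed 14-day offset between the LMP and fertilization, which is only the "approximately two weeks" stated above; the constant and function names are hypothetical and chosen for illustration, and individual cycles may deviate from this offset.

```python
# Assumption: fertilization occurs a fixed 14 days after the last menstrual
# period (LMP). The text only states "approximately two weeks", so this
# constant is an illustrative simplification, not a clinical rule.
LMP_TO_FERTILIZATION_DAYS = 14


def gestational_to_postovulatory(gestational_days: int) -> int:
    """Convert gestational age (days since LMP) to approximate postovulatory age."""
    if gestational_days < LMP_TO_FERTILIZATION_DAYS:
        raise ValueError("gestational age precedes the assumed fertilization date")
    return gestational_days - LMP_TO_FERTILIZATION_DAYS


def postovulatory_to_gestational(postovulatory_days: int) -> int:
    """Convert postovulatory age (days since fertilization) to gestational age."""
    if postovulatory_days < 0:
        raise ValueError("postovulatory age cannot be negative")
    return postovulatory_days + LMP_TO_FERTILIZATION_DAYS


if __name__ == "__main__":
    # A pregnancy dated at 10 weeks (70 days) by LMP corresponds to an embryo
    # of roughly 8 weeks (56 days) of developmental age under this assumption.
    print(gestational_to_postovulatory(70))
```

Keeping the offset in a single named constant makes explicit which dating convention a reported age uses, which is the source of ambiguity described above.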

Embryonic Length
From as early as 1749, the utilization of embryonic length to determine age (which would, in turn, be translatable to Carnegie stage) was attempted [13,14]. Interestingly, over the course of the last two centuries, embryonic staging through the use of embryonic length has been rendered much more precise, and due to current technological advancements, embryonic length can now serve as a workable estimate of the embryos' respective Carnegie stage. However, within the clinical setting, the notion of what axis of the embryo should be measured showcases no singular consensus. With possibilities such as head circumference (HC), biparietal diameter (BPD), and brain length, it remains inconclusive which measurement can most accurately stage embryos. This dilemma is further complicated, as certain measurements only become feasible further into development.
Amongst the available measurements, the crown-rump length (CRL) appears to be most frequently used. Defined as the distance between the top of the head (crown) and the bottom of the buttocks (rump), the CRL can be measured through the use of an ultrasound and has showcased exceptional use in calculating the gestational age of the embryo [15]. Henceforth, when embryonic length is mentioned in the paper, it refers to CRL, in line with its prominent use in papers such as O'Rahilly, Hill, and Nishimura et al. [4,5,16].

Carnegie Stages: Academic Discrepancies and Nonuniformity
Figure 2. The "periodic table" of human development, showcasing the different methods to date a pregnancy: gestational age, based on the last menstrual period (LMP), and postovulatory or embryonic age, as developmental days or weeks counted from the time of fertilization. Schematic representation of Carnegie Stages 1-6 (not to scale). Carnegie Stages 7-23 are plotted against length in mm on the y-axis, and developmental days are on the x-axis.

When evaluating the Carnegie staging system, Streeter's work from 1951 is frequently viewed as the foundation upon which current understanding was built [9]. However, across the relevant international literature, variations within the internal and external embryonic features for each stage are present, such as inconsistent ages (days) and varying somite numbers [16][17][18]. Due to this individual variation across scholarly publications, staging criteria such as the mean dates for each stage are not uniform across the literature. This inevitably threatens the "universal" purpose of the staging system, as the system is no longer standardized. Therefore, it remains difficult for professionals in the field to decide which Carnegie staging chart should be consulted.
In modern research, O'Rahilly and Müller's revision of Streeter's work is more widespread, despite the significant differences from Streeter's publication in both morphological and non-morphological features [4,9]. Yamada et al. help to clearly showcase these differences and highlight the relationship between embryonic ages in respective Carnegie stages from various researchers, including O'Rahilly and Streeter [19]. Alongside these publications, several other researchers have brought forth their individual Carnegie staging charts, e.g., Nishimura et al., who proposed values that vary by as much as 2-3 days from the mean dates of O'Rahilly's values [4,16,20]. Similarly, such variation can be seen within publications from Jirásek [17], Hill [5], Harkness [21], the Human Developmental Biology Resource [22], and the Heirloom Collection [23]. To further complicate the selection process for Carnegie charts, prominent researchers such as O'Rahilly have been shown to further build upon their previous work, releasing updated or revised values of their prior Carnegie stages. One key example of this would be O'Rahilly's revision, which was published in 2010 and altered numerous values and criteria [24]. However, the relevance of this revision is questionable, with numerous researchers within the field choosing to refer to the 1987 iteration instead, due to its established credibility.
Despite a scarcity of embryology resources that are available to the general public, M. Hill has worked on making this information more accessible to the public through the use of the Embryology Education and Research website [7]. Similarly, the values provided within this website differ from the values present within O'Rahilly's published work from 1987. Due to its intended use as an educational resource, understanding the underlying reasons for such discrepancies is of the utmost importance.

The Importance of Concise Embryonic Staging Systems
With regard to relevance and rationale, the need for precision and clarity within the medical field should be well understood. Within the field of embryology, it is important to differentiate the developmental stages in order to identify developmental anomalies. This would allow medical professionals to more actively notice embryonic complications and reduce the uncertainty within an already highly variable field. Regarding its relevance, the standardization of these Carnegie stages would be primarily noticeable within maternal and prenatal care. Within such a clinical field, accurate estimation of developmental age is of utmost importance, as an incorrect estimation can have short- and long-term consequences for both the mother and unborn child (e.g., iatrogenic labor at a premature age instead of a term age). Within the Netherlands, the screening procedure employed (e.g., NIPT-test and 13- and 20-week anomaly ultrasound scan) is almost exclusively based upon an accurate gestational age estimation. Gestational age estimation in pregnancy includes a pregnancy dating ultrasound scan, purely based on the CRL measurement of the developing fetus between the 10th and 12th week of pregnancy [25,26]. Knowing that the improved quality of ultrasound machines allows for earlier (3D) ultrasound examinations, including volumetric measurements of the embryo [27], the Carnegie staging system should serve as a consistent and reliable source of values that can be consulted amidst confusion. Therefore, formulating a cohesive understanding and a consistent set of values for the "universal" staging system of embryos is essential, with its effects cascading to expecting mothers and clinicians alike.

Methods
For this review, the methodology was split into two sections. The first section was a review of the literature that focused on a few established academic works, each of which proposed a differing set of data/characteristics for the individual stages of the Carnegie system. A review of the literature was utilized to allow us to compare and contrast existing differences across the slightly varied Carnegie charts (Figure 3). For this process, works from the English scientific literature were included. These covered the topic of embryonic growth or embryonic staging but did not extend to the fetal period. For the literature search, no restrictions were set on the publication date, as within the field of embryology, the early literature is still highly relevant and applicable. Sources were selected using the following keywords: Carnegie system, embryonic growth, developmental horizons, embryology, and embryonic stages. These sources were then screened initially according to their title and abstract, and, subsequently, the full-text articles were skimmed to further evaluate the quality and eligibility of the studies. This was performed to investigate the differing charts and data.
Secondly, we investigated which of the aforementioned charts is most frequently applied in the embryology literature. To research this, another review of the literature was conducted with similar parameters and criteria. These sources were subsequently screened to achieve a clear overview of the most frequently applied Carnegie staging values. Both of these reviews of the literature were carried out by one reviewer, under the supervision of experienced embryologists. All of this was carried out with the aim of reaching a consensus on what staging method should be used as the gold standard of embryological staging to ascertain the most agreed-upon and universal set of values that can be consulted with regards to gestational age, embryonic length, and somite numbers.
Life 2023, 13, 1084

Results
Within Table 1 and Figure 4, stage 1-5 embryos showcase approximately equal mean days across the various academic publications. However, as early as stage 6, substantial differences can be observed between the studies. The presence of a steep increase in mean days can be seen within the Heirloom Collection and O'Rahilly [23,24]. These higher mean day values compared to the other studies remain present throughout stages 6-13, after which a more uniform data set can be observed across the studies once again. Furthermore, Harkness also showcases a much higher mean days value at stage 8 and maintains this higher average till approximately stage 16-17, wherein it falls below the average trend of the other academic publications [21].
Figure 4. Graph demonstrating the differences in embryonic age (mean days), across the embryonic literature [4,5,18,21,23,24], in relation to Carnegie stages. The first 6 stages of Hill were not present within his publication and as such are not showcased within the graph [5,18]. Similarly for Harkness [21], the first 7 stages are not included.
Interestingly, both of Hill's publications [5,18] showcased a higher terminal mean days for Carnegie stage 23 embryos (~2-3 days higher than other publications). Alongside this, the mean days published by Hill underwent minute changes or revisions between 2007 and 2018, with the biggest change being observed for Stage 22 (1 day difference) [5,18]. Contrary to this would be O'Rahilly's publications, which displayed widely revised mean days, with differences as large as 7 days (Stage 10) between his 1987 and 2010 paper [4,24].
Table 2 and Figure 5 showcase similar trends, although a majority of the results display more uniform results, with the exception of Stages 18-21 within O'Rahilly's [4,24] values, which vary up to 2 mm from the majority of other academic publications. Similar to Figure 4, Harkness' values also showcase a higher mean length value at stage 8, although this higher average only remains present in stages 8 through 10, wherein it falls in line with the average trend of the other academic publications before subsequently falling below the average trend once more for stages 19-23 [21].
Figure 5. Graph demonstrating the differences in embryonic length (mm) across the embryonic literature [4,5,21,23,24] in relation to Carnegie stages. The first 6 stages of Hill were not present within his publication and, as such, are not showcased within the graph [5]. Similarly, for Harkness [21], the first 7 stages are not included.
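The kind of per-stage comparison reported above (for example, the largest gap in mean days between publications at a given stage) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `stage_spread`, the chart labels, and the numbers in the usage example are hypothetical placeholders and are not values from Table 1 or Table 2. Stages missing from a chart (as with the early stages of Hill and Harkness) are simply skipped.

```python
from typing import Dict, List, Optional

# A chart maps a Carnegie stage number to a reported mean age in days,
# or None where the publication gives no value for that stage.
Chart = Dict[int, Optional[float]]


def stage_spread(charts: Dict[str, Chart]) -> Dict[int, float]:
    """Return, per stage, the max-minus-min of the reported mean days.

    Only stages reported by at least two charts are included, since a
    spread is undefined for a single value.
    """
    by_stage: Dict[int, List[float]] = {}
    for values in charts.values():
        for stage, days in values.items():
            if days is None:
                continue  # this chart does not report the stage
            by_stage.setdefault(stage, []).append(days)
    return {
        stage: max(days) - min(days)
        for stage, days in sorted(by_stage.items())
        if len(days) >= 2
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT taken from the paper.
    example = {
        "chart_a": {1: 1.0, 2: 3.0, 3: 5.0},
        "chart_b": {1: 1.0, 2: 4.0, 3: None},
    }
    print(stage_spread(example))
```

Applied to the actual mean-day columns of the reviewed charts, the stage with the largest returned value would correspond to the largest between-publication discrepancy.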

Discussion
When comparing the embryonic ages and lengths within the current literature, we observed a wholly non-uniform set of values. The aim of the current study was to provide an overview of variations in Carnegie staging charts in the available literature and to clarify if a gold standard of Carnegie staging could be identified through the comparison of various reputable academic publications and their respective data sets and differences (Table 3). Based on Figures 4 and 5 presented here, we can conclude that the "universal" staging system is peppered with discrepancies and ambiguity, and that it remains unclear which of the studies should be consulted. Although the cause behind these variances is not fully understood, we sought to propose a set of factors that may have played a role in this dissimilarity to better ascertain which publication should be consulted for the most accurate staging.


Staging Differences: A Matter of Sampling?
Within embryological research, the acquisition of human embryo samples has been an ethical challenge throughout the history of the field, due to rigorous guidelines and regulations. Subsequently, across the various studies analyzed within this review, the samples and sampling methods applied differ greatly. Within both his 1987 publication and his 2010 revision, O'Rahilly utilized embryos from the Carnegie collection, composed of a mixture of human histology and fixed specimens [4,24]. At the time of his initial publication, the Carnegie collection served as the most reputable collection of human embryos, contributing to the credibility of his study.
However, Hill's recent publication in 2007 utilized a wider set of samples [5] and, instead, analyzed embryonic samples from both the Carnegie collection and the Kyoto collection in Japan (details on these collections shown within Table 4). Although the use of a multi-collection approach was not available to O'Rahilly at the time, modern web resources such as the Human Embryology website [28] have enabled researchers to expand upon their sample sizes. Aside from the use of pre-existing collections, certain authors, such as Harkness, opted to utilize a new collection of embryos [21], obtained through abortions at less than 9 weeks of gestation and referred to the researchers by local family planning services and general practitioners. Although it is impossible to discern to what extent this difference in sampling technique might affect embryonic age and length, it is reasonable to attribute some of these differences in values to sampling. This substantial difference might also be attributable to the fact that O'Rahilly utilized 310 embryos that were deemed of "good/excellent" condition. Throughout their research, both Hill and O'Rahilly avoided strictly "abnormal" specimens. In doing so, values and data sets obtained would therefore be more concise and reliable, with fewer outliers. This would, in turn, allow the results of the study to be more generalizable and applicable to healthy embryos. Contrary to this, Harkness utilized any embryo that was accessible, applying the exclusion criteria of mothers with evidence of multifetal pregnancies, history of serious medical disorders, and those aged less than 17 years [21].
[...] [4], and the characteristics of Stages 9-23 were acquired through a combination of sources, including the HDBR atlas, O'Rahilly (1987), Hill (2007), and Pietersma (2023) [4,5,22,29]. P.O days and embryonic size data utilized within the table were taken from O'Rahilly's study (1987) [4].
However, as our knowledge within embryology has changed, so has our perception of the normality of specimens, and more and more embryos that were previously deemed normal have showcased signs of newly discovered abnormality. This shift in perception could play a role in the differences, as the embryos upon which these various studies were conducted may ultimately not fall under the same modern-day categories, despite utilizing similar selection criteria.
Lastly, with regards to sampling, the majority of samples within both the Carnegie collection and the Kyoto collection were placed within fixative, and hence researchers such as O'Rahilly standardized this amongst their samples, ensuring that all embryos under scrutiny had been placed within fixative for a period of time. O'Rahilly comments on this use of fixation and highlights how it may result in a change in length, although the extent and direction of this change in embryonic length requires further studies [4].

Technological Advances: A Cause for Discrepancies?
As our understanding of embryology has grown, so have our technological capabilities. Compared to past methods, present-day embryonic age estimation has been rendered far more precise due to refined fertilization dating methods/techniques. This, in combination with improved ultrasonography (invented 1956), has provided a new approach to embryonic age estimation. Furthermore, ultrasonography in vivo has provided a new approach to sonoembryology, contributing to more adept measurements of embryonic length [30]. Alongside this, technological advancements such as three-dimensional ultrasound, three-dimensional reconstructions, and virtual embryoscopy all aid in providing more means to determine embryonic age and, as such, can assist in providing a more refined estimation. A perfect showcase of how these newfound techniques may play an instrumental role in the future of embryological studies can be found within recent studies conducted by Dr. M. Rousian et al. in Rotterdam. Her publications have shed light on how three-dimensional ultrasound and virtual reality are ideal for visualizing embryological structures and also how these specialized techniques can help to evaluate embryonic growth and development [31][32][33], extending to areas such as brain development, which has always been complex and challenging to study. Furthermore, Rousian et al. employed embryonic volume as a measure of embryonic growth, in addition to CRL and other previously established methods of measurement [31]. Therefore, taking into consideration the large extent to which technology has grown, these innovative techniques may indeed play a role in the formation of these recorded differences, despite being acknowledged.

Staging Differences: A Matter of Data Collection?
Another potential obstacle to uniform values is the use of differing methods of data collection. In O'Rahilly's study, embryonic length was measured with calipers (accurate to 0.1 mm), without any attempt to straighten the natural curvature of the embryo [4]. Additionally, accurately scaled models were initially used in place of the pre-stage 10 embryos, as up until this stage the embryos were too small to measure accurately with calipers. Within his 2010 revision, however, O'Rahilly used enlarged photographs, graphic reconstructions, and solid plaster reconstructions to measure embryonic length [24]. Although these plaster reconstructions showed a relative decrease in size, the recordings were adjusted to account for this. In contrast, Harkness obtained embryonic lengths by placing the embryos on 1 mm graph paper and recording their measurements under a dissecting microscope [21]. This marked difference in data collection techniques might be a factor behind the differences observed in embryonic length. In addition, throughout his revision, O'Rahilly makes his aversion to the use of CRL clear, highlighting that the point of measurement directly above the midbrain (crown) and the definitive point of the rump were hard to determine, leading to inaccuracies. This, together with the inability to measure CRL in exceedingly young embryos, in which these structures simply cannot be identified, may explain the relatively large differences between Harkness's and O'Rahilly's embryonic length values. Nevertheless, these reasons do little to explain the differences found in embryonic ages.

Embryonic Diapause: A Novel Theory
A reproductive strategy present in a variety of mammals, embryonic diapause (ED) can be defined as the temporary arrest of embryonic development. This occurs through delayed implantation into the uterus, resulting in a dormant yet competent zygote [21]. In the absence of appropriate uterine stimulation, the metabolism of the embryo is slowed, resulting in an extension of the gestational period. Aside from its use as a protective phenomenon, little is known regarding ED, and the climatic, metabolic, and psychosocial conditions required for its occurrence are not well understood. In animals, ED is believed to be the consequence of physiological stressors (e.g., day length), whereas, in humans, it is believed to be the consequence of psychological stress [21].
Delayed implantation has been identified in humans since as early as 1996 and has been associated with adverse pregnancy outcomes. Should ED occur in humans, the current clinical use of LMP to estimate ovulation and embryonic age would inevitably be misleading and, as such, would offer a viable explanation for the variability in embryonic ages across academic publications.

Limitations
In our attempt to provide clarity and guidance through a review of the literature (Table 5), certain factors complicated our aim. Initially, we found a diverse range of works in the literature. On closer inspection, however, the majority of current embryological publications simply defaulted to O'Rahilly's set of values from 1987, due to its esteemed stature. The reason for this is presumably that most embryological research must be based on a pre-existing collection, as acquiring embryonic samples is ethically challenging and time-consuming. As such, researchers turn to pre-established studies of embryological collections, the most definitive of which is O'Rahilly's revered 1987 study. However, this frequent use of O'Rahilly's values does not reflect an assessment of their precision or accuracy but is instead based on their widespread and familiar nature.
Another potential limitation of this study is the lack of access to certain academic works, especially those published in the late 19th and early 20th centuries (such as some of Streeter's publications between 1873 and 1948). These publications may be of academic importance but could not be effectively covered within this review.

Future Perspectives
A clear basic understanding of the embryonic staging system enables a more accurate estimation of embryonic age and its associated internal and external features and, as such, helps prevent erroneous gestational age estimation, along with allowing more accurate monitoring of natural embryonic development. Despite the high level of variance across the academic publications, this review provides a clear overview of the current embryonic literature regarding Carnegie stages and highlights the differences between individual publications. By providing potential factors behind these differences, alongside individual considerations, we believe we have taken a first step towards a more uniform and reliable system and towards a universal staging procedure. This review is helpful for clinicians and serves as a foundation for further embryological research. Consequently, future research should concentrate on