Scientific methodology (logos) is predicated upon generating hypotheses and testing them, following where the collected data and evidence lead. The data often generate new hypotheses. Once such a hypothesis is generated, there are two major components: the unbiased elicitation/extraction of pertinent data/evidence, and the analysis and interpretation of that evidence. The former may seem straightforward; the latter is perhaps not so easily pursued. Seemingly straightforward data acquisition is itself actually quite susceptible to “mischief”, compromising or even precluding interpretation. The current review examines some of the challenges that have compromised advancement from variably recording observations on the basis of a priori, speculation-based criteria to a rigorous, scientifically based approach to evidence accession that is amenable to interpretation. If historical and paleopathological studies are to meaningfully contribute (be relevant) to rheumatology and to understanding the effect of our environmental interactions (e.g., health and disease, global warming), it seems that attention to fundamentals and the application of scientifically based approaches by individuals whose skills have been vetted (independently verified) would facilitate that effort.
The avoidance of all speculation on the meaning of observations has been previously suggested, but even that approach is compromised because of the speculative assumptions involved in making the observations themselves. Perhaps the most important challenge relates to the speculation that a given examiner’s observations are valid—not just that he/she adhered to criteria that had been scientifically vetted, but also that his/her skills in applying the criteria have themselves been independently validated. If underlying systematic bias (e.g., speculative criteria and insufficient skills in recognizing the element(s) necessary for fulfillment of the criteria) is not recognized, the results that are often perceived as consistent with preconceived notions actually lack accuracy.
Speculation is not evidence. It consists only of unproven hypotheses that have not been scientifically tested/vetted. Repetition of a speculative comment is not evidence either. It simply perpetuates a mythology. As Douglas Verret noted (12 September 2018, personal communication), consensus is political, not scientific. Many believe that scientific consensus identifies a group in which there is total agreement that a statement is valid. Actually, such a consensus is often simply a group whose members agree with and support each other. It is not evidential.
Biehler-Gomez et al. [1] assessed researchers who were three to ten years post-degree, using a checklist, and found interobserver disagreement (20–59%), especially in the recognition of periosteal reaction, as well as overdiagnosis (making a diagnosis in the absence of disease) in 71% of cases. The results of this study thus testify to the need for more specific and thorough training in the description of bone lesions for all practitioners working on dry bone, regardless of their field of specialization or experience. The study clearly documents overdiagnosis and a failure to delineate sufficient recognition criteria, let alone adhere to them.
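The two quantities at issue here, interobserver disagreement and overdiagnosis, can be illustrated with a minimal sketch. The function names, the two-observer setup and all data below are hypothetical and are not taken from the cited study. In particular, overdiagnosis is assumed here to mean the proportion of reference-negative specimens that an observer nonetheless called positive, which may differ from the definition behind the 71% figure.

```python
def percent_disagreement(ratings_a, ratings_b):
    """Proportion of specimens on which two observers gave different calls."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a != b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def overdiagnosis_rate(calls, reference):
    """Proportion of disease-free specimens (reference False) called positive.

    Assumed definition, for illustration only.
    """
    calls_on_negatives = [c for c, r in zip(calls, reference) if not r]
    if not calls_on_negatives:
        raise ValueError("reference contains no disease-free specimens")
    return sum(calls_on_negatives) / len(calls_on_negatives)


# Hypothetical data: True = lesion recorded as present on a given specimen.
reference = [True, False, False, True, False, False, True, False]
observer_a = [True, True, False, True, False, True, True, False]
observer_b = [True, False, False, False, False, True, True, True]

disagreement = percent_disagreement(observer_a, observer_b)
overdiagnosis_a = overdiagnosis_rate(observer_a, reference)
print(f"interobserver disagreement: {disagreement:.1%}")
print(f"observer A overdiagnosis:   {overdiagnosis_a:.1%}")
```

Neither figure can be computed without an independently validated reference for each specimen, which is the point made above regarding the vetting of examiner skills.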
Part of the challenge has been the initially untested speculation, subsequently falsified, that “diseases cannot be expected to manifest in the same way in every environment or human population”. Harper and Armelagos [2] address the importance of paleoepidemiologic study, speculating on which diseases might have been present at different times but without providing data-based criteria for documenting their actual presence. Hahn et al. [3] identify pre-analytic steps as a major source of error: “errors may arise from inappropriate pre-analytics, which include all working steps prior to the actual measurement”.
In a study by Araoye et al. [4] of post-graduate trainees’ comprehension of statistics, “38% could not apply the concept of specificity and sensitivity”. This is further exemplified by the otherwise excellent article by Plomp et al. [5]. Interested in the reproducibility of findings across populations, they selected a few individuals from a plethora of sites rather than examining all individuals from each site or using a random number system to identify appropriate candidates. Thus, the studied individuals were not selected by a statistically valid method, violating a premise on which the statistical tests depend and rendering their statistics moot. Hahn et al. [3] note that sensitivity and specificity are commonsense criteria, but there are other considerations: “measurement accuracy, accuracy expressed as systematic error, comparative precision expressed as random error, and repeatability, theoretical and practical limits of detection”. They further note that “sample preparation usually starts with the correct choice of specimens” and the requirement for “much training and experience”.
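The two statistical points raised above, selection by a random number system and the application of sensitivity and specificity, can be sketched as follows. This is an illustrative sketch only. The catalog identifiers, sample size, seed and diagnostic calls are hypothetical, and the functions are not drawn from any of the cited studies.

```python
import random


def select_individuals(catalog_ids, k, seed):
    """Simple random sample (without replacement) of k individuals from a site.

    A fixed seed makes the selection reproducible and auditable.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(catalog_ids), k))


def sensitivity_specificity(calls, reference):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(calls, reference))
    tp = sum(1 for c, r in pairs if c and r)
    fn = sum(1 for c, r in pairs if not c and r)
    tn = sum(1 for c, r in pairs if not c and not r)
    fp = sum(1 for c, r in pairs if c and not r)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both diseased and disease-free cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical site catalog of 200 individuals; 20 are drawn at random.
site_catalog = [f"SK-{n:03d}" for n in range(1, 201)]
chosen = select_individuals(site_catalog, k=20, seed=2024)

# Hypothetical calls against an independently validated reference.
reference = [True, True, True, True, False, False, False, False, False, False]
calls = [True, True, True, False, True, False, False, False, False, False]
sensitivity, specificity = sensitivity_specificity(calls, reference)
print(chosen[:5])
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Examining every individual from a site removes the need for sampling altogether. Where that is not feasible, a documented random draw of this kind is one way to satisfy the sampling premise on which subsequent statistical tests rely.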
Reporting that does not faithfully reflect the nature and range of findings distorts impressions. Boutron and Ravaud [6] note that this can result from a “lack of understanding of methodological principles, parroting of common practices, a form of unconscious behavior, or an actual willingness to misread”. Unacceptable conscious behaviors include selective reporting of statistically significant results and cherry-picking the statistical test (e.g., from among those listed by SAS) that gives the most impressive results or those most compatible with the authors’ preconceived notions or biases.
Falcone [7] comments on bias in citation practices, suggesting that “citation lies at the very heart of our gifting rituals”. She further states “that gifts are not free and volitional…but that gifts are always a part of a complex system of obligations”. Obligatory celebration of the work of one’s academic advisors or colleagues contrasts with negative reciprocity, defined as trying to maximize utility at the expense of others. Equally problematic is citation of tertiary rather than primary sources. A tertiary source is typically a citation of another article, which itself describes the authors’ perspective of the information provided by the primary source. Assurance of the validity of the information can only be pursued by examination and vetting of the data and interpretations provided in primary sources.
Not only do the meanings of the same terms differ significantly across scientific fields, but they also have historical context. This is exemplified by the 10,000 leprosaria that Pope Clement closed in 1508–1510. While the appellation led to the presumption that the 100 or so individuals buried in each of the associated cemeteries had leprosy, the authors seemingly failed to recognize that the term leprosy was historically applied to essentially any individual with a skin condition. They also seem not to have done the arithmetic: 10,000 leprosaria with 100 individuals per leprosarium equal a million burials. The suggestion that a million people had leprosy seems quite unreasonable. In the evaluation of the Batavia leprosaria in Suriname, neither archival examination, skeletal examination, nor DNA testing revealed any evidence of leprosy. So-called leprosaria were actually not repositories for leprosy. Further, the character and distribution of the pathology reported are at variance with (different from) those observed in contemporary clinics and hospitals devoted to leprosy.
Snoddy et al. [8] noted that “lack of awareness of best practices by scholars from other professional spheres can perpetuate a misunderstanding of the level of scientific study in our field,” while failing to acknowledge the converse. They also noted that “methods have sometimes suffered from a kind of circular logic wherein older literature, which is no longer clinically accurate, is used as the foundation for entire diagnostic schemes”. These statements are repeated to emphasize that the concerns are not just those of the author of this manuscript. All testify to/document the need for more specific and more thorough training in the description of bone lesions for all practitioners of dry bone research, regardless of their field of specialization, experience or academic credentials.
There is hope that the stalwart defense of the status quo and the pursuit of entrenched rote approaches will be abandoned and critical thinking embraced. The incorporation of critical thinking (to recognize and avoid biases), physiology and statistical theory (especially the premises required for its valid use) into training/education would allow practitioners to participate as credible contributors to a 21st-century enlightenment.