A Chi-Squared Analysis of the Measurements of Two Cosmological Parameters Over Time

The aim of this analysis was to determine whether the quoted error bars truly represent the dispersion of values in a historical compilation of two parameters of the standard cosmological model: the amplitude of mass fluctuations ($\sigma_8$) and Hubble's constant ($H_0$). To this end, a chi-squared test was carried out on a compiled list of past measurements. For $\sigma_8$ (60 data points measured between 1993 and 2019, with $\chi^2$ between 182.4 and 189.0), the associated probability $Q$ is extremely low: $Q = 1.6 \times 10^{-15}$ for the weighted average and $Q = 8.8 \times 10^{-15}$ for the best linear fit of the data. The same holds for $H_0$ (163 data points measured between 1976 and 2019, with $\chi^2$ between 480.1 and 575.7), where $Q = 1.8 \times 10^{-33}$ for the linear fit of the data and $Q = 1.0 \times 10^{-47}$ for the weighted average of the data. The general conclusion was that the statistical error bars associated with the observed parameter measurements have been underestimated, or the systematic errors were not properly taken into account, in at least 20\% of the measurements. The fact that the underestimation of error bars for $H_0$ is so common might explain the apparent 4.4$\sigma$ discrepancy formally known today as the Hubble tension.


The Standard Cosmological Model
The standard cosmological model aims to describe the evolution and structure of the Universe that we live in. This theoretical model accounts for the history of our Universe from its beginning, through inflation caused by the Big Bang, all the way up to the present-day dark-energy-dominated Universe (∼70%). In addition to explaining the evolution and current state of the Universe, the standard cosmological model can be used to predict the Universe's fate. The model consists of 12 parameters [1]: $\Omega_M$ is the ratio of the current matter density to the critical density, $\Omega_\Lambda$ is the cosmological constant as a fraction of the critical density, $H_0$ is Hubble's constant, $\sigma_8$ is the amplitude of mass fluctuations, $\Omega_b$ is the baryon density as a fraction of the critical density, $n$ is the primordial spectral index, $\beta$ is the redshift distortion, $m_\nu$ is the neutrino mass, $\Gamma$ is $\Omega_m H_0 / (100$ km s$^{-1}$ Mpc$^{-1})$, $\Omega_m^{0.6}\sigma_8$ is a combination of two other parameters that is useful in some peculiar velocity and lensing measurements, $\Omega_k$ is the curvature, and $w_0$ is the dark energy equation-of-state parameter [1]. For this study, the two parameters in question are $\sigma_8$ and $H_0$.

Amplitude of Mass Fluctuations ($\sigma_8$)
The amplitude of mass fluctuations ($\sigma_8$) is a parameter of the standard cosmological model that describes the relative distributions of mass and light in the Universe [2]. It is of interest to cosmologists because $\sigma_8 \approx 1$ implies an "unbiased" Universe, in which mass and light are evenly distributed in a sphere of radius $R = 8\,h^{-1}$ Mpc, whereas $\sigma_8 \approx 0.5$ implies a "biased" Universe, in which mass is distributed more extensively than light in a sphere of the same radius [2]. Studying the distribution of mass and light through $\sigma_8$ is important because large-scale differences in the distribution of matter and energy in the present-day Universe tell us about density fluctuations in the early Universe on the cluster mass scale of $R = 8\,h^{-1}$ Mpc [2].

Hubble's Constant ($H_0$)
Hubble's constant ($H_0$), like the amplitude of mass fluctuations, is a parameter of the standard cosmological model.
$H_0$ is the slope of the line in the Hubble-Lemaître law, which relates the recession velocity of a galaxy to its distance from an observer. A representation of this law can be seen in Figure 1, obtained from Paturel et al. [3]. In other words, $H_0$ describes the expansion of the Universe on cosmic scales. It is named after Edwin Hubble, who discovered the relation in 1929 when he realized that galaxies' velocities away from an observer are directly proportional to their distance from that observer, apart from peculiar velocities [4]. In recent years, however, credit for the discovery has also been given to Georges Lemaître jointly with Hubble [5]. The parameter is measured in km s$^{-1}$ Mpc$^{-1}$ and relates the distance $d$ of a galaxy from an observer to the velocity with which it is moving radially away from that observer. Since the Universe is so large, recession velocities in the form of redshift ($z$) are used to describe the distances to faraway galaxies rather than units of length. Knowing the exact value of $H_0$ is important to cosmologists, as $H_0$ can also be used to roughly calculate the age of the Universe.

Figure 1. Radial recession velocity vs. distance from observer; graph obtained from Paturel et al. [3].
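As a rough illustration of the two uses of $H_0$ mentioned above, the following Python sketch evaluates the Hubble-Lemaître law and the Hubble time $1/H_0$. The function names and unit constants are introduced here for illustration only, and the Hubble time is merely an order-of-magnitude age estimate that assumes a constant expansion rate.

```python
# Unit conversions (assumed constants for this sketch).
KM_PER_MPC = 3.0856775814913673e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 1e9 * 365.25 * 86400.0  # seconds in 10^9 Julian years


def recession_velocity_km_s(h0, distance_mpc):
    """Hubble-Lemaitre law v = H0 * d (H0 in km/s/Mpc, d in Mpc)."""
    return h0 * distance_mpc


def hubble_time_gyr(h0):
    """Hubble time 1/H0 in Gyr for H0 given in km/s/Mpc."""
    return KM_PER_MPC / h0 / SECONDS_PER_GYR


if __name__ == "__main__":
    # Illustrative H0 values only; any value in km/s/Mpc can be used.
    for h0 in (67.4, 74.03):
        v = recession_velocity_km_s(h0, 100.0)
        t = hubble_time_gyr(h0)
        print(f"H0 = {h0}: v(100 Mpc) = {v:.0f} km/s, 1/H0 = {t:.2f} Gyr")
```

A larger $H_0$ gives a shorter Hubble time, which is why the precise value of the constant matters for age estimates.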

Values and Errors
The first step in determining the best observed values for the amplitude of mass fluctuations ($\sigma_8$) and Hubble's constant ($H_0$) was to compile a list of several tens of measurements of these parameters. For this project, 60 values were compiled for $\sigma_8$ between 1993 and 2019, and 163 values were compiled for $H_0$ between 1976 and 2019. In addition to the values themselves, we recorded the year in which each measurement was made and the size of its error bar. The 60 measurements of $\sigma_8$ and the 163 measurements of $H_0$ are listed in Tables A1 and A2, respectively, in Appendix A. All of the $H_0$ values (in units of km s$^{-1}$ Mpc$^{-1}$ throughout this paper) between 1990 and 2010 stem from Croft and Dailey [1]. The tables include the observed values along with their years of observation, error sizes, and references to the source articles. All of the referenced papers were found using the Astrophysics Data System (https://ui.adsabs.harvard.edu/) or the tables in Croft and Dailey [1]. For the statistical analysis of these data, a simplifying assumption was made that each measurement is independent of the others, eliminating the need for a covariance term. It should also be noted that the given error bars account for all statistical effects.

Chi-Squared Test
A chi-squared test is well suited to analyzing the trends seen in scatter plots of our datasets (see Figures 2 and 3). We used a chi-squared test to examine the probabilities of the deviations and to determine whether the simplifying assumption that the measurements are independent of one another was correct. The chi-squared value of a dataset is used to assess the likelihood that the deviations observed in the data occurred due to chance, and the test is also known as a "goodness of fit" test [6]. The chi-squared value of a dataset is given by the following expression:

$$\chi^2 = \sum_{i=1}^{N} \frac{(x_{n,i} - x_{t,i})^2}{\sigma_i^2} \tag{1}$$

where, in the case of our dataset, $x_{n,i}$ is the observed value of the parameter, $x_{t,i}$ is the theoretical value of the parameter (weighted average or linear fit), $\sigma_i^2$ is the variance of the observed parameter value, and $N$ is the number of points. The covariance term is absent from this expression because of the simplifying assumption that all of the observed measurements are independent of one another. This independence of the data is precisely the hypothesis we want to test. If the data were not independent, we would have to add a covariance term to Equation (1). In any case, non-independence of our data would make the spread of the points lower than is indicated by the error bars, making the probability $Q$ (see Section 2.3) of higher deviations even lower, and thus the number of points to reject in order to have a distribution compatible with the error bars even larger. Therefore, our simplified approach can be considered a conservative calculation.
This calculation was carried out twice, first using the weighted average $\sigma_8$ and $H_0$ values as the theoretical values ($x_t$), and then again using as $x_t$ the best fit values from a linear fit designed to minimize the value of $\chi^2$. Lines representing both the weighted average of the dataset (blue) and the best fit for the dataset (red) that were used to calculate chi-squared can be seen with the data points in Figures 2 and 3. The weighted averages ($\lambda_w$) of the parameters in question were calculated by weighting each point by the inverse of its variance, as shown below, where $\sigma_i^2$ is the variance of data point $i$:

$$\lambda_w = \frac{\sum_{i=1}^{N} w_i\, x_{n,i}}{\sum_{i=1}^{N} w_i} \tag{2}$$

$$w_i = \frac{1}{\sigma_i^2} \tag{3}$$

For $\sigma_8$, $\lambda_w \approx 0.8038$ and for $H_0$, $\lambda_w \approx 69.3815$. Substituting these weighted averages for $x_t$ in Equation (1) gives $\chi^2 \approx 189.037$ for $\sigma_8$ and $\chi^2 \approx 575.655$ for $H_0$.
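The weighted average and the chi-squared sum of Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' program: the function names are chosen here, and the three data points are hypothetical values used only to make the arithmetic easy to follow.

```python
def weighted_average(values, errors):
    """Inverse-variance weighted mean: sum(w_i * x_i) / sum(w_i), w_i = 1/sigma_i^2."""
    weights = [1.0 / e ** 2 for e in errors]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def chi_squared(values, errors, expected):
    """Equation (1): sum of squared deviations in units of the variance.

    `expected` holds the theoretical value x_t for each point.
    """
    return sum(((x - t) / e) ** 2 for x, t, e in zip(values, expected, errors))


# Hypothetical measurements (not taken from Tables A1 or A2).
values = [70.0, 72.0, 68.0]
errors = [2.0, 1.0, 2.0]

lam_w = weighted_average(values, errors)
chi2 = chi_squared(values, errors, [lam_w] * len(values))
print(lam_w, chi2)
```

For the weighted-average case every point is compared with the same theoretical value, so `expected` is simply the mean repeated once per measurement.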
The linear fit has the form:

$$Y = A + B X \tag{4}$$

where $Y$ is the theoretical value of the parameter being analyzed and $X$ is the year of the measurement minus 2000. To find the fit, a program was written in Python that minimizes $\chi^2$. When substituting $Y$ from Equation (4) for $x_t$ in Equation (1), we found that $\chi^2 \approx 182.4$ for $\sigma_8$ and $\chi^2 \approx 480.1$ for $H_0$. To calculate the error bars for the parameters $A$ and $B$, a program was written in Python to estimate the range of values compatible with the $\sigma_8$ and $H_0$ data at the $1\sigma$ level. The $1\sigma$ error (68% C.L.) was obtained by adding the value of $2.3\,\chi^2/n$ (with $\chi^2$ evaluated at its minimum) to the minimum $\chi^2$ values of 182.4 ($\sigma_8$) and 480.1 ($H_0$), following the procedure of Avni [7], where $n$ is the number of degrees of freedom and the factor $\chi^2/n$ was added to account for either under- or overestimation of the error bars. For our $\sigma_8$ values, this process resulted in an $A$ value of $0.781 \pm 0.012$ and a $B$ value of $(1.7 \pm 0.8) \times 10^{-3}$. With these values for $A$ and $B$, the function of the linear fit for $\sigma_8$ becomes:

$$\sigma_8 = 0.781 + 1.7 \times 10^{-3}\, X$$

For the $H_0$ values, this process resulted in an $A$ value of $65.3 \pm 0.6$ and a $B$ value of $0.26 \pm 0.04$, making the function of the linear fit for $H_0$:

$$H_0 = 65.3 + 0.26\, X$$

as can be seen in Figures 2 and 3, represented by the red line.
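The text does not specify how the Python program minimizes $\chi^2$. For a straight line with independent Gaussian errors, one standard option is the closed-form weighted least-squares solution, sketched below under that assumption. The function names and the sample data are hypothetical, and `one_sigma_chi2_level` only reproduces the threshold described above (minimum $\chi^2$ plus $2.3\,\chi^2/n$), not the full parameter scan.

```python
def fit_line(x, y, errors):
    """Return (A, B, chi2_min) for Y = A + B*X minimizing Equation (1)."""
    w = [1.0 / e ** 2 for e in errors]
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det   # intercept A
    b = (s * sxy - sx * sy) / det     # slope B
    chi2_min = sum(((yi - (a + b * xi)) / ei) ** 2
                   for xi, yi, ei in zip(x, y, errors))
    return a, b, chi2_min


def one_sigma_chi2_level(chi2_min, dof):
    """Chi-squared level bounding the 68% region: chi2_min + 2.3 * chi2_min / dof."""
    return chi2_min + 2.3 * chi2_min / dof


# Hypothetical data: X is the year of measurement minus 2000.
years = [1990, 2000, 2010]
x = [year - 2000 for year in years]
y = [63.0, 65.0, 67.0]
errors = [1.0, 2.0, 1.0]

a, b, chi2_min = fit_line(x, y, errors)
print(a, b, chi2_min)
print(one_sigma_chi2_level(182.4, 58))  # threshold for the sigma_8 fit values quoted above
```

The error bars on $A$ and $B$ would then follow from the range of parameter values whose $\chi^2$ stays below the returned level.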

Reduced Chi-Squared
In order to account for the degrees of freedom in the data, a reduced chi-squared test was used to test the goodness of fit for both the weighted average and best fit values. Reduced chi-squared is commonly used for several purposes in astronomy, namely, model comparison and error estimation [8]. The reduced chi-squared value of a dataset is simply the chi-squared value divided by the degrees of freedom ($n$) of that dataset, as shown in the following relation:

$$\chi^2_{\mathrm{red}} = \frac{\chi^2}{n} \tag{5}$$

In this analysis, the weighted average calculations have 59 degrees of freedom for $\sigma_8$ and 162 degrees of freedom for $H_0$ (one free parameter). The linear fit calculations have 58 degrees of freedom for $\sigma_8$ and 161 degrees of freedom for $H_0$ (two free parameters). Applying Equation (5) to the $\chi^2$ values calculated using the weighted average, we get a reduced chi-squared (or chi-squared per degree of freedom) of $189.037/59 \approx 3.20$ for $\sigma_8$ and $575.655/162 \approx 3.55$ for $H_0$. Likewise, the reduced chi-squared obtained from the best fit function is $182.4/58 \approx 3.14$ for $\sigma_8$ and $480.1/161 \approx 2.98$ for $H_0$, both of which, in accordance with theory, are less than those calculated using the weighted average (a difference of 0.06 for $\sigma_8$ and 0.57 for $H_0$).
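The reduced chi-squared arithmetic can be checked with a short Python sketch. The helper name is hypothetical; the inputs are the $\chi^2$ values, numbers of data points, and numbers of free parameters stated in the text.

```python
def reduced_chi_squared(chi2, n_points, n_free_params):
    """Equation (5): chi-squared divided by the degrees of freedom n."""
    dof = n_points - n_free_params
    return chi2 / dof


cases = {
    "sigma_8, weighted average": (189.037, 60, 1),
    "H_0, weighted average": (575.655, 163, 1),
    "sigma_8, linear fit": (182.4, 60, 2),
    "H_0, linear fit": (480.1, 163, 2),
}

for label, (chi2, n_points, n_free) in cases.items():
    print(f"{label}: {reduced_chi_squared(chi2, n_points, n_free):.2f}")
```

Each free parameter of the model removes one degree of freedom, which is why the linear fit is divided by a smaller $n$ than the weighted average.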

Statistical Significance, Q
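The probability that a $\chi^2$ value at least as large as the one calculated for a dataset with $n$ degrees of freedom arises by chance is represented by $Q$ and is given by the following expression:

$$Q(\chi^2 \mid n) = \left[2^{n/2}\,\Gamma\!\left(\tfrac{n}{2}\right)\right]^{-1} \int_{\chi^2}^{\infty} t^{\,n/2-1} e^{-t/2}\, dt \tag{6}$$

where $\Gamma(x)$ is given by:

$$\Gamma(x) = \int_{0}^{\infty} t^{\,x-1} e^{-t}\, dt \tag{7}$$

and is known as the generalization of the factorial function to real and complex arguments [9]. Where a dataset yielded a very low $Q$, measurements were removed until $Q \geq 0.05$ was reached; values with the largest contribution to $\chi^2$ (bad values) were removed first.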
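For integer degrees of freedom, $Q$ can be evaluated with the Python standard library alone, using the recurrence $Q(a+1, x) = Q(a, x) + x^{a} e^{-x} / \Gamma(a+1)$ for the regularized upper incomplete gamma function, with $a = n/2$ and $x = \chi^2/2$. The sketch below is one possible implementation, not the authors' code, and the name `chi2_q` is chosen here for illustration.

```python
import math


def chi2_q(chi2, dof):
    """Probability Q of exceeding `chi2` by chance for integer `dof` >= 1."""
    if chi2 <= 0.0:
        return 1.0
    x = chi2 / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-x)              # Q(1, x) = exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))   # Q(1/2, x) = erfc(sqrt(x))
    # Step a up to dof/2, adding x^a * exp(-x) / Gamma(a + 1) each time.
    # Logarithms keep the individual terms from overflowing.
    for _ in range((dof - 1) // 2):
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Values quoted in the text for the sigma_8 weighted average.
print(chi2_q(189.037, 59))
```

All terms in the sum are positive, so the result does not suffer from cancellation even when $Q$ is extremely small.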

Amplitude of Mass Fluctuations
For the value of $\chi^2$ calculated using the weighted average of $\sigma_8$ ($n = 59$, $\chi^2 \approx 189.0$), the probability that the observed deviations are due to chance is $Q = 1.6 \times 10^{-15}$. In order to reach a statistically acceptable value of $Q$ ($Q \geq 0.05$), 14 bad values must be removed from the data ($n = 45$, $\chi^2 \approx 58.1548$), producing a value for $Q$ of 0.0902. For the value of $\chi^2$ calculated using the best fit function designed to minimize $\chi^2$ ($n = 58$, $\chi^2 \approx 182.4$), $Q = 8.8 \times 10^{-15}$. In order to reach a statistically acceptable value of $Q$, 10 bad values must be removed from the data ($n = 48$, $\chi^2 \approx 61.0$), producing a value for $Q$ of 0.099. With this last subsample of 50 points, the best linear fit of $\sigma_8$ returned an $A$ value of $0.787 \pm 0.008$ and a $B$ value of $(1.1 \pm 0.5) \times 10^{-3}$; see Figure 4.

Hubble's Constant
For the value of $\chi^2$ calculated using the weighted average of $H_0$ ($n = 162$, $\chi^2 \approx 575.655$), the probability that the observed deviations are due to chance is $Q = 1.0 \times 10^{-47}$. In order to reach a statistically acceptable value of $Q$ ($Q \geq 0.05$), 36 bad values must be removed from the data ($n = 126$, $\chi^2 \approx 152.5541$), producing a value for $Q$ of 0.0538. For the value of $\chi^2$ calculated using the best fit function designed to minimize $\chi^2$ ($n = 161$, $\chi^2 \approx 480.1$), $Q = 1.8 \times 10^{-33}$. In order to reach a statistically acceptable value of $Q$, 24 bad values must be removed ($n = 137$, $\chi^2 \approx 164.1$), producing a value for $Q$ of 0.057. With this last subsample of 139 points, the best linear fit of $H_0$ returned an $A$ value of $65.9 \pm 0.4$ and a $B$ value of $0.277^{+0.032}_{-0.034}$; see Figure 5. The non-zero value of $B$ is very significant; however, the error of $B$ may be non-Gaussian, and we cannot directly interpret this as significant evolution. The correlation factor of $H_0$ with time$^1$ is $c = 0.027 \pm 0.013$, a $2\sigma$ significant correlation.
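The removal of bad values used in this section for both parameters can be sketched in Python as follows, for the weighted-average case. This is an illustration under stated assumptions: the text only says that the values with the largest contribution to $\chi^2$ were removed first, so recomputing the weighted average after every removal is an assumption made here, and all function names and sample numbers are hypothetical.

```python
import math


def chi2_q(chi2, dof):
    """Probability Q of exceeding `chi2` by chance for integer `dof` >= 1."""
    if chi2 <= 0.0:
        return 1.0
    x = chi2 / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    for _ in range((dof - 1) // 2):
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def weighted_average(values, errors):
    """Inverse-variance weighted mean of the measurements."""
    weights = [1.0 / e ** 2 for e in errors]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def remove_bad_values(values, errors, q_min=0.05):
    """Drop the largest chi-squared contributor until Q >= q_min.

    Returns (kept_values, kept_errors, n_removed, q). Needs at least 2 points.
    """
    vals, errs = list(values), list(errors)
    n_removed = 0
    while True:
        mean = weighted_average(vals, errs)  # assumption: refit after each removal
        terms = [((x - mean) / e) ** 2 for x, e in zip(vals, errs)]
        q = chi2_q(sum(terms), len(vals) - 1)  # one free parameter
        if q >= q_min or len(vals) <= 2:
            return vals, errs, n_removed, q
        worst = terms.index(max(terms))
        del vals[worst], errs[worst]
        n_removed += 1


# Hypothetical sample with one clear outlier.
kept, kept_errors, n_removed, q = remove_bad_values(
    [70.0, 71.0, 69.0, 70.0, 90.0], [1.0, 1.0, 1.0, 1.0, 1.0])
print(n_removed, q)
```

The same loop applies to the linear fit if the weighted average is replaced by the fitted line and the degrees of freedom are reduced by two instead of one.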

Conclusions and Discussion
Before the removal of bad values, the $Q$ values for both the weighted average and the best fit calculations are extremely low for both parameters. Even so, the two calculations differ considerably in how many bad values must be removed to reach a statistically acceptable dataset ($Q \geq 0.05$). For the $\sigma_8$ values, the weighted average calculation needs 14 bad values removed, whereas the best fit calculation needs only 10. For the $H_0$ values, the weighted average calculation requires 36 bad values to be removed, whereas the best fit calculation needs only 24. From these results for both parameters, it is reasonable to conclude that the linear fit, with time (year − 2000) on the x-axis and the measurements of the parameter in question ($\sigma_8$ or $H_0$) on the y-axis, is a better estimate of the data than the weighted average, in which each value is weighted by the inverse square of its error. For $H_0$, we observed a slight growing trend (at the $2\sigma$ level) in the value of the measurements over the last 43 years, although the interpretation of this upward trend as a random fluctuation is not excluded.

$^1$ For two independent variables X and Y, the correlation factor is defined as […]. The Pearson correlation coefficient would be […].

Universe 2020, 6, 114
In addition to the increasing precision of measurements, it is concluded from this analysis that the error bars of the observed parameters have been largely underestimated in at least 20% of the measurements, or that the systematic errors of the observation techniques were not fully considered. It should also be stated that, due to the simplifying assumption about the covariance of the measurements, 20% of the error bars being underestimated is a conservative percentage (in reality, it is a minimum of 20% of the measurements). In light of the analysis carried out in this paper, one would not be surprised to find cases like the 4.4σ discrepancy between the best measurement using Type Ia supernovae in Riess et al. [10], $H_0 = 74.03 \pm 1.42$ km s$^{-1}$ Mpc$^{-1}$, and the value derived from the cosmic microwave background radiation, $H_0 = 67.4 \pm 0.5$ km s$^{-1}$ Mpc$^{-1}$. It is likely that the underestimation of error bars for $H_0$ in many measurements contributes to the apparent 4.4σ discrepancy formally known as the Hubble tension.

Acknowledgments: Thanks are given to Martin Sahlen and Andreas Korn for their suggestions for this work, and to Rupert Croft for providing the data of his paper, Croft and Dailey [1]. Thanks are given to the two anonymous referees for helpful comments.

Conflicts of Interest:
The authors declare no conflict of interest.