A Quantitative Comparison of Exoplanet Catalogs

In this study, we investigated the differences between four commonly-used exoplanet catalogs (exoplanet.eu; exoplanetarchive.ipac.caltech.edu; openexoplanetcatalogue.com; exoplanets.org) using a Kolmogorov-Smirnov (KS) test. We found a relatively good agreement in terms of the planetary parameters (mass, radius, period) and stellar properties (mass, temperature, metallicity), although a more careful analysis of the overlap and unique parts of each catalog revealed some differences. We quantified the statistical impact of these differences and their potential cause. We concluded that although statistical studies are unlikely to be significantly affected by the choice of catalog, it would be desirable to have one consistent catalog accepted by the general exoplanet community as a base for exoplanet statistics and comparison with theoretical predictions.

In this work, we present a simple statistical comparison between the different exoplanet catalogs. We mainly focus on the EU, ARCHIVE and OPEN catalogs. The database of the ORG catalog contains a single and reliable set of parameters for each planet. However, since it has not been updated for a couple of years now (see website and discussion in Reference [8]), we perform only a coarse comparison. As discussed in Reference [8], there are plans to restart regular updates in the near future.

Methods
We downloaded the lists of confirmed planets from the four catalogs (EU, ARCHIVE, OPEN and ORG) on 3 April 2018. As discussed previously, because each catalog applies a different planetary mass criterion (see Table 1), we set 10 M_J as an upper bound for the planetary mass in order to strictly exclude any potential brown dwarfs. This avoids biases that might emerge from the different mass cutoffs the catalogs use.
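As an illustration of the mass cut described above, the following minimal sketch filters a list of catalog rows. The field name `mass_mj` and the choice to keep planets with no listed mass are assumptions made here for illustration; the text does not specify how such rows were treated at this stage.

```python
import math

MASS_CUT_MJ = 10.0  # upper bound on planetary mass, in Jupiter masses


def apply_mass_cut(planets, mass_key="mass_mj", cut=MASS_CUT_MJ):
    """Drop rows whose listed mass exceeds the cut.

    Rows with no listed mass are kept (an assumption of this sketch).
    """
    kept = []
    for row in planets:
        mass = row.get(mass_key)
        if mass is None or math.isnan(mass) or mass <= cut:
            kept.append(row)
    return kept


# Hypothetical rows, for illustration only.
catalog = [
    {"name": "planet a", "mass_mj": 0.5},
    {"name": "planet b", "mass_mj": 13.0},  # above the cut: removed
    {"name": "planet c", "mass_mj": None},  # no mass listed: kept here
]
filtered = apply_mass_cut(catalog)
```

Applying the same cut to every catalog before any comparison is what removes the dependence on each catalog's own mass criterion.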
The parameters we use in order to compare the catalogs are the stellar mass (M_*), surface temperature (T_eff) and metallicity ([Fe/H]), and the planetary mass (M_p), radius (R_p) and orbital period (Period). We chose this set of six parameters because they are the fundamental parameters that are most easily available from current photometric and spectroscopic detection methods [12]. Physically, these parameters provide basic, broad information about the planetary system [6]. The process of deriving the stellar properties involves a collection of literature values for atmospheric properties (temperature, surface gravity, and metallicity) derived from different observational techniques (photometry, spectroscopy, asteroseismology, and exoplanet transits), and then fitting them to stellar isochrones (e.g., References [13,14]). The stellar properties are then used in the derivation of almost all planet properties from radial velocities (RV), transits or transit timing variation (TTV) data. Thus, a reliable estimate of these parameters is crucial for the quality of the planet properties estimate (e.g., Reference [15]).
In the framework of this analysis, we compared the planetary properties of the confirmed planets from the listed catalogs, both separately and in combination. In addition, we compared planetary systems by examining the distributions of the stellar properties of the host star and the planetary properties of each system's first detected planet. This allowed us to identify biases in the planet properties that emerge from the stellar properties. We did not compare the stellar parameters of all the confirmed planets, since doing so would unintentionally give more weight to multi-planetary systems. Table 2 lists the total number of confirmed planets and systems in each catalog for which each stellar and planetary property is available. The significant variability of those numbers raised the following questions: Does the catalog with the largest number of planets include all the objects listed in the other catalogs? How different are the distributions of planets from two different catalogs?
Most of the statistical work in this analysis is based on comparing the different sets using a two-sample Kolmogorov-Smirnov test (hereafter, KS test [16]). Broadly speaking, the p-value of the KS test indicates to what extent two samples can be considered to be drawn from the same distribution; a high p-value indicates good agreement. The test is sensitive to differences in both the shape and the location of the empirical distribution functions of the two samples. A KS comparison between two catalogs uses the distributions of all available estimates of one of the properties mentioned above. If a specific object's quantity was unavailable, we excluded the object from the comparison pertaining to that property. Thus, there were some cases in which two catalogs agreed on the planetary nature of a specific object, yet we excluded the planet from the test because one of the catalogs had a missing value for the property in question. In cases where only a lower or upper bound was available, we used the listed bound as the value instead of excluding the object.
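The two-sample KS comparison with exclusion of missing values can be sketched as follows. This is a standard-library illustration, not the implementation used in this work: the p-value uses the asymptotic Kolmogorov series, which is only approximate for small samples, and the function name is hypothetical. In practice a library routine such as `scipy.stats.ks_2samp` would typically be used.

```python
import math


def ks_2samp_asymptotic(a, b):
    """Two-sample KS statistic D and an asymptotic p-value.

    Missing values (None or NaN) are dropped before the comparison.
    """
    a = sorted(x for x in a if x is not None and not math.isnan(x))
    b = sorted(x for x in b if x is not None and not math.isnan(x))
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples need at least one value")

    # Walk both sorted samples and track the largest gap between
    # the two empirical cumulative distribution functions.
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))

    # Asymptotic Kolmogorov distribution: Q(lam) = 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # Q(lam) is indistinguishable from 1 in this regime
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical samples: the None and NaN entries are excluded from the test.
sample_a = [0.8, 1.1, None, 1.3, 2.0, float("nan"), 2.4]
sample_b = [0.9, 1.0, 1.4, 1.9, 2.5]
d_stat, p_value = ks_2samp_asymptotic(sample_a, sample_b)
```

A low p-value indicates that the two empirical distributions differ by more than would be expected if both samples were drawn from the same parent distribution.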
For each pair of catalogs, 'A' and 'B', we compiled three subsets: (1) the overlap subset, including all objects listed in both catalogs; (2) the unique 'A' subset, including only the planets listed in catalog 'A' and not in catalog 'B'; and (3) the unique 'B' subset, including only the planets listed in catalog 'B' and not in catalog 'A'. We then applied the KS test to compare the three subsets. We performed this analysis for each of the six parameters separately, and also for the subset of planets for which the planetary mass, radius and orbital period are all available. Figure 1 describes the methodology applied to compare the different catalogs.
A problem we often encountered in our analysis was that the same planet can appear under different names and aliases in different catalogs. The differences are typically caused by spaces, uppercase/lowercase mix-ups, and numbers or special symbols used in spelling the stellar and planet names. Yet, in most cases, we found a given object to have two different names in two different catalogs. To overcome this difficulty, we first cross-matched the objects according to their stellar names using the SIMBAD database, and then identified the planetary names that populate each system. This exercise led us to the following unfortunate conclusion: currently, there is no consensus on an unambiguous way to designate a specific planet and to find its aliases conveniently. Therefore, there is no reliable way to cross-match two planetary tables.
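The construction of the overlap and unique subsets can be sketched with set operations on normalized names. The normalization rule below (lowercasing and removing spaces, hyphens and underscores) is an assumption chosen for illustration; it only handles formatting differences and does not resolve true aliases, for which a host-star cross-match such as the SIMBAD step described above is still needed. The name lists are illustrative strings, not actual catalog contents.

```python
import re


def normalize_name(name):
    """Lowercase a planet name and remove spaces, hyphens and underscores."""
    return re.sub(r"[\s\-_]+", "", name.strip().lower())


def split_subsets(catalog_a, catalog_b):
    """Return (overlap, unique_a, unique_b) as sets of normalized names."""
    keys_a = {normalize_name(n) for n in catalog_a}
    keys_b = {normalize_name(n) for n in catalog_b}
    return keys_a & keys_b, keys_a - keys_b, keys_b - keys_a


# Illustrative name lists with formatting differences only.
cat_a = ["Kepler-114 c", "HD 10180 b", "WASP-12 b"]
cat_b = ["kepler 114c", "WASP-12b", "GJ 876 d"]
overlap, unique_a, unique_b = split_subsets(cat_a, cat_b)
```

Each of the three resulting subsets can then be passed, property by property, to the KS test.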

Results
In general, we found the distributions of planets and systems from the different catalogs to be quite similar. An exception was the ORG catalog, in which missing planetary mass values were derived from a theoretical mass-radius (M-R) relation. See Appendix A for the detailed and complete analysis with the quantitative results (Tables A1 and A8; Figures A1 and A7). To infer the differences between the catalogs, we compared the overlap and unique subsets for two catalogs at a time, as discussed above. We found the distributions of the overlapping planets between the different catalogs to be similar. However, there were significant differences between the overlap and unique subsets. Figure 2 shows kernel density estimates (KDE) [17] of the probability density functions (PDF) of the mass, radius and orbital period for the different subsets. The solid blue curves represent the three overlap subsets obtained when comparing each pair of the EU-ARCHIVE-OPEN catalogs. The brown curves represent the unique subsets, where the dashed, dashed-dotted and dotted curves correspond to the unique EU, ARCHIVE and OPEN subsets, respectively. Evidently, the overlap and unique subsets seem to have very different distributions for the three planet properties. Table 3 presents the KS test p-values for the comparisons between the subsets. We found a low p-value for most comparison tests.
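A kernel density estimate of the kind shown in Figure 2 can be sketched as below. The Gaussian kernel, Scott's rule for the bandwidth, and the use of log10 of the period are assumptions made for this illustration; the text does not state which kernel, bandwidth or axis scaling was used. The period values are hypothetical.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function estimating the PDF of `sample` with Gaussian kernels."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two points")
    if bandwidth is None:
        # Scott's rule for one-dimensional data: h = sigma * n^(-1/5)
        bandwidth = statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in sample)

    return pdf


# Hypothetical orbital periods in days, analysed in log10 space.
periods_days = [3.2, 4.1, 10.5, 88.0, 365.0, 1000.0]
log_periods = [math.log10(p) for p in periods_days]
pdf = gaussian_kde(log_periods)

# Evaluate on a grid of log10(period) from -2 to 7 and check the normalization.
step = 0.05
grid = [i * step for i in range(-40, 141)]
area = sum(pdf(x) for x in grid) * step
```

Evaluating one such estimate per subset on a common grid gives curves that can be overplotted to compare overlap and unique subsets.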

As for the planetary mass, we noted that a higher density of small-mass planets (M_p ∼ 10 M_⊕) populates the unique subsets. We found most of these planets to be TTV planets, detected in multi-planetary systems. Usually, the mass of a planet detected through TTV is not resolved directly and, in practice, is degenerate with the planet's orbital eccentricity [18]. As a result, many small TTV planets have an upper mass limit rather than a nominal value. However, there is no uniform approach to displaying this mass limit. At times, the catalogs choose to omit the value altogether, while at other times it is displayed as an upper limit. In many cases, especially in the EU catalog, the mass limit is reported as a valid nominal estimate. Due to this difference in the mass criteria, we found many small-mass planets biasing the unique subsets towards smaller masses.
For the planet radius and orbital period distributions, we found the EU and OPEN unique subsets to have a relatively higher density of planets at radii of R_p ∼ 1 R_J and periods of Period ∼ 1000 days. Examining the planets that comprise these subsets suggests that this difference is probably related to the different inclusion criteria the catalogs use. Some examples are: listed planets for which the confirmation paper used a theoretical M-R relation to infer the planetary radius (or mass); some unusually large-radius planets suffering strong tidal forces due to their proximity to the parent stars (at short periods); and planets with 'strange' transit light curves that make the planetary detection more controversial. As for the higher consistency between the overlap and unique parts of the ARCHIVE catalog, we found it to be somewhat artificial. By examining the ARCHIVE unique planets, we found over 75% of them to be K2 planets, which suggests that the good agreement with the overlap subsets stems from a simple fact: most of the overlap planets are transiting planets (Kepler and K2), with properties similar to those of the K2 planets.
Performing a similar analysis for the subset of planets for which all three properties (planetary mass, radius and orbital period) were available, we found that the unique and overlap subsets were again different (Figure 3), although with slightly higher p-values (Table 4) than in the individual property comparisons presented above. In this case, most of the overlap planets had large masses and radii, with short periods. This in itself is a bias, caused by the combined sensitivity of the RV and transit detection methods [19]. As for the distributions of the unique subsets, we found them to be compromised by a higher number of TTV-detected planets, causing the distributions to be almost uniform in the regime of both small and large planets.
Another interesting aspect of these unique subsets was their relative similarity to each other, as shown quantitatively by the p-values of the unique subsets in Table A10. In spite of this relative similarity, these subsets contain different planets, which suggests that the resemblance is a product of the TTV biases in the catalogs' planet mass inclusion criteria.
Table 4. Same as Table 3, but for the subset of planets in which all three properties of planetary mass, radius and orbital period are available.
… | (3 × 10^-5, <10^-10, 3 × 10^-11) | --- | (0.02, 3 × 10^-7, 8 × 10^-4)
Unique OPEN | (6 × 10^-5, 7 × 10^-5, 6 × 10^-4) | (6 × 10^-3, 2 × 10^-7, 10^-3) | ---
After examining the planetary systems according to the stellar properties, we again found similarity between the overlap subsets and significant differences between the unique and overlap subsets (Figure 4). We noted that the unique distributions of stellar mass and surface temperature were biased towards lower-mass and cooler stars (K and M stars). Spectroscopy of small stars (especially M stars) is challenging because of their intrinsic faintness and high activity [20]. As a result, detection of planets around these stars is purportedly more difficult and somewhat controversial, which explains why we found a relatively higher number of these objects in the unique subsets. As for the stellar metallicity, we found it to be an unreliable property when analyzing possible biases in the catalog comparison. Although the metallicity is an important property, providing an early record of the chemical composition of the initial protoplanetary disk [21], the planetary detection methods do not rely on this property directly. The catalogs usually reported this property with high relative errors, probably linked to the imprecise derivation used to determine the metallicity value. In addition, each catalog referred to different stellar survey sources, and we found the highest inconsistency between catalogs to be in this parameter (compared with the other stellar and planetary properties). Nevertheless, we still found the following trends: the unique subsets that included larger planets, especially in the OPEN catalog, were biased towards higher metallicity values, as expected from previous studies [22,23]. On the other hand, small planets, especially those listed in the unique ARCHIVE subset, had a wider distribution of metallicities [24]. As before, the p-values for the different KS tests of stellar properties in the overlap vs. unique subsets are provided in Table 5. The seemingly improved mean p-value results of the OPEN catalog are caused by its relatively smaller number of objects with listed stellar properties.
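The stellar-property comparison is done per system rather than per planet, using each system's first detected planet so that multi-planetary systems are not over-weighted. A minimal sketch of that selection is given below; the field names (`host`, `disc_year`, `teff`) are hypothetical, and breaking ties in discovery year by keeping the first row encountered is an assumption of this sketch.

```python
def first_detected_per_system(planets):
    """Keep one row per host star: the planet with the earliest discovery year.

    Ties in discovery year keep the first row encountered.
    """
    best = {}
    for row in planets:
        host = row["host"]
        if host not in best or row["disc_year"] < best[host]["disc_year"]:
            best[host] = row
    return list(best.values())


# Hypothetical rows, for illustration only.
planets = [
    {"host": "star A", "planet": "b", "disc_year": 2010, "teff": 5600.0},
    {"host": "star A", "planet": "c", "disc_year": 2012, "teff": 5600.0},
    {"host": "star B", "planet": "b", "disc_year": 2015, "teff": 3400.0},
]
systems = first_detected_per_system(planets)
teff_sample = [row["teff"] for row in systems]  # one stellar value per system
```

Without this step, the host of a two-planet system would contribute its stellar parameters twice to the distribution.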
Although it would be reasonable to expect most biases in the exoplanet catalogs to appear at the extreme ends of the examined distributions, the analysis we have presented used a quantitative approach to study them. We conclude that the biases in the catalogs are caused by: (1) missing or obsolete information about a planet's properties (e.g., the system 'Kepler 53 b,c' is listed in the ARCHIVE catalog with planetary masses of M_p < 18 M_J and M_p < 15 M_J for planets 'b' and 'c', respectively [25]; however, a later paper [26] finds the masses to be much smaller, M_p ∼ 0.18 M_J and M_p ∼ 0.11 M_J, respectively, as listed in both EU and OPEN); (2) model-dependent planetary information based on a theoretical assumption rather than a direct measurement (e.g., 'Kepler-446 b,c,d': both EU and OPEN display a mass; however, according to the reference article [27], the given value is only a coarse estimate of the planets' masses and expected RV semi-amplitude signatures using recent empirically-measured data); (3) roughly estimated measurements, which is especially relevant for stellar parameters; or (4) approximate upper limits reported as the nominal value (e.g., 'Kepler-114 c' is a planet with a TTV-derived mass. The EU catalog displays its mass as M_p ∼ 40 M_⊕ [26]. However, since this measurement is an upper limit, we find it to be inconsistent with the nominal measurement of M_p ∼ 2.8 M_⊕ that the ARCHIVE displays [28], while the OPEN catalog does not include this planet).
Figure 4. PDFs of the stellar mass, metallicity and surface temperature for the catalogs' subsets, estimated using KDE: overlap subsets are shown in blue and unique subsets in brown. In each plot, three overlap subsets and six unique subsets may be noted (see text for further details).

Summary & Discussion
Our analysis suggests that, although the main exoplanet catalogs overlap significantly, which results in similar distributions for most astrophysical parameters, the small discrepancies between the subsets highlight some of the catalogs' biases. These biases are best seen at the extreme ends of the examined distributions: small-mass planets, long-orbital-period planets, or small stars (smaller than our Sun). They result not only from the different numbers of confirmed planets in each catalog, but mainly from factors related to the data collection policy of each catalog, such as: the process each catalog uses to collect and present the properties of a specific planet, the decision whether to include a controversial object as a planet, or the routine maintenance each catalog team performs on its currently listed planets.
Furthermore, in our analysis, we excluded planets with masses M_p > 10 M_J. However, the different catalogs use different mass boundaries, which also adds to their different biases. Unfortunately, most of the biases we found are due to the use of various subjective criteria in compiling and maintaining the databases. Although all catalogs usually include planets announced in peer-reviewed publications, this should not be the only criterion for a confirmed planet. We suggest that the explosive growth in the known planet population in recent years once again highlights the need for a more rigorous and objective mechanism to tag planets as confirmed. The differences among the catalogs demonstrate that there are conflicting views in the community regarding such criteria. The International Astronomical Union (IAU) is an objective authority that is well accepted by the community, and we therefore suggest that a central catalog could be maintained by Division F (Planetary Systems and Bio-astronomy) of the IAU, and specifically its Commission F2 (Exoplanets and the Solar System). Discussions within the commission should resolve the various differences and arrive at a system that can be agreed upon.
After performing this analysis and scrutinizing the different biases, we can cautiously make the following statements:

• The ARCHIVE catalog is the most up-to-date catalog, including recent Kepler and K2 planet discoveries. It is also the least biased catalog with respect to interpreting a mass upper limit as the true value, or adopting a model-based value instead of a genuine measurement. Another interesting feature of the catalog is a list of "removed targets", displaying objects that had been listed in the catalog but were later removed, which suggests a more rigorous process applied by the ARCHIVE team.

• The EU catalog is less restrictive when listing the planetary properties, and therefore could include imprecise estimates. The EU catalog differs the most from the overlap subsets, probably due to its more permissive acceptance criteria and the use of mixed sources of information. However, it has the widest coverage of planets.

• The OPEN catalog is somewhere in the middle, between ARCHIVE and EU. In some cases, we find that it resembles the EU subsets, while in others it resembles the ARCHIVE subsets. This might not be surprising, given that it is an open-source catalog which is managed and updated by the astronomical community. Although its interface is elegant and user friendly, it has its drawbacks, especially the lack of detection references and a smaller planet population.
Finally, while each catalog suffers from biases, the choice of database should not matter much for exostatistics work, since the planet population (especially the one compared in this work) is large enough to wash out the small biases and discrepancies. Nevertheless, we find the fusion of catalogs (the overlap subset) to be a powerful starting point for increasing the reliability of exostatistics research. A promising platform seems to be the Data & Analysis Center for Exoplanets (DACE) database (https://dace.unige.ch), which includes a table linked to the commonly-used exoplanet catalogs. DACE offers an accessible way to check the properties of a specific planet as listed in the different catalogs and to compare them.
Besides a careful and detailed inspection of each exoplanet confirmation paper, another useful way to increase confidence in an exoplanet database is to check related parameters such as: the discovery date and update times, which can resolve issues of "catch-up" time between catalogs and the rate at which they add new exoplanets; the RV semi-amplitude K, whose presence suggests that the mass was truly deduced from an RV measurement and not derived from a theoretical model; and a TTV flag together with a reported eccentricity, which suggests that the reported mass is probably not an upper limit but a nominal value.
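The consistency checks suggested above can be sketched as a simple heuristic that labels how a listed mass was probably obtained. All field names (`mass`, `mass_is_limit`, `rv_k`, `ttv_flag`, `eccentricity`) are hypothetical and do not correspond to the column names of any specific catalog; the labels are indicative only and do not replace reading the confirmation paper.

```python
def mass_provenance(row):
    """Heuristic label for how a listed planetary mass was probably obtained."""
    if row.get("mass") is None:
        return "no mass"
    if row.get("mass_is_limit"):
        return "upper limit"
    if row.get("rv_k") is not None:
        # A reported RV semi-amplitude K suggests a genuine RV mass.
        return "likely RV measurement"
    if row.get("ttv_flag") and row.get("eccentricity") is not None:
        # TTV detection with a reported eccentricity suggests a nominal value.
        return "likely TTV nominal value"
    return "unverified (possibly model-based)"


# Hypothetical rows, for illustration only.
rows = [
    {"mass": 1.2, "rv_k": 45.0},
    {"mass": 0.02, "mass_is_limit": True, "ttv_flag": True},
    {"mass": 0.01, "ttv_flag": True, "eccentricity": 0.05},
    {"mass": 0.03},
    {"mass": None},
]
labels = [mass_provenance(r) for r in rows]
```

Rows labelled as upper limits or unverified could then be excluded, or compared separately, before running the statistical tests.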

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Details of the Comparison
Appendix A.1. Comparison of Planetary Properties
We first compared the different catalogs using a KS test for the planet properties of mass (M_p), radius (R_p) and orbital period (Period). Table 2 summarizes the number of available objects in each catalog for the different properties. Table A1 and Figure A1 present the p-values of the corresponding KS tests and the relevant empirical cumulative distribution functions (CDF) of each subset, respectively. The distributions of the planetary properties from the different catalogs were found to be very similar, apart from the ORG catalog, which bases its missing planetary mass values on a theoretical mass-radius relation.
Although the catalogs seemed similar in this comparison, to reveal their true differences we applied the analysis discussed in Section 2 to compare the overlap and unique subsets between each pair of the EU-ARCHIVE-OPEN catalogs. In all the comparisons we also investigated whether there was any difference between the properties of the overlap planets as listed in catalog 'A' and as listed in catalog 'B'. In all cases, the p-values were nearly one, suggesting that the two catalogs are comparable. In the following subsections we present the comparison for the different planetary properties.

Planet Property EU-ORG EU-ARCHIVE ARCHIVE-ORG EU-OPEN OPEN-ARCHIVE OPEN-ORG
Appendix A.1.1 Planetary Mass
We summarize the results of the planetary mass comparisons in Tables A2 and A3. In these tables (and the following tables), each row represents a pair of catalogs that we compare, and the columns give the number of planets in each subset (Table A2) or the p-values of the comparison between each two subsets (Table A3). We abbreviate the unique subsets of catalogs 'A' and 'B' as 'UA' and 'UB', respectively. For instance, the first row in Table A3 gives (by column order from left to right): the p-value of the comparison between the overlap of the EU and OPEN catalogs and the unique part of the EU catalog; the p-value of the comparison between the overlap of the EU and OPEN catalogs and the unique part of the OPEN catalog; and the p-value of the comparison between the unique EU and OPEN subsets (see also Figure 1 for a graphic explanation).
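The split of a catalog pair into overlap and unique ('UA', 'UB') subsets can be sketched with plain set operations. This is only an illustration under stated assumptions: the function name `split_subsets`, the dictionary layout and the planet names are hypothetical, and planets are matched by identical name, whereas real catalogs would first need their naming conventions reconciled.

```python
def split_subsets(catalog_a, catalog_b):
    """Split one property of two catalogs into overlap and unique subsets.

    Each catalog is a dict mapping planet name -> property value, with
    None marking a planet that is listed without that property.
    """
    names_a = {name for name, value in catalog_a.items() if value is not None}
    names_b = {name for name, value in catalog_b.items() if value is not None}
    overlap = sorted(names_a & names_b)
    return {
        # The overlap is kept twice, once with each catalog's own values.
        "overlap_A": [catalog_a[name] for name in overlap],
        "overlap_B": [catalog_b[name] for name in overlap],
        "UA": [catalog_a[name] for name in sorted(names_a - names_b)],
        "UB": [catalog_b[name] for name in sorted(names_b - names_a)],
    }


# Made-up planet names and masses (Jupiter masses), purely for illustration.
eu = {"planet-1 b": 1.2, "planet-2 b": 0.05, "planet-3 b": 3.0, "planet-4 b": None}
open_cat = {"planet-1 b": 1.1, "planet-3 b": 3.2, "planet-4 b": 0.4, "planet-5 b": 7.5}
subsets = split_subsets(eu, open_cat)
for label, values in subsets.items():
    print(label, values)
```

In this sketch a planet that appears in both catalogs but has the property in only one of them ('planet-4 b') falls into the unique subset of the catalog that reports it. Each pair of the resulting lists can then be passed to a two-sample KS test.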
The CDFs of the different subsets are shown in Figure A2. We found that the number of planets in the overlap subsets was large, yet the populations of the unique subsets (in this section and those that follow) were non-negligible. Moreover, although the number of planets in each catalog was different, no single catalog included all the objects from another catalog. We found that the unique subsets were different from the overlap subsets, with the exception of the unique ARCHIVE and OPEN subsets when compared with the overlap subset of the EU. Correspondingly, we also found the unique parts of the ARCHIVE and OPEN catalogs to be very similar, based on the comparison between these two catalogs.
We found most of the disagreement between the overlap and unique subsets to be concentrated in the region of lower-mass planets. At present, the masses of most planets with M_p < 10 M_⊕ are estimated from TTVs in multi-planetary systems. Usually, the mass of a planet detected through TTVs is not resolved directly and, in practice, is degenerate with the planet's orbital eccentricity [18], so that only an upper mass limit is known rather than a nominal value. Consequently, we found that the catalogs use different inclusion criteria for these kinds of planets, with no uniform approach to displaying this mass limit. Sometimes the catalogs chose to omit the value altogether, sometimes to display it as an upper limit, but in many cases the mass limit was reported as a legitimate nominal estimate. The derived CDFs presented in Figure A2 suggest that the EU catalog, as opposed to the ARCHIVE and OPEN catalogs, displays the masses of many of its TTV planets as the upper-limit values reported in their confirmation papers, without any strict filtering. The modest agreement we found between the OPEN and ARCHIVE unique parts was artificial, driven by the similar number of TTV planets the two catalogs choose to include.

Appendix A.1.2 Planetary Radius and Orbital Period
Although we inspected the properties of planetary radius and orbital period separately, we found the results of the comparison between their overlap and unique subsets to be similar. We present the number of planets with reported radii, the calculated p-values and the CDFs of the different subsets in Tables A4 and A5 and Figure A3, respectively. We present the number of planets with reported orbital periods, the calculated p-values and the CDFs of the different subsets in Tables A6 and A7 and Figure A4, respectively.
In the analysis of both properties, we found relative similarity between the overlap and unique subsets of the ARCHIVE catalog and a large difference with the EU and OPEN subsets. It seems that the OPEN and EU unique parts are shifted towards larger radii (R_p > 1 R_J) or longer orbital periods. However, when we examined the list of ARCHIVE unique planets, we found over 75% of them to be K2 planets. We also found a similar ratio of Kepler and K2 planets populating the overlap ARCHIVE subset. Together, these two facts suggest that the good agreement we found between the ARCHIVE overlap and unique subsets is related to the similar properties and biases of the Kepler and K2 planets detected by transit: relatively small radii (R_p < 4 R_⊕) and short orbital periods (Period < 100 days) [29].
Other differences we found between the listed subsets include: planets for which the confirmation paper used a theoretical M-R relation to infer the planetary radius (or mass); some unusually large-radius planets subject to strong tidal forces due to their proximity to their parent stars (short periods); planets with 'strange' transit light curves that make the planetary detection more controversial; etc.
We conclude that the disagreement between the radius and orbital period distributions of the catalogs derives from: (1) different criteria for including planets in the catalogs that do not emerge directly from the criteria presented in Table 1; (2) different update and confirmation processes for new candidates.
In this subsection we compare the total number of planets in each catalog for a subset in which all three properties of planetary mass, radius and orbital period (MRP) are available. Information on these three physical properties can provide important constraints for planetary characterization and formation models [7]. Most of the current MRP planets were detected with both RV and transit methods while, as of today, most confirmed mass-radius planets are not TTV detected [30]. Consequently, we expect to detect especially large mass-radius, close-orbit planets (hot Jupiters) around bright solar-like stars. As before, we started by analyzing the MRP distributions of the different catalogs (including the ORG catalog). We present in Table A8 and Figure A5 the p-values of the corresponding KS tests and the relevant empirical CDFs of each subset, respectively.
6 × 10^-4, 3 × 10^-11, 0.05, 0.02, 3 × 10^-7, 0.51
Figure A5. Same as Figure A1, but for planets with combined information of planetary properties.
Similar to the overall catalog comparison, excluding the ORG catalog with its added theoretical supplement, the three other catalogs agreed on their distributions. We noted that, as expected, the distributions were different from those displayed for the separate properties (see Figure A1), with indeed a higher number of planets in the large mass-radius and small period regimes.
By comparing the total numbers, p-values and distributions of the catalog subsets (Tables A9 and A10 and Figure A6), we found most of the disagreements between the overlap and unique subsets to be evident in the small-planet regime of TTV-detected planets. We found the unique subsets, and especially the planetary mass distributions, to be similar between the catalogs, although the listed planets are different. We conclude that most of the disagreement among the MRP planets comes from the different approaches each catalog uses for the confirmed TTV planets.
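Selecting the subset of planets with all three properties available, together with the 10 M_J upper mass bound from the Methods section, might look like the sketch below. The field names (`mass_mj`, `radius_rj`, `period_days`), the function name `select_mrp` and the example records are hypothetical and do not correspond to the column names of any particular catalog.

```python
MASS_CUTOFF_MJ = 10.0  # upper bound on planetary mass, in Jupiter masses


def select_mrp(planets):
    """Keep planets that report mass, radius and period and pass the mass cut."""
    required = ("mass_mj", "radius_rj", "period_days")
    return [
        planet for planet in planets
        if all(planet.get(key) is not None for key in required)
        and planet["mass_mj"] <= MASS_CUTOFF_MJ
    ]


# Made-up records, purely for illustration.
planets = [
    {"name": "planet-1 b", "mass_mj": 0.9, "radius_rj": 1.3, "period_days": 3.2},
    {"name": "planet-2 b", "mass_mj": None, "radius_rj": 0.2, "period_days": 12.0},
    {"name": "planet-3 b", "mass_mj": 14.0, "radius_rj": 1.0, "period_days": 800.0},
    {"name": "planet-4 b", "mass_mj": 0.02, "radius_rj": 0.15, "period_days": None},
]
mrp = select_mrp(planets)
print([planet["name"] for planet in mrp])
```

Only the first record survives here: the second and fourth lack one of the three properties, and the third exceeds the mass bound.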

Appendix A.2 Planetary Systems
We first compared the different catalogs using a KS test for the stellar properties of the planetary systems: mass (M_*), metallicity ([Fe/H]) and surface temperature (T_eff). Table 2 summarizes the number of available objects in each catalog for the different properties. We present in Table A11 and Figure A7 the p-values of the corresponding KS tests and the relevant empirical CDFs of each subset, respectively. We found the distributions to be very similar between all catalogs. The slightly lower metallicity p-values we found were caused by the large errors that accompany the metallicity measurement and do not reflect any difference in distribution between the catalogs.
As with the planetary properties, we next compared the overlap and unique stellar-property subsets between each pair of the EU-ARCHIVE-OPEN catalogs. We performed the following analysis for the host star of each planetary system. As mentioned in Section 2, we also compared the distributions of the first detected planet in each system to better understand the reasons for the possible differences.
Appendix A.2.1 Stellar Mass and Surface Temperature
Knowing the stellar mass is an important prior when assessing a planet's mass by the RV method [31]. We expected to find most of our overlap planets around F, G and K stars, since most planet detection projects to date have been dedicated to searching for planets around solar-mass stars. We present the number of objects with reported stellar mass (and the planetary properties of the first planet), the calculated p-values and the CDFs of the different subsets in Tables A12 and A13 and Figures A8 and A9, respectively. We found the unique and overlap subsets to be different, with higher disagreement in the regime of low-mass stars (especially K and M stars). The spectra of M stars are difficult to measure because of their intrinsic faintness and high activity [20]. Consequently, detection of planets around these stars is expected to be more difficult and somewhat controversial, which explains why we found a relatively higher number of these objects in the unique subsets.
When comparing the planetary properties of the first detected planets in these systems, we observed a bias towards small planets that were probably easier to detect around lower-mass stars. We also found a small fraction of long orbital period planets (Period ∼ 1000 days) derived from direct imaging of high-mass objects with large semi-major axes around a low-mass star or brown dwarf (especially evident in the EU subsets).
Examining the stellar surface temperature (not displayed here), we found the unique vs. overlap subset analysis to be analogous to that of the stellar mass. This was no surprise, given the well-known correlation between the two properties via a mass-luminosity relation [32]. For this reason, we have chosen not to elaborate on it here.

Appendix A.2.2 Stellar Metallicity
We found that the catalogs usually reported the metallicity with high relative errors, probably linked to the imprecise derivation used to determine the metallicity value. Combined with the different survey sources the exoplanet catalogs choose to present on their sites, this made metallicity the parameter with the highest inconsistency between catalogs (compared with the other stellar and planetary properties). We present the number of objects with reported stellar metallicity (and the planetary properties of the first planet), the calculated p-values and the CDFs of the different subsets in Tables A14 and A15 and Figures A10 and A11, respectively. We found the unique and overlap subsets to be different yet again. While the overlap subsets have a clear peak around solar metallicity, the unique subsets have a wider range of metallicity values. Combining this with the information from each system's first detected planet, we noted that the unique OPEN subset is populated with a relatively low number of small planets: the OPEN catalog lists fewer Kepler planets with their stellar metallicity, and hence they automatically move to the opposite unique subset. We found the unique subsets of the ARCHIVE catalog to have an especially wide variance, coming from its population of many small-radius K2 and Kepler planets [24].
... larger than our 10 M_J cutoff, dropped from our analysis). The explanation given in the EU catalog for this difference is that the planetary inclination, measured by astrometry, is small (i ∼ 1.75° [33]), which indeed results in M_p sin i ∼ 0.37 M_J, the value that was set as the planet mass in the OPEN catalog.
(II) Planetary radius inconsistency: 'CoRoT-21 b' is an R_p ∼ 1.3 R_J planet in a short orbit (Period ∼ 2.7 days) that undergoes extreme tidal interaction with its parent star [34], and is listed in the EU and OPEN catalogs only. On the other hand, 'HD 219134 d', with a lower radius limit of > 1.6 R_⊕ [35], is listed only in the ARCHIVE catalog.
(III) Planetary orbital period inconsistency: '51 Eri b' is an imaged astrometric giant planet with an assumed orbital period of ∼14965 days (41 years), listed in both the EU and OPEN catalogs [36], but with this information missing in the ARCHIVE catalog, which displays only the semi-major axis measurement [37]. Another difference between the catalogs for this planet is its reported planetary mass: M_p ∼ 2 M_J (OPEN and ARCHIVE) versus M_p ∼ 9 M_J (EU). 'Kepler-37 e' is a TTV-detected planet reported only in the ARCHIVE and OPEN catalogs, with no reported information other than its period of ∼51.19 days [26].
(IV) Stellar mass inconsistency: 'HIP 57050 b' [38] is listed in all three catalogs, but its stellar mass measurement (M_* ∼ 0.34 M_⊙) is missing in the ARCHIVE catalog. 'OGLE-2014-BLG-1722 b' is an M_p ∼ 55.3 M_⊕ planet detected by the microlensing method around an M_* ∼ 0.4 M_⊙ star (two-planet system), listed only in the EU catalog [39].
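As a rough consistency check on the planetary mass inconsistency described above, one can take the quoted inclination and minimum mass at face value and assume the standard de-projection of a minimum mass. The resulting number is an illustrative estimate only, not a value taken from either catalog:

```latex
\[
M_p \;=\; \frac{M_p \sin i}{\sin i}
\;\approx\; \frac{0.37\,M_J}{\sin(1.75^{\circ})}
\;\approx\; \frac{0.37\,M_J}{0.0305}
\;\approx\; 12\,M_J \;>\; 10\,M_J .
\]
```

Under these assumptions the de-projected mass lies above the 10 M_J cutoff, while the minimum mass lies well below it. This is consistent with the same object being dropped from the analysis in one catalog and retained in the other.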