# Development of a Geogenic Radon Hazard Index—Concept, History, Experiences

## 1. Introduction

## 2. Concepts

#### 2.1. Geogenic and Anthropogenic Factors that Contribute to Indoor Radon

- A structural aspect that reflects the regional characteristic of the phenomenon, i.e., the trend;
- A random aspect that is the partly spatially structured, partly unstructured variability from one point to another at a local scale around the trend.

$$\mathrm{GRP}_{\mathrm{Nez}} = \frac{\mathrm{SRC} - \mathrm{SRC}_0}{-\log_{10} k - 10}$$

where SRC is the soil radon concentration (kBq/m³), $k$ is the gas permeability of the ground (m²), and $\log_{10}$ is the logarithm to base 10. $\mathrm{SRC}_0$, a small value, was originally introduced for statistical reasons, but is set to zero by many authors, e.g., [34]. In [14], it was set to 1 kBq/m³. The numerical value of GRPNez depends on the sampling protocol, e.g., sampling depth and collection period (grab sampling or longer-term collection). The Neznal-GRP has also been applied outside the Czech Republic [35,36]; for example, the German GRP map [34] is based on the Neznal-GRP, but applies a sampling protocol [37] that differs slightly from the original Czech one.
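As a minimal sketch, the Neznal-type formula, GRP = (SRC − SRC₀)/(−log₁₀ k − 10), can be evaluated as follows. The function name, the default SRC₀ = 1 kBq/m³ (the value reported for [14]), and the guard against non-positive denominators are assumptions of this illustration, not part of any published protocol.

```python
import math

def neznal_grp(src_kbq_m3, k_m2, src0_kbq_m3=1.0):
    """Neznal-type GRP = (SRC - SRC0) / (-log10(k) - 10).

    src_kbq_m3  : soil radon concentration in kBq/m3
    k_m2        : gas permeability of the ground in m2
    src0_kbq_m3 : small offset; 1 kBq/m3 as in [14], 0 for many other authors
    """
    if k_m2 <= 0:
        raise ValueError("permeability must be positive")
    denominator = -math.log10(k_m2) - 10.0
    # Assumption of this sketch: reject k >= 1e-10 m2, where the
    # denominator becomes zero or negative and the index loses meaning.
    if denominator <= 0:
        raise ValueError("formula requires k < 1e-10 m2")
    return (src_kbq_m3 - src0_kbq_m3) / denominator

# Hypothetical sample: SRC = 41 kBq/m3, k = 1e-12 m2
grp = neznal_grp(41.0, 1e-12)
```

For the hypothetical sample, the denominator is 12 − 10 = 2, so the result is (41 − 1)/2 = 20. Setting `src0_kbq_m3=0.0` reproduces the variant used by authors who drop the offset.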

#### 2.2. History of the Geogenic Radon Hazard Index

#### 2.3. Concept and Desired Properties of the GRHI

- a quantity which measures the contribution of geogenic factors to the potential risk that exposure to indoor Rn causes;
- a quantity which measures the availability of geogenic Rn at surface level;
- a measure of susceptibility of a location or of an area to increased indoor radon concentration for geogenic reasons;
- a measure of “Rn proneness” or “Rn priorityness” (in the logic of the BSS) of an area due to geogenic factors; i.e., a tool to decide whether an area is RPA.

- (I) consistency across borders between regions that are characterized by different databases used for the estimation; this implies independence of the actual database used;
- (II) exhaustiveness: the index should reflect the available geogenic information as far as possible;
- (III) simplicity: the index should be simple to calculate;
- (IV) predictor of the IRC: the index should be a valid predictor of the geogenic contribution to indoor Rn concentration. This is motivated by its very concept.

#### 2.4. A Taxonomy of Approaches to Define a Geogenic Hazard Index

## 3. Methods

#### 3.1. The Geogenic Radon Potential Compared to the Geogenic Radon Hazard Index

#### 3.2. Databases

- Geological maps:
- Soil properties: LUCAS database [76]; the database includes the following quantities (among others): topsoil fine fraction (as a proxy of the soil permeability), available water content (AWC) (as a proxy of the soil porosity), and chemical properties [77]. Another database of soil information is SoilGrids, containing global data estimated on a fine grid by machine learning [78].
- Aquifers (International Hydrogeological Map of Europe (IHME) 1:1,500,000) [81].
- Ambient dose rate: Across Europe, more than 5000 automatic stations continuously monitor ambient dose rate (ADR) as part of national radiological emergency warning systems. The data are stored and displayed by the European Radiological Data Exchange Platform (EURDEP) [82,83] and the EANR. Normally, the ADR represents the natural background, whose terrestrial component [84] is mainly due to the natural radionuclides U, Th (more precisely, their progeny), and K. Therefore, ADR is a proxy for geogenic radon (see below). A problem is that the data originate from technically different systems, which are difficult to harmonize.

In the figure, ADR is $Z_1$, which is statistically related to IRC ($= Z_2$) because both share the same predictor, namely the uranium content in the ground ($Z_0$). However, both ADR and IRC are also influenced by other variables: for example, $^{137}$Cs fallout and Th concentration in soil ($Z_0'''$ and $Z_0''$) influence the dose rate, and ground permeability ($Z_0'$) influences the IRC; therefore, their correlation is weak.
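The weakening of a proxy relationship by additional, unshared influences can be illustrated with a small simulation. This is a sketch on synthetic data only: the linear model, the noise standard deviation of 1.5, and all names are assumptions chosen for illustration and do not represent measured ADR or IRC values.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def simulate(n=5000, seed=0):
    """Synthetic Z0 (shared cause), Z1 (proxy) and Z2 (target)."""
    rng = random.Random(seed)
    z0, z1, z2 = [], [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)       # stands in for U content (Z0)
        other_dose = rng.gauss(0.0, 1.5)   # stands in for Cs-137 fallout, Th
        other_irc = rng.gauss(0.0, 1.5)    # stands in for permeability
        z0.append(shared)
        z1.append(shared + other_dose)     # ADR-like variable (Z1)
        z2.append(shared + other_irc)      # IRC-like variable (Z2)
    return z0, z1, z2

z0, z1, z2 = simulate()
r_01 = pearson(z0, z1)  # proxy vs. shared cause
r_12 = pearson(z1, z2)  # proxy vs. target
```

Under these assumptions, the expected correlation between the proxy and the shared cause is 1/√3.25 ≈ 0.55, whereas the expected correlation between proxy and target is only 1/3.25 ≈ 0.31, because each variable carries its own independent disturbance.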

#### 3.3. Estimation Methods

#### 3.3.1. Concepts Type A

#### Multivariate Classification

#### Principal Component Analysis (PCA)

#### Transfer Models

The GRHI is a function of the predictors $(Y_1, \ldots, Y_n)$; if a predictor $Y_i$ is not available, estimate it from different variables $U_1, \ldots, U_k$ as $Y_i = f_{(i)}(U_1, \ldots, U_k)$, and so on. Rules are look-up tables, which associate a level of a categorical variable with a needed $Y_i$ (e.g., factor = geology, level $i$ of this factor = $L_i$ = quaternary sediment, which has $Y_j$ = mean soil Rn concentration value $y_j$ = 20 kBq/m³). The transfer formulas are deduced from studies about relationships between geogenic variables.
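A look-up rule of this kind can be sketched as a dictionary that supplies a substitute value whenever the needed predictor has not been measured. Only the value of 20 kBq/m³ for quaternary sediment is taken from the text; the other table entries, the function name, and the fallback logic are hypothetical placeholders.

```python
# Look-up rule: geology level -> mean soil Rn concentration (kBq/m3).
# Only "quaternary sediment" (20 kBq/m3) is from the text; the other
# two entries are hypothetical placeholders for illustration.
MEAN_SRC_BY_GEOLOGY = {
    "quaternary sediment": 20.0,
    "granite": 80.0,      # hypothetical
    "limestone": 35.0,    # hypothetical
}

def soil_rn_concentration(measured, geology):
    """Return the measured SRC if available, else the look-up value.

    measured : SRC in kBq/m3, or None if no measurement exists
    geology  : level of the categorical factor "geology"
    """
    if measured is not None:
        return measured
    if geology not in MEAN_SRC_BY_GEOLOGY:
        raise KeyError(f"no transfer rule for geology level {geology!r}")
    return MEAN_SRC_BY_GEOLOGY[geology]

estimated = soil_rn_concentration(None, "quaternary sediment")
observed = soil_rn_concentration(55.0, "granite")
```

In this sketch a measured value always takes precedence, so the look-up table only fills gaps; a transfer formula based on continuous variables could be substituted for the dictionary without changing the calling code.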

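For mapping, either (1) interpolate the predictors and calculate the GRHI at every location $x$ from $(Y^*_1(x), \ldots, Y^*_n(x))$, with $Y^*(x)$ the interpolated value, or (2) calculate $\mathrm{GRHI}(x_i)$ at the points $x_i$ where predictors are available, and afterwards subject the GRHI to geostatistics to obtain the interpolated $\mathrm{GRHI}^*(x)$ for every $x$.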

#### Spatial Multi-Criteria Decision Analysis (SMCDA)

#### 3.3.2. Concepts Type B

#### Multivariate Regression (MR)

#### Machine Learning (ML)

## 4. Exemplifying Preliminary Results

#### 4.1. Geological Classification

#### 4.2. Multiple Regression

- Geochemistry: A combination of FOREGS and GEMAS databases, 59 elements; missing uranium values estimated from lanthanum and cerium because these elements are highly correlated with uranium; about 5000 data points in Europe.
- Soil properties: from LUCAS; point data projected to geochemical data points by geostatistics. Fine fraction tentatively defined as $\mathrm{FF} = (\text{clay} + \text{silt} + 0.05\,\text{sand})/(100 + \text{coarse fraction})$.
- Geology: IGME 5000.
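The tentative fine fraction definition, FF = (clay + silt + 0.05 sand)/(100 + coarse fraction), translates directly into code. The sketch below assumes that all four inputs are given in percent, which the text does not state explicitly; the function name is likewise an assumption.

```python
def fine_fraction(clay, silt, sand, coarse):
    """FF = (clay + silt + 0.05 * sand) / (100 + coarse fraction).

    All arguments are assumed to be percentages (assumption of this
    sketch); the result is dimensionless.
    """
    return (clay + silt + 0.05 * sand) / (100.0 + coarse)

# Hypothetical topsoil sample: 20% clay, 30% silt, 50% sand, 10% coarse
ff = fine_fraction(20.0, 30.0, 50.0, 10.0)
```

For the hypothetical sample, the numerator is 20 + 30 + 2.5 = 52.5 and the denominator is 110, giving FF ≈ 0.477.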

…$_2$O, Al$_2$O$_3$, SiO$_2$, Fe$_2$O$_3$, CaO, and geo1, with geo1 = {carbonate, meta-sediments, siliciclastics, Cenozoic sediments, basic igneous rocks, intermediate igneous, pre-Variscan acid igneous, Variscan acid igneous, post-Variscan acid igneous}.

AML ($= \mathrm{AM}_{\mathrm{cell}}[\ln(\mathrm{IRC})]$, the arithmetic mean of the logarithms of IRC within a 10 km × 10 km cell), interpolated to geochemical locations, i.e., AML in hypothetical cells around these locations.

{…$_2$O, ln(U)} as the best predictor set, which explains 26% of the variance ($r^2 = 0.26$). Inclusion of annual mean temperature would increase this to 29%.

…$_Z(z)$. Different rescaling is equally possible, e.g., linear rescaling, $z \to (z - z_{\min})/(z_{\max} - z_{\min})$, or tanh or normal-score transforms. The result is shown in Figure 7. In the map, 0 is the lowest and 1 is the highest GRHI.
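Two of the rescaling options named above, linear rescaling and a hyperbolic tangent transform, can be sketched as follows. The linear version follows the formula in the text; the specific tanh form, 0.5 (1 + tanh((z − mean)/sd)), is an assumption of this illustration, since the text does not specify one.

```python
import math
import statistics

def rescale_linear(values):
    """Map values linearly to [0, 1]: (z - z_min) / (z_max - z_min)."""
    z_min, z_max = min(values), max(values)
    if z_max == z_min:
        raise ValueError("all values are equal; linear rescaling undefined")
    return [(z - z_min) / (z_max - z_min) for z in values]

def rescale_tanh(values):
    """Map values to (0, 1) via 0.5 * (1 + tanh(standardized z)).

    The standardization by mean and population standard deviation is an
    assumption of this sketch.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are equal; tanh rescaling undefined")
    return [0.5 * (1.0 + math.tanh((z - mean) / sd)) for z in values]

linear = rescale_linear([2.0, 4.0, 6.0])
squashed = rescale_tanh([1.0, 2.0, 3.0])
```

Linear rescaling preserves the relative spacing of the values but is sensitive to outliers, which fix the end points; the tanh form compresses extreme values and never reaches exactly 0 or 1.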

#### 4.3. Machine Learning

- Geology: IGME 5000: lithological unit (attribute “Portr_Petr”, 92 classes);
- Hydrogeology: IHME 1500 ([116]): attribute “Litho level 2” (85 classes);
- Soil: regions of Europe (285 classes) ([117]);
- Soil physical properties [76]: Silt content, Clay content, available water capacity, bulk density, coarse fragments;
- Soil hydraulic properties: hydraulic conductivity [118]: Saturated hydraulic conductivity (at depths 0 cm, 60 cm, and 200 cm);
- Location: Longitude and latitude.

- (1) Categorical predictor data (geology, hydrogeology, soil regions) could be re-classified with respect to Rn to reduce the number of classes and the risk of over-fitting.
- (2) No external predictor selection procedure was applied, only the model-inherent predictor selection. This might allow non-informative predictors to appear in the final model and might cause over-fitting.
- (3) The cross-validation procedure in this study (stratified sampling) did not account for spatial auto-correlation in the data. Test data might lie within the correlation length of training data (see [119] for details), so independence between training and test data is not guaranteed and the resulting r² might be too optimistic. In newer versions (currently in progress), spatial cross-validation is being implemented.
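One common way to reduce the leakage described in item (3) is to assign whole spatial blocks, rather than individual points, to cross-validation folds. The following is a minimal sketch of that idea and not the implementation used or planned by the authors; the block size in degrees, the round-robin assignment, and all names are assumptions.

```python
import math

def spatial_block_folds(points, block_deg=1.0, n_folds=5):
    """Assign each (lon, lat) point to a fold via its grid block.

    All points falling into the same block_deg x block_deg cell share a
    fold, so nearby points are never split between training and test.
    Blocks are assigned to folds round-robin in order of first appearance.
    """
    block_to_fold = {}
    folds = []
    for lon, lat in points:
        block = (math.floor(lon / block_deg), math.floor(lat / block_deg))
        if block not in block_to_fold:
            block_to_fold[block] = len(block_to_fold) % n_folds
        folds.append(block_to_fold[block])
    return folds

# Hypothetical sample locations (longitude, latitude)
sample_points = [(10.1, 47.2), (10.4, 47.9), (11.2, 47.5), (15.0, 50.0)]
fold_ids = spatial_block_folds(sample_points, block_deg=1.0, n_folds=2)
```

The block size should be at least as large as the correlation length of the data; points in adjacent blocks can still be correlated, so this scheme reduces but does not eliminate the optimistic bias.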

#### 4.4. Principal Component Analysis

- Geochemistry: GEMAS + FOREGS, U, Th, and K, as in the European Atlas of Natural Radiation.
- Soil properties: Fine fraction FF in topsoil from LUCAS, as in the European Atlas of Natural Radiation.
- Tectonic fault lines: global fault layer from ArcAtlas, ESRI; areal density.
- Earthquake epicenters: [120].
- Geothermal and volcanic areas: in terms of heat flow; the heat flow map of Europe was obtained by analyzing the Global Heat Flow Database (International Heat Flow Commission of the International Association of Seismology and Physics of the Earth’s Interior, IASPEI).

$$\mathrm{GRHI}(x) = \sum_{j} w^{(1)}_j \, y_j(x)$$

where the sum runs over the variables $j$, $y_j(x)$ is the value of variable $j$ (e.g., U concentration) at location $x$, and $w^{(1)}_j$ is the loading of variable $j$ in the first principal component, i.e., the abscissa (F1) value in Figure 9.
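The GRHI based on the first principal component is a weighted sum of the variable values at a location, with the first-component loadings as weights. The sketch below shows only this summation; the loadings and values are hypothetical numbers, and the assumption that the variables have been standardized beforehand is not stated in the text.

```python
def grhi_first_component(values, loadings):
    """Weighted sum over variables j of loading_j * value_j.

    values   : dict mapping variable name to its value at one location
               (assumed standardized; assumption of this sketch)
    loadings : dict mapping variable name to its first-component loading
    """
    missing = set(loadings) - set(values)
    if missing:
        raise KeyError(f"values missing for variables: {sorted(missing)}")
    return sum(loadings[name] * values[name] for name in loadings)

# Hypothetical loadings and standardized values at one location
example_loadings = {"U": 0.5, "Th": 0.25}
example_values = {"U": 2.0, "Th": 4.0}
score = grhi_first_component(example_values, example_loadings)
```

With these hypothetical numbers, the score is 0.5 · 2.0 + 0.25 · 4.0 = 2.0. Evaluating the function at every grid cell yields a map of the first principal component score.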

## 5. Conclusions

## Acronyms

| Acronym | Meaning |
|---|---|
| AD(E)R | ambient dose (equivalent) rate (usually nSv/h or µSv/h, ADR also nGy/h) |
| AM | arithmetic mean |
| BSS | Basic Safety Standards |
| EANR | European Atlas of Natural Radiation |
| FF | fine fraction of soil matter (dimensionless) |
| GIS | geographic information system |
| GRHI | geogenic radon hazard index (dimensionless) |
| GRP | geogenic radon potential (usually treated as a dimensionless value) |
| IRC | long-term mean indoor radon concentration (usually Bq/m³) |
| k | gas permeability of the ground (m²) |
| MARS | multivariate adaptive regression splines |
| ML | machine learning |
| MR | multivariate regression |
| PC(A) | principal component (analysis) |
| ReV | regionalized variable; variable which refers to a location |
| RL | reference level of indoor Rn concentration, according to the BSS |
| Rn | radon; here Rn-222 |
| RPA | radon priority area: area in which a high fraction of indoor spaces has or is expected to have IRC above the RL, and in which particular action has to be taken according to the BSS |
| SMCDA | spatial multicriteria decision analysis |
| SRC | soil radon concentration (usually kBq/m³) |
| TGDR | terrestrial gamma dose rate (usually nSv/h or nGy/h), terrestrial component of AD(E)R |

**Figure 1.** General workflow of multivariate classification approach to construct a geogenic radon hazard index (GRHI) [49]. TGDR: terrestrial gamma dose rate.

**Figure 4.** Consistency between quantity GRHI calculated in regions A and B from different sets of predictors, $Y^{(A)}$ and $Y^{(B)}$.

**Figure 6.** Classification of geological units according to the Neznal-GRP; from [112].

**Figure 7.** GRHI map created by multiple regression (from [68]).

**Figure 8.** GRHI map created by machine learning (MARS) (from [68]).

**Figure 9.** Raw PCA result. Loading plot, showing the coefficients of each variable for the first component versus the coefficients for the second component. This graph shows which variables have the largest effect on each component. Percentages: explained variance (in percent) of the first principal components F1 and F2 (from [30]).

**Figure 10.** GRHI map derived from the first principal component (from [30]).

**Table 1.** Taxonomy of GRHI definitions. See Section 3.3 for more details.

| | A “Geogenic” | B “Optimal~IRC” |
|---|---|---|
| (1) “global” | [54] physical reasoning leading to the radon availability number (RAN); [55,56,57] classification of factors related to lithology, soil characteristics, relief, soil cover, sealing of the ground, and others; [58,59] cross-classification of control factors SRC and permeability; [60] classification of lithology, U concentration, and presence of features like faults and mines; [61,62] classification of geology and ADER; [31] principal component analysis (PCA) of various geogenic factors; [63] regression of Neznal-GRP vs. soil U concentration, IRC, and ADER; [64,65] integration of hierarchical multicriteria analysis and GIS, SMCDA, incorporating various geogenic variables. | [14] Neznal-GRP, method: regression of IRC vs. SRC and permeability classes; [42,66] Neznal-GRP, application; [67] logistic regression of IRC vs. lithological classes, TGDR, permeability, faults; [32] ML regression of IRC vs. many geogenic predictors (geochemistry, soil properties, etc.); [68] regression of IRC vs. many geogenic predictors (geochemistry, soil properties, etc.); multivariate classification through contingency tables: a possible method, no references so far. |
| (2) “local” | [69,70] multivariate classification: U.S. EPA approach, missing values allowed; [47] transfer models to estimate GRP from various geogenic quantities; [49] weighted mean of classified quantities, see Figure 1; [50] correlation of various geogenic quantities with Neznal-GRP. | [50] correlation of various geogenic quantities with IRC. |

**Table 2.** Compliance of approaches A and B and variants (1) and (2) with the desired properties of the GRHI.

| | A + (1) | A + (2) | B + (1) | B + (2) |
|---|---|---|---|---|
| I consistent | yes | difficult | yes | difficult |
| II exhaustive | no | yes | no | yes |
| III simple | some not simple | relatively simple | some not simple | relatively simple |
| IV predictor IRC | to be checked | to be checked | yes | yes |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

