Stochastic Forecasting of Regional Age-Specific Fertility Rates: An Outlook for German NUTS-3 Regions

Vanella, Patrizio; Hassenstein, Max J.

doi:10.3390/math12010025


by Patrizio Vanella ¹,²,³,* and Max J. Hassenstein

¹ Demography Cluster, Department of Health Monitoring & Biometrics, aQua Institute, 37073 Göttingen, Germany
² Chair of Empirical Methods in Social Science and Demography, University of Rostock, 18051 Rostock, Germany
³ Working Group of Demographic Methods, German Demographic Society (DGD), 37073 Göttingen, Germany
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(1), 25; https://doi.org/10.3390/math12010025

Submission received: 23 November 2023 / Revised: 13 December 2023 / Accepted: 20 December 2023 / Published: 21 December 2023

(This article belongs to the Section D1: Probability and Statistics)


Abstract

Regional fertility forecasts are important for long-term planning in a variety of fields that include future birth numbers in their forecast, such as school or kindergarten planning. They are one of the major components of regional population forecasts as well. Therefore, it is important to construct reliable forecasts that are based on sophisticated models that cover the high complexity of future regional fertility. We suggest a novel forecast model for forecasting regional age-specific fertility rates that covers long-term trends by time series models, demographic and regional correlations by principal component analysis, and future uncertainty by Monte Carlo simulation. The model is applied to all German NUTS-3 regions (districts/Kreise) simultaneously, where we forecast all regional age-specific fertility rates through the period of 2022–2045. The results from the simulations are presented via median predictions with 75% prediction intervals of the regional total fertility rates. The simulation shows strong regional heterogeneities in long-term fertility trends that are associated with the historical background of Germany, housing supply for families, opportunities for education, and the strength of labor markets, inter alia.

Keywords: demography; regional fertility; cross-correlation; autocorrelation; time series analysis; principal component analysis; ARIMA models; forecasting; Monte Carlo simulation; stochastics

MSC: 62P25

1. Introduction

1.1. Background and Motivation

Fertility is a major component that shapes a society’s long-term demographic development [1]. Demographics, in turn, have major implications for long-term planning in a variety of fields, such as the labor market [2], pensions [3], health insurance [4], long-term care [5], education [6], and housing [7]. Therefore, the long-term forecasting of fertility trends is essential for sound economic or political planning [1]. The analysis of trends and disparities across regions has gained momentum in demographic research [8,9,10,11]. However, a research gap remains concerning (stochastic) forecasting of regional fertility. Previous research indicates that regional fertility differences may have a larger impact than those on the national level, which are associated with a variety of infrastructural, geographic, economic, and sociocultural factors [12,13]. Fertility is of major importance for regional development. For instance, regions with low fertility are susceptible to a long-term shortage in the labor force if they cannot compensate for persistently low birth numbers with, e.g., the immigration of qualified workers [11]. Regional fertility trends also have substantial implications for the planning of educational resources [14]. Very low regional fertility may lead to underutilization and inefficiencies in regional services, e.g., schools and kindergartens, whereas very high (or abruptly increasing) regional birth numbers may lead to insufficient capacities in schools or kindergartens.

These examples show the necessity of good information on future regional fertility patterns. The efficient resource allocation by local governments includes long-term planning, which, in turn, needs to be well informed by high-quality forecasts. The planning of supply with educational infrastructures, for instance, requires regional fertility forecasts as a basis of decision making [15]. Germany has been a low-fertility country for half a century [1]. Enduring low fertility is considered problematic, e.g., from the perspective of labor market supply or the social insurance demand, among a variety of fields (see [14] for an overview). In the long term, low fertility is the main driver of the aging and depopulation of societies [1,16]. However, for a relatively large and heterogeneous country such as Germany [11], inequalities in fertility trends are prevalent as well. The regions mostly affected by low fertility rates are economically weaker regions, metropolises, and university cities [17,18]. The regional analysis of long-term fertility trends, instead of a purely national perspective, for Germany or other bigger low-fertility countries is therefore much more informative and needed for local planning, since national forecasts are only representative for a limited number of regions [16].

Despite the variety of methods regarding approaches to fertility forecasting [19], the majority of forecasts of regional fertility are of a rather simplistic nature (see [20] for the recent practice in regional fertility forecasting). When conducting forecasts of regional fertility, there are some critical methodological issues that need to be considered. First, regional country-wide forecasts for larger countries, such as Germany, exhibit a high dimensionality as there are numerous regions and demographic groups that require precise analysis. The regional fertility data provided by the statistical offices were disaggregated into 401 districts and 6 age groups. Thus, if we are to construct simultaneous fertility forecasts for all district-age strata, we need to conduct multivariate time series analysis of 2406 fertility rates. Therefore, efficient approaches are a prerequisite to deal with high dimensionality. Multivariate methods, especially principal component analysis (PCA), are established approaches in this regard [11,21]. Second, fertility rates among adjacent or structurally similar regions and age groups are often strongly correlated [13,17,19]. Not accounting for this multicollinearity could lead to biases or errors in the forecasts. Third, approaches that are commonly applied by regional governments and statistical offices are deterministic. They qualitatively determine or quantitatively estimate point forecasts and either do not include future uncertainty at all or only include it by defining a limited number of alternative scenarios. Therefore, these approaches do not account for the uncertainty of the estimators sufficiently [22,23]. However, point forecasts have statistical probabilities of occurrence that are close to zero [19]. Probabilistic forecasts, in contrast, quantify a theoretically infinitely large number of realistic future scenarios based on simulations, which makes it possible to quantify probabilities and intervals instead of point forecasts [22]. Therefore, stochastic approaches are preferable [11,13], especially in the regional context (where risks of misprojections are especially high because of a relatively small number of observations, and there is a high potential for misinformation regarding the targeted recipients of the forecasts). Stochastic forecasts, therefore, offer more information that can be used for well-informed decision making [24,25].

Current approaches to forecasting regional fertility are thus far rather simplistic. Local governments typically conduct isolated projections for their respective municipalities or districts, where constant total fertility rates (TFRs) and age schedules are assumed to derive projections of the age-specific fertility rates (ASFRs) (e.g., for the case of Cologne, see [26]). Uncertainty is regularly not included and supraregional fertility trends are ignored in those projections. This might lead to inconsistencies when the isolated regional forecasts are aggregated to federal state and national fertility forecasts [20]. In particular, correlations between regional fertility patterns are often not considered. For Germany, the most established fertility projection on the NUTS-3 level was conducted by Maretzke et al. [18], who performed multivariate analysis on the ASFR time series of all German districts over the years 2007–2017 to form clusters of districts. Then, they extrapolated the trends from the time series of the ASFRs for these clusters, holding within-cluster long-term heterogeneity constant. For those clusters not exhibiting trending behavior in the data, the long-term averages were assumed to hold in the future. Moreover, the authors, in aggregate, remained consistent with the national fertility projection by the federal statistical office and the statistical offices of the federal states [27]. There are a few more sophisticated regional fertility forecasts available that sufficiently take the aforementioned complexity in fertility into account (namely cross-correlations between demographics and regions, autocorrelations in the time series, and the uncertainty of future fertility). An interesting recent example is by Ševčíková et al. [20], who expanded Alkema et al.’s [28] Bayesian Hierarchical Model for the forecasting of regional TFRs at a global level (the definition of a TFR will be given in the next subsection). That model merges quantitative data and qualitative knowledge on fertility patterns and simulates future trajectories over a range of region-specific parameters and between-region correlation coefficients. Therefore, it randomizes over a realistic scenario space of future fertility patterns. Recently, Rafael Caro-Barrera et al. [29] followed a similar approach with more detailed national data for stochastic TFR projections for all regions of Spain. In the case of Australia, Yang et al. suggested a concept of including regional factors and migration backgrounds, by means of the mother’s birthplace, in forecasts of ASFRs. However, the authors gave no specific forecast for the Australian regions [30].

This literature review shows that, thus far, there is no forecast approach for regional fertility that covers trends in age-specific fertility, correlations between demographic groups and regions, and uncertainty. Our contribution to the literature is threefold. First, we suggest a novel model for forecasting regional ASFRs that tackles the three mentioned limitations of the previous projection approaches. The long-term trends in the ASFRs are included by parametric trend models, and correlations in the ASFRs between age groups and between regions are covered by PCA. Future uncertainty is included by the Monte Carlo simulation of ARIMA models. Second, we provide consistent time series data of the ASFRs for all German NUTS-3 regions (districts, or Kreise) for the period of 1996–2021 in the Supplementary Materials. Previously available data are not in a time series format and not readily usable for statistical analysis, since the data refer to geographical definitions that changed over time following a variety of regional reforms during the baseline period. Third, we give concrete forecasts of all regional ASFRs for Germany through 2045.

After an overview of the aspects affecting regional fertility in Germany, Section 2 will present the data used in our study alongside a detailed description of our model. Since the paper aims at an interdisciplinary readership, we sought to describe the mathematical technicalities in sufficient detail without becoming too theoretical. Section 3 will present the results generated by our model. Finally, in Section 4, we discuss the results and the limitations of our approach, thereby giving an outlook on the room for future improvement.

1.2. Regional Aspects of Fertility in Germany

Fertility levels can be characterized using established summary statistics that are mainly differentiable between period and cohort statistics. Both concepts are based on ASFRs that divide the number of live births to certain female subpopulations by corresponding baseline populations (typically either the female end-of-year population of a respective cohort for the previous year or the estimated average population of the said cohort during the year [31]). Depending on the perspective of the analysis (i.e., a period vs. cohort perspective), the ASFRs have been regularly aggregated to TFRs [22] or cohort fertility rates (CFRs) [32]. The TFR is the sum of all ASFRs during a certain period [1], whereas the CFR cumulates a certain cohort’s ASFRs over their life course [31].

Between World War II and 1990, Germany was split into West and East Germany, which had major implications for the reproductive behavior of the two German parts, as illustrated in Figure 1.

Starting in the mid-1960s, the TFRs decreased until the early 1970s. However, East German fertility managed to recover temporarily before continuing its downward trend, albeit not at the low level of West Germany. Following the German reunification in 1990 (vertical blue line), the TFR of West Germany remained relatively stable, whereas East German fertility initially plunged following the change in the economic regime before converging to the West German level only at the onset of the financial crisis starting in 2007. Since then, the East German TFR has been slightly higher than the West German TFR.

However, Bujard and Scheller [17] note that the TFR was especially biased in this case because of a distinct tempo effect in the East German fertility during the said period (in fertility research, the tempo effect refers to the postponement of planned births to a later point in one’s life [1]). Therefore, the quantitative extent of the TFR always has to be considered carefully. Nevertheless, the TFR qualitatively provides a good overview of overall fertility trends. The diverging fertility patterns of the two former parts of Germany illustrate well why a subnational analysis of fertility in Germany is an especially interesting endeavor given the historical background. Overall, many aspects play into regional fertility developments. Bujard and Scheller [17] gave an overview of the different aspects shaping regional fertility, with a special focus on Germany. Figure 2 shows the TFRs by district for Germany over the period of 1996–2021.

It is interesting to see that, even at the small-area level, regional differences in fertility have been dominated by the East–West divide, which has played a more important role than regional factors. Overall, a convergence alongside increases in fertility has recently been visible. However, other differences that have been investigated earlier have become apparent as well. For instance, the neighboring districts of Cloppenburg and Vechta in the west of Lower Saxony (readers who are unfamiliar with the German federal states may refer to Appendix A, where Figure A1 gives a graphical overview of all the federal states) have exhibited significantly larger TFRs over the past few years. This phenomenon has been associated with more classical family structures and religious effects, as these regions are predominantly Catholic (in contrast to the prevalence of Protestants in the northern parts of Germany) [17,37].

Economic conditions play a major role in reproductive behavior as well. For instance, regions with strong labor markets are associated with higher job security and a relatively high income on average, ceteris paribus (c.p.), and they also exhibit higher fertility rates [17]. This has lately been visible, as shown in Figure 2, in, for instance, the east of Lower Saxony. There, Wolfsburg and its surrounding districts, Gifhorn, Helmstedt, and Brunswick, show relatively high TFRs. This is due to the strong labor markets in Wolfsburg and Brunswick, which are associated with relatively high incomes and good childcare opportunities. In 2020, Wolfsburg’s inhabitants had the highest median income among all German districts [38]. Simultaneously, both Brunswick and Wolfsburg were among the top 5 districts in terms of childcare coverage among the 45 districts in Lower Saxony [39]. Another important factor is urbanization. Higher urbanization is associated with a c.p. lower fertility, since large cities tend to have fewer housing opportunities for larger families, less nature, and are generally less family friendly [17]. However, a distinction must be made between strongly rural areas and the more rural areas in the outskirts of bigger cities. The latter tend to be especially attractive for families since they offer a good trade-off between perceived security and good infrastructure with shopping opportunities [11]. In Figure 2, this can be observed well in the areas north of Munich in central Bavaria or, lately, in Brandenburg, which surrounds Berlin.

2. Materials and Methods

2.1. Data Source and Preparation

We used the annual NUTS-3-level data on births as categorized by the mothers’ age groups through 1995–2021 (i.e., those below 20, 20–24, 25–29, 30–34, 35–39, and those at least 40 years of age) and age-specific data on the year-end female population through 1995–2021. These data were provided by the statistical offices of the federal states and the federal statistical office on the German regional data bank [35,36]. Annual data are available and correspond to the district definitions of the respective years. The birth data include all birth registrations in Germany as categorized by the mothers’ birth cohorts and the municipalities of registration at childbirth. Therefore, the data exhibit a high degree of accuracy and completeness [40,41]. The population data are based on the resident population of each district according to the regularly conducted census, and they are updated annually using the cohort component method (i.e., adding births and in-migration, subtracting deaths and out-migration). The population data exhibited a structural break in 2011, the year of the last census in the present database. There, the population estimates were reduced by about 1.3 million on the national level, a correction of errors that originated during the German reunification in 1990 and from the unregistered migration that occurs continuously. However, these errors primarily occur in the elderly population, which is why the population data used for our analysis are not significantly affected and are of high quality [23,42]. We harmonized both data sources demographically and geographically by aggregating the data to the age groups of 15–19, …, 35–39, and 40–49, as well as by performing a geo-conversion according to Vanella et al. [11] to generate a consistent time series. All data preparation, analysis, and visualization were conducted using R (Version 4.3.2) and RStudio (Version 2023.06.0, Build 421).
The R packages tidyverse [43] (Version 1.3.2) supported data wrangling, and ggplot2 [44] (Version 3.4.0) and sf [45,46] (Version 1.14) supported map visualization. The matrix computations were supported by the MASS package [47] (Version 7.60), and the model fitting and forecasting were conducted with the packages tseries [48] (Version 0.55) and boot [49,50] (Version 28.1). Additionally, the backtests were facilitated by the Metrics package [51] (Version 0.1.4). Geo-conversion was necessary because various district reforms changed the district borders during the baseline period, which led to structural breaks in the raw data (see [11]). Since the baseline period of the mentioned paper, there has been one more major reform: the city of Eisenach (16056) was incorporated into the Wartburgkreis (16063) in 2021 [52]. Therefore, we merged district 16056 into district 16063 over the whole baseline period and discarded the former from the analysis. Based on our adjusted data, we computed the district- and age-specific fertility rates (DASFRs) for the years 1996–2021. The DASFR $\phi$ for age group $a$ ($a \in \{1, 2, \dots, 6\}$) and district $d$ ($d \in \{1, 2, \dots, 395\}$) in year $y$ ($y \in \{1996, 1997, \dots, 2021\}$) is hereby defined by

$$\phi_{d, a, y} := \frac{B_{d, a, y}}{F_{d, a, y - 1}}, \tag{1}$$

with $B_{d, a, y}$ denoting the number of live births to mothers in age group $a$ and living in district $d$ at childbirth during year $y$, and where $F_{d, a, y - 1}$ is the female population of the said district and age group at the end of year $y - 1$. The time series resulting from our data harmonization are available in Supplementary File S1 for further use. To keep the trajectories produced by our Monte Carlo simulation within an empirically realistic range $(0, \frac{1}{6})$, we applied a logistic transformation to the DASFRs according to Vanella and Deschermeier [1]:

$$l_{d, a, y} := \ln \left( \frac{\phi_{d, a, y}}{1 / 6 - \phi_{d, a, y}} \right), \tag{2}$$

where $\ln(\cdot)$ denotes the natural logarithm of the argument. Qualitatively, this implies that no district-specific age group gives birth to zero children in any given year and that fewer than one in six females in each stratum will have a live birth in a given year [1]. This is consistent with the historical data. We arranged all $l_{d, a, y}$ observations in a 26 (years) × 2370 (districts × age groups) matrix.
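Equations (1) and (2) can be illustrated with a short Python sketch (the study itself used R). The function names and the birth and population counts below are hypothetical and chosen only for illustration; the inverse transformation follows algebraically from (2).

```python
import numpy as np

UPPER = 1.0 / 6.0  # upper bound of the DASFR range used in Eq. (2)

def dasfr(births, females_prev_year_end):
    """Eq. (1): live births in year y over the female year-end population of y - 1."""
    return np.asarray(births, dtype=float) / np.asarray(females_prev_year_end, dtype=float)

def scaled_logit(phi, upper=UPPER):
    """Eq. (2): maps a rate in (0, upper) onto the real line."""
    phi = np.asarray(phi, dtype=float)
    return np.log(phi / (upper - phi))

def inverse_scaled_logit(l, upper=UPPER):
    """Inverse of Eq. (2): maps a real number back into the range (0, upper)."""
    l = np.asarray(l, dtype=float)
    return upper / (1.0 + np.exp(-l))

# Hypothetical counts for one district-age stratum over three years
births = np.array([120, 130, 125])
females = np.array([2400, 2450, 2500])

phi = dasfr(births, females)
l = scaled_logit(phi)
phi_back = inverse_scaled_logit(l)
```

Because the inverse transformation is bounded by the upper limit of 1/6, any simulated value on the transformed scale maps back to a rate inside the admissible range.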

2.2. Model Choice by Backtesting

Our first objective was to compare a selection of candidate models based on their forecast accuracy. It is good practice in demographic forecasting to assess competing models by conducting a backtest, which fits a model to a subset of the data (the training data) and predicts the data for the remaining periods (the test data). The predictions and the test data are then compared to evaluate the forecast performance of the respective model [53]. The prediction errors are then compared among the candidate models using an appropriate measure. In the case of Germany, Vanella et al. suggested using the symmetric mean absolute percentage error (SMAPE) to assess candidate models for the long-term forecasting of regional migration [11]. We followed that suggestion and compared forecast alternatives for regional ASFRs. Here, we used the DASFRs for 1996–2008 to predict the DASFRs for 2009–2021. This split of 13 training years to 13 test years is slightly stricter than the ratio between our baseline period (1996–2021, 26 years) and the forecast horizon (2022–2045, 24 years). Therefore, our backtest ought to be more conservative than the eventual forecast. For the present case, the SMAPE of model $m$ is

$$\mathrm{SMAPE}_{m} = \frac{1}{30810} \sum_{d = 1}^{395} \sum_{a = 1}^{6} \sum_{y = 2009}^{2021} \frac{2 \left| \phi_{d, a, y} - E_{m} (\phi_{d, a, y}) \right|}{\phi_{d, a, y} + E_{m} (\phi_{d, a, y})}. \tag{3}$$

The SMAPE has desirable properties for our purposes since it is a relative measure that is not biased by the level of the logit-ASFRs, i.e., it measures the relative prediction error rather than an absolute one. Naturally, an absolute measure would identify very small errors for small ASFRs, which are prevalent in the younger and older age groups. Moreover, the SMAPE avoids asymmetries in the error measure that could be caused by values of ASFRs that are close to zero [11]. In this case, we had 30,810 predicted values and corresponding observations that originated from the combination of 395 districts, 6 age groups, and 13 periods. This should give reliable estimates of the models’ forecast accuracy.
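As a minimal sketch of Equation (3) in Python, the SMAPE is the mean of the symmetric relative errors over all district, age group, and year combinations. The function name `smape` and the simulated rates are assumptions for illustration and are not the study's data.

```python
import numpy as np

def smape(observed, predicted):
    """Eq. (3): mean of 2|obs - pred| / (obs + pred) over all array elements."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(2.0 * np.abs(observed - predicted) / (observed + predicted)))

# Hypothetical rates with the backtest dimensions: 395 districts x 6 age groups x 13 years
rng = np.random.default_rng(0)
observed = rng.uniform(0.01, 0.12, size=(395, 6, 13))
predicted = 1.1 * observed  # every prediction overshoots by 10%

score = smape(observed, predicted)
```

In this toy case, each term equals 0.2/2.1 regardless of the level of the rate, which illustrates why the measure does not favor strata with small ASFRs.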

Model 1 ($M_{1}$) is the simplest model variant, a naïve prediction that assumes that all DASFRs remain constant at their last respective observation, i.e.,

$$M_{1} : E_{1} (\phi_{d, a, y}) = \phi_{d, a, 2008}, \quad \text{for } d = 1, \dots, 395, \; a = 1, \dots, 6, \; y = 2009, \dots, 2021. \tag{4}$$

$M_{2}$, for each DASFR, assumes its long-term mean, i.e.,

$$M_{2} : E_{2} (\phi_{d, a, y}) = \frac{1}{13} \sum_{j = 1996}^{2008} \phi_{d, a, j}, \quad \text{for } d = 1, \dots, 395, \; a = 1, \dots, 6, \; y = 2009, \dots, 2021. \tag{5}$$

$M_{3}$ assumes each DASFR’s respective median through 1996–2008 for the forecast, i.e.,

$$M_{3} : E_{3} (\phi_{d, a, y}) = \mathrm{Median} (\phi_{d, a, 1996}, \dots, \phi_{d, a, 2008}), \quad \text{for } d = 1, \dots, 395, \; a = 1, \dots, 6, \; y = 2009, \dots, 2021. \tag{6}$$

$M_{1}$ serves as a benchmark model. It is common practice to use naïve models as benchmarks to check whether more complex models outperform models that rely on simple assumptions [54]. As alternatives to the naïve model, we tested models $M_{2}$ and $M_{3}$, which include the long-term fertility rates instead of heavily weighting the recent past. These models are inspired by the simple model variants suggested by Vanella et al. [11] for the case of regional (pseudo) migration rates in Germany. For these three rather simplistic models, the backtest accuracy was $\mathrm{SMAPE}_{1} \approx 23.1\%$, $\mathrm{SMAPE}_{2} \approx 33.4\%$, and $\mathrm{SMAPE}_{3} \approx 33.9\%$, respectively. Therefore, our first conclusion was that the naïve random walk model performs significantly better than the two models that include the long-term means or medians of the DASFRs in the forecast.
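The three simple models in Equations (4)–(6) can be sketched in a few lines of Python. The function name `simple_forecasts`, the dictionary keys, and the toy training matrix are hypothetical; the rows of the training matrix are years and the columns are district-age strata.

```python
import numpy as np

def simple_forecasts(train, horizon):
    """Constant forecasts per series: last value (M1), mean (M2), and median (M3)."""
    train = np.asarray(train, dtype=float)

    def repeat(values):
        # Repeat one value per series for every forecast year
        return np.tile(values, (horizon, 1))

    return {
        "M1": repeat(train[-1]),                 # Eq. (4): last observation
        "M2": repeat(train.mean(axis=0)),        # Eq. (5): long-term mean
        "M3": repeat(np.median(train, axis=0)),  # Eq. (6): long-term median
    }

# Hypothetical training data: 3 years (rows) for 2 strata (columns)
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [6.0, 60.0]])
forecasts = simple_forecasts(train, horizon=2)
```

Each forecast array can then be compared with the test data using the SMAPE from Equation (3).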

$M_{4}$–$M_{6}$ are generalizations of the Vanella–Deschermeier model, as suggested in [1]. The model takes the time series in (2) and performs PCA on the matrix $L$ of all logit-DASFR time series, which is a $13 \times 2370$ matrix, i.e., we derived a matrix of the time series of the principal components (PCs) by multiplying the time series matrix of the logit-DASFRs with the matrix of their eigenvectors:

$$P = L Λ, \tag{7}$$

where $P$ denotes the time series matrix ($13 \times 2370$) of the PCs and $Λ$ is the matrix of eigenvectors that were derived by a singular value decomposition of the covariance matrix of $L$. PCA incorporates the earlier mentioned cross-correlations between fertility trends in different regions and age groups into the forecast framework. It also allows for relatively efficient forecasts because it reduces the high dimensionality, which is caused not only by the demographic dimension [1] but especially by the high number of regions [11], to a manageable level. Figure 3 gives the time series of the first two PCs based on the pre-2009 data.
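A minimal Python sketch of the transformation in Equation (7) follows. It uses an eigendecomposition (`numpy.linalg.eigh`) of the covariance matrix instead of a singular value decomposition; for a symmetric positive semidefinite covariance matrix, both yield the same eigenvectors up to sign. The function name `pca_time_series` and the random input matrix are assumptions for illustration.

```python
import numpy as np

def pca_time_series(L):
    """Return the PC time series P = L @ Lambda, the eigenvector matrix, and the eigenvalues."""
    cov = np.cov(L, rowvar=False)            # covariance between the columns (series) of L
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so that PC1 has the largest variance
    eigvals = eigvals[order]
    Lambda = eigvecs[:, order]
    P = L @ Lambda                           # Eq. (7)
    return P, Lambda, eigvals

# Hypothetical input: 13 years (rows) of 30 transformed series (columns)
rng = np.random.default_rng(42)
L_demo = rng.normal(size=(13, 30))
P_demo, Lambda_demo, eigvals_demo = pca_time_series(L_demo)
```

Since the eigenvectors are orthonormal, the original series can be recovered exactly as `P_demo @ Lambda_demo.T`, which is the basis for transforming PC forecasts back to the original variables.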

$PC_{1}$ showed a clear positive trending behavior, and $PC_{2}$ increased until 2001 before returning to its initial level by 2008. Therefore, a long-term trend in $PC_{2}$ was not easily determined (which holds for the subsequent PCs as well). $PC_{1}$, however, can be expected to retain a long-term increase for the forecast. We still needed to determine which kind of trend function was appropriate in this case. Therefore, we tested different combinations of sub-models to compare their backtest performances.

The upper panel in Figure 3 suggests a progressive trend. $M_{4}$ assumed this trend to hold for the forecast horizon. Therefore, we fit a quadratic trend model to the data by ordinary least squares (OLS). Since the remaining PCs showed no trending behavior and the comparison of $M_{1}$–$M_{3}$ showed that a naïve model gives a better prediction than models that assume convergence, the remaining models assumed $PC_{2}$–$PC_{2370}$ to follow random walk processes. Therefore, $M_{4}$ first predicted the PCs as follows:

$$\begin{aligned} M_{4} : \quad E_{4} (p_{1, y}) &= α_{4} + β_{4} y + γ_{4} y^{2}, & y &= 13, \dots, 25, \\ E_{4} (p_{k, y}) &= p_{k, 12}, & k &= 2, \dots, 2370, \end{aligned} \tag{8}$$

where

Greek letters: the parameters estimated based on the 1996–2008 data via OLS;
$p_{k, y}$: the value of $PC_{k}$ in year $y$;
$y = 0$: the year 1996.

As the first alternative to $M_{4}$, $M_{5}$ fit a linear trend to $PC_{1}$. The assumptions on the remaining PCs remained unchanged. Therefore, $M_{5}$ predicted the PCs as follows:

$$\begin{aligned} M_{5} : \quad E_{5} (p_{1, y}) &= α_{5} + β_{5} y, & y &= 13, \dots, 25, \\ E_{5} (p_{k, y}) &= p_{k, 12}, & k &= 2, \dots, 2370. \end{aligned} \tag{9}$$
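The polynomial trend variants in Equations (8) and (9) can be sketched in Python with an OLS polynomial fit, where `degree=2` corresponds to the quadratic trend and `degree=1` to the linear trend. The function names and the toy series are hypothetical; the time index starts at 0 for the first training year, as in the definitions above.

```python
import numpy as np

def polynomial_trend_forecast(pc1, horizon, degree):
    """Fit a polynomial trend to PC1 by OLS and extrapolate it over the horizon."""
    t_train = np.arange(len(pc1))                     # 0, ..., 12 for 13 training years
    t_test = np.arange(len(pc1), len(pc1) + horizon)  # 13, ..., 25 for 13 test years
    coefficients = np.polyfit(t_train, pc1, deg=degree)
    return np.polyval(coefficients, t_test)

def last_value_forecast(pcs, horizon):
    """Random walk point forecast: each remaining PC stays at its last observed value."""
    return np.tile(pcs[-1], (horizon, 1))

# Hypothetical training series of length 13
t = np.arange(13)
pc1_linear = 1.0 + 0.5 * t
pc1_quadratic = 0.1 * t ** 2
other_pcs = np.column_stack([np.sin(t), np.cos(t)])

linear_forecast = polynomial_trend_forecast(pc1_linear, horizon=13, degree=1)
quadratic_forecast = polynomial_trend_forecast(pc1_quadratic, horizon=13, degree=2)
rw_forecast = last_value_forecast(other_pcs, horizon=13)
```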

$M_{6}$ was somewhat more defensive as it assumed the trend to change from a progressive into a degressive increase. A closer look at the past trend of $PC_{1}$ revealed a degressive trend between 2006 and 2008. We, therefore, assumed a logistic trend function, as suggested in previous studies [1,19], since it emulates the change from progressive to degressive in the trend well. As described, the inflection point was assumed to be located in the year 2006 (i.e., $y = 0$ corresponds to the year 2006). The scale parameter (here $δ_{6}$) of the logistic function was computed by minimization of Akaike’s Information Criterion (AIC) as follows:

$$\begin{aligned} M_{6} : \quad E_{6} (p_{1, y}) &= α_{6} + β_{6} \frac{\exp (y / δ_{6})}{1 + \exp (y / δ_{6})}, & y &= 3, \dots, 15, \\ E_{6} (p_{k, y}) &= p_{k, 2}, & k &= 2, \dots, 2370. \end{aligned} \tag{10}$$
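One possible way to estimate the logistic trend in Equation (10) is sketched below in Python. For each candidate scale parameter on a grid, the intercept and slope are estimated by OLS, and the candidate with the lowest AIC is kept. The grid, the Gaussian AIC formula with three estimated parameters, the function names, and the synthetic data are all assumptions; the source does not spell out the exact optimization procedure.

```python
import numpy as np

def logistic_curve(y, delta):
    """exp(y / delta) / (1 + exp(y / delta)), written in an equivalent form."""
    return 1.0 / (1.0 + np.exp(-y / delta))

def fit_logistic_trend(years, pc1, inflection_year, deltas):
    """Grid search over delta; alpha and beta are estimated by OLS for each candidate."""
    y = np.asarray(years, dtype=float) - inflection_year  # y = 0 at the inflection point
    pc1 = np.asarray(pc1, dtype=float)
    n = len(y)
    best = None
    for delta in deltas:
        X = np.column_stack([np.ones(n), logistic_curve(y, delta)])
        coef, *_ = np.linalg.lstsq(X, pc1, rcond=None)
        rss = float(np.sum((pc1 - X @ coef) ** 2))
        aic = n * np.log(rss / n) + 2 * 3  # assumed Gaussian AIC; k = 3 is an assumption
        if best is None or aic < best["aic"]:
            best = {"alpha": coef[0], "beta": coef[1], "delta": float(delta), "aic": aic}
    return best

# Synthetic PC1 series for 1996-2008 with known parameters plus small noise
rng = np.random.default_rng(7)
years = np.arange(1996, 2009)
true_trend = -3.0 + 8.0 * logistic_curve(years - 2006.0, 4.0)
pc1_demo = true_trend + rng.normal(scale=1e-4, size=years.size)

best = fit_logistic_trend(years, pc1_demo, inflection_year=2006, deltas=np.arange(1.0, 10.5, 0.5))
```

Because the number of parameters is the same for every candidate, minimizing this AIC over the grid is equivalent to minimizing the residual sum of squares.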

Regardless of the model specification among Models 4–6, in the next step, we re-transformed the predictions of the PCs into predictions of the logit-DASFRs by a right multiplication of the PC prediction matrix with the inverse of the loading matrix:

$$E_{m} = \begin{bmatrix} E_{m} (p_{1, 2009}) & E_{m} (p_{2, 2009}) & \dots & E_{m} (p_{2370, 2009}) \\ E_{m} (p_{1, 2010}) & E_{m} (p_{2, 2010}) & \dots & E_{m} (p_{2370, 2010}) \\ \vdots & \vdots & \ddots & \vdots \\ E_{m} (p_{1, 2021}) & E_{m} (p_{2, 2021}) & \dots & E_{m} (p_{2370, 2021}) \end{bmatrix} \times Λ^{- 1}. \tag{11}$$

The elements of $E_{m}$ were then re-transformed to derive the predictions of the corresponding DASFRs for the SMAPE computation. The resulting SMAPEs were $\mathrm{SMAPE}_{4} \approx 24.2\%$, $\mathrm{SMAPE}_{5} \approx 21.3\%$, and $\mathrm{SMAPE}_{6} \approx 21.1\%$. Accordingly, $M_{5}$ and $M_{6}$ performed better than the naïve model, whereby the logistic growth model outperformed all the other models. This was consistent with the national-level findings of Vanella and Deschermeier [1] for Germany.
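The two back-transformation steps can be sketched in Python as follows. The sketch assumes that the eigenvector matrix is orthonormal, so that its inverse in Equation (11) equals its transpose, and it inverts the transformation in Equation (2) algebraically. The function name `pcs_to_dasfr` and the random test matrices are hypothetical.

```python
import numpy as np

def pcs_to_dasfr(pc_forecast, Lambda, upper=1.0 / 6.0):
    """Map PC forecasts back to DASFR forecasts via Eq. (11) and the inverse of Eq. (2)."""
    # For orthonormal eigenvectors, the inverse of Lambda is its transpose
    logit_forecast = pc_forecast @ Lambda.T
    # Inverse of Eq. (2): phi = upper * exp(l) / (1 + exp(l))
    return upper / (1.0 + np.exp(-logit_forecast))

# Hypothetical orthonormal matrix and transformed forecasts for 13 years and 12 series
rng = np.random.default_rng(1)
Lambda_demo, _ = np.linalg.qr(rng.normal(size=(12, 12)))
logits_true = rng.normal(size=(13, 12))
pc_forecast = logits_true @ Lambda_demo

dasfr_forecast = pcs_to_dasfr(pc_forecast, Lambda_demo)
```

The resulting array has one row per forecast year and one column per district-age stratum and can be passed directly to the SMAPE computation.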

2.3. Stochastic Forecast Approach

In the previous section, we found that $M_{6}$, which predicts the future DASFRs based on a logistic growth model for the first PC of the logit-DASFRs, generated the best model fit. Therefore, we used this model for our forecast. Thus, we performed PCA on the matrix of the logit-DASFRs, as described in Section 2.2, for the entire 26-year baseline (1996–2021). Figure 4 illustrates all of the loadings of $PC_{1}$, with each panel referring to one of the six age groups. We provided the exact loadings in Supplementary File S2.

Figure 4 shows distinct regional patterns, which were most notable between former East and West Germany, whereas the directions of the loadings were similar for the majority of regions within one age group. The maps for the age groups beyond 30 showed a darker green shade for East Germany, thus implying higher loadings. For the mothers below the age of 25, the loadings were close to zero (white spots) for East Germany, contrasting with the dark violet colored West German regions. For the age group of 25–29, the signs of the loadings differed between the two parts of Germany. However, those were, in general, close to zero. Overall, Figure 4 shows that $PC_{1}$ represented the tempo effect in fertility, i.e., the postponement of births to a later age. This trend has been observed for decades in Germany, and it is illustrated well by the loadings given here. In an earlier work, Vanella and Deschermeier [1] have shown that PCA is adept at capturing the tempo effect in the form of one PC, which the authors called a Tempo Index. Accordingly, we called $PC_{1}$ the Regional Tempo Index. An interesting result was that the tempo effect appeared to diverge somewhat between the east and west. The ongoing tempo effect in West Germany pushed births c.p. beyond age 30, whereas the births in East Germany appeared to be delayed only until the mid-20s.

Table 1 gives the individual and cumulative shares of the variance in the logit-DASFRs explained by each PC, computed as the quotient of a PC’s singular value (eigenvalue) and the sum of all singular values [11].

We can see that the Regional Tempo Index individually explains over 71% of the variance of all 2370 time series through the baseline period. Table 1 thereby also gives a measure of the collinearity within the underlying data: the first two PCs together explain close to 80% of the overall variance. This shows that the underlying ASFRs were highly correlated and that their trends can be explained to a large extent by one latent trend, which is represented by $PC_{1}$.
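The variance shares reported in Table 1 follow from the eigenvalues of the PCA. The sketch below illustrates the computation on a small synthetic data set. The dimensions (26 years by 12 series instead of 2370), the data-generating process, and the use of a mean-centered covariance matrix are assumptions made for illustration only and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_series = 26, 12  # toy stand-in for 26 years x 2370 logit-DASFR series

# Synthetic data: one shared latent trend plus independent noise (assumption).
trend = np.linspace(-1.0, 1.0, n_years)
weights = rng.normal(1.0, 0.3, n_series)
X = np.outer(trend, weights) + rng.normal(0.0, 0.1, (n_years, n_series))

# Eigenvalues of the covariance matrix, sorted in descending order.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]
eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative values from rounding

# Individual share = eigenvalue / sum of all eigenvalues (in %), then cumulate.
individual = 100.0 * eigvals / eigvals.sum()
cumulative = np.cumsum(individual)
```

Because the synthetic series share a single trend, the first entry of `individual` dominates, which mirrors the role of the Regional Tempo Index in Table 1.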

The DASFR forecast is now constructed based on an approach similar to $M_{6}$, as described in the previous subsection. Accordingly, the Regional Tempo Index’s forecast employs a logistic trend model, which was fit to the time series data for 1996–2021. We then enhanced the deterministic version given in (10) by a stochastic term to include future uncertainty in the forecast. To identify the best ARIMA model for this purpose, we followed a series of graphical checks (visual inspection of the time series, the autocorrelation function, and the partial autocorrelation function) and statistical tests (significance tests and the augmented Dickey–Fuller test; see [1]). We concluded that a random walk emulates the residuals between the time series and the trend model of the Regional Tempo Index well. Therefore, we added a random walk as a nuisance parameter. The stochastic model for the Regional Tempo Index is, therefore,

$$\begin{aligned}
p_{1,y,t} &\approx -37.64 + 51.87\,\frac{\exp[(y-2012)/5.02]}{1+\exp[(y-2012)/5.02]} + e_{1,y,t}, \\
e_{1,y,t} &= e_{1,y-1,t} + \epsilon_{1,y,t}, \quad \epsilon_{1,y,t} \sim N(0, 1.32^{2}), \\
p_{k,y,t} &= p_{k,y-1,t} + \epsilon_{k,y,t}, \quad \epsilon_{k,y,t} \sim N(0, \hat{\sigma}_{k}^{2}), \quad k = 2, \dots, 2370,
\end{aligned}$$

(12)

with $e_{1,y,t}$ representing the residual between the Regional Tempo Index and its prediction according to the trend model in year y and trajectory t. The epsilons are white noise processes with zero expectation and variances that were estimated from the annual data over the period of 1996–2021. The white noise processes were drawn 1000 times for each year to derive 1000 trajectories of the future development of all PCs by Monte Carlo simulation (MCS). MCS is a well-established approach for stochastic forecasting since it is relatively easy to apply and efficient while still being theoretically justified, as it is based on an underlying mathematical model [55]. In the case of stochastic demographic forecasting, MCS has been applied to incorporate future uncertainty in a variety of forecast problems, such as fertility [1], mortality [56], migration [11], labor market supply [2], pension demand [3], absenteeism [4], long-term care demand [5], or causal analysis [19]. Figure 5 illustrates the past development of the Regional Tempo Index and its prediction with a 75% prediction interval (PI) according to (12).
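The simulation of the Regional Tempo Index according to (12) can be sketched as follows. The trend coefficients and the innovation standard deviation of 1.32 are taken from (12). The function names, the seed, and in particular the starting residual `e_start = 0.0` are assumptions for illustration, since the last observed residual for 2021 is not reported here. The 75% PI is assumed to be the central interval between the 12.5% and 87.5% quantiles.

```python
import numpy as np

def logistic_trend(year):
    """Deterministic logistic trend of the Regional Tempo Index, see (12)."""
    z = (np.asarray(year, dtype=float) - 2012.0) / 5.02
    # exp(z) / (1 + exp(z)) written as 1 / (1 + exp(-z))
    return -37.64 + 51.87 / (1.0 + np.exp(-z))

def simulate_tempo_index(years, n_traj=1000, sigma=1.32, e_start=0.0, seed=42):
    """Trend plus random-walk residual; returns an array (n_traj, len(years)).

    e_start is the residual in the last observed year (assumed 0 here).
    """
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=(n_traj, len(years)))
    residual = e_start + np.cumsum(shocks, axis=1)  # e_y = e_{y-1} + eps_y
    return logistic_trend(years)[None, :] + residual

years = np.arange(2022, 2046)
paths = simulate_tempo_index(years)

# Median forecast and central 75% prediction interval per year.
median = np.median(paths, axis=0)
lower, upper = np.quantile(paths, [0.125, 0.875], axis=0)
```

Because the residual follows a random walk, the interval between `lower` and `upper` widens with the forecast horizon, which is the qualitative pattern one would expect in Figure 5.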

After simulating the trajectories of the PCs, they were then re-transformed into trajectories of the DASFRs by inverting (7). In this special case, we can construct annual simulation matrices for the PCs and re-transform them to the corresponding simulation matrices for the logit-DASFRs, i.e.,

$$S_{y,t} = \Pi_{y,t} \Lambda^{-1},$$

(13)

where

$S_{y,t}$: the simulation matrix $(1000 \times 2370)$ of the logit-DASFRs for year y in trajectory t;
$\Pi_{y,t}$: the simulation matrix $(1000 \times 2370)$ of PCs for year y in trajectory t;
$\Lambda^{-1}$: the inverse $(2370 \times 2370)$ of the loading matrix.
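The re-transformation in (13) amounts to a single matrix product per forecast year. The following sketch uses a toy dimension of 8 series instead of 2370. The loading matrix is obtained here from the eigenvectors of a covariance matrix of random data, and the simulated PC values are random numbers; both are placeholders chosen for illustration, not the quantities estimated in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_traj, n_series = 1000, 8  # toy dimension in place of 2370 series

# Hypothetical loading matrix: eigenvectors of a covariance matrix
# (one column per PC), which makes the matrix orthonormal.
data = rng.normal(size=(26, n_series))
_, loadings = np.linalg.eigh(np.cov(data, rowvar=False))

# Placeholder for simulated PC values in one forecast year,
# one row per trajectory.
pc_sim = rng.normal(size=(n_traj, n_series))

# Equation (13): S = Pi * Lambda^{-1}
logit_sim = pc_sim @ np.linalg.inv(loadings)

# For an orthonormal loading matrix, the inverse equals the transpose,
# so the explicit inversion can be avoided.
same_result = np.allclose(logit_sim, pc_sim @ loadings.T)
```

With 2370 series, using the transpose instead of an explicit inverse is cheaper and numerically more stable, provided the loading matrix is indeed orthonormal.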

We can derive corresponding simulations for the DASFRs by plugging the elements of $S_{y,t}$ into the left-hand side of (2) and solving the following equation:

$$s_{d,a,y,t} = \ln\left(\frac{\phi_{d,a,y,t}}{1/6 - \phi_{d,a,y,t}}\right) \Leftrightarrow \phi_{d,a,y,t} = \frac{\exp(s_{d,a,y,t})}{6 + 6\exp(s_{d,a,y,t})},$$

(14)

where

$s_{d, a, y, t}$ : the simulated logit-DASFR for females in age group a living in district d at the end of year y in trajectory t;
$ϕ_{d, a, y, t}$ : the simulated DASFR for females in age group a living in district d at the end of year y in trajectory t.
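The transformation pair in (14) can be checked numerically with a short sketch. The function names are hypothetical; the upper bound of 1/6 is taken from (14).

```python
import numpy as np

UPPER = 1.0 / 6.0  # upper bound of a DASFR implied by (14)

def logit_dasfr(phi):
    """Left-hand side of (14): logit with ceiling 1/6."""
    phi = np.asarray(phi, dtype=float)
    return np.log(phi / (UPPER - phi))

def inv_logit_dasfr(s):
    """Right-hand side of (14): exp(s) / (6 + 6*exp(s)).

    Written as (1/6) / (1 + exp(-s)), which is algebraically identical
    and avoids overflow for large positive s.
    """
    s = np.asarray(s, dtype=float)
    return UPPER / (1.0 + np.exp(-s))

# Round trip on hypothetical rates: the back-transform recovers the input.
rates = np.array([0.01, 0.05, 0.10])
recovered = inv_logit_dasfr(logit_dasfr(rates))
```

A simulated logit value of zero maps to 1/12, the midpoint of the admissible range, and every simulated DASFR stays strictly between 0 and 1/6 regardless of how extreme the simulated logit is.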

Finally, to obtain a better understanding of the general fertility trends for all districts, we can aggregate our ASFR simulations into simulations of the district-level TFRs as follows:

$$TFR_{d,y,t} = 5 \sum_{a=1}^{5} \phi_{d,a,y,t} + 10\,\phi_{d,6,y,t}.$$

(15)

Each fertility rate by age group had to be multiplied by the length of its age interval in years [57]. Therefore, the ASFRs of the age groups of 15–19, 20–24, …, 35–39 ought to be multiplied by 5 and that of the upper age group 40–49 by 10. This results in 1000 simulations of the TFR for each district through the forecast horizon of 2022–2045. From those, we can extract the empirical quantiles as estimators for the PIs [11]. In the next section, we will present the results from these simulations.
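The aggregation in (15) and the extraction of empirical quantiles can be sketched as follows. The age-group widths follow (15). The simulated DASFRs are hypothetical placeholder values for three districts, and the 75% PI is assumed to be the central interval between the 12.5% and 87.5% quantiles.

```python
import numpy as np

# Widths of the six age groups 15-19, 20-24, ..., 35-39, 40-49 in years.
AGE_WIDTHS = np.array([5, 5, 5, 5, 5, 10])

def tfr_from_dasfr(phi):
    """Equation (15): weighted sum over the last axis (six age groups)."""
    return np.asarray(phi, dtype=float) @ AGE_WIDTHS

rng = np.random.default_rng(7)

# Hypothetical simulated DASFRs for one forecast year:
# 1000 trajectories x 3 districts x 6 age groups.
base = np.array([0.008, 0.040, 0.090, 0.100, 0.055, 0.006])
phi_sim = base * np.exp(rng.normal(0.0, 0.1, size=(1000, 3, 6)))

tfr_sim = tfr_from_dasfr(phi_sim)  # shape (1000, 3)

# Empirical quantiles across trajectories as estimators of the 75% PI.
lower, median, upper = np.quantile(tfr_sim, [0.125, 0.5, 0.875], axis=0)
```

For the placeholder rates in `base`, the TFR is 5 × 0.293 + 10 × 0.006 = 1.525, which shows how the wider upper age group enters with double weight.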

3. Results

Figure 6 illustrates the predicted TFRs derived from the simulations, as described in Section 2.3.

Overall, we can see that further fertility increases can be expected over the forecast horizon. However, the patterns are geographically very heterogeneous. The maps suggest that the majority of German regions will, as expected, experience fertility increases. However, the regions that already have a relatively high TFR today, as mentioned in Section 1.2, will probably not witness further fertility increases. This also holds for the less family-friendly major cities, such as Berlin and Hamburg. Moreover, the districts marked by major universities, such as Göttingen at the southern border of Lower Saxony or Leipzig in Saxony, which have a relatively low TFR because of a high share of students among the female population, are expected to retain their relatively low TFRs in the long term. Furthermore, at the northern border of the country, i.e., the areas that border Denmark and the Baltic Sea, no significant TFR developments are apparent from the figures. This is especially problematic for large parts of Mecklenburg-Vorpommern, where the prevalently low TFRs will lead to a further aging of the resident population since these regions are expected to lose young females due to net out-migration [11].

Figure 7 shows the changes in the TFR from 2021 to 2045 that resulted from the simulations, i.e., the median prediction alongside the lower and upper bounds of the 75% PIs.

These maps illustrate the changes in the TFRs derived from the simulations more clearly. Here, we can see that a large share of the districts are not expected to witness a significant change in reproductive behavior over the forecast horizon. However, the regional heterogeneities are illustrated well here and, in some instances (such as the districts south of Berlin), negative correlations of fertility trends appear to exist. The maps also illustrate the flexibility of our model by quantifying the uncertainty by district: the trends for, e.g., Munich in the center of Bavaria are quite stable, whereas the forecast exhibits high stochasticity for the majority of regions. This illustrates how difficult future fertility patterns are to predict, especially at the regional level. We will discuss this point further in the next section. Interested readers are referred to Supplementary File S3, where we provide the annual district-specific TFRs for 1996–2021, as computed from the baseline data, together with the median forecasts for 2022–2045. The data are arranged in a time series matrix to allow for easy implementation in further analyses.

4. Discussion

The present study provides a novel approach for regional fertility forecasting and applies it to the ASFRs of all NUTS-3 regions in Germany through to the year 2045. Our model captures trends in the fertility time series by time series analysis and covers cross-correlations in the fertility patterns between all age groups and regions by PCA. The latter implicitly covers the tempo effect of fertility since the model is a regional adaptation of an approach suggested earlier at the national level [1]. A notable feature of the model is that it covers the regional heterogeneities in the tempo effect. Other regional specifics that may be associated with differences in the landscapes, housing opportunities, labor market opportunities, or education (e.g., where there is a high share of students among the population) are covered indirectly by the model as well. The uncertainties in future fertility rates were included by Monte Carlo simulation. The variances in the future fertility rates show strong heterogeneities both for different age groups and different regions, as well as for their intersections. Those heterogeneities are covered by the model as well, as illustrated. This may help regional planning that relies on the future reproductive behavior of the respective population (e.g., the long-term planning of school or kindergarten supply) to obtain a better picture of the future. After all, it is not only important how many births we expect at some point in time, but also to quantify PIs that give ranges of birth numbers for a specified probability. Our model might also be included in district-specific population forecasts that cover not only one isolated district, but also complete regions, federal states, or the whole nation, including the interdependencies between the individual regional trends.

Our study demonstrated that the postponement of births over the life course differs among the regions. For instance, the tempo effect in fertility appears to be stronger for West German regions than for East German regions. However, the analysis indicates that East and West Germany have converged recently, a process we expect to continue in the future based on the study results. At the same time, our results show that the spatial heterogeneities in fertility rates between the metropolises, university cities, and structurally weak districts on the one hand, and medium-sized cities with strong labor markets and their more rural neighboring regions on the other hand, are likely to intensify further. Therefore, we expect fertility increases at the national level to affect primarily the latter group of districts. Fertility in less family-friendly regions, i.e., those centered around large universities and those with a weakened economy, will not significantly profit from fertility trends, and these regions will remain low-fertility regions. This is especially problematic for the latter since these regions simultaneously exhibit net migration outflows in the younger age groups and can expect two-dimensional depopulation alongside an aging of their populations [11]. However, the stochastic nature of our study shows that the future uncertainty in fertility is huge and, for most regions, both fertility increases and decreases are realistic in the long term. This leaves room for local governments to try to influence future fertility in their districts by family policies, such as providing a sufficient supply of housing for families or childcare. Previous studies focused on German fertility have shown that fertility is indeed sensitive toward adequate family policy measures [1,14,17,58].

Our model is subject to limitations. First of all, fertility trends are not easily predictable since (regional) fertility may be affected by various factors that have been discussed in detail elsewhere [17]. Our model is based on a pure time series approach that does not include those explanatory factors. While this may be perceived as a strong drawback, forecasters need to keep in mind that predictors in forecasts need to be forecast themselves over the forecast horizon, which is, in many instances, even more difficult than forecasting the variables of interest (such as the ASFRs in this study) [11,23]. Second, the forecasts exhibit relatively broad (yet regionally varying) PIs, as illustrated in Section 3. This is due both to the differences in the regional robustness of the trends (e.g., regions with already high ASFRs that do not have a huge potential for further increases, and metropolises that are rather unattractive for families, which is not expected to change) and, statistically speaking, to different variances because of smaller sample sizes. Naturally, the confidence interval of an estimator (for an ASFR in this study) is broader for a small sample size (e.g., a small population as in this study) than for a big sample size. Therefore, trends in very populous districts, such as Berlin with its currently close to 1.9 million female inhabitants, are easier to estimate than those for small districts, such as Lüchow-Dannenberg with about 25,000 [36]. However, it is always a point of discussion whether wide PIs are too wide or whether they realistically represent large uncertainties in the future. Third, the data used for this study have their limitations. Although quite detailed, the time series are still quite short, as regional demographic data for Germany are only available since 1995. Naturally, long-term trends are better represented by longer time series. Therefore, a longer time series perspective will improve the forecasts as well.
Moreover, there have been various reforms of German district borders over the baseline period (see [11] for an overview) that make it difficult to construct consistent time series. We followed a geo-conversion, as suggested by Vanella et al. [11], to circumvent this issue. Furthermore, the way the data for Germany are provided hampers multivariate time series analysis and requires several data preparation steps, which we did not describe in detail here, to obtain data in an analyzable format. It would facilitate regional demographic analysis to have time series data provided in matrix format, such as those prepared by us and described in Section 2.1, to generate scientific output more efficiently. We note that our model is not easily applicable for less-experienced forecasters who have little expertise in the theory of multivariate methods, Monte Carlo simulation, and their practical application using large datasets with a multitude of variables. Therefore, a smaller-scale forecast for a smaller number of regions may produce more accurate forecasts in some cases. However, our intention was not to present a readily available forecast model for regional statistical offices but to present a novel approach for multi-regional fertility forecasting that could be applied more appropriately and efficiently to handle a large number of regions in one forecast model, while accounting for supraregional trends and incorporating future uncertainty. Moreover, such an approach would ensure that inconsistencies between national and regional forecasts diminish or are at least kept to a reasonable level [23]. These kinds of approaches are valuable for national statistical offices that provide predictions at the subnational level, or even for the whole country. In Germany, such an approach addresses the work of Maretzke et al. (for instance, [18]).

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math12010025/s1, File S1: Age-specific Fertility Rates by District, 1996–2021. File S2: Loadings of Regional Tempo Index. File S3: Observed and Predicted Median Total Fertility Rates by District, 1996–2045.

Author Contributions

Conceptualization, P.V.; methodology, P.V.; software, M.J.H. and P.V.; validation, P.V.; formal analysis, P.V. and M.J.H.; investigation, P.V.; resources, P.V. and M.J.H.; data curation, M.J.H. and P.V.; writing—original draft preparation, P.V.; writing—review and editing, P.V. and M.J.H.; visualization, M.J.H. and P.V.; supervision, P.V.; project administration, P.V.; funding acquisition, P.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data prepared for this study are based on the raw data from the regional data bank [35,36], and the selected forecast results are available in Supplementary Materials. Further data are available from the corresponding authors on reasonable request.

Acknowledgments

We appreciate the timely and helpful comments by the three reviewers of the paper that helped us improve the previous version of our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ASFR	Age-specific fertility rate
CFR	Cohort fertility rate
DASFR	District- and age-specific fertility rate
NUTS	Nomenclature des unités territoriales statistiques
OLS	Ordinary least squares
PC(A)	Principal component (analysis)
PI	Prediction interval
$S M A P E_{m}$	Symmetric mean absolute percentage error of Model m
TFR	Total fertility rate

Appendix A. German Federal States

Figure A1. Overview of the German federal states (source: own illustration).

References

1. Vanella, P.; Deschermeier, P. A Principal Component Simulation of Age-Specific Fertility—Impacts of Family and Social Policy on Reproductive Behavior in Germany. Popul. Rev. 2019, 58, 78–109.
2. Fuchs, J.; Söhnlein, D.; Weber, B.; Weber, E. Stochastic Forecasting of Labor Supply and Population: An Integrated Model. Popul. Res. Policy Rev. 2018, 37, 33–58.
3. Vanella, P.; Rodriguez Gonzalez, M.; Wilke, C.B. Population Ageing and Future Demand for Old-Age and Disability Pensions in Germany—A Probabilistic Approach. Comp. Popul. Stud. 2022, 47, 87–118.
4. Vanella, P.; Wilke, C.B.; Söhnlein, D. Prevalence and Economic Costs of Absenteeism in an Aging Population—A Quasi-Stochastic Projection for Germany. Forecasting 2022, 4, 371–393.
5. Vanella, P.; Heß, M.; Wilke, C.B. A probabilistic projection of beneficiaries of long-term care insurance in Germany by severity of disability. Qual. Quant. 2020, 54, 943–974.
6. Bomsdorf, E.; Babel, B.; Schmidt, R. Zur Entwicklung der Bevölkerung, der Anzahl der Schüler, der Studienanfänger und der Pflegebedürftigen. Sozialer Fortschr. (German Rev. Soc. Policy) 2008, 10, 943–974.
7. Deschermeier, P.; Henger, R. Wie viel Wohnfläche benötigen wir? Vergangene und zukünftige Trends beim Wohnflächenkonsum—Empirische Evidenz und stochastische Prognose bis 2030. In Zur Relevanz von Bevölkerungsvorausberechnungen für Arbeitsmarkt-, Bildungs- und Regionalpolitik. IAB-Bibliothek 372; Deschermeier, P., Fuchs, J., Iwanow, I., Wilke, C.B., Eds.; wbv Media: Bielefeld, Germany, 2020; pp. 178–201.
8. Lappegard, T.; Klüsener, S.; Vignoli, D. Why are marriage and family formation increasingly disconnected across Europe? A multilevel perspective on existing theories. Popul. Space Place 2018, 24, e2088.
9. Matysiak, A.; Sobotka, T.; Vignoli, D. The Great Recession and Fertility in Europe: A Sub-national Analysis. Eur. J. Popul. 2021, 37, 29–64.
10. Ebeling, M.; Rau, R.; Sander, N.; Kibele, E.; Klüsener, S. Urban–rural disparities in old-age mortality vary systematically with age: Evidence from Germany and England & Wales. Public Health 2022, 205, 102–109.
11. Vanella, P.; Hellwagner, T.; Deschermeier, P. Parsimonious Stochastic Forecasting of International and Internal Migration on the NUTS-3 level—An Outlook of Regional Depopulation Trends in Germany. Vienna Yearb. Popul. Res. 2023, 21, 361–415.
12. Basten, S.; Huinink, J.; Klüsener, S. Spatial Variation of Sub-national Fertility Trends in Austria, Germany and Switzerland. Popul. Space Place 2011, 36, 573–614.
13. Campisi, N.; Kulu, H.; Mikolai, J.; Klüsener, S.; Myrskylä, M. Spatial variation in fertility across Europe: Patterns and determinants. Popul. Space Place 2020, 26, e2308.
14. Bujard, M. Consequences of Enduring Low Fertility—A German Case Study. Demographic Projections and Implications for Different Policy Fields. Comp. Popul. Stud. 2015, 40, 131–164.
15. Williamson, L.; Norman, P. Developing strategies for deriving small population fertility rates. J. Popul. Res. 2011, 28, 129–148.
16. Brzozowska, Z.; Zhelenkova, E.; Gietel-Basten, S. Population decline: Towards a rational, scientific research agenda. Vienna Yearb. Popul. Res. 2023, 21, 1–11.
17. Bujard, M.; Scheller, M. Impact of Regional Factors on Cohort Fertility: New Estimations at the District Level in Germany. Comp. Popul. Stud. 2017, 42, 55–88.
18. Maretzke, S.; Hoymann, J.; Schlömer, C.; Stelzer, A. Raumordnungsprognose 2040—Bevölkerungsprognose: Ergebnisse und Methodik; BBSR-Analysen KOMPAKT: Bonn, Germany, 2021.
19. Vanella, P.; Greil, A.L.; Deschermeier, P. Fertility Response to the COVID-19 Pandemic in Developed Countries—On Pre-pandemic Fertility Forecasts. Comp. Popul. Stud. 2023, 48, 19–46.
20. Ševčíková, H.; Raftery, A.E.; Gerland, P. Probabilistic projection of subnational total fertility rates. Demogr. Res. 2018, 38, 1843–1884.
21. Camiz, S.; Pillar, V.D. Identifying the Informational/Signal Dimension in Principal Component Analysis. Mathematics 2018, 6, 269.
22. Vanella, P.; Deschermeier, P. A Probabilistic Cohort-Component Model for Population Forecasting—The Case of Germany. J. Popul. Ageing 2020, 13, 513–545.
23. Vanella, P.; Deschermeier, P.; Wilke, C.B. An Overview of Population Projections—Methodological Concepts, International Data Availability, and Use Cases. Forecasting 2020, 2, 346–363.
24. Deschermeier, P. Population Development of the Rhine-Neckar Metropolitan Area: A Stochastic Population Forecast on the Basis of Functional Data Analysis. Comp. Popul. Stud. 2012, 36, 769–806.
25. Wilson, T.G.; Grossman, I.; Alexander, M.; Rees, P.; Temple, J. Methods for Small Area Population Forecasts: State-of-the-Art and Research Needs. Popul. Res. Policy Rev. 2022, 41, 865–898.
26. Stadt Köln. Bevölkerungsprognose für Köln 2022 bis 2050. Mit kleinräumigen Berechnungen bis 2035; Kölner Statistische Nachrichten; Stadt Köln: Köln, Germany, 2022.
27. Destatis. Bevölkerung im Wandel: Annahmen und Ergebnisse der 14. Koordinierten Bevölkerungsvorausberechnung. 2019. Available online: https://www.destatis.de/DE/Presse/Pressekonferenzen/2019/Bevoelkerung/pressebroschuere-bevoelkerung.pdf (accessed on 16 November 2023).
28. Alkema, L.; Raftery, A.E.; Gerland, P.; Clark, S.J.; Pelletier, F.; Büttner, T.; Heilig, G.K. Probabilistic Projections of the Total Fertility Rate for All Countries. Demography 2011, 48, 815–839.
29. Rafael Caro-Barrera, J.; de los Baños García-Moreno García, M.; Pérez-Priego, M. Projecting Spanish fertility at regional level: A hierarchical Bayesian approach. PLoS ONE 2022, 17, e0275492.
30. Yang, Y.; Shang, H.L.; Raymer, J. Forecasting Australian fertility by age, region, and birthplace. Int. J. Forecast. 2022.
31. Jasilioniene, A.; Jdanov, D.A.; Sobotka, T.; Andreev, E.M.; Zeman, K.; Shkolnikov, V.M.; Goldstein, J.R.; Nash, E.J.; Philipov, D.; Rodriguez, G. Methods Protocol for the Human Fertility Database. 2015. Available online: https://www.humanfertility.org/File/GetDocumentFree/Docs/methods.pdf (accessed on 2 November 2023).
32. Schmertmann, C.; Zagheni, E.; Goldstein, J.R.; Myrskylä, M. Bayesian Forecasting of Cohort Fertility. J. Am. Stat. Assoc. 2014, 109, 500–513.
33. Max Planck Institute for Demographic Research (Germany); Vienna Institute of Demography (Austria). West Germany, Period Total Fertility Rates and Period Total Fertility Rates by Age 40. 2023. Available online: https://www.humanfertility.org/Country/Country?cntr=DEUTW (accessed on 17 November 2023).
34. Max Planck Institute for Demographic Research (Germany); Vienna Institute of Demography (Austria). East Germany, Period Total Fertility Rates and Period Total Fertility Rates by Age 40. 2023. Available online: https://www.humanfertility.org/Country/Country?cntr=DEUTE (accessed on 17 November 2023).
35. Statistische Ämter des Bundes und der Länder. 12612-93-01-4: Lebendgeborene nach Alter der Mütter—Jahressumme—Regionale Tiefe: Kreise und krfr. Städte. 2023. Available online: https://www.regionalstatistik.de/genesis/onlineoperation=table&code=12612-93-01-4#astructure (accessed on 18 July 2023).
36. Statistische Ämter des Bundes und der Länder. 12411-02-03-4: Bevölkerung nach Geschlecht und Altersgruppen (17)—Stichtag 31.12.—Regionale Tiefe: Kreise und krfr. Städte. 2023. Available online: https://www.regionalstatistik.de/genesis//online?operation=table&code=12411-02-03-4#astructure (accessed on 1 June 2023).
37. Püschel, O. Religion und Glauben im Blickpunkt des Zensus 2011. Stat. Monatshefte Niedersachs. 2014, 2014, 395–404.
38. Bundesinstitut für Bau-, Stadt- und Raumforschung. Privateinkommen, Private Schulden: Medianeinkommen. 2023. Available online: https://www.inkar.de/ (accessed on 15 November 2023).
39. Bundesinstitut für Bau-, Stadt- und Raumforschung. SDG-Indikatoren für Kommunen: Betreuungsquote Kleinkinder. 2023. Available online: https://www.inkar.de/ (accessed on 15 November 2023).
40. Destatis. Qualitätsbericht zur Statistik der Geburten: 2021–2022. 2023. Available online: https://www.destatis.de/DE/Methoden/Qualitaet/Qualitaetsberichte/Bevoelkerung/geburten.pdf (accessed on 12 December 2023).
41. Hassenstein, M.J.; Vanella, P. Data Quality–Concepts and Problems. Encyclopedia 2022, 2, 498–510.
42. Destatis. Fortschreibung des Bevölkerungsstandes: Qualitätsbericht 2017. 2019. Available online: https://www.destatis.de/DE/Methoden/Qualitaet/Qualitaetsberichte/Bevoelkerung/bevoelkerungsfortschreibung-2017.pdf (accessed on 12 December 2023).
43. Wickham, H.; Averick, M.; Bryan, J.; Chang, W.; McGowan, L.D.; François, R.; Grolemund, G.; Hayes, A.; Henry, L.; Hester, J.; et al. Welcome to the tidyverse. J. Open Source Softw. 2019, 4, 1686.
44. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer-Verlag: New York, NY, USA, 2016.
45. Pebesma, E. Simple Features for R: Standardized Support for Spatial Vector Data. R J. 2018, 10, 439–446.
46. Pebesma, E.; Bivand, R. Spatial Data Science: With Applications in R; Chapman and Hall/CRC: Boca Raton, FL, USA, 2023.
47. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002; ISBN 0-387-95457-0.
48. Trapletti, A.; Hornik, K.; LeBaron, B. Package ‘tseries’. 2023. Available online: https://cran.r-project.org/web/packages/tseries/tseries.pdf (accessed on 13 December 2023).
49. Canty, A.; Ripley, B.D. boot: Bootstrap R (S-Plus) Functions; R Package Version 1.3-28.1; R Core Team: Vienna, Austria, 2022.
50. Davison, A.C.; Hinkley, D.V. Bootstrap Methods and Their Applications; Cambridge University Press: Cambridge, UK, 1997; ISBN 0-521-57391-2.
51. Hamner, B.; Frasco, M.; LeDell, E. Package ’Metrics’: Evaluation Metrics for Machine Learning. 2022. Available online: https://cran.r-project.org/web/packages/Metrics/Metrics.pdf (accessed on 13 December 2023).
52. Wartburgstadt Eisenach. Fusion der Stadt Eisenach mit dem Wartburgkreis. 2021. Available online: https://www.eisenach.de/rathaus/fusion-der-stadt-eisenach (accessed on 28 July 2023).
53. Brownlee, J. Machine Learning Mastery: Understand Your Data, Create Accurate Models and Work Projects End-to-End; Jason Brownlee: Melbourne, Australia, 2016.
54. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
55. Kroese, D.P.; Brereton, T.; Taimre, T.; Botev, Z.I. Why the Monte Carlo method is so important today. WIREs Comput. Stat. 2014, 6, 386–392.
56. Vanella, P. A principal component model for forecasting age- and sex-specific survival probabilities in Western Europe. German J. Risk Insur. 2018, 106, 539–554.
57. International Union for the Scientific Study of Population. The Total Fertility Rate. 2023. Available online: http://papp.iussp.org/sessions/papp101_s04/PAPP101_s04_080_010.html (accessed on 21 November 2023).
58. Bujard, M. Wirkungen von Familienpolitik auf die Geburtenentwicklung. In Handbuch Bevölkerungssoziologie; Niephaus, Y., Kreyenfeld, M., Sackmann, R., Eds.; Springer Fachmedien: Wiesbaden, Germany, 2015; pp. 619–646.

Figure 1. Total fertility rates of West and East Germany during 1956–2017 (sources: [33,34]; own illustration).

Figure 2. District-level TFR in Germany during 1996–2021 (sources: [35,36]; own computation and illustration).

Figure 3. Time series of the first two principal components based on pre-2009 data (source: own computation and illustration).

Figure 4. The loading for the first principal component.

Figure 5. Time series of the Regional Tempo Index with forecast and 75% prediction intervals (PIs).

Figure 6. Predicted regional total fertility rates through 2045 (sources: [35,36]; own computation and illustration).

Figure 7. Median forecasts of the absolute increase in regional total fertility rates from 2021 to 2045 with the upper and lower bounds of 75% prediction intervals (sources: [35,36]; own computation and illustration).

Table 1. Share of the variance explained by principal components (in %).

Principal Component	Individual Share	Cumulative Share
1	71.1	71.1
2	7.7	78.8
3	2.9	81.8
4	2.0	83.7
5	1.1	84.8
6	1.1	85.9
7	1.0	86.9
8–2370	<1.0	100.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
