A Constrained Stochastic Weather Generator for Daily Mean Air Temperature and Precipitation

A constrained stochastic weather generator (CSWG) for producing daily mean air temperature and precipitation based on annual mean air temperature and precipitation from tree-ring records is developed and tested in this paper. The principle for stochastically generating daily mean air temperature assumes that temperatures in any year can be approximated by a sinusoidal wave function plus a perturbation from the baseline. The CSWG for stochastically producing daily precipitation is based on three additional assumptions: (1) In each month, the total precipitation can be estimated from annual precipitation if there exists a relationship between the annual and monthly precipitations. If that relationship exists, then (2) for each month, the number of dry days and the maximum daily precipitation can be estimated from the total precipitation in that month. Finally, (3) in each month, there exists a probability distribution of daily precipitation amount for each wet day. These assumptions allow the development of a weather generator that constrains statistically relevant daily temperature and precipitation predictions based on a specified annual value, and thus this study presents a unique method that can be used to explore historic (e.g., archeological questions) or future (e.g., climate change) daily weather conditions based upon specified annual values.


Introduction
The impact of climate change on agricultural productivity is as important to understanding prehistoric subsistence as it is to today's economic landscape. Researchers studying potential yield of modern crops use a variety of climate variables, such as temperature, precipitation, solar radiation, etc. [1][2][3][4][5]. Data for these variables are often recorded as daily measurements. This level of precision is important because conditions vary and uncertainty can grow across time and thus result in disparate effects. Commonly used (and typically the only available) climate data for archaeologists to study prehistoric cultures are often at a temporal scale longer than the annual scale. The most precise data come from tree ring data, which are used to estimate annual precipitation and temperature [6]. Some successful efforts have been spent on reconstructing seasonal temperature and precipitation using isotopes and pollens [7]. Thus, what is lacking in comparison to modern data is an understanding of how temperature or precipitation varies within a growing season. Without finer-scale temporal resolution, it is difficult to develop and test hypotheses about comparatively precise, within-year shifts in temperature and precipitation that were likely important in early farming societies. A finer temporal resolution can be achieved by modeling daily temperature and precipitation using a stochastic weather generator (SWG). A SWG can be used to stochastically generate infinite sets of daily weather patterns that can be employed to assess probability of crop failure between years using the ensemble modeling approach.
The purpose of a weather generator is to model daily weather at a site or a number of locations simultaneously based on the statistical characteristics of observed climate at those locations. There is another type of weather generator that estimates daily weather at the locations where local climatic observations are not available based on observed climate collected at other similar locations, which is also known as the space-time weather generator [8][9][10][11][12][13]. For this kind of weather generator, knowledge of the spatial autocorrelation of each climatic variable and spatial correlations among different climatic variables allows predictions of climatic scenarios at the locations where climatic observations are unavailable. Such spatial data are not as commonly available for past climate; thus, we focus on modeling weather from annual records from tree-ring data. Before we describe our SWG, we briefly review other common models.
Because weather information including daily maximum and minimum temperature, solar radiation, and precipitation represents key input data for agricultural crop models, weather generators originated as tools to evaluate the impacts of climate change on crop growth and yield [1][2][3][4][5][14,15]. Precipitation is usually generated first, because it is argued to affect the statistics of many of the other climatic variables to be generated [16]. The traditional method for generating daily precipitation is to use a Markov chain to simulate the occurrence of wet or dry days and then to use a gamma distribution function to approximate the precipitation amount on a wet day [17][18][19][20]. A first-order Markov chain has been found to be simple and effective in representing precipitation occurrence [21][22][23][24][25][26]. The weather generator (WGEN) developed by Richardson and Wright [26] has been widely used to stochastically produce daily precipitation, maximum and minimum temperatures, and solar radiation. In WGEN, precipitation is treated as the primary variable. Wet or dry days are simulated using a first-order Markov chain, and an exponential distribution function is used to approximate the distribution of daily precipitation amounts. Maximum temperature, minimum temperature, and solar radiation are treated as continuous multivariate stochastic processes with daily means and standard deviations conditioned on the wet or dry state of the day. Instead of a first-order Markov chain, second-order [27] and third-order [28] Markov chains have also been used for simulating the occurrence of wet or dry days. A low-frequency signal has also been included in the stochastically generated daily weather by perturbing monthly parameters with a low-frequency stochastic model [29].
Another commonly used stochastic weather generator, known as the Long Ashton Research Station Weather Generator (LARS-WG), is capable of simulating daily weather at a single site [30][31][32], including daily precipitation, maximum and minimum temperatures, and solar radiation. To improve on the simulation of the occurrence of rainy days by a first-order Markov chain, LARS-WG uses semi-empirical distributions of the lengths of wet and dry series, daily precipitation, and daily solar radiation. In LARS-WG, daily maximum and minimum temperatures are modeled as stochastic processes with daily means and standard deviations conditioned on the wet or dry state of the day [30][31][32].
A comparison of two weather generators independently developed by groups within the Agricultural Research Service of the U.S. Department of Agriculture, i.e., USCLIMATE [33,34] and CLIGEN [35,36], was conducted in [37]. Both weather generators use a first-order Markov chain to estimate the occurrence of wet or dry days. The precipitation amount on a wet day is described by a skewed normal distribution in CLIGEN [1] and, in USCLIMATE, by a mixed exponential distribution [38] for wet days with amounts above 0.25 mm. Daily maximum and minimum temperatures are generated in CLIGEN from normal distributions with a weighting factor based on the dry/wet day probability. In USCLIMATE, a multivariate autoregressive process is used to describe daily maximum and minimum temperatures and solar radiation.
One conclusion from the preceding review is that these widely used weather generators share one common feature: none of them uses annual mean air temperature or annual precipitation as an input variable for generating daily weather data. Consequently, none of them can produce daily temperature and precipitation for a particular year with a given annual mean temperature and annual precipitation. In that sense those models are 'not constrained': they cannot generate daily series that reproduce a specified annual mean air temperature or annual precipitation. This capability, however, is particularly important for paleoclimatology, paleo-hydrology, paleo-agriculture, and archaeology, where annual mean air temperatures and annual precipitations are commonly reconstructed from proxy indicators. Therefore, the objective of this study is to develop a new stochastic weather generator that produces statistically relevant daily temperature and precipitation constrained by a specified annual value. Achieving this research goal is equivalent to addressing the following question: if annual mean temperature and annual precipitation at a site are specified for a given year based on proxy indicators, what could the daily weather conditions have been in that year? That is, what are the likely daily weather scenarios that occurred within that year? To answer this question, we have developed a CSWG that we refer to as the Daily Weather Generator Constrained by Specified Annual Mean Temperature and Precipitation.
The remainder of this paper is organized into five sections. Section 2 introduces the study area, and Section 3 describes the meteorological data used in this study. The methods developed in this study for stochastically generating daily mean air temperature (DMAT) and daily precipitation (DP) are introduced in Section 4. Implementation and application of the CSWG to producing DMAT and DP are presented in Section 5, followed by conclusions in Section 6.

Study Area
The study uses data from the Mesa Verde region in the American Southwest for several reasons. The region has an extensive tree-ring database and thus a record of climate at an annual scale [6]. Additionally, Mesa Verde is an arid region, where the variability in precipitation and temperature across the year can significantly affect crop yield. Finally, an important aspect of the culture history of the area is the depopulation of parts of the Mesa Verde region by Ancestral Puebloans by approximately AD 1300 [39][40][41][42]. Based on the annual mean temperature and annual precipitation reconstructed from tree-ring data [6], a severe drought in the late AD 1200s is thought to have had a major impact on agricultural productivity and thus on the number of people that could have been supported in the region [42]. However, these reconstructed annual values do not provide sufficient temporal resolution to address the impacts of climate change on the region. For example, for a year with a rainfall deficit compared to other years, it is not possible to determine whether precipitation was concentrated during the growing season, even though the annual total precipitation has a negative anomaly. As another example, a relatively wet year with a positive precipitation anomaly may not guarantee an above-average harvest if the above-normal precipitation fell outside the growing season. Additionally, freezing temperatures during the growing season could cause crop failure, and these cold spells would not be detected by examining annual mean temperature alone. An increase in the temporal resolution of temperature and precipitation to daily values is necessary for understanding the impacts of climate change on paleo-agriculture and crop failure within particular years.

Meteorological Data
Two contemporary weather station data sources were utilized to model daily weather: the Global Summary of Day (GSOD) and the Global Historical Climatology Network (GHCN). The GSOD dataset contains daily mean, maximum, and minimum temperatures and daily precipitation, while only daily maximum and minimum temperatures and precipitation are available in the GHCN dataset. In the Mesa Verde region, the most representative weather station is in Cortez, Colorado. For the GSOD prior to 2007, the weather station was at Cortez Muni (37.3° N, 108.633° W), and after 2007 the weather station was relocated to the Cortez Municipal Airport (37.307° N, 108.626° W). The GSOD record at Cortez runs from 1973 to the present. The GHCN Cortez station (37.344° N, 108.595° W) has a longer data record starting from 1911, but prior to 1930 there are many missing data records; thus, the GHCN data from 1911 to 1929 are not used in this study. To utilize the GHCN temperature measurements from 1930 to 1973, we converted the daily maximum and minimum temperatures to the daily mean temperature based on the strong linear relationship between $(T_{max} + T_{min})/2$ and $T_{mean}$, where $T_{max}$, $T_{min}$, and $T_{mean}$ are daily maximum, minimum, and mean temperatures at the GSOD Cortez station, respectively. The linear regression equation is $y = 1.036x + 0.08012$, with $x = (T_{max} + T_{min})/2$ and $y$ the estimated $T_{mean}$; the root mean square error (RMSE) is 1.58 °C, and the coefficient of determination ($r^2$) is 0.974.
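The conversion from daily maximum and minimum temperature to daily mean temperature can be illustrated with a short sketch. The function name `estimate_tmean` is hypothetical, and the sketch assumes that x in the reported regression is the midrange temperature and y is the estimated daily mean, both in °C.

```python
def estimate_tmean(t_max: float, t_min: float) -> float:
    """Estimate daily mean temperature (deg C) from daily max and min.

    Assumes the regression y = 1.036x + 0.08012 reported for the
    GSOD Cortez station, with x = (t_max + t_min) / 2.
    """
    x = (t_max + t_min) / 2.0      # midrange temperature
    return 1.036 * x + 0.08012     # regression-based daily mean


# Example: a day with a maximum of 25 deg C and a minimum of 5 deg C.
print(round(estimate_tmean(25.0, 5.0), 2))
```

Given the reported RMSE of 1.58 °C, a converted value of this kind is best treated as an estimate rather than a measurement.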

Methods
The CSWG developed in this study contains two functions: (1) stochastically generating daily mean air temperature based on annual mean air temperature, and (2) stochastically generating daily precipitation based on annual precipitation. The associated methods for these two functions are described in Sections 4.1 and 4.2, respectively.

Stochastically Generating Daily Mean Air Temperature Based on Annual Mean Air Temperature
The principle involved in the CSWG for stochastically producing daily mean air temperature (DMAT) is based on the assumption that DMAT in any year can be approximated by a sinusoidal wave function plus a perturbation (or residual element) from the baseline (i.e., the sinusoidal wave) as

$$T_i = a + b \sin\left[\frac{2\pi}{365}\left(DOY_i + c\right)\right] + \Delta T_i \tag{1}$$

where $T_i$ is the DMAT of day of year $DOY_i$, $i$ is the day index ($i = 1, \ldots, 365$), $a$, $b$, and $c$ are parameters, and $\Delta T_i$ is the perturbation from the estimated baseline temperature, which is randomly generated based on the probability distribution of $\Delta T$. Among the three parameters in Equation (1), $a$ is the annual mean air temperature, $b$ is the amplitude of the sinusoidal wave, and $c$ is the phase (with a unit of day). For the CSWG proposed in this study, the annual mean air temperature ($a$) is known as an input variable, which means that the two other parameters ($b$ and $c$) in Equation (1) need to be estimated from $a$, provided that relationships exist between $a$ and $b$ and between $a$ and $c$. Assuming these relationships can be established through regression analysis, parameters $b$ and $c$ can then be determined as

$$b = \hat{b} + \Delta b, \qquad c = \hat{c} + \Delta c \tag{2}$$

where $\hat{b}$ and $\hat{c}$ are the values estimated from the constructed relationships between $a$ and $b$ and between $a$ and $c$, and $\Delta b$ and $\Delta c$ are stochastically generated residual terms based on the probability distributions of $\Delta b$ and $\Delta c$. There are four major steps involved in stochastically producing DMAT by the CSWG, as illustrated in Figure 1. The first step is to fit the sinusoidal wave function of DOY in Equation (1) to each year's DMAT time series for $n$ years of temperature data ($n$ is the number of years of observed temperature data used to construct the CSWG for stochastically producing DMATs) and obtain $n$ sets of three parameters (i.e., $a$, $b$, $c$).
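The first step, fitting a sinusoid to one year of daily temperatures, can be sketched as follows. This is a minimal illustration rather than the fitting procedure used in the study: it assumes a baseline of the form a + b·sin(2π(DOY + c)/365), a complete 365-day year, and a closed-form least-squares solution; the function name `fit_sinusoid` is hypothetical.

```python
import math

PERIOD = 365  # days; leap days are ignored in this sketch


def fit_sinusoid(temps):
    """Least-squares fit of T = a + b*sin(2*pi*(doy + c)/365).

    `temps` must hold exactly 365 daily values for day of year 1..365.
    Returns (a, b, c): annual mean, amplitude, and phase in days.
    """
    if len(temps) != PERIOD:
        raise ValueError("expected 365 daily values")
    w = 2.0 * math.pi / PERIOD
    a = sum(temps) / PERIOD
    # Over a full period, sin and cos are orthogonal, so the coefficients
    # of p*sin(w*d) + q*cos(w*d) are simple projections of the data.
    p = 2.0 / PERIOD * sum(t * math.sin(w * d) for d, t in enumerate(temps, 1))
    q = 2.0 / PERIOD * sum(t * math.cos(w * d) for d, t in enumerate(temps, 1))
    b = math.hypot(p, q)                  # amplitude
    c = (math.atan2(q, p) / w) % PERIOD   # phase in days, in [0, 365)
    return a, b, c


# Synthetic check: recover known parameters from a noise-free year.
w = 2.0 * math.pi / PERIOD
year = [9.5 + 12.5 * math.sin(w * (d + 260.0)) for d in range(1, PERIOD + 1)]
a, b, c = fit_sinusoid(year)
```

Repeating such a fit for each of the n observed years would give the n parameter sets used in the subsequent regression step.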
The second step is to establish the relationships between $a$ and $b$ and between $a$ and $c$ using regression analysis, and then to construct the probability distributions of the residual terms $\Delta T$, $\Delta b$, and $\Delta c$. To ensure that the stochastically generated DMATs have a lag-one autocorrelation comparable to that of the observed DMATs (i.e., the current-day temperature should correlate with the previous-day temperature to some degree), the probability distribution of the DMAT difference between two consecutive days (i.e., $\delta = T_i - T_{i-1}$) is also constructed in this step.
The third step is a loop from the first day of the year ($i = 1$) to the last day of the year ($i = 365$ or 366) that stochastically produces each day's DMAT by adding a $\Delta T$, randomly generated from the constructed probability distribution of $\Delta T$, to the randomly generated baseline DMAT of that day. Before moving to the next day, the absolute difference between the stochastically generated current-day and previous-day temperatures, $|T_i - T_{i-1}|_S$, is computed and compared with $|\delta|$, where $\delta$ is randomly drawn from the probability distribution of $\delta$. If $|T_i - T_{i-1}|_S \le |\delta|$, $T_i$ is accepted as the current-day DMAT and the loop moves to the next day; otherwise $T_i$ is rejected, and a new $T_i$ and a new $\delta$ are generated until the condition $|T_i - T_{i-1}|_S \le |\delta|$ is satisfied. After all daily temperatures are produced, a final adjustment is applied to all DMATs so that their average equals the given annual mean temperature (i.e., $a$); the adjusted value $T_i^*$ is the final stochastically generated daily mean air temperature of day $i$.
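The daily loop and the final adjustment can be sketched as below. Several details are assumptions made only for illustration: the perturbation and day-to-day difference distributions are represented as lists to sample from (`dt_pool`, `delta_pool`), the retry cap `max_tries` is a safeguard not described in the text, and the final adjustment is taken to be a uniform additive shift.

```python
import random


def generate_dmat(baseline, dt_pool, delta_pool, annual_mean, rng,
                  max_tries=1000):
    """Generate daily mean temperatures around a given baseline.

    baseline   : baseline temperature for each day of the year
    dt_pool    : sample of perturbations from the baseline (Delta T)
    delta_pool : sample of consecutive-day differences (delta)
    """
    temps = []
    for base in baseline:
        t = base + rng.choice(dt_pool)
        if temps:  # the first day has no previous day to compare with
            for _ in range(max_tries):
                # Accept when the day-to-day jump does not exceed a freshly
                # drawn |delta|; otherwise draw a new candidate temperature.
                if abs(t - temps[-1]) <= abs(rng.choice(delta_pool)):
                    break
                t = base + rng.choice(dt_pool)
        temps.append(t)
    # Assumed form of the final adjustment: shift every day by the same
    # amount so that the series mean equals the specified annual mean.
    shift = annual_mean - sum(temps) / len(temps)
    return [t + shift for t in temps]


rng = random.Random(0)
series = generate_dmat([10.0] * 365, [-1.0, 0.0, 1.0], [-2.0, 0.5, 3.0],
                       9.5, rng)
```

Because the shift is applied after the loop, it changes every day by the same amount and therefore leaves the day-to-day differences untouched.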

Stochastically Generating Daily Precipitation Based on Annual Precipitation
The function of stochastically generating daily precipitation (DP) based on annual precipitation in the CSWG was developed based on the following assumptions: (1) For each month, the total precipitation can be estimated from annual precipitation if there exists a relationship between the annual precipitation and monthly precipitation. (2) For each month, number of dry days and the maximum daily precipitation can be estimated from the total precipitation in that month. (3) A probability distribution of daily precipitation amount for each wet day can be constructed.
There are four major steps in the CSWG for stochastically generating daily precipitation, as illustrated in Figure 2. The first step is to use n years of observed daily precipitation data to compute the annual precipitation (AP) and the 12 monthly precipitations (MP) of each year, and the number of dry days (NDD) and the maximum daily precipitation (MDP) of each month, and to construct the probability distribution of daily precipitation amount (DPA) for each month. The second step is to establish, through regression analysis, the relationship between annual precipitation and each of the 12 monthly precipitations of that year, and to construct the probability distribution of the residual term ∆MP (i.e., the difference between the observed and estimated monthly precipitation). For each month, the relationships (MP vs. NDD) and (MP vs. MDP) are also established through regression analysis, and the probability distributions of the residual terms ∆NDD and ∆MDP are subsequently constructed. With these relationships and probability distributions, the third step is to stochastically generate 12 monthly precipitations from an input annual precipitation. The stochastically generated monthly precipitations are adjusted to satisfy two conditions: (1) no monthly precipitation is negative; and (2) the sum of the 12 monthly precipitations is equal to the input annual precipitation.
The fourth step is a loop that stochastically generates daily precipitation in each month from January to December. If the stochastically generated monthly precipitation is zero, the precipitation amount of every day in that month is set to zero. Otherwise, the number of dry days (NDD) and the maximum daily precipitation (MDP) are randomly generated based on the relationships (MP vs. NDD) and (MP vs. MDP) and the probability distributions of ∆NDD and ∆MDP. The randomly generated NDD and MDP are also adjusted to satisfy the conditions 0 ≤ NDD ≤ ND − 1 and 0 < MDP ≤ MP, where ND is the number of days in the month. If the number of wet days (i.e., ND − NDD) is equal to 1, an integer between 1 and ND is randomly generated and used as the wet day index, and MP is assigned as the precipitation amount of that day. Otherwise, ND − NDD integers between 1 and ND are randomly generated and used as the wet day indices, MDP is randomly assigned to one of the wet days, and ND − NDD − 1 daily precipitation amounts (DPA) are stochastically generated from the probability distribution of DPA and assigned to the remaining ND − NDD − 1 wet days. After all wet days are assigned a precipitation amount, the daily precipitation (DP) in the month is adjusted to ensure that the sum of the stochastically generated daily precipitations is equal to the stochastically generated monthly precipitation.
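A minimal sketch of the fourth step for a single month follows. The function name `distribute_month` and the list `dpa_pool` (a sample of wet-day amounts) are hypothetical. Two choices are assumptions, since the text does not specify them: amounts drawn for the remaining wet days are capped at MDP, and the final adjustment is a proportional rescaling, which also rescales the day carrying MDP.

```python
import random


def distribute_month(mp, ndd, mdp, n_days, dpa_pool, rng):
    """Return n_days daily precipitation amounts (mm) that sum to mp."""
    daily = [0.0] * n_days
    if mp <= 0:
        return daily                      # a dry month: every day is zero
    if not 0 < mdp <= mp:
        raise ValueError("expected 0 < MDP <= MP")
    ndd = max(0, min(ndd, n_days - 1))    # keep at least one wet day
    # rng.sample returns distinct indices in random order, so wet[0]
    # is itself a randomly chosen wet day.
    wet = rng.sample(range(n_days), n_days - ndd)
    if len(wet) == 1:
        daily[wet[0]] = mp                # a single wet day gets everything
        return daily
    daily[wet[0]] = mdp
    for day in wet[1:]:
        daily[day] = min(rng.choice(dpa_pool), mdp)  # assumed cap at MDP
    # Assumed adjustment: rescale so that the daily amounts sum to MP.
    scale = mp / sum(daily)
    return [x * scale for x in daily]


rng = random.Random(1)
july = distribute_month(30.0, 25, 12.0, 31, [0.5, 1.0, 2.0, 5.0], rng)
```

Other adjustment schemes, for example holding the MDP day fixed and rescaling only the remaining wet days, would satisfy the same sum constraint.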

Stochastically Generating DMAT Using the CSWG
As introduced in Section 3, the daily mean air temperature (DMAT) data collected at Cortez from 1930 to 2016 were used to build the CSWG for stochastically generating DMAT. Only years with less than 10% missing data were used, which left 74 years between 1930 and 2016; for those years, linear interpolation was applied to fill the data gaps between known data points. After all data gaps were filled, Equation (1) was fitted to each of the 74 time series of daily air temperatures to obtain three parameters (i.e., $a$, $b$, and $c$) for each year, as listed in Table 1. Based on these parameters, linear regression was applied to the scatter plots of $a$ vs. $b$ and $a$ vs. $c$ separately. The fitted line for $b$ is

$$b = -0.2168a + 14.75 \tag{4}$$

The statistics of the linear regressions are listed in Table 2. Although the root mean square errors (RMSEs) are small, i.e., 0.69 °C and 3.4 days for the estimated $b$ and $c$ values, respectively, the correlation coefficients are very low (i.e., 0.25 and 0.13), which indicates that the uncertainty in the linear regression equations should be considered when $b$ and $c$ are estimated from $a$. The uncertainties in the estimated $b$ and $c$ can be represented by the probability distributions of the discrepancies between the observed and estimated values of $b$ and $c$, i.e., the residual elements. The probability distribution of the residual term $\Delta b$ is defined as the number of years with $\Delta b$ within a certain range (i.e., bin):

$$\Pr(\Delta b_i) = \text{number of years with } \Delta b_i - 0.05 \le \Delta b < \Delta b_i + 0.05 \tag{5}$$

where $\Delta b_i$ is the possible difference between the observed and estimated $b$ values within a bin of 0.1 °C. As shown in Table 2, $\Delta b$ ranged from −1.96 °C to 1.61 °C. Similarly, the probability distribution of $\Delta c$ is defined as

$$\Pr(\Delta c_i) = \text{number of years with } \Delta c_i - 0.5 \le \Delta c < \Delta c_i + 0.5 \tag{6}$$

where $\Delta c_i$ is the possible difference between the observed and estimated $c$ values within a bin of one day.
The difference between the observed and estimated $c$ ranged from −6.9 days to 7.9 days (see Table 2). The probability distributions of $\Delta b$ and $\Delta c$ are shown in Figure 3. Based on these probability distributions, two one-dimensional arrays were produced, each with a length of 74 (because there are 74 samples in the DMAT dataset); the elements of one array are $\Delta b$ values and the elements of the other are $\Delta c$ values. The number of times a particular $\Delta b$ or $\Delta c$ appears in its array depends on its occurrence frequency, e.g., if the frequency is 6 for $\Delta b = 0.1$, then 0.1 appears 6 times in the array. Using these two arrays and the linear regression equations (i.e., Equation (4)), a baseline of DMAT can be generated from an input annual mean air temperature (i.e., $a$). For example, if $a$ is 9.5 °C, then according to Equation (4) the estimated values are $b$ = 12.7 °C and $c$ = 255.9 days. Next, two integers between 1 and 74 are randomly generated, one for $\Delta b$ and one for $\Delta c$. For example, the randomly generated numbers 30 and 63, used as array indices, select the elements −0.20 °C and 4.0 days from the $\Delta b$ and $\Delta c$ arrays, respectively. A float number is then randomly generated in the range [−0.25, −0.15] for $\Delta b$, and another in the range [3.5, 4.5] for $\Delta c$. For example, with $\Delta b$ = −0.19 °C and $\Delta c$ = 4.1 days, $b$ = 12.7 − 0.19 = 12.51 °C and $c$ = 255.9 + 4.1 = 260.0 days. Finally, the randomly generated $b$ and $c$, along with the input annual mean air temperature ($a$), are substituted into the sinusoidal wave function in Equation (1) to produce the baseline of DMAT.
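The two-stage draw described above (select a bin center from the frequency array, then draw uniformly within that bin) can be sketched as follows. The function names and the list `db_pool` are hypothetical; the regression for the amplitude is the one reported in Equation (4). The phase c would be handled the same way with a bin width of one day and its own regression, whose coefficients are not restated here.

```python
import random


def sample_residual(pool, bin_width, rng):
    """Pick a bin center from the frequency array, then jitter within the bin."""
    center = rng.choice(pool)   # equivalent to drawing a random array index
    return rng.uniform(center - bin_width / 2.0, center + bin_width / 2.0)


def sample_amplitude(a, db_pool, rng):
    """Amplitude b = b_hat + delta_b for an input annual mean a (deg C)."""
    b_hat = -0.2168 * a + 14.75          # regression estimate, Equation (4)
    return b_hat + sample_residual(db_pool, 0.1, rng)


rng = random.Random(0)
# With a single bin centered on -0.20 deg C, the drawn residual must fall
# in [-0.25, -0.15], as in the worked example in the text.
b = sample_amplitude(9.5, [-0.20], rng)
```

Drawing within the bin, rather than reusing the bin center, avoids restricting the generated amplitudes to a grid of 0.1 °C steps.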
To stochastically generate DMAT, in addition to the baseline of DMAT, a randomly generated temperature perturbation from the baseline is needed for each day. Using 74 years of DMAT data and the associated baselines, each year's baseline temperatures were subtracted from the observed DMATs of that year, resulting in 26,997 temperature difference data points (i.e., $\Delta T$). These 26,997 data points were used to construct the probability distribution of $\Delta T$ as

$$\Pr(\Delta T_i) = \text{number of days with } \Delta T_i - 0.05 \le \Delta T < \Delta T_i + 0.05 \tag{7}$$

where $\Delta T$ is in the range of [−19.9 °C, 12.4 °C] with an interval of 0.1 °C. The constructed probability distribution of $\Delta T$ is plotted in Figure 4. Based on the computed probability distribution of $\Delta T$, a one-dimensional array with 26,997 elements was generated. The number of times a particular $\Delta T_i$ appears in the array depends on the occurrence frequency of that $\Delta T_i$. Using this array, the temperature perturbation for each day was produced by randomly generating an integer between 1 and 26,997 and using it as the array index to determine $\Delta T$. As discussed in Section 4.1, to ensure that the stochastically generated DMATs have a lag-one autocorrelation comparable to that of the observed DMATs, the GSOD data were used to compute the daily mean air temperature difference $\delta = T_i - T_{i-1}$ between two consecutive days (the GHCN temperature data were not used in computing $\delta$ because the GHCN daily mean temperatures were estimated from the observed daily maximum and minimum temperatures). The 11,468 computed $\delta$ values were used to calculate the probability distribution of $\delta$ as

$$\Pr(\delta_i) = \text{number of days with } \delta_i - 0.05 \le \delta < \delta_i + 0.05 \tag{8}$$

where $\delta$ is in the range of [−14.5 °C, 13.9 °C] with an interval of 0.1 °C. The probability distribution of $\delta$ is plotted in Figure 5. Based on the calculated probability distribution of $\delta$, a one-dimensional array with 11,468 elements was created. The number of times a particular $\delta_i$ appears in the array depends on the occurrence frequency of that $\delta_i$.
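Building a frequency array of this kind amounts to replacing each observed value with the center of its bin. The sketch below illustrates this for a bin width of 0.1; the function name `build_pool` is hypothetical, and rounding the centers to six decimals is only a convenience to suppress floating-point noise.

```python
import math
from collections import Counter


def build_pool(values, bin_width=0.1):
    """Map each value to its bin center; a center appears once per value,
    so its count in the returned list equals its occurrence frequency."""
    pool = []
    for v in values:
        # Bin i covers [center - width/2, center + width/2).
        k = math.floor(v / bin_width + 0.5)
        pool.append(round(k * bin_width, 6))
    return pool


pool = build_pool([0.12, 0.08, -0.31, 0.0])
counts = Counter(pool)   # occurrence frequency of each bin center
```

Sampling a random element of `pool` then reproduces the empirical distribution without fitting any parametric form.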

Estimate Monthly Precipitation from Annual Precipitation
As described in Section 4.2, the first step for stochastically generating daily precipitation (DP) is to estimate monthly precipitations from annual precipitation, and thus a linear regression method was applied to establish a relationship between each monthly precipitation and the annual precipitation as

$$MP_i = f_i \, AP \tag{9}$$

where $MP_i$ is the observed total precipitation in month $i$, $AP$ is the observed annual total precipitation, and $f_i$ is the ratio of the total precipitation in month $i$ to the annual precipitation. The precipitation data used for establishing the relationship in Equation (9) were the daily precipitation measurements collected by the GHCN at Cortez, Colorado, from 1930 to 2016. Among those 86 years, 21 years are missing more than 10% of the precipitation data, and thus only 65 years of GHCN precipitation data were used for the regression analysis. The scatter plots and linear regression results of annual precipitation vs. monthly precipitation for the 12 months are shown in Figure 7, and the statistics of the linear regressions are listed in Table 3. The low correlation coefficients in Table 3 indicate that a simple linear regression cannot capture the monthly precipitation variations with adequate accuracy. The difference between the observed and estimated monthly precipitations ($\Delta MP$) should therefore be considered when producing monthly precipitation from annual precipitation, through the probability distribution of $\Delta MP$ computed as

$$\Pr(\Delta MP_j) = \text{number of years in month } i \text{ with } \Delta MP_j - 0.5 \le \Delta MP < \Delta MP_j + 0.5 \tag{10}$$

where the range of $\Delta MP_i$ ($i = 1, \ldots, 12$) is listed in Table 3. The probability distribution of each month's $\Delta MP$ is plotted in Figure 8. Using the probability distribution, a one-dimensional $\Delta MP$ array was produced for each month with a length of 65 (because there are 65 samples). The number of times a particular $\Delta MP$ value appears in the array depends on the occurrence frequency of that value.
With the regression equations and the one-dimensional $\Delta MP$ array for each month, monthly precipitation can be estimated from observed annual precipitation based on the equation

$$MP_i = f_i \, AP + \Delta MP_i \tag{11}$$

where $\Delta MP_i$ is generated by randomly selecting an array index between 1 and 65, using the array index to determine the median (bin-center) $\Delta MP_i$ value, and then randomly generating a float number in the interval ($\Delta MP_i - 0.5$, $\Delta MP_i + 0.5$). If the randomly generated $MP_i$ value was negative, $MP_i$ was set to zero. After each randomly generated monthly precipitation was estimated, checked, and set to zero when appropriate, it was necessary to determine whether the sum of the estimated monthly precipitations matched the observed annual precipitation. If they did not match, the difference between them was divided equally by 12 and subtracted from (or added to) each monthly precipitation as appropriate. The difference between the sum of the estimated monthly precipitations and the observed annual precipitation was then recalculated, and the adjustment was repeated until the difference was less than 0.01 mm.
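The monthly disaggregation and its iterative adjustment can be sketched as below. The ratios `f`, the residual arrays `dmp_pools`, and the function name are hypothetical placeholders (the fitted values are those of Table 3, which are not restated here), and the sketch assumes that negative values are reset to zero after every adjustment pass; the iteration cap is only a safeguard.

```python
import random


def generate_monthly(ap, f, dmp_pools, rng, tol=0.01, max_iter=1000):
    """Generate 12 non-negative monthly totals (mm) that sum to ap."""
    mp = []
    for f_i, pool in zip(f, dmp_pools):
        center = rng.choice(pool)                  # bin-center residual
        d = rng.uniform(center - 0.5, center + 0.5)
        mp.append(max(0.0, f_i * ap + d))          # no negative months
    for _ in range(max_iter):
        diff = sum(mp) - ap
        if abs(diff) < tol:
            break
        # Spread the mismatch equally over the 12 months, then re-clamp.
        mp = [max(0.0, m - diff / 12.0) for m in mp]
    return mp


rng = random.Random(2)
ratios = [1.0 / 12.0] * 12                  # hypothetical f_i values
pools = [[-5.0, 0.0, 5.0]] * 12             # hypothetical residual arrays
months = generate_monthly(330.0, ratios, pools, rng)
```

When no month is clamped, one pass removes the mismatch exactly; clamping a month to zero leaves a smaller residual surplus, which is why the adjustment has to be repeated.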

Estimate Number of Dry Days in Each Month
A 'dry day' is defined as a day with precipitation below the trace amount of rain, which is 0.254 mm in this study. To estimate the number of dry days in each month, the GHCN precipitation data were used to count the number of dry days in each month of each year. Scatter plots of monthly precipitation vs. observed number of dry days (NDD) in each month are presented in Figure 9, showing an inverse relationship between monthly precipitation and number of dry days for each month. Therefore, the following linear equation was used to fit each scatter plot:

$$NDD_i = g_i \, MP_i + ND_i \tag{12}$$

where $NDD_i$ is the number of dry days in month $i$, $g_i$ is the slope of the regression line, and $ND_i$ is the number of days in month $i$. Forcing the best-fit line to pass through the point (0, $ND_i$) ensures that if the monthly precipitation is zero, the number of dry days is equal to the number of days in that month. The statistics of the regression analyses of the monthly precipitation vs. the number of dry days are listed in Table 4. To consider the uncertainty in the NDD estimated from MP based on Equation (12), a probability distribution of the difference between the observed and estimated numbers of dry days ($\Delta NDD$) was developed as

$$\Pr(\Delta NDD_j) = \text{number of years in month } i \text{ with } \Delta NDD_j - 0.5 \le \Delta NDD < \Delta NDD_j + 0.5 \tag{13}$$

where the range of $\Delta NDD$ is listed in Table 4. The probability distribution of each month's $\Delta NDD$ is plotted in Figure 10. Using the probability distribution, a one-dimensional array of $\Delta NDD$ with a length of 65 was produced for each month. With the regression equations and the one-dimensional $\Delta NDD$ array for each month, the number of dry days in each month was estimated from monthly precipitation based on the equation

$$NDD_i = g_i \, MP_i + ND_i + \Delta NDD_i \tag{14}$$

where $\Delta NDD_i$ was created by randomly generating an integer between 1 and 65, using the integer as the array index to determine the median (bin-center) $\Delta NDD_i$ value, and then randomly generating a number in the interval ($\Delta NDD_i - 0.5$, $\Delta NDD_i + 0.5$) to represent $\Delta NDD_i$ in Equation (14). Since the number of dry days in each month should be between 0 and $ND_i$, $NDD_i$ was set to 0 if it was less than zero and to $ND_i$ if it was greater than $ND_i$.
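A compact sketch of the dry-day estimate is given below. The slope `g`, the residual array `dndd_pool`, and the function name are hypothetical; rounding the result to a whole number of days is an assumption, since the text does not state how fractional values are handled.

```python
import random


def estimate_ndd(mp, g, n_days, dndd_pool, rng):
    """Number of dry days in a month with total precipitation mp (mm)."""
    center = rng.choice(dndd_pool)                 # bin-center residual
    d = rng.uniform(center - 0.5, center + 0.5)    # jitter within the bin
    ndd = g * mp + n_days + d       # line through (0, n_days) plus residual
    ndd = min(max(ndd, 0.0), float(n_days))        # clamp to [0, n_days]
    return int(round(ndd))                         # assumed rounding


rng = random.Random(0)
dry_days = estimate_ndd(20.0, -0.2, 31, [-1.0, 0.0, 1.0], rng)
```

A caller that needs at least one wet day in a month with non-zero precipitation would additionally cap the result at one day fewer than the length of the month.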

Estimate Maximum Daily Precipitation in Each Month
To estimate the maximum daily precipitation, the observed maximum daily precipitation in each month of each year was regressed against the observed total monthly precipitation, as shown in Figure 11. The positive linear relationship between monthly precipitation and the maximum daily precipitation of each month suggests fitting each scatter plot with a line through the origin (0, 0), as given in the equation

$$MDP_i = h_i \, MP_i$$

where $MDP_i$ is the maximum daily precipitation and $h_i$ is the ratio of the maximum daily precipitation to the monthly precipitation of month $i$. The statistics of the regression analyses between MP and MDP are listed in Table 5.

To account for the uncertainty in the maximum daily precipitation estimated from the observed monthly precipitation with the linear regression equations, a probability distribution of the difference ($\Delta MDP$) between the observed and estimated maximum daily precipitation in each month was determined by computing the occurrence frequency of $\Delta MDP$ as

$$\Pr(\Delta MDP_j) = \text{number of years in month } i \text{ with } \Delta MDP_j - 0.5 \le \Delta MDP < \Delta MDP_j + 0.5 \qquad (16)$$

where the range of $\Delta MDP$ in each month is listed in Table 5. The probability distribution of each month's $\Delta MDP$ is plotted in Figure 12. Using the probability distribution, a one-dimensional array of $\Delta MDP$ with a length of 65 was generated for each month. With the regression equations and the one-dimensional array of $\Delta MDP$ for each month, the maximum daily precipitation in each month was estimated from total monthly precipitation based on the equation

$$MDP_i = h_i \, MP_i + \Delta MDP_i$$

where $\Delta MDP_i$ was determined by randomly generating an integer between 1 and 65, using that integer as the array index to determine the median $\Delta MDP_i$ value, and then randomly selecting a floating-point number in the interval $(\Delta MDP_i - 0.5, \Delta MDP_i + 0.5)$. Since the maximum daily precipitation $MDP_i$ in each month should be between 0 and the total monthly precipitation $MP_i$, $MDP_i$ was set to zero if it was less than zero and to $MP_i$ if it was greater than $MP_i$.
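The text does not state how the regression slopes were fitted. Assuming ordinary least squares, a line constrained to pass through a fixed point has a closed-form slope, which the following worked equations illustrate. The index $y$ over observed years is introduced here for illustration only.

```latex
% Through-origin fit for the maximum daily precipitation:
% minimise  S(h_i) = \sum_y ( MDP_{i,y} - h_i \, MP_{i,y} )^2
\frac{dS}{dh_i} = -2 \sum_y MP_{i,y} \left( MDP_{i,y} - h_i \, MP_{i,y} \right) = 0
\quad \Longrightarrow \quad
h_i = \frac{\sum_y MP_{i,y} \, MDP_{i,y}}{\sum_y MP_{i,y}^{2}}

% The same argument for the dry-day line forced through (0, ND_i):
g_i = \frac{\sum_y MP_{i,y} \left( NDD_{i,y} - ND_i \right)}{\sum_y MP_{i,y}^{2}}
```

Because the number of dry days never exceeds $ND_i$, every term in the numerator of $g_i$ is non-positive, which gives the negative slope expected from the inverse relationship between monthly precipitation and the number of dry days.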

Construct the Probability Distribution of Daily Precipitation Amount in Each Month
Given the number of dry days and the maximum daily precipitation, stochastically generating daily precipitation in each month is a matter of selecting a precipitation amount for each wet day in that month. If there was only one wet day, its precipitation amount was set equal to the monthly precipitation of that month (MP). If there was more than one wet day, the precipitation amount of one wet day was set to the maximum daily precipitation of the month (MDP), and the possible precipitation amounts for all other wet days ranged from the trace amount of precipitation (i.e., 0.01 inch or 0.254 mm) to MDP. To stochastically generate the daily precipitation amount for each wet day, the number of wet days with a precipitation amount falling in each of 11 categories was determined, as listed in Table 6. These 11 categories include 10 daily precipitation amount (DPA) ranges and one category of trace rain. The probability distribution of the 11 categories of daily precipitation in each month is shown in Figure 13. Using the probability distribution, a one-dimensional array of daily precipitation categories with a length of 1000 was developed for each month, in which each category appears a number of times equal to its occurrence frequency (%) multiplied by 10.

Figure 13. Probability distribution of the 11 categories of daily precipitation amount in each month.
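As an illustration of the category array, the sketch below expands a frequency table into a 1000-element lookup array and draws a wet-day amount from it. The frequencies and amount ranges are hypothetical placeholders with only four categories. The actual 11 categories and their ranges are those of Table 6 and Figure 13, which are not reproduced here. The function names are also assumptions.

```python
import random

TRACE_MM = 0.254  # trace precipitation used in the study

# Hypothetical example data (NOT the values from Table 6 / Figure 13).
EXAMPLE_FREQ_PERCENT = {1: 30.0, 2: 40.0, 3: 20.0, 4: 10.0}
EXAMPLE_RANGES_MM = {2: (0.254, 2.0), 3: (2.0, 5.0), 4: (5.0, 10.0)}

def build_category_array(freq_percent):
    """Each category appears (frequency % x 10) times, giving about 1000 entries."""
    array = []
    for category, pct in freq_percent.items():
        array.extend([category] * round(pct * 10))
    return array

def sample_daily_amount(category_array, ranges_mm, rng=random):
    """Pick a category by random array index, then an amount within its range."""
    category = category_array[rng.randrange(len(category_array))]
    if category == 1:                 # category 1 is trace rain
        return TRACE_MM
    low, high = ranges_mm[category]
    return rng.uniform(low, high)
```

If the frequencies are not multiples of 0.1%, rounding can leave the array slightly shorter or longer than 1000 entries. Indexing by the actual array length, as done here, keeps the draw valid in that case.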

Stochastically Generate Daily Precipitation
Daily precipitation in each month was generated based on the following rules:

1. If monthly precipitation MP is zero, every day has zero precipitation in the month.

2. If there is only one wet day, the precipitation amount of the wet day is equal to MP.

3. If there is more than one wet day, a randomly selected wet day's precipitation is set to be the maximum daily precipitation MDP. The precipitation amounts of other randomly selected wet days are assigned based on the probability distribution of daily precipitation category, through randomly generating an integer between 1 and 1000, and using the randomly generated integer as the array index to determine the daily precipitation amount category. If it is category 1, the daily precipitation is set to the trace rainfall (0.254 mm); otherwise, based on the daily precipitation amount range of the category defined in Table 6, a randomly generated float number within the daily precipitation amount range of the category is used.

4. For each month, after all wet days are assigned a precipitation amount, total precipitation in the month is compared to the estimated MP from annual precipitation. If the difference between them is greater than a threshold (0.01 mm), precipitation amounts of all wet days are adjusted through subtracting or adding the difference divided by the number of wet days. Since each adjusted daily precipitation amount should be between trace precipitation (0.254 mm) and the maximum daily precipitation (MDP), sometimes more than two iterations are needed for adjusting daily precipitation amounts.
Two example sets of stochastically generated daily precipitation are shown in Figure 14, together with the observed daily precipitation at Cortez in 2016.
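The four generation rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_month` and `draw_amount` are hypothetical names, the number of dry days is taken as an integer, drawn amounts are capped at MDP, and `max_iter` is an added safeguard for cases where the monthly total cannot be matched within the trace-to-MDP bounds.

```python
import random

TRACE_MM = 0.254  # trace precipitation (mm)

def generate_month(mp, n_days, ndd, mdp, draw_amount,
                   rng=random, tol=0.01, max_iter=100):
    """Return n_days daily precipitation amounts (mm) for one month.

    draw_amount: zero-argument callable returning a wet-day amount sampled
    from the month's category distribution (hypothetical helper).
    """
    daily = [0.0] * n_days
    n_wet = n_days - ndd
    if mp <= 0 or n_wet <= 0:          # rule 1 (and the no-wet-day edge case)
        return daily
    wet_days = rng.sample(range(n_days), n_wet)
    if n_wet == 1:                     # rule 2: the single wet day receives MP
        daily[wet_days[0]] = mp
        return daily
    daily[wet_days[0]] = mdp           # rule 3: one random wet day gets MDP
    for day in wet_days[1:]:
        daily[day] = min(max(draw_amount(), TRACE_MM), mdp)
    for _ in range(max_iter):          # rule 4: iterative adjustment
        diff = mp - sum(daily)
        if abs(diff) <= tol:
            break
        step = diff / n_wet            # spread the difference over wet days
        for day in wet_days:
            daily[day] = min(max(daily[day] + step, TRACE_MM), mdp)
    return daily
```

Because amounts that hit the trace or MDP bound absorb less than their share of the correction, the remaining difference is redistributed on the next pass. This is why the adjustment may take several iterations.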

Conclusions
This study developed a CSWG for producing daily mean air temperatures and daily precipitations in a year with known or specified annual mean air temperature and annual precipitation. Since there appear to be no other published weather generators that utilize annual mean air temperature or annual precipitation as constraints to daily weather predictions, we believe we are presenting a unique method that researchers can use to explore historic (e.g., archeological questions) or future (e.g., climate change) daily weather conditions based upon specified annual values. Thus, in areas such as the American Southwest where tree-ring chronologies are widely available, a similar weather generator could be developed for specific locations and then used to model daily mean air temperature and daily precipitation for a specific year. These data could then be employed in a variety of paleo-environmental models.
Validating the CSWG results is challenging. Since none of the published SWGs use annual mean air temperature and precipitation as constraints on daily weather predictions, our results cannot be directly compared with those produced by other SWGs. Nor is it appropriate to compare the stochastically generated daily temperatures and precipitation with the observed data day by day, so commonly used error measures, such as root mean square error or relative error, cannot be used to evaluate our results. Our CSWG results share the same annual mean temperature and annual precipitation (i.e., the first-order moments) as the observations, but the higher-order moments may differ. Therefore, comparisons of the higher-order moments between the observed and stochastically generated daily mean temperature and precipitation data might be necessary. In addition to higher-order moments, other statistical characteristics of the stochastically generated daily weather need to be evaluated, which deserves a future systematic study.
After the criteria for evaluating our CSWG results are established, the CSWG can be improved further, for example by including the correlation between daily temperature and precipitation in the daily temperature generator, and by considering the dependence of the current day's wet/dry condition on the previous day's wet/dry condition in the daily precipitation generator. Additionally, a future study is necessary to assess the effectiveness and applicability of the CSWG developed in this study in other climatic regions, e.g., tropical, mesothermal, microthermal, and polar.

Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.