Understanding and Modeling Climate Impacts on Photosynthetic Dynamics with FLUXNET Data and Neural Networks

Global warming, which largely results from excessive carbon emissions, has become an increasingly pressing international issue due to its ever-deteriorating trend and profound consequences. Plants sequester a large amount of atmospheric CO2 via photosynthesis, thus greatly mediating global warming. In this study, we aim to model the temporal dynamics of photosynthesis for two different vegetation types to further understand the factors controlling the photosynthetic machinery. We experimented with a feedforward neural network that does not utilize past history, as well as two networks that integrate past and present information: the long short-term memory network and the transformer. Our results showed that a single climate driver, shortwave radiation, carries the most information for predicting upcoming photosynthetic activity. We also demonstrated that photosynthesis and its interactions with climate drivers, such as temperature, precipitation, radiation, and vapor pressure deficit, have an internal system memory of about two weeks. Thus, the predictive model is best trained with historical data over the past two weeks and can best predict the temporal evolution of photosynthesis two weeks into the future.


Introduction
Climate change, specifically global warming, has long been considered a pressing issue for the international community, as it disrupts the stability of ecosystems and threatens the prosperity of mankind [1]. Anthropogenic activities that yield excessive carbon emissions, such as fossil fuel burning, industrialization, and animal husbandry, have been adversely affecting the ecosystem by raising the global temperature. The dynamics of greenhouse gases, such as CO2, play pivotal roles in controlling the radiative forcing and the energy balance of the whole earth system [2]. Therefore, understanding the full cycle, especially the sources and sinks, of atmospheric CO2 is critical to controlling future CO2 concentrations to a reasonable extent.
The photosynthesis pathway is the largest land surface CO2 sink; it sequesters atmospheric CO2 into vegetation and stores it as living biomass. The magnitude of this photosynthetic carbon sequestration pathway is around 120-130 Pg C per year (1 Pg = 10^15 grams) [3]. Ecosystem respiration, both autotrophic and heterotrophic, consumes most of the photosynthetic carbon sink and cycles it back into the atmosphere, resulting in a much smaller net carbon sink compared with the gross carbon input from photosynthesis. The balance between photosynthetic carbon input and ecosystem respiration carbon output determines the fate of the atmospheric CO2 concentration and mitigates anthropogenic CO2 emissions, such as those from fossil fuel and biomass burning [4].
Given the significant role of photosynthetic carbon uptake in determining the global carbon cycle, atmospheric CO2 concentrations, the radiative balance of the earth system, and global warming, it is crucial to gain a mechanistic understanding of the photosynthesis machinery and its relationship to climate factors such as temperature, precipitation, and radiation. Observational networks, such as FLUXNET [5], have been established globally to measure and understand the various components of the land surface carbon cycle, including photosynthesis. At the leaf scale, the photosynthesis reaction is carried out by ribulose-1,5-bisphosphate carboxylase/oxygenase (the RuBisCO enzyme), which combines CO2 and water molecules to generate carbohydrate products. It occurs in two stages: (1) light-dependent reactions capture the energy of light and store it in adenosine triphosphate (ATP) and nicotinamide adenine dinucleotide phosphate hydrogen (NADPH); (2) light-independent reactions capture and reduce carbon dioxide. This biological reaction relies on substrate concentrations (CO2 and water), depends on the activity of the temperature-sensitive RuBisCO enzyme, and is driven by solar energy. Thus, in theory, one should be able to build an effective photosynthesis model that takes all of these important factors into account and predicts the magnitude of land surface photosynthesis given the relevant climate drivers.
Historically, mechanistic models have been used to study the dynamics of land surface photosynthesis and its relevance to the fate of global warming. These models are based on either the Monteith law of light use efficiency [6] or the Farquhar photosynthesis modeling framework [7]. Alternatively, the photosynthesis rate can also be estimated by data-driven machine learning models that are trained on a large amount of observed photosynthesis data.
The latter approach is gaining popularity because of the growth of Eddy covariance observational networks that have collected daily photosynthetic data over the years, along with advances in effective machine learning techniques. Data-driven models have recently been applied to a wide range of areas and have demonstrated their robustness in multiple tasks. Methods such as the feedforward neural network (FFNN) [8], random forest [9], model tree ensembles [10][11][12] and support vector regression [13,14] have been utilized to estimate land surface-atmosphere fluxes from the site level to regional or global scales [3,11,12,[14][15][16][17][18][19]. Adaptive neuro-fuzzy inference systems and general regression neural networks have also been used to estimate daily carbon fluxes in forest ecosystems [20]. However, very little research has leveraged algorithms that specialize in the integration of temporal dependencies, such as the hidden Markov model (HMM), long short-term memory (LSTM) [22] or transformer [23]. Thus, in this study, we employed LSTMs and transformers to predict photosynthetic activities from the past history of climate drivers, and compared their performance against an FFNN counterpart. Our intuition is that temporally aware neural network models should be able to capture the multi-day dynamics of photosynthesis, thus yielding better predictions.
Gross primary product (GPP), a measure of the amount of energy that plants can trap from the sun, is used in this study as a correlate of the rate of photosynthesis. The objective of this study is to model the temporal dynamics of plant photosynthesis in terms of GPP with three different neural network architectures. With appropriately designed model experiments, we also aim to explore and understand the impact of different climate drivers on photosynthesis, as well as the natural ecosystem memory of photosynthetic activities.

Results
We first modeled the dynamics of GPP across four different sites using feedforward neural networks (FFNNs), one for each site and trained independently. The model architecture, formulas, and other details are covered in the Materials and Methods section. In principle, the model predicts the GPP on any given day from the six climate drivers, including the GPP from the previous day. Afterwards, a leave-one-out experiment was conducted for the climate drivers, where one driver was excluded from the input at a time, in order to infer which factors are the most influential for predicting GPP. The results are shown in the top panels of Figure 1. The experimental settings for subsequent models were the same unless otherwise specified.

Figure 1. The gross primary product (GPP) prediction performance of (1) the feedforward neural network (FFNN), where only the current climate drivers are utilized; (2) the long short-term memory (LSTM), where current and past climate drivers are taken into account; and (3) the transformer, where current and past climate drivers are comprehensively encoded. A leave-one-out experiment is also conducted for each model, where one of the climate drivers is excluded from the input. Within each neural network candidate, a one-way analysis of variance (ANOVA) followed by a post-hoc Tukey's honestly significant difference (HSD) test [21] is performed to identify the climate drivers without which the model performance would deteriorate. Asterisks indicate the cases whose square errors significantly deviate from those of the version with all climate drivers provided. One, two, and three asterisks respectively correspond to HSD p-values of less than 0.05, 0.01, and 0.001. GPP: gross primary product; FSDS: shortwave radiation; Pre: precipitation; Temp: temperature; FLDS: longwave radiation; VPD: vapor pressure deficit.
Secondly, we employed long short-term memory (LSTM) networks [22], a member of the recurrent neural network (RNN) family that specializes in incorporating temporal information in time-series data, to replace the FFNN counterpart in the same analyses. Instead of merely utilizing the climate drivers from the day before, we let the LSTM candidate take into consideration the data from the past thirty days. Compared to the FFNN, our LSTM model achieved a consistently lower mean square error and higher model-data correlation (middle panels in Figure 1).
The last neural network variant we analyzed was the transformer [23], which was recently introduced as a state-of-the-art natural language comprehension model whose utility covers machine translation [24], question answering [25], and various other tasks. Similar to the LSTM, thirty days' worth of data was provided as the input. In our task, there were several cases in which the transformer (bottom panels in Figure 1) outperformed the FFNN; for example, it maintained a higher Pearson correlation when shortwave radiation data was absent. Overall, however, it did not demonstrate superior performance over the FFNN, let alone the better-performing LSTM.
Given that the LSTM was the best candidate among the three, we then inspected multiple variants of the LSTM model to explore the natural temporal memory of the photosynthetic process. The photosynthetic activities as measured by GPP depend on the past history of environmental drivers, but that temporal dependency is not likely to extend infinitely into the past. For example, whether it rained a million years ago would minimally affect the GPP tomorrow. An LSTM variant that utilizes information within a reasonable temporal span is likely to achieve optimal prediction accuracy, whereas incorporating information over a period much shorter or longer than the natural memory duration is likely to yield sub-optimal results. The natural length of memory may thus be deduced from the performance of different LSTM variants, and it may shed light on the internal linkage of the photosynthesis process across time.
Finally, we experimented with the forecasting ability of the LSTM models: all variants were given the historical data on climate drivers over the experimentally determined two-week period, but were asked to predict the GPP a varying number of days ahead in the future. We wanted to find the turning point at which the model was no longer able to yield faithful predictions, which may again imply the temporal memory of photosynthetic dynamics. The results are displayed in Figure 2.

Figure 2. Upper panel: predicting the current GPP using varying lengths of historical data on climate drivers. Lower panel: predicting the GPP a varying number of days ahead in the future using the same two-week length of historical data.

Discussion
In all three network candidates, the prediction accuracy was fairly consistent across the four sites when all climate drivers were provided as input (Figure 1). The second grassland site seemed to be the easiest to predict, judging from the fact that all three candidates achieved their best performance on that site, as indicated by a low mean square error and high Pearson correlation. A possible reason for this observation is discussed below. Furthermore, our leave-one-out experiment helped us identify the most influential climate drivers for GPP prediction. Accurate estimation of GPP relies heavily on knowing the shortwave radiation in the region. This observation is reasonable, as shortwave radiation is radiative energy with wavelengths around the range of visible light, the primary energy source for photosynthesis. Apart from shortwave radiation, knowledge of no other climate driver significantly influenced the GPP prediction accuracy consistently across all three neural network candidates; interestingly, this was also the case for GPP itself: the presence or absence of past GPP data did not significantly affect the prediction of future GPP levels.
After identifying the most influential climate driver, the phenomenon that the second grassland is the "most predictable" site becomes more explainable. Looking ahead to the shortwave radiation distributions across the four sites shown in Figure 3 in the upcoming Materials and Methods section, we can draw a plausible conclusion that the abundance of shortwave radiation might be the reason behind the high predictability. The shortwave radiation distribution at the second grassland site implies many non-zero, information-carrying observations of this climate driver, which would reasonably improve the prediction accuracy of GPP.
In both the initial study and the leave-one-out experiments, the performance of the LSTM models was consistently better than that of the FFNN counterparts, which we attribute to the inclusion of information over a longer duration of past history. The weaker performance of the transformers compared to the LSTMs was likely due to the fact that transformers were originally designed for natural language translation rather than for time-series interpretation. Besides being less task-specific, the transformer we implemented had more parameters than the other two candidates, meaning that it would require more data to optimize. This may also have contributed to its less competitive performance given the same amount of training data.
In the study of past-history dependency and forecasting ability of the LSTM models (Figure 2), our experiment implied that photosynthesis has an internal memory length of two weeks. In the past-history dependency study, the LSTM performance peaked when the historical data fed into it covered the past two weeks; providing a more distant history of the climate drivers no longer improved the performance. In the forecasting study, our LSTM model was able to predict GPP two weeks into the future at most. Attempts to predict beyond two weeks demonstrated a significant decrease in prediction accuracy as indicated by higher mean squared error and lower correlation between the prediction and the respective ground truth. Such deterioration was especially obvious in grassland sites (orange and green bars). Our results on the optimal historical system memory were quite consistent with the longest temporal predictability, which were both about two weeks.

Materials and Methods

FLUXNET Dataset
We used Eddy covariance data collected from four different sites in the FLUXNET observational network [5]. Eddy covariance [26] is a measurement technique that quantifies ecosystem gas exchange and areal emission rates. By simultaneously measuring the velocity of the swirling wind with anemometers and the gas concentrations with infrared gas analyzers, the technique generates estimates of the flux of gases into or out of the ecosystem. By now, hundreds of sites are operating on a long-term and continuous basis. In this study, we included two evergreen needleleaf forest sites and two grassland sites (Table 1). Figure 3 shows the probability density distributions of the climate drivers at the four selected sites.

Table 1. Summary of the geographic location and basic information of the four FLUXNET sites included in this study. Time-varying data are reported as mean ± standard deviation. AP: annual precipitation. GPP: gross primary production, a measure that correlates with the rate of photosynthesis in a region.

FFNN
The first neural network candidate we explored, the FFNN, consists of five layers of cells, as shown in Figure 4b. Each cell is connected to all cells in the previous and the next layer, but not to any other cell. Information from the input feature vectors, which in our case are the six climate drivers, can only pass from a shallower layer to a deeper layer, not the other way around; hence the "feedforward" in the name of the model. Figure 4a shows the composition of the input feature vectors and prediction ground truths for not only the FFNN but also the two other candidates.
An arbitrary subset of the FFNN, with n pre-synaptic cells attaching to the same post-synaptic cell, is depicted in Figure 5a for a detailed demonstration of the network. Conceptually, the signal passed to the post-synaptic cell is a weighted sum of the outputs from the pre-synaptic cells, and that signal goes through an activation function before it passes on to the cells in the next layer.
The mathematical description of this scenario is given in Equation (1):

y = f(∑_{i=1}^{n} w_i x_i), (1)

where x_i are the input features; w_i are the neural network parameters, or in technical terms, weights, for each cell; f is the activation function, which in our implementation is a rectified linear unit (ReLU) [27]; and y is the output. For a cell in the first layer, the input features are the climate drivers, whereas for cells deeper in the network, the input features are abstractions extracted from previous cells. For the single cell in the last layer, the output is the predicted GPP.
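As a concrete illustration, the weighted sum and ReLU activation of Equation (1) can be sketched for a single post-synaptic cell in a few lines of Python; the weight and driver values below are hypothetical placeholders, not values from our trained models.

```python
def relu(z):
    # Rectified linear unit activation: max(0, z)
    return max(0.0, z)

def cell_forward(x, w, b=0.0):
    """One post-synaptic cell following Equation (1): a weighted sum of the
    pre-synaptic outputs, passed through the ReLU activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)

# Hypothetical normalized inputs for one day
# (GPP, FSDS, Pre, Temp, FLDS, VPD) -- illustrative values only.
x = [0.4, 0.9, 0.1, 0.6, 0.5, 0.3]
w = [0.2, 0.5, -0.1, 0.3, 0.1, -0.2]
y = cell_forward(x, w)
```

In the full network, the output of each such cell feeds the cells of the next layer, and the single cell in the last layer emits the predicted GPP.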

LSTM
The second candidate, the LSTM, models GPP based on not only the climate drivers on the previous day but also the climate drivers from the more distant past. As illustrated in Figure 4c, the six climate drivers over the past k days are sequentially fed into the LSTM in chronological order. The order in which they enter the LSTM matters to the model, as the LSTM adapts its states depending on the past and present information. On a conceptual level, the LSTM shares some design philosophy with a hidden Markov model, which computes the conditional probability P(x_t | x_{t−k}, ..., x_{t−1}, state). The linear layer immediately after the LSTM cell in the figure is a feedforward layer that closely resembles the last layer of the FFNN counterpart: all the features extracted by the LSTM cell are condensed into a single FFNN post-synaptic cell whose output is the predicted GPP at day t.
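The k-day input windows described above can be assembled from daily records with a simple sliding-window routine; the sketch below uses hypothetical toy series rather than actual FLUXNET data, and the helper name is our own.

```python
def build_windows(drivers, gpp, k):
    """Slice daily driver records into overlapping k-day input windows,
    each paired with the GPP on the day that follows the window."""
    samples = []
    for t in range(k, len(drivers)):
        samples.append((drivers[t - k:t], gpp[t]))
    return samples

# Six hypothetical drivers per day over 40 days, with a toy GPP series.
days = 40
drivers = [[0.1 * d] * 6 for d in range(days)]
gpp = [0.05 * d for d in range(days)]
samples = build_windows(drivers, gpp, k=30)
```

Each sample then supplies thirty ordered days of drivers as input and the subsequent day's GPP as the prediction target.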
The building block, or an LSTM cell, is demonstrated in Figure 5b. Conceptually, an LSTM cell keeps track of two internal states (i.e., cell state and hidden state which respectively corresponds to long-term and short-term memory) and consists of three gates (i.e., forget gate that determines what fraction of the previous cell state shall be maintained, input gate that decides how much new information from the current input shall be integrated to the cell state and hidden state, and output gate that updates the hidden state).
Mathematically speaking, given an input sequence x = (x_1, ..., x_T), a standard LSTM computes the hidden vector sequence h = (h_1, ..., h_T), which can be used as the output of the model, by iterating the following equations from t = 1 to T:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)
i_t = σ(W_i x_t + U_i h_{t−1} + b_i)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)
c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c)
c_t = f_t • c_{t−1} + i_t • c̃_t
h_t = o_t • tanh(c_t)

where σ is the sigmoid function, and f, i, o, c and h are respectively the forget gate, input gate, output gate, cell state and hidden state vectors, with c̃ denoting the candidate cell state. b_f, b_i, b_o and b_c are the biases (i.e., additive terms) of their respective gates. The matrices W and U contain the model weights at the input and recurrent connections, and x_t is the model input at each time step. The symbol "•" represents the point-wise multiplication operation.
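As a minimal illustration, one time step of these gate equations can be traced for a scalar LSTM cell in plain Python; the weight values are hypothetical placeholders, and a real implementation would use vectorized library primitives instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM time step following the gate equations above.
    p holds the weights W, U and biases b for each gate."""
    f = sigmoid(p["Wf"] * x + p["Uf"] * h_prev + p["bf"])  # forget gate
    i = sigmoid(p["Wi"] * x + p["Ui"] * h_prev + p["bi"])  # input gate
    o = sigmoid(p["Wo"] * x + p["Uo"] * h_prev + p["bo"])  # output gate
    c_tilde = math.tanh(p["Wc"] * x + p["Uc"] * h_prev + p["bc"])  # candidate
    c = f * c_prev + i * c_tilde   # update cell state (long-term memory)
    h = o * math.tanh(c)           # update hidden state (short-term memory)
    return h, c

# Run a short sequence through the cell with illustrative weights.
params = {k: 0.5 for k in
          ["Wf", "Uf", "bf", "Wi", "Ui", "bi",
           "Wo", "Uo", "bo", "Wc", "Uc", "bc"]}
h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.2]:
    h, c = lstm_step(x, h, c, params)
```

The hidden state h after the final step plays the role of the extracted feature vector that the subsequent linear layer condenses into a GPP prediction.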

Transformer
The third candidate, the transformer, also incorporates information from multiple time points in the past. Even though transformers and LSTMs may share similar high-level structures (Figure 4c,d), there is a major difference between them. Transformers completely remove the concept of recurrence, that is, the way LSTMs comprehend data as an ordered sequence that incrementally builds up information. They instead rely on two mechanisms, positional encoding and self-attention, to handle long series of data. The former means that although there is no longer a concept of "who comes before whom and after whom", the position of each data point is nevertheless encoded so that transformers retain positional awareness. The latter helps transformers figure out which data points (or specifically, data points from which positions) are more important and "pay more attention" to them, so that they can manage long sequences of data.
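For illustration, the sinusoidal positional encoding introduced with the original transformer can be sketched as follows; the 30-day window length matches our input span, while the model width of 8 is a toy value chosen only for this example.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in the original transformer:
    each position t gets a unique vector of sines and cosines at
    different frequencies, so the model retains awareness of where
    each day sits in the input window."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for t in range(seq_len):
        for j in range(0, d_model, 2):
            angle = t / (10000 ** (j / d_model))
            pe[t][j] = math.sin(angle)
            if j + 1 < d_model:
                pe[t][j + 1] = math.cos(angle)
    return pe

pe = positional_encoding(30, 8)  # 30-day window, toy model width of 8
```

These vectors are added to the input embeddings before self-attention, which then weighs the encoded days against each other regardless of their order of arrival.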
The fundamental building block of a transformer is shown in Figure 5c. All components should already be self-explanatory, except for the layer normalization blocks. The purpose of such blocks is to re-scale the output values from the previous layer in order to balance the data, prevent the model from being overwhelmed by extreme outliers, and prevent such outliers from propagating and deteriorating progressively.

Neural Network Implementation Details
Data was divided into training, validation, and test sets at a ratio of 8:1:1, where the training set was used to train the model by updating model parameters according to the loss; the validation set was used for sanity checking each time the entire training set had been exhausted; and the test set was held out for the quantitative evaluation that yielded the results in Figures 1 and 2.
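An 8:1:1 split can be computed as index ranges as sketched below; the chronological (unshuffled) ordering shown here is one plausible choice for time-series data and is an assumption of this sketch, not necessarily the exact procedure we used.

```python
def split_indices(n, ratios=(8, 1, 1)):
    """Chronological 8:1:1 split into train/validation/test index ranges.
    Splitting by time (rather than shuffling) avoids leaking future
    observations into the training set."""
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = list(range(0, n_train))
    val = list(range(n_train, n_train + n_val))
    test = list(range(n_train + n_val, n))
    return train, val, test

train, val, test = split_indices(3650)  # e.g. ten years of daily records
```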
For all three candidates, model parameters were initialized as zeros and were iteratively optimized by back-propagation of the loss (i.e., the amount of difference between the model prediction and the GPP ground truth, as determined by a loss function) via gradient descent. The loss function implemented for all candidates was the mean square error (MSE) loss. An adaptive learning rate (i.e., adjusting model parameters in larger steps at early stages of training, and gradually decreasing the step size when the model reaches the fine-tuning stage) and early stopping (i.e., stopping training when the model performance on the validation set no longer improves) were both implemented to prevent over-fitting (i.e., the phenomenon in which the model performs well on the training set, most likely by "memorizing" the answers, but fails to generalize to the test set) and to improve performance on the test set. The random seed was controlled to ensure reproducibility: for example, the pseudo-randomized order in which the training samples were visited was kept the same each time we ran the experiments.
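The early-stopping rule can be sketched as follows; the patience value and the validation losses below are hypothetical placeholders standing in for a real training run.

```python
def train_with_early_stopping(losses_by_epoch, patience=3):
    """Minimal early-stopping logic: stop once the validation loss has
    failed to improve for `patience` consecutive epochs, and report the
    best epoch and its loss."""
    best_loss = float("inf")
    best_epoch = 0
    stale = 0
    for epoch, loss in enumerate(losses_by_epoch):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation loss has stopped improving
    return best_epoch, best_loss

# Toy validation-loss trajectory: improves, then stagnates.
best_epoch, best_loss = train_with_early_stopping(
    [1.0, 0.6, 0.5, 0.55, 0.6, 0.7])
```

In practice one would also restore the model weights saved at the best epoch rather than those of the final epoch.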
Other more technical hyper-parameters, such as the learning rate and batch size, are not detailed in this paper, but can be found in the GitHub repository https://github.com/RosalieZhu/DL-CO2.

Conclusions
Greenhouse gas emissions such as CO2 can dramatically warm the climate system via a positive radiative forcing effect. Fortunately, terrestrial ecosystems are able to mitigate anthropogenic CO2 emissions via photosynthesis. In this study, we aimed to model the dynamics of plant photosynthetic activity using advanced machine learning frameworks with temporal awareness. Our results showed that incorporating past information is important for improving model predictability. Our memory-based neural network model successfully captured the temporal dynamics of plant photosynthesis at all four sites of interest, and it identified shortwave radiation as the most informative climate driver for faithful prediction of photosynthetic rates. Our modeling experiments also demonstrated a two-week internal system memory of the photosynthesis machinery.
Our study demonstrated that the LSTM, with its capability of capturing temporal dependencies, is a technique worth investigating when researchers study time-series data on photosynthetic activities, or environmental data in general. In the future, similar methods can be directly applied to other important carbon flux variables besides GPP, such as ecosystem respiration and net ecosystem exchange. Specifically, a more comprehensive study mapping the internal memory of photosynthesis can be conducted if data from additional FLUXNET sites covering all land types are incorporated, and ultimately an internal memory map can be derived based on the predictions of the proposed LSTM network.