Identification of Dissolved Organic Matter Origin Using Molecular Level Analysis Methods

Abstract: Natural organic matter (NOM) is ubiquitous; however, its characteristics are not fully understood because NOM is a complex organic mixture that reflects the surrounding ecosystem. Identification of its molecular structure provides information for evaluating its environmental reactivity and behavior. This study characterized NOM regarding its molecular structure. The characteristics of different types of NOM, depending on their origin, were investigated using various analytical methods such as pyrolysis-gas chromatography-mass spectrometry, size exclusion chromatography, and fluorescence spectrometry. NOM was collected from three different sites: river water, wetland, and wastewater treatment plant effluent. The molecular size of NOM followed the order wetland > river water > wastewater treatment plant effluent. The fluorescence spectra distinguished the differences among the samples. The effluent contained the highest proportion of protein-like peaks, and the wetland sample showed the highest humic-like substance peak. The origin of NOM was investigated by classifying the pyrolyzed fragments from each sample. NOM from wetlands contains a large amount of lignin, which originates from terrestrial substances. Proteins were the major biocomponents of the effluent sample. The NOM in the river water contained proteins and lignin-derived compounds. The results show that NOM can be clearly identified through molecular-level analysis methods.


Introduction
Natural organic matter (NOM) is a complex mixture of aromatic and aliphatic molecules with a wide range of chemical structures and molecular weight distributions. NOM occurs naturally in the environment through ecological processes. The characteristics of NOM are influenced by its origin, climatic conditions, and the biogeochemical cycle of the surrounding environment [1]. NOM does not directly affect human health; however, it causes taste and odor problems in drinking water. In addition, NOM is a target substance for removal during the water treatment process because it acts as a precursor of disinfection by-products [2,3]. In terms of pollutant behavior, NOM increases the mobility of pollutants in water when it combines with heavy metals and hydrophobic organic pollutants [4]. Therefore, the analysis of the chemical structures and components of NOM is an important issue in water quality management.
Several methods have been applied to characterize NOM, including total organic carbon (TOC) analysis, UV spectrometry, and elemental analysis [5]. However, these methods only provide rough information regarding the dissolved organic carbon concentration, UV absorbance, and elemental composition. Because NOM is heterogeneous and has various origins, detailed and specific investigations are required to understand its characteristics. There are several effective analytical methods for investigating NOM at the molecular level: size exclusion chromatography (SEC), fluorescence excitation-emission matrix (FEEM), and pyrolysis-gas chromatography-mass spectrometry (Py-GCMS).
SEC is used to estimate the molecular weight distribution of samples. This analysis can confirm the transformation of molecules or the removal rate of molecules of a specific size after a treatment process [6]. FEEM classifies organic matter into humic-like and microbial-originating substances; a fluorescence spectrometer is used to obtain the excitation and emission wavelengths of the samples. The data are plotted as an excitation-emission matrix, in which characteristic peak areas indicate the origin of the organic matter. Generally, peaks are classified into four groups: humic-, fulvic-, tyrosine-, and protein-like substances [7]. The origin of NOM is estimated based on the proportion of each group. Py-GCMS is an efficient technique for analyzing complex organic polymers [8]. This method pyrolyzes a sample at a high temperature in a pyrolyzer, where the sample fragments into smaller molecules. The pyrolyzed fragments are identified as specific molecules using gas chromatography-mass spectrometry. The technique has been applied to identify the biopolymer composition of NOM because the differing composition ratios of each sample reflect the origin of NOM [9]. Therefore, SEC, FEEM, and Py-GCMS are effective analytical techniques that enable molecular-level research. In this study, these analytical methods were applied to characterize NOMs originating from different environments. We characterized NOMs from a river, a wetland, and a wastewater treatment plant (WWTP) effluent to understand the origin of NOM in aquatic systems.

Sample Collection
The NOM samples used in this study were collected from the Nakdong River, Jinhae WWTP effluent, and Mulyeongari-oreum wetland in the Republic of Korea. Sample collection was conducted between October and November 2021. The NOM from the Nakdong River (ND NOM) was obtained from the Bonpo Ecological Park located in Dongeup, Changwon-si, Gyeongnam (35°37 N, 128°64 E). The ND River is the second largest river in southeastern Korea. The total main channel length is the longest nationally at approximately 510.4 km, and the watershed area is approximately 23,647 km², accounting for approximately 25% of the total land area. NOM was sampled from Jinhae WWTP effluent (WWTP NOM) in Jinhae-gu, Changwon-si, Gyeongnam (35°14 N, 128°68 E). Jinhae WWTP (treatment capacity of 60,000 m³/d) uses a biological treatment process. Mulyeongari-oreum wetland is located in Sumang-ri, Namwon-eup, Seogwipo-si, Jeju-do (33°36 N, 126°69 E). The wetland (circumference: ~1 km, depth: ~40 m) is formed by the accumulation of rainwater in the crater of a parasitic volcano on Jeju Island. In addition, the DOC concentration is high because it accumulates well owing to the characteristics of the crater lake, where drainage does not occur.

Water Quality Analysis Methods
The pH and conductivity of the collected samples were measured using a multimeter (Orion 4 star, Thermo Scientific, Cleveland, OH, USA). The samples were transferred to the laboratory and stored at 4 °C until pre-treatment. Each sample was filtered using a 0.45 µm syringe filter made of cellulose acetate (HM, Seoul, Korea), and then DOC was measured using a total organic carbon analyzer (TOC, Sievers M9 Portable TOC analyzer, SUEZ, Paris, France).
The UV absorbance was measured at 254 nm (UVA254) using a UV spectrometer (UV-1800 UV-Vis spectrophotometer, Shimadzu, Kyoto, Japan). Specific UV absorbance (SUVA) indicates the content of humic substances. The SUVA value was obtained by dividing the measured absorbance value by the DOC concentration and multiplying by 100. A fluorescence analyzer (RF-6000, Shimadzu, Kyoto, Japan) was used to measure excitation at wavelengths of 230-550 nm at intervals of 5 nm and emission at wavelengths of 250-600 nm at intervals of 2 nm. A long-pass filter (Longpass Filter/UV 300 nm, 50 × 50 mm, Asahi Spectra, Torrance, CA, USA) was used to remove the Raman spectrum and Rayleigh scattering effects.
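The SUVA calculation described above can be sketched as follows. This is a minimal illustration, assuming the absorbance is expressed per cm and DOC in mg/L (the units are not stated in the text); the function name and the absorbance reading in the example are hypothetical and are not measured values from this study.

```python
def suva(uva254_per_cm: float, doc_mg_per_l: float) -> float:
    """Return SUVA as UVA254 divided by DOC, multiplied by 100.

    Assumed units: UVA254 in 1/cm and DOC in mg/L, giving L/(mg*m).
    """
    if doc_mg_per_l <= 0:
        raise ValueError("DOC concentration must be positive")
    return uva254_per_cm / doc_mg_per_l * 100.0


# Hypothetical absorbance reading (illustration only), paired with a
# DOC concentration of 2.60 mg/L.
example_suva = suva(0.059, 2.60)
print(f"SUVA = {example_suva:.2f} L/(mg*m)")
```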
The fluorescence spectrum was used to calculate the humification index (HIX), biological index (BIX), and fluorescence index (FI), which are indicators of NOM characteristics and origin. The HIX indicates the degree of humification of organic matter, given that the fluorescence intensity shifts toward longer wavelengths as the organic material is humified. At an excitation wavelength of 254 nm, the sum of the fluorescence intensities at emission wavelengths of 435-480 nm was divided by the sum of the fluorescence intensities at emission wavelengths of 300-345 nm [10]. The BIX evaluates the degree of activity of indigenous organisms. At an excitation wavelength of 310 nm, it is obtained by dividing the fluorescence intensity at an emission wavelength of 380 nm by the fluorescence intensity at an emission wavelength of 430 nm [10]. The FI is an index of the origin of organic matter. At an excitation wavelength of 370 nm, it is obtained by dividing the fluorescence intensity at an emission wavelength of 450 nm by the fluorescence intensity at an emission wavelength of 500 nm [11].
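The three index definitions can be illustrated with a short sketch. It assumes the excitation-emission matrix is stored as a nested dictionary (excitation wavelength, then emission wavelength, then intensity) and that a wavelength missing from the scan grid, such as an excitation of 254 nm on a 5 nm grid starting at 230 nm, is replaced by the nearest measured wavelength. Both choices, the function names, and the synthetic demo matrix are assumptions made for illustration and do not describe the processing used in this study.

```python
from typing import Dict, Iterable

EEM = Dict[int, Dict[int, float]]  # excitation nm -> emission nm -> intensity


def nearest(grid: Iterable[int], target: float) -> int:
    """Return the grid wavelength closest to the requested wavelength."""
    return min(grid, key=lambda w: abs(w - target))


def intensity(eem: EEM, ex: float, em: float) -> float:
    """Intensity at the grid point nearest to (ex, em)."""
    row = eem[nearest(eem.keys(), ex)]
    return row[nearest(row.keys(), em)]


def band_sum(eem: EEM, ex: float, em_lo: float, em_hi: float) -> float:
    """Sum of intensities over an emission band at the nearest excitation."""
    row = eem[nearest(eem.keys(), ex)]
    return sum(v for em, v in row.items() if em_lo <= em <= em_hi)


def hix(eem: EEM) -> float:
    # ex 254 nm: sum(em 435-480 nm) / sum(em 300-345 nm)
    return band_sum(eem, 254, 435, 480) / band_sum(eem, 254, 300, 345)


def bix(eem: EEM) -> float:
    # ex 310 nm: I(em 380 nm) / I(em 430 nm)
    return intensity(eem, 310, 380) / intensity(eem, 310, 430)


def fi(eem: EEM) -> float:
    # ex 370 nm: I(em 450 nm) / I(em 500 nm)
    return intensity(eem, 370, 450) / intensity(eem, 370, 500)


# Synthetic demo matrix on the scan grid described in the text
# (ex 230-550 nm in 5 nm steps, em 250-600 nm in 2 nm steps).
# The intensities are artificial (equal to the emission wavelength).
demo_eem: EEM = {
    ex: {em: float(em) for em in range(250, 601, 2)}
    for ex in range(230, 551, 5)
}
print(hix(demo_eem), bix(demo_eem), fi(demo_eem))
```

Summing over whatever grid points fall inside each emission band keeps the sketch independent of the scan interval; an integration scheme could be substituted if the band areas are preferred.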
High-performance size-exclusion chromatography (HPSEC) measures the molecular weight and size distribution of DOM in water [12,13]. The molecular weight was measured using high-performance liquid chromatography (HPLC) (LC-20A, Shimadzu, Kyoto, Japan) equipped with an SEC column (Protein Pak 125, 10 µm, 7.8 × 300 mm, Waters, MA, USA). Phosphate buffer (2.4 mM NaH2PO4 + 1.6 mM Na2HPO4 + 96.0 mM NaCl) was used for the mobile phase, and the flow rate was maintained at 0.7 mL/min. The sample injection volume was 100 µL, and the analysis was performed using a 254 nm UV detector. Polystyrene sulfonates (PSS, Polysciences Inc., Warrington, PA, USA) with molecular weights of 1690, 5580, 8890, and 16,500 Da were used as standard materials to determine the molecular weight calibration curve.
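A molecular weight calibration curve of this kind is commonly built as a linear fit of log10(molecular weight) against retention time. The sketch below assumes that form; the fitting approach, the function names, and especially the retention times are hypothetical, because the text reports only the molecular weights of the PSS standards.

```python
import math
from typing import Sequence, Tuple


def fit_calibration(retention_min: Sequence[float],
                    mw_da: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(MW) = intercept + slope * retention time."""
    logs = [math.log10(m) for m in mw_da]
    n = len(retention_min)
    t_mean = sum(retention_min) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in retention_min)
    sxy = sum((t - t_mean) * (y - y_mean)
              for t, y in zip(retention_min, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def mw_from_retention(t_min: float, slope: float, intercept: float) -> float:
    """Convert a retention time to an apparent molecular weight (Da)."""
    return 10 ** (intercept + slope * t_min)


# PSS standard molecular weights from the text.
PSS_MW_DA = [1690, 5580, 8890, 16500]
# Hypothetical retention times (min), for illustration only;
# larger molecules elute earlier in SEC.
HYPOTHETICAL_RT_MIN = [9.6, 8.8, 8.5, 8.1]

slope, intercept = fit_calibration(HYPOTHETICAL_RT_MIN, PSS_MW_DA)
print(f"apparent MW at 9.0 min: {mw_from_retention(9.0, slope, intercept):.0f} Da")
```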

Py-GCMS Method
As pretreatment for Py-GC/MS analysis, 16 L of raw water was filtered using a 0.7 µm glass microfiber filter (GF/F) (Whatman, Cytiva, Marlborough, MA, USA) connected to a vacuum pump. The filtrate was then concentrated using a rotary evaporator (N-1300VW, EYELA, Tokyo, Japan) and freeze-dried using a freeze dryer (Bondiro, Ilshin biobase, Dongducheon, Korea).
Thermal decomposition was performed using a Curie-Point pyrolyzer (JCI-55; JAI, Tokyo, Japan). Approximately 0.1 mg of the dried sample was wrapped in ferromagnetic foil (Pyrofoil F590, JAI, Tokyo, Japan) and pyrolyzed at 590 °C. The generated volatile material was simultaneously injected into the GC with a carrier gas (He). GC-MS analysis was performed using a gas chromatograph-mass spectrometer (GCMS-QP2020 NX, Shimadzu, Kyoto, Japan) equipped with a 30 m × 0.25 mm × 0.50 µm DB-5MS capillary column. The oven temperature was held at 40 °C for 5 min and then increased at 7 °C/min to 320 °C. The MS was operated at an ionization energy of 70 eV, an ion source temperature of 210 °C, and a mass range of 30-500 amu. After thermal decomposition, the 100 most abundant compounds in each pyrogram were identified using the mass spectrum library and classified into polysaccharide (PS), amino sugar (AS), protein, polyhydroxy aromatic (PHA), lignin, and lipid according to Bruchet et al. Table 1 lists the compounds ranked 1 to 10 by area percentage in the pyrogram. All sample analyses were performed in duplicate. Three model compounds (SRNOM, AHA, and BSA) were pyrolyzed to confirm reproducibility.
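As a quick arithmetic check of the GC oven program, the run length implied by the stated hold and ramp can be computed as below. The sketch assumes there is no final hold at 320 °C, which the text does not specify; the function name is illustrative.

```python
def oven_program_minutes(start_c: float, hold_min: float,
                         ramp_c_per_min: float, final_c: float) -> float:
    """Initial hold time plus the time needed to ramp to the final temperature."""
    return hold_min + (final_c - start_c) / ramp_c_per_min


# 40 C held for 5 min, then 7 C/min to 320 C (no final hold assumed).
total_min = oven_program_minutes(40.0, 5.0, 7.0, 320.0)
print(f"oven program length: {total_min:.1f} min")
```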

Confirmation of Py-GCMS Reproducibility
Note to Table 1: colored cells denote molecules matched between the 1st and 2nd analyses.
SRNOM (Suwannee River NOM) is natural organic matter from a blackwater river [14] and is a popular reference material because it is produced by a systematic and qualified NOM extraction process. In the duplicate analyses, seven fragments were detected in both the 1st and 2nd pyrolysis results: 2-butanone, phenol, 4-methyl-phenol, 3-buten-2-one, methyl benzene, 2-methyl-phenol, and 1,3-dimethyl-benzene. The total area percentages of the repeated fragments were 41.85% and 41.28% in the 1st and 2nd pyrolysis, respectively.
BSA (bovine serum albumin) is a protein derived from cows and has been used as a protein standard in many experiments. For BSA, eight fragments matched between the 1st and 2nd pyrolysis results: methyl benzene, phenol, 4-methyl-phenol, acetonitrile, benzeneacetonitrile, 1H-indole, 1H-pyrrole, and 3-methyl butanal. The total area percentages of the repeated fragments were 57.32% and 56.95% in the 1st and 2nd pyrolysis, respectively.
These results suggest that organic molecular structure identification using pyrolysis is reproducible.
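One way to put a number on the agreement between duplicates is the relative percent difference of the summed area percentages. This metric is an illustrative choice and is not a criterion stated in the text; the input values are the area sums reported above.

```python
def relative_percent_difference(first: float, second: float) -> float:
    """Absolute difference of two values relative to their mean, in percent."""
    return abs(first - second) / ((first + second) / 2.0) * 100.0


# Summed area percentages of the matched fragments (1st, 2nd pyrolysis).
DUPLICATE_AREA_SUMS = {
    "SRNOM": (41.85, 41.28),
    "BSA": (57.32, 56.95),
}

for name, (first, second) in DUPLICATE_AREA_SUMS.items():
    rpd = relative_percent_difference(first, second)
    print(f"{name}: relative percent difference = {rpd:.2f}%")
```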

Water Quality of the Samples
The results of the water quality analysis are presented in Table 2. The pH values were 7.89, 6.90, and 5.35 for the ND NOM, WWTP NOM, and wetland NOM, respectively; the pH was highest in the ND NOM and lowest in the wetland NOM. The pH of the ND NOM was high because the sample was collected during the day, when underwater algae were photosynthesizing. The pH of the wetland NOM was low because there was almost no inflow or outflow of water, except for rainwater, and organic matter and peat were continuously deposited, making the water acidic. The electrical conductivities were 238.8, 1060, and 52.15 µS/cm for the ND NOM, WWTP NOM, and wetland NOM, respectively. WWTP NOM contains many ionic substances because the influent is wastewater. The conductivity of the wetland sample was low because rainwater was the only inlet water source. The DOC was 2.60, 3.90, and 8.89 mg/L in the ND NOM, WWTP NOM, and wetland NOM, respectively. The wetland NOM showed the highest DOC because there was no water outlet and water was lost only by evaporation, which concentrated the DOC. SUVA is an index of the content of aromatic organic compounds in DOM and was used to determine the relative aromaticity of the humic fraction [16]. When the SUVA value is <3, the major portion of organic matter has hydrophilic properties and low molecular weight. A SUVA value of 4-5 indicates that hydrophobic aromatic substances and polymer substances are major components in the sample [17]. The SUVA values were 2.27, 2, and 3.42 in the ND NOM, WWTP NOM, and wetland NOM, respectively, with SUVA values < 4 at all points. It has been reported that the proportion of hydrophilic low-molecular-weight organic substances is higher than that of hydrophobic aromatic polymer organic substances owing to the nature of the water quality in Korea [18]. Although the SUVA value of the wetland NOM was <4, it was higher than those of the other samples. This suggests that terrestrial-derived organic matter accumulated owing to the water flow characteristics mentioned in Section 2.1.
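The SUVA thresholds quoted above can be applied to the reported values with a small helper. The function name and label wording are illustrative; values that fall outside the two quoted ranges are reported as such rather than assigned to either class.

```python
def classify_suva(value: float) -> str:
    """Apply the thresholds quoted in the text (<3 and 4-5)."""
    if value < 3:
        return "mainly hydrophilic, low-molecular-weight organic matter"
    if 4 <= value <= 5:
        return "mainly hydrophobic aromatic and polymer substances"
    return "outside the two quoted ranges"


# SUVA values reported for the three samples.
REPORTED_SUVA = {"ND NOM": 2.27, "WWTP NOM": 2.0, "wetland NOM": 3.42}

for sample, value in REPORTED_SUVA.items():
    print(f"{sample}: SUVA {value} -> {classify_suva(value)}")
```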

Py-GCMS Results
The Py-GC/MS results are shown in Figure 1. The ND NOM contained 46% PS, 7% AS, 37% protein, 2% PHA, 6% lignin, and 2% lipid. The WWTP NOM contained 27% PS, 18% protein, 4% lignin, 3% lipid, and 48% Cl-based compounds. The wetland NOM contained 31% PS, 27% protein, 8% PHA, and 34% lignin. ND NOM contained a high proportion of polysaccharides and proteins because they are the main constituents of plants and microorganisms in rivers [19,20]. Lignin is a biopolymer found in soil and plants and is often detected in stagnant water, such as wetlands [21]. In this study, the wetland NOM showed a pattern similar to those in previous reports. Lignin accounted for 34% of its biopolymers, which indicates that its organic matter was derived mainly from plants. The polysaccharide and protein percentages were also relatively high in the wetland sample. This indicates that wetland organic matter contains debris from plants and microorganisms. Chlorine-based compounds constituted a large proportion of the WWTP NOM. This was likely due to chlorine-based organic compounds produced during chlorination. Generally, WWTP effluent is disinfected by methods such as chlorination, ozonation, and UV irradiation. These are commonly used disinfection systems, but chlorination has the disadvantage of generating chlorine-containing byproducts. Trihalomethanes are well-known byproducts of the drinking water treatment process. In this study, chlorine-containing molecules were likewise detected in the WWTP effluent, and the pyrolysis result suggests a similar chemical reaction between the WWTP NOM and chlorine. Polysaccharides and proteins were the major biopolymers in the WWTP NOM, and a large portion of them results from the biological treatment process. Generally, wastewater contains a high concentration of polysaccharides and lipids; however, most of them are decomposed by microorganisms. During this process, proteins and polysaccharides are released as extracellular polymeric substances (EPS) from microorganisms. This explains the large percentage of both biopolymers.
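The reported compositions can be tabulated and checked for consistency, as in the sketch below. It confirms that the percentages for each sample sum to 100 and picks out the most abundant class; the dictionary layout and function name are illustrative.

```python
# Area percentages reported for each sample (PS: polysaccharide,
# AS: amino sugar, PHA: polyhydroxy aromatic).
COMPOSITION = {
    "ND NOM": {"PS": 46, "AS": 7, "protein": 37, "PHA": 2,
               "lignin": 6, "lipid": 2},
    "WWTP NOM": {"PS": 27, "protein": 18, "lignin": 4, "lipid": 3,
                 "Cl-based": 48},
    "wetland NOM": {"PS": 31, "protein": 27, "PHA": 8, "lignin": 34},
}


def dominant_class(fractions: dict) -> str:
    """Return the class with the largest area percentage."""
    return max(fractions, key=fractions.get)


for sample, fractions in COMPOSITION.items():
    total = sum(fractions.values())
    print(f"{sample}: total {total}%, dominant class {dominant_class(fractions)}")
```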

SEC and EEM Analyses
The molecular weight distributions of the collected NOMs are shown in Figure 2. The molecular weight peaks were at 1095, 1340, 790, and 400 Da in the ND River; 1380 and 1090 Da in the wetland; and 400, 740, and 1060 Da in the WWTP effluent. The molecular weight peaks at all sampling points ranged from 400 to 1380 Da, and the wetland sample showed the highest molecular weight peak (1380 Da). This indicates that the wetland sample contained high-molecular-weight peaks influenced by terrestrial-derived humic substances, because humic-like substances contain large molecules [22,23]. The WWTP NOM had lower molecular weight peaks, likely due to the decomposition of organic matter during biological treatment: a large amount of humic substances is removed in the primary settling tank, and biodegradable organic matter of high molecular weight is significantly decomposed by microbial activity in the aeration tank [24].
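As a simple illustration, the sites can be ranked by their largest reported molecular weight peak. Using the largest peak as the ranking criterion is an assumption made for this sketch; the text does not state how the overall size order was derived.

```python
# Molecular weight peaks (Da) reported for each site.
PEAKS_DA = {
    "ND River": [1095, 1340, 790, 400],
    "wetland": [1380, 1090],
    "WWTP effluent": [400, 740, 1060],
}

# Rank sites from largest to smallest maximum peak.
ranking = sorted(PEAKS_DA, key=lambda site: max(PEAKS_DA[site]), reverse=True)
print(" > ".join(ranking))
```

Under this criterion the order is wetland, ND River, WWTP effluent, which agrees with the molecular size order reported in the abstract.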

Figure 3 shows the FEEM results. Four types of fluorescence characteristics can be classified according to the peaks observed in the fluorescence spectrum [25]. Protein-like substances have an excitation wavelength of 275 nm and an emission wavelength of 350 nm [26], fulvic-like substances have an excitation wavelength of 320-340 nm and an emission wavelength of 410-430 nm [26], humic-like substances have an emission wavelength of 460-480 nm [26], and terrestrial humic-like substances have an emission wavelength of 450 nm [27]. The fluorescence analysis of ND NOM showed high intensity at ex/em 275/350 nm, which is a protein-like wavelength range related to autochthonous organic matter. This fluorescence is attributed to the biodegradation of aquatic microorganisms, algae, and bacteria living in the river. The wetland NOM had higher peaks in the fulvic acid-like (ex: 320-340 nm/em: 410-430 nm) and humic acid-like (ex: 370-390 nm/em: 460-480 nm) wavelength regions, which demonstrates that the hydrophobic humic-based fluorescence was relatively high. High fluorescence intensity was observed in WWTP NOM at ex: 220 nm/em: 300-380 nm and ex: 275 nm/em: 325 nm. This demonstrates that organic components derived from microorganisms and proteins were abundant owing to the metabolic activity of aerobic microorganisms in the effluent, which is treated using the activated sludge method.
The HIX, BIX, and FI were calculated based on the EEM data (Table 3). An HIX value <5 indicates undecomposed organic matter [11]. The HIX values of the collected samples ranged from 1.75 to 3.39, all <5. The ND NOM and WWTP NOM had HIX values of 2.63 and 1.75, respectively, indicating relatively low humification, whereas the wetland sample had the highest HIX value of 3.39. This indicates that the wetland sample had a higher accumulation of refractory organic matter than the other sites. A BIX value >1 indicates that autochthonous DOM of microbial origin is dominant, whereas a value <0.6 indicates that terrestrial organic matter is the major component [28]. The BIX values at all sampling points ranged from 1.08 to 1.54. The content of autochthonous organic matter was higher than that of terrestrial organic matter in both ND NOM and WWTP NOM. However, wetland NOM showed the lowest value because of its high content of terrestrial organic matter.
FI is an index for estimating the origin of organic matter: a value closer to 1 indicates allochthonous organic matter of terrestrial origin, and a value closer to 2 indicates autochthonous organic matter of biological origin [11].
The FI values of the sampling points ranged from 0.53 to 1.13. These results are consistent with the other results of this study. The FI value of WWTP NOM was the highest (1.12), which reflects the biological wastewater treatment process at the sampling point.
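The interpretation rules quoted for the three indices can be written as small helper functions. The function names and label wording are illustrative, and the FI rule is implemented as "closer to 1 versus closer to 2" with the midpoint assigned to the biological class, which is an assumption about how ties should be handled.

```python
def interpret_hix(value: float) -> str:
    """HIX < 5 is read as undecomposed, weakly humified organic matter."""
    return "undecomposed (HIX < 5)" if value < 5 else "HIX >= 5"


def interpret_bix(value: float) -> str:
    """BIX > 1: microbial autochthonous origin; BIX < 0.6: terrestrial."""
    if value > 1:
        return "autochthonous, microbial origin dominant"
    if value < 0.6:
        return "terrestrial organic matter dominant"
    return "between the quoted thresholds"


def interpret_fi(value: float) -> str:
    """Closer to 1: terrestrial origin; closer to 2: biological origin."""
    if abs(value - 1) < abs(value - 2):
        return "closer to 1 (allochthonous, terrestrial)"
    return "closer to 2 (autochthonous, biological)"


# HIX values reported for the three samples.
REPORTED_HIX = {"ND NOM": 2.63, "WWTP NOM": 1.75, "wetland NOM": 3.39}
for sample, value in REPORTED_HIX.items():
    print(f"{sample}: HIX {value} -> {interpret_hix(value)}")

# Lower and upper ends of the reported BIX range.
for value in (1.08, 1.54):
    print(f"BIX {value} -> {interpret_bix(value)}")
```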

Conclusions
This study investigated the molecular-level characteristics of NOM in the Nakdong River, Jinhae WWTP effluent, and Mulyeongari-oreum wetland to determine their origin.
(1) Py-GCMS analysis for each site showed that the proportions of polysaccharides and proteins were high in ND NOM owing to the influence of plants and microorganisms. The wetland NOM showed a high proportion of refractory and aromatic organic matter, such as lignin and PHA. The WWTP NOM had the highest proportion of chlorine-based organic compounds owing to chlorination, and its content of microorganism- and protein-derived compounds was high owing to the activated sludge process. In this study, different sources of NOM were identified using SEC, EEM, and Py-GCMS techniques. It was determined that molecular-level characterization is useful for predicting the origin of NOM. The results are expected to provide basic data for water quality management in terms of the aquatic ecosystem and water treatment.