Low-Input Crops as Lignocellulosic Feedstock for Second-Generation Biorefineries and the Potential of Chemometrics in Biomass Quality Control

Featured Application: 1. The utilization of so-called low-input crops (i.e., Miscanthus grasses and fast-growing trees) as lignocellulosic feedstock for second-generation biorefineries. 2. Lignin and lignin-derived materials as agrochemical products. 3. Chemometric methods to be used for fast and efficient lignocellulose feedstock (LCF) quality control.

Abstract: Lignocellulose feedstock (LCF) provides a sustainable source of components to produce bioenergy, biofuel, and novel biomaterials. Besides hard and soft wood, so-called low-input plants such as Miscanthus are interesting crops to be investigated as potential feedstock for the second-generation biorefinery. The status quo regarding the availability and composition of different plants, including grasses and fast-growing trees (i.e., Miscanthus, Paulownia), is reviewed here. The second focus of this review is the potential of multivariate data processing to be used for biomass analysis and quality control. Experimental data obtained by spectroscopic methods, such as nuclear magnetic resonance (NMR) and Fourier-transform infrared spectroscopy (FTIR), can be processed using computational techniques to characterize the 3D structure and energetic properties of the feedstock building blocks, including complex linkages. Here, we provide a brief summary of recently reported experimental data for structural analysis of LCF biomasses, and give our perspectives on the role of chemometrics in understanding and elucidating LCF composition and lignin 3D structure.


Introduction
Global economic and ecological challenges of the twentieth century, such as limited fossil resources, climate change due to greenhouse gas emissions, and the global energy demand, are driving forces for
Statistics show that 170 million metric tons of lignocellulose is produced annually, while no more than 5% of these LCF components are exploited, mainly due to a significant recalcitrance caused by the lignin [17]. Biorefining is an important option to carry out innovative valorization of lignocellulosic materials, which has triggered intense research on how to convert lignins into target chemicals and fuels. LCF sources for biorefinery use include soft and hard wood, lignocellulose-rich grasses, and agroforestry waste. The market for bio-based products is expected to increase to €50 million by 2030 (average annual growth rate of 4%) [13].

Figure: Schematic diagram showing the differences between lignocellulosic feedstocks from the first and second generations: sources, valorization processes, and end products. Reproduced with permission from [13], Elsevier, 2019.
According to a recently reported market study, an annual growth rate of 2% is predicted for the global lignin market until 2023, resulting in an increase of the total market size from €800,500,000 in 2017 to €904,500,000 in 2023 [18,19]. Among the most interesting products generated from lignocellulosic biomasses are biofuel and bioethanol. Here, we focus on the isolation and application of lignins obtained from LCF biomasses. Lignin is mainly studied as a polyol substitute for polyurethane and resin production, but also as an electrode material for sustainable electrochemical energy storage [20].
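As a plausibility check of the quoted figures, the constant yearly growth rate implied by the 2017 and 2023 market sizes can be computed directly. The following is a minimal sketch; the function name is ours and only the two market sizes and years come from the text.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate linking start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes in million euros, as quoted in the text for 2017 and 2023.
rate = compound_annual_growth_rate(800.5, 904.5, 2023 - 2017)
print(f"Implied annual growth rate: {rate:.2%}")
```

The implied rate is close to the 2% per year stated in the market study, so the two market sizes and the growth rate are mutually consistent.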
Lignocellulosic biomasses are rather resistant to enzymatic and chemical hydrolysis and therefore require harsh reaction conditions (i.e., strong acids or bases). LCF pretreatment and pulping result in the separation of cellulose/hemicellulose and lignin. Depending on the pulping process, the macromolecular lignin is partially degraded. In their review articles, Rinaldi et al. and Schutyser et al. discussed lignin depolymerization strategies (catalyzed reductive and oxidative cleavage, respectively) and the corresponding mechanisms for producing lignin oligomeric fragments, such as phenol derivatives, to be used for further polymerization [21,22].
In general, the detailed 3D lignin structure (monolignol ratio and linkages) depends on a number of different parameters: the biomass source and crop genotype/phenotype, due to different biosynthesis pathways (i.e., soft and hard wood, grasses), and the pulping process (e.g., kraft, steam explosion, organosolv). Figure 3 shows the most common lignin linkages formed during biosynthesis, some of which have been elucidated within the last five years [23,24]. Table 1 shows average values for monolignol linkages found for hard/soft wood and grasses [25][26][27]. These structural differences are rather difficult to specify by conventional analytical methods using univariate data processing, due to signal overlap in the spectral data.
Hirayama et al. studied the ratio of biphenyl fragments (5-5' linkages) of different biomasses (six softwoods and 15 hardwoods) [28].

Table 1. Abundance of linkages in lignins of soft and hard wood and Miscanthus grasses, including KOH-extractable and non-KOH-extractable, in percentages. Reprinted from [27] under open access license.

Lignin-derived materials under investigation include novel bio-based polymers, such as polyurethanes [29][30][31][32][33][34], as coatings and/or foams for construction applications. In addition, the bioactivity of lignins is widely studied, including antioxidant, antiviral, and antimicrobial activity [35][36][37][38]. In order to obtain valuable oligomer fragments, the macromolecular lignin structure is depolymerized using various strategies, including oxidative and reductive depolymerization via homo- and heterogeneous catalysis, ozonolysis, and photolysis [21,22,39,40]. Very recently, Renders and colleagues reported the concept of a so-called "lignin-first biorefinery", which is based on a reductive catalytic fractionation (RCF) of lignocellulose biomass. The RCF procedure combines a lignin catalytic depolymerization with fractionation of the degraded low molecular weight lignin.

By 2023, the lignin market volume is expected to increase up to 18 million tons and US$6.0 billion [18,19]. In particular, the kraft lignin market volume will increase up to 125 kilo tons by 2021 and more than US$5 billion. For example, in North America the lignin market is dominated by lignosulfonates used as concrete and cement flow improvers. Europe is the second largest market for lignin. Unlike in North America, the focus there is on lignin-based materials (end-use industry). The lignin market is segmented on the basis of product type, application type, and geographical analysis. By product type, this market is segmented into lignosulfonates, kraft lignin, organosolv lignin, and high purity lignin.
Today, lignocellulose-rich biomasses, including agrochemical waste, are processed all over the world in commercial mills, demonstration plants, and pilot scale facilities, to produce pulp, paper, lignin, and various LCF-derived chemicals (Table 2) [42][43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59][60][61].

Reported Techno-Economic Analysis Studies
A number of techno-economic analysis studies have been reported that include information about the economic value and environmental impact of single LCF products, such as bioethanol. For example, in 2019 Da Silva et al. published an assessment of different LCF pretreatment processes for bioethanol production. Taking into account five different pretreatment procedures of lignocellulosic biomass, the authors found that diluted acid is the best choice for bioethanol production, with an economic value of $39.2 million per year and an environmental impact of 83.9 kt CO2 per year [62]. Patel et al. recently reported a comparative assessment quantifying the production cost of biodiesel from agricultural waste [63]. Also in 2019, Albashabsheh et al. published their study on "mobile pelleting", a procedure applied to improve and optimize lignocellulosic biomass-to-biofuel supply chains. In particular, the authors investigated mobile pelleting machines (MPM) to minimize logistical costs and to find out at which point mobile densification becomes economically attractive. To this end, they included about 20 different input parameters, such as the type and price of biomass, densification and transport costs, storage capacity, and number of MPMs available [64]. A similar approach was reported by Srivastava et al. in 2019 to analyze costs for biofuel production [65].
In her PhD thesis, Karkee investigated the optimization and cost analysis of LCF supply chains. Considering corn stover as a by-product of grain production, the gate price of the biomass feedstock varies from $75 Mg⁻¹ to $97 Mg⁻¹ (depending on different factors, such as farm size, transport distance, and stover yield) [66]. Furthermore, the costs for harvesting and transport have been determined for different feedstocks (i.e., switchgrass). Quantification models were used which considered the number of machines, farm size, and biomass yields. Zhao et al. reported a techno-economic analysis of bioethanol production for the Chinese market. In particular, pretreatment using dilute acids and enzymatic hydrolysis were studied for corn stover biomass. Using two different models, the authors calculated the plant-gate price for bioethanol and reported it to be $4.68-$6.05/gal following a biochemical conversion pathway. Thus, at this price point, ethanol from lignocellulose biomasses is still unable to compete with ethanol from fossil resources [67].
In their techno-economic analysis study reported in 2011, Gnansounou et al. comprehensively reviewed data for ethanol production from lignocellulosic feedstocks. Using three different types of cost management system, they identified and quantified key parameters influencing the production costs, such as the type and composition of the feedstock and its farm-gate price, the conversion efficiency, the ethanol plant size, and the extent of investment costs. The most significant contribution to the overall lignocellulosic bioethanol production costs is the biomass cost [68].

C4 Grasses: Miscanthus
According to the European Common Agricultural Policy regulations, there are three so-called "greening measures", including maintenance of permanent pastures, crop diversification, and ecological focus areas (EFA) [69]. Thus, 5% of the land has to be specified as EFA by European farmers. Very recently, Miscanthus (analogous to other perennial crops, such as short rotation coppice) was listed as an eligible EFA crop. Miscanthus genotypes combine different advantages, such as biodiversity and a significant greenhouse gas emission reduction [70][71][72][73][74]. In 2019, Clifton-Brown et al. reported a detailed study of the breeding progress of various lignocellulose-rich biomasses, including switchgrass, Miscanthus, willow, and poplar crops [75,76].
Bergs et al. studied both the crop composition and detailed chemical structure of the corresponding Miscanthus-derived lignins. In detail, harvest yields of six different Miscanthus genotypes have been studied and compared for the years 2015 and 2016 [26,27]. Here, M. nagara showed the highest yields compared to various M. x giganteus samples, with M. robustus and M. sinensis showing the lowest yields of all genotypes.
Miscanthus crops belong to the group of perennial C4 plants. Unlike C3 plants, which produce D-3-phosphoglycerate, C4 plants generate oxaloacetate, which is correlated with a significant effect on carbon sequestration [77][78][79]. Due to a rather low level of required water and fertilizer they are called low-input crops [80][81][82]. Figure 4 shows fields with different Miscanthus genotypes, cultivated at the Campus Klein-Altendorf in Rheinbach, Germany. Miscanthus crops are rather tall (up to four meters), yielding up to 25 t/ha. The advantages of perennial plants in general are rather low production costs, due to less tillage [83][84][85][86]. Kraska et al. recently reported the cascade utilization of Miscanthus, including exploitation of the stalks and fibers, as well as the leaves [87,88]. Other research groups reported the utilization of Miscanthus crops for the production of bioethanol [89], hydrogen [90], and other chemicals, including polymers and composites [91][92][93][94][95]. Although there is a huge number of published studies, very few systematic studies are available about Miscanthus-derived lignins [96][97][98][99][100][101][102][103]. Van der Weijde determined the cell wall composition of eight different M. sinensis samples [104]. Various authors reported the enzymatic depolymerization of Miscanthus-derived lignins, such as Baker, Ion, and Sonnenberg [105][106][107]. However, all of these studies exclusively focused on crop composition analysis (lignin ratio and distribution), but no details were reported regarding the detailed lignin structure.

Fast Growing Trees: Paulownia, Eucalyptus, and Pinus
Due to recent efforts in biorefinery development, fast-growing trees are attracting more and more attention as industrial crops. Besides bamboo, poplar, Eastern cottonwood, giant sequoia, and acacia (not discussed here), Eucalyptus, pine, and Paulownia belong to the fast-growing lignocellulose-rich crops that are currently under investigation as potential feedstock for second-generation biorefineries. Compared to conventional trees, the growing cycles (silviculture rotations) of fast-growing trees are below 15 years, thereby offering opportunities for environmental and/or genetic manipulation [108].
One prominent example is the fast-growing Paulownia tree, originally cultivated in Asia, mainly in China and other tropical and sub-tropical regions, and characterized by a low demand for water. Paulownia trees grow quickly, reaching 10 to 20 m in height and 30-40 cm in diameter in less than ten years. Ye and colleagues reported a study on Paulownia tomentosa, a genotype that reaches 30-40 cm in diameter within five years (Figure 5) [109]. Paulownia samples were cultivated in the Shanxi province in China. The authors used enzymatic hydrolysis for biomass pulping, resulting in a ratio of about 42% cellulose, 20% hemicellulose, and 20% lignin [110,111]. Prior to enzymatic hydrolysis, various pre-treatment methods had been investigated (i.e., using dilute acid, alkali, and alkali supported by ultrasonic pretreatment, with the last one being the most efficient method).
Ashori and colleagues studied Iranian-cultivated Paulownia fortunei L. fibers, with a specific focus on their chemical and morphological characteristics. Results showed that Iranian Paulownia fortunei L. consisted of holocellulose, alpha-cellulose (about 52%), lignin (about 25%), and further extractives (about 15%, isolated from basic media). In addition, the authors determined fiber characteristics (i.e., length, width, cell wall thickness). Of special interest and a focus of scientific investigations is the fibrous parenchyma, a promising raw material for paper of high density, due to the material having a high tensile strength [112].
Zahedi et al. studied filler additives used to reinforce bulk polypropylene (PP). The studied samples included canola, Paulownia, and nanoclay fillers in varying concentrations (3 and 5 wt%). Compared to canola and nanoclay fillers, Paulownia particles significantly improved the mechanical properties of the studied composites. Transmission electron microscopy and X-ray diffraction were used to specify the final polymer morphology and filler dispersion within the polymer matrix [113].
Besides Paulownia, Eucalyptus and pine are further examples of fast-growing trees. Pertuzzatti et al. recently reported a study on how process parameters influence the thermomechanical densification of two different crops: Eucalyptus grandis and Pinus elliottii [114]. Samples of both woods showed comparable densities and mechanical strength. The significant differences observed resulted from differences in crop composition. Thus, the Eucalyptus hemicellulose (in contrast to pine) mainly consists of xylose with a higher degree of acetylation, which is more susceptible to degradation. Nevertheless, Eucalyptus samples showed densities close to 1.0 g·cm⁻³ and improved mechanical properties (i.e., bending, hardness, impact resistance) after pre-treatment.

Cup Plants: Silphium Perfoliatum
Unlike Miscanthus, Silphium perfoliatum L. belongs to the class of perennial C3 plants, with characteristic yellow flowers (Figure 6). Originally, it was cultivated in North America and then brought to Europe in the 18th century. Currently, Silphium crops are established and distributed all over the world, including North and South America (Chile, USA), Asia (China, Japan), and Europe (France, Switzerland, Romania, Czech Republic, Germany, Hungary, Poland, Austria, Russia), with the plants mainly being investigated as a raw material for biogas, biofuel, and chemical production. The advantages of these plants as a raw material are the low maintenance requirements, optimal growth (even in arid conditions), and high yields (Figure 6) [115][116][117]. Silphium crops are discussed as promising candidates for biogas production. According to Gansberger et al., the annual harvest yield can reach about 10 to 15 t per ha. Compared to maize, the biomethane production is 20% lower. However, so far there is only a very limited number of studies, and many questions remain to be answered regarding the potential of these plants as lignocellulose feedstock. Thus, a seed technology must be developed, pathogen susceptibility has to be checked, and a suitable herbicide for weed management during the first cultivation year is most probably required [118].
The Lithuanian Research Centre for Agriculture and Forestry in Western Lithuania performed a field study reported by Šiaudinis and colleagues; the authors cultivated various perennial coarse-stemmed herbaceous energy plants, including mugwort (Artemisia vulgaris L.) and cup plant (Silphium perfoliatum L.). For their field trial, the authors used a two-factor design, including three levels of liming (not limed versus limed, using CaCO3 in different concentrations) and nitrogen as the fertilizer in varying concentrations, to study the influence of these parameters on the cup plant dry matter productivity. Results showed that both fertilizer and lime significantly influenced (decreased) the energy output and energy use efficiency [119]. So far, Silphium perfoliatum L. has been studied in detail regarding its utilization as an additive for food and pharmaceuticals and as a raw material for bioenergy and biofuel production [119,120].
In another study, Klímek and colleagues investigated the exploitation of agricultural crop residues as renewable sources for particleboard production. The following samples were studied: cup plant (Silphium perfoliatum L.), sunflower (Helianthus annuus L.), and topinambour (Helianthus tuberosus L.). Particleboards of 600 kg/m³ density were produced using different adhesives (methylene diphenyl diisocyanate, urea formaldehyde resin). Various physical and mechanical properties of the final boards were measured, including modulus of rupture, thickness swelling, and water absorption. Based on the obtained data, the authors concluded that agricultural crop residues can be used for particleboard and furniture production, meeting European standard EN 312 class P1 [121].
Papadopoulos et al. studied the exploitation of sunflower stalks as an alternative raw material for particleboards. As a pretreatment method, acetylation was conducted to improve the thickness swelling (TS) behavior of the boards. Thus, up to 19.7% weight gain could be obtained. Unfortunately, the introduction of acetyl functionalities resulted in a decrease in the internal bond strength. The authors concluded that a mixture of industrial wood chips and sunflower stalks might be appropriate to improve the particleboard specifications [122].

LCF Structure Analysis and Quality Control

Spectroscopic Data Processing Using Chemometric Methods for Biomass Analysis
Modern literature on the use of machine learning methods in chemical analysis (chemometrics) is, in general, quite extensive and diverse. In recent years, a large number of reviews have been published on individual methods and analyzed objects [123][124][125][126][127]. However, the number of studies using chemometric methods, against the background of the total number of analytical works, is still extremely small. Furthermore, even less work has been done that utilizes chemometrics for studying LCF. Iqbal and Lewandowski investigated the inter-annual variation in biomass yield and composition in a multi-genotype trial planted in southern Germany, focusing on climatic conditions (i.e., rainfall, temperature) and harvest dates [128]. Chemometric methods, such as multivariate regression analysis, were used to study correlations between harvesting time and rainfall. Boeriu et al. combined Fourier-transform infrared spectroscopy (FTIR) and principal component analysis (PCA) for the classification of the botanical origin of lignins [129]. Regression models (e.g., partial least squares, PLS) resulted in the accurate determination of phenolic hydroxyl groups, which could then be correlated to antioxidant capacity. Chen et al. used multivariate methods to process their experimental FTIR data obtained for various wood samples [130]. Results showed root-mean-square errors for all three LCF components, lignin, cellulose, and hemicellulose, of 1.51%, 0.96%, and 0.62%, respectively. Very recently, Lancefield et al. reported a study on lignin 3D structure analysis using attenuated total reflection (ATR)-FTIR analysis combined with PCA and PLS modeling. In addition, the obtained quantitative results were comparable to gel-permeation chromatography (GPC) and 2D heteronuclear single quantum coherence (HSQC) nuclear magnetic resonance (NMR) methods [131].
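The combination of spectra and PCA described above can be illustrated with a small numerical sketch. It assumes NumPy is available and uses purely synthetic spectra: the band positions, class ratios, and noise level are hypothetical and are not taken from any of the cited studies. PCA is computed here by mean-centering followed by a singular value decomposition, which is one common way to obtain scores and explained variance.

```python
import numpy as np

rng = np.random.default_rng(0)
wavenumbers = np.linspace(800.0, 1800.0, 200)  # cm^-1 axis (illustrative)


def band(center, width=25.0):
    """Gaussian absorption band on the shared wavenumber axis."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


def simulate(band_ratio, n_samples):
    """Noisy spectra of one hypothetical lignin class.

    The classes differ only in the relative intensity of two bands;
    positions and ratios are chosen for illustration only.
    """
    base = band(1510.0) + band_ratio * band(1270.0)
    noise = rng.normal(scale=0.02, size=(n_samples, wavenumbers.size))
    return base + noise


spectra = np.vstack([simulate(0.4, 10), simulate(1.0, 10)])
labels = np.array([0] * 10 + [1] * 10)

# PCA: mean-center the spectra, then decompose with SVD.
centered = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S                       # sample coordinates on the PCs
explained = S**2 / np.sum(S**2)      # fraction of variance per PC

pc1_class0 = scores[labels == 0, 0]
pc1_class1 = scores[labels == 1, 0]
print(f"PC1 explains {explained[0]:.0%} of the variance")
```

In this toy setting the two classes fall on opposite sides of the first principal component, which is the kind of grouping exploited when PCA is used to classify lignins by botanical origin. Real FTIR data would additionally need baseline correction and normalization before such a decomposition.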
Thus, only classical chemometric methods have been used, predominantly for the modeling of FTIR data, leaving open many interesting topics for research. For example, nothing is known about the application of calibration transfer methods in LCF analysis, or the application of novel algorithms, such as independent component analysis (ICA), to improve existing chemometric models. The same applies to the complementary vibrational Raman spectroscopy, which gives important insights into a polymer's structure and its characteristics. Raman data also require multivariate methods for interpretation, because the overlapping polymer peaks cannot be resolved without machine learning techniques.
Moreover, despite the obvious interest in multivariate modeling shown by some groups, there is no uniform methodology for applying machine learning methods in the analytical chemistry of LCF. It is also clear, however, that given the current level of automation, the amount of measured information, and throughput of analytical equipment, chemometrics should become an integral part of the analytical chemistry of natural polymers such as LCF.
The implementation of chemometrics can be helpful in different aspects of polymer analytical science. For example, up to now the determination of the molecular weight (MW), corresponding distribution (MWD), and polydispersity (PD) of natural macromolecular structures is usually performed via GPC (gel permeation chromatography) or SEC (size exclusion chromatography) using polystyrene (PS) or polymethyl methacrylate (PMMA) standards. Due to the complex and unique 3D structure of natural polymers (particularly lignin), the hydrodynamic volume usually differs significantly between standards and analytes [132]. Therefore, universal calibration or additional methods (i.e., osmometry, light scattering) have to be used in order to determine MW and polydispersity.
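For reference, the quantities named above follow from standard definitions: with detector heights h_i proportional to the mass concentration in elution slice i and calibrated molar masses M_i, the number average is Mn = Σh_i / Σ(h_i/M_i), the weight average is Mw = Σ(h_i·M_i) / Σh_i, and the polydispersity is Mw/Mn. The sketch below applies these formulas to hypothetical slice data; the function name and the numbers are ours, and the molar masses are assumed to come from whichever calibration is used.

```python
def molecular_weight_averages(molar_masses, detector_heights):
    """Return (Mn, Mw, Mw/Mn) from GPC/SEC elution slices.

    detector_heights are assumed proportional to the mass concentration
    in each slice (e.g., a refractive-index trace); molar_masses are the
    calibrated molar masses assigned to the same slices.
    """
    if len(molar_masses) != len(detector_heights):
        raise ValueError("inputs must have the same length")
    total = sum(detector_heights)
    mn = total / sum(h / m for m, h in zip(molar_masses, detector_heights))
    mw = sum(h * m for m, h in zip(molar_masses, detector_heights)) / total
    return mn, mw, mw / mn


# Hypothetical three-slice chromatogram (g/mol and arbitrary height units).
masses = [1000.0, 2000.0, 4000.0]
heights = [1.0, 2.0, 1.0]
mn, mw, dispersity = molecular_weight_averages(masses, heights)
print(f"Mn = {mn:.0f} g/mol, Mw = {mw:.0f} g/mol, PD = {dispersity:.2f}")
```

Because the M_i enter every term, any mismatch in hydrodynamic volume between the PS or PMMA standards and the lignin analyte propagates directly into Mn, Mw, and PD, which is why the choice of calibration matters.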
In general, experimental measurements can be replaced by multivariate models based on the modeling of spectroscopic data that possesses information about the molecular weight distribution of polymers (e.g., diffusion-ordered nuclear magnetic resonance, DOSY NMR). Other unexplored tasks include the evaluation of polymer linkages by using 2D NMR spectroscopy (HSQC, and heteronuclear multiple bond correlation, HMBC) and chemometrics, determination of the hydroxyl number, and total phenolic content, by spectroscopic techniques and others. Theoretical modeling can provide additional insights into the structure of lignin building blocks. Concerning existing instrumental techniques, no single analytical technique has been more comprehensively employed for the evaluation of LCF structure than NMR [21,[23][24][25]. Yet, there is no example of multivariate techniques for resolving overlapping peaks in 1D and 2D NMR profiling of LCF, or multivariate modeling of specific ³¹P and ¹³C NMR profiles. Doing so will bring additional important insights into the polymer structure, and enable the construction of multivariate models for the determination of important LCF qualitative characteristics, such as crop genotype/phenotype and geographical origin.
X-ray fluorescence analysis (XRF) is a rarely used analytical tool for LCF, although it is an attractive method for performing inorganic elemental analysis [133]. Even if LCF is mainly composed from organic matter and light elements that cannot be detected directly with XRF, an application of chemometric techniques to the scattering XRF profile may provide valuable information on integral LCF parameters. In our ongoing research, XRF (in addition to spectroscopic methods) is used for quantitative biomass analysis, with respect to heavier elements that can be a marker of certain features and in combination with machine learning methods for ascertaining the type and origin of LCF. In Table 3, a variety of studies reporting the structure and composition analysis of LCF, using experimental analytical methods combined with multivariate data processing, are summarized [134][135][136][137][138][139][140][141][142][143].

Table 3. Studies on LCF structure and composition analysis using analytical methods combined with multivariate data processing.

| Aim of analysis | Analytical method | Chemometric processing and findings | Ref. |
|---|---|---|---|
| Quantitative visualization of lignocellulose components. | FTIR macro- and micro-spectroscopy | Partial least-squares regression (PLSR) and a Monte Carlo sampling method (MSM) were used to establish the quantitative determination model of lignocelluloses. | [136] |
| Crop content prediction of hemicellulose, cellulose (sugar content) and lignin. | 1D and 2D NMR spectroscopy (i.e., ¹H-¹H TOCSY, ¹H-¹³C HSQC, ¹H-¹³C HMBC) | The experimental NMR data were processed using the PLS-DA model. The prediction of hemicellulose showed errors up to 22%, while for the other two components the errors are in all the cases below 1%. Discriminant buckets from a PLS-DA model combined with linear models provided a useful and rapid tool for the determination of cell wall composition. | [139] |
| Prediction of different chemical-physical parameters of woodchip and pellet samples, such as moisture content, net calorific value, ash content and gross calorific value of woodchip samples. | Vis-NIR spectroscopy with and without sample pre-treatment (i.e., grinding or stabilization at 40 °C for 24 h) | Visible NIR data were processed using partial least square regression to predict various chemical-physical parameters of wood-chips and pellets correlated to biofuel quality. Best results were obtained considering only the near IR region. | |
| Linkage abundance and molecular weight characteristics of technical lignins | Attenuated Total Reflection-FTIR, gel-permeation chromatography (GPC) and nuclear magnetic resonance (NMR) for structure analysis of technical lignins. | Principal component analysis and partial least square modelling (using PLS_Toolbox v. 8.6, Eigenvector) in Matlab. Spectra were pre-processed using baseline correction, normalization and mean-centering. Results clearly showed similarities and deviations for the 54 lignins correlating to their botanic origin and pulping process (used for isolation). | [131] |

Chemometrics Used for Lignocellulose Feedstock Specification
Within the last five years, a tremendous number of LCF analysis studies have been reported, some of which include chemometric data processing (Table 3) [131,134]. For example, in 2014 Da Costa et al. reported an LCF cell-wall analysis study, including 25 Miscanthus genotypes of different developmental stages separated into stem and leaf portions. In detail, the authors combined mid-infrared spectroscopy with PCA in order to quantify the differences in cell-wall composition of stem and leaf-derived Miscanthus samples, which are in turn associated with different structural carbohydrates (Figure 7) [134].
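How PCA turns a set of infrared spectra into a score plot that separates sample groups can be sketched as follows. This is a minimal illustration on synthetic two-band spectra, not a reproduction of the cited study: the band positions, widths, intensities, noise level, and the function name `pca_scores` are all assumptions made for the example, and the "stem"/"leaf" labels merely mimic the kind of grouping discussed above.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Mean-center the spectra and project them onto the leading principal components."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the loadings; squared singular values scale with the variance.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]

# Synthetic stand-in for mid-IR spectra (arbitrary band positions and intensities).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(800, 1800, 200)
band_a = np.exp(-((wavenumbers - 1160) / 20) ** 2)
band_b = np.exp(-((wavenumbers - 1510) / 20) ** 2)
stem = 1.0 * band_a + 0.4 * band_b
leaf = 0.4 * band_a + 1.0 * band_b
spectra = np.vstack(
    [stem + 0.01 * rng.standard_normal(200) for _ in range(10)]
    + [leaf + 0.01 * rng.standard_normal(200) for _ in range(10)]
)

scores, loadings, explained = pca_scores(spectra)
print("explained variance ratio:", np.round(explained, 3))
print("PC1 scores, first group: ", np.round(scores[:10, 0], 2))
print("PC1 scores, second group:", np.round(scores[10:, 0], 2))
```

In this constructed case the two groups fall on opposite sides of the first principal component, and the PC1 loading shows which bands drive the separation. Real spectra usually need pre-processing before such a clear picture emerges.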

A further study on grasses [135] determined the ash, silicon, nitrogen, potassium, phosphorus, calcium, chloride, and sulfur contents, as well as the heating value of the grasses. Compared to switchgrass and reed canary grass, Miscanthus genotypes showed significantly lower ash contents (1.6% to 4.0%, compared to 1.9% to 10.5% and 11.5%, respectively).
Li and colleagues studied various moso bamboo samples, with regard to crop composition and ratio of cellulose versus hemicellulose and lignin, respectively. The samples (15 stalks of five ages) were collected from three different sites in China, including Jingning and Guangan counties, Sichuan Province. FTIR macro-and micro-spectroscopic imaging techniques, combined with chemometric processing (using partial least-squares regression (PLSR) and Monte Carlo sampling to identify abnormal data), have been applied for quantitative analysis of moso bamboo crop composition [136].
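The principle of PLS regression for predicting a component content from spectra can be sketched with the NIPALS algorithm for a single response (PLS1). The example below is an illustrative assumption throughout: the three overlapping "pure-component" bands, the random composition fractions, the noise level, and the function names are invented for the sketch and do not represent the models of the cited studies. The Monte Carlo outlier screening mentioned above is not included.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weight vector: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                    # scores of the current latent variable
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = (yr @ t) / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X and y before the next component
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for new spectra."""
    return (X - x_mean) @ coef + y_mean

# Synthetic mixtures of three overlapping bands (hypothetical pure spectra).
rng = np.random.default_rng(1)
axis = np.linspace(0.0, 1.0, 150)
pure = np.vstack([np.exp(-((axis - c) / 0.1) ** 2) for c in (0.3, 0.5, 0.7)])
fractions = rng.dirichlet([5.0, 3.0, 3.0], size=30)   # rows sum to one
spectra = fractions @ pure + 0.001 * rng.standard_normal((30, 150))
lignin = 100.0 * fractions[:, 2]                      # third component, in wt%

# The fractions sum to one, so two latent variables describe the mixtures.
coef, x_mean, y_mean = pls1_fit(spectra[:20], lignin[:20], n_components=2)
predicted = pls1_predict(spectra[20:], coef, x_mean, y_mean)
rmsep = float(np.sqrt(np.mean((predicted - lignin[20:]) ** 2)))
print(f"RMSEP on held-out samples = {rmsep:.3f} wt%")
```

In practice, the number of latent variables is chosen by cross-validation rather than fixed in advance, and established implementations are preferable to a hand-written routine.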
Uddin et al. investigated the cellulose and hemicellulose content (in particular alpha-cellulose and pentosan), as well as properties such as pulp viscosity, of dissolving jute pulp, using wet chemical analysis and various spectroscopic methods (i.e., FTIR, UV-Vis) combined with chemometric data processing. The authors were able to develop a fast and reliable procedure to quantify the abovementioned biomass parameters of dissolving pulp, with the help of simple and fast spectroscopic nondestructive methods combined with chemometric data processing [137].
Colares and colleagues used Raman spectral imaging to specify the ratio of cellulose and lignin in surfaces of various trees (i.e., Swietenia macrophylla King, Mahogany/Eucalyptus hybrid, E. urophylla × E. camaldulensis). They used a multivariate 'curve resolution' procedure to calculate the relative concentration maps and simulate the Raman spectra for cellulose and lignins (finding good correlations with literature data). For all samples, the lignin concentration varied between 20% and 45% for the Eucalyptus samples and some higher values for the Mahogany tree (depending on the local origin). The authors aimed to show that Raman image spectroscopy combined with chemometric data analysis (i.e., multivariate curve resolution-alternating least squares MCR-ALS) is an appropriate tool for final specification of the cellulose/lignin ratio in Mahogany and Eucalyptus hybrids [138].
Aguilera-Sáez et al. very recently reported a structural analysis to determine the ratio of cellulose, hemicellulose, and lignin in eight different greenhouse crop residues, namely Cucurbita pepo, Cucumis sativus, Solanum melongena, Solanum lycopersicum, Phaseolus vulgaris, Capsicum annuum, Citrullus vulgaris Schrad., and Cucumis melo, using chemometrics in NMR spectroscopy. In detail, the authors were able to specify correlations between metabolite profiles and cell wall composition using PLS-DA (partial least squares-discriminant analysis) and linear regression models (Figure 8) [139]. For reliability verification, composition analysis was also performed according to the National Renewable Energy Laboratory (NREL) procedure as a control experiment.
Woodchips and pellets of different plant species and origins have been studied by Mancini and colleagues in order to specify quantitative differences in their chemical composition. Methods used included wet chemical analysis and Vis-NIR (near infrared) spectroscopy, combined with chemometric data processing (i.e., PLS). The background for their study is the utilization of fast spectroscopy methods for biofuel combustion quality, i.e., moisture content, net calorific value, and ash, according to EN ISO 17225. Chemometric data processing of the near infrared region delivered the best results [140].
Christou and colleagues investigated carob samples to specify their origin, using FTIR spectroscopy. With the help of PCA data processing, the authors were able to determine distinct groups, which could be assigned to the carob crop origin (Cyprus, Greece, Italy, Spain, Turkey, Jordan, and Palestine). In addition, chemometric methods, such as cluster analysis (CA), PLS, and Orthogonal Partial Least Square-Discriminant Analysis (OPLSDA), were applied, resulting in 95% confidence for origin specification (Figure 9) [141].
Boeriu et al. studied the fractionation of different technical lignins using selective extraction in green solvents. Five samples were investigated, including two soda-derived lignins (wheat straw and a mixture consisting of Sarkanda grass/wheat from Greenvalue SA, Switzerland), one organosolv lignin (Alcell, obtained from a maple/birch/poplar hardwood mixture, Repap Technologies Inc., USA), a pine-derived kraft lignin "Indulin AT" (MeadWestvaco, USA), and a wheat straw-based lignin from a mild alkaline process (Technical University Dresden, Germany). The chemical composition was determined via ³¹P NMR and the corresponding data were processed using PCA, showing high heterogeneity (Figure 10) [142]. The different extraction procedures resulted in distinct deviations in the functional group content. Structural information regarding the ratio of p-hydroxyphenyl (H), guaiacyl (G), and syringyl (S) units and the aliphatic OH content was obtained from PLS models based on FTIR data.
Chen et al. investigated lignins of different origins using infrared spectroscopy to classify the botanical source. IR data were processed using PCA and partial least squares (PLS) regression to specify phenol-derived hydroxy groups, in order to draw correlation to the antioxidant activity of the lignins [130].
Combining FTIR and multivariate data processing, Boeriu and Gosselink et al. examined a number of carob samples from seven different Mediterranean countries, using the first derivatives of the FTIR spectra, resulting in a confidence level of up to 95%. The contents of lignin, cellulose, and hemicellulose were determined. To do so, the authors processed a broad variety of input parameters, including wood species, resulting in root-mean-square errors of less than 1.51% [129,142,143].
Because genotype and cultivation conditions significantly influence the 3D chemical structure of any crop, access to well-characterized plants is important. It should be emphasized that we have unique access to well-defined LCF raw materials: crops cultivated at Campus Klein-Altendorf, University of Bonn, one of the largest field labs for Miscanthus cultivation in Europe (more than 30 genotypes), and further special biomasses, e.g., Silphium perfoliatum and Paulownia. In addition, there is access to crops from specific harvesting seasons (i.e., September, December, April), specific years, and plant portions (leaf versus stem). Thus, in previous studies the correlation between crop genotype, harvesting time (year, season), plant portion, and lignin amount and 3D structure was investigated. For six different genotypes, the lignin content varies, as shown in Figure 11.
Based on this information, a decision can be made regarding the harvesting time (season) in order to obtain the highest yields. In general, Miscanthus cultivation has various advantages: as a C4 plant, Miscanthus initially fixes CO2 into a four-carbon (instead of a three-carbon) compound, resulting in an exceptional CO2 fixation rate and high photosynthesis yields. Thus, Miscanthus crops are intensively studied for industrial exploitation, including lignin generation. Recently, we reported a systematic study showing strong correlations of the lignin structure with the Miscanthus genotype and plant portion (stem versus leaf) [26]. In detail, for lignins isolated via non-catalyzed organosolv pulping, the amount and linkages of the three monolignol building blocks (G, H, and S) were studied with different analytical methods (i.e., NREL protocol, FTIR, UV-Vis, HSQC-NMR, thermal gravimetric analysis (TGA), and pyrolysis gas chromatography/mass spectrometry (GC/MS)). The FTIR data were processed using chemometric methods (i.e., principal component analysis). A comparison of beech wood and Miscanthus lignins showed that the Miscanthus-derived lignins had lower molecular weights and narrower polydispersities (<1.5, compared to >2.5 for beech), most probably due to an increased homogeneity.
The nature and ratio of different monolignol linkages has been studied in detail using heteronuclear single quantum coherence (HSQC) 2D-NMR. Results showed that leaves contain two-thirds of the G units, whereas in stems and mixtures the G content is rather low. Compared to G, H and S units were found to be highest in samples containing leaf and stem mixtures. Figure 12 shows the calculated ratio of the most abundant β-arylether (A) linkages (55-65%), followed by phenyl coumarane (B) and resinol (C) linkages. Stem-derived lignins mainly contain unsaturated esters (D) (ca. 30%). The concentration of residual carbohydrates was below the detection threshold, indicating the high purity of organosolv-derived lignins.
Figure 11. Lignin content of six Miscanthus genotypes (Gig17, Gig34, Gig35; M. nagara NagG10; M. sinensis Sin2; M. robustus Rob4), harvested in September, December, and April, respectively (arranged to follow the seasonal order from autumn to spring). Reprinted from [27] under open access license.
PCA processing of FTIR data from lignin samples was performed to determine the structural differences of lignins obtained from different Miscanthus x giganteus plant portions (stems, leaves, and their mixtures). Figure 13 shows the projections of the IR spectra of the lignin samples on the first three principal components (82% of variance). In particular, a differentiation of stem versus leaf-derived lignins was possible, since the aromatic in-plane deformation signals at 1160 cm⁻¹ correspond to the monolignol substitution pattern (Figure 13) [27].
Lancefield et al. very recently reported a study including 54 lignin samples differing in origin and fractionation process. ATR-FTIR and NMR spectroscopy were used for structural analysis. The molecular weight and polydispersity were determined via gel permeation chromatography. All experimental data were processed using chemometric methods, i.e., PCA and PLS. Thus, molecular weight (number-average, Mn, and weight-average, Mw), as well as specific linkages (such as β-O-4, β-5, β-β′), were studied using PLS data processing of ATR-FTIR, GPC, and NMR data, resulting in calibration coefficients of determination of R² > 0.85. Via PCA, soft and hard wood-derived lignins can be separated. Lancefield and colleagues then used the first derivative spectra, resulting in significantly improved resolution and sample separation (Figure 14) [131].
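The pre-processing steps reported for the technical-lignin study (baseline correction, normalization, mean-centering) and the use of first-derivative spectra can be illustrated with a deliberately simple sketch. The concrete choices here are assumptions, since the exact algorithms of the cited work are not given in this review: the baseline is treated as a constant offset, normalization is to unit vector length, and the derivative is a central finite difference. Smoothing derivatives such as Savitzky-Golay filters are a common alternative for noisy spectra.

```python
def preprocess(spectra):
    """Offset-correct, vector-normalize, and mean-center a list of spectra."""
    corrected = []
    for spectrum in spectra:
        baseline = min(spectrum)                  # simplest baseline: constant offset
        shifted = [v - baseline for v in spectrum]
        norm = sum(v * v for v in shifted) ** 0.5
        if norm == 0.0:
            raise ValueError("flat spectrum cannot be normalized")
        corrected.append([v / norm for v in shifted])
    n = len(corrected)
    mean = [sum(column) / n for column in zip(*corrected)]
    return [[v - m for v, m in zip(row, mean)] for row in corrected]

def first_derivative(spectrum, step=1.0):
    """Central finite-difference first derivative (the two end points are dropped)."""
    return [
        (spectrum[i + 1] - spectrum[i - 1]) / (2.0 * step)
        for i in range(1, len(spectrum) - 1)
    ]

# Two hypothetical spectra that differ only by an offset and a scale factor.
raw = [[1.0, 2.0, 4.0, 2.0, 1.0], [12.0, 14.0, 18.0, 14.0, 12.0]]
print(preprocess(raw))                                 # both rows collapse to ~0
print(first_derivative([0.0, 1.0, 4.0, 9.0, 16.0]))
```

The example shows why such steps matter before PCA or PLS: offset and intensity differences that stem from sample presentation rather than chemistry are removed, so that the remaining variance reflects band shape.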

Future Aspects Using Chemometrics for LCF Quality Control
In many cases, a strong overlap of spectral bands, even in two-dimensional experimental data, hampers classical data interpretation. This situation leads to the application of alternative chemometric methods for signal modeling. Here, methods such as PLS, ridge regression, stepwise regression with variable selection, principal component regression, and independent component analysis are appropriate tools for chemometric modeling of experimental data for the determination of quantitative characteristics of natural polymers, as these methods have proven their versatility and effectiveness for complex samples [144,145]. Discriminant analysis algorithms, such as linear discriminant analysis (LDA), factorial discriminant analysis (FDA), and partial least squares-discriminant analysis (PLS-DA), are aimed at the construction of linear discriminant functions that maximize interclass dispersion and minimize intraclass variance by applying generalized decomposition. This arsenal of approaches is expected to be used for determining qualitative characteristics of biopolymers, such as their botanical origin. In addition, confusion matrices that compare information on the actual and predicted assignments of the samples for each particular group will be constructed to study the predictive ability of models. Common component and specific weights analysis (CCSWA) can be applied to simultaneously analyze different spectroscopic data sets (i.e., for processing of NMR, IR, Raman, XRF spectroscopy data) [146]. The possibility of transferring chemometric models between different spectrometers will be evaluated by calibration transfer methods, such as direct standardization (DS) and piece-wise direct standardization (PDS) [147]. Besides MATLAB packages, Python 3 can be explored for constructing multivariate models.
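As an indication of how the predictive ability of a discriminant model could be summarized in Python 3, the following sketch builds a confusion matrix from actual and predicted class assignments. The class labels and assignments are hypothetical and serve only to show the bookkeeping; no classifier is trained here.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows: actual class; columns: predicted class (both in the order of `labels`)."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e., correctly assigned."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical botanical-origin assignments for eight samples.
labels = ["Miscanthus", "Paulownia", "beech"]
actual = ["Miscanthus", "Miscanthus", "Miscanthus", "Paulownia",
          "Paulownia", "Paulownia", "beech", "beech"]
predicted = ["Miscanthus", "Miscanthus", "Paulownia", "Paulownia",
             "Paulownia", "Paulownia", "beech", "Miscanthus"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10}: {row}")
print(f"accuracy = {accuracy(matrix):.2f}")
```

Off-diagonal entries show which origins are confused with each other, which is often more informative for feedstock quality control than the overall accuracy alone.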
Funding: This research was funded by BMBF program "IngenieurNachwuchs" projects "LignoBau" (03FH013IX4) and EFRE Infrastrukturförderung "Biobasierte Produkte" (EFRE0500035).