A Predictive Model for Maceral Discrimination by Means of Raman Spectra on Dispersed Organic Matter: A Case Study from the Carpathian Fold-and-Thrust Belt (Ukraine)

Abstract: In this study, we propose a predictive model for maceral discrimination based on Raman spectroscopic analyses of dispersed organic matter. Raman micro-spectroscopy was coupled with optical and Rock-Eval pyrolysis analyses on a set of seven samples collected from Mesozoic and Cenozoic successions of the Outer sector of the Carpathian fold-and-thrust belt. Organic petrography and Rock-Eval pyrolysis indicate a type II/III kerogen with complex organofacies composed of macerals of the vitrinite, inertinite, and liptinite groups, while thermal maturity lies at the onset of the oil window, spanning between 0.42 and 0.61 Ro%. Micro-Raman analyses were performed on approximately 30–100 spectra per sample, but an optical classification according to maceral group was possible for relatively few fragments. A multivariate statistical analysis of the identified vitrinite and inertinite spectra makes it possible to define the variability of the organofacies and to develop a predictive PLS-DA model for the identification of vitrinite from Raman spectra. Following the first attempts made in recent years, this work outlines how machine learning techniques have become a useful support for classical petrographic analyses in thermal maturity assessment.

Most of the works dealing with Raman spectra of kerogen use a classic band fitting approach that in many cases can produce biased results, as recently outlined [36,40]. Thus, different automatic approaches have been proposed for spectral analysis, based on simplified band deconvolution [31,38,41] or on a multivariate chemometric scheme [34,36], offering a challenging approach for geological studies. In particular, a multivariate analysis based on principal component analysis (PCA) and partial least squares (PLS) has recently been proposed [34,36] to correlate predictive parameters from PLS regression against vitrinite reflectance. Moreover, Schito et al. [32] demonstrated how a partial least squares discriminant analysis (PLS-DA) makes it possible to define and predict the differences between vitrinite and sporomorphs on the basis of their Raman spectra for a given degree of thermal maturity.
In this work, we test a similar approach on a set of seven samples collected in the Outer sector of the Carpathian Fold and Thrust belt (Ukraine), in order to define quantitatively the differences in Raman spectra between vitrinite and inertinite group macerals and predict them with the aim to provide a more robust dataset for thermal maturity assessment.

Materials
Samples were collected from black shales at various stratigraphic intervals (Early Cretaceous to Early Miocene) cropping out in the Outer Carpathians, in SW Ukraine, close to the Romanian border (Figure 1). This portion of the fold-and-thrust belt shows an imbricate fan architecture made up of a series of NE-verging thrust sheets [42] in which three tectonic units are recognized (Boryslav-Pokuttia tectonic Unit, Skiba tectonic Unit and Chornogora tectonic Unit, Figure 1). In detail, five samples (PL 93.1, PL 93.2, PL 95, PL 97, PL 101.1) come from organic-rich levels of the Melinite shales (Oligocene to Lower Miocene) cropping out in the Skiba Unit. Sample PL 102 was collected in a pelitic bed of the Eocene-Oligocene succession of the Globigerina Marls in the Skiba Unit, whereas sample PL 103 comes from an organic-rich level of the Hauterivian-Albian Spas-Shypot formation in the Chornogora Unit (Figure 1).

Rock-Eval Pyrolysis
Rock-Eval pyrolysis is a traditional quantitative method for kerogen characterization, based on the relative intensity and distribution of three fluid peaks (S1, S2, S3), artificially generated at different lab temperatures from a whole rock specimen containing kerogen. Free hydrocarbons (S1) in the rock, the amount of hydrocarbons (S2) expelled during pyrolysis, and TOC were measured using a Rock-Eval 6 instrument at the ENI laboratories [44].

Vitrinite Reflectance Analysis
Samples were prepared for petrographic analysis via HCl-HF digestion to remove carbonates and silicates [45]. The concentrated organic residue was then prepared as polished blocks following ASTM standards [46]. The vitrinite mean random reflectance was measured at the University of Porto (Portugal) on a reflected light Leitz microscope coupled to a Diskus-Fossil System, following ASTM standard D7708-14 [47].

Raman Spectroscopy
Micro-Raman spectroscopic analyses were performed on organic particles, some of them identified as vitrinite or inertinite, observed under reflected light on polished sections. Micro-Raman spectroscopy was carried out at the University of Porto (Portugal) using a Horiba Jobin-Yvon LabRam XploRA™ system in a backscattering geometry, in the range of 700-2300 cm⁻¹, using a 1200 grooves/mm grating and a CCD detector. The instrument is equipped with 50× and 100× objective lenses and a 532 nm excitation wavelength from a Nd:YAG laser with a power of 25 mW. To avoid laser-induced degradation of the kerogen and to minimize the fluorescence background, the laser power was reduced to below 0.4 mW using optical filters, and the Raman backscattering was recorded with an integration time of 20 s for 6 repetitions. Each organic particle was analysed with a spot about 1 µm in diameter using the 50× objective, and about 30-100 measurements per sample were performed, depending on the abundance of organic particles.
Raman spectra of organic matter appear in the first order region between 1000 cm⁻¹ and 1800 cm⁻¹, whereas bands in the second order region, between 2000 cm⁻¹ and 3500 cm⁻¹, were not detected, since they are weak in low maturity organic matter and can only be detected using shorter Raman excitation wavelengths (e.g., less than 488 nm). The first order Raman spectrum consists of two main bands, known as the D and G bands [48], and of other bands depending on the coal rank [49,50]. The G band is related to the in-plane vibration of the carbon atoms in the graphite sheets, while the D band at 1350 cm⁻¹ becomes active in disordered graphite and has been interpreted as the result of a double resonant Raman scattering process [51-53] or, alternatively, as the ring breathing vibration in the graphite sub-unit or in polycyclic aromatic hydrocarbons (PAHs; [54-58]). On the other hand, no general consensus has been reached regarding the assignment of the other bands that compose the first order Raman spectra of carbonaceous materials. Two bands at 1150 cm⁻¹ and at 1250 cm⁻¹ (S and Dl in [29,58]) were assigned by [59] (who called them D4 and D5 bands, see Table 1) to C-H species in aliphatic hydrocarbons, while other authors [55,58] assigned them to polyacetylene-like structures and small aromatic domains, respectively. Moreover, the bands at 1465 cm⁻¹ and 1380 cm⁻¹, which represent the "overlap" between D and G (Dr and Gl bands in [29,58]), were assigned mainly to amorphous carbon structures in char (D3 band by [60]), to small ring systems (e.g., with 3-5 fused benzene rings) by [55] and [58], and to polyacetylene-like structures. Alternatively, a band at around 1500 cm⁻¹ was assigned to trapped hydrocarbons by [61] (D5 band in their work). See Table 1 for a complete review.
Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were performed on the Raman spectra in the range between 1000 cm⁻¹ and 2000 cm⁻¹. The ranges between 700 and 1000 cm⁻¹ and between 2000 and 2300 cm⁻¹ of the original spectra were excluded from the analysis, since they do not contain relevant information and can only add a further source of error. Before performing PCA and PLS-DA, all spectra were pre-processed by spike removal and normalization (relative to the maximum intensity of the G band). The PCA routine of MATLAB R2017b software (The MathWorks, Natick, MA, USA) was used. The main purpose of PCA is to reduce the dimensionality of a multivariate dataset by explaining the variance-covariance structure of the data through linear combinations of the original variables, the principal components (PCs), while minimizing the information loss. In this work, we use PCA to examine the qualitative differences among Raman spectra by finding the maximum differences among them in the PC space (the space where the component scores correspond to the coordinates of each observation). In the case of Raman spectra, the observations are the individual measurements for each sample and the original variables are the frequencies of the Raman spectra (in this case, 1021 for each spectrum). The projection of an observation on the PC space is called a "component score", while the weight of each original variable in the new space is called a "loading". A score plot gives information about the relationships between observations (spectra), while the loadings explain which variables (i.e., frequencies) are responsible for the separation observed in the score plot.
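The pre-processing and PCA steps described above can be illustrated with a minimal sketch. It is not the MATLAB routine used in this work: it is written in Python with NumPy, the spectra are synthetic stand-ins, the G-band search window (`g_window`) and the function names are assumptions made for illustration, and spike removal is omitted.

```python
import numpy as np


def preprocess(shift, spectra, lo=1000.0, hi=2000.0, g_window=(1550.0, 1650.0)):
    """Crop spectra to [lo, hi] cm^-1 and divide each one by its G-band maximum.

    g_window is an assumed search range for the G-band peak.
    """
    keep = (shift >= lo) & (shift <= hi)
    x = shift[keep]
    cropped = spectra[:, keep]
    in_g = (x >= g_window[0]) & (x <= g_window[1])
    g_max = cropped[:, in_g].max(axis=1, keepdims=True)
    return x, cropped / g_max


def pca(data, n_components=2):
    """PCA by SVD of the mean-centred matrix (rows = spectra, columns = shifts)."""
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # coordinates of each spectrum
    loadings = vt[:n_components].T                    # weight of each Raman shift
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


# Synthetic stand-in data: D and G bands plus a sloping fluorescence background.
rng = np.random.default_rng(0)
shift = np.linspace(700.0, 2300.0, 1601)


def synthetic_spectrum(d_height, slope):
    d_band = d_height * np.exp(-0.5 * ((shift - 1350.0) / 60.0) ** 2)
    g_band = np.exp(-0.5 * ((shift - 1600.0) / 35.0) ** 2)
    background = slope * (shift - 700.0) / 1600.0
    return d_band + g_band + background


spectra = np.array([
    synthetic_spectrum(rng.uniform(0.3, 0.9), rng.uniform(0.0, 0.5))
    for _ in range(40)
])
spectra += rng.normal(0.0, 0.01, spectra.shape)

x, normalised = preprocess(shift, spectra)
scores, loadings, explained = pca(normalised)
print(scores.shape, loadings.shape, explained.sum())
```

Each row of `scores` is one point of a score plot, and each column of `loadings` can be plotted against `x` to see which Raman shifts drive the separation along that component.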
PLS-DA was used to develop classification rules for macerals that cannot be optically ascribed to a maceral group. A spectral dataset consisting of optically identified vitrinite and inertinite spectra was used as the training set to develop a calibration model and find the predictive parameters that were then used to classify a testing set. The testing set is represented here by all the Raman spectra for which an optical classification in terms of recognisable macerals was impossible. PLS-DA on the training set is a supervised classification, since it is based on an external a priori classification (the dependent variable). In this case, the dependent variable is membership of one of two groups (group 1 for vitrinite and group 2 for inertinite). The calibration of the training set produces a set of regression coefficients (beta coefficients), which express the relation between independent and dependent variables in the training set and from which the predicted values of the dependent variable are computed. Multiplying the beta coefficients by the independent variables makes it possible to predict the unknown dependent variable for the testing set. Figure 2 shows an example of the workflow developed in this work. Starting from the raw spectra, we use the PCA score plot to identify the distribution of our data. Then, we identify where the optically recognized macerals fall in the score plot and increase their number through PLS-DA analysis. Finally, Ro% equivalent values are calculated for all the identified macerals. In Figure 2 we also show, as an example, the reflectance equivalent values of inertinite, although the focus of this work is to find reflectance equivalent values only for vitrinite fragments (see discussion section).
Figure 2. Workflow of the supervised classification used to identify vitrinite spectra. Original spectra have been plotted after PCA analysis on a score plot where identified macerals (vitrinite and inertinite) have been highlighted before and after PLS-DA analysis (red dots for vitrinite and yellow dots for inertinite). The frequency histogram to the right shows reflectance equivalent values calculated before and after PLS-DA on different macerals (red bins for vitrinite, yellow bins for inertinite).
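As an illustration of how beta coefficients obtained from a training set can be applied to unclassified spectra, the following is a minimal sketch of a two-class PLS-DA built on a single-response PLS regression (NIPALS algorithm). It is written in Python with NumPy and is not the code used in this work. The decision threshold of 1.5 (midway between the class codes 1 and 2), the function names and the synthetic spectra are all assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 (NIPALS) and return the beta coefficients and the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # residual (deflated) blocks
    W = np.zeros((X.shape[1], n_components))   # weights
    P = np.zeros((X.shape[1], n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores of this component
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])
        yr = yr - q[a] * t
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta


def plsda_predict(X, beta, intercept, threshold=1.5):
    """Return 1 (vitrinite) or 2 (inertinite) for each row of X.

    The threshold is an assumed cut-off halfway between the two class codes.
    """
    y_hat = X @ beta + intercept
    return np.where(y_hat < threshold, 1, 2)


# Synthetic stand-in spectra; band heights and backgrounds are arbitrary.
rng = np.random.default_rng(1)
shift = np.linspace(1000.0, 2000.0, 201)


def synthetic(label, n):
    d_height, slope = (0.5, 0.4) if label == 1 else (0.8, 0.1)
    rows = []
    for _ in range(n):
        d = (d_height + rng.normal(0.0, 0.03)) * np.exp(-0.5 * ((shift - 1350.0) / 60.0) ** 2)
        g = np.exp(-0.5 * ((shift - 1600.0) / 35.0) ** 2)
        bg = (slope + rng.normal(0.0, 0.03)) * (shift - 1000.0) / 1000.0
        rows.append(d + g + bg + rng.normal(0.0, 0.01, shift.size))
    return np.array(rows)


X_train = np.vstack([synthetic(1, 12), synthetic(2, 12)])   # "optically identified"
y_train = np.array([1.0] * 12 + [2.0] * 12)
X_test = np.vstack([synthetic(1, 10), synthetic(2, 10)])    # "unclassified"
truth = np.array([1] * 10 + [2] * 10)                       # known only because data are synthetic

beta, intercept = pls1_fit(X_train, y_train, n_components=3)
predicted = plsda_predict(X_test, beta, intercept)
print("agreement with synthetic labels:", (predicted == truth).mean())
```

With real spectra no ground truth exists for the testing set, so the predicted class can only be checked indirectly, for example by comparing the resulting reflectance equivalent values with optical measurements.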

Rock-Eval Pyrolysis
Results from Rock-Eval pyrolysis and TOC are listed in Table 2 and plotted in Figure 3 on a pseudo-Van Krevelen diagram based on HI and Tmax measurements [62].

As shown in Table 2 and Figure 3, the Rock-Eval results indicate type II-III kerogen. TOC and HI, ranging between 0.58 and 11.29% and between 108 and 365, respectively, are typical of good to excellent source rocks. The highest TOC and HI values characterise pelites and black shales from the Krosno Beds and the Lower Cretaceous Shypot formation (Chornogora Unit). Tmax values range between 421 °C and 439 °C, roughly indicating the immature and the early stages of HC generation.

Organic Petrography and Vitrinite Reflectance (Ro%)
In reflectance histograms comprising all macerals (Figure 4c as an example and supplementary material Plate A for all results), Ro% values span a wide range between about 0.2% and 2.0%. The lowest values (0.2 < Ro% < 0.35) were measured on dark grey fragments with low contrast to the inorganic matrix under white light, which were interpreted as liptinite. These fragments often show a heterogeneous texture and can sometimes be confused with vitrinite fragments, but they can be distinguished thanks to their brown fluorescence under fluorescent blue light (Figure 4b).
Vitrinite fragments were recognized on the basis of morphological features and reflectance distribution in frequency histograms. Ro% values range between about 0.40% and 0.60% (Table 1). High reflectance values can correspond to reworked vitrinite but also to macerals belonging to the inertinite group (e.g., semifusinite and fusinite). Fusinite generally shows reflectance higher than 1.0%, while reflectance values around 0.7-0.8%, in some cases close to those of vitrinite fragments (e.g., sample PL 103), correspond to semifusinite.

Raman Spectra and Raman Parameters
The normalized Raman spectra of each sample are shown in Figure 5a,b and in Supplementary Materials Plate B. As shown in the figures, there is a high heterogeneity in Raman spectra. Most of them show a high fluorescence background, a wide G band and a low intensity D band (high D/G intensity ratio). Nevertheless, spectra with a narrow G band, a D band shifted toward lower wavenumbers and a lower fluorescence background are also present.
Micro-Raman measurements were performed on discrete organic particles. The identification of different macerals was not easy using an air objective such as those mounted on the Raman equipment, and it was not possible for most of the measured fragments. Nevertheless, in some cases optical recognition of macerals was possible, as shown in Figure 5c,e,g,i. Generally, vitrinite fragments appear as low-relief, dark-grey particles, usually with a squared or triangular shape (Figure 5c), with a high to moderate fluorescence depending on the maturity of the sample, and with a large G band generally more prominent than the D band (Figure 5d). Fragments darker than vitrinite (when observed under the same lighting), with a heterogeneous texture and Raman spectra with higher fluorescence, could be interpreted as liptinite (Figure 5e,f). Nevertheless, the classification of liptinite is hindered by its dark colour, which is similar to that of the surrounding matrix, and by the high fluorescence that overwhelms the spectrum in most cases.
Inertinite fragments are relatively easy to identify but are heterogeneous: semifusinite is more easily recognizable because of its shade of grey, intermediate between vitrinite and fusinite, and because of some preserved cell structures, although these are less well preserved than in fusinite (Figure 5g). Its spectra (Figure 5h) show low fluorescence with respect to vitrinite and well-defined D and G bands. Fusinite fragments show a light shade of grey, a well preserved cell structure and a spectrum similar to that of semifusinite, but with lower fluorescence and a narrower G band (Figure 5i,j). Table 3 shows the mean values of the Raman parameters for each sample, derived from a separate deconvolution of the D and G bands according to [38]. The standard deviation calculated from all values (Table 3) is, however, quite high, reflecting the maceral heterogeneity described above.
Table 3. Mean values of Raman parameters for each sample, calculated by means of the automatic deconvolution proposed by Schito and Corrado (2018). pD, position of the D band (cm⁻¹); pG, position of the G band (cm⁻¹); wD, full width at half maximum of the D band (cm⁻¹); wG, full width at half maximum of the G band (cm⁻¹); aD, integrated area of the D band; aG, integrated area of the G band; ∆D-G, difference between G band and D band position (cm⁻¹); ID/IG, intensity ratio between the D and G bands; aD/aG, area ratio between the D and G bands; wD/wG, ratio between the full width at half maximum of the D and G bands.
The variability of the Ro% equivalent calculated from Raman data can be observed in the histograms in Supplementary Figures S13-S16 in Plate C and Supplementary Figures S10-S12 in Plate D.
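The derived parameters listed in the caption of Table 3 follow directly from the fitted position, height, width and area of the two bands. The short sketch below shows that arithmetic in Python. The `BandFit` container, the function name and the numerical values are hypothetical and are not taken from Table 3; the deconvolution itself [38] is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BandFit:
    """Result of fitting one band (hypothetical container)."""
    position: float  # band position (cm^-1)
    height: float    # peak intensity
    width: float     # full width at half maximum (cm^-1)
    area: float      # integrated area


def raman_parameters(d: BandFit, g: BandFit) -> dict:
    """Combine D and G band fits into the parameters named in Table 3."""
    return {
        "pD": d.position,
        "pG": g.position,
        "wD": d.width,
        "wG": g.width,
        "aD": d.area,
        "aG": g.area,
        "delta_D_G": g.position - d.position,  # G position minus D position
        "ID/IG": d.height / g.height,
        "aD/aG": d.area / g.area,
        "wD/wG": d.width / g.width,
    }


# Illustrative values only, chosen to make the arithmetic easy to follow.
d_fit = BandFit(position=1360.0, height=0.5, width=200.0, area=100.0)
g_fit = BandFit(position=1600.0, height=1.0, width=80.0, area=80.0)
params = raman_parameters(d_fit, g_fit)
print(params["delta_D_G"], params["ID/IG"], params["aD/aG"], params["wD/wG"])
```

Applying the function to every spectrum of a sample and taking the mean and standard deviation of each entry would give per-sample summaries of the kind reported in Table 3.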

Multivariate Analysis on Raman Spectra
PCA was carried out on each sample by building a matrix in which each row corresponds to a different spectrum and each column to the intensity at a different Raman shift. The goodness of the PCA model is expressed by the percentage of variance explained by each PC. In our samples, the first two PCs explain between 88% and 98% of the variance in the original matrix (Supplementary Figures S5-S8 in Plate C and Figures S4-S6 in Plate D). Generally, values higher than 80% in the first three PCs indicate acceptable models.
Plotting the first two PCs on the score plots (Supplementary Figures S1-S4 in Plate C and Figures S1-S3 in Plate D), data are usually distributed into two or three clusters with a trend along the x axis (i.e., the first principal component). Loading plots (Supplementary Figures S9-S12 in Plate C and Figures S7-S9 in Plate B) indicate the frequencies at which major changes in the Raman spectrum occur. Values close to zero mean that almost no changes occur, while maximum values depict the greatest variation among spectra. In our spectra, the greatest variations always occur at around 1350 cm⁻¹ and 1600 cm⁻¹, that is, in the region of the D and G bands, and at higher wavenumbers (>1650 cm⁻¹), reflecting the increasing or decreasing fluorescence among different macerals.
In Figures 6 and 7b,f,j, the optically recognized vitrinite and inertinite spectra are shown on the score plots (red and yellow dots, respectively). The figures show that most of the vitrinite fragments fall in the main cluster (except for sample PL 103), while, depending on the sample, inertinite usually falls in a second cluster, shifted towards lower or higher first PC values.
Figures 6 and 7. Figures (a,e,i) show the raw spectra in a 3D view. Figures (b,f,j) show the score plot distribution: red dots indicate optically recognized vitrinite and yellow dots optically recognized inertinite. Figures (c,g,k) show vitrinites (red dots) and inertinites (yellow dots) that were optically recognized or identified after PLS-DA; numbers in brackets refer to the variance explained by each principal component. Figures (d,h,l) show the histograms of Ro% equivalent values from Raman parameters calculated on vitrinite recognized by means of PLS-DA.
Based on this evidence, a multivariate classification via the PLS-DA technique was used to derive prediction parameters for the classification of vitrinite and inertinite group macerals. In a first step, vitrinite and inertinite spectra were used as a training set to build a calibration model, whose goodness was validated by means of two statistical tests (Supplementary Materials Plate E): test (1) "Percentage of variance versus number of PLS components"; test (2) "Mean squared prediction error (MSEP) versus number of PLS components". Test 1 results show that the first three PLS components explain between 60% and 98% of the variance (only in samples PL 101.1 and PL 103 do the first three components explain less than 80% of the variance; Supplementary Figures S4 and S6 in Plate E). Test 2 results show that the MSEP is higher for the first two components and then decreases to its minimum in almost all samples, except for sample PL 102 (Supplementary Figure S5 in Plate E). Thus, three was taken as the statistically significant number of PLS components for the training model, since it explains a high percentage of variance with the minimum error.
Once the appropriate number of factors had been found, PLS-DA was run on the training set to establish the classification parameters for each class. The beta coefficients obtained were applied to the test set to assign the remaining unclassified spectra to one of the two classes. This results in an increased number of recognized vitrinite fragments to be measured for thermal maturity. Table 4 shows the mean reflectance equivalent values and the number of measurements for optically recognized vitrinite macerals and for the vitrinite macerals found only after the PLS-DA analysis. The analysis was not performed on sample PL 97 (where only vitrinite was recognized), since PLS-DA cannot be performed on a single class. Table 4 and Figures 6 and 7 indicate that the number of measurements on vitrinite increased by a minimum of 26% in samples PL 102 and PL 103 and up to a maximum of 123% in sample PL 101.1, whereas the mean Ro% eq values are very similar, with a slight increase in the standard deviation values (Table 4).
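The "MSEP versus number of PLS components" test can be sketched as follows. The text does not state how the prediction error was estimated, so leave-one-out cross-validation is used here purely as an assumption. The PLS1 routine, the function names and the synthetic spectra are likewise illustrative and written in Python with NumPy rather than in the software used in this work.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 (NIPALS) and return the beta coefficients and the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W = np.zeros((X.shape[1], n_components))
    P = np.zeros((X.shape[1], n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])
        yr = yr - q[a] * t
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta


def loo_msep(X, y, max_components):
    """Leave-one-out mean squared prediction error for 1..max_components."""
    n = y.size
    msep = np.zeros(max_components)
    for k in range(1, max_components + 1):
        sq_err = np.zeros(n)
        for i in range(n):
            train = np.arange(n) != i          # hold out spectrum i
            beta, b0 = pls1_fit(X[train], y[train], k)
            sq_err[i] = (X[i] @ beta + b0 - y[i]) ** 2
        msep[k - 1] = sq_err.mean()
    return msep


# Synthetic training set: class 1 and class 2 differ in D-band height.
rng = np.random.default_rng(2)
shift = np.linspace(1000.0, 2000.0, 101)
d_band = np.exp(-0.5 * ((shift - 1350.0) / 60.0) ** 2)
g_band = np.exp(-0.5 * ((shift - 1600.0) / 35.0) ** 2)
y = np.array([1.0] * 10 + [2.0] * 10)
d_height = np.where(y == 1.0, 0.5, 0.8) + rng.normal(0.0, 0.05, y.size)
X = d_height[:, None] * d_band + g_band + rng.normal(0.0, 0.01, (y.size, shift.size))

msep = loo_msep(X, y, max_components=5)
best = int(np.argmin(msep)) + 1
print("MSEP per number of components:", msep)
print("number of components with the lowest MSEP:", best)
```

In practice the number of components is chosen by weighing this curve against the percentage of variance explained, as done for the two tests described above, rather than by taking the minimum alone.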

Source Rocks Quality, Organic Facies and Thermal Maturity
The area of interest in this study is part of the wider flysch belt of the Outer Carpathians, which is one of the oldest oil-producing regions in the world [63]. In particular, the Melinite shales, buried in the Boryslav-Pokuttia tectonic unit, acted as the main oil-bearing succession in the Carpathian region and have been widely studied, in particular in the Polish sector [63]. In the area analysed in this work, belonging to the Ukrainian sector, fewer data are available and the source rocks are not fully characterized.
The quality of the outcropping source rocks can be assessed by means of TOC, HI, and S2 values (Figure 3), whereas the degree of thermal maturity has been assessed by cross-checking Ro% values from organic petrography against Tmax from Rock-Eval pyrolysis. Based on these methods, the black shales of the Shypot beds in the Chornogora Unit acted as a "good to very good" mixed gas- and oil-prone source rock, with a thermal maturity falling at the oil window onset. Likewise, the Krosno beds in the internal portion of the Skiba Unit show an excellent oil-prone potential according to pyrolysis data (samples PL 93.1, PL 93.2, and PL 95), whereas Tmax and Ro% values for sample PL 101.1 indicate good gas and oil generation potential within the oil window. On the other hand, the thermal maturity of samples PL 93.1, PL 93.2, and PL 95 indicates immature source rocks (Table 2; Figures 3 and 4). Samples PL 97 and PL 103 from the Melinite shales, located in the external part of the Skiba Unit and in the Chornogora Unit, respectively, show good oil potential. However, Tmax and Ro% values point to different thermal maturity levels, suggesting that maximum temperatures were acquired in each tectonic unit as a result of tectonic loading during the formation of the chain rather than at the end of sedimentation.
The petrographic analysis of the polished blocks, via incident light microscopy, revealed a complex assemblage of macerals in all samples (Figure 4). A significant number of vitrinite particles suitable for reflectance measurements was found in all samples except sample PL 97 (only 11 suitable particles). In this sample, as well as in samples PL 93.2 and PL 95, and to a minor extent in samples PL 102 and PL 101.1, thermal maturity assessment was complicated by the widespread presence of macerals with a similar appearance but with weak fluorescence, lower reflectance and darker shades of grey than vitrinite, which have been interpreted as belonging to the liptinite group. A further complication was the presence in all samples of higher reflectance macerals (>0.5-0.6 Ro%; Figure 4), which were identified as inertinite and in some cases as reworked vitrinite fragments. The inertinite particles were grouped into semifusinite (Ro = 0.6-0.8%) and fusinite (Ro > 1%).
Vitrinite reflectance (Ro = 0.42-0.46%) and Tmax data (421-433 °C) indicate that most of the samples are in the immature stage of hydrocarbon generation. However, samples PL 101.1 and PL 103 lie at the oil window onset, with reflectance values of 0.51% and 0.61% and Tmax values of 439 °C and 436 °C, respectively.

Raman Spectroscopy and Vitrinite Reflectance Equivalent (Ro% eq)
The presence of different Raman spectral shapes in each sample can be interpreted as the coexistence of in situ and reworked materials and/or of different particles of organic matter (Figure 4). Such differences in the Raman spectra (Figure 5) result, after a two-band fitting deconvolution, in higher values of the distance between the G and D bands and of the area, intensity and FWHM ratios of the D and G bands (Table 4). These differences are mainly due to a red-shift of the D band and an increase in its area and width, accompanied by a decrease in the G band width, which result from an increase in larger aromatic clusters passing from disordered to more ordered materials [29,32,57,58,64,65]; in this case, the structure becomes progressively more ordered in the sequence liptinite particles (when registered) < vitrinite < inertinite.
Comparing the histograms of the R o % eq of the undifferentiated particle spectra ( (Figure 7). R o % eq values for inertinite are usually higher than those for vitrinite and mostly agree with the optical reflectance values measured on the same maceral. Nevertheless, we did not focus on them, since the assessment of inertinite equivalent reflectance from Raman spectra is beyond the aim of this work and is of no use for thermal maturity assessment and basin modeling.

Multivariate Analyses on Raman Spectra
Raman spectroscopy has become a promising tool for thermal maturity evaluation of coals and dispersed organic matter in diagenesis (see [66] for a comprehensive review). However, when working with dispersed organic matter, the inability to couple Raman acquisition with optical observation under oil immersion seriously limits Raman-based organic petrographic analyses, as evidenced by the fact that only a few works focus on single-maceral measurements [23,32,37,57,67]. In this work we show how a multivariate analysis of Raman spectra can help in maceral description and identification when dealing with particularly complex organofacies.
Looking at the score plots derived from our samples, a linear trend can generally be seen, moving from negative to positive values along the first principal component (PC) axis, with a cluster of maximum density generally centered around 0 (Supplementary Materials Plate C Figures S1-S4 and Plate D Figures S1-S3). The variance along the second PC axis, on the other hand, is always limited with respect to the first. In these plots, outliers are easily recognized and can be excluded from the maturity conversion. Moreover, the score plots show two or more clusters of data. Spectra with similar scores are similar, and each cluster corresponds to particles whose spectra show a similar "aromatization degree". When the optically identified vitrinite and inertinite macerals are plotted on the score plot (Figures 6 and 7), vitrinite usually falls in the main cluster, whereas inertinite falls in the outermost part, toward either more positive or more negative values depending on the relative abundance of different macerals.
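How score plots of this kind arise can be sketched with a small PCA on synthetic spectra. This is only an illustration under stated assumptions: the spectra are simulated as two Lorentzian bands plus noise, the band values are hypothetical, and the functions `lorentzian`, `synthetic_spectra` and `pca` are helpers written for this sketch rather than the processing chain used in the study.

```python
import numpy as np


def lorentzian(x, centre, height, fwhm):
    half = fwhm / 2.0
    return height * half**2 / ((x - centre) ** 2 + half**2)


def synthetic_spectra(rng, n, d_centre, d_fwhm, g_fwhm):
    # Hypothetical D + G band spectra with Gaussian noise (not measured data).
    x = np.arange(1000.0, 1900.0, 2.0)
    clean = lorentzian(x, d_centre, 0.6, d_fwhm) + lorentzian(x, 1600.0, 1.0, g_fwhm)
    return clean + rng.normal(0.0, 0.01, size=(n, x.size))


def pca(spectra, n_components=2):
    """PCA by singular value decomposition of the mean-centred spectra."""
    X = np.asarray(spectra, dtype=float)
    centred = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # one row per spectrum
    loadings = Vt[:n_components]                       # one row per component
    explained = (S**2 / np.sum(S**2))[:n_components]   # variance fractions
    return scores, loadings, explained


rng = np.random.default_rng(0)
group_a = synthetic_spectra(rng, 20, 1370.0, 180.0, 90.0)  # less ordered
group_b = synthetic_spectra(rng, 20, 1355.0, 220.0, 70.0)  # more ordered
scores, loadings, explained = pca(np.vstack([group_a, group_b]))
pc1_a, pc1_b = scores[:20, 0], scores[20:, 0]
print("explained variance fractions:", explained)
```

In this simulated case the two groups occupy opposite ends of the first PC axis, while the second PC carries only a small share of the variance, mirroring the qualitative pattern described above. The rows of `loadings` indicate which spectral regions drive each component.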
Loadings plots (Supplementary Materials Plate C Figures S9-S12 and Plate D Figures S7-S9), on the other hand, show that the major changes in Raman spectra among different macerals occur at around 1350 cm⁻¹ and 1600 cm⁻¹ (i.e., the D and G band region) and above 1700 cm⁻¹. These changes can be related to a shift in the D band position, a G band narrowing and a fluorescence decrease moving from hydrogen-rich to more aromatic organic matter [32].
Once the different classes of spectra in the samples have been defined, the PLS-DA analysis confirms that similar macerals fall in the same cluster on the score plot, making it possible to classify most of the remaining spectra.
In this way the number of vitrinite fragments increases, providing a more robust thermal maturity assessment (Figures 6 and 7). In Figure 8 the average R o % eq values calculated from the Raman spectra of vitrinite (red circles) and from the sum of vitrinite and PLS-DA-derived vitrinite (blue circles) are plotted against the microscopic determination of vitrinite reflectance (R o %), together with the average R o % eq calculated on the whole maceral composition (green diamonds). Despite a slight increase in standard deviation from R o % to R o % eq calculated from Raman (Figure 8, Table 3), a better correlation between the mean values of the two methods was found, together with a substantial increase in the number of measurements after PLS-DA. The higher standard deviation of R o % eq with respect to R o % should be considered an intrinsic limitation of the conversion equation: its confidence interval indicates that an error higher than ±0.05 can be expected for each measurement (see [38]), and the sum of these errors over a given thermal maturity interval probably led to the relatively high standard deviations observed in our samples. More detailed studies based on further datasets are needed to provide a more effective tool.
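The classification step can be sketched as a two-class PLS-DA, here written as a PLS1 regression (NIPALS algorithm) on a 0/1 class label followed by a 0.5 threshold. This is a simplified stand-in and not the software or settings used in the study: the spectra are simulated, the band values are hypothetical, and `fit_pls1`, `predict_class` and the threshold are assumptions of this sketch.

```python
import numpy as np


def lorentzian(x, centre, height, fwhm):
    half = fwhm / 2.0
    return height * half**2 / ((x - centre) ** 2 + half**2)


def make_spectra(rng, n, d_centre, d_fwhm, g_fwhm):
    # Hypothetical D + G band spectra with Gaussian noise (not measured data).
    x = np.arange(1000.0, 1900.0, 2.0)
    clean = lorentzian(x, d_centre, 0.6, d_fwhm) + lorentzian(x, 1600.0, 1.0, g_fwhm)
    return clean + rng.normal(0.0, 0.01, size=(n, x.size))


def fit_pls1(X, y, n_components=2):
    """PLS1 regression by NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def predict_class(model, X, threshold=0.5):
    x_mean, y_mean, coef = model
    return ((X - x_mean) @ coef + y_mean >= threshold).astype(int)


rng = np.random.default_rng(0)
# Optically identified training spectra: 1 = vitrinite, 0 = inertinite.
X_train = np.vstack([make_spectra(rng, 15, 1370.0, 180.0, 90.0),
                     make_spectra(rng, 15, 1355.0, 220.0, 70.0)])
y_train = np.array([1.0] * 15 + [0.0] * 15)
model = fit_pls1(X_train, y_train)

# Spectra without optical identification, to be assigned by the model.
unlabelled = np.vstack([make_spectra(rng, 5, 1370.0, 180.0, 90.0),
                        make_spectra(rng, 5, 1355.0, 220.0, 70.0)])
predicted = predict_class(model, unlabelled)
print(predicted)
```

The design mirrors the workflow described in the text: the few optically identified particles train the model, and the remaining spectra are then assigned to a class from their Raman signal alone. With real data, the number of components and the decision threshold would need to be chosen by cross-validation.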
Figure 8. Correlation between the microscopic determination of vitrinite reflectance (R o %) and the reflectance equivalent of vitrinite macerals (R o % eq ) calculated from Raman parameters according to [38], for values obtained before (red circles) and after (blue circles) PLS-DA analysis. The size of the circles depends on the number of measurements for each sample. Green diamonds show R o % eq values calculated on the whole maceral composition.
Given this, the high correlation over such a narrow range of thermal maturity (between about 0.4 and 0.6 R o %) confirms that Raman spectroscopy is an accurate thermal maturity tool even in low diagenesis.
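The link between the per-measurement error of the conversion and the benefit of a larger number of classified vitrinite spectra can be illustrated with the standard error of the mean. The sketch below assumes independent measurement errors and, purely for illustration, treats 0.05 as the per-measurement standard deviation; the text gives this value only as a lower bound on the expected error, and the measurement counts are hypothetical.

```python
import math


def standard_error(per_measurement_sd: float, n: int) -> float:
    """Standard error of the mean of n independent measurements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return per_measurement_sd / math.sqrt(n)


# Assumed per-measurement spread of 0.05 (illustrative, see lead-in);
# the counts below are hypothetical numbers of vitrinite spectra.
for n in (10, 30, 100):
    print(n, round(standard_error(0.05, n), 4))
```

Under these assumptions the uncertainty on the mean decreases with the square root of the number of spectra, which is one way to read the gain in robustness obtained when PLS-DA adds further vitrinite measurements.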

Conclusions
In this work organic petrography, Raman micro-spectroscopic and Rock-Eval pyrolysis analyses were performed on a set of samples collected from Mesozoic and Cenozoic successions of the Outer sector of the Carpathian fold and thrust belt, in Ukraine.
Organic petrography and Rock-Eval pyrolysis indicate a type II/III kerogen with a complex composition characterized by the presence of macerals of the vitrinite, inertinite and liptinite groups. According to Tmax and R o % data, thermal maturity lies at the onset of the oil window, spanning between about 0.40 and 0.60 R o % (based on 30-50 measurements on vitrinite for most of the samples).
Micro-Raman analyses were performed on a larger number of organic fragments, yielding from about 30 to about 100 spectra per sample, but an optical classification according to maceral group was possible for only relatively few fragments. For this reason, we performed a multivariate statistical analysis in order to define the variability of the organic facies and develop a predictive PLS-DA model for the identification of vitrinite from Raman spectra.
Results demonstrate that the PLS-DA model successfully classifies macerals on the basis of their Raman spectrum, considerably increasing the number of fragments that can be used for thermal maturity assessment.
Following the first attempts of [32,34,68,69], this work outlines how machine learning techniques can be a powerful support to classical organic petrography analyses in thermal maturity assessment on sediments where organic fragments are finely dispersed.