In the last years, Cannabis sativa
L. has been one of the most studied plant over the world [1
]. Cannabis sativa
L is a chemically complex plant which contains several classes of natural compounds, e.g., flavonoids, mono- and sesquiterpenes, steroids, nitrogenous compounds and cannabinoids, associated with medicinal properties of the plant [2
The main cannabinoid constituents are Δ9
-tetrahydrocannabinol (THC), that possess significant psychotropic properties, and other compounds with less or no psychotropic activity including neutral cannabinoids (cannabidiol CBD, cannabigerol CBG, cannabichromene CBC, cannabinol CBN, cannabidivarin CBDV) and acidic cannabinoids (tetrahydrocannabinolic acid THCA, cannabidiolic acid CBDA and cannabigerolic acid CBGA). The species sativa
is recognized as monotypic classification, that can be divided into different chemotypes based on the specific cannabinoid profile [4
The interest in Cannabis sativa
L. has increased in Italy mainly due to the latest legislation (N. 242 of 2 December 2016); the concentration of THC is used to classify two types of Cannabis sativa
L. a fiber-type plant with low content of THC (<0.2% w/w) and a prohibited drug-type plant where the content of THC is >0.6% w/w [6
]. There are hundreds of hemp strains available in the marketplace based on aroma, plant size, different cultivation, characteristic of the soil and overall yield [4
]. Therefore, there is a request to develop cost effective and easy-to-use quantitative and qualitative methods for the identification and classification of hemp products [6
The THC and CBD are the reference cannabinoids to extrapolate the phytochemical composition of the hemp, but different works have proved that that strains with similar THC/CBD content have different effects on human physiology [9
The present work focused on the chemical content of cannabinoids as a convenient tool to identify the hemp retailers without any other information on the hemp samples like Cannabis
strains, seeds, soil and cultivation characteristics, geographical origin, product storage, etc. Moreover, the results of this work supported the thesis that the only concentrations of THC and CBD are not enough to discriminate commercial hemp [9
]. The analysis of cannabinoids was carried out by means of HPLC-MS/MS, overcoming the phytocannabinoid decomposition due to the heating used to identify and quantify cannabinoids using methods like GC [12
PLS-DA chemometric approach identified correctly the four hemp retailers by using the nine cannabinoids analytical profile of 161 hemp samples, showing that an analytical method coupled with multivariate analysis can lead to a powerful tool for forensic purposes. Chemometric models have already been used to understand phytochemical diversity showing the advantages of multivariate analysis [9
A number of works in the literature reported the determination of THC and CBD concentration in hemp samples [8
], however to the best of our knowledge, there is a lack of information regarding the evaluation of the comprehensive cannabinoid profile in hemp products with HPLC-MS/MS [7
2. Results and Discussion
The dataset used for both univariate or multivariate analysis (Table S1, in Supplementary Material
) was composed of 161 hemp samples sampled in 2018 from four Italian retailers, and the concentration of the nine cannabinoids, the six neutral cannabinoids (THC, CBD, CBC, CBG, CBN and CBDV) and the three acidic cannabinoids (THCA CBGA and CBDA).
As reported in Table 1
, the hemp samples were sold in four Italian cities from three Italian regions. The hemp samples were sold without reporting information like Cannabis
strains, seeds, soil and cultivation characteristics, geographical origin, product storage, etc.
Data vectors belonging to the same hemp retailer were firstly evaluated by the analysis of variance via the graphical representation of box and whisker plot, and afterwards, by two multivariate techniques, the unsupervised principal component analysis (PCA) and supervised PLS-DA.
As shown in Figure 1
, the use of the nine cannabinoids variables led to no statistical differences between the hemp samples grouped as sold by the hemp retailers. Except of the high average concentration of THCA and CBGA found in the retailer D, the cannabinoids analytical profile was in all cases the same with high average concentrations of CBD and CBDA and low average concentrations of the other seven cannabinoids. CBN average concentration was found particularly high in A and B retailers. Interestingly, CBN is an oxidation product of THCA and the high content of this cannabinoid can also be a marker of inflorescence quality. The average concentration of THC was in all cases below of 0.35% w/w.
The results of Tukey HSD multiple comparison test reported in Supplementary Material (Table S2)
showed that the acidic cannabinoids had a lower capacity of discrimination than their neutral forms. In fact, the neutral cannabinoids showed a partial discrimination between the four hemp retailers, apart from CBDV.
The univariate data analysis highlighted that the hemp samples sold by the four retailers could not be differentiated only using the analysis of variance because of the overlap in cannabinoids composition of the hemp retailers.
The correlation between the nine cannabinoids analyzed in the hemp samples dataset was analyzed by computing the Pearson coefficients (Table 2
). The data showed a partial positive correlation between neutral cannabinoids especially between THC, CBD, and CBC. The only neutral cannabinoid without any correlation was CBDV that belongs to a class of cannabinoids synthesized from a different precursor and structurally different from the classical cannabinoids. No correlations were observed between the six neutral and two acidic cannabinoids THCA and CBGA or within the acidic forms. Only a weak correlation between CBG and CBGA could be remarked as reported also by other work, due to the decarboxylation of CBGA that produces not only CBG but also the other cannabinoids [18
]. The acidic cannabinoid CBDA slight correlated with all the six neutral cannabinoids.
These results agreed with data reported by other works and they can be explained by the biosynthesis of cannabinoids [19
This partial correlation between variables provided suitable contributions when treated with multivariate statistical procedures.
All data obtained with HPLC-MS/MS analysis were processed by PCA to find every possible cluster within the hemp samples dataset in an unsupervised way. Before applying the PCA algorithm, data were linearly normalized and auto-scaled (zero mean and unitary variance) in order to remove differences in concentration range. PCA was applied to inspect the multivariate data structure by decomposing the data matrix of the 161 hemp samples (in the matrix rows) and the nine cannabinoids analyzed (the matrix columns).
depicted the scores and loading plots of the first three principal components. The first component represented 42.7% of the variance, the second 15.8% and the third 14.9% accounting 74.4% of the total variance for the first three principal components.
The score points (Figure 2
A), representing the new coordinates of the hemp samples, were interpreted assuming that close distance in plot plane is a measure of the similitude between samples. PC 1 separated well both hemp samples sold by retailers B and D but clustered in the cartesian graph origin the hemp samples sold by the retailers A and C. PC 2 did not influence the hemp samples from retailers B and D but it contributed to the dispersion within the hemp samples sold by retailers A and C highlighting a similar variance behavior of those two retailers that started to be separated only along the PC 3 axis that spread both samples sold by retailers B and D.
The loadings (Figure 2
B), representing the contribution of each cannabinoid to the principal components, contributed mostly to the hemp samples separation on the PC 1 and PC 3. The PC 1 axis highlighted the differences among neutral and acidic cannabinoids. Neutral cannabinoids contributed significantly to the separation of the hemp samples sold by retailers B. On the other hand, the acid cannabinoid CBGA played an important role in separation of the hemp samples sold by retailer D. CBN, CBD and CBC had very similar pattern recognition performance contributing only in separating hemp samples on PC 1. The hemp samples sold by retailers A and C were influenced by THCA, CBDA and CBDV that contributed to the dispersion on PC 2 and to the separation on PC3.
The unsupervised PCA algorithm could partially discriminate hemp samples sold by the four Italian retailers.
To evaluate the real efficiency of the discrimination among the four retailers a supervised multivariate discriminant analysis was applied. The dataset of 161 hemp samples and nine cannabinoids concentration was used for this approach (Table S1
). As in any supervised classification techniques, the classes must be chosen a priori. The choice for the hemp samples in this case was to choose the hemp retailers as classes. With this classification scheme, a PLS-DA model has been built. PLS-DA is an extension of PLS, by projecting intercorrelated X-variables from high dimensional space into low-dimensional space according to a Y-vector that encodes the class membership in a set of categorized variables (1 and 0 values, respectively) [22
]. A numerical evaluation of the classification properties was obtained by considering the cross validation of the PLS-DA method according to the ‘venetian blinds’ technique.
The model was evaluated using the following parameters: component in model, explained variance in percentage, error rate in calibration and in cross-validation. Moreover, specificity sensitivity and precision of the four classes, corresponding to the four Italian hemp retailers, were computed in fitting and cross-validation.
The statistical summary results of the PLS-DA algorithm were reported in Table 3
. The results showed a very good discrimination between the four classes with an explained variance of 100% and low classification errors in both calibration (5%) and cross validation (6%) by using eight model components previously optimized by the algorithm.
Retailers C and D had the highest sensitivity respectively for fitting and cross-validation, contributing to low classification errors. A very good sensitivity and precision in both calibration and cross-validation were found for retailers A, B and D.
Real-predicted samples were reported in Table 4
using confusion matrix format. A total of 92% of the samples have been correctly classified by cannabinoids variables in fitting and cross validation. In fitting, retailers C and D showed 100% and 94% of correspondences between real and predicted samples. In cross-validation, this percentage decreased to 95% for retailer C and increased to 100% for retailer D. The retailers A and B had the highest percentage of misclassified samples, with all misclassified samples assigned to retailer C. Those results underlined a close analytical cannabinoids profile between the retailers A, B and C, also highlighted by the PCA model. The class assigned to each of the 161 hemp samples by the PLS-DA model in fitting and cross validation was reported in Supplementary Material (Table S3)
The analysis of nine cannabinoids by means of HPLC-MS/MS was used to evaluate the feasibility to identify hemp retailers located in Italy without having any other information on the hemp sold like Cannabis strain, soil and cultivation characteristics, geographical origin, product storage, etc.
The cannabinoids analytical profile was, in all cases, the same with high concentrations of CBD and CBDA and low concentrations of the other seven cannabinoids. The univariate evaluation of the cannabinoids showed no statistical differences between the hemp samples, demonstrating that analyzing each cannabinoid individually it was not enough to identify the hemp retailers.
On the other hand, the synergic contribution of the nine cannabinoids concentrations could identify the hemp retailers as demonstrated by PLS-DA algorithm. A total of 92% of the hemp samples were correctly classified by the cannabinoid variables in both fitting and cross validation.
In conclusion, the present study contributes to the characterization of hemp samples sold by retailers without any other information, proving that a simple chemical analysis coupled with a robust chemometric method could be a powerful tool for forensic purposes.