Connecting Obsidian Artifacts with Their Sources Using Multivariate Statistical Analysis of LIBS Spectral Signatures

: With the recent introduction of handheld instruments for ﬁeld use, laser-induced break-down spectroscopy (LIBS) is emerging as a practical technology for real-time in situ geochemical analysis in the ﬁeld. LIBS is a form of optical emission spectroscopy that is simultaneously sensitive to all elements with a single laser shot so that a broadband LIBS spectrum can be considered a diagnostic geochemical ﬁngerprint . Sets of LIBS spectra were collected for seven obsidian centers across north-central California, with data processed using multivariate statistical analysis and pattern recognition techniques. Although all obsidians exhibit similar bulk compositions, different regional obsidian sources were effectively discriminated via partial least squares discriminant analysis. Ob-sidian artifacts from seven archaeological sites were matched to their putative sources with a high degree of conﬁdence.


Introduction
Geochemical analysis of obsidian can provide insights into artifact production and past trade patterns [1][2][3].Handheld laser-induced breakdown spectroscopy (LIBS) has emerged as a practical technology for in situ geochemical analysis because it is capable of rapid and minimally destructive determination of the elemental composition of geological and archeological materials in the field under ambient environmental conditions.Since LIBS is simultaneously sensitive to all elements, and particularly sensitive to light elements which are difficult to determine using other methods (e.g., H, Li, Be, B, and C), a single laser shot may be used to measure the spectral intensity of specific elements for quantitative analysis or record the broadband LIBS emission spectrum that is a unique compositional signature, i.e., a geochemical fingerprint.Studies on the analysis of geological materials such as marble [4], flint [5], pozzolan [6], glass [7][8][9][10], and particularly obsidian [11][12][13][14] demonstrate that distinguishing compositional characteristics can be recognized via LIBS analysis.The objective of this study is to use multivariate chemometric analysis of LIBS broadband spectra to determine obsidian artifact provenance via comparison with a spectral library for samples of known origin, with a focus on the specific application of handheld Minerals 2023, 13, 1284 2 of 37 LIBS to discriminating of obsidian sources and determining the provenance of obsidian artifact sites in California.

Obsidian
Obsidian is a natural glass of volcanic origin formed when highly viscous silica-rich magma is extruded onto the Earth's surface and cools so rapidly that mineral crystallization is precluded [15].Obsidian glass fractures conchoidally with sharp edges because of its disordered atomic structure and, therefore, was widely used by indigenous peoples to produce tools, weapons, and decorative objects.Typically forming from highly silicic rhyolite magma on the margins of lava domes, and less commonly as extrusive lava flows [16], it is only rare volcanic events that produce obsidian of the high quality required for tool making.As a consequence of the need to obtain obsidian and its relatively restricted occurrence, obsidian was traded widely by Indigenous peoples across California and the adjacent Mojave Desert and western Great Basin physiographic provinces within the region known as the California Culture Area [17] and frequently transported far from its original geological source [18].Hence, there has been a long-standing and continuing interest within the archaeological community in linking obsidian artifacts to their geological source as a means of understanding resource procurement patterns, distribution networks, cultural contacts, and movements of people.Because obsidian is common in volcanic areas worldwide, forming when silicic-rich magma is extruded onto or near the Earth's surface where it rapidly chills to glass against air, water, or colder rock [15], the conclusions reached in this manuscript are clearly relevant for future studies of both geological and archaeological obsidian as well.

Geochemical Fingerprinting
Obsidian sources tend to occur in relatively discrete locations, although geological erosion can produce fluvial deposits in which obsidian transported away from a source region is present as clasts of sufficient size and quality to be worked for tools.It may be possible to assign an obsidian sample to a source on the basis of its visual characteristics or physical texture in some rare instances [19,20], but, in general, geochemical information is necessary for reliable source determination.The provenance attribution of obsidian to a geological source relies on the concept of geochemical fingerprinting [21,22], which typically employs the measurement of small differences in the chemical or isotopic composition to accomplish discrimination of samples with similar bulk composition.The concept of geochemical fingerprinting via LIBS has been validated through a series of investigations using both benchtop and handheld LIBS systems for a variety of geological materials, including obsidian (e.g., [11][12][13][14][23][24][25][26][27][28][29][30]).
As a high-silica rhyolite glass, obsidian has a broadly similar major element chemistry regardless of location because the processes of assimilation and fractional crystallization within intracrustal magma chambers [31] produce rhyolite under certain geological circumstances.This is illustrated in Table 1 for five of the obsidian sources analyzed in this study (Figure 1).Because of this similarity in bulk composition, small variations in minor and trace element composition at the 10s to 100s ppm level have been widely used for obsidian source discrimination and provenance determination [15,[43][44][45][46][47][48][49].The determination of obsidian trace element character has provided an important tool in assessing prehistoric trading patterns across the Great Basin (e.g., [50][51][52][53]).
All of the common analytical techniques used to determine obsidian chemical composition (e.g., electron microprobe analysis, optical emission spectroscopy, X-ray fluorescence spectrometry [XRF], instrumental neutron activation analysis, inductively coupled plasma mass spectrometry, scanning electron microscopy with energy dispersive X-ray spectroscopy, and particle-induced X-ray emission) are time-consuming laboratory-based techniques that, with the exception of XRF, cannot be used in the field.Analysis of obsidian via handheld XRF analysis over the past decade has shown promise for obsidian provenance attribution [54][55][56][57][58][59].However, portable XRF analysis is limited by its inability to analyze important elements of low atomic weight present in low concentration (e.g., Li, Be, B, C, Na, and Mg).By contrast, because every element in the periodic table has one or more emission lines in the ultraviolet, violet, visible, and near-infrared spectral regions between 200 and 900 nm, analysis via LIBS simultaneously captures the full elemental composition of a sample with a single laser pulse and, therefore, is particularly suitable for obsidian analysis.
Beginning with the work of Parks and Tieh [43], Jack and Carmichael [60], and Jack [61], geochemical analysis has become a standard means of investigating obsidian artifact provenance in California.Although small obsidian occurrences can be of uniform chemical character, it is not uncommon for obsidian domes and flows associated with large volcanic centers like those in Medicine Lake Highlands, Long Valley/Mono-Inyo, and the Coso Range of California (Figure 1) to have sufficient compositional heterogeneity to be distinguished on the basis of their trace element character [33,36,38,42].Thus, there is currently extensive geological and archaeological literature demonstrating that California obsidian centers are generally of sufficiently distinct character to be distinguished on the basis of their chemical composition (e.g., [15,42,[62][63][64][65]).Variations in trace element composition, commonly evaluated on bivariate elemental or element ratio plots, have been used routinely in this context, but advanced statistical analysis is becoming a more common approach [11,12].Because of this similarity in bulk composition, small variations in minor and trace element composition at the 10s to 100s ppm level have been widely used for obsidian source discrimination and provenance determination [15,[43][44][45][46][47][48][49].The determination of obsidian trace element character has provided an important tool in assessing prehistoric trading patterns across the Great Basin (e.g., [50][51][52][53]).
All of the common analytical techniques used to determine obsidian chemical composition (e.g., electron microprobe analysis, optical emission spectroscopy, X-ray fluorescence spectrometry [XRF], instrumental neutron activation analysis, inductively coupled plasma mass spectrometry, scanning electron microscopy with energy dispersive X-ray spectroscopy, and particle-induced X-ray emission) are time-consuming laboratory-based techniques that, with the exception of XRF, cannot be used in the field.Analysis of obsid-

Obsidian Sources and Archeological Sites in North-Central California
Obsidian is found extensively across California, with more than 50 different sources documented in the US Obsidian Source Catalog of the Northwest Research Obsidian Studies Laboratory [64].Many of these occurrences were exploited for tools by prehistoric populations.Archaeological evidence suggests that the conveyance of obsidian occurred over great distances within the California Culture Area [18,[65][66][67] and that separate trading obsid-Minerals 2023, 13, 1284 5 of 37 ian networks amongst prehistoric people in northern and south-central California [50,68].It is generally considered that obsidian acquisition/use patterns in California shifted over time [69,70] in response to both geological and climatic factors [71] and persisted into the colonial period.For example, Native American people living at Mission San José (ca.CE 1797-1840) acquired obsidian using long-distance conveyance, i.e., geographic displacement from its place of origin, from distant source areas as well as through recycling of archaeological artifacts [72].
Archaeological studies within the Great Basin region have determined that eight predominant obsidian sources between the Bodie Hills and Death Valley were exploited by prehistoric populations (Figures 1 and 2); Mt.Hicks and Truman/Queen in westernmost Nevada and Bodie Hills; Mono/Inyo Craters, Glass Mountain, and Casa Diablo in the Long Valley area; and Fish Springs, the Saline Range, and the Coso Range in eastern California [19,50,70,[73][74][75][76][77][78].A pattern of obsidian source utilization and conveyance has been recognized, with primary trade west across the Sierra Nevada from sources latitudinally parallel to the Owens Valley.The Bodie Hills source dominates archaeological assemblages in the northern area [18], with obsidian from the Casa Diablo site in Long Valley predominant in the southern area [18,79], although a variety of other sources are represented [80,81].Within archaeological contexts in the southern San Joaquin Valley and throughout southern California, obsidian derived from the Coso Range is most abundant [18,50,52,53,82,83].In this study, we analyzed obsidian from seven prominent geological sources: the Medicine Lake Volcanic Center in northern California, the North Coast Ranges in westcentral California, plus the Bodie Hills, Long Valley Caldera and Mono-Inyo Craters area, Fish Springs, the Saline Range, and the Coso Range in east-central California (Figure 1).In this study, we analyzed obsidian from seven prominent geological sources: the Medicine Lake Volcanic Center in northern California, the North Coast Ranges in westcentral California, plus the Bodie Hills, Long Valley Caldera and Mono-Inyo Craters area, Fish Springs, the Saline Range, and the Coso Range in east-central California (Figure 1).With the exception of three obsidian tools from the Long Valley area, the obsidian artifacts included in this study (Figure 1) were debitage collected from anthropological contexts at six sites in east-central California and a single site in western California.These samples are from collections curated by California State University-Bakersfield and the US Bureau of Land Management.

Medicine Lake Highlands, Northern California
Following the retreat of the Pleistocene glaciers from the southern Cascade Mountains, Holocene volcanism at the Medicine Lake Highlands volcanic center in Modoc and Siskiyou Counties (Figure 1) produced about 7.5 km 3 of lava [84,85] that displays a broad range of composition from 47 to 75 wt.%SiO 2 , with obsidian (>70% SiO 2 ) erupted at 12 widely dispersed locations.Prominent obsidian localities include Crater Glass Flow, Glass Flow, Glass Mountain, Glass Mountain North, Glass Mountain South, Grouse Hill, Little Glass Mountain, Grasshopper Flat, Lost Iron Wells, and Cougar Butte.Extensive chemical analysis of the volcanic rocks of the Medicine Lake Highlands by Donnelly-Nolan [32] documents minor and trace element variability among these different obsidian sources.
Obsidian from the Medicine Lake Highlands was a principal trade item and has been found as far distant as the northwestern California coast [86].From XRF analysis, Hughes [87] was able to distinguish between the Glass Mountain, Lost Iron Wells, and Grasshopper Flat obsidian sources on the basis of Sr-Zr variability.Analysis of obsidian artifacts from the Squaw Creek and Lorenzen archaeological sites demonstrated that obsidian from the Medicine Lake Highlands was being utilized for tools 8000 years ago by Indigenous peoples in northern California and was a principal trade item that has been found as far away as the northwestern coast and central valley of California [88].

North Coast Ranges
The North Coast Ranges consist of two parallel sets of mountains trending north from San Francisco to northern Humboldt County (Figure 1).Obsidian here occurs in Napa and Sonoma Counties and originated in the Sonoma volcanic field that produced a range of mafic through silicic volcanic as a consequence of phreatomagmatic ash flows to dome complex events occurring over the last 8 million years [34,89].Obsidian fragments from the volcanic field are found as clasts in the gravels of younger alluvial deposits along the fringes of the volcanic domain.The three sites were sampled in this study: Annadel, Franz Valley, and Napa Glass Mountain.The Annadel obsidian source is a quarry located in Annadel State Park near Santa Rosa in Sonoma County, where dark, grey to black-banded obsidian with a matte luster typically occurs as small pieces within a matrix of deeply weathered brecciated perlite [90].The obsidian source at Napa Glass Mountain, located near St. Helena in Napa County, occurs in a matrix of tuff and perlite and is usually observed as relatively small pieces throughout the ashy matrix [91].This obsidian is typically dark black and opaque, with a glossy to vitreous luster.Pumice-bearing ash flow tuffs outcrop in an extensive area around the Franz Valley in Napa County [91], where a small dome is characterized by green-brown obsidian with a vitreous luster that is similar to the Annadel source in composition but chemically distinct from Napa Valley obsidians by elevated Sr and Ba concentrations [92].Several studies have concluded that both local sources in the northern Coast Ranges, as well as more distant sources in the eastern Sierra Nevada, contributed to the obsidian present at archaeological sites in the greater San Francisco Bay area [51,61,92,93] and that obsidian from local sources dominated during colonial time compared to earlier pre-historic era [18,94,95].It is notable that projectile points of Franz Valley obsidian are documented at site CA-MRN-307 [96], whereas obsidian artifacts from site CA-SMA-113 in the Quiroste Valley Cultural Preserve, some 150 km to the south of the Sonoma Volcanic Field, have been attributed to the Annadel source in Napa Valley on the basis of Sr-Zr and Rb-Sr elemental composition [97].

East-Central California
The geology of east-central California is characterized by Tertiary volcanic overlying pre-Cretaceous granitic rock that intruded into a basement of older gneiss and schist [98,99].Present-day topography reflects a complex landscape shaped primarily by Tertiary-Late Pleistocene volcanism and episodic erosion during the Oligocene, Miocene, and Pliocene epochs.Volcanic rocks include plugs, lava flows, and tuffs that are predominately dacitic but range in composition from basalt to rhyolite [100].Several large occurrences of glassy obsidian associated with late Cenozoic to recent volcanism are present on the eastern side of the Sierra Nevada [50,101,102].From north to south, these include the Bodie Hills, Mount Hicks, Truman Meadows/Queen Valley Mono Craters, Glass Mountain, and Casa Diablo locations in Mono County, plus the Fish Springs, Saline Range, and Coso Range in Inyo County.A variety of archaeological studies detail the expedient and local use of obsidian from a variety of sources, including those exhibiting lower-quality toolstone-grade material.However, a small number of obsidian quarries supplied most of precolonial west-central and southwestern California [19,61,70,[73][74][75]77]. While many variables likely influenced the distribution of obsidian prehistorically, geographic proximity seems to have been a major driver, with the distance decay model explaining obsidian distributions fairly well.

Bodie Hills
The Bodie Hills are located on the western fringe of the Great Basin physiographic province approximately 30 km north of Mono Lake in Mono County, California (Figures 1 and 2).The underlying geologic structure of the Bodie Hills comprises Tertiary volcanics intruding onto a Paleozoic and Mesozoic basement.Created during the middle to late Miocene time [103], the Bodie Hills volcanic field is a large (>700 km 2 ) and episodic eruptive complex [41] located north of Mono Lake and east of Bridgeport.Four trachyandesite stratovolcanoes formed along the margins of the volcanic field, and numerous silicic trachyandesite to rhyolite flow dome complexes were erupted more centrally.The extant topography reflects a complex geologic landscape shaped primarily by volcanism and episodic erosion during the Oligocene-Pliocene time [98,104].Natural obsidian occurrences in the Bodie Hills area take two forms, either discrete terrace outcrops eroding from steep hillsides or as fluvially/alluvially deposited lag flows [105].
The Bodie Hills contain one of the most archaeologically significant obsidian sources in California prehistory.This obsidian source has a well-documented period of utilization that began during the terminal Pleistocene/early Holocene and continued through the contact period [106][107][108].Obsidian artifacts from the Bodie Hills were extensively exported and have been identified in archaeological deposits throughout northern California and into central and southern California [95,105,[108][109][110] and as far west as the Pacific coast [109,111].The Bodie Hills obsidian source (CA-MNO-4527) was first described geochemically by Jack and Carmichael [60] and archaeologically by Singer and Ericson [109], who identified the minimal spatial extent of the geological obsidian deposit, noted the variation in obsidian macro-attributes, and proposed a utilization curve based on an obsidian hydration analysis of what they described as the main quarry area.That study identified the main source to contain 8 km 2 of culturally modified material derived from three primary outcrops.Subsequent research by Halford [105] identified eight additional primary outcrops in two loci, termed Bodie Hills North and Bodie Hills West, as well as a substantial cobble flow trailing from them.Field surveys during those studies identified 9 km 2 of previously unreported obsidian deposits.In total, 14 km 2 of flakestone-viable obsidian deposits from 14 primary outcrops have been identified within the Bodie Hills [105].Singer and Ericson [109] did not attempt to segregate obsidian subsources within the Bodie Hills deposit, although they did note differences in obsidian macroscopic attributes and reasoned that these were likely reflective of variations in trace chemical composition.Bettinger et al. [19] observed that Bodie Hills obsidian exhibits a dominant gray-green banding interspersed with clear bands containing minute black and white phenocrysts.To date, there have been no attempts to identify geochemically discrete subsources within the Bodie Hills obsidian source.It is unknown if 7 of the 14 culturally significant outcrops sampled for this study represent a uniform source or distinct subsources.

Long Valley Caldera and Mono-Inyo Craters
The Long Valley caldera and the Mono-Inyo Craters volcanic chain are located in eastern California between the Sierra Nevada escarpment and the western edge of the Basin and Range province (Figure 1).This region has been volcanically active over the last 3.5 million years from late Tertiary times to the present due to crustal extension and consequent decompression melting [112].The first silicic eruptions that produced obsidian occurred from 2.2 to 0.8 Ma (1 Ma = 10 6 years B.P.) in the vicinity of Glass Mountain within the Long Valley caldera [112][113][114] formed from the eruption of the 600 km 3 Bishop Tuff 760,000 years ago.The caldera-forming eruption was followed by Lookout Mountain's early rhyolites at 760 to 650 Ka (1 Ka = 10 3 years B.P.).Recent rhyolitic eruptions over the past 50,000 years produced the Mono-Inyo Craters lineament of some 30 rhyolite centers along a volcanic chain localized along a narrow fissure system extending from north of Mammoth Mountain through the western moat of Long Valley caldera to Mono Lake [112,115,116] and include Obsidian Dome, Glass Creek Dome, Mono Craters, Wilson Bute, Deadman Creek Dome, and Panum Crater obsidian sources sampled in this study.The majority of these domes are Holocene in age, with the youngest eruptions occurring at the northern end of the chain less than a thousand years ago [115,117].
Analogous to the situation at the Medicine Lake Highlands, a diversity of rhyolite compositions erupted across the Long Valley/Mono-Inyo magmatic system.The Casa Diablo obsidian source consists of many discrete outcrops comprising more than 8 km 2 of geological exposure within more than 300 km 2 of the caldera, occurring as obsidian eroding from the margins of resurgent rhyolitic domes, as secondary deposits of waterworn obsidian cobbles within Pleistocene lake sediments, and as airflow beds containing smaller clast-sized obsidian that typically is insufficiently large to function as a source for flaked stone tools [116].
Deposits containing obsidian nodules occur widely on Glass Mountain and in its vicinity [71].Although its use was primarily local, Glass Mountain obsidian has been recognized at sites in the western Sierra Nevada Mountains and in the Central Valley, California [117].Hughes [63] argued that obsidian from Glass Mountain could be distinguished from the obsidian that erupted during the last millennium at Mono Craters based on differences in Mn-Zr character determined by laboratory XRF analysis, and Hughes [118] observed differences in the Sr-Ti-Ba character of obsidians from three different portions of the Casa Diablo source at Lookout Mountain.Three obsidian artifacts found in the Long Valley caldera but of unknown source were included in this study: a small projectile point, biface, and edge-modified flake tool.

Saline Range and Inyo Mountains
The Saline Range is a low-lying mountain range in Inyo County, California, of volcanic origin that is bounded by Eureka Valley to the north and Saline Valley to the south (Figures 1 and 2).Beginning about 4 Ma, during late Neogene time, volcanic rocks, including obsidian-bearing rhyolitic flows and tuffs, were emplaced in the northwestern portion of the Saline Range over a pre-existing topography consisting of Proterozoic and Paleozoic marine sediments, Mesozoic plutons, and Cenozoic volcanic and sedimentary rocks that was disrupted later by Basin and Range faulting to create complex outcrop patterns [119][120][121].Volcanic rocks that include the obsidians analyzed in this study are located on the northeastern part of Saline Valley along the eastern base of the Saline Range on the west side of Steele Pass.The age of the obsidian is not known, but the felsic rocks in the region generally underlie more basic extrusive Pliocene and early Quaternary caprocks [122,123].The Inyo Mountains are a N-S trending fault-block that forms the eastern boundary of the Owens Valley and the western boundary of Saline Valley.
Alluvial deposits containing obsidian nodules and debitage have been known in the eastern arm of Saline Valley and adjacent Inyo Mountains since the mid-1970s [124], with outcrops of obsidian-bearing Pliocene rhyolite flows and ash flow tuffs identified on the western side of the Saline Range and in remote areas of the volcanic tableland in the Steele Pass area in the mid-late 1990s [125].Saline Range obsidian has been considered a source of archaeological obsidian [126,127].Early archaeological investigations in the adjacent Owens Valley identified multiple types of obsidian toolstone material.Subsequent geochemical research postulated three subsources for Saline Range obsidian based on Sr-Zr and Sr-Ba relationships [65].Additional archaeological studies at the Eureka Dunes and in the Inyo Mountains identified another unrecognized obsidian source, Eureka Dunes Unknown.It is thought that geochemically distinct subsources originate from within the Saline Range [125]; although archaeological studies designed to spatially define the geological sources and their physical extent were never completed.
Archaeological obsidian has been recognized at multiple montane sites in east-central California.Three high-elevation sites (CA-170-08-00-S11, CA-170-08-00-S15, and CA-170-08-00-S21) located in the Inyo Mountains were documented during a high-elevation archaeological investigation [128].Located mid-slope on a north-facing aspect, just west of the Inyo Mountain crest, site CA-170-08-00-S11 is a substantial high-elevation prehistoric site containing 35 fragmentary flaked tools and 400 debitage specimens.The associated artifact deposit evidences high-elevation resource procurement that spanned more than 3500 years.Activities within the site focused on stone tool production and repair, likely associated with large mammal hunting.Site CA-170-08-00-S15 is a large, high-elevation prehistoric site composed of 39 formal flake tools and a large, variable-density, lithic debitage scatter covering nearly three acres.The artifact deposit evidences frequent and continuous use from the terminal Pleistocene through the late Holocene.Site CA-170-08-00-S21 is an extensive high-elevation prehistoric habitation site containing a diverse assemblage of artifacts and features.This resource dates to the last thousand years, with most of the assemblage associated with activities spanning the last 650 years.

Fish Springs
The Quaternary Big Pine volcanic field, located in the Owens Valley between the towns of Big Pine and Independence (Figures 1 and 2), consists of about 40 volcanic vents that cover an area of approximately 500 km 2 and were active between 1.2 Ma to 16 Ka [129].A single coulée composed of homogeneous high-silica rhyolite [130] dated at 0.99 Ma [131] is present in the vicinity of Fish Springs.The outer portions of the coulee are composed of felsitic rhyolite, with internal portions consisting of pumiceous perlite containing obsidian nodules [132].Known as ta'kapi by the Owens Valley Paiute [133], this high-silica rhyolite has been utilized for tools locally since prehistoric times [76].Nodular obsidian suitable for toolstone also occurs in small pumiceous perlite domes disturbed by historic mining [134] and has been observed as nodules in alluvial sediments along slopes of the adjacent volcanic hill [102].Stevens [135] notes that Fish Springs obsidian is common at high-elevation sites in the southern Sierra Nevada Mountains 20 km or more to the west, and a few Fish Springs obsidian artifacts have been reported in Death Valley [136].Bettinger et al. [19] note that much of Fish Springs obsidian has a green hue, varies from near-transparent to near-opaque in character with the transition marked by contorted flow bands, and contains abundant black and white phenocrysts.Instrumental neutron activation analysis (INAA) by Eerkens and Glascock [102] suggests that the Fish Springs obsidian can be distinguished from other west-central obsidian sources on the basis of trace element character.

Coso Range
The Coso Range lies at the western edge of the Basin and Range physiographic province some 280 km southeast of Mono Lake (Figures 1 and 2) in an area underlain by plutonic and metamorphic rocks of Mesozoic age that have a thin Plio-Pleistocene volcanic and sedimentary cover.Within the Coso Range, the Coso volcanic field (CVF) contains volcanic features ranging from pyroclastic deposits, explosion craters, debris flows, and rhyolite domes [137].Volcanic rocks having a wide range of compositions were erupted over the last 6 Ma [138].Volcanism was episodic, with eruptions in particular locations of the volcanic field separated by periods of inactivity.For example, basaltic to rhyolitic eruptions occurred in the eastern Coso Range between 4 to 3 Ma, which was followed by a quiescent period of inactivity from 3 to 2.1 Ma [137,138].Volcanism in the Coso Range continued until the late Pleistocene time [137,139].
The CVF contains at least 38 high-silica rhyolite volcanic extrusions of the Late Pleistocene age [42] that occur most commonly as steep-sided domes and less frequently as lava flows erupted onto the pre-Cenozoic basement rocks.Primary obsidian occurs in thick flow bands emanating from steep rhyolite domes.Younger volcanic rocks cover most of the southwesterly and northwesterly areas of the CVF obsidian domes, and some domes are surrounded by rings of pyroclastic deposits consisting of obsidian, pumiceous rhyolite, and small amounts of basement rock [140].Occasionally, lava domes amalgamate to form compound structures, of which the Sugarloaf Mountain complex is the largest.Because of their youthfulness, most CVF domes and flows are still covered with a blocky glassy pumiceous carapace so that their obsidian interior is only exposed in roadcuts or quarries, if exposed at all.
As noted in Table 1, CVF obsidians are crystal-poor, high-silica, metaluminous rhyolites that contain 77 ± 0.6% SiO 2 [42] that are broadly similar in major element composition to obsidian present at other rhyolite centers in eastern California and western Nevada [60], including those examined in this study.On the basis of K-Ar geochronology [137] and geochemical character [42], the CVF rhyolites can be divided into different groups (Figure 3) that were progressively erupted at c. 1040 ± 20 Ka, 587 ± 18 Ka, 235 ± 25 Ka, 170 ± 11 Ka, 160 ± 30 Ka, 89 ± 10 Ka, and 63 ± 9 Ka.Bacon et al. [42] noted that the dome and flow surface morphology, geological field relationships, and age dating results indicate that each rhyolite group consists of essentially coeval extrusions that occurred in time spans that were very short compared to the overall length of CVF magmatism.
The two oldest domes, with ages of 1040 Ka and 587 Ka, not shown in Figure 3, are small outcrops in the far northeast and central areas of the CVF that are much more extensively eroded than other CVF rhyolites.At least eight individual rhyolite bodies of intermediate size are scattered across the north-central portion of the CVF and comprise the 235 Ka suite.The three members of the 170 Ka group are small rhyolite bodies, one located in the far north and the other two present in the far south of the CVF.Bacon et al. [42] point out that one group of very geochemically similar rhyolites, consisting of four domes including South Sugarloaf Mountain and one extensive lava flow in the center of the CVF could not be dated by the K-Ar method with an acceptable precision.An age of 160 Ka was assigned to this rhyolite suite based on the obsidian hydration rind thickness and because geological field evidence suggests that these outcrops were closer in age to the 170 Ka suite than to the younger rhyolite group erupted at 89 Ka.Three large domes in the Joshua Ridge area in the southern portion of the CVF comprise the 89 Ka group.The 15 youngest domes and flows comprise an arcuate pattern from north to south in the middle zone of the CVF and were formed at 63 Ka.Included in this group are the large domes of Cactus Peak and Sugarloaf Mountain (termed East Sugarloaf Mountain in Figure 3) that were emplaced through the older South Sugarloaf Mountain dome.
The eruptive history of the CVF, with most domes and flows of all rhyolite groups except the oldest dome venting near the center of the field, has produced a complex spatial distribution of domes and flows in which domes and flows of different ages are intermingled in close proximity (Figure 3).In a comprehensive geological and geochemical examination of the CVF, based on the previous age control of Duffield et al. [137], the distribution of CVF rhyolites was explained by Bacon et al. [42] in terms of an emplacement model in which a long-lived central magma reservoir was present beneath the Coso Range that episodically vented throughout the course of its compositional evolution history directly above its center, but with dikes radiating from the central magma chamber that formed outlying domes.The subtle variations in trace element composition observed within the CVF obsidian suite by Bacon et al. [42] are considered to be characteristic of specific batches of magma, erupted at different times, which had undergone slightly different degrees of crystal fractionation and chemical differentiation within the main subcrustal magma reservoir.This situation, which has resulted in the juxtaposition of domes and flows of different geological ages, makes the archaeological attempt to identify the different CVF obsidian subsources on some basis other than direct age dating quite challenging and has generated an extensive discussion in the literature about the number and character of archaeologically significant CVF subsources [11,12,53,61,63,70,77,[141][142][143][144][145].
most of the southwesterly and northwesterly areas of the CVF obsidian domes, and some domes are surrounded by rings of pyroclastic deposits consisting of obsidian, pumiceous rhyolite, and small amounts of basement rock [140].Occasionally, lava domes amalgamate to form compound structures, of which the Sugarloaf Mountain complex is the largest.Because of their youthfulness, most CVF domes and flows are still covered with a blocky glassy pumiceous carapace so that their obsidian interior is only exposed in roadcuts or quarries, if exposed at all.
As noted in Table 1, CVF obsidians are crystal-poor, high-silica, metaluminous rhyolites that contain 77 ± 0.6% SiO2 [42] that are broadly similar in major element composition to obsidian present at other rhyolite centers in eastern California and western Nevada [60], including those examined in this study.On the basis of K-Ar geochronology [137] and geochemical character [42], the CVF rhyolites can be divided into different groups (Figure 3) that were progressively erupted at c. 1040 ± 20 Ka, 587 ± 18 Ka, 235 ± 25 Ka, 170 ± 11 Ka, 160 ± 30 Ka, 89 ± 10 Ka, and 63 ± 9 Ka.Bacon et al. [42] noted that the dome and flow surface morphology, geological field relationships, and age dating results indicate that each rhyolite group consists of essentially coeval extrusions that occurred in time spans that were very short compared to the overall length of CVF magmatism.[137] for the obsidian sites sampled by [42].
The two oldest domes, with ages of 1040 Ka and 587 Ka, not shown in Figure 3, are small outcrops in the far northeast and central areas of the CVF that are much more extensively eroded than other CVF rhyolites.At least eight individual rhyolite bodies of intermediate size are scattered across the north-central portion of the CVF and comprise the 235 Ka suite.The three members of the 170 Ka group are small rhyolite bodies, one located in the far north and the other two present in the far south of the CVF.Bacon et al. [42] point out that one group of very geochemically similar rhyolites, consisting of four domes including South Sugarloaf Mountain and one extensive lava flow in the center of the CVF could not be dated by the K-Ar method with an acceptable precision.An age of 160 Ka was assigned to this rhyolite suite based on the obsidian hydration rind thickness and  [137] for the obsidian sites sampled by [42].
The CVF as a source for archaeological obsidian, first noted by Farmer [146], Heizer and Treganza [147], and Harrington [148], has been extensively discussed over the past half-century, beginning with Jack and Carmichael [60].Many of the CVF rhyolite localities and pyroclastic deposits contain workable obsidian glass that has been quarried for tools by the indigenous population, as documented from excavation at 34 CVF sites [52,53].CFV obsidian was extracted from numerous small quarries and pebble lag deposits [143] both before and after the arrival of bow-and-arrow technology some 1500 years ago for both use and trade [149].Rochester Cave is a reduction site located < 2 km distant from Sugarloaf Mountain.Another set of artifact samples attributed to the CVF is present at Rose Spring in the Rose Valley, a N-S trending valley located just south of Owens Valley ~15 km northwest of the Coso Range [150].The Rose Spring site, which was a locus of cultural activity multiple times during the past 2000 years [53,151], is located near the northern end of the Rose Valley and is the type site for the Rose Springs series of arrow points [152].

Oak Flat
The Oak Flat archaeological site is in the foothills of the Cuyama Valley of the Sierra Madre Range in Santa Barbara County (Figure 1).Occupation sites in Cuyama Valley are reported to be from the Chumash Middle Period between 3000 to 2000 years ago [154].Flaked stone is abundant at Oak Flat, represented by various types of cryptocrystalline rock and obsidian of unknown origin [155].

LIBS Analysis
The samples used for this study came from two sources.Sets of geological obsidian from the Medicine Lake Highlands and the Northern Coast Ranges were provided by the US Geological Survey (Menlo Park, CA, USA).Some samples from the Coso Range came from the collection of California State University-Bakersfield.Samples from the Bodie Hills, Long Valley/Mono-Inyo, Saline Hills, and Fish Springs and a portion of those from the Coso Range were collected by the authors.Obsidian tools from Long Valley were provided by the US Geological Survey (Menlo Park), while flakes of obsidian debitage from Bodie Hills, Fish Springs, Inyo Mountains, Rose Springs, Rochester Cave, and Oak Flat came from collections curated by the US Bureau of Land Management (Bishop, CA, USA) and California State University-Bakersfield.
LIBS is an application of atomic emission spectroscopy in which a high-energy, shortduration pulsed laser beam is focused on a material to cause material ablation and the generation of a high-temperature microplasma containing its constituent elements on the sample surface.The subsequent dissociation and ionization of small amounts of material within the plasma leads to the generation of continuum and atomic/ionic emission across UV-visible-near IR wavelengths during cooling.Spectral analysis of this light is used to detect the elements present.LIBS is capable of qualitative, semi-quantitative, and quantitative analysis.
The work reported here utilized a SciAps, Inc. (Woburn, MA, USA) Z-300 series handheld LIBS analyzer.This instrument uses a proprietary pumped solid-state 1064 nm Nd-YAG pulsed nanosecond laser that generates a 6 mJ laser pulse with a nominal 100 µm beam size at a 10-Hz firing rate.It has a built-in camera for beam targeting and the capability to flow an inert gas (typically Ar) across the sample surface for plasma confinement and signal enhancement.The light signal from the plasma emission is collected, typically after a 650 ns delay over a 1 ms integration time, and passed by fiber optic cable into three spectrometers with time-gated, charge-coupled diode detectors having respective spectral ranges and resolutions of 190 to 365 nm with a full-width half maximum (FWHM) value of 0.18 nm, 365 to 620 nm with an FWHM value of 0.24 nm, and 620 to 950 nm with an FWHM value of 0.35 nm.This analytical procedure produces a composite broadband LIBS spectrum over the 23,432 channels of the spectrometer, such as that shown in Figure 4.
The 3D translational stage that permits rastering the laser beam across the surface of a sample is computer-controlled for automatic adjustment of the laser focus at each sample location.Automated stage movement permits analysis over a 2 × 2 cm area, with the raster pattern, spacing, and number of laser shots at each location determined by the user.For this study, four laser shots for data collection were taken at each point on five different 4 × 3 point grids covering an area of approximately 1.0 × 0.7 mm, following two laser shots for sample surface cleaning.The 48 accumulated individual analyses were averaged to produce a composite LIBS spectrum, such as that shown in Figure 4, with five such average spectra acquired from fresh obsidian flakes for each sample.These broadband LIBS emission spectra provide a spectral geochemical fingerprint that includes all elements present in the sample above their intrinsic limits of detection by LIBS.
Typically, some 20 elements are present at significant emission intensity in our suite of obsidian LIBS spectra-H, Li, C, O, Na, Mg, Si, Al, K, Ca, Ti, V, Mn, Fe, Rb, Sr, Ba, La, and Y.The presence of a substantial H emission peak in all our obsidian spectra raises the intriguing possibility that LIBS analysis could be used to quantitatively measure the water content of obsidian.Representative LIBS broadband spectra between 150-950 nm for the obsidian analyzed in this study.This sample from Sugarloaf Mountain in the Coso Volcanic Field has a composition similar to the specimen in Table 1, analyzed by Bacon et al. [42], namely 35% Si, 6.5% Al, 3%-4% Na and K, 0.5 to 0.7% Fe and Ca, and ~300 ppm Mg and Rb.The Ar peaks in the infrared portion of the spectrum between 700 and 920 nm (shown in grey text) are from the Ar purge gas used via the handheld analyzer for the LIBS analysis.
The 3D translational stage that permits rastering the laser beam across the surface of a sample is computer-controlled for automatic adjustment of the laser focus at each sample location.Automated stage movement permits analysis over a 2 × 2 cm area, with the raster pattern, spacing, and number of laser shots at each location determined by the user.For this study, four laser shots for data collection were taken at each point on five different 4 × 3 point grids covering an area of approximately 1.0 × 0.7 mm, following two laser shots for sample surface cleaning.The 48 accumulated individual analyses were averaged to produce a composite LIBS spectrum, such as that shown in Figure 4, with five such average spectra acquired from fresh obsidian flakes for each sample.These broadband LIBS emission spectra provide a spectral geochemical fingerprint that includes all elements present in the sample above their intrinsic limits of detection by LIBS.
Typically, some 20 elements are present at significant emission intensity in our suite of obsidian LIBS spectra-H, Li, C, O, Na, Mg, Si, Al, K, Ca, Ti, V, Mn, Fe, Rb, Sr, Ba, La, and Y.The presence of a substantial H emission peak in all our obsidian spectra raises the intriguing possibility that LIBS analysis could be used to quantitatively measure the water content of obsidian.

Data Processing and Chemometric Analysis
The foundation for this study is that the concept of LIBS geochemical fingerprinting, i.e., that the full LIBS broadband spectrum or a sufficiently large portion thereof, contains sufficient compositional information to provide a unique chemical description of any particular sample.Thus, if advanced statistical signal processing and classification techniques are applied to a sufficiently robust spectral data set, it should be possible to distinguish samples of the same kind originating from one place from those originating in another.Obsidian is a particularly challenging material for such geochemical finger printing because it is a high-Si, rhyolitic glass that tends to have similar bulk composition wherever Representative LIBS broadband spectra between 150-950 nm for the obsidian analyzed in this study.This sample from Sugarloaf Mountain in the Coso Volcanic Field has a composition similar to the specimen in Table 1, analyzed by Bacon et al. [42], namely 35% Si, 6.5% Al, 3%-4% Na and K, 0.5 to 0.7% Fe and Ca, and ~300 ppm Mg and Rb.The Ar peaks in the infrared portion of the spectrum between 700 and 920 nm (shown in grey text) are from the Ar purge gas used via the handheld analyzer for the LIBS analysis.

Data Processing and Chemometric Analysis
The foundation for this study is that the concept of LIBS geochemical fingerprinting, i.e., that the full LIBS broadband spectrum or a sufficiently large portion thereof, contains sufficient compositional information to provide a unique chemical description of any particular sample.Thus, if advanced statistical signal processing and classification techniques are applied to a sufficiently robust spectral data set, it should be possible to distinguish samples of the same kind originating from one place from those originating in another.Obsidian is a particularly challenging material for such geochemical finger printing because it is a high-Si, rhyolitic glass that tends to have similar bulk composition wherever found, and it is only on the basis of minor-and trace-element compositions that obsidian of different provenance can be distinguished.We have approached this challenge using chemometric analysis of broadband LIBS spectra.
Chemometrics comprise a diverse group of techniques for the statistical treatment of chemical data, particularly the very large data sets obtained using many different kinds of spectroscopy [156,157].The impact of this approach may be particularly profound with LIBS data because it considers the entire content of a broadband LIBS spectrum or suite of related LIBS spectra, thereby utilizing all of the information contained therein rather than considering just a small number of user-selected spectral features.Details of the procedures used in this study for statistical signal processing and chemometric analysis have been described in detail by Harmon et al. [158] and are not repeated here.Data for each of the spectral processing and classification tasks in this study took the form of LIBS broadband spectra of the type shown in Figure 4.
In our sample classification and discrimination analysis, samples refer to individual items in a set of similar objects (i.e., LIBS spectra) for which a label is desired (e.g., source of origin), and classes refer to the sample groups that are to be discriminated (i.e., obsidian sources or artifact locations).Each observation is a single normalized LIBS spectrum for a particular sample.The central chemometric task for this study is sample discrimination, which requires the development of a set of mathematical features that characterize each sample within a population, in this instance, the LIBS plasma emission intensities at the different spectral wavelengths of the broadband spectrum.The only requirement for a feature is that it must consistently have the same meaning for each input to the classifier.So, for example, if a specific feature for one sample is the Si emission line intensity at 288.16 nm in the LIBS spectrum, then that feature must be the intensity for the same spectral emission line for all samples.

Data Preprocessing
The first stage of data preprocessing of acquired LIBS spectra consisted of two parts: baseline correction and Ar line removal.Baseline correction consisted of first estimating the baseline (i.e., the portion of the spectrum that is not informative) and subtracting it from the original spectrum.The baseline was estimated by first using a Hampel filter [159] to remove any strong emission line magnitudes.The Hampel filter uses a sliding window in which the data surrounding an emission line is used to calculate the median absolute deviation (MAD).If the magnitude of the emission line is greater than a factor of the MAD (factor = 3 for this study), the magnitude is replaced with the MAD.The sliding window size used for the Hampel filter was 2% of the spectrum length.
Once the non-informative peaks were removed, the baseline estimate was smoothed using a sliding median window with a window size equal to 0.5% of the spectrum length.Smoothing was included to generate a baseline estimate that captured constant offsets or gradual drifts in magnitude (i.e., low-frequency effects) rather than any high-frequency variations in emission line magnitudes since subtracting the latter might eliminate informative but low-magnitude emission line responses.Baselines were estimated for each spectrum and subtracted.Once the baseline was removed, the spectrum was clipped to remove Ar spectral lines between 675 and 935 nm (Figure 4).
Rather than using a priori knowledge of the elements that are hypothesized to distinguish samples from different sources, classifiers develop statistical models based on the behavior of each spectral feature and then, based on these models, determine which features are most important for the discrimination between the classes of samples to be labeled.No a priori knowledge of composition is required for discrimination, so it is possible to examine the features that the classifier used to make its decisions once a classifier is trained and, thereby, acquire an estimate of the compositional differences by which samples were discriminated.
The performance of any classification approach is dependent on both the within-class and across-class variability of the measurements.The former defines the ability to model a particular class by asking the question: Does the training data set fully represent the test data set?On the other hand, the latter indicates how well the differences between classes upon which the classification decision rests are modeled.Classes will be completely discriminable if the clusters of their populations do not overlap within feature space.If the within-class variability is high (i.e., the population cluster is large), a class is more likely to overlap with other similar classes, thereby reducing classification performance.If the across-class variability is small (i.e., the population clusters are close together), then the classes are more likely to overlap, thus reducing classification performance.
Variability also influences the number of samples necessary for the adequate modeling of each class and estimating the decision boundaries between classes.High within-class variability requires a large number of samples to ensure that the training data adequately represent the actual population.Low across-class variability requires a higher number of samples in order to ensure that the most accurate decision boundary between the classes is determined.The latter can be conceptualized by thinking of two classes with clusters that are well-separated in feature space.Any number of lines can be drawn between the two classes to separate them.However, as the class clusters become closer and closer together, the number of lines that maximize accurate separation (i.e., classification accuracy) becomes fewer and fewer, and the number of samples necessary to ensure that the best separation line is estimated increases.

Spectral Similarity
A requirement for high-quality handheld LIBS analysis is that the surface presented to the instrument faceplate be flat so that the laser beam strikes the sample surface orthogonally.Spectral emission intensities will be diminished when this is not the case.In this study, each obsidian flake analyzed was in its natural form (i.e., a conchoidal surface with a degree of flatness that depended on the size of the flake).Thus, a high degree of signal-to-noise ratio (SNR), i.e., the magnitude of a LIBS emission line compared to the level of background noise, variation was observed.The inclusion of low SNR spectra could potentially confound the modeling of the classes by increasing within-class variability.Given the low cost of measuring spectra with the handheld LIBS system, a large number of spectra were collected, and a second stage of preprocessing was added to detect and discard low SNR spectra.
Low SNR spectra were detected using a similarity analysis where similarity was measured as the pairwise correlation distance, which is defined as the Pearson correlation coefficient between two spectra subtracted from one: where C m,n is the pairwise correlation distance between spectra m and n, s i is the emission magnitude at wavelength i, s is the average emission magnitude across the spectrum, and W is the number of emission lines.For each of the five obsidian pieces analyzed for every sample, the pairwise correlation distance between spectra was calculated.For each spectrum, the median correlation distance with the other spectra was compared to a threshold.If the median correlation distance was greater than the threshold, the spectrum was rejected as too dissimilar.If more than 50% of the spectra for a piece were rejected, all the spectra were rejected for that piece.This second restriction was added with the reasoning that if less than 50% of the spectra for a piece were similar, the true spectral representation for that piece is likely unknown.The threshold for the correlation distance was chosen as 0.02 based on the probability density functions (PDFs) of the median correlation distances for each of the obsidian sources (Figure 5).For each source, the median correlation distances for all of the spectra were used to estimate the PDF using kernel density estimation.Note that the majority of spectra for each source have median correlation distances below 0.02 (the maxima of the PDFs are generally less than 0.02), with some scattered peaks at greater correlation distances, likely indicating low SNR spectra.A post hoc analysis was conducted at the conclusion of the source discrimination study (see Section 5.1.3),which validated the selection of this threshold for these data.
Overall, 28.6% of the spectra that were collected for this study were rejected.With this spectra rejection technique, it is possible to remove entire samples from the data set if all the pieces' spectra from a sample of obsidian are rejected; however, using this threshold did not result in any sample removal.Table 2 lists the percentage of samples, pieces, and spectra that were rejected by class.Spectral rejection was spread somewhat unevenly across the classes: 40.6% for Fish Springs, 37.3% for the Medicine Lake Highlands, 34.1% for Bodie Hills, 33.9% for the North Coast Ranges, 32.3% for Long Valley/Mono-Inyo, 30.0% for the Saline Range, and 15.9% for Coso (Table 2).
PDFs are generally less than 0.02), with some scattered peaks at greater correlation distances, likely indicating low SNR spectra.A post hoc analysis was conducted at the conclusion of the source discrimination study (see Section 5.1.3),which validated the selection of this threshold for these data.Overall, 28.6% of the spectra that were collected for this study were rejected.With this spectra rejection technique, it is possible to remove entire samples from the data set if all the pieces' spectra from a sample of obsidian are rejected; however, using this threshold did not result in any sample removal.Table 2 lists the percentage of samples, pieces, and spectra that were rejected by class.Spectral rejection was spread somewhat unevenly across the classes: 40.6% for Fish Springs, 37.3% for the Medicine Lake Highlands, 34.1% for Bodie Hills, 33.9% for the North Coast Ranges, 32.3% for Long Valley/Mono-Inyo, 30.0% for the Saline Range, and 15.9% for Coso (Table 2).

Visualization
Principal component analysis (PCA) is a technique that transforms data via linear combinations of the features such that each successive component explains the greatest amount of variance remaining in the data (e.g., the first component explains the greatest amount of variance possible via linear transformation and the second component represents the greatest amount of the remaining variance).In this way, PCA is primarily used for dimension and noise reduction [160].A small number of components can capture a large amount of the information in a data set and, thereby, provide a means to reduce the number of features or dimensions of a spectral dataset.If the data dimensions are reduced to two or three, PCA can also be used to visualize high-dimensional data by plotting the transformed features (i.e., the PC scores).A 2D or 3D score plot can be used to graphically display the degree of clustering of LIBS spectra from the same class and the separation between samples of different classes.In operation, each principal component is an array of weights that is the same dimension as the spectrum length.A PC score for a spectrum is the sum of the multiplication of these weights and the emission magnitudes for that spectrum.The weights approach zero for wavelengths that have relatively constant magnitudes across spectra and increase in magnitude as the variance in emission line magnitudes increases.Thus, data from wavelengths that are uninformative (i.e., that do not change regardless of the sample under measurement) are discarded via the multiplication of zero weights, and data from informative wavelengths are emphasized via the multiplication of large weights.Although a similar process could be performed by picking individual wavelengths based on the variance of the magnitudes, PCA is a more efficient approach that selects multiple informative wavelengths simultaneously per component.Thus, while a few components may well represent the data, this does not necessarily equate to a few spectral peaks representing the data.
PCA score plots are analogous to typical bivariate geochemical diagrams since samples of similar composition will lie close together on the plots, and compositionally dissimilar samples will lie far apart.Given these characteristics, PCA score plots also provide a means of outlier rejection through observation.

Sample Classification/Discrimination
In classification problems, such as our obsidian discrimination application, the chemometric task is to create a mathematical model for a suite of LIBS spectral measurements that can assign a set of observations for each sample to its appropriate class.In this study, we used a supervised approach in which the correct class labels are known for each observation.While the labels are necessary for training, once a classifier is trained, no labels are necessary to use the classifier to predict the class of newly collected data.
The method used to select data for training and testing is particularly important because it affects the robustness and validity of the classification performance.In this study, retrospective data are used for training and assessing the accuracy of the classifier.However, it is important that using this approach avoids data leakage between training and testing such that the classifier does not receive information about the test data during the training phase.Thus, the same observations cannot be used simultaneously for both training and testing since the classifier would be provided with a priori information about the test data.Cross-validation provides a principled approach to address this issue.The cross-validation technique of k-folds is a partition approach in which data are randomly partitioned into different groups.Then, multiple train/test operations are performed, one for each partition of data.Initially, a partition of data is selected as the test data set, and all of the remaining partitions are used for training the classifier.This process is repeated until each partition has been used for testing.
Although it is possible to split the data at random into folds, when multiple observations occur per sample, as in this study, it is necessary to split the data into pseudo-random folds such that all the observations for a particular sample occur in the same fold.In this way, the classifier is not both trained and tested on observations of the same sample, and no data leakage occurs.A sample-based, stratified approach was used to generate the folds such that, where possible, an equal number of samples per class were placed into each fold, and all observations for any one sample occurred in the same fold.
The result after all the partitions have been processed is that each spectrum in the data set has received one set of classifier confidences.For example, with a 7-class classification schema, a spectrum would have seven confidences, with each value indicating the confidence that the spectrum belongs to one of the seven classes.Using this spectra-based approach to classification results in multiple responses per sample where a single class assignment per sample is desired.To reduce the multiple responses into a single response per sample, the class confidences are averaged across all the sample's observations, and the class with the highest average confidence is used for class assignment.
Although the number of spectra collected in this study was large, the number of samples for several of the classes was small, and a concern for small data sets is that the classification accuracy can be influenced by which samples are grouped together into each fold.For this reason, multiple iterations of k-folds were used to generate a distribution of classifier performance.For both the source discrimination and Coso sub-source discrimination, 50 iterations were used to estimate the classifier performance distribution.Five-fold cross-validation was used for the source discrimination, but the number of folds was reduced to three for the Coso sub-source discrimination due to the small number of samples for some of the sub-sources.
PLSDA is a machine learning tool for multivariate discrimination that can be used for data classification [161,162] and to statistically model the structure between sample observations and class labels in order to provide classification discrimination.As a supervised technique, PLSDA uses data with assigned group labels to train a classifier.In PLSDA, high-dimensional sample data are regressed to a categorical matrix that uses a linear model to generate predictor variables, termed latent variables (LV).The LV are linear combinations of the original variables that will ensure the maximum covariance between sample groups in the model and, therefore, allow graphical visualization and understanding of the different patterns within the data and their relationships through plots of LV scores and loadings.PLSDA searches for a subspace in which information from the independent variables, the samples, can be projected onto dependent variables, the group labels, such that the covariance between the independent and dependent variables is maximized.This creates a model for transforming additional samples into estimates of group labels.The model consists of loadings and regression weights that relate to different features of the samples [163].The loadings and regression weights can be considered measures of the importance of different features [164], with the assumption that larger magnitude loadings and regression weights indicate a stronger impact on the model.The number of LVs can be adjusted to maximize the classification accuracy via a cross-validation approach.The model then calculates the predicted probability that a sample belongs to each group in the model using Bayesian statistics.The performance of the classifier can then be tested by inputting additional data to assess the accuracy of the output group labels.
Partial Least Squares Discriminant Analysis (PLSDA) was used for classification.PLSDA is an extension of the Partial Least Squares (PLS) regression algorithm that provides discrete class responses rather than regression estimates.The implementation of PLSDA used in this study is based on the SIMPLS algorithm [165] and was developed using open-source software available at https://github.com/covartech/PRT(accessed on 26 September 2023).

Source Similarity
As noted in Table 1 and discussed in Section 2.2 above, high-silica rhyolite obsidian has a broadly similar chemical composition wherever it is erupted.Thus, the average broadband LIBS spectra for our seven California obsidian sources are likewise broadly similar.Though there are small differences between spectra, it is difficult to distinguish the four spectra without careful visual inspection (Figure 6).Chemometric analysis of our LIBS spectral data is, therefore, ideally suited to address the challenge of obsidian source discrimination and artifact provenance determination.

Source PCA
As noted above, the purpose of PCA is to define a small number of uncorrelated variables (i.e., principal components) that are the linear combinations of the measured variables that explain the maximum amount of variance in a dataset.PCA provides a useful tool for identifying whether samples within a dataset are the same or different.A 2D or 3D plot can be used to graphically display the degree of clustering of LIBS spectra from the same class and the separation between samples of different classes.PCA score plots are a product of the computation to identify the underlying structure of the variables in the spectral data and provide a set of uncorrelated principal components that best define the variation present in that data in a compact representation via the projection of principal components into 2D or 3D plots of the PC scores., Elemental loadings from PCA point to the variables that might be responsible for observed compositional differences.PCA loading plots are used to identify which variables have the largest effect on each principal component.Loadings can range from −1 to +1, with values close to −1 or +1, indicating that the variable strongly influences the component, and loadings near to 0, indicating that the variable has a weak influence on the component.Evaluating the loadings can also help characterize each component in terms of the variables.
Minerals 2023, 13, x FOR PEER REVIEW 20 of 39 the four spectra without careful visual inspection (Figure 6).Chemometric analysis of our LIBS spectral data is, therefore, ideally suited to address the challenge of obsidian source discrimination and artifact provenance determination.

Source PCA
As noted above, the purpose of PCA is to define a small number of uncorrelated variables (i.e., principal components) that are the linear combinations of the measured variables that explain the maximum amount of variance in a dataset.PCA provides a useful tool for identifying whether samples within a dataset are the same or different.A 2D or 3D plot can be used to graphically display the degree of clustering of LIBS spectra from the same class and the separation between samples of different classes.PCA score plots are a product of the computation to identify the underlying structure of the variables in the spectral data and provide a set of uncorrelated principal components that best define the variation present in that data in a compact representation via the projection of principal components into 2D or 3D plots of the PC scores., Elemental loadings from PCA point to the variables that might be responsible for observed compositional differences.PCA loading plots are used to identify which variables have the largest effect on each principal PCA score plots for the seven obsidian sources analyzed in this study, Medicine Lake Highlands, North Coast Ranges, Bodie Hills, Long Valley/Mono-Inyo, Fish Springs, Saline Range, and Coso Range, for which the number of samples per source ranged from 8 to 42, are shown in Figure 7.To reduce the number of symbols per plot and thereby increase visibility, the LIBS spectra for each sample were averaged prior to estimating the PCA model.In the initial observations of the PCA score plot, four outlier samples were noted: three samples from North Coast Ranges and one sample from Long Valley/Mono-Inyo.These four samples were discarded from all subsequent analyses.Highlands, North Coast Ranges, Bodie Hills, Long Valley/Mono-Inyo, Fish Springs, Saline Range, and Coso Range, for which the number of samples per source ranged from 8 to 42, are shown in Figure 7.To reduce the number of symbols per plot and thereby increase visibility, the LIBS spectra for each sample were averaged prior to estimating the PCA model.In the initial observations of the PCA score plot, four outlier samples were noted: three samples from North Coast Ranges and one sample from Long Valley/Mono-Inyo.These four samples were discarded from all subsequent analyses.The diagrams for the Medicine Lake Highlands and North Coast Ranges in panel (a), Bodie Hills and Long Valley/Mono-Inyo in panel (b), and Fish Springs, the Saline Range, and Coso in panel (c) display the relative positions of the data points for each source in principal component space and illustrate the compositional differences amongst the different sources.The mean values for each source dataset are shown in (d), which also shows the convex hull of the data points used to generate the means, i.e., the space that contains all the pairwise line segments between a set of points and that is also bounded by those points.The samples for each source appear to be clustered in Figure 7, with class overlap along the borders between these classes in feature space, suggesting that confusions are due to physical similarities between the sources rather than poor models of the sources.
The PCA loading plots for the first two principal components, which together explain just under 65% of the total variation in our obsidian dataset, are displayed in Figure 8.As The diagrams for the Medicine Lake Highlands and North Coast Ranges in panel (a), Bodie Hills and Long Valley/Mono-Inyo in panel (b), and Fish Springs, the Saline Range, and Coso in panel (c) display the relative positions of the data points for each source in principal component space and illustrate the compositional differences amongst the different sources.The mean values for each source dataset are shown in (d), which also shows the convex hull of the data points used to generate the means, i.e., the space that contains all the pairwise line segments between a set of points and that is also bounded by those points.The samples for each source appear to be clustered in Figure 7, with class overlap along the borders between these classes in feature space, suggesting that confusions are due to physical similarities between the sources rather than poor models of the sources.
The PCA loading plots for the first two principal components, which together explain just under 65% of the total variation in our obsidian dataset, are displayed in Figure 8.As expected for high-silica rhyolite glass, the same suite of 16 major and minor elements are reflected in both principal components: H, Li, O, Na, Mg, Al, Si, K, Ca, Ti, V, Cr, Mn, Fe, Sr, and Ba.Overall, the magnitude of the individual loadings is low, only extending to less than −1 or greater than +1 only for Ca and Na in the PC1 plot and Si, Al, Ca, and Na in the PC2 plot.Li, Mg, K, and Ba are important contributors to PC1 and Li to PC2.Loadings for the elements Si and Al are largely distinct from those for the other elements (i.e., negative for PC1 and positive for PC2), and loadings for the transition metals Cr and Mn only influence PC2.H is present in both loading plots, raising the intriguing possibility that it might be possible to make quantitative obsidian hydration measurements via LIBS.
Trace elements, such as Sc, Zn, Rb, Y, Nb, and the REE that are present in California high-silicas rhyolites at the ppm to 100s of ppm level depending on the specific element [32,34,37,42] have been shown useful in distinguishing archeological obsidian sources by a variety of studies (e.g., [60,70,77,87,92,97,102,145,166,167]) but not observed in our obsidian LIBS spectra above their intrinsic limits of detection, so these elements are not present in the loading plots.Importantly, this work demonstrates that it is possible to distinguish different California obsidian sources on the basis of their major and minor element character through the use of chemometric analysis of LIBS spectral data.
Minerals 2023, 13, x FOR PEER REVIEW 22 of 39 expected for high-silica rhyolite glass, the same suite of 16 major and minor elements are reflected in both principal components: H, Li, O, Na, Mg, Al, Si, K, Ca, Ti, V, Cr, Mn, Fe, Sr, and Ba.Overall, the magnitude of the individual loadings is low, only extending to less than −1 or greater than +1 only for Ca and Na in the PC1 plot and Si, Al, Ca, and Na in the PC2 plot.Li, Mg, K, and Ba are important contributors to PC1 and Li to PC2.Loadings for the elements Si and Al are largely distinct from those for the other elements (i.e., negative for PC1 and positive for PC2), and loadings for the transition metals Cr and Mn only influence PC2.H is present in both loading plots, raising the intriguing possibility that it might be possible to make quantitative obsidian hydration measurements via LIBS.Trace elements, such as Sc, Zn, Rb, Y, Nb, and the REE that are present in California high-silicas rhyolites at the ppm to 100s of ppm level depending on the specific element [32,34,37,42] have been shown useful in distinguishing archeological obsidian sources by a variety of studies (e.g., [60,70,77,87,92,97,102,145,166,167]) but not observed in our obsidian LIBS spectra above their intrinsic limits of detection, so these elements are not present in the loading plots.Importantly, this work demonstrates that it is possible to distinguish different California obsidian sources on the basis of their major and minor element character through the use of chemometric analysis of LIBS spectral data.

Source Discrimination
Our PLSDA results are presented here in the form of bivariate plots that portray the classification of the observations, where each entry in the matrix indicates the percentage of spectra that were identified as belonging to the column class when, in fact, they are

Source Discrimination
Our PLSDA results are presented here in the form of bivariate plots that portray the classification of the observations, where each entry in the matrix indicates the percentage of spectra that were identified as belonging to the column class when, in fact, they are actually members of the row class.The total number of samples per class is shown in square brackets to the right of each row, and the overall classification accuracy is displayed in the title.The bivariate plots were selected from the 50 random instantiations of k-folds based on the proximity of the overall accuracy to the average overall accuracy across the fifty randomizations.Thus, these are representative examples; however, each random instantiation would generate its own bivariate plot.
Our PLSDA classification of the seven obsidian sources analyzed is shown in Figure 9. Overall accuracy for the 7-source discrimination task was 92.3 ± 1.7%.The low standard deviation across the 50 randomizations suggests that there was an adequate number of samples for classification.Although there is a clear class overlap of the seven obsidian sources in the feature space of the PCA scores plot (Figure 7), which is to be expected given the similarity in obsidian major element compositional character (Table 1), the excellent overall success in their discrimination at just over 92% is a sufficient foundation to use for artifact attribution.9. Overall accuracy for the 7-source discrimination task was 92.3 ± 1.7%.The low standard deviation across the 50 randomizations suggests that there was an adequate number of samples for classification.Although there is a clear class overlap of the seven obsidian sources in the feature space of the PCA scores plot (Figure 7), which is to be expected given the similarity in obsidian major element compositional character (Table 1), the excellent overall success in their discrimination at just over 92% is a sufficient foundation to use for artifact attribution.If the potential sources for the artifacts can be further localized to the five obsidian sources on the eastern side of the Sierra Nevada, which, from an archaeological standpoint, are the most likely sources for the artifacts at the sites analyzed [50,68], then the accuracy of attribution might be expected to further improve, given that overall accuracy for discriminating just these five sources was 96.4 ± 1% (Figure 10).If the potential sources for the artifacts can be further localized to the five obsidian sources on the eastern side of the Sierra Nevada, which, from an archaeological standpoint, are the most likely sources for the artifacts at the sites analyzed [50,68], then the accuracy of attribution might be expected to further improve, given that overall accuracy for discriminating just these five sources was 96.4 ± 1% (Figure 10).Because obsidian from the CVF has been recognized to have had widespread co veyance across pre-historic California, particular attention has been given by the Calif

Coso Sub-Sources
Because obsidian from the CVF has been recognized to have had widespread conveyance across pre-historic California, particular attention has been given by the California archeological community to the question of whether discernable subsources are present at CVF.The studies of Duffield et al. [137] and Bacon et al. [42], described in Section 3.3.5 above, established the current understanding of the geological, geochemical, and geochronological character of the CVF.These authors note that obsidian dome and flow surface morphologies, geological field relationships, and dating results indicate that each of the rhyolite groups within the CVF consists of essentially coeval extrusions that occurred in a relatively short time span compared to the overall life of the CVF magmatic system.Cluster analysis of CVF geochemical data for 25 elements yielded the same seven distinct groups (i.e., sub-sources) recognized on the basis of K-Ar dating and geological characteristics.
Subsequent to this definitive geological description and examination of age relationships and the geochemical character of the CVF rhyolites, there have been several attempts within the archeological community to delineate/differentiate subsources within the CVF the basis of trace element geochemistry that have arrived at different and conflicting conclusions.Hughes [63] postulated four geochemical groups or subsources of artifact-quality Coso obsidian on the basis of contrasts in the abundances of the incompatible trace element Rb and Zr in obsidian measured using X-ray fluorescence spectrometry (XRF) from 16 sites across the CVF: (i) a Joshua Ridge group with high Zr and low Rb contents from the Joshua Ridge area that all have K-Ar ages of 89 Ka; (ii) a West Cactus Peak group of sites, two of which have K-Ar ages of 235 Ka, having intermediate-low Zr and high Rb contents; (iii) a West Sugarloaf group having intermediate Zr and intermediate Rb contents but mixed K-Ar ages of 235 Ka and 160 Ka; and (iv) the Sugarloaf Mountain group with low Zr and intermediate Rb contents and also mixed K-Ar ages of 160 Ka and 63 Ka.The four Coso obsidian subsources identified by Hughes [63] correspond to the different age and compositional groups previously identified by Bacon et al. [42], with the exceptions of one Sugarloaf Mountain site and one West Sugarloaf site.It should be noted, however, that Bouey [142] pointed out particular problems with the XRF analysis of obsidian and argued that the Sugarloaf Mountain and West Sugarloaf subsources might not be as easily distinguishable as suggested by Hughes [63].On the basis of a statistical classification analysis (RQ-mode principal components analysis) of INAA trace element data, Ericson and Glascock [144] observed that 14 elements (Rb, Cs, Mn, Hf, Sc, Zr, and the rare earth elements La, Nd, Ce, Eu, Tb, Dy, Yb, and Lu) contributed to a sample classification that recognized the four groupings of Hughes [63] but also included two additional CVF obsidian compositional groups, a conclusion more in line with the initial results of Duffield et al. [137] and Bacon et al. [42] based on K-Ar dating, field relationships, and whole-rock geochemical analysis.Draucker [145] applied stepwise multielement discriminant analysis to high-precision major and trace element analyses obtained by laser ablation inductively-coupled plasma mass spectrometry (LA-ICP-MS), observing that the simple Rb and Zr dual-element approach of Hughes [63] was not successful at any subsource separation.Instead, 16 elements in order of significance (Nb, Ce, Eu, Sr, Mn, Fe, Ti, Ba, Pr, Gd, Y, Nd, Rb, Zn, Ga, and Sm) were observed to be useful in classification.Four distinct obsidian types were recognized that were consistent with the geochronological framework established for the CVF by Duffield et al. [137] and Bacon et al. [42].The Joshua Ridge and the East Sugarloaf groups are the same, as observed by Hughes [63].The other two groups are a West Cactus group and a West Sugarloaf group.The West Sugarloaf type contains samples from the south and southeast Sugarloaf sites on the older South Sugarloaf Mountain, which is part of Hughes' [63] West Sugarloaf group but includes an additional west Sugarloaf location.The West Cactus group includes the West Cactus group of Hughes [63] but includes the newly identified pebble lag Stewart Quarry site that had not been sampled in any previous studies.Another study employing LA-ICP-MS analysis by Eerkens et al. [77] analyzed samples from across the Owens Valley of east-central California.Concentrations were measured for 13 minor and trace elements in obsidian from four CVF locations-West Cactus, Sugarloaf, West Sugarloaf, and Joshua Ridge.Good discrimination was observed on the basis of Ba/Sr and La/Sm ratios, and it was noted that the ratios of Zr/Y, Dy/Yb, Nb/Zn, Mn/V, and Zn/Ti were useful in this context.
A representative example of PLSDA classification of Coso sub-sources is shown in Figure 11.The average overall accuracy across the 50 random instantiations was 84.6 ± 4.1%.Our LIBS analysis (i) recognizes the five Stewart Quarry samples as a separate subsource;

Spectra Rejection Threshold Validation
As discussed in Section 4.2.2, a threshold for spectra rejection during p was selected based on the pdfs of median correlation distances for each sour the optimal threshold for the purposes of this study is the threshold that ma sification accuracy.With this in mind, a validation study was conducted to the 0.02 threshold was the optimal threshold for these data.
The impact of the spectra rejection threshold was tested by re-running th cross-validated source discrimination task with different spectra rejection th

Spectra Rejection Threshold Validation
As discussed in Section 4.2.2, a threshold for spectra rejection during preprocessing was selected based on the pdfs of median correlation distances for each source.However, the optimal threshold for the purposes of this study is the threshold that maximizes classification accuracy.With this in mind, a validation study was conducted to determine if the 0.02 threshold was the optimal threshold for these data.The impact of the spectra rejection threshold was tested by re-running the seven-class cross-validated source discrimination task with different spectra rejection thresholds and comparing the overall accuracy distributions.The results are plotted in Figure 12 with the number of samples that remained after spectra rejection for a given threshold noted in parentheses along the x-axis.For more strict thresholds, some samples were rejected completely.Note that discarding samples can be beneficial to classification since it can reduce the within-class variance; however, it can also be detrimental if discarding samples reduces the training set to too great an extent.

Spectra Rejection Threshold Validation
As discussed in Section 4.2.2, a threshold for spectra rejection during preprocessing was selected based on the pdfs of median correlation distances for each source.However, the optimal threshold for the purposes of this study is the threshold that maximizes classification accuracy.With this in mind, a validation study was conducted to determine if the 0.02 threshold was the optimal threshold for these data.
The impact of the spectra rejection threshold was tested by re-running the seven-class cross-validated source discrimination task with different spectra rejection thresholds and comparing the overall accuracy distributions.The results are plotted in Figure 12 with the number of samples that remained after spectra rejection for a given threshold noted in parentheses along the x-axis.For more strict thresholds, some samples were rejected completely.Note that discarding samples can be beneficial to classification since it can reduce the within-class variance; however, it can also be detrimental if discarding samples reduces the training set to too great an extent.Without any spectra rejection, the average classification accuracy was approximately 85% correct.Although some thresholds that resulted in rejected samples improved accuracy, the best performance was achieved by thresholds that retained all of the samples, and maximum performance was achieved with a threshold equal to 0.02.Thus, the 0.02 threshold used throughout this study appears to be a good choice for this data set.

Archaeological Obsidian Attribution
LIBS spectra were collected for 66 artifacts from seven locations (Figures 1 and 2; Table 3): Bodie Hills, Long Valley, Fish Springs, Inyo Mountains, Rose Springs, and Rochester Cave in the Mono and Inyo Counties of east-central California and the Oak Flat site in Santa Barbara County.The objective of this portion of this study was to determine whether these samples could be consistently attributed to one of five possible geological sources (Bodie Hills, Mono-Inyo/Long Valley, Fish Springs, Saline Range, and Coso).13) was used to visualize the labeled training data and unlabeled artifact data for the provenance assignment.Spectra were averaged for each sample to simplify visualization.In general, the distribution of artifacts in Figure 13 is similar to that of the source obsidians, with the exception of two Oak Flat outliers (see discussion below).This PCA model was trained using the labeled source data and applied to both the source and artifact data.13) was used to visualize the labeled training data and unlabeled artifact data for the provenance assignment.Spectra were averaged for each sample to simplify visualization.In general, the distribution of artifacts in Figure 13 is similar to that of the source obsidians, with the exception of two Oak Flat outliers (see discussion below).This PCA model was trained using the labeled source data and applied to both the source and artifact data.

Artifact PLSDA 5-Class Attribution
The first approach considered for artifact attribution was the five-source classifier discussed previously.If each artifact sample is truly from one of the five classes, and the five classes have been adequately modeled (i.e., the training set contains enough samples of each class), then using a single five-class classifier can be expected to provide the most accurate attribution performance by jointly optimizing the decision boundaries for all five classes.
The five-class PLSDA classifier was trained using the labeled source data previously used in the cross-validation analysis and tested using the artifact data.Each sample was attributed to one of the five sources based on which class resulted in the highest classifier confidence.In Figure 14, the proportion of samples attributed to each of the five sources is plotted as a function of artifact source.The number of samples per artifact source is listed in parentheses along the x-axis.
The Oak Flat, Rochester Cave, and Rose Springs are attributed only to the CVF.The Inyo Mountain artifacts are attributed to the CVF at >75%, with the Saline Range and Bodie Hills comprising alternative attributions.The Fish Springs artifacts are attributed to the Saline Range and Bodie Hills sources almost equally.Only one of the three Long Valley tools is attributed to this source.Somewhat problematically, only slightly more than 55% of the Bodie Hills artifacts are attributed to the local Bodie Hills source.From the PCA score plot (Figure 13), Bodie Hills, Fish Springs, and Saline Range all display significant overlap, which may explain, in part, the attribution of Bodie Hills and Fish Springs artifacts to Bodie Hills and Saline Range sources.

classes.
The five-class PLSDA classifier was trained using the labeled source data previously used in the cross-validation analysis and tested using the artifact data.Each sample was attributed to one of the five sources based on which class resulted in the highest classifier confidence.In Figure 14, the proportion of samples attributed to each of the five sources is plotted as a function of artifact source.The number of samples per artifact source is listed in parentheses along the x-axis.The Oak Flat, Rochester Cave, and Rose Springs are attributed only to the CVF.The Inyo Mountain artifacts are attributed to the CVF at >75%, with the Saline Range and Bodie Hills comprising alternative attributions.The Fish Springs artifacts are attributed to the Saline Range and Bodie Hills sources almost equally.Only one of the three Long Valley tools is attributed to this source.Somewhat problematically, only slightly more than 55% of the Bodie Hills artifacts are attributed to the local Bodie Hills source.From the PCA score plot (Figure 13), Bodie Hills, Fish Springs, and Saline Range all display significant overlap, which may explain, in part, the attribution of Bodie Hills and Fish Springs artifacts to Bodie Hills and Saline Range sources.

Artifact Single Source Binary Classification
As mentioned previously, the five-class classifier should provide the most accurate attribution results under some limitations, namely that adequate training data was provided and all of the artifacts are reliably attributable to one of the five sources.If the number of samples is not adequate for some classes, this can negatively impact multiple decision boundaries.Further, if some of the artifacts are not from one of the five sources, the classifier will still assign them to one of the five classes.Classifiers assign all test inputs to a class regardless of whether or not the input is similar to the data used for training.Standard classifiers do not have a none-of-the-above response.

Artifact Single Source Binary Classification
As mentioned previously, the five-class classifier should provide the most accurate attribution results under some limitations, namely that adequate training data was provided and all of the artifacts are reliably attributable to one of the five sources.If the number of samples is not adequate for some classes, this can negatively impact multiple decision boundaries.Further, if some of the artifacts are not from one of the five sources, the classifier will still assign them to one of the five classes.Classifiers assign all test inputs to a class regardless of whether or not the input is similar to the data used for training.Standard classifiers do not have a none-of-the-above response.
Given these limitations, an alternative approach to the five-class classifier was considered in which each source was modeled independently with a binary classifier.The hypothesis was that attribution might improve if issues due to low sample size were restricted to the classifier trained on the inadequately sampled class.Further, by using binary classifiers, it is possible to convert the classifier confidences to posterior probabilities of the samples belonging to a particular class and use those probabilities to generate an unsure or none-of-the-above response.
For each of the five sources, a binary PLSDA classifier was trained from the sevensource data set where the labels were binary source/not source.The additional two sources (Medicine Lake Highlands and North Coast Ranges) were used only as examples for the not source class.By increasing the not source training set, a better delineation of the source decision boundary can be achieved.It should be noted that this approach is similar to the "one-vs-all" method of converting inherently binary classifiers into multi-class classifiers [168,169].
Once a sample is tested with the set of binary classifiers, a common scale must be used to compare the five classifier outputs.Platt [170] proposed transforming SVM classifier confidences to posterior probabilities by estimating the sigmoidal relationship between these values.The method is as follows.A calibration curve (also known as a reliability curve) is generated by binning the classifier confidence values and calculating the number of positive (i.e., true source) samples with confidence within each bin.The calibration curve is the maximum likelihood estimate of posterior probability as a function of classifier confidence.For SVM and PLSDA classifiers, these curves are generally sigmoidal.A sigmoid function is fit to the calibration curve, and this function then provides the transformation of classifier confidences to posterior probabilities [171].
In order to convert classifier confidences to posterior probabilities, calibration curves were generated for each classifier.Classifier confidences were generated for each classifier using ten random instantiations of 5-fold cross-validation with the source data used for training/testing.All of the classifier confidences were pooled across the instantiations, and the calibration curve was estimated as described above.Finally, a sigmoid was fit to the calibration curve to generate the transformation function.
Artifact attribution consisted of passing the artifact spectra as inputs to the binary classifiers, converting the outputs to posterior probabilities using the classifier-specific calibration curves, averaging the posterior probabilities per class across each sample's observations, and selecting the class with the highest posterior probability.
The attribution results using the binary classifiers are plotted in Figure 15.Of the 20 debitage samples from Bodie Hills, 85% are attributed to this source.With the binary classifiers, Fish Springs continues to be confused with Bodie Hills and Saline Range, and one of the three Long Valley artifacts is attributed to Bodie Hills.The samples for the remaining three artifact sources are all attributed to CVF.Looking at the posterior probabilities provides some further insights into the classifier behaviors and suggests potential methods to investigate for further improvement to algorithm accuracy.The posterior probabilities per source are plotted for each of the Bodie Hills artifacts in Figure 16.A typical classification approach is to select the class with the maximum output; however, in two of the three cases for which a Bodie Hills sample was labeled as CVF (i.e., JC213 and JC770), the posterior probabilities are low across all the classes.These probabilities would suggest a lack of reliability in the response and could provide the means to develop an algorithm with an unsure class.Looking at the posterior probabilities provides some further insights into the classifier behaviors and suggests potential methods to investigate for further improvement to algorithm accuracy.The posterior probabilities per source are plotted for each of the Bodie Hills artifacts in Figure 16.A typical classification approach is to select the class with the maximum output; however, in two of the three cases for which a Bodie Hills sample was labeled as CVF (i.e., JC213 and JC770), the posterior probabilities are low across all the classes.These probabilities would suggest a lack of reliability in the response and could provide the means to develop an algorithm with an unsure class.
A similar analysis for Oak Flat reveals that the classification might likewise be considered unsure for a sample such as Oak Flat sample JC691 for a different reason (Figure 17).In this case, the posterior probabilities are high for two of the classes, suggesting that simply selecting the one with the higher probability might not yield the correct attribution.Note that the posterior probabilities are calculated independently per class, so it is possible to have probabilities across the classes that add up to more than one.
In contrast to the results seen in Figures 16 and 17, the posterior probabilities for the Rose Springs artifacts (Figure 18) are all 90% or above, with the other classes all having low posterior probabilities.These results would suggest a high likelihood that the attributions of CVF are reliable.
Looking at the posterior probabilities provides some further insights into the classifier behaviors and suggests potential methods to investigate for further improvement to algorithm accuracy.The posterior probabilities per source are plotted for each of the Bodie Hills artifacts in Figure 16.A typical classification approach is to select the class with the maximum output; however, in two of the three cases for which a Bodie Hills sample was labeled as CVF (i.e., JC213 and JC770), the posterior probabilities are low across all the classes.These probabilities would suggest a lack of reliability in the response and could provide the means to develop an algorithm with an unsure class.A similar analysis for Oak Flat reveals that the classification might likewise be considered unsure for a sample such as Oak Flat sample JC691 for a different reason (Figure 17).In this case, the posterior probabilities are high for two of the classes, suggesting that simply selecting the one with the higher probability might not yield the correct attribution.Note that the posterior probabilities are calculated independently per class, so it is possible to have probabilities across the classes that add up to more than one.In contrast to the results seen in Figures 16 and 17, the posterior probabilities for the Rose Springs artifacts (Figure 18) are all 90% or above, with the other classes all having low posterior probabilities.These results would suggest a high likelihood that the attributions of CVF are reliable.
Figure 18.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Rose Springs artifacts.In this case, the probability is high for only a single location for each sample so the CVF assignment is likely to be reliable.In contrast to the results seen in Figures 16 and 17, the posterior probabilities for the Rose Springs artifacts (Figure 18) are all 90% or above, with the other classes all having low posterior probabilities.These results would suggest a high likelihood that the attributions of CVF are reliable.
Figure 18.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Rose Springs artifacts.In this case, the probability is high for only a single location for each sample so the CVF assignment is likely to be reliable.

Summary and Conclusions
A high level of correct discrimination was achieved for the seven California obsidian suites analyzed in this study, and classification performance was improved using a similarity analysis approach for spectra rejection.Adding spectra rejection to the preprocessing resulted in a statistically significant improvement in overall accuracy (p < 0.001), Figure 18.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Rose Springs artifacts.In this case, the probability is high for only a single location for each sample so the CVF assignment is likely to be reliable.

Summary and Conclusions
A high level of correct discrimination was achieved for the seven California obsidian suites analyzed in this study, and classification performance was improved using a similarity analysis approach for spectra rejection.Adding spectra rejection to the preprocessing resulted in a statistically significant improvement in overall accuracy (p < 0.001), increasing overall accuracy from 85% correct to 92% correct.To achieve this improvement, however, approximately a third of the spectra were rejected.This result suggests that there may be some inherent challenges to collecting high SNR spectra from geological samples.Given the low cost of collecting spectra with handheld LIBS systems, an approach that maximizes spectra collection and uses preprocessing to reject low SNR spectra may prove beneficial in any applications for which samples do not necessarily conform to the LIBS data collection ideals.
Our examination of geologically defined subsources in the CVF demonstrates that models developed without an appropriate number of samples per class are prone to lower levels of classification performance.It should be noted that to discriminate closely similar sample classes, it not only may be data that are needed but also samples that include the maximum level of class-level compositional diversity.
Two chemometric methods were used to assign obsidian artifacts from six locations in east-central California (Bodie Hills, Long Valley, Fish Springs, Inyo Range, Rose Springs, and Rochester Cave) and a site in the south-central region of the state (Oak Flat) to their most likely source.The general agreement of the source assignments to those derived from the basis of archaeological examination provides a strong measure of confidence in our provenance attributions.The second approach also provided some encouragement for the development of an approach to chemometrics that includes an unsure category.Typically, a classifier is constrained to selecting one of a set of classes.By converting classifier confidences to posterior probabilities, it is possible to consider whether any of the classes, as modeled by the training data, are good candidates.It may also be possible to further refine an unsure response to the number of classes that are potential matches, e.g., no matches or two potential matches out of five.These kinds of adaptations may be especially beneficial for applications in which there is uncertainty with regard to the set of classes from which test samples might be drawn.
The ability to characterize the geochemical composition of obsidian artifacts rapidly and non-destructively is of great utility to archaeologists.Sourcing obsidian to its geologic origin is critical for selecting a source-specific obsidian hydration rate, thereby allowing for age calculations based on obsidian birefringent zone measurements [172].Data generated from such analyses are often compared to known geologic sources to identify obsidian conveyance patterns and to inform models of mobility and exchange [143].In addition to providing insight into landscape-level movements, obsidian sourcing has been used to identify intra-site variations in obsidian source profiles over time.These observations have led many to conclude that the observed changes in obsidian source exploitation reflect socio-political circumstances, including the privatization of some resources, resulting in differential access [70].
Within archaeology, the chemical character of obsidian artifacts is most frequently determined using XRF analysis (e.g., [56,173]).This method is non-destructive and relatively inexpensive, adding greatly to its utility.However, XRF instrumentation is burdened with special operating precautions because of its open-beam X-ray source, the inability of XRF to analyze the lightest elements of the periodic table (Z < 12), and its constraint on minimum sample size can be problematic for archaeological application.Since flake size often varies with both artifact type and production stage, eliminating smaller flakes from sample populations due to XRF size requirements can bias archaeological interpretation [174].Since LIBS analysis interrogates a mm scale rather than cm scale size domain, it permits the use of smaller specimen sizes, which has the potential to capture a broader suite of associated human behaviors.Although other analytical methods such as INAA or ICP-MS can also analyze small flakes [174], these methods are expensive and time-consuming laboratory techniques that lack the portability of LIBS and, therefore, cannot be used for real-time artifact analysis in the field during an excavation [175].
The use of portable LIBS instruments enables the rapid analysis of large numbers of samples, thus providing larger datasets while reducing the logistics effort and expenses required to collect, transport, and curate specimens for laboratory analysis.Currently, archaeology is facing a curation crisis where the demand for curatorial space exceeds the physical capacity of authorized facilities.The analyses of archaeological materials in the field could assist in alleviating this issue by reducing the need for the collection of large sample suites.A greater benefit of in-field analyses lies in the political realm surrounding Cultural Resource Management (CRM).For example, many tribal entities in the United States object to the collection and removal of artifacts from the landscape, equating this to a form of cultural erasure.The use of LIBS to undertake obsidian analysis in the field, essentially non-destructively, has the potential to alleviate the need for collection and, thereby, satisfy both researchers and indigenous peoples alike.

Minerals 2023 , 39 Figure 1 .
Figure 1.Physiographic map of California showing the obsidian sources and sites of obsidian artifacts analyzed in this study.

Figure 1 .
Figure 1.Physiographic map of California showing the obsidian sources and sites of obsidian artifacts analyzed in this study.

Figure 2 .
Figure 2. Map of the east-central California region that forms the main focus of this study showing geographic features mentioned in the text along with locations of the obsidian sources and artifact sources sampled in this study identified in Figure 1.

Figure 2 .
Figure 2. Map of the east-central California region that forms the main focus of this study showing geographic features mentioned in the text along with locations of the obsidian sources and artifact sources sampled in this study identified in Figure 1.

Figure 3 .
Figure 3. Map of the Coso Volcanic Field showing sample sites for this study (black circles) and sites dated by K-Ar with ages shown in thousands of years B.P. (Ka) from Duffield et al.[137] for the obsidian sites sampled by[42].

Figure 3 .
Figure 3. Map of the Coso Volcanic Field showing sample sites for this study (black circles) and sites dated by K-Ar with ages shown in thousands of years B.P. (Ka) from Duffield et al.[137] for the obsidian sites sampled by[42].
Widespread geographic conveyance of CVF obsidian has been recognized by the presence of Coso obsidian artifacts from pre-historic sites in both central and southern California and throughout southwestern United States [18,50,52,53,68,70,77,82,83,153].

Figure 4 .
Figure 4.Representative LIBS broadband spectra between 150-950 nm for the obsidian analyzed in this study.This sample from Sugarloaf Mountain in the Coso Volcanic Field has a composition similar to the specimen in Table1, analyzed by Bacon et al.[42], namely 35% Si, 6.5% Al, 3%-4% Na and K, 0.5 to 0.7% Fe and Ca, and ~300 ppm Mg and Rb.The Ar peaks in the infrared portion of the spectrum between 700 and 920 nm (shown in grey text) are from the Ar purge gas used via the handheld analyzer for the LIBS analysis.

Figure 4 .
Figure 4.Representative LIBS broadband spectra between 150-950 nm for the obsidian analyzed in this study.This sample from Sugarloaf Mountain in the Coso Volcanic Field has a composition similar to the specimen in Table1, analyzed by Bacon et al.[42], namely 35% Si, 6.5% Al, 3%-4% Na and K, 0.5 to 0.7% Fe and Ca, and ~300 ppm Mg and Rb.The Ar peaks in the infrared portion of the spectrum between 700 and 920 nm (shown in grey text) are from the Ar purge gas used via the handheld analyzer for the LIBS analysis.

Figure 5 .
Figure 5. Plot showing the probability density functions (PDFs) of the median correlation distances for each of the seven obsidian sources as a measure of the intra-sample variability of the LIBS spectra.

Figure 5 .
Figure 5. Plot showing the probability density functions (PDFs) of the median correlation distances for each of the seven obsidian sources as a measure of the intra-sample variability of the LIBS spectra.

Figure 6 .
Figure 6.Averaged LIBS broadband spectra between 200 to 800 nm for four of the obsidian sources sampled in this study.

Figure 6 .
Figure 6.Averaged LIBS broadband spectra between 200 to 800 nm for four of the obsidian sources sampled in this study.

Figure 7 .
Figure 7. PCA score plots for the California obsidian sources analyzed in this study: (a) Medicine Lake Highlands and North Coast Ranges, (b) Bodie Hills and Long Valley/Mono-Inyo, (c) Fish Springs, Saline Range, and Coso, and (d) all source averages with colors corresponding to those in panels (a-c).

Figure 7 .
Figure 7. PCA score plots for the California obsidian sources analyzed in this study: (a) Medicine Lake Highlands and North Coast Ranges, (b) Bodie Hills and Long Valley/Mono-Inyo, (c) Fish Springs, Saline Range, and Coso, and (d) all source averages with colors corresponding to those in panels (a-c).

Figure 9 .
Figure 9. PLSDA classification matrix for the seven California obsidian sources analyzed in this study.The number at the top of the figure is the percentage of spectra correctly classified, with the number of samples analyzed for each class noted by the number in brackets on the right side of the figure.

Figure 9 .
Figure 9. PLSDA classification matrix for the seven California obsidian sources analyzed in this study.The number at the top of the figure is the percentage of spectra correctly classified, with the number of samples analyzed for each class noted by the number in brackets on the right side of the figure.

Minerals 2023 ,Figure 10 .
Figure 10.PLSDA classification matrix for the five obsidian sources of the eastern Sierra Neva region analyzed in this study.The number at the top of the figure is the percentage of percentage samples correctly classified, with the number of samples analyzed for each class noted by the nu ber in brackets at the right side of the figure.

Figure 10 .
Figure 10.PLSDA classification matrix for the five obsidian sources of the eastern Sierra Nevada region analyzed in this study.The number at the top of the figure is the percentage of percentage of samples correctly classified, with the number of samples analyzed for each class noted by the number in brackets at the right side of the figure.

Figure 11 .
Figure 11.PLSDA classification matrix for the five obsidian sources of the eastern region analyzed in this study.The number at the top of the figure is the percentage o spectra correctly classified, with the number of samples analyzed for each class noted in brackets at the right side of the figure.

Figure 11 .
Figure 11.PLSDA classification matrix for the five obsidian sources of the eastern Sierra Nevada region analyzed in this study.The number at the top of the figure is the percentage of percentage of spectra correctly classified, with the number of samples analyzed for each class noted by the number in brackets at the right side of the figure.

Figure 12 .
Figure 12.Comparison of the effect of running the 7-class PLSDA classification source discrimination task matrix with different rejection thresholds.The overall accuracy is plotted on the y-axis.The x-axis shows the rejection threshold value and the number of samples that remained after spectra rejection.

Figure 12 .
Figure 12.Comparison of the effect of running the 7-class PLSDA classification source discrimination task matrix with different rejection thresholds.The overall accuracy is plotted on the y-axis.The x-axis shows the rejection threshold value and the number of samples that remained after spectra rejection.

Figure 13 .
Figure 13.PC scores plot for the California obsidian sources used for classifier training (red) and artifacts for classification (blue).

Figure 13 .
Figure 13.PC scores plot for the California obsidian sources used for classifier training (red) and artifacts for classification (blue).

Figure 14 .
Figure 14.Percentage of samples from each artifact source attributed to the 5 east-central obsidian sources included in this study using a single 5-class PLSDA classifier for obsidian artifact attribution.

Figure 14 .
Figure 14.Percentage of samples from each artifact source attributed to the 5 east-central obsidian sources included in this study using a single 5-class PLSDA classifier for obsidian artifact attribution.

Minerals 2023 , 39 Figure 15 .
Figure 15.Percentage of samples from each artifact source attributed to the five east-central obsidian sources included in this study using five binary PLSDA classifiers for obsidian artifact attribution.

Figure 16 .
Figure16.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Bodie Hills artifacts.For some samples, the probabilities are relatively low for all classes, suggesting some

Figure 15 .
Figure 15.Percentage of samples from each artifact source attributed to the five east-central obsidian sources included in this study using five binary PLSDA classifiers for obsidian artifact attribution.

Figure 16 .
Figure 16.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Bodie Hills artifacts.For some samples, the probabilities are relatively low for all classes, suggesting some unreliability in selecting a class from the classifier responses.

Figure 16 . 39 Figure 17 .
Figure 16.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Bodie Hills artifacts.For some samples, the probabilities are relatively low for all classes, suggesting some unreliability in selecting a class from the classifier responses.Minerals 2023, 13, x FOR PEER REVIEW 31 of 39

Figure 17 . 39 Figure 17 .
Figure 17.Plot of the posterior probabilities obtained with binary PLSDA classifiers for the Oak Flats artifacts.For JC691, two classes have high probabilities, and choosing the highest may not provide reliable results.

Table 1 .
Representative major and minor element chemistry for obsidian reported in the literature for the five California obsidian sources sampled in this study.Major element abundances are given as weight percent oxide values and trace element concentrations are given as part per million concentrations.

Table 2 .
Percent of data removed per source by spectra rejection preprocessing using a threshold = 0.02.

Table 2 .
Percent of data removed per source by spectra rejection preprocessing using a threshold = 0.02.

Table 3 .
Sources of archaeological obsidian sampled in this study.