An Unsupervised Machine Learning Method for Electron--Proton Discrimination of the DAMPE Experiment

Galactic cosmic rays are mostly made up of energetic nuclei, with less than $1\%$ of electrons (and positrons). Precise measurement of the electron and positron component requires a very efficient method to reject the nuclei background, mainly protons. In this work, we develop an unsupervised machine learning method to identify electrons and positrons from cosmic ray protons for the Dark Matter Particle Explorer (DAMPE) experiment. Compared with the supervised learning method used in the DAMPE experiment, this unsupervised method relies solely on real data except for the background estimation process. As a result, it could effectively reduce the uncertainties from simulations. For three energy ranges of electrons and positrons, 80--128 GeV, 350--700 GeV, and 2--5 TeV, the residual background fractions in the electron sample are found to be about (0.45 $\pm$ 0.02)$\%$, (0.52 $\pm$ 0.04)$\%$, and (10.55 $\pm$ 1.80)$\%$, and the background rejection power is about (6.21 $\pm$ 0.03) $\times$ $10^4$, (9.03 $\pm$ 0.05) $\times$ $10^4$, and (3.06 $\pm$ 0.32) $\times$ $10^4$, respectively. This method gives a higher background rejection power in all energy ranges than the traditional morphological parameterization method and reaches comparable background rejection performance compared with supervised machine learning~methods.


Introduction
Electrons 1 in cosmic rays (CR) are important probe of nearby CR accelerators due to their short propagation distances in the Milky Way [1,2]. They are also widely used to search for new physics, such as the particle dark matter [3,4]. The abundance of CR electrons above GeV is significantly lower, by a factor of 10 −3 ∼10 −2 , than that of CR protons. Therefore, it is challenging to precisely measure the spectrum of electrons. Currently, the best measurements of the electron and/or positron spectra come from space (or balloon) direct detection experiments, including the magnetic spectrometers and imaging calorimeters [5][6][7][8][9][10][11][12]. The ground-based atmospheric imaging Cherenkov telescope arrays also tried to measure the total electron plus positron spectra to higher energies, which, however, are subject to large systematic uncertainties [13][14][15][16]. The spectra of electrons were measured up to a few TeV, experiencing a softening around a few GeV, a hardening around 50 GeV, and a softening around 0.9 TeV [2]. Those spectral features help establish a three-component origin model of electrons and positrons, including primary electrons from CR acceleration sources, secondary electrons and positrons from inelastic interactions between CR nuclei and the interstellar medium, and additional electrons and positrons relevant to the high-energy excesses [2].
The DArk Matter Particle Explorer (DAMPE) is a space high-energy charged CR and gamma-ray detector optimized for precision detection of electrons with a very high energy resolution and background rejection [17,18]. DAMPE is a calorimetric-type instrument, which consists of four sub-detectors. The Plastic Scintillator Detector (PSD; [19]) on the top is used to measure the particle charge up to Z = 28, and serves as an anti-coincidence detector for γ rays. The charge resolution of PSD was found to be about 0.137 (full width at half maximum) for Z = 1 [20]. The Silicon Tungsten tracKer-converter (STK) is designed for the trajectory measurement [21]. It can also measure the particle charge for Z < 8. The Bi 4 Ge 3 O 12 (BGO; [22]) calorimeter plays a crucial role in the energy measurement and the electron-proton discrimination. The total thickness of the BGO calorimeter of DAMPE reaches ∼32 radiation lengths and thus enables the calorimeter to contain electromagnetic showers without remarkable leakage below ∼10 TeV, which ensures a very high energy resolution (better than 1.5% for E > 10 GeV) and a high electron-proton discrimination capability. The NeUtron Detector (NUD; [23]) at the bottom provides a further electron-proton separation via the detection of secondary neutrons produced by interactions in the calorimeter. All the detectors have operated stably in space since the launch of DAMPE in December, 2015 [24,25]. Important progress in measuring the electron and CR nuclear spectra has been achieved [11,[26][27][28][29].
One of the most important elements for precise measurements of the electron spectrum is to "suppress" the proton background. For a calorimeter detector, this can be achieved by means of the shower morphology differences between hadronic showers and electromagnetic showers. Typically, an electromagnetic shower spreads less with a more regular morphology than a hadronic shower with similar deposited energy. In Ref. [11], a two-parameter representation of the shower morphology was developed, i.e., the lateral spread and the longitudinal development. This method can effectively suppress the proton background 2 , resulting in the level of background for electron energies a few percent below TeV. However, for E > TeV, the background increases quickly for this traditional method. An optimization of the electron-proton discrimination is necessary (e.g., [30,31]).
The Principal Component Analysis (PCA) is one of the most commonly used machine learning methods for dimensionality reduction and feature extraction [32][33][34][35]. The working principle of PCA is to express the original data in a new data space. Compared to other machine learning methods, PCA is an unsupervised machine learning and thus does not rely on simulated data. Therefore, it may avoid potential biases from models of simulation. The disadvantage of the PCA method is that the limited data statistics at high energies may result in relatively large statistical uncertainties.
This work develops an algorithm to separate electrons from protons using the PCA method. In Section 2, we introduce the basic principle of the PCA method. In Section 3, we present the detailed algorithm to separate electrons from protons applicable to the DAMPE experiment. Section 4 gives the results of our method. Finally, we conclude this work in Section 5.

The PCA Method
Generally speaking, the PCA method corresponds to a transform in a high-dimensional (not necessarily orthogonal) parameter space through a rotation matrix to find a new coordinate system, in which the variances of the data along the major axes of the new coordinate are the largest. Larger variance means that the data are more discrete and discriminative. Finding the coordinate axes corresponding to the maximum variance is equivalent to determining the eigenvectors corresponding to the maximum eigenvalues of the covariance matrix of the original data. The commonly used method to solve the eigenvalues of the covariance matrix is the Singular Value Decomposition method [36][37][38].
In our analysis, we characterize the shower morphology as the energy deposition ratio and the hit dispersion (see Section 3 for more details) in each BGO layer. A vector space is formed by the linear combination of these variables, which are then transformed into a new space through a linear transformation. In the new vector space, the first several principal components retain most of the variance of the data sample. In this work, we keep only the first three components and ignore the others. In summary, our analysis consists of 5 steps: (1) Selecting the data with good reconstruction.
Constructing characteristic variables carrying shower morphology information.
(3) Finding the eigenvector and transformation matrix. (4) Transforming the original data into the new space and finding the first three principal components. (5) Rotating the previous three-dimensional space to obtain the final component to discriminate electrons from protons.

Data Selection
Six years of DAMPE flight data are used in this analysis. The instrument dead time after trigger, the on-orbit calibration time, and the time when the satellite passes through the South Atlantic Anomaly region are excluded. We first apply a pre-selection procedure to select events with an accurate track reconstruction and a good shower containment in the BGO calorimeter. This procedure consists of a few specific conditions as follows: • The events should meet the High Energy Trigger (HET) [11] condition to ensure a good shower development at the beginning of the BGO caloriment. • The radial spread of the shower development, defined as the Root Mean Square (RMS) of the distances between the hit BGO bars and the shower axis, j /E total , should be smaller than 40 mm. The E j is energy deposited in j-th BGO bar, and D j is the distance between the corresponding BGO bar and track of the particle. This cut could eliminate a large fraction of nuclei because the hadronic shower is typically wider than the electromagnetic one. • The max energy bar of the BGO should not be on the edge of the detector. • The max energy ratio of each layer, e.g., the ratio of the max energy of a single BGO bar over the total energy of that layer, should be less than 0.35. The cut can eliminate those particles coming from the side of the detector. • The reconstructed track should pass through the top and bottom surfaces of the BGO. • Events with PSD charge should be smaller than 2 to remove heavy nuclei.
We show the efficiency of these pre-selection conditions in Figure 1. The results are obtained from the Monte Carlo (MC) simulation for an isotropic source distribution with 1 m radius and E −1 spectrum. The spectrum is then re-weighted to E −2.7 for protons and E −3.1 for electrons. We see that the pre-selection procedure can be able to suppress protons by a factor of 10∼10 3 , mainly due to the HET requirement. Furthermore, since the CR proton spectrum is approximately proportional to E −2.7 , the different energy deposition fractions in the calorimeter of protons (30∼50%) and electrons (>90%) would contribute to the suppression factor by about 3∼7 for a given reconstructed energy window [39].

Construction of Characteristic Variables
The BGO calorimeter is composed of 14 layers, and each layer consists of 22 BGO crystals placed in a hodoscopic configuration [22]. With the hit information from those 308 BGO crystals, we characterize the shower morphology from longitudinal and lateral views, respectively. The longitudinal shower development is characterized by the energy ratio in each BGO layer, where E i is the deposited energy of the i-th layer and E total is the total deposited energy in the calorimeter. The lateral spread, on the other hand, is described by the RMS of the energy deposits in each layer, where E ij is the deposited energy of the j-th bar in the i-th layer, d ij − d cog i is the distance from the j-th bar in the i-th layer to the "center of gravity" of the i-th layer, defined as Based on these 28 basic variables, F i and RMS i , we further construct higher-order variables to achieve a better particle discrimination. The simplest way is to randomly weight RMS i and F i to form a new set of variables and to search for optimal weighting coefficients. We define the new variables as where θ is the angle between the reconstructed incident direction and the vertical direction of an event, and α i , β i , γ are random numbers between 0 and 1, which will be determined by the PCA.

Finding the Principal Components
The major task of the PCA analysis is to find the optimal weighting coefficients of the variables, i.e., α i , β i and γ. We first generate tens of millions of random sets of weighting parameters. For a set of random weights, there is a new vector {RMS i , F i } for an event. Then, a covariance matrix can be obtained for a data sample. The direction of the first principal component is the direction of the eigenvector corresponding to the largest eigenvalue of the covariance matrix. Mathematically, this is to solve the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors, placed in descending order of eigenvalues, form the transformation matrix. Multiplied by this transformation matrix, the vector {RMS i , F i } is transformed to a new one {X, Y, Z, . . .}, which gives the principal components in descending order of their capabilities to distinguish particles. We find the transformation matrix using the python package sci-kit (https://scikit-learn.org/, accessed on 20 September 2022) and calculate the proton rejection power. The optimal condition is to ensure that the ratio between the peak of the distribution of electron candidates and the valley is as large as possible.
The output of the PCA is a vector group with an orthogonal rank reduction. The first principal component with the largest variance, however, may not be able to effectively distinguish electrons from protons by itself. We therefore keep the first three principal components.   For the convenience of use of the PCA results, we further rotate the vector space of the first three components to find a new principal direction, which distinguishes electrons from 7 of 12 protons most effectively. This is equivalent to seeking a rotation from (X, Y, Z) to a new set of basis (X , Y , Z ), such that the single X is enough to discriminate electrons from protons well. After a proper rotation, we obtain a clearer separation of electrons and protons using the new variable X , as shown in Figure 3.

Results
Using the PCA method, we reduce the 28-dimensional parameter space to three major principal components to form a new vector space. The three-dimensional vector space is then further rotated to form a new principal axis, which separates the electrons from protons most effectively. In order to estimate the performance of the electron-proton discrimination, we use the MC simulation samples of electrons and protons as templates to fit the flight data. Note that the transformation matrix is obtained directly from the flight data, which makes our method distinct from the supervised machine learning.
Specifically, we choose three typical reconstructed energy ranges, representing low, middle and high energies, to show the distribution and background estimation. Comparisons between the simulation and flight data are shown in the left panels of Figure 4 for the three energy bands. The right panels of Figure 4 show the relative efficiencies of protons ( f B ) and electrons ( f S ) for different cuts of the X . From the template fitting results, we can estimate the residual background fractions given signal efficiencies. If we set 90% electron efficiency, the proton contamination is found to be (0.45 ± 0.02)%, (0.52 ± 0.04)%, and (10.55 ± 1.80)% for reconstructed energies 80.0-127.5 GeV, 350.0-700.0 GeV, and 2.0-5.0 TeV, respectively.  The background fraction of protons as a function of reconstructed event energy is shown in Figure 5 (left axis). And for the highest energy range of a few TeVs, it is still well controlled in our method while keeping a relatively high electron efficiency. As a comparison, the electron efficiency decreases significantly above TeV in order to suppress the proton background to a level of (10∼20)% when using the traditional method [11].
Finally, we obtain the rejection power of protons of the PCA algorithm. The proton rejection power is defined as Q = f −1 p × φ p /φ e , where f p is the residual proton fraction in the electron sample, and φ p and φ e are the primary fluxes of protons and electrons. The rejection power is calculated with the reconstructed energy for selected samples and with the primary energy for primary fluxes, respectively. Note that the reconstructed energy corresponds to the primary energy for electrons with a tiny dispersion of ∼1%. For the proton and electron fluxes, we use the fitting results as φ p (E) = 7.58 × 10 −5 (E/TeV) −2.772 [1 + (E/0.48 TeV) 5 ] 0.173/5 GeV −1 m −2 s −1 sr −1 [26], and φ e (E) = 1.62 × 10 −4 (E/0.1 TeV) −3.09 [1 + (E/0.91 TeV) 8.3 ] −0.1 GeV −1 m −2 s −1 sr −1 [11]. The proton rejection power as a function of reconstructed event energy is shown in Figure 5 (right axis). For the selected three energy bands in Figure 4, the proton rejection power is (6.21 ± 0.03) × 10 4 , (9.03 ± 0.05) × 10 4 , and (3.06 ± 0.32) × 10 4 .

Conclusions
The machine learning methods are more and more widely used in astroparticle physics. Significant improvements have been achieved in the efficiency and accuracy of particular problems such as classifications, pattern recognitions, and nonlinear inverting problems. Supervised machine learning relies on training, which is based on the simulation data. The advantage is that it is not limited by the statistics of the real data, and a very good training of the model can be achieved. However, this method requires a good match between simulation data and real data. As a consequence, the training results are highly model-dependent. Unsupervised machine learning, on the other hand, avoids such a model dependence but is subjected to statistical uncertainties of the experimental data.
Using an unsupervised machine learning method, the PCA, we discriminate electrons from protons for the DAMPE experiment. We use the six-year flight data of DAMPE to search for effective parameters to distinguish those particles. We find that the PCA method performs well in the electron identification. The residual proton contamination fraction is estimated to be (0.45 ± 0.02)%, (0.52 ± 0.04)%, and (10.55 ± 1.80)% for electron energies of 80.0-127.5 GeV, 350.0-700.0 GeV, and 2.0-5.0 TeV. Compared with the traditional method used in Ref. [11], the PCA method improves the whole energy range. For the same electron efficiency, the proton background from the PCA method is lower by a factor of two to three. Compared with the supervised machine learning method, our approach has a comparable background suppression ability [30]. Acknowledgments: This work uses data recorded by the DAMPE mission, which was funded by the strategic priority science and technology projects in space science of the Chinese Academy of Sciences.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: (Throughout this paper, we use electrons to represent electrons and positrons without discriminating them unless specified otherwise.)