CNN-Based Classifier as an Offline Trigger for the CREDO Experiment

Gamification is known to enhance users' participation in education and research projects that follow the citizen science paradigm. The Cosmic Ray Extremely Distributed Observatory (CREDO) experiment is designed for the large-scale study of various forms of radiation that continuously reach the Earth from space, collectively known as cosmic rays. The CREDO Detector app relies on a network of involved users and now works worldwide on phones and other devices equipped with CMOS sensors. To broaden the user base and activate current users, CREDO makes extensive use of gamification solutions such as the periodic Particle Hunters Competition. However, an adverse effect of gamification is a substantial increase in the number of artefacts, i.e., signals unrelated to cosmic ray detection or openly related to cheating. To tag the artefacts appearing in the CREDO database, we propose a method based on machine learning. The approach involves training a Convolutional Neural Network (CNN) to recognise the morphological differences between signals and artefacts. The result is a CNN-based trigger that is able to mimic the signal vs. artefact assignments of human annotators as closely as possible. To enhance the method, the input image signal is adaptively thresholded and then transformed using Daubechies wavelets. In this exploratory study, we use wavelet transforms to amplify distinctive image features. We obtain a very good recognition ratio of almost 99% for both signals and artefacts. The proposed solution makes it possible to eliminate manual supervision of the competition process.


CREDO Project
The Cosmic Ray Extremely Distributed Observatory (CREDO) is a global collaboration dedicated to observing and studying cosmic rays (CR) [1] according to the Citizen Science paradigm. The same idea also underpins other similar particle detection initiatives, such as CRAYFIS [2][3][4][5][6] and DECO [7][8][9][10]. The CREDO project collects data from various CR detectors scattered worldwide. In line with the project's open access philosophy, the collected data are available to all parties who want to analyse them. Given the large number of potential hits registered in these experiments and the fact that only a fraction of them are attributable to particles of interest (mostly muons), effective on-line or off-line triggers are a must. The on-line muon trigger described in [4] was based on a CNN with a lazy application of convolutional operators, an approach motivated by the limited computational resources available in mobile devices. Here, we propose an alternative approach to CNN-based trigger design, aimed principally at off-line use. However, the moderate size of the convolutional layers in our design in principle also allows it to run within the limited resources of smartphones.
CR are high-energy particles (mostly protons and atomic nuclei) moving through space [11]. They are emitted by the Sun or by astrophysical objects such as supernovae, supermassive black holes and quasars [12]. CR collide with atoms in the Earth's atmosphere, producing secondary particles that can undergo further collisions, finally resulting in particle air showers that, near the Earth's surface, consist of various particles, mainly photons and electrons/positrons but also muons. Muons are of principal interest to us because their signatures are easily distinguishable from those of other particles. Moreover, they are guaranteed to be of cosmic origin, as there are no terrestrial sources of muons. Such air showers can be detected by various CR detectors. CR studies provide an alternative to studying high-energy particle collisions in accelerators [13] and surpass them by several orders of magnitude in terms of available energies. As CR are ionising radiation, they can cause DNA mutations [14] and damage to hardware, data storage or transmission [15]. Monitoring the intensity of the radiation flux (cosmic weather) is also important for manned space missions [16] and for humans on Earth [17]. Existing detector systems operate in isolation, whereas the CR detectors used by the CREDO collaboration operate as part of a global network. This may provide new information on extensive air showers, or Cosmic Ray Ensembles [1]. The CREDO project gathers and integrates detection data provided by users and is open to everybody who wants to contribute with their own detector. Most of the collected data come from smartphones running the CREDO Detector app on the Android system [18]. The physical process behind registering cosmic rays with the app is identical to that used by silicon detectors in high-energy physics experiments: the ionising radiation interacts with the camera sensor and produces electron-hole pairs [19].
Then, the algorithm in the app analyses the image from the camera and searches for CR hits. Signals qualified as cosmic ray hits are then sent to the CREDO server.
The overall scale of the CREDO observation infrastructure and the data collected so far can be summarised in the following statistics (approximate values as of February 2021): over 1.2 × 10⁴ unique users, over 1.5 × 10⁴ physical devices, over 1.0 × 10⁷ candidate detections registered in the database, and a total operation time of 3.9 × 10⁵ days, i.e., more than 1050 years.

Gamification as a Participation Driver
To arouse interest in cosmic ray detection with the CREDO Detector application among primary and secondary school pupils, university students and all interested astroparticle enthusiasts, elements of gamification were introduced. One of these elements is the "Particle Hunters" competition, in which each participant teams up with other participants under the supervision of a team coordinator, usually a teacher from their educational organisation. Each team's goal is to capture as many good cosmic ray particle candidates as possible using the CREDO Detector application. The competition is played in two categories: League and Marathon.

• Competition in the League category: consists of capturing particles during one selected night of the month. Each month, on the night of the 12th to the 13th, competition participants launch the CREDO Detector application between 9 pm and 7 am (local time of each team). The winner is chosen based on the number of particles captured during that one night. In Figure 1, the day on which this event occurred (League) is indicated by a dashed vertical green line.
• Competition in the Marathon category: members of teams participating in this category launch the CREDO Detector application at any time during the competition. At the end of the contest, the total number of detections made by all team members over the entire duration of the event is calculated, including detections made for the League category.
The competition lasts 9 months; the current third edition runs from 21 October 2020 to 18 June 2021. The role of gamification in the CREDO project can be seen in the plot of the daily activity of CREDO Detector application users shown in Figure 1, which exhibits a significant increase in user activity during each edition of the "Particle Hunters" competition. In particular, the daily activity of the application's users has changed since the competition's inception. On average, about 97 users are active daily, with a decrease in the holiday season and increased activity during competitions. There is also a visible decrease during the pandemic, when the possibilities of advertising the application (e.g., at science festivals) have been very limited. The horizontal axis of Figure 1 shows the number of days since 1 January 2018; the application was launched in June 2018.
The statistics of the last (completed) edition of the competition are shown in Table 1, which compares the results of all users with those of the users participating in the competition. These statistics show that during the competitions most detections, i.e., 67%, come from the competitors, which demonstrates the positive impact of gamification on the performance of the CREDO project. Unfortunately, the data collection process is exposed to cheating users who try to deliver as many hits as possible. A significant part of the data is thus unusable for research: the percentage of good detection candidates decreases from 46% to 30% (see Table 1). Therefore, to fully exploit the potential of gamification, an efficient, fair and intelligent detection filtering mechanism is required, and this is where machine learning comes into play. More information about the competition can be found on the official "Particle Hunters" website [20].

Data Management
The events sent to the server can be corrupted, because detection with CCD/CMOS sensors depends strongly on correct use of the device. The Android app provided by the CREDO collaboration works on the so-called dark frame, i.e., the image registered with the camera tightly covered. A user can, however, produce fake detections by corrupting the dark frame, e.g., by not fully covering the sensor or by using artificial light sources to simulate cosmic ray hits. The more obvious cases of fakes and hardware malfunction can be recognised and filtered relatively easily. The simplest off-line filter used in the competition is the anti-artefact filter, which consists of three parts of detection analysis [19]:
• Coordinate analysis: more than two events at the same location on two consecutive frames are marked as hot pixels, because they are statistically incompatible with the muon hit rate.
• Time analysis: hits are rejected if more than 10 detections are registered on a given device per minute, which is also incompatible with the expected muon hit rate.
• Brightness analysis: the frame cannot contain more than 70 pixels with a greyscale luminance greater than 70.
These requirements reduce the number of detections from 10.5 million to about 4 million. Specifically, about 5.8 million hits were rejected by the time analysis, 380 thousand by the brightness analysis and another 0.8 million by the coordinate analysis.
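The three rules of the anti-artefact filter can be sketched in Python as below. This is a minimal illustration, not the CREDO implementation: the `Hit` record and its field names, binning detections into calendar minutes, and the reading of the coordinate rule (more than two hits at one location within any pair of consecutive frames) are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

import numpy as np


@dataclass
class Hit:
    """Hypothetical detection record; field names are illustrative only."""
    device_id: int
    timestamp_s: float   # detection time in seconds
    x: int               # hit position on the sensor
    y: int
    frame_index: int     # index of the camera frame containing the hit
    frame: np.ndarray    # greyscale image of the hit, values 0-255


def brightness_ok(frame, max_bright_pixels=70, luminance_cut=70):
    """Brightness analysis: at most 70 pixels with greyscale luminance above 70."""
    return int(np.count_nonzero(frame > luminance_cut)) <= max_bright_pixels


def anti_artefact_filter(hits, max_per_minute=10, max_same_location=2):
    """Return the hits that pass the time, coordinate and brightness rules."""
    # Time analysis: detections per device per calendar minute (assumed binning).
    per_minute = Counter((h.device_id, int(h.timestamp_s // 60)) for h in hits)

    # Coordinate analysis (assumed reading): a location is a hot pixel if more
    # than two hits fall on it within any two consecutive frames k and k + 1.
    frames_at = defaultdict(Counter)
    for h in hits:
        frames_at[(h.device_id, h.x, h.y)][h.frame_index] += 1
    hot = {
        loc for loc, counts in frames_at.items()
        if any(counts[k] + counts[k + 1] > max_same_location for k in list(counts))
    }

    kept = []
    for h in hits:
        if per_minute[(h.device_id, int(h.timestamp_s // 60))] > max_per_minute:
            continue  # too many detections on this device in this minute
        if (h.device_id, h.x, h.y) in hot:
            continue  # hot pixel
        if not brightness_ok(h.frame):
            continue  # frame too bright, e.g. a light leak
        kept.append(h)
    return kept
```

Because each rule only needs counters over the whole batch, the filter runs in a single pass plus lookups, which suits an off-line filter over millions of detections.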

ANN-Based Method to Remove Artefacts
Analysis of data collected from various types of sensors is one of the most important driving forces in the development of computational intelligence methods. Particularly significant challenges relate to the multiplicity of sources (types of sensors), the operation of sensors in distributed systems, the exponential increase in the volume of information they deliver and, very often, the requirement to carry out the analysis in real time. Classification and recognition tasks are often approached with well-known statistical models [21,22], such as SVM [23], ANN [24] or RF [25]. Recently, deep learning models based on various neural network variants, such as CNN [26], RNN [27] or GNN [28], have also been experiencing a renaissance. Depending on the area and specificity of the application, very complex approaches are used, often integrating and combining many techniques; this is particularly visible in interdisciplinary problems arising in fields such as medicine [29,30], education [31,32], metrology [33][34][35], biometrics [36,37], learning of motor activities [38,39] or gesture recognition [40,41].
In image classification and recognition tasks, convolutional architectures (CNNs), regardless of depth, have a structural advantage over other types of statistical classifiers and usually outperform them. Therefore, in this paper, we chose to design an approach based on deep convolutional networks. Several established CNN-based classifier models were considered, including AlexNet [42], ResNet-50 [43], Xception [44], DenseNet201 [45], VGG16 [46], NASNetLarge [46] and MobileNetV2 [47]. We also analysed the possibility of transfer learning, in which such networks are pretrained on large, standardised data sets such as ImageNet [48]; a transfer learning approach to classifying CREDO data was already discussed in [49]. Owing to the peculiarity of the problem, the rather unusual input data and the small spatial size of the signal in the images (from a few to at most several dozen pixels), we decided to develop a dedicated architecture tailored to the specifics and requirements of the problem. To obtain the optimal classifier, we explored different architectures (taking into account the constraint of the relatively low resolution of the input images) and hyperparameter values, such as learning rate, batch size, solver, regularisation parameters and pooling size. Section 2.2 presents the best classifier setup we found.

Experiment Design
The flow chart of the experiment is presented in Figure 2. The experiment consists of the following steps, some of which are described in detail in the next subsections:
• Data import. The source data are stored in the CREDO App database [18]. The data used in this experiment were imported and stored in a more flexible format for further computation.
• Data filtering. Because of the huge amount of useless data, including very typical and obvious artefacts, a robust and deterministic filtering algorithm with high specificity was applied [50]. As a result, all non-artefact data are retained.
• Manual tagging. Five independent researchers tagged the images manually using web-based software [51]. As a result, four classes of images were obtained: 535 spots, 393 tracks, 304 worms and 1122 artefacts. In this study, however, we focus on binary classification: spots, tracks and worms together make up one class (collectively called signal) and the artefacts the other. Given a manually labelled dataset, we can roughly estimate the annotators' classification uncertainty in terms of the mean and standard deviation of the number of votes cast for an image being a signal or an artefact. We used samples whose classification was almost unanimous, i.e., at least four of the five annotators agreed. For signals, the probabilities of a given number of votes are 66% for 5/5 votes and 34% for 4/5 votes, so the average signal vote is
$$\bar{v} = \sum_{k=4}^{5} k\,p_k = 5 \cdot 0.66 + 4 \cdot 0.34 \approx 4.7, \qquad (1)$$
with a standard deviation of 0.48 votes. Thus, the mean number of votes is 4.7 ± 0.5, which corresponds to a relative uncertainty of about 10%.
• Building a CNN model. The main part of the experiment is the CNN model described in Section 2.2.
• Data preprocessing. We consider three approaches to preparing the input data: feeding raw data (Section 3.1), feeding wavelet-transformed data (Section 3.2) and feeding a fusion of raw and wavelet-transformed data (Section 3.3).
• Cross-validation. The model was trained and tested using a standard non-stratified repeated 5-fold cross-validation procedure (5 repetitions of 5 folds), resulting in 25 classification results.
• Results evaluation. Finally, the results were evaluated using accuracy, calculated as the ratio of correct classifications to all classifications, and are presented in Section 3.
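The non-stratified repeated 5-fold cross-validation can be sketched with NumPy alone, as below. Each repetition reshuffles the sample indices and splits them into five near-equal folds without regard to class labels, so 5 repetitions of 5 folds give 25 results. The function names and the fixed seed are illustrative assumptions; scikit-learn's `RepeatedKFold(n_splits=5, n_repeats=5)` implements the same scheme.

```python
import numpy as np


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for non-stratified repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)          # fresh shuffle per repetition
        folds = np.array_split(perm, n_splits)     # near-equal, label-agnostic folds
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Ratio of correct classifications to all classifications."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


if __name__ == "__main__":
    n_images = 535 + 393 + 304 + 1122              # spots + tracks + worms + artefacts
    splits = list(repeated_kfold_indices(n_images))
    print(len(splits), "train/test splits")
```

Within one repetition every image is tested exactly once; across repetitions the fold composition changes, which is what yields a standard deviation for each reported accuracy.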

Figure 2. Flow chart of the experiment: data import (import data from the CREDO App database); data filtering (filter data using a robust pattern-recognition algorithm); manual tagging (assigning data to classes: spots, tracks and worms, collectively called signals, and artefacts); data preprocessing (only raw data; only wavelet-transformed data; combined raw and wavelet-transformed data); cross-validation (5-fold cross-validation including split, fit and evaluation); results evaluation (based on accuracy and confusion-matrix analysis). The computations were optimised with respect to various single wavelet transformations performed during preprocessing.

CNN Model and Its Architecture
The Convolutional Neural Network (CNN) model shown in Figure 3 was built to perform the experiment. The model is moderately deep and its convolutional layers are moderately wide. The motivation for such an architecture was the potential to use it as a lightweight trigger in online applications. Accordingly, the network has a typical architecture comprising convolutional, pooling and fully-connected layers. In this architecture, the hyperparameters to be configured include the number and size of the filters (kernels) and the activation function in the convolutional layers, the pool size in the pooling layers, and the output dimensionality, activation function, kernel initialiser and regularisers in the fully-connected layers. The best hyperparameter combination was found manually through many trial-and-error cycles. The final architecture, with its layers and their parameters, is listed in Table 2. The RMSProp optimisation algorithm [52] with a batch size of 64 was used as the solver. Additionally, a combined L1 and L2 regularisation (the so-called Elastic Net [53]) with coefficients of 0.01 was applied to the fully-connected (dense) layers.
Table 2. Layer-by-layer summary of the proposed CNN model. Each layer name is followed by the number of feature maps (convolutional layers) or neurons (dense layers), the size of the convolutional filter or pooling region, the activation function used and, last, the number of parameters to learn.
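For orientation, a Keras model with the ingredients named above (convolution and pooling blocks, dense layers with Elastic Net regularisation of 0.01, RMSProp, batch size 64) might be sketched as follows. The number of blocks, filter counts, kernel sizes, activations and dense width are hypothetical placeholders, not the configuration of Table 2, and the sigmoid output with binary cross-entropy loss is an assumption for the binary signal/artefact task.

```python
import tensorflow as tf


def build_trigger_cnn(input_depth=1, filters=(32, 64, 64, 128), dense_units=128):
    """Hypothetical moderately deep CNN for 60x60 inputs with one sigmoid output."""
    layers = tf.keras.layers
    # Elastic Net: combined L1 and L2 penalty (0.01 each) on the dense kernels.
    elastic_net = tf.keras.regularizers.L1L2(l1=0.01, l2=0.01)

    model = tf.keras.Sequential([tf.keras.Input(shape=(60, 60, input_depth))])
    for n_filters in filters:
        model.add(layers.Conv2D(n_filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))          # 60 -> 30 -> 15 -> 7 -> 3
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation="relu",
                           kernel_regularizer=elastic_net))
    model.add(layers.Dense(1, activation="sigmoid",   # P(artefact), assumed encoding
                           kernel_regularizer=elastic_net))
    model.compile(optimizer=tf.keras.optimizers.RMSprop(),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model


def train(model, x, y, epochs=30):
    """Fit with the batch size of 64 given in the text; the epoch count is assumed."""
    return model.fit(x, y, batch_size=64, epochs=epochs, verbose=0)
```

The `input_depth` argument reflects that the same architecture is fed tensors of different depths (raw image, single wavelet, or wavelet sequences) in the experiments.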

Applying Wavelet Transforms as Feature Carriers
The CNN input layer can be fed with raw images, which for CREDO data are 60 × 60 px three-channel (RGB) colour images. Given the great diversity of artefact images in terms of types and shapes, we came up with a design that focuses on general image properties such as the shape of the border or the connectedness of the image pattern. These general properties can be amplified by applying a wavelet transformation, which yields the averaged image along with horizontal and vertical fluctuations that amplify the horizontal and vertical border components, respectively. Accordingly, the raw data are preprocessed as follows. The first preprocessing step is a greyscale conversion, implemented by summing the colour channels. This step removes redundant information that has no physical interpretation: the colour of a pixel is determined by the colour filter, overlaid on the CMOS array, that happened to be hit during detection, which is essentially a random event uncorrelated with the radiation species. The next step is noise reduction. As the analysed images differ in overall brightness, we applied a noise cut-off algorithm that depends on the average brightness. Moreover, images marked as "artefacts" usually differ from "non-artefacts" by a few standard deviations in brightness. These two quantities, i.e., the average and the standard deviation of brightness, were used to define the cut-off threshold (Equations (2) and (3)), which was determined for each image separately. The standard deviation was calculated from the total brightness of each image, where $b_i$ denotes the mean brightness and $\sigma_i$ the standard deviation of brightness of the $i$-th image. Finally, the threshold used for noise reduction has the form
$$\mathrm{threshold}_i = \begin{cases} t_i, & t_i < 100,\\ 100, & t_i \ge 100, \end{cases} \qquad (3)$$
where $t_i$ is the image-specific cut-off derived from $b_i$ and $\sigma_i$. All pixels below the threshold are cut off. The set of images prepared in this way is then subjected to the wavelet transform.
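The greyscale conversion and adaptive cut-off can be sketched with NumPy as below. Summing the channels and capping the threshold at 100 follow the text; the form t_i = b_i + k·σ_i and the default multiplier `k = 1.0` are assumptions, since the exact expression for t_i is not reproduced here.

```python
import numpy as np


def to_greyscale(rgb):
    """Greyscale conversion by summing the colour channels of an H x W x 3 image."""
    return np.asarray(rgb, dtype=np.float64).sum(axis=2)


def adaptive_threshold(grey, k=1.0, cap=100.0):
    """Cut off pixels below min(t_i, cap); return the cleaned image and threshold.

    ASSUMPTION: t_i = b_i + k * sigma_i. The text only states that t_i is built
    from the mean b_i and the standard deviation sigma_i of the image brightness.
    """
    b_i = grey.mean()            # mean brightness of this image
    sigma_i = grey.std()         # standard deviation of its brightness
    t_i = b_i + k * sigma_i
    threshold = min(t_i, cap)    # t_i if t_i < 100, otherwise 100
    cleaned = np.where(grey < threshold, 0.0, grey)   # drop pixels below threshold
    return cleaned, threshold
```

Because b_i and σ_i are recomputed per image, a uniformly brighter (noisier) frame automatically gets a higher cut-off, while the cap of 100 prevents very bright artefact frames from wiping out their own structure.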
Before the images were fed to the CNN, the Daubechies wavelet transformation was applied to them. Formally, the original image signal f was transformed according to Equation (4) into four subimages, arranged as the four quadrants of a single image of the original size, where subimage a denotes the average signal while h, v and d denote the horizontal, vertical and diagonal fluctuations, respectively [54]. All subimages have half the resolution of the original image. The full preprocessing flow for exemplary images selected from the dataset is presented in Figures 4-7.
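A single level of the 2D Daubechies transform, folded into one image and optionally stacked over several wavelets, can be sketched as below. The paper uses the Mahotas library (e.g. `mahotas.daubechies(f, 'D4')`); to stay self-contained, this sketch instead implements a periodic one-level transform directly and includes only the D2 (Haar) and D4 filters. The quadrant layout passed to `np.block` and the h/v naming convention are assumptions.

```python
import numpy as np

SQRT2, SQRT3 = np.sqrt(2.0), np.sqrt(3.0)
# Low-pass (scaling) filters; D2 is the Haar wavelet. D6-D20 are omitted here.
DAUBECHIES = {
    "D2": np.array([1.0, 1.0]) / SQRT2,
    "D4": np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * SQRT2),
}


def _dwt1d(x, h):
    """One periodic DWT level along the last axis; returns (approximation, detail)."""
    L, n = len(h), x.shape[-1]
    g = np.array([(-1) ** k * h[L - 1 - k] for k in range(L)])  # high-pass (QMF)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(L)[None, :]) % n
    windows = x[..., idx]                 # shape (..., n/2, L), periodic wrap-around
    return windows @ h, windows @ g


def dwt2_level1(f, code="D2"):
    """Return subimages (a, h, v, d), each half the size of the even-sized image f."""
    filt = DAUBECHIES[code]
    lo, hi = _dwt1d(np.asarray(f, dtype=np.float64), filt)   # along each row
    a, h = (s.T for s in _dwt1d(lo.T, filt))   # h: differences between rows
    v, d = (s.T for s in _dwt1d(hi.T, filt))   # v: differences between columns
    return a, h, v, d


def fold_quadrants(a, h, v, d):
    """Fold the subimages into one image of the original size (layout assumed)."""
    return np.block([[a, v], [h, d]])


def wavelet_stack(f, codes=("D2", "D4")):
    """Stack folded transforms for several wavelets along a depth axis."""
    return np.stack([fold_quadrants(*dwt2_level1(f, c)) for c in codes], axis=-1)
```

The transform is orthonormal, so the total energy of the four subimages equals that of the input; a flat image produces zero fluctuations, which is why only borders and structure survive in h, v and d.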

Baseline Triggers
As already mentioned, the main rationale for proposing a CNN-based trigger for CR detection in CMOS cameras is the potential to extend this solution easily to any number of classes without essential changes in the network architecture, thus ensuring consistent signal processing. Still, it is instructive to compare the CNN-based trigger with baseline classifiers that capture just the main differences between images attributable to signals and artefacts. There are indeed two qualitative features that enable the separation of signal and artefact images: the integrated luminosity (artefacts are generally brighter) and the number of active pixels (in artefact images usually more pixels are lit). For the purposes of the baseline triggers, both quantities, denoted $l$ and $np$, respectively, take into account only the pixels above the threshold defined by Equation (3). We then determine the minimum integrated luminosity $l^{art}_{min}$ and the minimum number of active pixels $np^{art}_{min}$ for images labelled as artefacts, and the maximal integrated luminosity $l^{sig}_{max}$ and the maximal number of active pixels $np^{sig}_{max}$ for images labelled as signals. Given these quantities, the parameters determining the decision boundary are defined as $np_b = (np^{art}_{min} + np^{sig}_{max})/2$ and $l_b = (l^{art}_{min} + l^{sig}_{max})/2$. The decision boundary itself is thus defined as the quarter ellipse
$$\left(\frac{np}{np_b}\right)^2 + \left(\frac{l}{l_b}\right)^2 = 1, \qquad np \ge 0,\; l \ge 0. \qquad (5)$$
All examples falling inside the quarter ellipse are classified as signals and those outside it as artefacts. The distribution of the signal- and artefact-labelled examples around the decision boundary is shown in Figure 8. The vast majority of signals lie within the decision boundary; however, there is still some artefact admixture in this region. One can think of defining the decision boundary in a more elaborate way than that given by Equation (5).
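The quarter-ellipse baseline trigger of Equation (5) can be sketched as below. Treating every non-zero pixel of the thresholded image as active and counting points on the boundary as signals are assumptions; the function and variable names are illustrative (`n_act` stands for np to avoid clashing with NumPy's usual alias).

```python
import numpy as np


def features(cleaned):
    """Integrated luminosity l and number of active pixels of a thresholded image."""
    active = cleaned > 0                 # ASSUMPTION: non-zero pixels are active
    return float(cleaned[active].sum()), int(active.sum())


def fit_ellipse_trigger(l, n_act, is_artefact):
    """Semi-axes l_b and np_b from the labelled training examples."""
    l, n_act = np.asarray(l, float), np.asarray(n_act, float)
    art = np.asarray(is_artefact, bool)
    l_b = (l[art].min() + l[~art].max()) / 2         # (l_min^art + l_max^sig) / 2
    np_b = (n_act[art].min() + n_act[~art].max()) / 2
    return l_b, np_b


def predict_signal(l, n_act, l_b, np_b):
    """True (signal) inside or on the quarter ellipse (np/np_b)^2 + (l/l_b)^2 = 1."""
    l, n_act = np.asarray(l, float), np.asarray(n_act, float)
    return (n_act / np_b) ** 2 + (l / l_b) ** 2 <= 1.0
```

The trigger has only two fitted numbers, which explains both its speed and its inability to separate signal subclasses that share the same luminosity and pixel count.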
To test more elaborate decision boundaries, we evaluated refined baseline triggers in the form of kNN and Random Forest classifiers working in the same feature space as the base trigger. The performance of the base trigger and its refined versions is summarised in Table 3. The baseline triggers perform surprisingly well, with an average signal and artefact recognition accuracy at the level of 96-97%. However, the accuracy of artefact recognition is about 4% lower across all baseline triggers. This difference can be attributed to the fraction of artefacts lying within the decision boundary, which could not be separated out even by the refined kNN and RF triggers.
Note, however, that the high overall performance of the baseline triggers comes at the cost of a complete lack of generalisability, i.e., the inability to handle an increased number of classes. This is because signals consisting of, e.g., straight lines (called tracks) and those consisting of curvy lines (called worms) that have the same number of active pixels are entirely indistinguishable in this feature space.
Table 3. Performance of the base model applied to the input data. Three variants of classifiers using two manually selected features were analysed: a simple heuristic model based on decision rules, a kNN classifier (k = 7, metric = L2) and a random forest classifier, i.e., an ensemble of decision trees (number_estimators = 100, depth_trees = 2). Results have been estimated using repeated k-fold cross-validation (5 rounds with 5 folds each).

Type | Overall Acc ± Std Dev | Signal Acc ± Std Dev | Artefact Acc ± Std Dev
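A refined baseline in the same two-dimensional feature space, a k-nearest-neighbour vote with k = 7 and the L2 metric, can be sketched with NumPy as follows. Standardising the two features with training-set statistics is an assumption (integrated luminosity and pixel counts live on very different scales); scikit-learn's `KNeighborsClassifier(n_neighbors=7, metric="euclidean")` and `RandomForestClassifier(n_estimators=100, max_depth=2)` are library counterparts of the kNN and RF variants in Table 3.

```python
import numpy as np


def standardise(train_x, query_x):
    """Scale both feature sets with the training mean and std (assumed step)."""
    train_x, query_x = np.asarray(train_x, float), np.asarray(query_x, float)
    mu, sd = train_x.mean(axis=0), train_x.std(axis=0)
    sd[sd == 0] = 1.0                                 # guard against constant features
    return (train_x - mu) / sd, (query_x - mu) / sd


def knn_predict(train_x, train_y, query_x, k=7):
    """Majority vote of the k nearest training points under the L2 distance.

    Labels: 0 = signal, 1 = artefact. k should be odd to avoid ties.
    """
    train_x, query_x = standardise(train_x, query_x)
    train_y = np.asarray(train_y, int)
    dist = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]         # indices of k nearest points
    return (train_y[nearest].sum(axis=1) * 2 > k).astype(int)
```

Since the features are still only (l, np), the kNN vote can bend the boundary locally but cannot recover information that these two numbers do not carry, consistent with the residual artefact admixture reported above.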

Experimental Results
In this section, we discuss various preprocessing and training strategies aimed at making the CNN-based trigger follow the human annotators' signal/artefact assignments as closely as possible. All computations were performed on the Google Colaboratory platform using the TensorFlow [55] and Keras [56] libraries.

Training on Raw Data
In our base model, the raw, unpreprocessed data were fed to the CNN. The objective of the base model was to evaluate and fine-tune the CNN architecture and to test the model's vulnerability to the noise present in the original data. As shown in Table 4, the base model performs remarkably well in identifying both signals and artefacts, exceeding an accuracy of 98% in both cases. The corresponding confusion matrix, shown in Figure 9, indicates that the misclassified images are distributed rather uniformly between the classes.
Table 4. CNN model performance for the raw data set. Results have been estimated using repeated k-fold cross-validation (5 rounds with 5 folds each).

Tensor Depth | Overall Acc ± Std Dev | Signal Acc ± Std Dev | Artefact Acc ± Std Dev
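The overall, signal and artefact accuracies reported in the tables can be derived from each fold's 2 × 2 confusion matrix and then aggregated over the 25 folds, as sketched below. Encoding signals as 0 and artefacts as 1 and using the population standard deviation (ddof = 0) are assumptions.

```python
import numpy as np


def confusion_matrix_2x2(y_true, y_pred):
    """Rows = true class, columns = predicted class (0 = signal, 1 = artefact)."""
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def fold_accuracies(cm):
    """Overall, signal and artefact accuracy of one fold."""
    overall = np.trace(cm) / cm.sum()
    signal = cm[0, 0] / cm[0].sum()        # recall of the signal class
    artefact = cm[1, 1] / cm[1].sum()      # recall of the artefact class
    return overall, signal, artefact


def summarise(per_fold):
    """Column-wise mean and population standard deviation over folds."""
    arr = np.asarray(per_fold, dtype=float)
    return arr.mean(axis=0), arr.std(axis=0)
```

Reporting per-class accuracy alongside the overall value matters here because the classes are imbalanced (1122 artefacts versus 1232 signals), so a uniform error distribution in the confusion matrix is a meaningful check.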

Training on Wavelet Transformed Data
In this section, we present the results obtained using the method discussed in Section 2.
To select the optimal form of the input to the CNN, apart from the adaptive thresholding discussed in Section 2.3, we tested several types and combinations of the Daubechies wavelet transforms available in the Mahotas library [57]. Table 5 shows the recognition accuracy rates for input signals in the form of a single wavelet (an input tensor of depth 1). All results have been evaluated using repeated 5-fold cross-validation. Evidently, applying any wavelet from the set (D2, D4, …, D20) results in a recognition rate, for both signals and artefacts, equal to 98% within two standard deviations.
In Table 6, we show the accuracy results obtained with another approach, where the CNN was fed with wavelet tensors of varying depths in the range from D2:D4 (2 wavelets) to D2:D20 (10 wavelets). Again, very stable accuracy at the level of 98% was found across various wavelet sequences.
We attribute this accuracy stability to the fact that, even though both signals and artefacts are very diverse within their respective classes, there are clear morphological distinctions between signals and artefacts, e.g., in terms of the number of active pixels. This can be observed by comparing Figures 4-7. Finally, Figure 11 shows confusion matrices for both the single-wavelet and the wavelet-sequence versions of the experiment. Again, we see that (within one standard deviation) the misclassification rate is the same for signals and artefacts and does not exceed 2%. As shown in Figure 12, the CNN learning curves for the wavelet-transformed input stabilise around the 20th epoch.

Combined Approach
Finally, we explored the possibility of feeding the CNN with both the raw data and the thresholded, wavelet-transformed data. In this way, the model was exposed to effective feature extraction (by the wavelet transform) while retaining the information about the substantial noise component. Again, as can be seen in Table 7, the recognition rate exceeds 98%, but compared with training on raw or wavelet-transformed data separately, we do not observe a substantial gain from combining the two approaches. The corresponding confusion matrices are shown in Figure 13.

Discussion of Experimental Results
The CNN classifier variants introduced in Sections 3.1-3.3 share the same architecture but differ in the type and size of the input data. Nevertheless, their performance is comparable to within one standard deviation, with an accuracy close to 99%. To ascertain robustness and stability, the obtained accuracy was verified using k-fold cross-validation; the summary results of these computational experiments are given in Tables 4-7. As the performance of the different classifier variants does not differ significantly, the model with the lowest time complexity should be favoured in practical applications. The performance estimates of the different variants of the proposed classifier in this respect are presented in Table 8. These values indicate that it is worth using models requiring the smallest possible input data, i.e., raw images or single wavelets. This may be important in applications that require running the model directly on mobile devices (smartphones).
The learning curves shown in Figures 10, 12 and 14 exhibit some perturbations over the first few epochs, particularly for models using tensor inputs containing wavelets. This phenomenon is probably due to rapid changes in the model parameters during the initial learning phase, which in turn may be a consequence of the relatively small depth of the network and the small spatial size of the input images.

Demonstration of Models Performance
We want to stress once more that, at the present stage of investigation, the only meaningful question one may ask is not how accurate the classification is but rather how accurately the trigger mimics the human annotators and how consistent it is in triggering. Figures 15 and 16 show random samples of 25 images classified as signals and artefacts. Both figures show, albeit qualitatively, that the trigger is rather consistent. More quantitative support for the trigger's accuracy requires a larger set of annotated images. An alternative approach would be to cross-check the CNN-based trigger against another trigger; we are currently performing such a study.

Discussion of Alternative Architectures
The main motivation behind the trigger architecture discussed in the preceding sections was to create a solution that, on the one hand, can encompass the great variety of signal and artefact morphologies and, on the other hand, generalises easily (without a change in network structure) to several signal classes. The canonical classes of signals observed in CCDs were defined almost 20 years ago as spots, tracks and worms [58]. However, later CMOS-based observations, including those by the CREDO collaboration, suggested the emergence of multi-track signals, so the classifying network must be big enough to accommodate such an extended classification. Furthermore, given the current CREDO dataset size of several million images and its planned increase by two orders of magnitude, we adopted preprocessing operations that are as simple and time efficient as possible. Therefore, after performing the wavelet transform, we refrained from further image segmentation and instead utilised the CNN's capability to process several sectors of each image simultaneously, feeding the four subimages resulting from Equation (4) as a single image. It is now tempting to check how these two assumptions (flat input and big network) affected the overall classifier's performance. To this end, we performed two exploratory studies, discussed in the following two subsections: first, we analysed an alternative input organisation and, second, the performance of a CNN with the number of trainable parameters reduced by an order of magnitude.

Alternative Input Organisation
The wavelet transform computed for a single image generates four components (a, v, h, d), each half the size of the original input image. In the basic solution, these four components are spatially folded into a single image whose dimensions equal those of the original image. This technique allows the wavelet subimages to be combined with the original image into a single coherent tensor without scaling. In this section, we discuss another approach to constructing the input data tensor, in which the individual wavelet components are treated as separate layers. The formal definition of this multidimensional tensor representation is given by Equation (6): the four subimages are stacked along the depth axis,
where subimage a denotes the average signal while h, v and d denote the horizontal, vertical and diagonal fluctuations, respectively [54]. As a result, the wavelet representation of a single image takes the form of a tensor with half the spatial size of the original image (30 × 30 px) and a depth of 4 layers. As the base model was designed to process input data with a resolution of 60 × 60 px, the half-size wavelet representation must be rescaled to this size. Without this rescaling, it would also be impossible to assemble an input data tensor containing both the wavelets and the raw image. The rescaling is done by interpolating the individual wavelet images to a higher resolution. This keeps the size of the input tensors unchanged and allows a direct comparison of the classification results, as the model architecture and the input data size are preserved.
In order to evaluate whether this arrangement of wavelet components noticeably affects the classification results, a corresponding experiment was conducted. Input data tensors constructed according to the newly proposed scheme, i.e., with the wavelet components arranged sequentially as layers, were fed into the base classifier discussed in Section 2.2. These tensors were also supplemented with an additional layer containing the greyscale image, to provide information about the global luminance distribution. The results are shown in Table 9. Within the estimated standard deviations, they are comparable to those previously obtained for the base model. Thus, the arrangement of the wavelet components has no noticeable effect on the model's performance.
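The layer-wise input used in this experiment can be sketched in the same spirit. The interpolation method is not specified in the text, so this sketch assumes nearest-neighbour upsampling (each 30 × 30 coefficient copied into a 2 × 2 block). It also assumes a channels-last layout with the greyscale image as the first layer, followed by a, v, h and d, which yields the depth of 1 + 4. The names `upsample2x` and `stacked_wavelet_tensor` are hypothetical.

```python
import numpy as np


def upsample2x(component):
    """Nearest-neighbour 2x upsampling: each coefficient becomes a 2 x 2 block.

    Assumption: the paper does not name its interpolation method.
    """
    return np.repeat(np.repeat(component, 2, axis=0), 2, axis=1)


def stacked_wavelet_tensor(grey, a, h, v, d):
    """Return an H x W x (1 + 4) tensor: greyscale image, then a, v, h, d.

    Assumption: channels-last layout with the greyscale layer first.
    """
    layers = [np.asarray(grey, dtype=float)]
    for comp in (a, v, h, d):  # order in which the components are listed in the text
        up = upsample2x(np.asarray(comp, dtype=float))
        if up.shape != layers[0].shape:
            raise ValueError("each component must be half the image size per side")
        layers.append(up)
    return np.stack(layers, axis=-1)


# Usage with dummy 30 x 30 components and a 60 x 60 greyscale image.
grey = np.zeros((60, 60))
a, h, v, d = (np.full((30, 30), float(k)) for k in range(4))
x = stacked_wavelet_tensor(grey, a, h, v, d)
print(x.shape)  # (60, 60, 5)
```

Nearest-neighbour upsampling is the simplest choice and adds no new values; a smoother scheme (e.g., bilinear) would fit equally well into `upsample2x` without changing the rest of the sketch.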

Application of Smaller Scale Model
Besides the input data format, the second crucial aspect is the model architecture itself. In terms of scale, the proposed base model far exceeds the size of the training set. The model is not very deep (4 convolutional layers), but it is quite broad in the sense that it uses a large number of filters in each layer. This results in a model with about 1 million trainable parameters. Given that the training set contains only about 2000 elements, there may be reasonable doubt as to whether such a difference in scale compromises the ability to learn the model effectively from the available data. To verify this, a dedicated smaller-scale model was developed for comparison. During its development, it became apparent that the small-scale model needed to be much deeper than the base model in order to learn effectively from the available data. The result of many design trials and experiments is the convolutional network architecture shown in Table 10 and, in illustrative form, in Figure 17. The final model has about 100,000 trainable parameters, an order of magnitude fewer than the base model, and should therefore be less susceptible to overfitting. This small-scale model was then trained on the standard dataset to compare its performance and the statistical parameters of the classification process with those of the base model. To this end, both wavelet tensor ordering techniques discussed previously were used, i.e., combining the subimages into a single layer and arranging them as a sequence of layers. In both cases, the tensor was extended with a layer containing the greyscale source image. Tables 11 and 12 present the classification results obtained for wavelet tensors ordered according to the schemes described by Formulas (4) and (6), respectively.
Table 10. Layer-by-layer summary of the proposed smaller-scale CNN model. Each layer name is followed by the number of feature maps (convolutional layers) or neurons (dense layers), the size of the convolutional filter or pooling region, the activation function used and, finally, the number of trainable parameters.
Comparing the classification results obtained with the base model and the small-scale model, we find no significant difference between them: given the estimated standard deviations, the two models show comparable performance. In our opinion, this justifies the use of the base model even though its number of parameters greatly exceeds the size of the available training set.
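Why a shallow but wide network can carry an order of magnitude more parameters than a deeper, narrower one follows from a simple count: a convolutional layer with k × k kernels, c_in input maps and c_out output maps has k·k·c_in·c_out weights plus c_out biases. The sketch below counts only convolutional parameters for two hypothetical layer stacks. These are illustrative configurations chosen by us, not the layers of Table 10 or of the base model.

```python
def conv2d_params(kernel, c_in, c_out):
    """Trainable parameters of a k x k convolution: weights plus one bias per map."""
    return kernel * kernel * c_in * c_out + c_out


def count_conv_params(layers):
    """Sum over (kernel, c_in, c_out) tuples; pooling and dense layers are ignored."""
    return sum(conv2d_params(k, cin, cout) for k, cin, cout in layers)


# Hypothetical stacks for illustration only (NOT the base model or Table 10).
wide_shallow = [(3, 1, 128), (3, 128, 128), (3, 128, 128), (3, 128, 128)]
deep_narrow = [(3, 1, 16), (3, 16, 16), (3, 16, 32),
               (3, 32, 32), (3, 32, 64), (3, 64, 64)]

print(count_conv_params(wide_shallow))  # 444032
print(count_conv_params(deep_narrow))   # 71792
```

Doubling the number of maps in every layer roughly quadruples the count, whereas adding one more layer only adds one more term. This is why a smaller model can afford extra depth while staying an order of magnitude below a wide one.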

With its larger number of trainable parameters, the base model has considerably more potential in terms of discriminative ability and internal feature representation capacity. In this sense, it is a promising prototype for more demanding applications, such as multi-class classification that distinguishes between different signal morphologies.
… components (a, v, h, d) set in a sequence, which gives a tensor depth of 1 + 4. Results have been estimated using repeated k-fold validation (5 rounds with 5 folds each).
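The evaluation protocol quoted in the caption, repeated k-fold validation with 5 rounds of 5 folds, can be sketched as below. Whether the folds were stratified by class, how they were shuffled and which standard-deviation estimator was reported are not stated here, so this sketch assumes class-stratified folds reshuffled in every round, a fixed random seed and the sample standard deviation (`ddof=1`). The helper names are hypothetical.

```python
import numpy as np


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) for n_repeats rounds of stratified k-fold.

    Assumption: folds are stratified by class and reshuffled in every round.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(labels))
    for _ in range(n_repeats):
        fold_of = np.empty(len(labels), dtype=int)
        offset = 0
        for cls in np.unique(labels):
            members = rng.permutation(np.flatnonzero(labels == cls))
            # Deal the shuffled members of this class round-robin into the folds,
            # continuing where the previous class stopped to balance fold sizes.
            fold_of[members] = (offset + np.arange(len(members))) % n_splits
            offset += len(members)
        for k in range(n_splits):
            yield all_idx[fold_of != k], all_idx[fold_of == k]


def mean_std(scores):
    """Mean and sample standard deviation (ddof=1, an assumption) of fold scores."""
    s = np.asarray(scores, dtype=float)
    return s.mean(), s.std(ddof=1)


# Usage: 100 hypothetical labels (0 = signal, 1 = artefact).
labels = np.array([0] * 60 + [1] * 40)
sizes = [len(test) for _, test in repeated_stratified_kfold(labels)]
print(len(sizes), set(sizes))  # 25 {20}
```

In practice, each training split would be used to fit a freshly initialised classifier, its accuracy on the matching test split would be recorded, and `mean_std` would summarise the 25 resulting scores.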

Summary of Alternative Architectures
To summarise our exploratory studies of a modified input tensor shape and a reduced number of CNN parameters, we conclude that neither replacing the single folded wavelet image with a tensor of depth four nor reducing the number of trainable parameters by an order of magnitude significantly changes the classifier's performance. Thus, given the original requirements of fast image preprocessing and the network's ability to accommodate multi-class classification, we conclude that the original trigger setup is an appropriate basis for a trigger operating on the larger dataset.

Summary and Outlook
We described an application of a Convolutional Neural Network to filtering artefacts in cosmic ray detection experiments performed on mobile phones. Such experiments are generally aimed at a broad, scientifically oriented audience, in the framework of the so-called Citizen Science philosophy. Gamification (e.g., the Particle Hunters Competition) is an efficient method of sustaining the participants' engagement, which is necessary for such projects to be scientifically productive. However, gamification is accompanied by a surge of fake signals related either to hardware malfunction or to participants' cheating. Our method uses a subset of CREDO images labelled by judges as either "signal" or "artefact". We started by considering a baseline trigger whose training consisted of constructing a decision boundary in the two-dimensional feature space defined by integrated luminosity and the number of active pixels. On average, the baseline trigger and its refined versions based on kNN and RF classifiers performed only 2% worse than the CNN trigger. Their artefact recognition rate was, however, 4% worse than that of the CNN trigger. We then studied three versions of the experimental setup and two CNN architectures. In the basic version, the raw CR images were fed to the CNN. In the refined version of our solution, the images were adaptively thresholded and then subjected to wavelet transforms. The motivation for the wavelet transform was its ability to amplify distinctive signal features, such as the shape of object borders or their fragmentation. Such input was then fed to the CNN. Finally, we studied the impact of simultaneously feeding raw and wavelet-transformed data but found no significant improvement in the recognition rate. The overall accuracy of the three approaches reached 98–99% for both signals and artefacts. With such accuracies, the adverse effects of gamification can be effectively neutralised.
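For comparison with the CNN, the two-feature baseline can be sketched as follows. The exact feature definitions appear earlier in the paper and are not repeated here, so this sketch assumes greyscale intensities in [0, 1], treats a pixel as active when it exceeds a hypothetical threshold of 0.2, and sums the luminosity over active pixels only. It illustrates only a plain kNN vote on standardised features (k is our choice), not the authors' tuned kNN or RF classifiers. The toy "square hit" images are made up and are not CREDO data.

```python
import numpy as np


def baseline_features(grey, threshold=0.2):
    """(integrated luminosity, number of active pixels) of one greyscale image.

    Assumptions: intensities in [0, 1]; a pixel is active above the
    hypothetical `threshold`; luminosity is summed over active pixels only.
    """
    g = np.asarray(grey, dtype=float)
    active = g > threshold
    return np.array([g[active].sum(), active.sum()], dtype=float)


def knn_predict(train_x, train_y, query_x, k=5):
    """Majority vote of the k nearest training points (Euclidean distance on
    features standardised with the training-set mean and standard deviation)."""
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    train_y = np.asarray(train_y, dtype=int)
    mu, sd = train_x.mean(axis=0), train_x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    tx, qx = (train_x - mu) / sd, (query_x - mu) / sd
    dist = np.linalg.norm(qx[:, None, :] - tx[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in train_y[nearest]])


def square_hit(side):
    """Hypothetical toy image: a bright side x side square on a dark 60 x 60 frame."""
    img = np.zeros((60, 60))
    img[30:30 + side, 30:30 + side] = 1.0
    return img


train_x = np.array([baseline_features(square_hit(s))
                    for s in (2, 3, 2, 3, 2, 20, 25, 20, 25, 20)])
train_y = np.array([0] * 5 + [1] * 5)  # toy labels: 0 = signal, 1 = artefact
query = np.array([baseline_features(square_hit(2)), baseline_features(square_hit(22))])
print(knn_predict(train_x, train_y, query, k=3))  # [0 1]
```

Such a two-dimensional decision rule is cheap to evaluate, but it can only separate hits that differ in overall brightness and size. Morphological differences like fragmentation or border shape are invisible to it, which is consistent with the CNN's advantage in artefact recognition reported above.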
Given the similar performance of all three preprocessing methods, the practical choice of method is determined by time efficiency, which favours CNN classification based on raw RGB images.
In general, the classifiers investigated are limited to some extent by the accuracy of the annotators in recognising whether a hit is a signal or an artefact. As shown in the paper, the CNN triggers were found to be significantly more consistent than the annotators (smaller standard deviation). A natural extension of the presented methods is to increase the number of signal classes so that various types of particle tracks can be identified. This research is currently under way.