1. Introduction
Pulsars are rapidly rotating, highly magnetized neutron stars. They emit beams of electromagnetic radiation that can be observed from Earth as periodic pulses [1]. Jocelyn Bell Burnell and Antony Hewish discovered the first pulsar in 1967 [2]. Since then, pulsars have become increasingly important for many astrophysical research endeavours [3], including tests of general relativity, probing the state of ultra-dense matter, investigating binary star evolution, and contributing to the development of gravitational wave astronomy through pulsar timing arrays [4]. Despite their importance, finding new pulsars remains challenging, mainly because the volume of observational data is enormous and genuine pulsar signals are rare against a background dominated by noise and radio-frequency interference (RFI).
Discovering pulsars and determining their identities is crucial for contemporary astronomy and fundamental physics [5,6]. The Five-hundred-meter Aperture Spherical radio Telescope (FAST) was completed in September 2016 in Guizhou Province, China [7]. One of its primary research objectives is the discovery of pulsars. To date, the FAST Early Science Data Center has identified 240 pulsar candidates [7,8,9], of which 123 have been verified as new pulsars. The pulsars J1859–0131 and J1931–01 were the first discovered with Chinese radio telescopes [8,10], a significant step forward for Chinese astronomy. J0318+0253 is the first millisecond pulsar (MSP) found by FAST and one of the radio-faintest high-energy MSPs yet observed [8]. This accomplishment highlights the significance of FAST in global efforts to detect low-frequency gravitational waves. The sky survey operations conducted by FAST generate vast amounts of candidate data [11,12,13]. For example, processing 2000 observations per day yields approximately 300,000 diagnostic images [14], underscoring the critical need for automated, high-efficiency candidate screening methods.
Because manual inspection of roughly 300,000 candidate plots per day is infeasible, processing this flood of data relies on specialist software such as PRESTO (Ransom 2001) [14] for RFI excision, dedispersion, Fourier-domain searching, and folding. This pipeline ends in the creation of diagnostic plots, including the time-phase plot, the frequency-phase plot, the DM curve, and the integrated pulse profile. Such plots, typically inspected by experienced astronomers or by volunteers in citizen science projects such as the Pulsar Search Collaboratory, play a critical role in confirming pulsar candidates. Nevertheless, manual verification is neither scalable nor reproducible given the massive data volumes.
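As a concrete illustration, the search stages named above can be orchestrated from Python. The sketch below builds (but does not execute) command lines for PRESTO's rfifind, prepdata, realfft, and accelsearch tools; the file names and parameter values are hypothetical, and the full option set of each tool should be taken from the PRESTO documentation.

```python
# Sketch of driving a PRESTO-style search from Python. The command names are
# real PRESTO tools; the specific options and file names are illustrative.
from subprocess import run  # used only if the commented loop below is enabled

def build_pipeline(filterbank, dm, out="cand"):
    """Return the command lines for one dedispersion trial."""
    dat = f"{out}_DM{dm}"
    return [
        ["rfifind", "-time", "2.0", "-o", out, filterbank],    # RFI excision mask
        ["prepdata", "-dm", str(dm), "-mask", f"{out}_rfifind.mask",
         "-o", dat, filterbank],                               # dedisperse to a time series
        ["realfft", f"{dat}.dat"],                             # Fourier transform
        ["accelsearch", "-zmax", "0", f"{dat}.fft"],           # periodicity search
    ]

cmds = build_pipeline("obs.fil", dm=56.7)
# for cmd in cmds:
#     run(cmd, check=True)   # only on a system with PRESTO installed
```

In a real survey this loop would be repeated over many trial DMs, with prepfold producing the final diagnostic plots for surviving candidates.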
Machine learning and deep learning algorithms have attracted interest as automated methods for selecting pulsar candidates that address these limitations. The availability of structured datasets, including the High Time Resolution Universe (HTRU) dataset and the FAST early science dataset, has enabled substantial progress in this area. The HTRU dataset, obtained with the Parkes Radio Telescope, consists of 1196 confirmed pulsars and 89,996 noise candidates, exhibiting a severe class imbalance (a positive-to-negative ratio of 1:75). Likewise, the FAST dataset comprises 1160 positive and 14,319 negative candidates (an imbalance ratio of about 1:12), posing further challenges for classifier construction.
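Imbalance ratios of this magnitude are commonly countered with inverse-frequency class weights during training. A minimal sketch, using the candidate counts quoted above:

```python
# Inverse-frequency class weights for the imbalance ratios quoted above
# (HTRU: 1196 pulsars vs. 89,996 noise candidates; FAST: 1160 vs. 14,319).
def class_weights(n_pos, n_neg):
    """Weight each class inversely to its frequency (normalized over 2 classes)."""
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}

w_htru = class_weights(1196, 89996)   # positives up-weighted roughly 38x
w_fast = class_weights(1160, 14319)   # positives up-weighted roughly 6.7x
```

Such weights can be passed to a weighted loss function so that rare pulsar examples contribute proportionally more to the gradient than abundant noise candidates.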
Early methods relied on hand-engineered features, e.g., SNR, DM spacing, chi-square statistics, and peak scores, evaluated with common machine learning models such as Support Vector Machines (SVMs), decision trees, and ensemble classifiers [14]. These models were quite successful; however, their dependence on feature engineering limited their flexibility and their ability to generalize across different survey scenarios. More recent studies have applied convolutional neural networks (CNNs) to analyze diagnostic plots directly, thereby requiring less manual feature engineering [9]. CNNs can effectively learn hierarchical spatial patterns, which qualifies them for locating faint pulsar signals embedded in noisy or partially corrupted plots.
Among the first to demonstrate the application of deep CNNs to pulsar classification were Lyon et al. [6], who achieved improved accuracy and robustness. Zhu et al. [9] extended this work to image pattern recognition and experimented with hybrid models that mix image and statistical inputs. Later, Zhang et al. [15] introduced a custom deep learning model for pulsar classification that significantly reduced false positive rates while maintaining high sensitivity. Nevertheless, such models mostly rely on either image data or numerical properties alone, without exploiting the synergy of multimodal data fusion.
In this respect, this paper proposes a new hybrid pulsar candidate detection algorithm based on a multi-scale DenseNet. We combine convolutional processing of diagnostic plots with a feedforward neural network (FNN) that consumes numerical features, including SNR, DM, pulse width, and FFT-based scores. This two-input architecture allows the model to extract both geometrical and statistical properties of candidate signals. The feature representations are fused and passed through a classification layer to estimate the probability that each candidate is a real pulsar.
The principal contributions of this work are three-fold. First, we build a unified multimodal classification model that combines DenseNet-based image processing with FNN-based feature embedding, allowing more precise and generalizable predictions. Second, we construct a balanced, synthesized dataset inheriting the properties of both FAST and HTRU observations to mitigate class imbalance while preserving signal authenticity. Third, we perform a thorough analysis via five-fold cross-validation, ablation experiments, and error analysis to compare our model with current baselines.
Our results demonstrate that the proposed model outperforms both CNN-only and FNN-only baselines across all key performance metrics, including accuracy, precision, recall, F1-score, and AUC-ROC. Moreover, the lightweight design (approximately 2.3 million parameters) and fast inference time (4.2 ms per candidate) make it suitable for real-time candidate screening in large-scale pulsar surveys. This hybrid framework not only aligns with the demands of modern radio astronomy but also sets a precedent for future developments in multimodal deep learning applications for astrophysical signal classification.
2. Materials and Methods
2.1. Dataset Construction and Composition
To test the proposed pulsar candidate recognition system, we designed a high-resolution synthetic dataset comprising 20,000 instances. The dataset was designed to mimic the statistical and structural characteristics of two often-cited realistic pulsar candidate datasets: the Five-hundred-meter Aperture Spherical radio Telescope (FAST; Pingtang County, Guizhou Province, China) and the High Time Resolution Universe (HTRU) survey conducted with the Parkes 64-m radio telescope (CSIRO, Parkes, NSW, Australia). In detail, the dataset comprises 4000 positive examples (pulsars) and 16,000 negative examples (radio frequency interference or noise), giving an imbalance ratio of 1:4 that approximates real-world survey conditions. It should be noted that the 1:4 ratio reflects an early-stage candidate filtering scenario and does not represent the extreme class imbalance (1:10 to 1:10⁶) observed in full-scale FAST and HTRU survey outputs.
Each sample is uniquely identified and bears a binary label (1 if pulsar, 0 otherwise) and a source tag indicating whether the candidate was sampled from the simulated FAST or HTRU subset. A total of 12 numerical features were generated for every sample. To ensure reproducibility of feature generation, each of the 12 numerical features was randomly sampled from a distribution adjusted to typical FAST and HTRU candidate characteristics. More specifically, the signal-to-noise ratio was drawn from a lognormal distribution, the DM from a uniform distribution ranging from 5 to 600 pc cm⁻³, the pulse periods from a bimodal Gaussian distribution covering both normal pulsars and MSPs, the pulse widths from a truncated Gaussian distribution, and the chi-squared statistics from a gamma distribution. The remaining diagnostic features, skewness, kurtosis, FFT peak score, folding RMS, maximum peak ratio, profile sharpness index, and frequency drift factor, were sampled from bounded Gaussian or uniform distributions to model the variations observed in real pulsar candidate samples.
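For illustration, the sampling scheme described above can be sketched as follows; the distribution parameters shown are assumptions for demonstration, not the exact values used to build the dataset.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

def sample_features(n):
    """Sample n candidates; parameter values here are illustrative only."""
    snr = rng.lognormal(mean=2.0, sigma=0.6, size=n)           # signal-to-noise ratio
    dm = rng.uniform(5.0, 600.0, size=n)                       # DM in pc cm^-3
    # bimodal period distribution: millisecond vs. normal pulsars
    is_msp = rng.random(n) < 0.3
    period = np.where(is_msp,
                      rng.normal(5e-3, 2e-3, n),               # MSPs: a few ms
                      rng.normal(0.8, 0.3, n))                 # normal: ~0.5-1 s
    width = np.clip(rng.normal(0.05, 0.02, n), 0.005, None)    # truncated Gaussian
    chi2 = rng.gamma(shape=2.0, scale=3.0, size=n)             # chi-squared statistic
    return np.column_stack([snr, dm, period, width, chi2])

X = sample_features(20000)
```

The seven remaining diagnostic features would be drawn analogously from bounded Gaussian or uniform distributions.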
Four diagnostic plots were also attached to each candidate, emulating the typical visual outputs used in the manual screening of pulsars: the integrated pulse profile, the time-versus-phase diagram, the frequency-versus-phase diagram, and the DM–S/N curve. The integrated profiles were constructed from one or more Gaussian components, depending on the sampled pulse width, while the time-phase and frequency-phase plots were derived by repeating the pulse template across subintegrations and frequency channels, incorporating white noise, red noise, and weak sinusoidal modulations to model RFI-like behaviors. The DM–S/N plots were generated by incorporating dispersion smearing across trial DMs. The four plots were created as 128 × 128-pixel grayscale PNG images, following the FAST and HTRU display conventions. Although the dataset is synthetic, it was constructed with direct reference to published statistical distributions and imaging characteristics of FAST and HTRU data. It serves as a proxy for algorithm development, model ablation, and performance benchmarking in the absence of real-time access to proprietary observational data; final testing on real HTRU2 or FAST datasets remains necessary for validating field deployment [8]. The complete synthetic generation pipeline combines statistical sampling of numerical features with pulse-template modeling, dispersion smearing, and controlled noise/RFI injection to ensure realistic variability consistent with published FAST and HTRU diagnostic characteristics.
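A toy version of the time-versus-phase plot generation, assuming a single Gaussian pulse template repeated across sub-integrations with additive white noise (parameter values are illustrative):

```python
import numpy as np

def time_phase_plot(period_bins=128, n_subints=128, width=0.05,
                    snr=8.0, seed=0):
    """Toy time-vs-phase image: a Gaussian pulse template repeated over
    sub-integrations with additive white noise (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    phase = np.linspace(0.0, 1.0, period_bins, endpoint=False)
    template = np.exp(-0.5 * ((phase - 0.5) / width) ** 2)   # pulse at phase 0.5
    img = rng.normal(0.0, 1.0, (n_subints, period_bins))     # white-noise background
    img += snr * template                                     # broadcast over rows
    # scale to [0, 1] before saving as an 8-bit grayscale image
    img = (img - img.min()) / (img.max() - img.min())
    return img

img = time_phase_plot()
```

The frequency-versus-phase plots follow the same pattern with frequency channels in place of sub-integrations, plus a dispersion delay per channel; red noise and sinusoidal RFI terms would be added to the background.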
2.2. Preprocessing and Feature Normalization
Before training, all numerical features underwent z-score standardization to guarantee compatible scales and convergence stability [6]. Diagnostic images were resized to 128 × 128 pixels and normalized to zero mean and unit variance to stabilize feature extraction in the convolutional layers. Missing or outlier values in the feature matrix were filtered or clipped to empirically bounded values to ensure numerical stability.
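The two normalization steps can be sketched in a few lines; the clipping bound of ±5 standard deviations is an illustrative choice for the "empirically bounded values" mentioned above.

```python
import numpy as np

def zscore(X, eps=1e-8, clip=5.0):
    """Column-wise z-score standardization with clipping of extreme outliers.
    The +/-5 sigma clipping bound is an illustrative choice."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / (sigma + eps)
    return np.clip(Z, -clip, clip)

def normalize_image(img, eps=1e-8):
    """Zero-mean, unit-variance normalization of one grayscale image."""
    return (img - img.mean()) / (img.std() + eps)

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = zscore(X)
```

In practice the standardization statistics (mu, sigma) must be computed on the training set only and reused for the validation and test sets to avoid information leakage.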
2.3. Dataset Partitioning
The entire dataset was randomly stratified into training (70%), validation (15%), and test (15%) sets. The class proportions were maintained in all subsets so that exposure to pulsar and non-pulsar candidates remained consistent across training and assessment. Moreover, model performance stability and generalizability were evaluated using five-fold cross-validation with varying random seeds.
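A sketch of the stratified 70/15/15 split and the five-fold cross-validation setup, assuming scikit-learn is available (the random seeds are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

y = np.array([1] * 4000 + [0] * 16000)     # 1:4 imbalance, as in the dataset
X = np.arange(len(y)).reshape(-1, 1)       # placeholder feature matrix

# 70/15/15 stratified split: first carve off 30%, then halve it
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)

# five-fold cross-validation with preserved class ratios
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
```

Stratification guarantees that each subset retains the 20% positive fraction of the full dataset, so metrics computed on the validation and test sets are comparable.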
2.4. Model Architecture
The proposed recognition algorithm employs a dual-branch design, in which a multi-scale DenseNet backbone processes images and a fully connected feedforward neural network (FNN) processes structured numerical inputs [16]. The DenseNet architecture can be characterized as 'multi-scale' because its connection pattern concatenates the outputs of all previous convolutional blocks and passes them along to subsequent layers. This enables the model to draw on both the fine-scale representations of early layers (with smaller receptive fields) and the coarse-scale representations of deep layers (with larger receptive fields). The DenseNet branch consists of four 3 × 3 convolutional blocks, each accompanied by batch normalization and ReLU activation, which extract visual features hierarchically from the diagnostic plots. In parallel, the 1D feature vector passes through an independent FNN consisting of two hidden layers of 128 and 64 neurons, respectively, with ReLU activations and dropout. The outputs of both branches are concatenated and forwarded to a final classification head, consisting of a fully connected layer followed by a sigmoid activation, producing a probabilistic prediction of pulsar candidacy (0: non-pulsar, 1: pulsar).
In our implementation, the DenseNet branch uses a compact four-block configuration with a growth rate of 16 and a layer distribution of {4, 4, 4, 4} per block, followed by 2 × 2 average pooling between blocks and global average pooling at the end, producing a 256-dimensional feature vector. The FNN depth (128 and 64 units) was selected after exploratory grid-search experiments balancing performance and computational efficiency, and dropout (p = 0.5) was applied to reduce overfitting. The multimodal fusion step concatenates the 256-dimensional CNN embedding with the 64-dimensional FNN output to form a joint 320-dimensional representation that is passed to a fully connected classification head. This configuration provides sufficient representational capacity while remaining efficient enough for inference-time deployment.
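The dual-branch design can be sketched in PyTorch as below. This is a simplified stand-in: the image branch uses plain convolutional blocks rather than full dense connectivity, and the four diagnostic plots are assumed to be stacked as input channels; the embedding sizes (256 and 64, fused to 320) follow the text.

```python
import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    """Simplified sketch of the dual-branch model: a convolutional image
    branch (stand-in for the multi-scale DenseNet) producing a 256-d
    embedding, and an FNN branch (128 -> 64) for the 12 numerical features.
    The embeddings are concatenated into a 320-d joint representation."""

    def __init__(self, n_features=12):
        super().__init__()
        conv, in_ch = [], 4            # four diagnostic plots as input channels
        for out_ch in (32, 64, 128, 256):
            conv += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                     nn.BatchNorm2d(out_ch), nn.ReLU(),
                     nn.AvgPool2d(2)]
            in_ch = out_ch
        self.cnn = nn.Sequential(*conv, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fnn = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.5))
        self.head = nn.Sequential(nn.Linear(256 + 64, 1), nn.Sigmoid())

    def forward(self, images, features):
        z = torch.cat([self.cnn(images), self.fnn(features)], dim=1)
        return self.head(z).squeeze(1)

model = HybridClassifier()
p = model(torch.randn(2, 4, 128, 128), torch.randn(2, 12))  # batch of 2
```

A faithful DenseNet branch would additionally concatenate each block's input with its output along the channel dimension (growth rate 16, {4, 4, 4, 4} layers per block, as described above).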
2.5. Training Procedure
Model training was conducted using the PyTorch framework (version 2.0; Meta Platforms, Inc., Menlo Park, CA, USA). The Adam optimizer (Kingma, 2014) was used with an initial learning rate of 0.001. The binary cross-entropy loss function was modified with class weights to compensate for class imbalance. Each model was trained for up to 50 epochs, with early stopping triggered by stagnation of the validation loss. A batch size of 128 was used, and dropout regularization was applied in both branches of the network to mitigate overfitting. The schematic workflow of the hybrid pulsar candidate classification model is presented in Figure 1. The Adam optimizer was selected for its stability and adaptability in training multimodal neural networks and for its widespread use in prior pulsar-candidate classification studies. The initial learning rate of 0.001 follows the default recommendation of the original Adam formulation and provides reliable convergence in convolutional architectures without extensive tuning; empirically, this learning rate produced stable training curves in our experiments.
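The class-weighted loss and early-stopping rule can be sketched as follows; the positive-class weight of 4 (mirroring the 1:4 imbalance) and the patience of 3 epochs are illustrative choices, not the exact values used in training.

```python
import torch

def weighted_bce(pred, target, pos_weight=4.0):
    """Binary cross-entropy with the positive class up-weighted
    (pos_weight=4.0 mirrors the 1:4 imbalance; illustrative value)."""
    eps = 1e-7
    pred = pred.clamp(eps, 1.0 - eps)
    loss = -(pos_weight * target * torch.log(pred)
             + (1.0 - target) * torch.log(1.0 - pred))
    return loss.mean()

class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=3):
        self.patience, self.best, self.bad = patience, float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
        return self.bad >= self.patience   # True -> stop training

# Adam with the initial learning rate used in this work (dummy parameters here)
opt = torch.optim.Adam(torch.nn.Linear(12, 1).parameters(), lr=1e-3)
stopper = EarlyStopping(patience=3)
```

In the training loop, `stopper.step(val_loss)` is called once per epoch and training stops as soon as it returns `True`.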
The system integrates diagnostic plots and numerical features as dual input modalities. Diagnostic images are preprocessed and passed through a CNN based on DenseNet architecture, while structured 1D features (e.g., SNR, DM, pulse width) are processed through a feedforward neural network (FNN). Extracted features from both paths are combined in a joint feature space and forwarded to a classification module, which outputs a prediction of either pulsar or non-pulsar. Model performance is evaluated using five-fold cross-validation with F1-score and AUC as key metrics. All experiments were performed on an NVIDIA RTX 3060 Laptop GPU (6 GB VRAM; NVIDIA Corporation, Santa Clara, CA, USA) paired with an Intel i7 processor (Intel Corporation, Santa Clara, CA, USA) and 16 GB system memory. The hybrid DenseNet–FNN model required approximately 4.8 GB of GPU memory during training. Using a batch size of 128 and 50 epochs, the full training procedure took approximately 58 min.
2.6. Evaluation Metrics
Model performance was evaluated on the held-out test set using standard classification metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). Additionally, confusion matrices were analyzed to quantify the rates of false negatives and false positives. To assess computational efficiency, we also report the model's total parameter count, training time per epoch, and inference latency per candidate.
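The reported metrics can be computed from predicted probabilities with scikit-learn; the toy labels and scores below are purely illustrative.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the metrics reported in this work from predicted probabilities."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),   # threshold-free
        "false_negatives": int(fn),
        "false_positives": int(fp),
    }

y_true = np.array([1, 0, 1, 0, 1, 0])            # toy labels
y_prob = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.1])  # toy model scores
m = evaluate(y_true, y_prob)
```

Note that AUC-ROC is computed from the raw probabilities, while the remaining metrics depend on the chosen decision threshold.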
5. Conclusions
This work proposed a hybrid pulsar candidate recognition algorithm within a multi-scale DenseNet framework, combining diagnostic plot images and structured numerical features to achieve robust classification. By integrating convolutional and feedforward neural network branches, the model exploits complementary spatial and statistical information. The model was trained and tested on a dataset synthesized from real-world observational patterns in the FAST and HTRU surveys, enabling rigorous benchmarking under class-imbalanced conditions. Our experiments show that the hybrid model substantially outperforms single-modality baselines across all evaluation measures, with an F1-score of 0.904, an AUC-ROC of 0.978, and an accuracy of more than 96 percent. Ablation studies confirmed the contribution of each diagnostic plot type, while cross-validation demonstrated the model's stability and generalizability.
Notably, the system maintained low inference latency (4.2 ms/sample), suggesting potential for real-time integration into survey pipelines once validated on real observational data. Beyond performance, the model’s interpretability, as revealed through misclassification analysis, offers valuable insights for refining candidate verification strategies, and its modular design provides a flexible foundation for future enhancements. The research indicates that combining multimodal inputs within a unified neural architecture can enhance pulsar candidate detection, even in noisy and high-volume datasets. Future work will focus on further validating the model using live observational data from FAST and the upcoming SKA, expanding its applicability to transient signal detection, and exploring semi-supervised and transformer-based variants for deeper feature abstraction. The proposed approach makes a meaningful contribution to the automation of pulsar discovery and exemplifies the broader utility of multimodal deep learning in radio astronomy.