Article

Exploring the Habitat Distribution of Decapterus macarellus in the South China Sea Under Varying Spatial Resolutions: A Combined Approach Using Multiple Machine Learning and the MaxEnt Model

1
South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510300, China
2
College of Marine Living Resource Sciences and Management, Shanghai Ocean University, Shanghai 201306, China
3
Key Laboratory for Sustainable Utilization of Open-Sea Fishery, Ministry of Agriculture and Rural Affairs, Guangzhou 510300, China
4
Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, Guangzhou 510300, China
*
Authors to whom correspondence should be addressed.
Biology 2025, 14(7), 753; https://doi.org/10.3390/biology14070753
Submission received: 14 May 2025 / Revised: 9 June 2025 / Accepted: 23 June 2025 / Published: 24 June 2025
(This article belongs to the Section Marine Biology)

Simple Summary

In this study, a combined approach integrating multiple machine learning algorithms with the MaxEnt model was applied to systematically evaluate the habitat suitability prediction of D. macarellus in the South China Sea under different spatial resolutions. The SHAP method was employed to interpret the contributions of environmental variables within the machine learning framework. The results demonstrated that higher predictive performance was achieved at the finer 0.083° resolution (ROC_AUC = 0.836, accuracy = 0.793, and NPV = 0.862). Furthermore, external validation confirmed that the XGB model exhibited the best overall predictive accuracy and stability, with AUC values approaching 0.9. SHAP analysis identified CHL and SST as the key drivers influencing the distribution of D. macarellus, emphasizing their ecological significance. MaxEnt modeling further delineated suitable habitat areas, primarily located in the northern and central-southern regions of the South China Sea. Through comparative analysis of different spatial resolutions and modeling approaches, this study highlights that the combination of 0.083° environmental data and the XGB model is more suitable for investigating the distribution patterns of D. macarellus in the South China Sea. These advancements will provide a stronger scientific basis for the sustainable development and management of offshore fishery resources.

Abstract

The selection of environmental variables with different spatial resolutions is a critical factor affecting the accuracy of machine learning-based fishery forecasting. In this study, spring-season survey data of Decapterus macarellus in the South China Sea from 2016 to 2024 were used to construct six machine learning models—decision tree (DT), extra trees (ETs), K-Nearest Neighbors (KNN), light gradient boosting machine (LGBM), random forest (RF), and extreme gradient boosting (XGB)—based on seven environmental variables (e.g., sea surface temperature (SST), chlorophyll-a concentration (CHL)) at four spatial resolutions (0.083°, 0.25°, 0.5°, and 1°), filtered using Pearson correlation analysis. Optimal models were selected under each resolution through performance comparison. SHapley Additive exPlanations (SHAP) values were employed to interpret the contribution of environmental predictors, and the maximum entropy (MaxEnt) model was used to perform habitat suitability mapping. Results showed that the XGB model at 0.083° resolution achieved the best performance, with the area under the receiver operating characteristic curve (ROC_AUC) = 0.836, accuracy = 0.793, and negative predictive value = 0.862, outperforming models at coarser resolutions. CHL was identified as the most influential variable, showing high importance in both the SHAP distribution and the cumulative area under the curve contribution. Predicted suitable habitats were mainly located in the northern and central-southern South China Sea, with the latter covering a broader area. This study is the first to systematically evaluate the impact of spatial resolution on environmental variable selection in machine learning models, integrating SHAP-based interpretability with MaxEnt modeling to achieve reliable habitat suitability prediction, offering valuable insights for fishery forecasting in the South China Sea.

1. Introduction

Marine ecosystems and fisheries are vital components of human development, and China’s marine fisheries are among the most sensitive to climate change globally [1,2]. Significant shifts in oceanographic conditions have profoundly affected the distribution of fish populations, making accurate fishery forecasting critically important for the sustainable development and effective management of marine fisheries [3,4]. In recent years, remotely sensed oceanographic data with multiple spatial resolutions have been widely used in marine habitat suitability modeling and fishery forecasting [5,6,7]. Given the vastness of the world’s oceans and the logistical and economic constraints of conducting in situ surveys, remote sensing has become an indispensable tool for marine scientists and resource managers. These datasets provide continuous, large-scale, and near-real-time information on key environmental parameters such as sea surface temperature (SST), chlorophyll-a concentration (CHL), salinity, and ocean currents, which are critical for understanding species–environment interactions. For example, a 2023 study predicted the spatiotemporal variation in suitable habitats of Sardinops sagax using environmental data at a 0.25° spatial resolution [8], and another 2023 study predicted the habitat distribution of 12 highly migratory top predator species from environmental variables at a 0.083° spatial resolution [9]. Exploring the relationship between environmental factors and species habitat distribution and accurately inferring habitat suitability play a critical role in promoting sustainable fisheries and conserving marine resources.
The South China Sea (SCS), located at the western edge of the Pacific Ocean, is one of the most important fishing grounds in the world. It is highly vulnerable to climate change, including monsoon-driven circulation, extreme weather events, and long-term oceanographic shifts [10]. Its dynamic ocean environment and rich nutrient conditions support high levels of primary productivity, providing abundant food resources for marine organisms [11,12]. This, in turn, offers favorable habitats for economically important species such as carangids and tunas. The South China Sea contributes a significant proportion of China’s total marine fishery output and serves as a crucial livelihood source for both coastal and distant-water fishers [13]. In recent years, the region has experienced growing ecological pressures from global warming, sea-level rise, intensified human activities, and increasingly frequent extreme climatic events [14,15]. Environmental phenomena such as rising sea surface temperatures and ocean acidification have had profound impacts on marine ecosystems. Since most marine fish migrate toward environments that meet their survival needs, shifts in oceanographic conditions directly affect their distribution patterns. One of the most prominent indicators of ecosystem degradation in the South China Sea is coral reef decline [16]. Coral reefs, as critical components of the marine ecosystem, offer essential habitat for a wide range of fish and invertebrates [17]. However, widespread coral bleaching and mortality driven by climate change have forced many species to seek new, more suitable habitats [18]. These environmental changes not only threaten the stability of fishery yields but also have far-reaching implications for marine biodiversity and habitat distribution. Therefore, accurately identifying the relationship between species habitat distribution and environmental suitability is essential for promoting species sustainability and effective resource management.
Mackerel scad (Decapterus macarellus), a newly targeted carangid species in the South China Sea, has emerged as an economically important pelagic fish in recent years [19]. It typically inhabits depths ranging from 40 to 200 m and is widely distributed across tropical and subtropical oceanic regions. However, the stability of its habitat has been increasingly challenged by the impacts of climate change and intensified human activities, which have disrupted marine ecosystems and altered species distributions [20]. At present, research on D. macarellus has primarily focused on its biological characteristics [21,22,23]. Habitat-related studies have largely relied on species distribution models for prediction [24], yet few have examined how different spatial resolutions of environmental data influence modeling accuracy. Moreover, there is a lack of integrated approaches that combine multiple machine learning algorithms to improve predictive performance and ecological insight. Developing accurate predictive models can significantly enhance the identification of potential fishing and non-fishing grounds, reduce the cost of fishery operations, and improve management efficiency [25,26,27]. Therefore, selecting appropriate spatial resolutions for environmental variables and applying robust modeling approaches are essential to effectively analyze the relationship between D. macarellus and its surrounding oceanographic environment. This, in turn, provides a valuable foundation for advancing sustainable fisheries management and resource conservation strategies for D. macarellus in the South China Sea.
Fish growth and development are closely related to surrounding oceanic conditions, and habitat prediction based on various environmental factors has become a common approach in contemporary habitat modeling. Early species distribution models were primarily based on statistical techniques such as generalized linear models (GLMs) and generalized additive models (GAMs), which were used to analyze the relationships between species presence and environmental variables [28,29,30]. As species distribution modeling developed, predictive modeling of species habitats increasingly adopted more advanced techniques, including the MaxEnt model and ecological niche factor analysis (ENFA), both of which have been widely applied to forecast species distributions [31]. Machine learning, which is capable of identifying complex patterns in data and making accurate predictions, has seen rapid advancement in recent years, particularly in the analysis of relationships between fishery resources and environmental drivers [32]. Compared with traditional methods, machine learning algorithms offer improved flexibility and predictive accuracy, especially as data availability and algorithmic sophistication continue to improve. Among these, the RF algorithm, originally proposed by Leo Breiman, is a widely used ensemble learning method [33]. It generates outputs by averaging the predictions of multiple decision trees and is particularly effective in handling uncertainties within complex ecological systems. In addition, several representative machine learning models, including DT, ET, KNN, XGB, and LGBM, have demonstrated significant advantages in capturing complex nonlinear relationships between environmental variables and species distributions. The DT model is a classical machine learning algorithm known for its strong capability in regression and classification tasks [34].
ET is an ensemble learning method based on decision trees and can be considered a variant of the random forest algorithm; it introduces greater randomness during tree construction, thereby improving the model’s generalization ability [35]. KNN is a non-parametric, instance-based supervised learning algorithm commonly used for both classification and regression problems [36]. XGB is an optimized implementation of gradient boosting decision trees (GBDTs), designed to construct a strong ensemble by integrating multiple weak learners, and is suitable for various machine learning tasks, including classification, regression, and ranking [37]. LGBM is also a decision tree–based gradient boosting framework that has been successfully applied across multiple domains due to its efficiency and high scalability [38].
In this study, multiple machine learning algorithms were applied to predict the distribution of D. macarellus in the South China Sea using environmental variables at different spatial resolutions. These included the commonly used resolutions of 0.083° and 0.25°, as well as the less frequently utilized 0.5° and 1°. Based on the performance metrics of each model, the optimal algorithm was selected for each spatial resolution. Habitat suitability visualization was subsequently conducted using the MaxEnt model. The contribution and influence of each environmental variable were further interpreted using SHAP values and cumulative area under the curve (AUC) plots. The objective of this study was to identify the most suitable spatial resolution and the best-performing model for predicting the habitat of D. macarellus in the South China Sea. The findings aim to provide a scientific basis for the development and management of D. macarellus fishery resources in the region.

2. Materials and Methods

2.1. Sampling Information of D. macarellus

This study utilized spring survey data of D. macarellus collected from the South China Sea during the 2016–2022 period for model training, while data from 2023 to 2024 were used as an external validation dataset. The surveys were conducted using a commercial fishing vessel equipped with a main engine rated at 441 kW. The vessel measured 36.8 m in length, 6.8 m in width, and had a draft of 3.8 m. The sampling area covered the northern and central-southern regions of the South China Sea (Figure 1). The trawl net used for these surveys measured 76 m × 53.79 m × 34 m, with a mesh size of 200 mm and a cod-end mesh size of 39 mm. In most cases, sampling was conducted three times per day, with the net towed at an average speed of 3.5 knots for one hour. However, in certain nearshore areas and fishing grounds, the towing duration was reduced to 30–50 min to avoid collisions with other vessels or fixed fishing gear. Within the study area, sampling sites with a high occurrence frequency of D. macarellus were selected. These sites exhibited relatively high biomass, supporting their significance in habitat identification. Ultimately, 117 distinct and validated sampling locations were chosen for analysis.

2.2. Screening of Environmental Data

Seven environmental variables were considered: sea surface salinity (SSS), sea surface height (SSH), mixed layer depth (MLD), sea surface temperature (SST), offshore distance (DIS), bathymetry (BATH), and chlorophyll-a concentration (CHL). They were obtained from the Copernicus Marine Service (https://marine.copernicus.eu/ accessed on 10 April 2025) and the Global Fishing Watch System (https://globalfishingwatch.org/ accessed on 10 April 2025) (Table 1), and were selected for their recognized importance in influencing the distribution of marine species. To limit multicollinearity among predictors, which can contribute to overfitting and distort model evaluation, Pearson correlation analysis was conducted on the seven variables, and only variables with pairwise correlation coefficients below 0.8 in absolute value (|R| < 0.8) were retained; values close to −1 or 1 indicate strong negative or positive correlations, respectively. Environmental data were downloaded at spatial resolutions of 0.083° and 0.25°, while datasets originally at 0.5° and 1° resolution were resampled and extracted using ArcGIS 10.7.
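The screening rule described above can be sketched in a few lines. This is a minimal illustration in Python (the study itself reports using R), and the sample values, the helper names `pearson` and `flag_collinear`, and the choice to flag pairs with |r| at or above the threshold are all assumptions made for the example, not the authors' code or data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_collinear(columns, threshold=0.8):
    """Return (var_a, var_b, r) for every pair whose |r| reaches the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Toy values for illustration only; these are not the survey data.
toy = {
    "SST": [28.1, 27.5, 29.0, 26.8, 28.6, 27.9],
    "CHL": [0.12, 0.20, 0.08, 0.31, 0.10, 0.15],
    "SSS": [33.9, 33.6, 33.8, 33.7, 33.5, 34.0],
}
print(flag_collinear(toy))  # pairs that would need one member dropped
```

In this toy table only the CHL and SST pair is flagged; an empty list would mean that every variable passes the |R| < 0.8 rule.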

2.3. Model Construction and Performance Evaluation

Machine learning offers strong nonlinear fitting capabilities and can automatically extract features from complex, high-dimensional data, thereby improving predictive accuracy. It also demonstrates high flexibility and efficiency in handling large datasets and variable selection. In this study, six machine learning algorithms—decision tree (DT), extra trees (ETs), K-nearest neighbors (KNN), light gradient boosting machine (LGBM), random forest (RF), and extreme gradient boosting (XGB)—were employed to train predictive models. Model performance was evaluated through comparative analysis, and the optimal algorithm was selected for further application. Following the selection of the optimal model, SHAP values and cumulative AUC curves were employed to assess both the relative importance and the directional influence of environmental factors on species distribution. Based on this, species habitat suitability was predicted using the MaxEnt model. By comparing different machine learning models, this study aimed to enhance the accuracy and robustness of species distribution predictions under varying environmental conditions. All modeling procedures were implemented in R. In species distribution modeling, it is essential to compare environmental conditions with known species occurrences, which requires a sufficient number of presence records. However, due to geographical and economic constraints, actual sampling data for most species are often limited. As a result, the use of pseudo-absence data as a substitute has become a widely accepted approach in model construction. In this study, pseudo-absence points were randomly generated at three times the number of actual presence records within the study area to support model construction [39]. Overlapping points were removed to ensure spatial independence among all locations, thereby enhancing the predictive accuracy of the model. During the modeling process, a cross-validation approach was employed. 
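The pseudo-absence step described above (random points at three times the number of presences, with overlapping points removed) might look roughly like the following Python sketch. The bounding box, the grid-cell test used to define "overlap", and the function name `sample_pseudo_absences` are assumptions for illustration; the text does not state how overlap was defined or which study-area limits were used.

```python
import random

# Placeholder study-area box (lon_min, lon_max, lat_min, lat_max); an assumption.
STUDY_BOUNDS = (105.0, 121.0, 4.0, 23.0)


def sample_pseudo_absences(presences, bounds, ratio=3, cell=0.083, seed=0):
    """Draw ratio * len(presences) random points, one per unoccupied grid cell."""
    lon_min, lon_max, lat_min, lat_max = bounds
    rng = random.Random(seed)

    def cell_of(lon, lat):
        # Index of the grid cell that contains the point at the given resolution.
        return (int((lon - lon_min) // cell), int((lat - lat_min) // cell))

    occupied = {cell_of(lon, lat) for lon, lat in presences}
    target = ratio * len(presences)
    absences = []
    while len(absences) < target:
        lon = rng.uniform(lon_min, lon_max)
        lat = rng.uniform(lat_min, lat_max)
        key = cell_of(lon, lat)
        if key in occupied:
            continue  # shares a cell with a presence or an earlier pseudo-absence
        occupied.add(key)
        absences.append((lon, lat))
    return absences


# Three made-up presence locations; the study used 117 validated sites.
presences = [(112.5, 12.0), (113.2, 18.4), (115.0, 10.1)]
absences = sample_pseudo_absences(presences, STUDY_BOUNDS)
print(len(absences))  # 9, i.e. three pseudo-absences per presence
```

Treating "overlap" as "falls in the same grid cell" ties the de-duplication to the resolution of the environmental layers, so no two records would extract identical predictor values from the same cell.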
The dataset was randomly split into the following two subsets: 70% for model training and 30% for validation. To ensure robustness, the modeling procedure was repeated 5 times. Model performance was evaluated using the following five metrics: negative predictive value (NPV), positive predictive value (PPV), specificity, area under the receiver operating characteristic curve (ROC_AUC), and accuracy. Generally, an NPV greater than 0.8 indicates good model performance, while a PPV above 0.5 suggests that the model is acceptable. A specificity value exceeding 0.8 also reflects good performance. Similarly, a ROC_AUC value above 0.8 is considered indicative of strong discriminatory ability, and an accuracy higher than 0.8 denotes satisfactory overall model performance [40]. The formulas for calculating each evaluation metric are as follows [41,42]:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$
$$\mathrm{NPV} = \frac{TN}{TN + FN}$$
In these formulas, TP (true positive) refers to correctly predicted presence points, FN (false negative) refers to actual presence points incorrectly predicted as absence, TN (true negative) refers to correctly predicted pseudo-absence points, and FP (false positive) refers to pseudo-absence points incorrectly predicted as presence. AUC is calculated based on the area under the ROC curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity).
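As a worked illustration of these definitions, the Python sketch below computes the four count-based metrics from predicted classes and ROC_AUC from predicted scores. The AUC is computed through its rank interpretation (the probability that a randomly chosen presence scores higher than a randomly chosen pseudo-absence, with ties counted as one half), which equals the area under the ROC curve. The labels, scores, and 0.5 cut-off are invented for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = presence, 0 = pseudo-absence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Avoid ZeroDivisionError when a class is never predicted."""
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of presences against absences."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and scores: 3 presences, 5 pseudo-absences.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Here TP = 2, FN = 1, FP = 1, and TN = 4, so accuracy is 6/8, specificity and NPV are both 4/5, PPV is 2/3, and the AUC is 14/15. With a 1:3 presence to pseudo-absence ratio, NPV and specificity are computed over the larger class, which is one reason PPV tends to be the lowest of the reported metrics.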

2.4. Model Parameter Settings

In this study, multiple machine learning algorithms, including DT, ET, KNN, RF, LGBM, and XGB, were employed to predict the habitat suitability of D. macarellus in the South China Sea. The key hyperparameters for each model were set as follows: for XGB, the learning rate (eta) was set to 0.1, the maximum depth (max_depth) to 6, and the number of boosting rounds (nrounds) to 100. For RF, the number of trees (ntree) was set to 500, and the number of variables randomly selected at each split (mtry) was optimized through a grid search with tuneLength = 3. The ET model was configured with ntree = 500, mtry = 3, and nodesize = 1. For LGBM, the maximum depth was set to 6, the learning rate to 0.1, the number of trees to 100, and the Bernoulli distribution was used for binary classification tasks. The KNN model was set with k = 5, and feature standardization (centering and scaling) was applied prior to modeling. The DT model was trained using default parameter settings. All models were trained and validated using five-fold cross-validation to optimize hyperparameters and prevent overfitting. The detailed parameter settings for each machine learning model are summarized in Table 2.

3. Results

3.1. Screening Factors

Pearson correlation analyses were conducted for environmental datasets at four different spatial resolutions (Figure 2). At a resolution of 0.083° (Figure 2a), the highest correlation was observed between DIS and BATH (r = −0.66), followed by SST and CHL (r = −0.64). At 0.25° resolution (Figure 2b), the strongest correlation was also between DIS and BATH (r = −0.71), followed by SSS and CHL (r = 0.71). At 0.5° resolution (Figure 2c), the highest correlation remained between DIS and BATH (r = −0.66). At 1° resolution (Figure 2d), SST and CHL showed the strongest correlation (r = −0.80). Overall, across all resolutions, the Pearson correlation coefficients among the seven environmental variables were below the threshold of 0.8 (|r| < 0.8), indicating that all variables were suitable for inclusion in the predictive models.

3.2. Model Performance

The selected environmental variables were incorporated into the models at different spatial resolutions, and the corresponding values of NPV, PPV, specificity, ROC_AUC, and accuracy were calculated for each model (Figure 3). Based on a comprehensive comparison of model performance, the XGB model achieved the best results at the 0.083° spatial resolution, with NPV, specificity, and ROC_AUC values all exceeding 0.8, indicating strong predictive capability (Figure 3a). Additionally, the accuracy reached 0.793, which is close to 0.8 and further supports the model’s reliability (Figure 3a). At the 0.25° resolution, the ET model outperformed other models, with a specificity value of 0.908 and both NPV and ROC_AUC above 0.8, suggesting good predictive performance (Figure 3b). In contrast, at the 0.5° resolution, the PPV values ranged between 0.3 and 0.45, indicating a high error rate (over 50%); thus, this resolution was excluded from further analysis (Figure 3c). A 1° resolution was also tested; however, the configuration of pseudo-absence points was not suitable for modeling at this scale, leading to considerable inaccuracies. As a result, data at this resolution were deemed unreliable and excluded from the final model predictions.
By comparing the results, it was found that although the 0.25° spatial resolution yielded a higher specificity value (0.908) compared to that of the 0.083° resolution (0.862), it performed worse in terms of ROC_AUC, accuracy, and NPV. Therefore, the prediction at the 0.083° resolution is considered to have higher overall precision and better predictive performance.

3.3. Analysis of Variable Importance

The SHAP approach was used to quantify the relative importance of environmental variables within the optimal model (Figure 4). Additionally, cumulative AUC plots were employed to visually assess how the sequential inclusion of each environmental factor influenced model accuracy (Figure 5). At the 0.083° spatial resolution, the SHAP values for CHL spanned a wide range, and high CHL values were associated predominantly with negative SHAP values, suggesting that elevated chlorophyll-a concentrations may be unfavorable for the habitat suitability of D. macarellus in the South China Sea (Figure 4a). SST showed the most pronounced positive influence in the model, with a concentration of high SHAP values, indicating that warmer waters enhance the species’ habitat suitability and that D. macarellus tends to prefer warmer environments (Figure 4a). Both DIS and BATH showed positive contributions at intermediate values, suggesting a preference for mid-range environmental gradients (Figure 4a). In contrast, MLD and SSH exhibited no clear directional trend, with highly variable SHAP values, possibly reflecting interactions with other environmental factors (Figure 4a). The overall contribution of SSS was relatively low, with no consistent pattern observed (Figure 4a). At the 0.25° spatial resolution, SST, BATH, and DIS emerged as the most important positive predictors in the model, representing the favorable effects of warm waters, deeper environments, and offshore distance on the distribution of D. macarellus (Figure 4b). In contrast, higher concentrations of chlorophyll-a (CHL) markedly suppressed the probability of species occurrence, which may be attributed to increased water turbidity or intensified ecological competition resulting from high phytoplankton density (Figure 4b). Other variables, such as MLD, SSH, and SSS, showed relatively minor influence (Figure 4b).
Comparative analysis of the results across the two spatial resolutions revealed a consistent pattern: elevated SST positively influenced the habitat suitability of D. macarellus in the South China Sea, while high CHL concentrations had a negative impact.
The cumulative AUC plots were used to evaluate the contribution of the seven environmental variables in the predictive models (Figure 5). At the 0.083° spatial resolution, CHL contributed the most to model performance, followed by a noticeable increase after the inclusion of SSH and BATH (Figure 5a). A slight decrease was observed after the addition of SSS, while the inclusion of DIS and SST led to minor increases in AUC (Figure 5a). The curve eventually stabilized after the addition of MLD (Figure 5a). At the 0.25° spatial resolution, SSH exhibited the highest contribution, and a gradual upward trend in AUC was observed as CHL, SSS, BATH, MLD, and SST were sequentially added. The curve plateaued following the inclusion of DIS (Figure 5b). Overall, CHL and SSH were consistently identified as the top two contributing environmental variables at both spatial resolutions. CHL also exhibited high activity in the SHAP value distribution, further highlighting its influence. These findings suggest that CHL is a key environmental factor shaping the habitat suitability and distribution of D. macarellus in the South China Sea.
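The idea behind a cumulative AUC curve, adding variables one at a time and re-scoring the model after each addition, can be sketched as follows. This Python example is only a schematic: it substitutes a toy centroid-based scorer for the XGB and ET models, evaluates on the same points it was fitted to, and uses invented standardized values, so it shows the bookkeeping rather than the authors' procedure.

```python
def roc_auc(labels, scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_score(rows, labels, features):
    """Toy stand-in model: squared distance to the absence centroid minus
    squared distance to the presence centroid, over the chosen features."""
    def centroid(cls):
        members = [r for r, t in zip(rows, labels) if t == cls]
        return {f: sum(r[f] for r in members) / len(members) for f in features}

    c1, c0 = centroid(1), centroid(0)
    return [
        sum((r[f] - c0[f]) ** 2 - (r[f] - c1[f]) ** 2 for f in features)
        for r in rows
    ]


def cumulative_auc(rows, labels, order):
    """Return [(variable added, AUC using all variables up to and including it)]."""
    curve = []
    for k in range(1, len(order) + 1):
        subset = order[:k]
        scores = centroid_score(rows, labels, subset)
        curve.append((subset[-1], roc_auc(labels, scores)))
    return curve


# Invented, standardized values: presences have low CHL and high SST.
rows = [
    {"CHL": -1.0, "SST": 0.5},
    {"CHL": -0.5, "SST": 1.0},
    {"CHL": 0.2, "SST": 0.8},
    {"CHL": 0.0, "SST": -0.5},
    {"CHL": 1.0, "SST": -1.0},
    {"CHL": 0.8, "SST": 0.2},
]
labels = [1, 1, 1, 0, 0, 0]

for name, auc in cumulative_auc(rows, labels, ["CHL", "SST"]):
    print(name, round(auc, 3))
```

In this toy case CHL alone gives an AUC of 8/9 and adding SST raises it to 1.0. The shape of such a curve depends on the order in which variables are added, so a small step for a late variable does not by itself mean that the variable is unimportant.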

3.4. Habitat Suitability Prediction for D. macarellus in the South China Sea

Machine learning models were trained and used for prediction based on true presence data from 2016 to 2022 and artificially generated pseudo-absence points. After selecting the optimal models through performance comparison, habitat suitability predictions were conducted using the MaxEnt model. The XGB model was identified as the best-performing algorithm at the 0.083° spatial resolution, while the ET model showed the highest performance at the 0.25° resolution. The prediction results from the two spatial resolutions showed a high degree of similarity, both indicating that the suitable habitat of D. macarellus in the South China Sea is primarily located in the northern and central-southern regions, with the latter covering a larger area. At the 0.083° resolution, the higher spatial precision allowed for clearer visualization of suitable habitats, which were mainly concentrated in the central-southern South China Sea, particularly between 9–15° N and 110–117° E (Figure 6a). Although the map at 0.25° resolution appeared less clear due to a coarser grid size, the core suitable habitat areas closely matched those observed at the 0.083° resolution (Figure 6b). Based on the model evaluation metrics, the 0.083° spatial resolution yielded higher predictive accuracy compared to the 0.25° resolution. In conjunction with the habitat suitability maps, it was evident that higher spatial resolution resulted in more precise identification of suitable areas (Figure 6a).

3.5. External Validation

Using 2023–2024 survey data as an external validation dataset, multiple machine learning models were compared at two spatial resolutions (0.083° and 0.25°). The results showed that the XGB model outperformed the other models at both resolutions, with AUC values approaching 0.9 (Figure 7). In addition, a comparison of the XGB model’s performance across the two spatial resolutions revealed that its predictive stability was stronger at the 0.083° resolution (Figure 7a) than at 0.25° (Figure 7b). Habitat suitability predictions generated using the optimal model revealed that the suitable habitat of D. macarellus was primarily located in the northern and central-southern regions of the South China Sea, consistent with the findings from the training dataset (Figure 8). This external validation supports the robustness and high predictive accuracy of the model, confirming the reliability of the predicted distribution patterns. Therefore, we conclude that the 0.083° resolution is more appropriate for predicting the habitat suitability of D. macarellus in the South China Sea using a combined approach of machine learning and the MaxEnt model.

4. Discussion

4.1. Justification for the Selection of Environmental Variables

Marine environmental factors play a crucial role in shaping the distribution of D. macarellus in the South China Sea. Among them, CHL influences fishery grounds through its role in the marine food chain, as it reflects the abundance of phytoplankton—the primary producers supporting higher trophic levels [43]. SST is one of the most fundamental environmental parameters affecting fish behavior, metabolism, and distribution [44]. For pelagic species like D. macarellus, DIS and BATH are essential spatial variables that influence habitat accessibility and depth preference [45]. SSS is particularly important for migratory species, as salinity gradients often guide movement patterns [46]. SSH, which integrates oceanographic features such as currents, thermal structures, salinity fronts, and eddies, is commonly used in fishery analysis to identify dynamic oceanographic zones favorable for fish aggregation. In this study, a total of seven environmental variables—including CHL, SST, and SSH—were selected to improve the model’s capacity to capture the ecological and spatial preferences of D. macarellus. Our results identified CHL as a key factor influencing the species’ distribution. The SHAP analysis indicated that higher CHL concentrations were associated with a negative contribution to habitat suitability for D. macarellus. Biologically, this result may be explained by the species’ preference for offshore pelagic environments with relatively moderate primary productivity, rather than eutrophic nearshore waters. As an indicator of phytoplankton biomass, CHL is closely linked to feeding availability, which is essential for the survival and aggregation of D. macarellus [47]. High CHL concentrations typically occur in coastal areas with strong nutrient input and higher phytoplankton density, which may coincide with lower oxygen levels, increased turbidity, and anthropogenic stressors—conditions less favorable for this species. 
As a fast-swimming, migratory, mid-trophic level fish, D. macarellus tends to aggregate in mesotrophic areas where prey (e.g., zooplankton, small pelagic forage) is abundant but not over-concentrated. Spring is the main spawning season for D. macarellus in the South China Sea, and pre-spawning feeding behavior may be the primary reason why CHL emerged as a key influencing factor [48]. These findings are consistent with previous studies using SDM models to assess the species’ distribution in the South China Sea [24]. The results of this study indicate that D. macarellus prefers oceanic, warm-water ecosystems, where elevated sea surface temperatures and abundant zooplankton resources create favorable conditions for foraging and group migration. Overall, D. macarellus tends to inhabit offshore areas characterized by higher temperatures, moderate distances from the coast, and intermediate depths, reflecting the typical ecological traits of pelagic oceanic species. In this study, the application of the SHAP method provided significant advantages in interpreting the contributions of environmental factors. Unlike traditional variable importance measures, SHAP values quantify both the magnitude and the direction (positive or negative) of each factor’s contribution to the predictions, enabling a more transparent understanding of how environmental drivers influence species distribution [49]. The results revealed that key drivers such as CHL and SST played important roles in modulating the habitat suitability of D. macarellus in the South China Sea. A major advantage of SHAP lies in its ability to link machine learning model outputs to biological mechanisms, offering clearer explanations of the positive or negative effects of environmental variables on species predictions [50]. This approach enhances both the interpretability of the models and their ecological relevance.
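To make the "magnitude and direction" property concrete, the Python sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions, with absent features replaced by a baseline value. The linear toy model, its coefficients, and the baseline are hypothetical, and practical SHAP implementations use faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; features outside a coalition
    take their baseline value."""
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_suitability(v):
    """Hypothetical model: suitability rises with SST and falls with CHL."""
    return 0.5 + 0.1 * (v["SST"] - 28.0) - 0.8 * (v["CHL"] - 0.15)


baseline = {"SST": 28.0, "CHL": 0.15}   # assumed reference conditions
sample = {"SST": 29.0, "CHL": 0.40}     # warmer, more productive water

phi = shapley_values(toy_suitability, sample, baseline)
print(phi)
```

For this sample the SST contribution is about +0.1 and the CHL contribution about −0.2, and the two sum to the difference between the sample prediction and the baseline prediction. The sign carries the direction of the effect, which is the property the discussion above relies on.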

4.2. Model Performance Comparison

In this study, the XGB model achieved the best predictive performance at the 0.083° spatial resolution, with ROC_AUC = 0.836, accuracy = 0.793, and NPV = 0.862, outperforming the ET model, which was optimal at the 0.25° resolution. Furthermore, in external validation using the 2023–2024 dataset, XGB remained the best-performing model at both spatial resolutions (0.083° and 0.25°), with AUC values approaching 0.9, demonstrating strong predictive robustness. XGB constructs an ensemble of decision trees sequentially, with each new tree trained to correct the residual errors of the previous ensemble [51]. This approach enables the model to capture the complex nonlinear relationships between environmental factors and species distribution more effectively. A core advantage of XGB lies in its use of both first-order (gradient) and second-order (Hessian) derivatives of the loss function during model training [52]. By accounting for the curvature of the loss function, XGB achieves more precise and stable updates at each iteration, which is valuable for ecological datasets in which responses to environmental variables are often highly nonlinear and small improvements can significantly enhance model performance. Additionally, XGB’s flexibility in modeling complex patterns, together with regularization mechanisms that mitigate overfitting, helps explain why it performed best at the 0.083° resolution and in external validation at both resolutions in this study.
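The role of the gradient and Hessian can be illustrated with a single second-order boosting step for a binary presence/absence problem under logistic loss. This is a minimal sketch of the standard regularized Newton update for one leaf, w = -G / (H + lambda), where G and H are the summed gradients and Hessians of the samples in the leaf. The labels, the starting scores, and the value of reg_lambda are hypothetical and are not taken from Table 2.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess(label, margin):
    """First and second derivatives of logistic loss with respect to the raw score."""
    p = sigmoid(margin)
    return p - label, p * (1.0 - p)


def leaf_weight(labels, margins, reg_lambda=1.0):
    """Regularized Newton step for one leaf: w = -G / (H + lambda)."""
    pairs = [grad_hess(y, m) for y, m in zip(labels, margins)]
    grad_sum = sum(g for g, _ in pairs)
    hess_sum = sum(h for _, h in pairs)
    return -grad_sum / (hess_sum + reg_lambda)


def mean_log_loss(labels, margins):
    losses = []
    for y, m in zip(labels, margins):
        p = sigmoid(m)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# Hypothetical leaf with three presences and one absence, all starting at score 0.
labels = [1, 1, 1, 0]
margins = [0.0, 0.0, 0.0, 0.0]

w = leaf_weight(labels, margins, reg_lambda=1.0)
loss_before = mean_log_loss(labels, margins)
loss_after = mean_log_loss(labels, [m + w for m in margins])
print(f"leaf weight = {w:.3f}, loss {loss_before:.4f} -> {loss_after:.4f}")
```

The Hessian sum in the denominator scales the step by the local curvature of the loss, and the regularization term shrinks the weight toward zero, which is the mechanism behind the more stable updates described above.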

4.3. Comparison of Spatial Resolution Effects

The comparison across different spatial resolutions demonstrated that the resolution of environmental data has a significant impact on habitat suitability prediction outcomes [53]. Models constructed using 0.083° resolution data achieved higher predictive accuracy than those based on 0.25° data and were markedly superior to models using 0.5° or 1° resolution data. Specifically, at 0.083° resolution, the XGB model exhibited the highest ROC_AUC, accuracy, and NPV compared to other resolutions, although the 0.25° models also demonstrated a certain level of reliability. External validation further confirmed that XGB remained the optimal model at both 0.083° and 0.25° resolutions; however, model stability was notably higher at 0.083°. The superior performance and stability at finer resolutions are likely attributable to the enhanced ability to capture small-scale environmental heterogeneity, which is critical for the habitat selection of D. macarellus in the dynamic South China Sea. In contrast, coarser resolutions tend to smooth local variability excessively, masking microhabitat features that are necessary for accurate modeling. Additionally, environmental variables are undergoing subtle changes under the influence of climate change, and such fine-scale variations are more easily detected using high-resolution data. In contrast to highly migratory species such as tunas [54,55], for which 0.5° or 1° resolutions are commonly used in fishery forecasting [56,57], the relatively localized movement patterns of D. macarellus make the 0.083° resolution more suitable for spatial distribution analysis. A study on the distribution of Caspian kutum (Rutilus frisii) indicated that higher spatial resolutions yielded higher model accuracy, whereas coarser spatial resolutions could lead to reduced predictive reliability and model performance [58].
Studies exploring the effects of spatial scale on fish habitat modeling have likewise shown that models based on higher spatial resolutions generally achieve better predictive performance [59]. These results are consistent with those of the present study and support the validity and robustness of our findings. High-resolution environmental data allow for more precise detection of habitat shifts and ecological responses, thereby providing a better basis for effective management and conservation strategies [60,61,62].
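The smoothing effect of coarser grids can be shown with a small numerical sketch. The grid below is synthetic and hypothetical: a 6 × 6 field of 1/12° (about 0.083°) cells with one localized high-value cell is aggregated by 3 × 3 block means, which corresponds to 0.25° cells. This is an illustration under those assumptions and not the resampling procedure used in this study.

```python
from statistics import pvariance


def block_mean(grid, factor):
    """Aggregate a 2D grid by averaging non-overlapping factor x factor blocks.

    Assumes the number of rows and columns is divisible by factor.
    """
    coarse = []
    for r in range(0, len(grid), factor):
        row = []
        for c in range(0, len(grid[0]), factor):
            cells = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(sum(cells) / len(cells))
        coarse.append(row)
    return coarse


def flatten(grid):
    return [value for row in grid for value in row]


# Synthetic fine-resolution field: uniform background with one small-scale feature.
fine = [[0.1] * 6 for _ in range(6)]
fine[1][1] = 1.0  # hypothetical localized peak, e.g. a narrow front

coarse = block_mean(fine, factor=3)  # three 1/12-degree cells per 0.25-degree cell

print("fine   max:", max(flatten(fine)), "variance:", round(pvariance(flatten(fine)), 6))
print("coarse max:", round(max(flatten(coarse)), 3), "variance:", round(pvariance(flatten(coarse)), 6))
```

The domain mean is unchanged by the aggregation, but the peak value and the spatial variance both drop sharply, so a feature that is clearly visible at the fine resolution becomes difficult to distinguish from the background at the coarse one.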

4.4. Ecological Rationality of Predicted Potential Habitats

The MaxEnt model is widely used for habitat suitability prediction because it requires only species presence data, making it particularly suitable for marine ecosystems where data collection is often challenging or incomplete [63]. While machine learning models (e.g., XGB, RF, and ET) are highly effective in capturing complex nonlinear relationships between multiple environmental variables and species distributions, they often lack the intuitive spatial visualization capabilities necessary for management applications. MaxEnt, grounded in ecological niche theory, compensates for this limitation by generating clear, spatially explicit habitat suitability maps [64,65]. Results from both internal and external validations in this study showed that the suitable habitats for D. macarellus were mainly located in the northern and central-southern regions of the South China Sea, characterized by favorable sea surface temperatures, moderate depths, and intermediate distances from the coast. These patterns are consistent with the ecological traits of D. macarellus as an epipelagic, oceanic species. Regionally, the distribution suggests the potential for broad-scale migratory behavior; however, further research is needed to confirm the extent of its migratory patterns. Given the increasing pressures from climate change and human activities, accurately identifying suitable habitats and delineating potential fishing grounds will be critical for reducing search costs in fisheries operations and supporting sustainable fishery resource management [66]. The integration of MaxEnt habitat modeling with machine learning-based prediction offers a powerful framework to meet these challenges in dynamic marine ecosystems like the South China Sea.

5. Conclusions

In this study, a combined approach integrating multiple machine learning algorithms with the MaxEnt model was applied to systematically evaluate the habitat suitability prediction of D. macarellus in the South China Sea under different spatial resolutions. The SHAP method was employed to interpret the contributions of environmental variables within the machine learning framework. The results demonstrated that higher predictive performance was achieved at the finer 0.083° resolution (ROC_AUC = 0.836, accuracy = 0.793, and NPV = 0.862). Furthermore, external validation confirmed that the XGB model exhibited the best overall predictive accuracy and stability, with AUC values approaching 0.9. SHAP analysis identified CHL and SST as the key drivers influencing the distribution of D. macarellus, emphasizing their ecological significance. MaxEnt modeling further delineated suitable habitat areas, primarily located in the northern and central-southern regions of the South China Sea. Through comparative analysis of different spatial resolutions and modeling approaches, this study highlights that the combination of 0.083° environmental data and the XGB model is more suitable for investigating the distribution patterns of D. macarellus in the South China Sea. Future research should explore the effects of additional environmental factors, model parameter optimization, and the integration of oceanographic model outputs to further enhance prediction accuracy and practical applicability. These advancements will provide a stronger scientific basis for the sustainable development and management of offshore fishery resources.

Author Contributions

Q.S.: writing—review and editing, writing—original draft, investigation, conceptualization. P.Z.: software, methodology, formal analysis. X.F.: visualization, investigation, formal analysis. J.F.: writing—review and editing, supervision, methodology, conceptualization. Z.C.: writing—review and editing, resources, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Key R&D Program of China (No. 2024YFD2400502), the National Natural Science Foundation of China (32303008), the Central Public-Interest Scientific Institution Basal Research Fund, CAFS (No. 2023TD05), and the Financial Fund of the Ministry of Agriculture and Rural Affairs, China (NHZX2025).

Institutional Review Board Statement

The animal study was reviewed and approved by the South China Sea Fisheries Research Institute Animal Welfare Committee (Ethical approval number: SCSFRI document NO.40/2016, 18 March 2016).

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available upon request.

Acknowledgments

We would like to express our sincere gratitude to our advisor, our classmates, and all the participants involved in the marine data survey for their invaluable contributions. Their support and efforts have greatly enriched this research and made this work possible.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Kang, B.; Pecl, G.T.; Lin, L.; Sun, P.; Zhang, P.; Li, Y.; Zhao, L.; Peng, X.; Yan, Y.; Shen, C. Climate change impacts on China’s marine ecosystems. Rev. Fish Biol. Fish. 2021, 31, 599–629.
  2. Li, Y.; Sun, M.; Yang, X.; Yang, M.; Kleisner, K.M.; Mills, K.E.; Tang, Y.; Du, F.; Qiu, Y.; Ren, Y. Social–ecological vulnerability and risk of China’s marine capture fisheries to climate change. Proc. Natl. Acad. Sci. USA 2024, 121, e2313773120.
  3. Moustahfid, H.; Hendrickson, L.C.; Arkhipkin, A.; Pierce, G.J.; Gangopadhyay, A.; Kidokoro, H.; Markaida, U.; Nigmatullin, C.; Sauer, W.H.; Jereb, P. Ecological-fishery forecasting of squid stock dynamics under climate variability and change: Review, challenges, and recommendations. Rev. Fish. Sci. Aquac. 2021, 29, 682–705.
  4. Mondal, S.; Punt, A.E.; Mendes, D.; Osuka, K.E.; Lee, M.A. Teleconnection Impacts of Climatic Variability on Tuna and Billfish Fisheries of the South Atlantic and Indian Ocean: A Study Towards Sustainable Fisheries Management. Fish Fish. 2025, 26, 240–256.
  5. Yati, E.; Sadiyah, L.; Satria, F.; Alabia, I.D.; Sulma, S.; Prayogo, T.; Marpaung, S.; Harsa, H.; Kushardono, D.; Lumban-Gaol, J. Spatial distribution models for the four commercial tuna in the sea of maritime continent using multi-sensor remote sensing and maximum entropy. Mar. Environ. Res. 2024, 198, 106540.
  6. Belkin, I.M. Remote sensing of ocean fronts in marine ecology and fisheries. Remote Sens. 2021, 13, 883.
  7. Xue, M.; Tong, J.; Ma, W.; Zhu, Z.; Wang, W.; Lyu, S.; Chen, X. Reimagining habitat suitability modeling for Pacific saury (Cololabis saira) in the Northwest Pacific Ocean through acoustic data analysis from fishing vessels. Ecol. Inform. 2025, 85, 102971.
  8. Shi, Y.; Kang, B.; Fan, W.; Xu, L.; Zhang, S.; Cui, X.; Dai, Y. Spatio-temporal variations in the potential habitat distribution of pacific sardine (Sardinops sagax) in the Northwest Pacific Ocean. Fishes 2023, 8, 86.
  9. Braun, C.D.; Lezama-Ochoa, N.; Farchadi, N.; Arostegui, M.C.; Alexander, M.; Allyn, A.; Bograd, S.J.; Brodie, S.; Crear, D.P.; Curtis, T.H. Widespread habitat loss and redistribution of marine top predators in a changing ocean. Sci. Adv. 2023, 9, eadi2718.
  10. Yang, S.; Chen, D.; Deng, K. Global effects of climate change in the South China Sea and its surrounding areas. Ocean-Land-Atmos. Res. 2024, 3, 0038.
  11. Deng, L.; Zhao, J.; Sun, S.; Ai, B.; Zhou, W.; Cao, W. Two-decade satellite observations reveal variability in size-fractionated phytoplankton primary production in the South China Sea. Deep Sea Res. Part I Oceanogr. Res. Pap. 2024, 206, 104258.
  12. Peng, R.; Chen, X.; Wu, Q.; Yan, Z.; Fu, Y.; Qin, B.; Hao, R.; Yu, K. Wildfire particulates enhance phytoplankton growth and alter communities in the South China Sea under wind-driven upwelling. J. Geophys. Res. Biogeosciences 2024, 129, e2024JG008066.
  13. Wang, D.K.-H. Fisheries management in the South China Sea. In Routledge Handbook of the South China Sea; Routledge: London, UK, 2021; pp. 243–261.
  14. Song, J.; Yao, L.; Guo, J.; Fu, Y.; Cai, Y.; Wang, M. The Seasonal Correlation Between El Niño and Southern Oscillation Events and Sea Surface Temperature Anomalies in the South China Sea from 1958 to 2024. J. Mar. Sci. Eng. 2025, 13, 153.
  15. Wang, Y.; Su, C.; Zhang, J.; He, J.; Liu, W.; Wang, T. Causation, impacts and mitigation strategies of typical marine ecological disasters in coasts of the South China Sea. Reg. Stud. Mar. Sci. 2025, 85, 104167.
  16. Xiao, J.; Wang, W.; Wang, X.; Tian, P.; Niu, W. Recent deterioration of coral reefs in the South China Sea due to multiple disturbances. PeerJ 2022, 10, e13634.
  17. Shi, J.; Li, C.; Wang, T.; Zhao, J.; Liu, Y.; Xiao, Y. Distribution pattern of coral reef fishes in China. Sustainability 2022, 14, 15107.
  18. Zuo, X.; Su, F.; Yu, K.; Wang, Y.; Wang, Q.; Wu, H. Spatially modeling the synergistic impacts of global warming and sea-level rise on coral reefs in the South China Sea. Remote Sens. 2021, 13, 2626.
  19. Xu, Y.; Zhang, P.; Panhwar, S.K.; Li, J.; Yan, L.; Chen, Z.; Zhang, K. The initial assessment of an important pelagic fish, Mackerel Scad, in the South China Sea using data-poor length-based methods. Mar. Coast. Fish. 2023, 15, e210258.
  20. Hodapp, D.; Roca, I.T.; Fiorentino, D.; Garilao, C.; Kaschner, K.; Kesner-Reyes, K.; Schneider, B.; Segschneider, J.; Kocsis, Á.T.; Kiessling, W. Climate change disrupts core habitats of marine species. Glob. Change Biol. 2023, 29, 3304–3317.
  21. Retnoningtyas, H.; Agustina, S.; Natsir, M.; Ningtias, P.; Hakim, A.; Dhani, A.K.; Hartati, I.D.; Pingkan, J.; Simanjuntak, C.P.; Wiryawan, B. Reproductive biology of the mackerel scad, Decapterus macarellus (Cuvier, 1833), in the Sulawesi Sea, Indonesia. Reg. Stud. Mar. Sci. 2024, 69, 103300.
  22. Tenriware, T.; Nur, M.; Nasyrah, A.F.A. Biological reproduction of mackerel scad, Decapterus macarellus (Cuvier, 1833) caught by purse-seine net in Majene Waters, West Sulawesi, Indonesia. Biodiversitas J. Biol. Divers. 2023, 24, 3012–3018.
  23. Silooy, F.; Tupamahu, A.; Ongkers, O.; Matrutty, D.; Pattikawa, J. Sex ratio, age group and length at first maturity of Mackerel Scad (Decapterus macarellus Cuvier, 1833) in the southern waters of Ambon, eastern Indonesia. IOP Conf. Ser. Earth Environ. Sci. 2021, 777, 012008.
  24. Shen, Q.; Zhang, P.; Yu, W.; Xiong, P.; Cai, Y.; Li, J.; Chen, Z.; Fan, J. Impact of Climate Change on the Habitat Distribution of Decapterus macarellus in the South China Sea. J. Mar. Sci. Eng. 2025, 13, 156.
  25. Rufino, M.M.; Mendo, T.; Samarão, J.; Gaspar, M.B. Estimating fishing effort in small-scale fisheries using high-resolution spatio-temporal tracking data (an implementation framework illustrated with case studies from Portugal). Ecol. Indic. 2023, 154, 110628.
  26. Meeanan, C.; Noranarttragoon, P.; Sinanun, P.; Takahashi, Y.; Kaewnern, M.; Matsuishi, T.F. Estimation of the spatiotemporal distribution of fish and fishing grounds from surveillance information using machine learning: The case of short mackerel (Rastrelliger brachysoma) in the Andaman Sea, Thailand. Reg. Stud. Mar. Sci. 2023, 62, 102914.
  27. Han, H.; Yang, C.; Jiang, B.; Shang, C.; Sun, Y.; Zhao, X.; Xiang, D.; Zhang, H.; Shi, Y. Construction of chub mackerel (Scomber japonicus) fishing ground prediction model in the northwestern Pacific Ocean based on deep learning and marine environmental variables. Mar. Pollut. Bull. 2023, 193, 115158.
  28. Austin, M.P. Spatial prediction of species distribution: An interface between ecological theory and statistical modelling. Ecol. Model. 2002, 157, 101–118.
  29. Segurado, P.; Araujo, M.B. An evaluation of methods for modelling species distributions. J. Biogeogr. 2004, 31, 1555–1568.
  30. Araújo, M.B.; Luoto, M. The importance of biotic interactions for modelling species distributions under climate change. Glob. Ecol. Biogeogr. 2007, 16, 743–753.
  31. Silva, L.D.; Costa, H.; de Azevedo, E.B.; Medeiros, V.; Alves, M.; Elias, R.B.; Silva, L. Modelling native and invasive woody species: A comparison of ENFA and MaxEnt applied to the Azorean forest. In Modeling, Dynamics, Optimization and Bioeconomics II: DGS III, Porto, Portugal, February 2014, and Bioeconomy VII, Berkeley, USA, March 2014-Selected Contributions 3; Springer: Berlin/Heidelberg, Germany, 2017; pp. 415–444.
  32. Gladju, J.; Kamalam, B.S.; Kanagaraj, A. Applications of data mining and machine learning framework in aquaculture and fisheries: A review. Smart Agric. Technol. 2022, 2, 100061.
  33. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  34. De Ville, B. Decision trees. Wiley Interdiscip. Rev. Comput. Stat. 2013, 5, 448–455.
  35. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
  36. Peterson, L.E. K-nearest neighbor. Scholarpedia 2009, 4, 1883.
  37. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; pp. 785–794.
  38. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
  39. Barbet-Massin, M.; Jiguet, F.; Albert, C.H.; Thuiller, W. Selecting pseudo-absences for species distribution models: How, where and how many? Methods Ecol. Evol. 2012, 3, 327–338.
  40. Carrington, A.M.; Manuel, D.G.; Fieguth, P.W.; Ramsay, T.; Osmani, V.; Wernly, B.; Bennett, C.; Hawken, S.; Magwood, O.; Sheikh, Y. Deep ROC analysis and AUC as balanced average accuracy, for improved classifier selection, audit and explanation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 329–341.
  41. Šimundić, A.-M. Measures of diagnostic accuracy: Basic definitions. EJIFCC 2009, 19, 203.
  42. Kessler, N.; Cyteval, C.; Gallix, B.t.; Lesnik, A.; Blayac, P.-M.; Pujol, J.; Bruel, J.-M.; Taourel, P. Appendicitis: Evaluation of sensitivity, specificity, and predictive values of US, Doppler US, and laboratory findings. Radiology 2004, 230, 472–478.
  43. Sommer, U.; Stibor, H.; Katechakis, A.; Sommer, F.; Hansen, T. Pelagic food web configurations at different levels of nutrient richness and their implications for the ratio fish production: Primary production. In Sustainable Increase of Marine Harvesting: Fundamental Mechanisms and New Concepts: Proceedings of the 1st Maricult Conference Held in Trondheim, Norway, 25–28 June 2000; Springer: Berlin, Germany, 2002; pp. 11–20.
  44. Claireaux, G.; Lagardère, J.-P. Influence of temperature, oxygen and salinity on the metabolism of the European sea bass. J. Sea Res. 1999, 42, 157–168.
  45. Britton-Simmons, K.H.; Rhoades, A.L.; Pacunski, R.E.; Galloway, A.W.; Lowe, A.T.; Sosik, E.A.; Dethier, M.N.; Duggins, D.O. Habitat and bathymetry influence the landscape-scale distribution and abundance of drift macrophytes and associated invertebrates. Limnol. Oceanogr. 2012, 57, 176–184.
  46. Telesh, I.; Schubert, H.; Skarlato, S. Life in the salinity gradient: Discovering mechanisms behind a new biodiversity pattern. Estuar. Coast. Shelf Sci. 2013, 135, 317–327.
  47. Garrido, S.; Ben-Hamadou, R.; Oliveira, P.B.; Cunha, M.E.; Chícharo, M.A.; van der Lingen, C.D. Diet and feeding intensity of sardine Sardina pilchardus: Correlation with satellite-derived chlorophyll data. Mar. Ecol. Prog. Ser. 2008, 354, 245–256.
  48. Hou, G.; Wang, J.; Chen, Z.; Zhou, J.; Huang, W.; Zhang, H. Molecular and morphological identification and seasonal distribution of eggs of four Decapterus fish species in the northern South China Sea: A key to conservation of spawning ground. Front. Mar. Sci. 2020, 7, 590564.
  49. Ekanayake, I.; Meddage, D.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059.
  50. Brugere, L.; Kwon, Y.; Frazier, A.E.; Kedron, P. Improved prediction of tree species richness and interpretability of environmental drivers using a machine learning approach. For. Ecol. Manag. 2023, 539, 120972.
  51. Shehadeh, A.; Alshboul, O.; Al Mamlook, R.E.; Hamedat, O. Machine learning models for predicting the residual value of heavy construction equipment: An evaluation of modified decision tree, LightGBM, and XGBoost regression. Autom. Constr. 2021, 129, 103827.
  52. Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221106935.
  53. Bellamy, C.; Scott, C.; Altringham, J. Multiscale, presence-only habitat suitability models: Fine-resolution maps for eight bat species. J. Appl. Ecol. 2013, 50, 892–901.
  54. Ely, B.; Viñas, J.; Alvarado Bremer, J.R.; Black, D.; Lucas, L.; Covello, K.; Labrie, A.V.; Thelen, E. Consequences of the historical demography on the global population structure of two highly migratory cosmopolitan marine fishes: The yellowfin tuna (Thunnus albacares) and the skipjack tuna (Katsuwonus pelamis). BMC Evol. Biol. 2005, 5, 19.
  55. Erauskin-Extramiana, M.; Arrizabalaga, H.; Hobday, A.J.; Cabré, A.; Ibaibarriaga, L.; Arregui, I.; Murua, H.; Chust, G. Large-scale distribution of tuna species in a warming ocean. Glob. Change Biol. 2019, 25, 2043–2060.
  56. Xu, H.; Song, L.; Zhang, T.; Li, Y.; Shen, J.; Zhang, M.; Li, K. Effects of different spatial resolutions on prediction accuracy of Thunnus alalunga fishing ground in waters near the Cook Islands Based on Long Short-Term Memory (LSTM) Neural Network Model. J. Ocean Univ. China 2023, 22, 1427–1438.
  57. Buenafe, K.C.V.; Everett, J.D.; Dunn, D.C.; Mercer, J.; Suthers, I.M.; Schilling, H.T.; Hinchliffe, C.; Dabalà, A.; Richardson, A.J. A global, historical database of tuna, billfish, and saury larval distributions. Sci. Data 2022, 9, 423.
  58. Moëzzi, F.; Eagderi, S. Quantifying how spatial resolution affects fish distribution model performance and prediction: A case study of Caspian Kutum, Rutilus frisii. Int. J. Aquat. Biol. 2024, 12, 533–545.
  59. Núñez-Riboni, I.; Akimova, A.; Sell, A.F. Effect of data spatial scale on the performance of fish habitat models. Fish Fish. 2021, 22, 955–973.
  60. Nagendra, H.; Lucas, R.; Honrado, J.P.; Jongman, R.H.; Tarantino, C.; Adamo, M.; Mairota, P. Remote sensing for conservation monitoring: Assessing protected areas, habitat extent, habitat condition, species diversity, and threats. Ecol. Indic. 2013, 33, 45–59.
  61. Pettorelli, N.; Laurance, W.F.; O’Brien, T.G.; Wegmann, M.; Nagendra, H.; Turner, W. Satellite remote sensing for applied ecologists: Opportunities and challenges. J. Appl. Ecol. 2014, 51, 839–848.
  62. Marvin, D.C.; Koh, L.P.; Lynam, A.J.; Wich, S.; Davies, A.B.; Krishnamurthy, R.; Stokes, E.; Starkey, R.; Asner, G.P. Integrating technologies for scalable ecology and conservation. Glob. Ecol. Conserv. 2016, 7, 262–275.
  63. Zhao, Y.; Deng, X.; Xiang, W.; Chen, L.; Ouyang, S. Predicting potential suitable habitats of Chinese fir under current and future climatic scenarios based on Maxent model. Ecol. Inform. 2021, 64, 101393.
  64. Phillips, S.J.; Dudík, M. Modeling of species distributions with Maxent: New extensions and a comprehensive evaluation. Ecography 2008, 31, 161–175.
  65. Elith, J.; Phillips, S.J.; Hastie, T.; Dudík, M.; Chee, Y.E.; Yates, C.J. A statistical explanation of MaxEnt for ecologists. Divers. Distrib. 2011, 17, 43–57.
  66. Wang, H.; Zhou, S.; Li, X.; Liu, H.; Chi, D.; Xu, K. The influence of climate change and human activities on ecosystem service value. Ecol. Eng. 2016, 87, 224–239.
Figure 1. Maps showing the area where the study fish were sampled.
Figure 2. Pearson correlation heatmaps of environmental variables at four spatial resolutions. (BATH: bathymetry, CHL: mass concentration of chlorophyll-a in seawater, DIS: distance from shore, MLD: ocean mixed layer thickness, SSS: seawater salinity, SST: sea surface temperature, SSH: sea surface height above geoid) (a) Pearson correlation heatmap of environmental variables at 0.083° spatial resolution; (b) Pearson correlation heatmap of environmental variables at 0.25° spatial resolution; (c) Pearson correlation heatmap of environmental variables at 0.5° spatial resolution; (d) Pearson correlation heatmap of environmental variables at 1° spatial resolution.
Figure 3. Model performance comparison at different spatial resolutions. (a) Model performance comparison at 0.083° spatial resolution; (b) model performance comparison at 0.25° spatial resolution; (c) model performance comparison at 0.5° spatial resolution.
Figure 4. SHAP bee swarm plots of environmental variables at different spatial resolutions. (a) SHAP bee swarm plot of environmental variables at 0.083° spatial resolution; (b) SHAP bee swarm plot of environmental variables at 0.25° spatial resolution.
Figure 5. Cumulative AUC plots of environmental variables at different spatial resolutions. (a) Cumulative AUC plot of environmental variables at 0.083° spatial resolution; (b) cumulative AUC plot of environmental variables at 0.25° spatial resolution.
Figure 6. Comparison of D. macarellus habitat suitability maps at multiple resolutions in the South China Sea. (a) Habitat suitability map of D. macarellus at 0.083° spatial resolution; (b) habitat suitability map of D. macarellus at 0.25° spatial resolution.
Figure 7. AUC comparison across models using the external dataset. (a) Cross-model AUC values at 0.083° spatial resolution using an external dataset; (b) cross-model AUC values at 0.25° spatial resolution using an external dataset.
Figure 8. Habitat suitability map of D. macarellus based on the best model using the external dataset. (a) Habitat suitability map of D. macarellus at 0.083° spatial resolution using the best model based on the external dataset; (b) habitat suitability map of D. macarellus at 0.25° spatial resolution using the best model based on the external dataset.
Table 1. Candidate factors and data sources.
| Variable | Description | Source | Unit | Spatial Resolution |
|---|---|---|---|---|
| SSS | Seawater salinity | https://marine.copernicus.eu/ | | 0.083° and 0.25° |
| SSH | Sea surface height above geoid | https://marine.copernicus.eu/ | m | 0.083° and 0.25° |
| MLD | Ocean mixed layer thickness | https://marine.copernicus.eu/ | m | 0.083° and 0.25° |
| SST | Sea surface temperature | https://marine.copernicus.eu/ | °C | 0.083° and 0.25° |
| DIS | Distance from shore | https://globalfishingwatch.org/ | km | 0.083° |
| BATH | Bathymetry | https://globalfishingwatch.org/ | m | 0.083° |
| CHL | Mass concentration of chlorophyll-a in seawater | https://marine.copernicus.eu/ | mg·m⁻³ | 0.25° |
Table 2. Parameter settings of the model.
| Models | Parameter Settings |
|---|---|
| RF | tuneLength = 3; ntree = 500 |
| DT | rpart |
| XGB | Eta = 0.1; max_depth = 0.8; nrounds = 100 |
| KNN | K = 5 |
| ET | ntree = 500; mtry = 3; nodesize = 1 |
| LGBM | distribution = “bernoulli”; n.trees = 100; interaction.depth = 6; shrinkage = 0.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
