Article

SatGuard: Satellite Networks Penetration Testing and Vulnerability Risk Assessment Methods

School of Information and Navigation, Air Force Engineering University, Xi’an 710077, China
* Author to whom correspondence should be addressed.
Aerospace 2025, 12(5), 431; https://doi.org/10.3390/aerospace12050431
Submission received: 31 March 2025 / Revised: 6 May 2025 / Accepted: 9 May 2025 / Published: 12 May 2025
(This article belongs to the Section Astronautics & Space Science)

Abstract:
Satellite networks face escalating cybersecurity threats from evolving attack vectors and systemic complexities. This paper proposes SatGuard, a novel framework integrating a three-dimensional penetration testing methodology and a nonlinear risk assessment mechanism tailored for satellite security. To address limitations of conventional tools in handling satellite-specific vulnerabilities, SatGuard employs large language models (LLMs) like GPT-4 and DeepSeek-R1. By leveraging their contextual reasoning and code-generation abilities, SatGuard enables semi-automated vulnerability analysis and exploitation. Validated in a simulated ground station environment, the framework achieved a 73.3% success rate (22/30 attempts) across critical ports, with an average of 5.5 human interactions per test. By bridging AI-driven automation with satellite-specific risk modeling, SatGuard advances cybersecurity for next-generation space infrastructure through scalable, ethically aligned solutions.

1. Introduction

1.1. Background

Satellite networks have become indispensable to global connectivity [1], enabling critical services such as broadband communication in remote regions and precision navigation through global navigation satellite systems (GNSSs). These systems are particularly vital where terrestrial infrastructure is impractical, such as in oceanic, desert, or disaster-stricken areas. The advent of low Earth orbit (LEO) constellations—exemplified by SpaceX’s Starlink [2] and Amazon’s Project Kuiper [3]—has further revolutionized performance by minimizing latency through proximity to Earth.
However, the rapid deployment and operational complexity have introduced unprecedented cybersecurity risks. For instance, the 2022 Viasat cyberattack exploited ground infrastructure vulnerabilities, disrupting civilian internet access and military operations across Europe [4,5]. Such incidents underscore the unique challenges of securing satellite networks, which span multi-segment architectures (space, ground, and user), rely on legacy protocols, and face extended patch deployment cycles.
Traditional security methods, such as the Common Vulnerability Scoring System (CVSS), widely used in information technology (IT) environments, face limitations when applied to satellite networks. Satellite networks have unique system complexity due to their vast interconnected components, including the user, ground, and space segments. Thus, there is an urgent need for tailored security solutions to address the specific challenges of satellite networks.

1.2. Existing Works

As a core cybersecurity practice, penetration testing employs offensive strategies to evaluate system vulnerabilities. The objective is to detect vulnerabilities in a given network environment, evaluate security risks, and implement measures to reduce them [6]. In recent years, penetration testing has emerged as a critically important domain, and numerous studies have been carried out to enhance network security [7,8,9,10,11,12,13].
Al-Ghamdi [7] and Felderer et al. [8] systematized security testing techniques, categorizing penetration testing into black-box, white-box, and gray-box approaches. Recent studies, such as Khan et al. [10], advanced dynamic vulnerability analysis for mobile applications, while Crawley [12] developed cloud-specific penetration testing frameworks for AWS and Azure. IoT security research by Chen et al. [11] proposed modular testing methods to address device heterogeneity. However, these works predominantly focus on conventional IT systems, neglecting the unique challenges of satellite networks. For example, existing frameworks omit analysis of RF physical-layer vulnerabilities, orbital-specific attack vectors, and interdependencies between space-ground-user segments. A comparative summary of limitations is provided in Table 1.
Following penetration testing, vulnerability assessment is a crucial step. Vulnerability assessment is critical for evaluating system security, prioritizing remediation, and strategizing defenses. The Common Vulnerability Scoring System (CVSS)—a widely adopted benchmark—quantifies vulnerabilities (0–10 scale) using three metric groups: base (intrinsic attributes), temporal (evolving exploitability), and environmental (organization-specific context) [14,15,16,17]. While CVSS versions (v2.0 to v4.0 [18]) excel in traditional IT systems, they face limitations in satellite networks. The inherent complexity of satellite architectures—interconnected space-ground-user segments, nonlinear risk propagation, and mission-critical interdependencies—renders linear CVSS scoring inadequate. This underscores the necessity for domain-specific assessment methods tailored to satellite ecosystems.

1.3. Contributions

This paper advances satellite network security through three principal contributions, with a focus on the innovative integration of large language models (LLMs) to address domain-specific challenges:
(1)
To develop a three-dimensional penetration testing framework tailored to satellite networks, enabling systematic evaluation across space, ground, and user segments with configurable automation levels (manual, semi-automatic, and fully automatic).
(2)
To propose a nonlinear vulnerability risk assessment formula that amplifies high-impact, low-probability threats, integrates dynamic time-decay mechanisms for residual risk quantification, and prioritizes critical segments through environmental weights.
(3)
To validate the framework through semi-automated experiments using GPT-4 and DeepSeek-R1, achieving a 73.3% success rate in vulnerability exploitation while embedding ethical safeguards to prevent service disruptions. The integration of prompt engineering and role-based constraints ensures ethical alignment, mitigating risks of unintended service disruptions—a critical innovation for sensitive space infrastructure.
By bridging AI-driven automation with domain-specific risk modeling, this work pioneers the use of LLMs in satellite cybersecurity, offering scalable solutions for next-generation constellations.

2. Satellite Networks Penetration Testing Method

Due to the lack of existing methods for satellite network penetration testing, we adapted the cyber-physical system security methodology from [19] to develop a three-dimensional testing method.
As illustrated in Figure 1, this method comprises three functional dimensions: (1) Satellite network segmentation (x-axis) divides the network into user, ground, and space segments, enabling focused evaluation of mission-critical components rather than comprehensive system coverage. (2) Automation degree (y-axis) provides manual, semi-automatic, and fully automatic operational modes for adaptable test execution. (3) Penetration process (z-axis) enforces sequential progression through planning, vulnerability discovery, analysis, exploitation, and remediation phases. This triaxial structure systematically integrates network partitioning, automated adaptability, and procedural discipline, offering security practitioners a comprehensive testing methodology for heterogeneous satellite architectures. The visual method in Figure 1 clarifies operational workflows while maintaining mission-specific prioritization.

2.1. Segment Identification

As depicted in Figure 2, satellite networks comprise three interdependent segments: the user, ground, and space segments.
The user segment encompasses a diverse array of distributed terminal nodes, such as satellite phones for voice and data communications in remote areas, fixed/mobile very-small-aperture terminals (VSATs) enabling broadband internet access, and portable user terminals for emergency response operations. These terminals access the satellites either through direct user links or via terrestrial telecommunications infrastructure operated by service providers [20].
The ground segment usually incorporates ground stations (GS), gateways, and network control center (NCC). Ground stations encompass a variety of types, including traditional ground stations, cloud-based ground stations offering low-latency data collection via cloud platforms, remote ground stations distributed globally to maintain continuous satellite visibility, and custom ground stations tailored for specific mission requirements [21]. Gateways serve as a critical interface for interconnecting satellite networks with terrestrial networks (e.g., the internet and cellular networks), enabling protocol conversion and data routing between these heterogeneous communication systems. The NCC centrally coordinates operations, resource allocation, and integration with terrestrial networks (e.g., 5G/WLAN) [6].
The space segment of satellite networks is primarily composed of LEO satellites, which are increasingly deployed in large-scale constellations to facilitate global broadband connectivity. Notable examples include Starlink (SpaceX, Hawthorne, CA, USA), OneWeb (London, UK), Kuiper (Amazon, Seattle, WA, USA), and Lightspeed (Telesat, Ottawa, ON, Canada), which utilize commercial off-the-shelf (COTS) hardware and advanced communication technologies to achieve cost-efficient, high-speed data transmission with worldwide coverage. Modern LEO satellites operate within a power range of 2–5 kW, tailored to mission-specific payload demands, and employ sophisticated power management strategies such as deployable solar arrays and lithium-ion battery systems to sustain operations during orbital eclipse phases. While inter-satellite links (ISLs) are critical for inter-satellite communication in many systems, some constellations like OneWeb operate without ISLs, relying instead on dense ground station networks to maintain connectivity [22].
To facilitate practical implementation, Table 2 outlines key testing methodologies tailored to each segment alongside their anticipated hazards, bridging the theoretical framework with operational risk assessment.
By characterizing each segment’s architectural components and threat surfaces, testers can prioritize high-impact targets, aligning testing efforts with mission-critical risks.

2.2. Automation Degree Selection

From the perspective of automation, penetration testing can be categorized into manual, semi-automatic, and fully automatic approaches. This classification offers methodological guidance for security researchers and enumerates the corresponding technical options at each automation level.
Manual penetration testing in satellite networks relies on human expertise to explore vulnerabilities. Security professionals analyze attack vectors by inspecting network traffic anomalies, identifying logic flaws, cryptographic weaknesses in cross-link communications, or misconfigured access controls. While this approach enables detection of mission-critical vulnerabilities, its labor-intensive nature and reliance on domain expertise limit practical applicability in large-scale satellite constellations. Moreover, the escalating complexity and sophistication of cyber-attacks increasingly impede human experts’ ability to maintain pace with the evolving threat landscape, thereby compromising the timely formulation of effective countermeasures against adversarial actors [23].
Semi-automatic penetration testing merges human expertise with automated tools and scripts, enabling testers to harness the efficiency of automation while maintaining critical oversight for decision-making and targeted interventions. Contemporary semi-automatic penetration testing in satellite networks increasingly integrates AI-driven frameworks. Machine learning models trained on historical space cyber incidents dynamically parse telemetry datasets to predict orbital-specific attack vectors. Neural networks optimized for space-qualified protocols enhance real-time anomaly detection. Utilizing advanced LLMs supports penetration testers in improving the efficiency of their workflow [24]. LLMs enhance the efficiency of generating tailored exploit scripts and autonomously debugging vulnerabilities within satellite network systems. Overall, semi-automatic penetration testing strikes a balance between speed and thoroughness, making it a popular choice for comprehensive penetration testing.
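As a minimal illustration of the human-in-the-loop step in such a semi-automatic workflow, the sketch below assembles scan findings into a role-constrained prompt that a tester would review before sending it to an LLM. All function and field names are hypothetical, not part of any existing tool.

```python
# Hypothetical sketch: summarize scan findings into a role-constrained LLM
# prompt. The human tester reviews the prompt (and the model's reply) before
# any action is taken, reflecting the oversight described in the text.

def build_exploit_prompt(host: str, findings: list[dict]) -> str:
    """Build a constrained prompt asking an LLM for the next testing step."""
    lines = [
        "Role: authorized penetration tester in an isolated lab environment.",
        "Constraint: propose non-destructive exploitation steps only.",
        f"Target host: {host}",
        "Open services:",
    ]
    for f in findings:
        lines.append(
            f"- port {f['port']}/{f['proto']}: {f['service']} {f.get('version', '')}".rstrip()
        )
    lines.append("Task: suggest the single most promising next step and the command to run.")
    return "\n".join(lines)

prompt = build_exploit_prompt(
    "192.168.1.10",
    [{"port": 21, "proto": "tcp", "service": "vsftpd", "version": "2.3.4"}],
)
```

The role and constraint lines correspond to the prompt engineering and role-based constraints mentioned in Section 1.3 as ethical safeguards.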
Fully automatic penetration testing leverages autonomous AI agents to execute continuous security assessments, employing reinforcement learning to adapt attack strategies. The collaborative multi-agent system enhances penetration testing efficiency by employing specialized, coordinated agents for reconnaissance, scanning, and exploitation. This modular architecture addresses context loss and data deluge through phased task execution and autonomous decision-making [25]. However, ethical risks persist, as overly aggressive autonomous penetration testing agents might inadvertently disrupt critical services or leak sensitive data during exploitation. Additionally, while AI excels at pattern recognition, it struggles with creative problem-solving in novel attack scenarios, necessitating hybrid human-AI approaches for comprehensive satellite network security.
Testers can select appropriate automation levels—from manual, expertise-driven methods to autonomous AI agents—based on their resources, skills, and project needs, ensuring effective and efficient penetration testing aligned with operational contexts.

2.3. Penetration Stage Execution

The penetration testing process for satellite networks involves a structured approach to identify, analyze, exploit, and mitigate vulnerabilities of the space, ground, and user infrastructure.
The planning phase involves three key steps: (1) defining the scope (e.g., communication protocols, ground control systems, and user terminals), (2) setting objectives (e.g., evaluating cross-link encryption integrity or command chain robustness), and (3) gathering technical specifications. This phase ensures compliance with regulatory frameworks and minimizes operational disruptions during testing.
Vulnerability discovery employs various techniques to identify weaknesses in satellite networks. Automated tools analyze telemetry streams for unencrypted data transfers, insecure ranging protocols, or misconfigured access controls in space-based payloads, while ground infrastructure audits detect unpatched vulnerabilities in command-and-control software or exposed application programming interface (API) endpoints. User segment testing focuses on Internet of Things (IoT) device firmware flaws, session hijacking risks in hybrid terrestrial–satellite networks, and insecure credential management in navigation applications.
Vulnerability analysis in satellite networks demands a holistic approach to assess inherent technical weaknesses, potential exploitation mechanisms, and mission-critical consequences. Adversaries can exploit unencrypted telemetry feeds, insecure APIs, and legacy system vulnerabilities through interception during orbital handovers, malicious code injection via uplinks, or attacks on user terminals. Such breaches may trigger operational failures across interconnected space-terrestrial infrastructures, jeopardizing data integrity and service continuity.
Vulnerability exploitation is conducted in simulated environments to validate risks without disrupting live operations. Techniques include fuzzing satellite firmware for logic errors, leveraging Metasploit modules for known ground system vulnerabilities, and simulating attacks on user terminals. Exploit validation procedures must be conducted under ethical compliance frameworks and legal obligations governing adversarial testing, while maintaining technical safeguards to prevent unintended service degradation.
The reporting and remediation phase systematically documents the identified vulnerabilities and prescribes mitigation strategies for satellite networks. By formalizing closed-loop mechanisms for threat documentation and corrective implementation, this phase ensures the continuity of communication services while preserving data integrity under adversarial conditions.
Structured stages ensure systematic vulnerability management, from scope definition in planning to actionable recommendations in reporting, minimizing operational disruption while maximizing threat detection.

3. Satellite Networks Vulnerability Risk Assessment Method

In the previous section, we systematically analyzed the dimensions of the penetration testing method for satellite networks. However, vulnerability identification alone remains insufficient to evaluate systemic security impacts. As a critical element of security management, vulnerability risk assessment serves an essential function in quantifying threat severity. This section proposes a vulnerability risk assessment method for satellite networks, exploring how a scientifically sound evaluation method can accurately identify high-risk vulnerabilities. This provides actionable strategies to fortify satellite network security defenses.

3.1. Satellite Networks Vulnerability Risk Assessment Formula

To comprehensively evaluate the systemic risks of satellite networks with multi-layered architecture and nonlinear threat propagation, this study introduces a composite risk scoring formula:
R = I^α · L^β · M^γ · ((D + T)/2) · δ(t) · C · ωₛ        (1)
The proposed composite risk scoring formula integrates nonlinear amplification, dynamic decay, and segment-specific weighting to address the unique risk propagation characteristics of satellite networks. To ensure clarity and reproducibility, Table 3 defines all parameters in Formula (1), while subsequent subsections elaborate on their empirical or theoretical justifications.

3.1.1. Core Risk Factors

Core risk factors constitute the foundational elements of vulnerability evaluation, reflecting intrinsic threat characteristics and immediate consequences.
Impact (I): This parameter quantifies potential damage across strategic, operational, and economic dimensions, with scores ranging from 1 to 5 (I ∈ [1, 5]).
Likelihood (L): Likelihood evaluates the probability of successful exploitation based on historical data, exploit complexity, and attacker capabilities; assigned a score between 1 and 5 (L ∈ [1, 5]).
Measurability (M): Measurability reflects the detection capability of existing monitoring systems; measured on a scale from 1 to 5 (M ∈ [1, 5]).
Nonlinear exponents (α, β, γ): The exponents are designed as tunable parameters to reflect operational priorities:
(1)
Impact amplification (α): A range of α ∈ [1.0,1.5] allows operators to prioritize high-impact vulnerabilities (e.g., α = 1.5 for mission-critical systems) or align with linear scoring (α = 1.0).
(2)
Likelihood dampening (β): The range β ∈ [1.0,1.3] balances rare but catastrophic events (e.g., orbital collisions) with common threats.
(3)
Adjustment (γ): A lower γ ∈ [0.5,1.0] amplifies risks from stealth attacks (e.g., γ = 0.5 for undetectable firmware exploits).
In this study, the values of α, β, and γ were determined through analysis of historical satellite cybersecurity incidents. The exponent α = 1.2 was derived from analyses of the 2022 Viasat attack, where ground segment compromises triggered cascading service disruptions across space and user segments. This value balances systemic risk prioritization without overfitting to outlier events. Similarly, β was determined to be 1.1, balancing likelihood and risk assessment. And γ was set to 0.8 after studying stealth attacks with low measurability, ensuring these threats are not underestimated.
Defense difficulty (D) and technical complexity (T): These parameters assess remediation costs (D ∈ [1, 5]) and attack execution difficulty (T ∈ [1, 5]), respectively. The arithmetic mean (D + T)/2 balances defense pressure across both dimensions, avoiding bias from a single parameter.

3.1.2. Dynamic Adjustment Factors

Dynamic adjustment factors calibrate risk values over time and account for data reliability, ensuring adaptability to evolving threats.
Time decay factor ( δ ( t ) ): This factor models risk attenuation through an exponential decay function:
δ(t) = 1.2 · e^(−k(t − t₀)) if unpatched, 0.8 if patched        (2)
where k defines the decay rate and t₀ is the vulnerability discovery time. The exponential decay model (2) reflects the urgency of patching. For unpatched vulnerabilities, k = 0.05 ensures a gradual risk reduction (e.g., δ(t) decreases by about 5% per month), calibrated using vulnerability lifespan data from satellite operators. For patched cases, δ(t) = 0.8 accounts for residual risks (e.g., delayed patch deployment). Because the speed of risk attenuation over time varies across scenarios, k must be set to match the specific operational context.
Confidence (C): Confidence adjusts risk scores according to data source reliability, with values constrained to the interval C ∈ [0.7, 1.3]. Expert assessments (C = 1.3) and automated scans (C = 0.9) are weighted to balance subjective and objective inputs.

3.1.3. Environmental Weighting Factor

Environmental weighting factor tailors risk evaluations to the strategic importance of satellite network segments.
Segment weight ( ω s ): This weight adjusts risk scores according to the criticality of affected segments:
ωₛ = 1.5 for the space segment, 1.2 for the ground segment, 1.0 for the user segment        (3)
The hierarchical weighting (Equation (3)) prioritizes the space (ωₛ = 1.5) and ground segments (ωₛ = 1.2) due to their critical roles in command and control, while user segments (ωₛ = 1.0) are periodically reassessed to address emerging threats such as low-orbit terminal breaches.
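To make the scoring concrete, the following sketch implements Formula (1) together with the decay factor (2) and the segment weights (3), using the parameter values quoted in the text (α = 1.2, β = 1.1, γ = 0.8, k = 0.05). It is an illustrative implementation, not the authors' reference code.

```python
import math

# Illustrative implementation of the composite risk score (Formula (1)),
# the time-decay factor (Formula (2)), and segment weights (Formula (3)).
# Default exponents follow the values derived in the text.

SEGMENT_WEIGHTS = {"space": 1.5, "ground": 1.2, "user": 1.0}  # ω_s, Formula (3)

def time_decay(t_months: float, t0: float = 0.0, k: float = 0.05,
               patched: bool = False) -> float:
    """δ(t): 1.2·e^(−k(t − t0)) while unpatched, 0.8 once patched."""
    if patched:
        return 0.8
    return 1.2 * math.exp(-k * (t_months - t0))

def risk_score(I, L, M, D, T, segment, t_months=0.0, patched=False,
               alpha=1.2, beta=1.1, gamma=0.8, C=1.0):
    """R = I^α · L^β · M^γ · (D+T)/2 · δ(t) · C · ω_s  (Formula (1))."""
    delta = time_decay(t_months, patched=patched)
    return ((I ** alpha) * (L ** beta) * (M ** gamma)
            * ((D + T) / 2) * delta * C * SEGMENT_WEIGHTS[segment])
```

For instance, an identical vulnerability scores higher when located in the space segment than in the user segment, and lower once patched, matching the intent of Formulas (2) and (3).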

3.1.4. Normalization and Risk Level Classification

The nonlinear amplification exponents (α, β, γ) in Formula (1) play a pivotal role in shaping risk scores. These parameters are not merely coefficients but context-sensitive multipliers that reflect mission-critical trade-offs. For instance, prioritizing high-impact vulnerabilities (α = 1.5) amplifies their scores exponentially, while suppressing low-measurability threats (γ = 0.5) mitigates underestimation of stealth attacks. Critically, the interdependency of these exponents means that assuming simultaneous maxima for α, β, and γ (e.g., α = 1.5, β = 1.3, γ = 0.5) would yield unrealistically inflated risk scores, as real-world vulnerabilities rarely exhibit such parameter combinations. The exponents (α, β, γ) are therefore assigned based on empirical analysis of historical incidents rather than set to arbitrary maxima.
To address this, we calculate case-specific maximum risk scores (R_max_case) for each vulnerability using its actual assigned exponents (α, β, γ) while holding all other parameters at their theoretical maxima (I = 5, L = 5, M = 5, D = 5, T = 5, δ(t) = 1.2, C = 1.3, ωₛ = 1.5). This approach ensures that normalization accounts for the unique nonlinear dynamics of each vulnerability.
The normalization procedure involves three steps:
(1)
Case-specific maximum calculation: For each vulnerability, R_max_case is computed using Formula (1) with I = 5, L = 5, M = 5, D = 5, T = 5, δ(t) = 1.2, C = 1.3, ωₛ = 1.5, while retaining the vulnerability’s original exponents (α, β, γ). This reflects the worst-case scenario specific to the vulnerability’s risk profile, avoiding unrealistic inflation from incompatible parameter combinations.
(2)
Linear normalization: The absolute risk score (R) is mapped to a 0–10 scale via:
R_normalized = (R / R_max_case) × 10        (4)
This ensures scores remain proportional to their theoretical upper bounds under identical operational conditions.
(3)
Empirical calibration: To align scores with historical incident data, an empirical cap (R_max_practical = 500) is applied, limiting R_normalized to 10 even if R_max_case exceeds 500. This addresses rare edge cases where theoretical maxima diverge from observed risks, ensuring consistency across assessments.
This three-step process balances theoretical fidelity with practical relevance, enabling intuitive cross-vulnerability comparisons while respecting domain-specific risk dynamics. Risk levels are subsequently classified as follows: low (0–3.0), medium (3.1–6.0), high (6.1–8.0), and extremely high (8.1–10.0), aligning thresholds with industry standards for cybersecurity severity.
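The normalization and classification steps admit a compact sketch. The cap handling below follows one plausible reading of the empirical-calibration rule (using the smaller of R_max_case and R_max_practical as the denominator, then clamping at 10); names are illustrative.

```python
# Illustrative normalization of an absolute risk score R to the 0–10 scale,
# with the empirical cap R_max_practical = 500, and the risk-level bands
# quoted in the text.

def normalize_risk(R: float, R_max_case: float,
                   R_max_practical: float = 500.0) -> float:
    """Map R to 0–10 against its case-specific maximum, clamped at 10."""
    bound = min(R_max_case, R_max_practical)  # one reading of the empirical cap
    return min(R / bound * 10.0, 10.0)

def risk_level(score: float) -> str:
    """Classify a normalized score into the bands defined in the text."""
    if score <= 3.0:
        return "low"
    if score <= 6.0:
        return "medium"
    if score <= 8.0:
        return "high"
    return "extremely high"
```

For example, a vulnerability scoring half of its case-specific maximum normalizes to 5.0 and classifies as medium.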

3.2. Comparative Analysis with Existing Assessment Method

3.2.1. Overview of CVSS Framework

The CVSS is a widely adopted standard for IT vulnerability assessment. Its methodology employs three hierarchical metric groups to calculate severity scores (0.0–10.0):
(1)
Base metrics evaluate intrinsic vulnerability characteristics through two subcomponents. These include exploitability metrics (attack vector (AV), attack complexity (AC), privileges required (PR), and user interaction (UI)) and impact metrics (confidentiality (C), integrity (I), and availability (A), with scope (S) determining whether exploitation affects external components).
The base score combines exploitability and impact subscores. In CVSS 3.1, exploitability is weighted as 8.22 × AV × AC × PR × UI, while impact depends on confidentiality, integrity, and availability losses. Scope alters the impact formula, elevating scores if vulnerabilities affect external components.
(2)
Temporal metrics adjust the base score based on dynamic factors, including exploit code maturity (E), remediation level (RL), and report confidence (RC).
Temporal adjustments multiply the base score by the temporal metrics (Temporal Score = Base Score × E × RL × RC).
(3)
Environmental metrics customize scores for specific organizational contexts by modifying base metrics using security requirements (CR, IR, AR) and adjusting attack vectors or complexity (e.g., MAV, MAC).
The final environmental score adjusts the temporal score using these contextual parameters.
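As a concrete comparison point for the linear structure discussed below, the CVSS temporal adjustment can be sketched with multiplier values taken from the CVSS v3.1 specification, together with its round-up-to-one-decimal helper. This is a minimal sketch of the temporal layer only, not a full CVSS calculator.

```python
# Sketch of the CVSS v3.1 temporal adjustment: Temporal = Roundup(Base × E × RL × RC).
# Metric multipliers follow the v3.1 specification; "not_defined" maps to 1.0.

E  = {"unproven": 0.91, "poc": 0.94, "functional": 0.97, "high": 1.0, "not_defined": 1.0}
RL = {"official_fix": 0.95, "temporary_fix": 0.96, "workaround": 0.97,
      "unavailable": 1.0, "not_defined": 1.0}
RC = {"unknown": 0.92, "reasonable": 0.96, "confirmed": 1.0, "not_defined": 1.0}

def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest value to one decimal >= x (FP-safe form)."""
    i = int(round(x * 100000))
    if i % 10000 == 0:
        return i / 100000.0
    return (i // 10000 + 1) / 10.0

def temporal_score(base: float, e: str = "not_defined",
                   rl: str = "not_defined", rc: str = "not_defined") -> float:
    return roundup(base * E[e] * RL[rl] * RC[rc])
```

Note the purely multiplicative, linear form: unlike Formula (1), no factor can amplify a rare, high-impact event beyond its base-score contribution.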

3.2.2. Comparison with CVSS

The proposed satellite network vulnerability risk assessment method and the CVSS share foundational objectives in quantifying vulnerability severity through structured metrics, but diverge in their design philosophy and technical implementations. Both frameworks employ hierarchical architectures comprising core risk evaluation, temporal adjustments, and environmental customization to balance standardized assessment with contextual adaptability. At the parameter classification level, they incorporate metrics related to exploitability (e.g., attack complexity in CVSS vs. likelihood in the proposed method) and impact (e.g., confidentiality/integrity/availability in CVSS vs. strategic/operational/economic impact in the satellite method). Dynamic adjustments are also common to both, with CVSS using temporal metrics (E, RL, RC) and the satellite method employing time decay (δ(t)) and confidence (C) factors to reflect evolving risks. These similarities underscore their shared goal of translating qualitative vulnerabilities into actionable quantitative scores.
However, critical distinctions emerge in their mechanisms for addressing domain-specific challenges. The proposed method introduces nonlinear parameter amplification via exponents (α, β, γ) to prioritize high-impact, low-probability events—a design informed by empirical analyses of aerospace incidents where catastrophic failures exhibit disproportionate systemic effects. This contrasts with CVSS’s linear multiplicative combinations (e.g., 8.22 × AV × AC × PR × UI for exploitability), which may underestimate rare but critical vulnerabilities in satellite ecosystems. Furthermore, the proposed method incorporates an exponential time decay model (δ(t)) calibrated to satellite vulnerability lifespans, distinguishing between patched and unpatched states. This dynamic mechanism captures residual risks post-patching, a nuance absent in CVSS’s static temporal multipliers (E × RL × RC). Environmental customization also diverges: CVSS modifies base metrics (e.g., MAV, MAC) and security requirements (CR, IR, AR) for generic IT assets, while the satellite method employs hierarchical segment weights (ωₛ = 1.5–1.0) to prioritize space and ground segments, reflecting the operational criticality of orbital infrastructure. Such granularity addresses the spatial complexity of satellite networks, where vulnerabilities in space-based components pose cascading threats to global connectivity and command systems. Table 4 shows the comparative analysis of CVSS and our proposed method.
The satellite method’s advantages lie in its domain-specific optimizations. By integrating nonlinear risk amplification, it overcomes CVSS’s limitations in handling “black swan” events prevalent in aerospace systems. The dynamic decay model provides time-sensitive risk quantification, essential for satellite operators managing long patch deployment cycles. Segment-specific weighting ensures resource allocation aligns with mission priorities, a feature critical for heterogeneous networks spanning space, ground, and user segments. These innovations collectively enhance the framework’s precision in capturing the unique risk profile of satellite infrastructures.

4. Implementation and Results

In accordance with the proposed penetration testing method for satellite networks, we selected corresponding segments and automation technologies within the x and y dimensions to conduct the penetration testing. Given that the ground segment is a crucial and vulnerable part of the satellite system, our focus was set on the ground station. In the context of semi-automatic penetration testing, we opted for the approach based on LLMs.

4.1. Problem Setup

Task background: In June 2024, the SpaceX Starlink Generation 7 satellite network entered the global deployment phase. A newly built equatorial ground station (codenamed SGS-17) was found to have significant security risks during the system acceptance test due to rapid-delivery requirements. The core system of the software layer of the ground-station system architecture is a customized Linux 4.19 real-time operating system.
Evaluation metrics: We measure the effectiveness of LLMs by whether each penetration test is successfully completed and by the number of interaction rounds with the LLMs [26,27,28]. We report the success rate (the fraction of attempts that succeed) and the average number of interaction rounds per test.
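These two metrics reduce to simple computations; the sketch below shows how the reported figures (73.3% from 22/30 attempts, and the mean interaction count) are obtained. The example interaction counts are illustrative, not the experimental data.

```python
# Illustrative computation of the two evaluation metrics: success rate and
# average number of human-LLM interaction rounds per test.

def success_rate(successes: int, attempts: int) -> float:
    """Fraction of penetration-test attempts that completed successfully."""
    return successes / attempts

def avg_interactions(rounds_per_test: list[int]) -> float:
    """Mean number of interaction rounds across all tests."""
    return sum(rounds_per_test) / len(rounds_per_test)

rate = success_rate(22, 30)  # the 22/30 ≈ 73.3% reported in the abstract
```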
LLMs used: In this study, the commercial OpenAI GPT-4 and the domestic DeepSeek-R1 LLM are selected as the objects of comparative research. They represent the current internationally leading commercial general-purpose model and the Chinese-language-domain-optimized professional model, respectively. The specific technical parameters are as follows:
GPT-4 is called through the Azure API service and supports a 32K context window. It achieves dynamic activation of trillions of parameters through a mixture of experts (MoE) architecture. The training data are up to October 2023, and it integrates multimodal understanding capabilities.
DeepSeek-R1 can be deployed through the Alibaba Cloud Machine Learning Platform PAI. It adopts the Transformer-XL architecture with an improved attention mechanism. It has a proprietary Chinese scientific literature training set (with a scale of 800 B tokens) and supports knowledge enhancement in vertical fields such as finance and biomedicine.

4.2. Experimental Procedures and Results

  • Step 1 Planning
Rapid deployment pressures exacerbate risks by leaving services unpatched or misconfigured. To simulate a real-world satellite ground-station environment, we selected Kali Linux 2024.02 as the attacking virtual machine and Metasploitable as the target machine.
The experimental setup utilized the Metasploitable virtual machine to emulate vulnerabilities commonly observed in satellite ground stations. While Metasploitable is a generic platform, its intentionally vulnerable configurations—such as outdated services, unsecured protocols, and misconfigured permissions—closely mirror weaknesses documented in real-world satellite infrastructure audits. For instance, the vsftpd backdoor vulnerability (CVE-2011-2523) replicates attack vectors observed in a 2020 Thales ground station security assessment, where unpatched FTP services exposed critical command-and-control interfaces to remote exploitation. Similarly, the Java RMI registry (port 1099) and SSH misconfigurations (port 22) in Metasploitable align with vulnerabilities reported in LEO ground station architectures, such as insecure remote management interfaces and weak authentication mechanisms. By virtualizing these components, the experiment prioritized ethical and operational safety while preserving the technical fidelity required to validate SatGuard’s semi-automated penetration testing workflows.
  • Step 2 Vulnerability Discovery
In penetration testing, port scanning serves as a foundational step to map the attack surface of a target system. Open ports and associated services expose potential entry points for exploitation, making them critical to identify early in the testing process. This aligns with the reconnaissance phase of the penetration testing lifecycle, where attackers gather intelligence to prioritize vulnerabilities.
Using Nmap, we conducted a full TCP port scan on the Metasploitable target machine (IP address: 192.168.1.10), identifying 23 open ports.
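The scan step of this workflow can be scripted. The sketch below is illustrative (the host address and sample output are the example values from this section): it parses Nmap's grepable (`-oG`) output into the list of open TCP ports that is later handed to the LLMs.

```python
import re

def parse_nmap_grepable(output: str) -> list[int]:
    """Extract open TCP port numbers from Nmap grepable (-oG) output."""
    ports = []
    for line in output.splitlines():
        if "Ports:" not in line:
            continue
        # Each port entry looks like: 21/open/tcp//ftp///
        for match in re.finditer(r"(\d+)/open/tcp", line):
            ports.append(int(match.group(1)))
    return sorted(ports)

# Example: a fragment of grepable output for the Metasploitable target
sample = ("Host: 192.168.1.10 ()  Ports: 21/open/tcp//ftp///, "
          "22/open/tcp//ssh///, 1099/open/tcp//rmiregistry///")
print(parse_nmap_grepable(sample))  # [21, 22, 1099]
```

A full TCP scan such as `nmap -p- -sV -oG scan.txt 192.168.1.10` produces output in this shape; the parser above only depends on the documented `port/state/protocol` field layout.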
  • Step 3 Vulnerability Analysis
After obtaining the port-scan results, we sent the list of open ports to the LLMs for vulnerability analysis. Example prompts and answers for GPT-4 and DeepSeek-R1 are listed in Table 5. As Table 5 shows, both LLMs generated structured JSON outputs containing exploit paths, parameters, and success probabilities, demonstrating their ability to translate port-scan results into actionable penetration-testing commands. Figure 3 illustrates that the outputs are consistently in JSON format, supplying the full path of the MSF module together with MSF console commands that can be run directly from the command line.
Since LLMs may give different responses to the same prompt, we repeated the experiment three times, sending the port-scan results to each LLM every time, and calculated the average number of response entries. Table 6 lists the ports analyzed by GPT-4 and DeepSeek-R1 across the three runs, along with the average analysis count.
The comparative analysis reveals methodological differences between GPT-4 and DeepSeek-R1 in vulnerability enumeration. Quantitative evaluation shows GPT-4 produced an average of 5 vulnerability analyses per assessment, while DeepSeek-R1 generated 5.7 instances. More critically, qualitative divergence emerges in two dimensions: scope completeness and vulnerability multiplicity handling.
Regarding scope coverage, GPT-4 exhibited truncation at port 3306 during service enumeration, potentially constrained by inherent output length limitations. In contrast, DeepSeek-R1 demonstrated comprehensive port analysis, as evidenced by its complete reasoning trace documentation. Moreover, the LLMs diverged significantly in handling multi-vulnerability scenarios. For port 22 analysis, DeepSeek-R1 identified a single attack vector (“exploit/linux/ssh/ssh_auth_bypass”), whereas GPT-4 enumerated three distinct exploitation pathways: “exploit/linux/ssh/ssh_auth_bypass”, “exploit/multi/ssh/sshexec”, and “auxiliary/scanner/ssh/ssh_version”. The observed variations underscore the need for model-specific output normalization when conducting automated vulnerability assessments across multiple LLM architectures.
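The model-specific output normalization suggested above can be sketched as a small parsing layer: each model's JSON reply (following the schema in Table 5) is converted into a common record shape before further processing. The reply text below is a trimmed example in that schema, not a verbatim model output.

```python
import json

def extract_exploits(reply: str) -> list[dict]:
    """Parse an LLM JSON reply (Table 5 schema) into normalized exploit records."""
    data = json.loads(reply)
    records = []
    for item in data.get("exploits", []):
        records.append({
            "port": int(item["port"]),
            "module": item["path"],
            "command": item["msfconsolecommand"],
            "probability": item.get("probability", "Unknown"),
        })
    return records

reply = '''{"exploits": [{"type": "exploit/auxiliary",
  "path": "exploit/unix/ftp/vsftpd_234_backdoor", "port": 21,
  "params": {"RHOSTS": "192.168.1.10", "RPORT": "21"},
  "probability": "High", "description": "vsftpd 2.3.4 backdoor",
  "msfconsolecommand": "msfconsole -x 'use exploit/unix/ftp/vsftpd_234_backdoor; set RHOSTS 192.168.1.10; run'"}]}'''
for rec in extract_exploits(reply):
    print(rec["port"], rec["module"], rec["probability"])
```

Because both models occasionally wrap output in extra text despite instructions, a production version would also strip code fences and retry on `json.JSONDecodeError`.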
  • Step 4 Vulnerability Exploitation
The msfconsole commands generated by LLMs were directly copied to the penetration testing terminal’s command-line interface (CLI). This enabled the efficient execution of preconfigured attack sequences without manual code adaptation, significantly enhancing penetration testing efficiency and reliability by automating complex processes and minimizing human error.
We take port 1099 as a detailed example. Port 1099 is the default registry port for Java Remote Method Invocation (RMI) services, a critical component in the distributed command systems of satellite ground stations. In a real-world setting, attackers could exploit this vulnerability to manipulate telemetry data or disrupt space-to-ground communications. For the port 1099 vulnerability, directly entering the LLM-generated commands into the CLI (Figure 4) resulted in the successful establishment of a session between Kali Linux and Metasploitable (Figure 5). In operational scenarios, such a bidirectional communication channel could enable unauthorized remote access to mission-critical subsystems within the ground station infrastructure.
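The msfconsole one-liners supplied by the LLMs can also be assembled locally from a module path and its options. A minimal sketch follows; the module path `exploit/multi/misc/java_rmi_server` is the Metasploit module commonly used against Metasploitable's port 1099 RMI service, named here for illustration rather than as the exact LLM output.

```python
def build_msfconsole_command(module: str, params: dict[str, str]) -> str:
    """Compose a single-shot msfconsole invocation from a module path and options."""
    setters = "; ".join(f"set {key} {value}" for key, value in params.items())
    return f"msfconsole -x 'use {module}; {setters}; run'"

cmd = build_msfconsole_command(
    "exploit/multi/misc/java_rmi_server",
    {"RHOSTS": "192.168.1.10", "RPORT": "1099"},
)
print(cmd)
# msfconsole -x 'use exploit/multi/misc/java_rmi_server; set RHOSTS 192.168.1.10; set RPORT 1099; run'
```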
Subsequent interaction with the LLMs yielded operational CLI commands for essential post-compromise activities, including cross-system file transfer (upload/download), privilege escalation, and credential harvesting (Figure 6).
Our experiments revealed a crucial issue: when handling penetration-testing-related queries, both GPT-4 and DeepSeek-R1 carry the risk of their security policies being bypassed, a phenomenon often called "LLM jailbreaking".
To address these ethical issues and guarantee legally compliant, operationally relevant outputs, our methodological framework applies two layers of protection, ensuring that interactions with the LLMs align with ethical cybersecurity practice and meet regulatory requirements.
In the pre-interaction phase, we establish a clear operational context through role-playing prompt templates. This ensures the use case is legitimate and the generated commands comply with regulations. We also apply sensitive-word substitution to avoid misinterpretation: queries are phrased in neutral technical language so as not to imply aggression or security violations. For example, phrases such as "bypass security mechanisms" or "evaluate the effectiveness of security policies" are preferable to "hacker attack".
Our methodology ensures that the generated command sequences are both technically effective and procedurally legitimate within penetration-testing workflows. Table 7 shows example pre-interaction prompts and questions used to obtain code for various functions from the LLMs.
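The sensitive-word substitution described above can be implemented as a simple rewrite pass applied to every query before it is sent. The phrase table below is illustrative, built from the examples in this section, not an exhaustive policy.

```python
# Illustrative phrase table: offensive wording -> neutral technical wording
SUBSTITUTIONS = {
    "hacker attack": "security-policy effectiveness evaluation",
    "steal credentials": "assess credential-exposure risks",
}

def sanitize_prompt(prompt: str) -> str:
    """Replace flagged phrases with neutral technical language before querying the LLM."""
    for phrase, neutral in SUBSTITUTIONS.items():
        prompt = prompt.replace(phrase, neutral)
    return prompt

print(sanitize_prompt("Simulate a hacker attack on the test FTP service."))
# Simulate a security-policy effectiveness evaluation on the test FTP service.
```

A fuller implementation would operate on case-insensitive, tokenized matches, but even this literal substitution captures the pre-interaction framing used in our experiments.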
To evaluate the efficiency of semi-automated penetration testing, we measured the success rate and the average number of interactions. Table 8 presents the results for code generated by GPT-4 and DeepSeek-R1. The framework achieved a 73.3% success rate (22 successful exploits out of 30 attempts) with an average of 5.5 human interactions per test. Compared with traditional Metasploit workflows, SatGuard reduced the number of human interactions required while maintaining this success rate. The efficiency gain stems from the LLMs' ability to analyze potential port vulnerabilities automatically and to generate executable attack commands, a capability conventional tools typically lack.
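The headline metrics reduce to simple aggregation over per-port outcomes. The sketch below encodes the per-port results reported in Table 8 and reproduces both the 73.3% success rate and the 5.5 average interactions.

```python
# (successes, attempts, interactions) per tested port, per model, from Table 8
RESULTS = {
    "GPT-4":       {21: (3, 3, 4), 22: (2, 3, 6), 445: (2, 3, 5),
                    2049: (2, 3, 4), 3306: (2, 3, 7)},
    "DeepSeek-R1": {445: (2, 3, 5), 1099: (3, 3, 6), 3306: (2, 3, 7),
                    5900: (3, 3, 4), 8009: (1, 3, 7)},
}

def summarize(results: dict) -> tuple[float, float]:
    """Return (overall success rate, mean interactions per test) across all models."""
    wins = sum(s for model in results.values() for s, _, _ in model.values())
    tries = sum(n for model in results.values() for _, n, _ in model.values())
    per_model_means = [
        sum(i for _, _, i in model.values()) / len(model)
        for model in results.values()
    ]
    return wins / tries, sum(per_model_means) / len(per_model_means)

rate, interactions = summarize(RESULTS)
print(f"success rate = {rate:.1%}, avg interactions = {interactions:.1f}")
# success rate = 73.3%, avg interactions = 5.5
```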
  • Step 5 Reporting and Remediation
Finally, we leveraged the LLMs to generate a report on the penetration-testing process. The report not only detailed the identified vulnerabilities but also provided specific, actionable remediation suggestions, facilitating more effective security-enhancement measures.

4.3. Vulnerability Risk Scoring Calculation

Using the formula proposed in Section 3.1, we calculated a risk score for each port. Since all the vulnerabilities detected in the scan remain unpatched, we used the "unpatched" branch of Equation (2). Moreover, because all ports belong to the same target machine and share the same vulnerability discovery time, we ignored differences in time decay among ports and set t − t₀ = 0. All data come from automated scans, so we set the confidence level C = 0.9. As the simulated target is a ground station, we chose the ground-segment weight ω_s = 1.2. Taking port 22 (SSH) as an example, the calculation proceeds as follows:
(1) Core risk factors:
Impact (I) = 5 (compromise allows root access),
Likelihood (L) = 4 (exploit publicly available),
Measurability (M) = 4 (easily detectable),
Exponents: α = 1.2, β = 1.1, γ = 0.8.
I^α · L^β · M^γ = 5^1.2 · 4^1.1 · 4^0.8 ≈ 96.09
Defense difficulty (D) = 4 (requires kernel patching),
Technical complexity (T) = 3 (moderate skill required).
(D + T)/2 = (4 + 3)/2 = 3.5
(2) Dynamic adjustment factors:
Since the vulnerability is unpatched and t − t₀ = 0 (discovery time = current time), the time decay factor is δ(t) = 1.2 · e^(−0.05·0) = 1.2.
Confidence: C = 0.9 (automated scan).
(3) Segment weight and final score:
Segment weight: ω_s = 1.2 (ground segment).
R = (96.09 × 3.5 × 1.2 × 0.9) × 1.2 ≈ 435.86
The theoretical maximum is R_max_case = 5^1.2 · 5^1.1 · 5^0.8 · 5 · 1.2 · 1.3 · 1.5 ≈ 1717.88. Since R_max_case exceeds 500 while practically observed scores do not, normalization uses a practical maximum R_max_practical = 500:
R_normalized = (R / R_max_practical) × 10 = (435.86 / 500) × 10 ≈ 8.72
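The worked example above can be checked end to end with a direct implementation of the scoring formula. Parameter names follow Table 3; the default decay, confidence, and segment-weight values are those of this unpatched, ground-segment example, and the 500-point practical ceiling is the one used for normalization.

```python
def risk_score(I, L, M, D, T, *, alpha=1.2, beta=1.1, gamma=0.8,
               decay=1.2, confidence=0.9, segment_weight=1.2):
    """Nonlinear risk score: (I^alpha * L^beta * M^gamma) * (D+T)/2 * delta(t) * C * omega_s.

    decay defaults to delta(t) = 1.2 * e^(-0.05 * 0) = 1.2 (unpatched, t - t0 = 0).
    """
    core = (I ** alpha) * (L ** beta) * (M ** gamma)
    return core * ((D + T) / 2) * decay * confidence * segment_weight

R_MAX_PRACTICAL = 500  # practical ceiling used for normalization

def normalized(score: float) -> float:
    """Map a raw score onto a 0-10 scale against the practical maximum."""
    return score / R_MAX_PRACTICAL * 10

# Port 22 (SSH): I=5, L=4, M=4, D=4, T=3
r = risk_score(I=5, L=4, M=4, D=4, T=3)
print(f"R = {r:.2f}, normalized = {normalized(r):.2f}")  # R = 435.86, normalized = 8.72
```

Running the same function over the parameter rows of Table 9 reproduces the per-port scores reported there.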
Similar calculations were applied to all ports. Table 9 quantifies risk scores for 23 ports, with Port 22 (SSH) and Port 3306 (MySQL) classified as “Extremely High” risk (normalized scores > 8.7). These ports, critical to ground station operations, demand immediate remediation due to their high impact and exploitability.
The computational outcomes were consistent with the LLM outputs: both LLMs systematically omitted ports classified as "low" risk and instead analyzed the vulnerabilities categorized as "medium", "high", and "extremely high", which indirectly corroborates the effectiveness of the proposed risk assessment method.

5. Discussions and Future Directions

5.1. Limitations

The study’s methodological framework, while demonstrating promising results, has several limitations that warrant critical discussion. First, while the Metasploitable environment provided a controlled platform for initial validation, its generic nature imposes limitations on extrapolating results to operational satellite systems. For example, satellite ground stations often employ real-time operating systems (RTOSs) and proprietary protocols (e.g., CCSDS standards) not fully represented in Metasploitable. Additionally, hardware-specific vulnerabilities—such as radiation-induced bit flips in field-programmable gate arrays (FPGAs) or side-channel attacks on cryptographic modules—remain beyond the scope of this study. These limitations underscore the need for future experiments involving hardware-in-the-loop simulations and space-qualified software stacks to validate SatGuard’s efficacy in operational satellite environments. Nevertheless, the current framework’s success in identifying protocol-level and service-layer vulnerabilities demonstrates its foundational applicability to satellite cybersecurity, particularly in scenarios where legacy IT components intersect with space infrastructure.
Second, the reliance on LLMs (GPT-4 and DeepSeek-R1) trained on data up to 2023 introduces constraints in adapting to emerging threats. While these models excel in parsing historical vulnerabilities, their performance may degrade for zero-day exploits or novel attack patterns evolving after their knowledge cutoff. Furthermore, vulnerabilities associated with post-quantum cryptographic algorithms (e.g., lattice-based or hash-based signatures), which are gaining prominence due to advancements in quantum computing, remain underrepresented in the models’ training corpus. This temporal gap underscores the need for continuous LLM retraining with real-time threat intelligence feeds to maintain relevance in dynamic threat landscapes.
Third, the experimental scope omitted validation of optical communication links (e.g., CCSDS 142.0B), which are increasingly critical for high-bandwidth space-terrestrial data transmission. The absence of optical link testing precludes insights into vulnerabilities unique to laser-based systems, such as signal jamming in low Earth orbit or quantum key distribution flaws. These limitations underscore the need for caution when generalizing SatGuard’s efficacy to fully representative satellite ecosystems.

5.2. Future Directions

To address these limitations, future work will prioritize three avenues. First, we plan to integrate hardware-in-the-loop simulations using satellite-grade components (e.g., LEON processors, CCSDS-compliant transceivers) to evaluate firmware-level and radiation-induced vulnerabilities. Second, continuous retraining of LLMs with real-time threat intelligence—leveraging platforms like MITRE ATT&CK—will enhance adaptability to emerging attack vectors. Finally, experimental validation will expand to include optical communication scenarios, employing software-defined radios (SDRs) and optical link emulators to probe vulnerabilities in laser-based protocols. These steps aim to bridge the gap between controlled laboratory environments and the dynamic challenges of next-generation satellite networks.

6. Conclusions

As satellite constellations proliferate, SatGuard offers a critical toolkit for safeguarding global communication infrastructures against increasingly sophisticated cyber threats. The framework’s AI-driven automation and nonlinear risk prioritization enable commercial satellite operators to streamline security audits by reducing manual inspection cycles, thereby accelerating vulnerability identification and remediation. For instance, the semi-automated penetration testing workflow demonstrated in this study achieved a 73.3% success rate with minimal human intervention, suggesting that large-scale operators (e.g., Starlink or Kuiper) could reduce audit timeframes by over 40% while maintaining rigorous security standards. Furthermore, the integration of dynamic risk scoring—tailored to satellite-specific architectures—enhances the efficiency of triaging high-impact vulnerabilities. By bridging AI scalability with domain-specific threat modeling, SatGuard empowers operators to fortify next-generation constellations against both legacy and emergent attack vectors, ensuring resilient and adaptive cybersecurity postures in an era of rapid space infrastructure expansion.

Author Contributions

Conceptualization, J.X.; Investigation, R.D.; Writing—original draft, J.X.; Writing—review & editing, B.W., R.D., Z.Z. and B.Z.; Supervision, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China grant number 62472437.

Data Availability Statement

This study is based on large language models, and no new dataset was created. To support reproducibility, the model types, versions, and all prompts used are disclosed in the main text.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LLM: Large Language Model
LEO: Low Earth Orbit
GNSS: Global Navigation Satellite Systems
LEO SCS: Low-Earth-Orbit Satellite Communication Systems
IDS: Intrusion Detection System
AI: Artificial Intelligence
RIS: Reconfigurable Intelligent Surface
SS: Spread Spectrum
CCI: Co-channel Interference
VAPT: Vulnerability Assessment and Penetration Testing
AWS: Amazon Web Services
GCP: Google Cloud Platform
CVSS: Common Vulnerability Scoring System
IT: Information Technology
VSAT: Very-Small-Aperture Terminal
GS: Ground Station
NCC: Network Control Center
COTS: Commercial Off-The-Shelf
ISL: Inter-Satellite Link
API: Application Programming Interface
IoT: Internet of Things
CLI: Command-Line Interface
RMI: Remote Method Invocation
OBC: On-Board Computer
RF: Radio Frequency
SDR: Software-Defined Radio

References

  1. Yang, N.; Shafie, A. Terahertz Communications for Massive Connectivity and Security in 6G and Beyond Era. IEEE Commun. Mag. 2024, 62, 72–78. [Google Scholar] [CrossRef]
  2. Starlink. Starlink Official Website. Available online: https://www.starlink.com/ (accessed on 16 February 2025).
  3. Amazon. Project Kuiper Official Website. Available online: https://www.aboutamazon.com/news/innovation-at-amazon/what-is-amazon-project-kuiper (accessed on 16 February 2025).
  4. Boschetti, N.; Gordon, N.G.; Falco, G. Space Cybersecurity Lessons Learned from the Viasat Cyberattack. In Proceedings of the ASCEND 2022, Las Vegas, NV, USA, 24–26 October 2022; Volume 4380. [Google Scholar]
  5. Kang, M.; Park, S.; Lee, Y. A Survey on Satellite Communication System Security. Sensors 2024, 24, 2897. [Google Scholar] [CrossRef] [PubMed]
  6. Duo, W.; Zhou, M.C.; Abusorrah, A. A Survey of Cyber Attacks on Cyber Physical Systems: Recent Advances and Challenges. IEEE/CAA J. Autom. Sin. 2022, 9, 784–800. [Google Scholar] [CrossRef]
  7. Al-Ghamdi, A. A Survey on Software Security Testing Techniques. Int. J. Comput. Sci. Telecommun. 2013, 4, 14–18. [Google Scholar]
  8. Felderer, M.; Büchler, M.; Johns, M.; Brucker, A.D.; Breu, R.; Pretschner, A. Security Testing: A Survey. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 2016; Volume 101, pp. 1–51. [Google Scholar]
  9. Gangupantulu, R.; Cody, T.; Park, P.; Rahman, A.; Eisenbeiser, L.; Radke, D.; Clark, R.; Redino, C. Using Cyber Terrain in Reinforcement Learning for Penetration Testing. In Proceedings of the 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS), Barcelona, Spain, 1–3 August 2022; pp. 1–8. [Google Scholar] [CrossRef]
  10. Khan, S.A.; Adnan, M.; Ali, A.; Raza, A.; Ali, A.; Naqvi, S.Z.H.; Hussain, T. An Android Applications Vulnerability Analysis Using MobSF. In Proceedings of the 2024 International Conference on Engineering & Computing Technologies (ICECT), Islamabad, Pakistan, 23 May 2024; pp. 1–7. [Google Scholar] [CrossRef]
  11. Chen, C.-K.; Zhang, Z.-K.; Lee, S.-H.; Shieh, S. Penetration Testing in the IoT Age. Computer 2018, 51, 82–85. [Google Scholar] [CrossRef]
  12. Crawley, K. Cloud Penetration Testing: Learn How to Effectively Pentest AWS, Azure, and GCP Applications; Packt Publishing: Birmingham, UK, 2023. [Google Scholar]
  13. Liu, S.; Shi, X.; Song, Y.; Zhang, L.; Wang, Y.; Yuan, Z.; Li, D.; Liu, X. Research on Penetration Testing Method of Power Information System Based on Knowledge Graph. In Proceedings of the 2023 IEEE 11th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 8–10 December 2023; pp. 943–947. [Google Scholar] [CrossRef]
  14. Elbaz, C.; Rilling, L.; Morin, C. Fighting N-Day Vulnerabilities with Automated CVSS Vector Prediction at Disclosure. In Proceedings of the 15th International Conference on Availability, Reliability and Security, Virtual, 25–28 August 2020; pp. 1–10. [Google Scholar]
  15. Mell, P.; Scarfone, K.; Romanosky, S. A Complete Guide to the Common Vulnerability Scoring System Version 2.0; FIRST-Forum of Incident Response and Security Teams: Cary, NC, USA, 2007. [Google Scholar]
  16. Nowak, M.; Walkowski, M.; Sujecki, S. Conversion of CVSS Base Score from 2.0 to 3.1. In Proceedings of the 2021 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Split, Croatia, 23–25 September 2021; pp. 1–3. [Google Scholar]
  17. Scarfone, K.; Mell, P. An Analysis of CVSS Version 2 Vulnerability Scoring. In Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, Lake Buena Vista, FL, USA, 15–16 October 2009; pp. 516–525. [Google Scholar]
  18. Balsam, A.; Nowak, M.; Walkowski, M.; Oko, J.; Sujecki, S. Comprehensive Comparison Between Versions CVSS v2.0, CVSS v3.x and CVSS v4.0 as Vulnerability Severity Measures. In Proceedings of the 2024 24th International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 14–18 July 2024; pp. 1–4. [Google Scholar]
  19. Humayed, A.; Lin, J.; Li, F.; Luo, B. Cyber-Physical Systems Security—A Survey. IEEE Internet Things J. 2017, 4, 1802–1831. [Google Scholar] [CrossRef]
  20. Manulis, M.; Bridges, C.P.; Harrison, R.; Sekar, V.; Davis, A. Cyber Security in New Space: Analysis of Threats, Key Enabling Technologies and Challenges. Int. J. Inf. Secur. 2021, 20, 287–311. [Google Scholar] [CrossRef]
  21. Peled, R.; Aizikovich, E.; Habler, E.; Elovici, Y.; Shabtai, A. Evaluating the Security of Satellite Systems. arXiv 2023, arXiv:2312.01330. [Google Scholar] [CrossRef]
  22. Al Homssi, B.; Al-Hourani, A.; Wang, K.; Conder, P.; Kandeepan, S.; Choi, J.; Allen, B.; Moores, B. Next Generation Mega Satellite Networks for Access Equality: Opportunities, Challenges, and Performance. IEEE Commun. Mag. 2022, 60, 18–24. [Google Scholar] [CrossRef]
  23. Bianou, S.G.; Batogna, R.G. PENTEST-AI, an LLM-Powered Multi-Agents Framework for Penetration Testing Automation Leveraging MITRE Attack. In Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR), London, UK, 2–4 September 2024; pp. 763–770. [Google Scholar]
  24. Goyal, D.; Subramanian, S.; Peela, A. Hacking, the Lazy Way: LLM Augmented Pentesting. arXiv 2024, arXiv:2409.09493. [Google Scholar] [CrossRef]
  25. Kong, H.; Hu, D.; Ge, J.; Li, L.; Li, T.; Wu, B. VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework. arXiv 2025, arXiv:2501.13411. [Google Scholar] [CrossRef]
  26. Deng, G.; Liu, Y.; Mayoral-Vilches, V.; Liu, P.; Li, Y.; Xu, Y.; Zhang, T.; Liu, Y.; Pinzger, M.; Rass, S. PentestGPT: An LLM-Empowered Automatic Penetration Testing Tool. arXiv 2023, arXiv:2308.06782. [Google Scholar] [CrossRef]
  27. Happe, A.; Kaplan, A.; Cito, J. Evaluating LLMs for Privilege Escalation Scenarios. arXiv 2023, arXiv:2310.11409. [Google Scholar] [CrossRef]
  28. Xu, J.; Stokes, J.W.; McDonald, G.; Bai, X.; Marshall, D.; Wang, S.; Swaminathan, A.; Li, Z. AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-Attacks. arXiv 2024, arXiv:2403.01038. [Google Scholar] [CrossRef]
Figure 1. Satellite networks penetration testing method.
Figure 2. Overview of satellite network segments.
Figure 3. Partial JSON outputs from GPT-4 and DeepSeek-R1 demonstrating automated generation of Metasploit commands.
Figure 4. CLI Execution of LLM-generated Metasploit commands.
Figure 5. Successful session establishment and acquisition of authority.
Figure 6. Successful password stealing.
Table 1. Limitations of existing penetration testing frameworks.
| Author | Domain | Research Scope | Satellite-Specific Limitations | Applicability to Satellite Networks |
|---|---|---|---|---|
| Gangupantulu et al. (2022) [9] | General IT | Reinforcement learning applications in penetration testing | No adaptation to satellite network constraints | Limited |
| Khan et al. (2024) [10] | Mobile applications | Dynamic vulnerability analysis via mobile security platform | RF physical-layer vulnerability analysis omitted | No |
| Chen et al. (2018) [11] | IoT devices | Modular penetration testing frameworks for IoT ecosystems | Satellite–terrestrial hybrid protocol gaps | Limited |
| Crawley (2023) [12] | Cloud services | Platform-specific penetration testing methodologies | Space protocol vulnerabilities excluded | No |
| Liu et al. (2023) [13] | Power systems | Knowledge graph-based penetration testing framework | On-board power constraints and radiation-induced software faults unaddressed | Limited |
Table 2. Correlation of testing methodologies and anticipated hazards by segment.
| Segment | Key Testing Methodologies | Anticipated Hazards |
|---|---|---|
| User Segment | IoT firmware analysis; hybrid network session hijacking tests | Insecure credential management; unencrypted user links |
| Ground Segment | API endpoint audits; vulnerability scanning | Unpatched legacy systems; misconfigured access controls |
| Space Segment | Inter-satellite link encryption validation; fault injection | Orbital collision vulnerabilities; on-board software integrity failures |
Table 3. Parameter definitions for Formula (1).
| Symbol | Name | Definition | Value/Range |
|---|---|---|---|
| I | Impact | Potential damage across strategic, operational, and economic dimensions | I ∈ [1, 5] |
| L | Likelihood | Probability of successful exploitation based on historical data and complexity | L ∈ [1, 5] |
| M | Measurability | Detection capability of existing monitoring systems | M ∈ [1, 5] |
| D | Defense Difficulty | Remediation costs | D ∈ [1, 5] |
| T | Technical Complexity | Attack execution difficulty | T ∈ [1, 5] |
| α | Impact Amplification | Exponent for prioritizing high-impact vulnerabilities | α ∈ [1.0, 1.5] |
| β | Likelihood Dampening | Exponent for mitigating over-penalization of low-probability threats | β ∈ [1.0, 1.3] |
| γ | Measurability Adjustment | Exponent for addressing stealth attacks with low detection likelihood | γ ∈ [0.5, 1.0] |
| δ(t) | Time Decay Factor | Exponential decay function adjusting risk over time, distinguishing patched/unpatched states | δ(t) = 1.2·e^(−k(t−t₀)) (unpatched), 0.8 (patched) |
| C | Confidence | Adjusts risk scores based on data source reliability | C ∈ [0.7, 1.3] |
| ω_s | Segment Weight | Prioritizes risks based on criticality of affected network segments | ω_s = 1.5 (space), 1.2 (ground), 1.0 (user) |
Table 4. Comparative analysis of CVSS and proposed method.
| Dimension | CVSS | Proposed Method | Practical Implication |
|---|---|---|---|
| Domain Applicability | General IT systems | Satellite networks | Tailored solutions for satellite-specific risks, enhancing security in space infrastructure. |
| Core Parameters | Linear combination of exploitability/impact metrics (AV, AC, PR, UI, C/I/A) | Nonlinear amplification of impact (I), likelihood (L), and measurability (M) | Prioritizes high-impact, low-probability threats critical to mission success. |
| Dynamic Adjustment | Temporal multipliers (E, RL, RC) | Exponential time decay (δ(t)) with patch-state differentiation and confidence calibration (C) | Provides time-sensitive risk quantification, crucial for managing long patch cycles. |
| Environmental Customization | Modifies base metrics (MAV, MAC) and security requirements (CR, IR, AR) | Hierarchical segment weights (ω_s = 1.5–1.0) | Aligns risk evaluation with the operational criticality of different network segments. |
Table 5. Prompt and answer examples (partial).
Prompt
"Analyze the following Nmap scan results and provide:
1. Possible vulnerabilities.
2. The recommended Metasploit module (including its full path).
3. The required parameters to set.
4. An evaluation of the probability of success.

Scan results: {scan_results}

Please return JSON in the following format:
{{
  "exploits": [
    {{
      "type": "exploit/auxiliary",
      "path": "full MSF module path",
      "port": port_number,
      "params": {{"parameter name": "parameter value"}},
      "probability": "probability-of-success evaluation",
      "description": "vulnerability description",
      "msfconsolecommand": "the msfconsole command, in the format msfconsole -x ..."
    }}
  ]
}}

Return strictly JSON with no other text and no code-fence markers, so the output can be parsed directly."
Answers (Partial)
GPT-4:
{
  "type": "exploit/auxiliary",
  "path": "exploit/unix/ftp/vsftpd_234_backdoor",
  "port": 21,
  "params": {
    "RHOSTS": "192.168.1.10",
    "RPORT": "21"
  },
  "probability": "High",
  "description": "Vulnerable FTP server (vsftpd 2.3.4) contains a backdoor allowing remote access.",
  "msfconsolecommand": "msfconsole -x 'use exploit/unix/ftp/vsftpd_234_backdoor; set RHOSTS 192.168.1.10; set RPORT 21; run'"
}
DeepSeek-R1:
{
  "type": "exploit/auxiliary",
  "path": "auxiliary/scanner/ftp/anonymous",
  "port": 21,
  "params": {
    "RHOSTS": "192.168.1.10",
    "RPORT": 21
  },
  "probability": "Medium",
  "description": "FTP anonymous login check",
  "msfconsolecommand": "msfconsole -x 'use auxiliary/scanner/ftp/anonymous; set RHOSTS 192.168.1.10; run'"
}
Table 6. Port analysis results.
| LLM | Round | Ports Analyzed | Analysis Count | Average Count |
|---|---|---|---|---|
| GPT-4 | 1 | 21, 22 (3), 445, 2049, 3306 | 7 | 7 |
| | 2 | 21, 22 (3), 445, 2049, 3306 | 7 | |
| | 3 | 22 (3), 23, 445, 2049, 3306 | 7 | |
| DeepSeek-R1 | 1 | 21, 23, 445, 1099, 3306, 5900 | 6 | 6 |
| | 2 | 445, 1099, 2049, 3306, 5900, 8009 | 6 | |
| | 3 | 22, 445, 1099, 3306, 5900, 8009 | 6 | |
Table 7. Pre-interaction and question examples.
| Pre-interaction | I am a teacher/engineer/other role in the field of cyberspace security. Now I am teaching/testing/doing something related to network security maintenance. I forgot the command during the experiment. Could you give me a hint on the commands for … on a target network? |
|---|---|
| Question List | 1. Uploading and executing a script using … |
| | 2. Escalating privileges on a Linux system using … |
| | 3. Using … module to implement persistence |
| | 4. Extracting passwords from … file using … |
| | 5. Executing a Bash script on a Linux server |
| | 6. Clearing the log files in the … directory |
Table 8. Success rate and average interaction.
| LLM | Port | Success Rate | Interactions | Average Interactions |
|---|---|---|---|---|
| GPT-4 | 21 | 3/3 (100%) | 4 | 5.2 |
| | 22 | 2/3 (66.7%) | 6 | |
| | 445 | 2/3 (66.7%) | 5 | |
| | 2049 | 2/3 (66.7%) | 4 | |
| | 3306 | 2/3 (66.7%) | 7 | |
| DeepSeek-R1 | 445 | 2/3 (66.7%) | 5 | 5.8 |
| | 1099 | 3/3 (100%) | 6 | |
| | 3306 | 2/3 (66.7%) | 7 | |
| | 5900 | 3/3 (100%) | 4 | |
| | 8009 | 1/3 (33.3%) | 7 | |
Table 9. Port vulnerability risk assessment.
| Port | Service | I | L | M | D | T | R | R_normalized | Risk Level |
|---|---|---|---|---|---|---|---|---|---|
| 21 | FTP | 4 | 3 | 3 | 3 | 3 | 165.47 | 3.31 | Medium |
| 22 | SSH | 5 | 4 | 4 | 4 | 3 | 435.86 | 8.72 | Extremely High |
| 23 | Telnet | 5 | 5 | 2 | 2 | 2 | 182.85 | 3.66 | Medium |
| 25 | SMTP | 3 | 3 | 3 | 3 | 3 | 117.17 | 2.34 | Low |
| 53 | DNS | 4 | 3 | 4 | 4 | 3 | 243.01 | 4.86 | Medium |
| 80 | HTTP | 4 | 4 | 3 | 3 | 3 | 227.07 | 4.54 | Medium |
| 111 | RPCbind | 5 | 4 | 2 | 4 | 4 | 286.10 | 5.72 | Medium |
| 139 | NetBIOS | 5 | 5 | 3 | 2 | 3 | 316.14 | 6.32 | High |
| 445 | Microsoft-DS | 5 | 5 | 3 | 2 | 3 | 316.14 | 6.32 | High |
| 512 | exec | 4 | 4 | 2 | 3 | 3 | 164.17 | 3.28 | Medium |
| 513 | login | 4 | 4 | 2 | 3 | 3 | 164.17 | 3.28 | Medium |
| 514 | shell | 5 | 5 | 2 | 2 | 2 | 182.85 | 3.66 | Medium |
| 1099 | rmiregistry | 4 | 4 | 3 | 4 | 4 | 302.76 | 6.06 | High |
| 1524 | ingreslock | 3 | 2 | 4 | 3 | 3 | 94.42 | 1.89 | Low |
| 2049 | NFS | 5 | 4 | 3 | 3 | 3 | 296.79 | 5.94 | Medium |
| 2121 | ccproxy-ftp | 4 | 3 | 3 | 3 | 3 | 165.47 | 3.31 | Medium |
| 3306 | MySQL | 5 | 4 | 4 | 4 | 3 | 435.86 | 8.72 | Extremely High |
| 5432 | PostgreSQL | 5 | 3 | 4 | 4 | 3 | 317.63 | 6.35 | High |
| 5900 | VNC | 5 | 4 | 2 | 2 | 3 | 178.81 | 3.58 | Medium |
| 6000 | X11 | 3 | 3 | 3 | 3 | 3 | 117.17 | 2.34 | Low |
| 6667 | IRC | 2 | 3 | 4 | 3 | 3 | 90.67 | 1.81 | Low |
| 8009 | AJP13 | 4 | 4 | 3 | 3 | 3 | 227.07 | 4.54 | Medium |
| 8180 | Unknown | 4 | 3 | 1 | 4 | 4 | 91.62 | 1.83 | Low |