A Review of Practical AI for Remote Sensing in Earth Sciences



Introduction
Remote sensing is a technology that enables data collection without direct contact with the subject, using sensors to measure or detect various types of energy, such as electromagnetic radiation and acoustic signals, emitted, reflected, or scattered by the object under investigation [1]. Multiple sensors and platforms have been developed for remote sensing. As sensors continue to advance, the volume of remote sensing data has grown rapidly. For example, according to NASA's Earth Science Data Systems (ESDS), the Earthdata Cloud held more than 59 petabytes (PB) of data as of September 2021. ESDS estimates that this amount will increase to more than 148 PB in 2023, 205 PB in 2024, and 250 PB in 2025 [2]. To manage this volume, preprocessing techniques, including noise reduction, sensor calibration, and data compression, are used to reduce the data size, while computer systems with ample memory and parallel processing capabilities facilitate the handling of these large datasets [3].
With the increasing data quality and volume from remote sensing platforms, there is a need for computational platforms and effective tools to handle and extract valuable information from remote sensing datasets. AI tools can assist in managing large volumes of observations, modeling, analysis, and environmental forecasting, and have proven effective for key tasks such as noise reduction [4], data fusion [5], object detection [6,7], and many other important applications. As AI technologies develop, acquiring and storing remote sensing data becomes increasingly important. Obtaining this large volume of data entails using various sensors on different platforms, such as Unmanned Aerial Vehicles (UAVs) [8], unmanned ground vehicles (UGVs), aircraft, and satellites. These sensors, including Global Positioning System (GPS) receivers, Inertial Measurement Units (IMUs), LiDAR, and cameras, play an important role in capturing diverse types of energy, such as electromagnetic radiation and acoustic signals, emitted, reflected, or scattered by the objects of interest. In remote sensing, fusing data from multiple sensors, such as LiDAR, multispectral or hyperspectral imaging, and radar, facilitates comprehensive and detailed analysis of the Earth's surface, atmosphere, and environment [9]. In advanced applications, AI-powered onboard and ground processing systems autonomously handle critical tasks like calibration, filtering, filling, and scaling [10,11]. These algorithms identify intricate patterns and detect anomalies, reducing subjectivity and bias in the analysis process and helping researchers assimilate, analyze, and interpret vast amounts of remote sensing data quickly and accurately.
A number of challenges related to AI approaches may limit their practical applications. For example, training AI algorithms, especially deep learning models, requires significant computational resources, making them challenging to develop on resource-constrained shared devices. Many neural network-based models are considered black-box models, and understanding the reasons behind AI predictions is difficult but critical for gaining trust and ensuring effective decision making [12]. Creating labeled datasets for training AI models in remote sensing can be labor-intensive and time-consuming, especially for fine-grained or multi-class tasks [13], and transferring AI models trained on one dataset to perform well on different datasets can also require additional resources. Incorporating domain-specific knowledge and expertise into AI models is essential to ensure the representation of relevant features and relationships [14,15].
To successfully deploy an operational AI model, there are a few critical steps to consider. First, real-world applications usually need AI models to scale efficiently to process large-scale remote sensing data in real time, with minimal turnaround. Practical AI systems require collaborative platforms where AI developers, domain experts, and remote sensing practitioners work together to share knowledge, data, and best practices, with public-facing applications offering user-friendly tools and interfaces that enable non-experts to leverage AI capabilities for remote sensing applications effectively. Uncertainty estimates are also needed for decision-making processes, especially in accuracy-critical applications like precision agriculture and environmental monitoring [16]. When integrated with social media and sensitive data, AI systems need to address privacy concerns, ethical considerations, and compliance with local and international regulations.
This review paper aims to comprehensively evaluate and synthesize the existing literature on the need to develop practical AI in remote sensing. We aim to provide valuable insights that inform future research and applications. Key contributions of this paper include the following: 1. Overview of successful examples of practical AI in research and real-world applications; 2. Discussion of research challenges and reality gaps in the practical integration of AI with remote sensing; 3. Emerging trends and advancements in practical AI techniques for remote sensing; 4. Common challenges practical AI faces in remote sensing, such as data quality, availability of training data, interpretability, and the requirement for domain expertise; 5. Potential practical AI solutions and ongoing or future real-world applications. We adopted a structured approach to organize this paper. First, we provide a background section that gives context on AI and remote sensing and emphasizes key techniques. Subsequently, we explore various applications of AI in remote sensing, presenting an overview of the methods employed and relevant use cases. Additionally, we discuss the challenges related to AI integration in remote sensing. Finally, we summarize futuristic AI applications that can potentially transform various fields beyond what we currently imagine.

Basics of AI and Remote Sensing
This section explores the fundamental concepts of remote sensing and discusses key AI techniques in this field. A systematic literature review was conducted, encompassing reputable sources such as peer-reviewed publications, conference papers, and technical reports. The selected literature was critically analyzed, and key insights and findings were synthesized to cover a broad spectrum of AI techniques in remote sensing.

Brief Recap of Remote Sensing Technologies
Understanding the fundamental principles of remote sensing is important for comprehending its diverse techniques and applications and integrating them with AI techniques. Remote sensing systems are built to take advantage of the various parts of the electromagnetic spectrum (Figure 1) and atmospheric windows to observe different targets. Passive sensors detect natural energy emitted or reflected by the Earth, such as optical sensors that capture sunlight reflection (Figure 2a), whereas active sensors emit energy and measure the reflected or backscattered signals (Figure 2b). This wide range of sensors enables remote sensing data to be acquired via satellites for global coverage, aircraft for higher spatial resolution, and drones for small-scale data collection [17]. Once remote sensing data are acquired, interpreting images and digital data becomes crucial in extracting meaningful information. Digital image processing techniques include filtering, image fusion, feature extraction, and classification algorithms, enabling the extraction of valuable insights [18]. The following paragraphs describe the main remote sensing techniques, while AI methods that assist with data processing are presented in Section 2.2.

Optical Remote Sensing
This technique focuses on gathering and interpreting optical data, primarily within the visible and near-infrared sections of the electromagnetic spectrum (Figure 1) [19]. As sunlight interacts with the Earth's surface, materials on the surface absorb and reflect specific wavelengths of light. This interaction creates unique spectral signatures that are characteristic of different surface features [20]. The sensors, available in handheld, airborne, and spaceborne modes, contain detectors that record light intensity across different wavelengths (Figure 3). The recorded data are transmitted to ground stations or processing centers, where they are processed and transformed into images or spectral data.
In the context of optical remote sensing image (RSI) object detection, the primary objective is to ascertain whether a given aerial or satellite image contains pertinent objects and to determine their locations precisely [21]. To ensure image quality, several processing steps are undertaken. Preprocessing involves noise removal and contrast enhancement to improve clarity and interpretability, followed by feature extraction, where relevant characteristics are identified and extracted from the images for further analysis. The objects within the images are then classified, which allows for effective interpretation of the image information. Finally, an accuracy assessment is performed to verify the reliability and precision of the results.
In optical remote sensing, three primary modes are commonly used: handheld, airborne, and spaceborne. Handheld sensors capture spectral signatures of ground objects, facilitating ground-truthing and small-scale data collection. Airborne sensors mounted on airplanes or drones offer higher spatial resolution and efficient coverage of larger areas, making them useful for tasks such as land cover/land-use mapping [22], crop health assessment, and identification of ecological hotspots. Spaceborne sensors on satellites provide extensive coverage and repeated observations over time, enabling the mapping of large areas, monitoring changes in land use, tracking migratory patterns, and observing atmospheric conditions. The wealth of data collected by spaceborne sensors contributes significantly to various applications, including environmental monitoring, urban planning, disaster management [23], and climate studies. Vegetation indices, like the Normalized Difference Vegetation Index (NDVI) [24], are derived from optical remote sensing data by analyzing reflectance and absorption [25]. They serve as early detectors of nutrient deficiencies by studying light reflection changes [26]. For instance, higher near-infrared reflection often means nitrogen shortage, whereas less red light reflection could indicate phosphorus deficiency [27,28]. Monitoring these indices over time offers predictive insights into vegetation growth dynamics, which extends to crop trends. AI analysis of historical data uncovers vegetation responses to changing conditions and can inform fertilizer and pesticide use by farmers, resulting in resource savings, higher yields, and reduced chemical reliance. AI-based methods have also proved valuable for deriving snow-covered areas from sensors with radiometric information limited to visible and near-infrared bands [29,30], allowing for applications in environmental monitoring at meter-scale spatial resolution.
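As a minimal illustration of how a vegetation index is derived from reflectance, the sketch below computes NDVI per pixel using the standard definition NDVI = (NIR - Red) / (NIR + Red). The function name `ndvi` and the 2 × 2 reflectance values are hypothetical and serve only to show the arithmetic; they are not taken from any dataset discussed in this review.

```python
import numpy as np

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical surface reflectances (unitless, 0 to 1) for a 2 x 2 image.
nir_band = np.array([[0.50, 0.40], [0.30, 0.00]])
red_band = np.array([[0.10, 0.20], [0.30, 0.00]])
ndvi_map = ndvi(nir_band, red_band)
```

Pixels where near-infrared reflectance greatly exceeds red reflectance yield values close to 1, while pixels with equal reflectance in both bands yield 0.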

Radar Remote Sensing
This technique operates in the microwave region of the electromagnetic spectrum (Figure 1), involving the transmission and reception of microwaves [31]. A radar antenna emits pulses of microwave radiation toward the Earth or space and captures the echoes reflected by the targets, which contain information about the targets' characteristics, including distance, direction, shape, size, roughness, and dielectric properties (Figure 4) [32]. By analyzing the time and intensity of the echo signals, radar remote sensing can generate images or maps of the targets with varying resolutions and perspectives. It is widely used in mapping land surfaces, monitoring weather patterns, studying ocean currents, and detecting objects such as buildings and vehicles [33]. Synthetic Aperture Radar (SAR) produces high-resolution surface images and is particularly valuable for large-scale forest cover mapping because it can penetrate clouds and foliage, enabling accurate mapping even in challenging weather or limited visibility conditions [34]. When the radar signal encounters the forest canopy, it scatters, with a portion of the signal returning to the radar instrument. This returned signal carries crucial information about forest structure and biomass. The dual-polarization technology employed by SAR allows for differentiation between forest canopy types and the underlying vegetation, which enhances the accuracy and comprehensiveness of forest mapping. This capability enables SAR to generate high-resolution data that can detect changes in forest cover with exceptional precision [35].

LiDAR
LiDAR operates by emitting laser pulses toward a target and precisely measuring the time it takes for the reflected light to return to the sensor, from which the distance between the sensor and the object is calculated (Figure 5) [36]. For airborne surveys, the distance traveled is then converted to elevation, and multiple returns allow for mapping forests and tree heights [37,38]. LiDAR systems incorporate GPS receivers, which identify the locations of the emitted light energy, and an inertial measurement unit (IMU), which provides the aircraft's orientation in the sky. LiDAR systems record the reflected light in the form of a waveform or distribution in two different ways. In a discrete return LiDAR system [39], the waveform curve is analyzed to identify individual peaks, with individual points on the ground recorded at each peak location, whereas a full waveform LiDAR system records the complete distribution of the returned light energy. Although its data processing is more complex, a full waveform system has the potential to capture more information than a discrete return system. Whether collected as discrete points or entire waveforms, LiDAR data are often available as a LiDAR point cloud, which represents a three-dimensional collection of points in space.
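The time-of-flight principle described above can be sketched in a few lines. The example assumes a nadir-looking pulse travelling at the vacuum speed of light and ignores atmospheric refraction and scan angle; the function names and the 1000 m flight height over a 20 m tree are hypothetical values chosen for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum

def range_from_round_trip(t_seconds):
    """One-way sensor-to-target distance from a round-trip travel time."""
    return SPEED_OF_LIGHT * t_seconds / 2.0

def canopy_height(t_first_return, t_last_return):
    """Range difference between the last return (ground) and first return (canopy top)."""
    return range_from_round_trip(t_last_return) - range_from_round_trip(t_first_return)

# Hypothetical pulse emitted 1000 m above the ground over a 20 m tall tree.
t_ground = 2 * 1000.0 / SPEED_OF_LIGHT   # last return, from the ground
t_canopy = 2 * 980.0 / SPEED_OF_LIGHT    # first return, from the canopy top
tree_height_m = canopy_height(t_canopy, t_ground)
```

The division by two accounts for the pulse travelling to the target and back; the same relation underlies the conversion of radar echo delay to distance.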

Thermal Remote Sensing
This technique is a passive remote sensing method that measures the radiant flux emitted by ground objects within specific wavelength ranges, typically 3-5 μm and 8-14 μm [40,41]. Thermal cameras, radiometers, and other sensors are used to capture energy within the thermal infrared (TIR) range. The thermal detector can be either cryogenic or uncooled and converts the incoming radiation into electrical signals, which are then processed to generate thermal images or temperature data of the target object or surface. By analyzing these thermal images and data, valuable information about the object's emissivity, reflectivity, and temperature can be obtained. Factors that can impact the accuracy of TIR remote sensing data include atmospheric conditions, changes in solar illumination, and variations in target emissivity and radiance. To address these uncertainties, TIR data often undergo calibration or correction processes to ensure precise temperature measurement and analysis. Thermal remote sensing can be employed in environmental monitoring and wildfire detection [42,43]. As an example, Figure 6 shows a temperature map derived from ECOSTRESS data collected during the historic Pacific Northwest heatwave in 2021.

Multispectral and Hyperspectral Imaging
Multispectral cameras can detect a broader range of wavelengths beyond the visible spectrum, including infrared and ultraviolet (Figure 1). Multispectral imaging relies on spectral signature rather than spatial shape to detect and discriminate among different materials in a scene [44]. The camera captures a set of images using different filters that target specific wavelengths or bands of light, forming a comprehensive dataset containing information from various spectral channels. The images then undergo a series of processing steps, including normalization, calibration, alignment, registration, noise reduction, and enhancement. Hyperspectral imaging (HSI) [45] is a more advanced technique that collects information from ground objects across the electromagnetic spectrum with very high spectral resolution, using hundreds of narrow bands [46]. The resulting dataset, known as a hyperspectral image cube, contains two spatial dimensions (x, y coordinates) and one spectral dimension (wavelength bands), enabling detailed analysis of reflected or emitted light at specific spectral intervals. However, the high-dimensional and noisy nature of the data poses analysis challenges, requiring the application of algorithms that facilitate denoising, classification, detection, and other tasks. It should be noted that there is no absolute threshold on the number of bands that distinguishes multispectral from hyperspectral remote sensing [47].
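To make the structure of a hyperspectral image cube concrete, the following sketch builds a small synthetic cube and shows the three views most analyses rely on: the spectrum of one pixel, the image of one band, and the flattened pixels-by-bands matrix that classifiers typically consume. The array sizes and the (rows, columns, bands) axis order are assumptions for illustration; real products may store the axes in a different order.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, bands = 4, 5, 100           # hypothetical cube dimensions
cube = rng.random((rows, cols, bands))  # synthetic reflectance values in [0, 1)

pixel_spectrum = cube[2, 3, :]          # spectral signature of one pixel, shape (bands,)
band_image = cube[:, :, 10]             # spatial image of one band, shape (rows, cols)

# Flatten the spatial dimensions so that each row is one pixel's spectrum.
pixels = cube.reshape(-1, bands)        # shape (rows * cols, bands)
mean_spectrum = pixels.mean(axis=0)     # scene-average spectrum, shape (bands,)
```

The flattened matrix makes the dimensionality problem visible: every pixel is a point in a space with as many dimensions as there are bands.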
Above all, data from multiple sensors are combined to gain a deeper understanding of the system investigated [48,49]. Table 1 provides an overview of each remote sensing technique, with its advantages, limitations, and applications.

Conventional Machine Learning in Remote Sensing
The remote sensing community has extensively utilized conventional machine learning methods for various tasks such as classification, object detection, and geophysical parameter estimation. These methods have proven effective in handling multi-temporal and multi-sensor remote sensing data, providing valuable information for environmental monitoring [14,50-53].
Ensemble decision-tree-derived classifiers are well-known algorithms for classification tasks with remote sensing data [54-56]. These algorithms include bagging [57], boosting [58,59], and random forest (RF) techniques [60]. The RF approach has been used in a variety of applications ranging from land cover classification [61-66] to data fusion [7,67] and classification tasks using hyperspectral data [68,69]. Random forest builds on bagging, creating an ensemble of decision trees by randomly selecting samples and features from the training data. By combining multiple decision trees, RF classifiers can provide robust predictions while offering variable importance (VI) measurements, and they are often used in remote sensing applications [70]. The VI measurements allow RF to rank and eliminate irrelevant features, reducing dimensionality and identifying the most significant remote sensing and geographic data that offer new insights into the Earth system [49,71]. The selective feature choice in RF is particularly beneficial as it helps limit overfitting, enhances generalization, and reduces computational load and redundancy. Despite these advantages, accurately selecting discriminatory variables from high-dimensional remote sensing data remains challenging [72], and the selection of training data may influence the results [73].
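The two sources of randomness in a random forest, bootstrap sampling of training pixels and random selection of features, can be sketched with a deliberately simplified ensemble. This is not a full RF implementation: single-split decision stumps stand in for full decision trees, the three-feature dataset is synthetic, and counting how often each feature is chosen is only a crude stand-in for the variable importance measures used in practice. All function names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold, left label) split with the fewest training errors."""
    best = None
    for j in feature_ids:
        for t in np.unique(X[:, j]):
            for left_label in (0, 1):
                pred = np.where(X[:, j] <= t, left_label, 1 - left_label)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, t, left_label)
    return best[1:]

def predict_stump(stump, X):
    j, t, left_label = stump
    return np.where(X[:, j] <= t, left_label, 1 - left_label)

def fit_forest(X, y, n_trees=25, n_sub_features=2, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                           # bootstrap sample
        feats = rng.choice(d, size=n_sub_features, replace=False)   # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)                  # majority vote

# Synthetic two-class data: features 0 and 2 are informative, feature 1 is noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = np.column_stack([y + 0.3 * rng.standard_normal(60),
                     rng.standard_normal(60),
                     y + 0.3 * rng.standard_normal(60)])
forest = fit_forest(X, y)
predictions = predict_forest(forest, X)
feature_usage = np.bincount([s[0] for s in forest], minlength=X.shape[1])
```

In this toy setting `feature_usage` tends to concentrate on the informative features, which mirrors how importance rankings are used to discard uninformative bands.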
Similar to RF, boosting approaches such as the Extreme Gradient Boosting (XGBoost) method also utilize decision trees as base learners but take the process further by combining the strengths of individual trees in a boosting technique [74]. This iterative process sequentially creates decision trees, with each subsequent tree focused on correcting the errors of its predecessors. This approach helps XGBoost achieve low bias and variance, ultimately improving classification. An advantage of XGBoost in remote sensing data classification is its ability to handle cases where different classes (e.g., algal bloom species) exhibit similar spectral signatures but may have varying concentrations or distributions [75]. To ensure optimal accuracy and prevent overfitting, XGBoost relies on hyperparameter tuning.
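The sequential error-correcting idea behind boosting can be illustrated with a bare-bones gradient boosting loop for regression under squared loss. This sketch is not XGBoost: it omits regularization, second-order gradients, and column subsampling, and it uses one-split regression stumps on a one-dimensional synthetic signal. The function names, learning rate, and number of rounds are hypothetical choices.

```python
import numpy as np

def fit_regression_stump(x, r):
    """Best single-threshold split of 1-D x minimising squared error on residuals r."""
    best = None
    for t in np.unique(x)[:-1]:                 # keep both sides of the split non-empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def fit_boosting(x, y, n_rounds=50, learning_rate=0.3):
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                     # errors of the current ensemble
        stump = fit_regression_stump(x, residual)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Synthetic 1-D signal standing in for a spectral or temporal response.
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
model = fit_boosting(x, y)
mse_before = np.mean((y - y.mean()) ** 2)
mse_after = np.mean((y - predict_boosting(model, x)) ** 2)
```

Each round fits a new stump to what the previous rounds left unexplained, so the training error cannot increase from one round to the next.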
Another conventional technique is Support Vector Machines (SVMs), which categorize data by finding hyperplanes in high-dimensional space that effectively separate distinct classes, leading to improved generalization and better image classification [76]. SVMs handle challenges like non-linearity and dimensionality by utilizing the kernel trick, which implicitly maps input data into higher-dimensional spaces, and they rely on a subset of the training data, referred to as support vectors, to establish decision boundaries. By leveraging kernel functions, SVMs transform input data, enabling the identification of hyperplanes in expanded dimensions and effectively accommodating scenarios in which original feature separability is limited [77]. Notably, SVMs incorporate a flexible soft margin approach, allowing for a degree of misclassification tolerance [78].
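Two ingredients of the SVM description above lend themselves to a short numerical sketch: a kernel function, which measures similarity as if the data had been mapped to a higher-dimensional space, and the hinge loss, which is the penalty underlying the soft margin. The radial basis function (RBF) kernel is used here as one common choice, not as the kernel of any particular study cited in this review; the function names, `gamma` value, and sample points are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows in A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def hinge_loss(scores, labels):
    """Soft-margin penalty: zero beyond the margin, growing linearly inside it."""
    return np.maximum(0.0, 1.0 - labels * scores)

# Three hypothetical feature vectors (e.g., two-band pixel values).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(points, points)

# Decision scores for three samples whose true label is +1.
losses = hinge_loss(np.array([2.0, 0.5, -1.0]), np.array([1.0, 1.0, 1.0]))
```

A sample scored well beyond the margin incurs no loss, one inside the margin incurs a small loss, and a misclassified sample incurs a larger one, which is how the soft margin tolerates some errors.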

Deep Learning in Remote Sensing
Deep learning, a subfield of machine learning, has emerged as a valuable tool in remote sensing, offering solutions to unprecedented challenges and creating new opportunities in remote sensing applications [53,79-81]. Deep learning utilizes hierarchical artificial neural networks to identify patterns within data and extract valuable features from large and complex datasets [82]. During training, the network adjusts weights and biases through a process known as backpropagation, enhancing its ability to recognize patterns and relationships as it processes more data. Deep learning networks gradually transform the data into representations suitable for specific tasks such as image preprocessing, object recognition, and pixel-based classification [83]. This section lists and briefly introduces some common deep learning algorithms.

Deep Convolutional Neural Networks (DCNNs)
Deep Convolutional Neural Networks (DCNNs) utilize a multi-layer architecture that is effective for image recognition and classification tasks [84,85]. The initial layers, known as convolutional layers, play a fundamental role in detecting low-level features within the input image (Figure 7). They achieve this by applying convolutional filters, also called kernels, to the image. These filters act as feature detectors, focusing on edges, corners, and other basic patterns that characterize the image, helping identify simple shapes and textures in the scene. A non-linear activation function, the Rectified Linear Unit (ReLU) [86], is applied after each convolutional operation to introduce non-linearity and enable the learning of more intricate patterns. Following the convolutional layers, pooling layers reduce the spatial dimensions of the data while retaining the essential information. Pooling achieves this downsampling by aggregating information from neighboring pixels, which also makes the detection of certain features less sensitive to their exact spatial position within the image. The convolution and pooling process is typically repeated multiple times to allow the network to learn higher-level features and representations progressively. As the network goes deeper into its layers, it can capture increasingly abstract and sophisticated features essential for recognizing complex objects or patterns. The last fully connected layer of the DCNN generates probabilities associated with the different classes of objects, with the softmax activation function ensuring that the class probabilities sum to one. This final classification step enables the network to recognize and categorize objects present in the remote sensing image accurately [87]. The ReLU activation function sets all negative values to zero and leaves positive values unchanged:

$$f(x) = \max(0, x)$$

It is a simple, computationally efficient activation function that helps alleviate the vanishing gradient problem, which can occur during backpropagation in DCNNs. ReLU is not without limitations. One issue is the "dying ReLU" problem, where neurons become "stuck" during training and remain inactive, resulting in zero activations that prevent learning. To address this, variants like Leaky ReLU [88] and Parametric ReLU [89] have been introduced. While Figure 7 illustrates a basic DCNN architecture as an example, recent years have seen the evolution of more specialized architectures for specific applications. Notably, U-Net [90] and SegNet [91] are tailored for semantic segmentation tasks in images. U-Net features a contracting path with repeated 3 × 3 convolutions, ReLU activations, and 2 × 2 max pooling for feature extraction, followed by an expansive path for upsampling and generating detailed segmentation masks. Similarly, SegNet focuses on pixel-wise image labeling. It comprises an encoder network akin to VGG16's convolutional layers, a decoder network for low-to-full resolution feature mapping, and a pixel-wise classification layer. Among image classification architectures, AlexNet [92] ushered in a new era for DCNNs with its multi-layered architecture, employing convolution, max pooling, and Local Response Normalization (LRN) to process image features. VGG increases depth using stacks of 3 × 3 convolutional kernels, leading to the VGG16 and VGG19 models known for their accuracy. The Inception network, designed by Google, utilizes diverse kernel sizes for capturing features at varying scales, whereas DeepLab [93] combines DCNNs, atrous convolution, and conditional random fields (CRFs) for precise semantic segmentation, achieving high accuracy and efficiency.
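The basic DCNN building blocks described in this subsection (convolution, ReLU, pooling, and a softmax classifier) can be traced on a tiny example. The sketch below is a plain NumPy forward pass, not a trainable network: the 6 × 6 image with a vertical edge, the hand-written edge kernel, and the random fully connected weights for three hypothetical classes are all assumptions made for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation CNN layers call 'convolution'."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x)

def max_pool2x2(x):
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()

# 6 x 6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to left-to-right brightening

feature_map = relu(conv2d(image, edge_kernel))   # shape (4, 4)
pooled = max_pool2x2(feature_map)                # shape (2, 2)

# Hypothetical fully connected layer mapping the pooled features to 3 classes.
weights = np.random.default_rng(0).standard_normal((3, pooled.size))
class_probabilities = softmax(weights @ pooled.ravel())
```

The filter responds only where its window straddles the edge, pooling halves each spatial dimension, and the softmax output is a probability vector over the classes.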

Deep Residual Networks (ResNets)
In remote sensing, the need for deep neural networks arises due to the complexity of high-dimensional and noisy data caused by similar spectral characteristics of objects. However, neural networks are trained using a backpropagation process that relies on gradient descent, which decreases the loss function and finds the weights that minimize it. If there are too many layers, repeated multiplications will eventually reduce the gradient until it "disappears," and performance will plateau or deteriorate with each additional layer [94]. ResNets were introduced as a solution to this "degradation problem" in deep learning models [95,96].
ResNets introduce residual blocks with "skip connections," also called "shortcut connections." These skip connections allow for the stacking of multiple identity mappings, which are essentially convolutional layers that initially do nothing. By bypassing and reusing the activations of the previous layer, the skip connections introduce a shortcut for the gradients to flow more directly during backpropagation. This helps to speed up the initial training phase by compressing the network into fewer layers.
The core of residual learning is the residual block with its skip connection, which is defined as

$$y = F(x) + x$$

where F is the residual mapping (a sequence of convolutional layers), x is the input to the block, and y is the output. Residual blocks allow much deeper neural networks to be trained by bypassing one or more layers. When the dimensions of F(x) and x differ, a "shortcut projection," which is a 1 × 1 convolutional layer denoted as P(x), is incorporated within the skip connection, allowing for dimension adjustment and alignment of the feature maps. The block with a shortcut projection is represented as

$$y = F(x) + P(x)$$

where P represents the 1 × 1 convolutional layer used for dimension adjustment. By ensuring that the information passed between layers is well-aligned and optimized, shortcut projection contributes to faster training convergence and more effective model learning. The initial training enables the model to establish a baseline data representation. Once this initial training is complete, all layers are expanded, and the remaining parts of the network, known as the residual parts, are allowed to explore more of the feature space of the input image. Through these techniques, ResNets address the vanishing gradient problem and facilitate the training of much deeper models, which can effectively capture and represent the complex and subtle patterns present in remote sensing imagery.
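The residual block with an identity shortcut, y = F(x) + x, and with a projection shortcut, y = F(x) + P(x), can be sketched as follows. For brevity the residual mapping F is modeled with two dense (matrix) layers and a ReLU rather than convolutional layers, and the projection P is a plain matrix standing in for the 1 × 1 convolution; these substitutions and all variable names are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2, P=None):
    """y = F(x) + x, or y = F(x) + P x when a projection matrix P is supplied."""
    f = W2 @ relu(W1 @ x)               # residual mapping F(x)
    shortcut = x if P is None else P @ x
    return f + shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Identity shortcut: with zero residual weights the block passes x through unchanged.
y_identity = residual_block(x, np.zeros((4, 4)), np.zeros((4, 4)))

# Projection shortcut: F changes the dimension from 4 to 6, so P must do the same.
W1 = rng.standard_normal((6, 4))
W2 = rng.standard_normal((6, 6))
P = rng.standard_normal((6, 4))
y_projected = residual_block(x, W1, W2, P)
```

The first call shows why a residual block can start out as an identity mapping: when F contributes nothing, the output equals the input, so adding the block cannot make the representation worse at initialization.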

You Only Look Once (YOLO)
Algorithms for real-time object detection and segmentation in remote sensing images represent a significant advancement, with applications in the identification and classification of multiple objects within large datasets of images or video frames. The algorithm named YOLO (You Only Look Once) has gained popularity for its ability to process the entire image simultaneously using a Single Shot Detector and a CNN [97], initially leveraging the Darknet framework [98]. Within YOLO, bounding boxes indicating the location, class, and confidence score of each detected object within the image are generated [99] (Figure 8). The confidence score produced by YOLO reflects both the likelihood of an object being present in the bounding box and the accuracy of the box itself and is used in the final detection process. Overlapping bounding boxes can still occur. To refine the results and ensure only the most accurate detections are retained, YOLO incorporates Non-Maximum Suppression (NMS), a technique that eliminates redundant bounding boxes by keeping only the one with the highest confidence score. YOLOv2 [100] improves the speed and the range of object types detected, and YOLOv3 enables the prediction of objects of different sizes [101-103].
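Non-Maximum Suppression can be sketched as a greedy loop over boxes sorted by confidence. The version below is a generic NMS based on intersection over union (IoU), not the exact routine of any YOLO release; the box format [x_min, y_min, x_max, y_max], the 0.5 IoU threshold, and the three example boxes are assumptions for illustration.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]            # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # discard redundant detections
    return keep

# Two heavily overlapping detections of one object plus one separate detection.
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [20.0, 20.0, 30.0, 30.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and is retained.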

Faster Region-Based CNN (R-CNN)
Faster R-CNN is a two-step approach for object detection in remote sensing [106] based on two key modules: the Region Proposal Network (RPN) and the Fast R-CNN detector. The Fast R-CNN module is an upgrade of the previous R-CNN approach, allowing simultaneous processing of the entire image and region proposals in a single forward propagation pass and replacing the slower SVM-based classification with a softmax layer, which increases the processing speed while also improving detection accuracy [107]. The RPN uses predefined bounding boxes of various scales and aspect ratios to determine areas of interest for the detector. The RPN operates by sliding a small network over the convolutional feature map, producing object proposals with corresponding objectness scores that undergo further processing through fully connected layers for box regression and box classification. This allows the model to refine the positions of the proposed bounding boxes and classify them accurately.
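The predefined bounding boxes of various scales and aspect ratios used by the RPN, commonly called anchors, can be generated as in the sketch below. Each anchor keeps an area of scale squared while its height-to-width ratio varies; the scales, ratios, and center location are hypothetical values, and the function name `make_anchors` is not part of any library.

```python
import numpy as np

def make_anchors(cx, cy, scales, ratios):
    """Return (len(scales) * len(ratios), 4) boxes as [x_min, y_min, x_max, y_max]."""
    anchors = []
    for scale in scales:
        for ratio in ratios:                  # ratio = height / width
            width = scale / np.sqrt(ratio)
            height = scale * np.sqrt(ratio)
            anchors.append([cx - width / 2, cy - height / 2,
                            cx + width / 2, cy + height / 2])
    return np.array(anchors)

# Nine anchors centred on one hypothetical feature-map location.
anchors = make_anchors(cx=100.0, cy=100.0,
                       scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0))
widths = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
```

In a full detector the same set of anchors is replicated at every sliding-window position, and the network predicts an objectness score and box offsets for each of them.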

Self-Attention Methods
In remote sensing, approaches such as Recurrent Neural Networks (RNNs) face challenges related to capturing complex contextual dependencies when analyzing longer sequences of images. RNNs are well-suited for sequential data analysis, yet they encounter difficulties in effectively capturing the nuanced relationships between distant elements within extended sequences. This limitation can lead to a loss of important contextual information and hinder their performance on tasks involving long-range dependencies. To overcome this limitation, attention mechanisms have been designed to allow access to all elements in a sequence at each time step, facilitating a comprehensive understanding of dependencies and improving the handling of longer sequences.
The transformer architecture [108], originally developed for natural language processing, has played a key role in advancing attention mechanisms by introducing self-attention as a standalone mechanism. The model involves transforming feature maps into sequences of embeddings, which capture essential information from the input data. This capability is particularly valuable in modeling spatial and spectral dependencies in remote sensing imagery. By incorporating attention mechanisms, transformers can effectively learn and leverage the contextual and spatial relationships present in remote sensing data, making them highly suited for complex and high-dimensional data analysis [109].
The general formula for attention is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_K}}\right)V$$

where X is the input; Q is the query matrix obtained by linearly transforming the input embeddings, Q = XW_Q; K is the key matrix obtained by linearly transforming the input embeddings, K = XW_K; V is the value matrix obtained by linearly transforming the input embeddings, V = XW_V; d_K is the dimension of the key and query vectors; and W_Q, W_K, and W_V are learnable weight matrices for the linear transformations.
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based model that has shown remarkable success in language representation learning. It captures bidirectional contextual information by considering both the left and right context in all layers [110]. BERT can be applied to remote sensing data following the approach described in [111] for hyperspectral imagery. The hyperspectral images (HSIs) are flattened and fed directly into the BERT model for feature extraction, allowing the model to learn global dependencies among spectral bands. The addition of a multi-head self-attention (MHSA) mechanism accommodates diverse pixel relationships regardless of spatial distance, enabling the model to capture long-range dependencies and complex relationships within the hyperspectral data.
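As an illustration of the attention computation, softmax(QK^T / sqrt(d_K)) V with Q = XW_Q, K = XW_K, and V = XW_V, the following is a minimal single-head sketch in plain Python. It is not the implementation used in [108] or [111]; the helper names `matmul`, `softmax`, and `self_attention` are hypothetical, and the weight matrices are passed in rather than learned.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a sequence x (one row per element)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Each row of weights says how much one element attends to every other element.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

Every output row is a weighted average of all value vectors, which is why any element of the sequence (for example, any spectral band or pixel embedding) can influence any other regardless of its position.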

Long Short-Term Memory, LSTM
LSTM, short for Long Short-Term Memory [112], is a type of recurrent neural network (RNN) that is commonly used for sequence modeling and time series analysis [113]. The LSTM design aims to address the vanishing gradient problem in traditional RNNs, which can make it challenging to capture long-term dependencies in sequences [114]. LSTMs receive an input sequence, such as a sequence of sensor readings or any other sequential data, in which each element is a feature vector. At each time step, the LSTM network activates a series of gates (the input gate, forget gate, and output gate) that control how much information is allowed to enter, exit, or be retained, using memory cell states and hidden states. The input gate takes the current input and the previous hidden state, and a sigmoid activation function applied to them produces a value between 0 and 1 for each element of the feature vector. These values act as a selection mask on new information entering the cell, with 1 meaning full retention and 0 meaning elimination. A similar process occurs in the forget gate, which decides which elements of the memory cell should be erased. The memory cell is then updated based on the outputs of the input and forget gates, allowing the LSTM to retain important information and discard irrelevant or redundant information. The output gate takes the current input and the hidden state from the previous time step and, similarly, determines which elements of the cell should be output. The hidden state is updated based on the output gate and the updated memory cell, and the LSTM network can output a prediction based on the updated hidden state. This prediction can be used for various tasks such as sequence classification, sequence generation, or time series forecasting.
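The gate computations described above can be sketched as a single LSTM time step. This is a minimal illustration of the standard LSTM formulation, not code from [112]; the parameter names (`W_i`, `W_f`, `W_o`, `W_g` and the matching biases) and the function `lstm_step` are hypothetical, and the weights are supplied directly instead of being trained.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(w, b, vec):
    """Compute w @ vec + b for a weight matrix given as a list of rows."""
    return [sum(wi * vi for wi, vi in zip(row, vec)) + bi for row, bi in zip(w, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h and memory cell c."""
    z = list(x) + list(h_prev)  # current input concatenated with previous hidden state
    i = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]    # input gate
    f = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]    # forget gate
    o = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]    # output gate
    g = [math.tanh(v) for v in affine(params["W_g"], params["b_g"], z)]  # candidate values
    # Keep part of the old cell (forget gate) and add part of the candidate (input gate).
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    # The output gate selects which parts of the updated cell are exposed.
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c
```

Processing a sequence amounts to calling `lstm_step` once per element and passing the returned `h` and `c` into the next call.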

Other AI Methods in Remote Sensing
There is a growing interest in utilizing generative adversarial networks, GANs [115], in remote sensing applications [116,117].GANs are neural networks excelling in handling complex, high-dimensional data, even with limited or no annotated training data [118].
GANs consist of two networks, a generator and a discriminator, trained in competition. The generator produces fake images (forgeries) from random noise, which the discriminator evaluates alongside real images (Figure 9). Both networks train simultaneously and compete against each other. The generator learns from the discriminator's feedback on synthetic and real images, using the discriminator's error signal through backpropagation. This iterative cycle enhances the generator's ability to produce higher-quality, more realistic images. The generator becomes proficient at deceiving the discriminator by refining the forgeries through successive iterations and feeding them back to the discriminator, completing the GAN training process [119]. The training objective can be written as

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]$$

where G is the generator network, D is the discriminator network, z is random noise, x is a real sample, P_data is the true data distribution, and P_z is the prior distribution of the random noise vector. The generator and discriminator compete to outperform each other in a minimax game.
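To make the minimax game concrete, the sketch below estimates the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from a batch of discriminator outputs. The function name `gan_value` and the small constant `eps` (added only to avoid taking the logarithm of zero) are assumptions for illustration.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Batch estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

The discriminator is updated to increase this value and the generator to decrease it. When the discriminator cannot tell real from generated samples and outputs 0.5 everywhere, the value equals -log 4.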
GANs have various applications in remote sensing, including image-to-image translation tasks like dehazing and removal of thin clouds. For this purpose, CycleGAN [79] and its variants can be used to accomplish cloud-removal tasks [120]. CycleGAN can be trained on two sets of images, one with clouds and one without, which do not need to be matched pairs, with the goal of learning the mapping between the two sets. With the trained CycleGAN, clouds can be removed from new images. CycleGAN consists of two generators and two discriminators: the generators handle the forward and backward translation between the image domains, while each discriminator distinguishes between real and synthetic images in its domain. During training, the generators aim to maximize the probability of the discriminators making mistakes, while the discriminators strive to classify images correctly. Challenges in applying this method to cloud removal include images with a high percentage of cloud cover and complex cloud shapes not seen in the training datasets.
To enhance the resolution of low-resolution satellite images, the SRGAN (Super-Resolution Generative Adversarial Network) model can be utilized [121]. Built on a ResNet, the generator learns to map low-resolution images to high-resolution counterparts. The discriminator's task is to differentiate between generated and real high-resolution images. During training, the generator seeks to deceive the discriminator, while the discriminator aims to classify the images correctly [7].
The Pix2Pix GAN model can also be used for image-to-image translation and for related tasks such as image sharpening and classification [122]. Other GAN-based algorithms, such as HRPGAN [123], can be used for super-resolution, whereas MARTA GANs [124] can be used for data augmentation, PSGAN for pan-sharpening [125], and the CycleGAN-based ES-CCGAN [126] and CLOUD-GAN [127] for dehazing and cloud removal [118].
Deep Reinforcement Learning (DRL) offers advantages in remote sensing, such as learning from unlabeled data and improving decision-making processes [128].DRL combines reinforcement learning (RL) techniques with deep neural networks to create a powerful framework for solving complex problems.RL involves an agent interacting with an environment to maximize cumulative rewards, while deep neural networks approximate optimal policies.The agent observes the environment's state, takes an action, and receives a reward based on the action.The agent updates its policy using the reward signal and transitions to a new state, aiming to maximize cumulative reward over time.Deep neural networks serve as function approximators, capturing complex relationships between states and actions and generalizing to new situations [129].
An example of DRL in remote sensing is unsupervised band selection in hyperspectral image classification [130], specifically using a deep Q-network, DQN [131]. The problem is formulated as a Markov decision process, MDP [132], in which the state is the set of currently selected bands and the action is adding the next band. The DQN learns a band-selection policy by maximizing a reward signal based on the classification accuracy obtained from the selected bands. Training uses normalized spectral signatures and reward signals, and the DQN weights are updated with batches of these data. The learned policy is evaluated on unseen datasets to assess generalization and accuracy, where it is reported to outperform other methods. Adjusting DQN parameters, such as layer count, neuron count, and learning rate, can further enhance accuracy and consistency. This model is suitable for remote sensing image processing applications that analyze large amounts of data, overcoming challenges related to limited labeled samples and redundant spectral information.
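The MDP formulation of band selection can be illustrated with a deliberately simplified sketch. It replaces the deep Q-network of [130,131] with tabular Q-learning, and it replaces the accuracy-based reward with a hypothetical toy reward (`toy_reward`) that scores each band and penalizes selecting a band adjacent to one already chosen. All names, scores, and hyperparameters below are assumptions for illustration only.

```python
import random

# Hypothetical per-band usefulness scores for a five-band toy problem.
INFO = [0.2, 0.9, 0.6, 0.1, 0.8]

def toy_reward(state, action):
    """Reward for adding band `action`; adjacent (redundant) bands are penalized."""
    penalty = 0.5 if any(abs(action - b) == 1 for b in state) else 0.0
    return INFO[action] - penalty

def train_band_selector(n_bands, k, reward_fn, episodes=5000,
                        alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning: state = set of selected bands, action = next band."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return
    for _ in range(episodes):
        state = frozenset()
        while len(state) < k:
            actions = [b for b in range(n_bands) if b not in state]
            if rng.random() < eps:  # explore
                action = rng.choice(actions)
            else:                   # exploit current estimates
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state = state | {action}
            reward = reward_fn(state, action)
            if len(next_state) < k:
                future = max(q.get((next_state, b), 0.0)
                             for b in range(n_bands) if b not in next_state)
            else:
                future = 0.0  # episode ends once k bands are selected
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = next_state
    return q

def greedy_selection(q, n_bands, k):
    """Follow the learned policy greedily and return the selected bands."""
    state = frozenset()
    while len(state) < k:
        actions = [b for b in range(n_bands) if b not in state]
        state = state | {max(actions, key=lambda a: q.get((state, a), 0.0))}
    return sorted(state)
```

A possible call is `greedy_selection(train_band_selector(5, 2, toy_reward), 5, 2)`. A DQN follows the same update logic but replaces the lookup table `q` with a neural network, which is what makes the approach feasible for the hundreds of bands in real hyperspectral data.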
Each technique offers unique benefits and is suited for specific tasks in remote sensing, enabling researchers and practitioners to choose the most appropriate approach based on their data and objectives.Table 2 provides an overview of the key AI techniques in remote sensing, highlighting their advantages, limitations, and applications.

Land Cover Mapping
AI techniques have been widely used in mapping tasks to assign labels to individual image pixels, categorizing them based on spectral and spatial features and providing valuable information about the distribution and characteristics of land cover types in a specific area [133][134][135] (Figure 10). As a practical example, the Environmental Systems Research Institute (Esri) has recently released a high-resolution (10 m) annual global land cover map (2017-2022), which was created using a fully convolutional neural network with a U-Net architecture developed with Impact Observatory [136]. To train this model, a training dataset of over five billion labeled image pixels, provided by the National Geographic Society, was used. The map-making process relied on the comprehensive coverage and high spatial resolution of the European Space Agency's (ESA) Sentinel-2 satellite imagery. Creating the Land Use/Land Cover, LULC [137], map entailed running the AI model on approximately 400,000 Earth observations, amounting to around 500 terabytes of cached imagery. The model incorporated six Sentinel-2 surface reflectance bands and generated ten land cover classes, including water, trees, grass, crops, and built areas. To achieve a comprehensive depiction of land cover, the final map was created by compositing the outputs of the model applied to multiple dates of imagery throughout the year. The computation required approximately 1.2 million core hours, and Microsoft Azure Batch shortened the processing time by running up to 6400 cores simultaneously.
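The compositing of per-date model outputs into one annual map can be illustrated with a per-pixel majority vote. This is only one plausible compositing rule, assumed here for illustration; the source does not state which rule was used for the Esri map. The function name `composite_mode` and the use of 0 as a no-data value (for example, cloud-masked pixels) are also assumptions.

```python
from collections import Counter

def composite_mode(class_maps, nodata=0):
    """Per-pixel most frequent class across dates, ignoring no-data pixels.

    class_maps: list of 2D grids (lists of rows) of integer class labels,
    one grid per acquisition date, all with the same shape.
    """
    rows, cols = len(class_maps[0]), len(class_maps[0][0])
    out = [[nodata] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = Counter(m[r][c] for m in class_maps if m[r][c] != nodata)
            if votes:
                out[r][c] = votes.most_common(1)[0][0]
    return out
```

As a rough check on the compute figures quoted above, 1.2 million core hours spread over 6400 cores corresponds to about 187.5 hours of wall-clock time, assuming all cores were fully used throughout.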

Earth Surface Object Detection
SpaceKnow's GEMSTONE (Global Economy Monitoring System Delivering Transparency and Online Expertise) project aims to develop advanced ML algorithms that utilize satellite data for monitoring global economic activity [138].These algorithms combine spectral unmixing and deep neural networks (DNNs) to detect [139] raw materials and manufactured structures, enabling comprehensive monitoring.Spectral unmixing involves analyzing the spectral properties of satellite imagery to identify and differentiate specific materials of interest, whereas DNNs classify and distinguish these detected materials, ensuring accurate and high-quality results.These algorithms are deployed in carefully selected locations, and the analysis outputs are aggregated into specific economic indices.Users can access this information via a user-friendly dashboard or an API (Application Programming Interface), allowing seamless integration into their organizations' workflows.The effectiveness of these algorithms has been demonstrated via case studies such as the Nagoya Port Analysis, in which various elements, such as oil tanks, were detected and tracked [140] over time, providing valuable insights into the port's activity.A road algorithm successfully monitored the expansion of the road network in Zayed City, Abu Dhabi, showcasing its potential for large-scale monitoring of urbanization and road development.

Multisource Data Fusion and Integration
Integrating information from various remote sensing techniques can provide a comprehensive understanding of objects or phenomena. This process involves collecting data from diverse sources, ensuring accurate data registration and co-registration, integrating correlated measurements, and estimating desired object attributes or identities [141]. For instance, the European Space Agency (ESA) utilizes AI and satellite data to survey water pipe networks, detect leaks, and identify new water sources. Access to clean drinking water and the reduction of water pipe leaks are significant concerns in regions dealing with water scarcity, in both developing and developed nations. To address this, ESA has developed a service catering to the needs of governments, water utilities, charities, non-profits, and NGOs operating in these areas. The service combines neural networks with multi-spectral and synthetic aperture radar satellite data, particularly ESA Sentinel-1 and Sentinel-2 data. The neural networks recognize the spectral and backscatter signatures of water, which indicate moisture. This enables comprehensive surveys to locate underground water sources and identify pipe network leaks. As a result, a detailed map of Earth's sub-surface water has been created, with a spatial resolution of 10 square meters [142]. This map encompasses over 1.5 trillion [142] satellite tiles and stores vast amounts of data. ESA has also launched a free underground water mapping service called SpaceWater.AI, with the support of Esri, Nvidia, and Amazon Web Services. Pilot users, such as the United Nations High Commissioner for Refugees (UNHCR) and WaterAid, are already benefiting from this service. The accuracy of identifying underground water sources reaches up to 98% [142], although it may vary with geographic and environmental conditions.
ESA has also developed the Total Ecosystem Management of the Inter-Tidal Habitat (TEMITH) project [143], led by the University of Southampton, to monitor the Solent's intertidal habitat on England's south coast using Earth Observation (EO) data. This project focuses on two pressures: algal mats and sediment disturbance. Gathering and preparing data involve multiple steps. Data from various sources, including in situ datasets, are used to select collection dates and locations. For feature detection, two sensors are used: Copernicus Sentinel-2 (10 m resolution) and the high-resolution MAXAR (0.31 to 0.61 m). Imagery is captured within a 4-week timeframe, extending to 8 weeks if needed, preferably during low tide and cloud-free conditions. Sediment disturbance detection uses mapped polygon datasets for model training, supplemented by drone imagery, aerial photography, and high-resolution satellite imagery for additional labeling. The labeling process considers scarring morphology and context, selecting high-confidence polygons for model training. Similarly, mapped polygon datasets for algal mats, seagrass, and salt marsh detection come from diverse sources, including the Environment Agency, Hampshire and Isle of Wight Wildlife Trust, Natural England, and the Channel Coastal Observatory. Dataset selection is based on suitability and compatibility with available satellite imagery, aiming for a match within two weeks of data collection. Prioritizing cloud-free, low-tide Sentinel-2 images enhances feature visibility. The project trains three ResU-Net models and six U-Net CNN models. These models identify indicators like nutrient enrichment, seagrass presence, and salt marsh presence, targeting sediment disturbance and algal mats.

Three-dimensional and Invisible Object Extraction
Remote sensing data are the primary source for extracting valuable information about the 3D structures and spectral characteristics of objects [144]. Two key types of data used in remote sensing are LiDAR data and hyperspectral data. LiDAR data provide detailed information on object heights and shapes within a surveyed area, whereas hyperspectral data capture the electromagnetic spectrum reflected or emitted by objects, allowing for the identification and analysis of different materials based on their unique spectral characteristics. However, both data types face challenges, such as spectral redundancy and low spatial resolution for hyperspectral data, and the presence of high- and low-frequency information in LiDAR data.
Startups like Enview have introduced a Web-based AI service specifically designed for LiDAR data analysis.By utilizing CNNs, Enview enables the automated identification of physical objects within 3D point clouds, including power lines, pipelines, buildings, trees, and vehicles.This technology is particularly beneficial for companies in the electricity and natural gas distribution sector, streamlining object identification through the segmentation and classification of LiDAR data.Enview's AI technology has already delivered significant cost savings by automating power line inspection [145].
In the realm of HSI, Metaspectral, a Vancouver-based company, has developed an AI platform that combines HSI and edge computing for use across several industries. The platform incorporates data compression techniques and deep neural networks and supports various neural architectures. By reducing data streams without compromising information, the platform enables real-time, pixel-by-pixel analysis of hyperspectral data. Metaspectral's AI platform finds applications in space exploration, recycling, and agriculture. The Canadian Space Agency utilizes this technology to measure greenhouse gas levels on Earth. In recycling, the system classifies plastics by analyzing their chemical structures, enhancing the recycling process. In agriculture, early detection of diseases is made possible by identifying specific spectral signatures associated with plant diseases, allowing for timely interventions. Additionally, the platform aids climate change mitigation efforts by detecting and analyzing wildfire risks through hyperspectral analysis, facilitating proactive measures like controlled burns [146,147].

Existing Challenges
This section will discuss the challenges and limitations of AI in remote sensing [14], with potential solutions and advancements for overcoming these challenges.

Data Availability
AI training data are often sourced from satellites, aerial sensors, or ground-based instruments.However, these valuable data are not always readily accessible to researchers, scientists, or organizations.Some datasets may be restricted due to proprietary rights or controlled by government agencies, limiting their availability for broader use.Additionally, certain remote sensing datasets have limited temporal coverage, making it challenging to assess interannual and decadal variability [148,149].Consequently, the limited access to remote sensing data can impede the development and application of AI in this field.
To effectively train AI models, a significant amount of labeled data is required to teach algorithms to recognize and interpret specific features and patterns in remote sensing data. However, creating labeled datasets can be a time-consuming task that demands expertise [150]. The availability of accurately labeled data is essential to achieve reliable results when training AI models. Real-time or frequent updates of remote sensing data are crucial for monitoring and analyzing dynamic environmental conditions and changes [151]. However, the availability of such timely data can be limited, especially in certain regions or for specific types of data. This limitation can undermine the effectiveness of AI applications in remote sensing, as models trained on outdated or infrequent data may not represent current conditions accurately. Overcoming the challenge of data availability in remote sensing requires collaborative efforts to improve data sharing and access [152]. Initiatives that promote open data policies, data-sharing platforms, and partnerships between organizations can facilitate greater availability of remote sensing data. Collaborating with space agencies, government organizations, and private entities can also expand access to the data needed for training and implementing AI models in remote sensing applications [153].

Training Optimization
Achieving optimal performance of AI models in remote sensing demands careful consideration and a solid grasp of mathematics.Selecting suitable loss functions is important in guiding models toward improved accuracy.For instance, cross-entropy loss is commonly employed for land cover classification, whereas mean squared error (MSE) loss is preferable for regression tasks [154].Imbalanced datasets can pose a significant challenge during model optimization when certain classes are rare or underrepresented.In these conditions, the model may exhibit bias towards the majority class, resulting in poor performance for the minority classes [155].Optimizing complex models in remote sensing comes with its own set of challenges.Deep learning models like CNNs or RNNs possess numerous parameters and demand substantial computational resources for training [156].Algorithms such as stochastic gradient descent (SGD) and its variants, such as Adam or RMSprop, are commonly employed for parameter updates [157].Fine-tuning the learning rate, selecting appropriate batch sizes, and determining convergence criteria are critical steps in optimizing complex models.Additionally, hardware limitations can introduce training time and computational efficiency challenges.
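One common way to counter the majority-class bias described above is to weight the cross-entropy loss by inverse class frequency. The sketch below is a minimal illustration of that idea and is not taken from [154,155]; the function names and the particular weighting scheme (total samples divided by the number of classes times the class count) are assumptions.

```python
import math

def inverse_frequency_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (n_classes * c) if c else 0.0 for c in counts]

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class) over the samples.

    probs: one list of predicted class probabilities per sample.
    labels: the true class index of each sample.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm
```

With three samples of class 0 and one of class 1, the rare class receives weight 2.0 and the common class about 0.67, so a mistake on the rare class contributes roughly three times as much to the loss as the same mistake on the common class.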

Data Quality
The accuracy, reliability, and completeness of training data directly influence the model's performance and generalization capability [158].Obtaining accurate and reliable ground truth labels can be difficult due to limited ground-based observations, subjective interpretations, or human errors [159].For instance, mislabeling land cover classes or confusion between similar classes can greatly affect the training and performance of models in land cover classification.Different sources, sensors, or acquisition times result in variations in spatial resolution, spectral characteristics, or temporal patterns [160].These inconsistencies can introduce biases and complicate the training process.In time series analysis, inconsistent temporal sampling intervals or missing observations can hinder the model's ability to capture temporal patterns accurately [161].
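Missing observations in an irregularly sampled time series are often filled before training. The sketch below shows one simple option, linear interpolation in time between the nearest valid samples, as an assumed illustration rather than the method of [161]. The function name `fill_gaps`, the use of `None` for missing values, and the nearest-value fill at the series edges are assumptions.

```python
def fill_gaps(times, values):
    """Fill None entries by linear interpolation in time.

    times: acquisition times (e.g. day of year), sorted in ascending order.
    values: observations aligned with times; None marks a missing sample.
    Gaps at either end of the series are filled with the nearest valid value.
    """
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if not known:
        raise ValueError("no valid observations to interpolate from")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(tk, vk) for tk, vk in known if tk < t]
        after = [(tk, vk) for tk, vk in known if tk > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            filled.append((before[-1] if before else after[0])[1])
    return filled
```

Because the interpolation uses the actual acquisition times rather than the sample index, uneven revisit intervals are handled correctly; long gaps, however, are better flagged than filled.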

Uncertainty
Uncertainty arises in remote sensing data from various sources, including atmospheric conditions, sensor limitations, data acquisition techniques, and natural variability. Atmospheric factors like clouds, haze, or aerosols can result in incomplete or distorted remote sensing data [162]. Sensor characteristics and calibration also contribute to uncertainty [163]. AI models trained on static datasets may need adjustments to adapt to these dynamic variations and may not generalize well to different locations or periods. The temporal and spatial variability of natural phenomena further contributes to uncertainty in remote sensing-based AI models [164].

Model Interpretability
Interpretability ensures the trustworthiness and validation of AI model outputs [165] and becomes especially important in sensitive applications like environmental monitoring [166] or disaster response, where transparency and accountability are crucial.However, AI models, particularly complex deep learning models, often function as black boxes, making it difficult to understand or explain their internal mechanisms and decision-making processes [167].Efforts are being made to address the interpretability of AI models in remote sensing [168].Techniques such as model explainability, feature importance analysis, or visualization methods can help shed light on the reasoning behind the model's predictions [169].

Diversity
Evaluating and validating AI models on diverse and independent datasets are critical steps to assess their generalization ability.To ensure consistent and reliable performance in real-world applications, it is essential to test the models across different geographic regions, seasons, sensor types, and environmental conditions.However, one of the main challenges lies in the availability of diverse and representative training data [6].Currently, various techniques are employed to address the data availability challenges.Data augmentation generates additional training examples by applying transformations, such as rotation, scaling, or noise addition, to the existing data [170].This technique exposes the model to broader variations, enhancing its ability to generalize to unseen data.Another common approach is transfer learning, where pre-trained models trained on large-scale datasets like ImageNet serve as a starting point [171].By fine-tuning these pre-trained models on a smaller remote sensing dataset, the models can leverage their acquired knowledge and adapt it to the specific task.Ensemble methods also contribute to diversity and generalization [172] by combining multiple individual models, each trained with different algorithms or variations of the training data.
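The augmentation transformations mentioned above (rotation and noise addition, plus a horizontal flip) can be sketched for a single-band image tile stored as a list of rows. This is a minimal illustration rather than the procedure of [170]; the function names, the flip probability, and the noise level `sigma` are assumptions.

```python
import random

def rotate90(img):
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in img]

def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every pixel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]

def augment(img, rng, sigma=0.01):
    """Apply a random number of quarter turns, a random flip, then noise."""
    out = img
    for _ in range(rng.randrange(4)):
        out = rotate90(out)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return add_noise(out, sigma, rng)
```

For scene classification the label of the tile is unchanged by these operations. For pixel-wise tasks such as land cover mapping, the same rotation and flip would also have to be applied to the label mask.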
While progress has been made in these areas, there are still unresolved aspects that researchers are actively working on.Ensuring that the training dataset is representative of the target population or the real-world distribution of data presents a significant challenge, and collecting a representative dataset that covers all possible variations, particularly in remote sensing, where data can be scarce or costly to obtain, is demanding [173].Developing effective techniques to adapt pre-trained models to remote sensing-specific features and variations remains an ongoing research area.
Remote sensing applications often involve detecting and analyzing rare or complex events [174], such as natural disasters or occurrences of rare species. AI models trained on standard datasets may not have encountered such events during training, which makes generalizing to these scenarios challenging. Research efforts are focused on developing techniques to handle these rare events and improve the generalization capabilities of AI models. For example, IBM and NASA have collaboratively introduced the largest geospatial AI foundation model, named watsonx.ai, in partnership with Hugging Face. This model utilizes NASA's satellite data, specifically Harmonized Landsat Sentinel-2 (HLS) data, to support Earth observation and advance climate science. This joint initiative aims to democratize AI access, particularly in addressing evolving environmental conditions. The geospatial model is accessible on Hugging Face's open-source platform, reflecting a commitment to open AI and science. It stands out as the first open-source AI foundation model developed in collaboration with NASA. This partnership emphasizes the potential of open-source technologies in deepening our understanding of Earth's climate and environment. The watsonx.ai model performs well in tasks such as flood and burn scar mapping, demonstrating a 15 percent improvement over existing techniques. IBM's expertise in AI and NASA's Earth-satellite data contribute to the model's accuracy and effectiveness. The collaboration aligns with NASA's Open Source Science Initiative and IBM's broader efforts in AI advancement. Moreover, this geospatial model holds potential beyond its current applications. It could be adapted for tasks such as deforestation tracking, crop yield prediction, and greenhouse gas monitoring. IBM's Environmental Intelligence Suite will soon feature a commercial version of the model [175]. Another common issue is the perpetuation of biases and inequities when AI models are trained on biased or unrepresentative data [56,176].

Integrity and Security
Biases or inaccuracies in the training data can result in biased or unreliable AI predictions, which can have consequences in real-world applications [177]. To maintain integrity, it is essential to prioritize transparency, fairness, and accountability throughout the AI model development and training processes [178]. By adhering to these principles, the integrity of the AI system can be upheld, instilling trust in its outcomes and promoting ethical practices. As discussed above, maintaining integrity in remote sensing data involves multiple aspects, including data quality, data integrity, and the prevention of tampering or manipulation [179]. Protecting data integrity entails safeguarding the data from unauthorized modifications, tampering, or cyberattacks. Remote sensing data can be vulnerable to malicious actions, such as data breaches or unauthorized access [180]. One concern is the potential compromise of personal privacy through detailed imagery capturing identifiable features or activities. To address this, robust encryption protocols [181] and secure communication channels should be implemented when transmitting remote sensing data [182]. Additionally, secure storage systems, including servers or cloud platforms equipped with access controls and encryption mechanisms, are essential for protecting the data from unauthorized access. Privacy regulations, such as GDPR, impose strict data handling, storage, and sharing requirements [183].
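Detecting unauthorized modification of a data file can be illustrated with a keyed hash (HMAC) from the Python standard library. This sketch addresses tamper detection only, not encryption, and is an assumed example rather than a scheme described in [179-182]. The function names `sign` and `verify` are hypothetical, and in practice the secret key would have to be stored and exchanged securely.

```python
import hashlib
import hmac

def sign(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the given bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify(data: bytes, key: bytes, tag: str) -> bool:
    """Check that data still matches the tag issued when it was signed."""
    return hmac.compare_digest(sign(data, key), tag)
```

A data provider would publish the tag alongside each image tile; a consumer holding the same key recomputes it after download and rejects the tile if `verify` returns `False`.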

Ongoing and Future Practical AI Applications in Remote Sensing
This section explores ongoing and potential ideas that can advance practical AI applications. Work on some of these ideas may already be in progress, and others may inspire future applications with transformative impacts on environmental management.

Wildfire Detection and Management
The application of AI in wildfire management is increasing steadily [184], using advanced algorithms and remote sensing technologies to enable early detection and rapid response.AI systems analyze data from satellites, drones [185], and sensors to track wildfires in real time and predict fire behavior accurately by considering historical fire data, weather patterns, and topographical information.This data-driven approach enhances firefighting efficiency and reduces the impact of wildfires on communities and ecosystems.
AI's benefit lies in its capacity to handle large-scale data analysis [186] and pattern recognition, identifying hidden correlations in historical fire data, weather, and other relevant factors.AI-powered drones equipped with thermal imaging cameras can swiftly detect fires, leading to quicker response times and reduced costs.The Prometheus system developed by ESA uses AI and satellite data to predict wildfire behavior.Successful AI integration in wildfire management relies on a network of sensors collecting real-time data on fire occurrence, weather, and environment, fed into AI algorithms for analysis.Advanced ML techniques, like deep learning and neural networks, train AI models on vast datasets to enhance accuracy.To harness AI's potential, investments in infrastructure, communication networks, and technology are necessary.Though initial costs may be significant, benefits include reduced damages, improved response times, and enhanced firefighter safety.As AI systems become more sophisticated, their seamless integration into wildfire management practices will drive automation and efficiency.

Illegal Logging and Deforestation Monitoring
By analyzing satellite and drone imagery, AI can detect changes in forest cover, logging patterns, and illegal encroachments. This information can be used to track deforestation and identify areas that need protection. Combined with satellite imagery, AI can detect illegal logging in real time. The implementation relies on platforms like the Google Earth Engine (GEE) [187] together with advanced AI algorithms. Satellite imagery documenting changes in forest cover is collected from different remote sensing sources and then cleaned and organized during the pre-processing stage of the AI workflow. The algorithms are then applied to analyze the data and identify patterns of illegal logging activity in a particular geographical area, which supports decision making and can ultimately lead to concrete actions against deforestation and to holding illegal loggers accountable. As AI technology advances, even more innovative and efficient applications for protecting forests can be expected [188,189]. A notable example of this approach is Global Forest Watch (GFW), which utilizes satellite imagery and advanced algorithms to monitor deforestation globally, alerting governments, NGOs, and stakeholders.
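The idea of detecting forest cover change between two acquisition dates can be illustrated with a simple, non-AI baseline: differencing the Normalized Difference Vegetation Index (NDVI) and flagging pixels where it drops sharply. This stands in for the learned models discussed above and is not the method used by GFW or described in [188,189]. The function names and the drop threshold of 0.3 are assumptions.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

def forest_loss_mask(before, after, drop=0.3):
    """Flag pixels whose NDVI fell by more than `drop` between two dates.

    before, after: 2D grids (lists of rows) of (nir, red) reflectance tuples
    covering the same area on the earlier and later date.
    """
    return [
        [ndvi(*b) - ndvi(*a) > drop for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]
```

A learned change detector replaces the fixed threshold with a model trained on labeled examples, which helps separate logging from seasonal vegetation changes that a single NDVI threshold would confuse.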

Coastal and Marine Ecosystem Monitoring
To protect coastal and marine ecosystems, AI can detect changes in coral reefs [190], identify marine pollution, track marine species, and support the sustainable management of coastal resources (Figure 11). One noteworthy trend in marine research involves using image recognition algorithms to analyze photographs or videos of marine environments. These algorithms can identify organisms or objects of interest, making them valuable tools for monitoring changes in animal populations and pinpointing areas where human activities are causing ecological damage. ML algorithms can also analyze underwater sounds [191]. Understanding underwater soundscapes can be complex, but specific sounds can be recognized and distinguished from background noise with ML. This capability allows researchers and managers to monitor changes in ecosystem dynamics and gain valuable insights into the evolution of marine ecosystems [192]. In marine research [193], computer vision techniques can be used to analyze high-definition (HD) digital camera photo sequences captured by fixed underwater stations, Autonomous Underwater Vehicles (AUVs), and Remotely Operated Vehicles (ROVs) across various oceanic regions. This technology facilitates the identification of areas with potential fish activity in their natural habitat, providing details such as the number of fish, species composition, and abundance in different locations.

Biodiversity Conservation and Habitat Monitoring
Advanced image analysis techniques, such as object detection and classification, can help identify and monitor habitats, track species populations, and assess ecological connectivity, thereby enhancing the accuracy and efficiency of biodiversity monitoring [194]. AI helps improve the conservation and sustainable use of biological and ecosystem values [195]. GEE, which integrates AI for geospatial data analysis, can be used to process large amounts of satellite imagery and other remote sensing data [187]. AI-powered cameras deployed in remote areas could automatically recognize and count species and generate real-time data on population trends and distribution. This information is invaluable for guiding conservation efforts and assessing the progress of restoration projects. Another trend is the use of AI to analyze extensive scientific literature, news articles, and social media posts [196] related to biodiversity and environmental issues. By extracting relevant information, identifying patterns, and detecting trends, natural language processing (NLP) algorithms enable researchers and policymakers to stay updated on the latest developments in the field.

Airborne Disease Monitoring and Forecasting
Combining AI with remote sensing points toward a proactive, data-driven approach to public health in which outbreaks are detected early, responses are rapid, and interventions are targeted [197]. By monitoring various indicators, such as air quality [198], weather patterns, and population density, AI can identify potential hotspots and areas at risk. Remote sensing technologies equipped with AI-enabled sensors can provide real-time surveillance of disease-prone areas [199]. Drones, for example, can collect data on air quality [200], temperature, and humidity, whereas satellites can capture high-resolution imagery. AI models trained on historical data, combined with remote sensing inputs, can be used to build disease forecasting models. By analyzing factors such as environmental conditions, population movement, and social interactions, these models can predict the future spread of airborne diseases, allowing public health agencies to prepare resources, implement preventive measures, and allocate healthcare facilities in advance, minimizing the impact of outbreaks. AI can also be used to detect and diagnose diseases early [201].
AI and remote sensing can aid in risk assessment by analyzing various factors that contribute to disease transmission, including air pollution levels, urbanization patterns, and human mobility. By understanding the risk factors associated with specific areas or populations, public health authorities can develop targeted strategies for prevention, allocate resources efficiently, and prioritize interventions where they are most needed. AI-powered systems can also play a role in raising public awareness and educating communities about airborne diseases. Through real-time data visualization, interactive maps, and user-friendly interfaces, individuals can access information about disease prevalence, preventive measures, and local resources.

Precision Forestry
The combination of AI, LiDAR [202], and hyperspectral imagery provides detailed information on forest structure, biomass, and species composition, promoting sustainable and efficient forestry management [203]. Advanced thermal imaging techniques detect subtle temperature changes in trees, which can serve as early indicators of pest infestation or disease outbreaks before visible symptoms appear. Additionally, non-invasive acoustic sensors provide continuous monitoring and real-time insights into tree health and growth dynamics. By detecting anomalies such as wind-induced stress or structural weaknesses, these sensors assist forest managers in promptly dealing with potential issues [204].
Additionally, short-range remote sensing technology captures data that aid in visualizing various artifacts on tree trunks, providing valuable insights into their current and future health status [205]. Incorporating AI algorithms significantly increases the probability of identifying these artifacts. This technological integration enables accurate measurement of tree characteristics and quality, whether the trees are standing or lying, facilitating an understanding of tree health and informed decision making in forestry management practices. A related example from crop monitoring is YOLOv5-tassel, an algorithm that detects tassels in RGB imagery acquired by unmanned aerial vehicles (UAVs) and has significant potential in precision agriculture [206].

Urban Heat Island Mitigation
By identifying heat patterns, vegetation cover [207], and surface materials, AI can help urban planners optimize green infrastructure, develop heat mitigation strategies, and improve urban liveability. By integrating AI with satellite remote sensing and urban sensor network data, an integrated framework can provide accurate predictions of the urban heat island (UHI) phenomenon [208] with spatiotemporal granularity. This predictive capability is valuable for forecasting UHI (Figure 12) at specific times, facilitating the development of mitigation strategies, and formulating relevant policies to counteract its effects [209]. AI algorithms can analyze the various factors that contribute to the formation of heat islands, including land use type, urban morphology, and anthropogenic heat emissions. Leveraging this knowledge, geospatial and AI-based models can predict the impacts of different urban design and mitigation strategies on local temperatures, helping urban planners and decision makers make informed choices and implement strategies tailored to the unique characteristics of each area [210].

Precision Water Management
Integrating weather patterns and soil conditions with AI systems can yield accurate irrigation recommendations, predict crop water stress, and facilitate resource allocation, enhancing water use efficiency and conservation. In water management applications, particularly in extracting water bodies from remote sensing images, neural network architectures can be employed for semantic segmentation [211–213]. Furthermore, AI algorithms offer promising opportunities to develop digital image classification methods, specifically for assessing water usage in irrigation. These methods utilize multi-temporal image data from remote sensing systems such as Landsat and Sentinel-2 to generate comprehensive crop maps encompassing various growing seasons. These emerging technologies enable cost-effective and accurate mapping of irrigated crops, facilitating effective water resource management [214]. Moreover, Adaptive Intelligent Dynamic Water Resource Planning (AIDWRP) could be employed to sustain the water environment of urban areas [215]. The utilization of Big Data and ML technologies also holds the potential to impact many facets of environment and water management [216].
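For orientation, water-body extraction can also be approximated with a simple spectral index, which is often used as a baseline before turning to neural network segmentation. The sketch below computes the Normalized Difference Water Index, NDWI = (Green - NIR) / (Green + NIR), and thresholds it at zero. It is not the semantic segmentation approach cited above; the function names, the threshold, and the small synthetic reflectance arrays are assumptions for illustration only.

```python
import numpy as np

def ndwi(green, nir):
    """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    # Return 0 where both bands are zero instead of dividing by zero.
    return np.divide(green - nir, denom, out=np.zeros_like(denom), where=denom != 0)

def water_mask(green, nir, threshold=0.0):
    """Label a pixel as water when NDWI exceeds the (assumed) threshold."""
    return ndwi(green, nir) > threshold

# Synthetic 2x2 tile with made-up reflectances: water absorbs strongly in the NIR.
green = np.array([[0.10, 0.08], [0.06, 0.09]])
nir = np.array([[0.02, 0.30], [0.40, 0.03]])

mask = water_mask(green, nir)
water_fraction = float(mask.mean())
print(mask, water_fraction)
```

With Sentinel-2 data, the green and near-infrared inputs would typically be bands B3 and B8. A learned segmentation model becomes worthwhile where a fixed threshold fails, for example over shadows, built-up surfaces, or turbid water.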

Disaster Resilience Planning
By assessing the exposure and susceptibility of critical infrastructure and communities, AI-powered remote sensing can support the development of effective disaster response plans, early warning systems, and resilient urban designs [217]. It can also guide individuals during disasters by offering real-time evacuation information, shelter locations, and critical details of the affected areas [218]. AI-enhanced remote sensing services and products can improve disaster preparedness awareness, assisting emergency agencies in evacuations and resource deliveries. Researchers at the Urban Resilience.AI Lab use big data to build AI models that are crucial for mitigation, preparedness, and recovery (Urban Resilience.AI Lab). Predictive analytics can anticipate evacuations using seismic and weather data, while combining satellite images, seismometer readings, and social media helps verify disasters for faster responses (AI for Disaster Response, AIRD). AI evaluates damage, allocates resources, and prioritizes recovery efforts using satellite imagery [219]. Additionally, AI assesses pre-disaster vulnerability, utilizing remote sensing data to identify high-risk areas [220]. These advancements enhance disaster readiness, minimizing impacts on communities [221].

Conclusions
The integration of AI techniques in remote sensing has emerged as a powerful paradigm with tremendous potential for practical applications. This convergence creates exciting opportunities to advance our understanding of Earth's dynamics, support decision-making processes, and foster sustainable development. This review paper provides a comprehensive overview of the current state of AI in remote sensing, emphasizing its significance and impact. It covers the fundamentals of remote sensing technologies, including optical remote sensing, radar remote sensing, LiDAR, thermal remote sensing, and multispectral/HSI. It delves into key AI techniques used in remote sensing, such as conventional ML and deep learning, including DCNNs, ResNets, YOLO, Faster R-CNN, and self-attention methods. Various practical applications of AI in remote sensing are discussed, including image classification and land cover mapping, object detection and change detection, data fusion and integration, and hyperspectral/LiDAR data analysis. These applications showcase the effectiveness of AI in enhancing data analysis, improving accuracy, and automating processes. The paper also identifies several challenges: data availability, training optimization, data quality, security of sensitive remote sensing data, uncertainty in real-world scenarios, integrity, and diversity. Addressing these challenges requires further research and innovative solutions to ensure practical implementation. This paper outlines ongoing and potential applications, such as wildfire detection and management, illegal logging and deforestation monitoring, coastal and marine ecosystem monitoring, biodiversity conservation and habitat monitoring, airborne disease monitoring and forecasting, precision forestry, urban heat island mitigation, precision water management, and disaster resilience planning. Beyond these applications, there are even more possibilities, including precision agriculture optimization, renewable energy site selection, disaster management, early warning systems, and urban planning and infrastructure development. These envisioned applications highlight the transformative benefits of AI in addressing critical challenges and improving decision making in diverse fields, showcasing its potential to solve environmental and societal issues.

Figure 2. (a) Passive remote sensing: the sensor receives information. (b) Active remote sensing: the sensor emits and receives information.

Figure 3. The basic mechanism of optical remote sensing: sensors record information received as a function of wavelength and atmospheric conditions.

Figure 5. LiDAR sensor: detects objects at a distance D based on the speed of light, c, and the time between the light being emitted and being detected. Multiple returns assist in mapping objects with complex shapes. The yellow wave indicates multiple reflected returned rays, while the red-to-black gradient ray and the adjacent black wave represent the laser pulse.
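The time-of-flight relation behind Figure 5 can be written as D = c·t/2, where t is the round-trip time of the pulse and the factor of two accounts for the path out and back. The short sketch below applies it to several returns from a single pulse. The function name and the return times are hypothetical values chosen for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum, c

def lidar_range(round_trip_time_s):
    """Distance to the reflecting surface: D = c * t / 2 (pulse travels out and back)."""
    return SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0

# Hypothetical round-trip times (nanoseconds) for three returns of one pulse,
# e.g. canopy top, mid-canopy, and ground.
return_times_ns = [400.0, 420.0, 466.7]
ranges_m = [lidar_range(t * 1e-9) for t in return_times_ns]

for t_ns, d in zip(return_times_ns, ranges_m):
    print(f"{t_ns:7.1f} ns -> {d:8.3f} m")
```

Later returns correspond to surfaces farther from the sensor, which is how multiple returns help separate vegetation layers from the ground.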

Figure 7. Illustration of a basic DCNN architecture. The convolution is calculated using the following equation: $y[i,j] = \sum_{m}\sum_{n} x[i+m,\, j+n] \cdot w[m,n] + b$, where $y[i,j]$ is the output feature map at position $(i,j)$, $x$ is the input image, $w$ is the filter, $b$ is the bias term, and $m$ and $n$ are the indices of the filter. An activation function can be defined as

Figure 8. YOLO workflow: the output shows identified objects from the original image. Darknet has been replaced in later versions of YOLO by other frameworks. YOLO has further evolved through multiple versions, currently eight, with different updates, including changes in backbone architectures, the addition and then removal of anchors, and the use of PyTorch and PaddlePaddle frameworks, with the overall goal of balancing speed and accuracy for real-time object detection [104,105].

Figure 11. AI with remote sensors for coastal and marine ecosystem monitoring.

Table 1. Summary of various types of remote sensing techniques.

Table 2. AI models comparison table.