Swarm Exploration and Communications: A First Step towards Mutually-Aware Integration by Probabilistic Learning

Abstract: Swarm exploration by multi-agent systems relies on stable inter-agent communication. However, so far both exploration and communication have been mainly considered separately despite their strong inter-dependency in such systems. In this paper, we present the first steps towards a framework that unifies both of these realms by a “tight” integration. We propose to make exploration “communication-aware” and communication “exploration-aware” by using tools of probabilistic learning and semantic communication, thus enabling the coordination of knowledge and action in multi-agent systems. We anticipate that by a “tight” integration of the communication chain, the exploration strategy will balance the inference objective of the swarm with exploration-tailored, i


Introduction
In hazardous or inhospitable environments, exploration and monitoring tasks impose high risks on human operators. Typical examples include emergency scenarios caused by nuclear or toxic accidents, as well as exploration scenarios in extraterrestrial environments [1,2]. Such missions call for mobile robotic systems. Cooperation in a multi-agent system, such as a swarm, can accelerate such reconnaissance missions or mapping tasks significantly [3]. An example of swarm exploration on an extraterrestrial surface, e.g., on Mars, is shown in Figure 1: Agents distribute and process sensed data along the arrows with the aim of reconstructing an unknown physical or chemical process u(δ, t) of interest at position δ and time t, or relevant parameters of such a process, in the domain Ω. For instance, a process of interest can be the spatio-temporal distribution of a gas concentration; a relevant process parameter is then the location of the gas sources.
To achieve this goal, swarm exploration incorporates methods for distributed sensing, optimized (intelligent) information gathering [4], and agent movement/action coordination (exploitation). In particular, it requires the communication of locally and instantaneously available exploration measurements between agents. The underlying communication network acts as a data exchange backbone and is the tool that eventually enables the "diffusion" of local information to all agents and, hence, assists global decision-making. Communication is therefore always an integral part of a swarm exploration. Swarm exploration often considers reliable and error-free communications, i.e., ideal links. However, communication systems do add uncertainty to the exchanged information. This means that studies so far paint an optimistic picture of the exploration performance metric, e.g., the Mean Square Error, which indeed degrades with erroneous links, as illustrated in Figure 2. For instance, communication uncertainty needs to be considered when predicting new sampling positions for agents, since locations causing severe communication degradation will be useless for distributed information processing/exploration purposes. Likewise, communication systems are designed to aim for error-free transmission of measurements or processing results, but they are neither aware of their relevance for learning the entire explored process nor of the confidence in the data to be transmitted. Our key objective is hence to integrate the latter semantic understanding of the communicated messages into the communication and swarm system with respect to the overall exploration objective such that an exploration task can be completed with higher accuracy and/or higher speed. An integrated design could improve exploration performance towards that with ideal links, as sketched in Figure 2.
In conclusion, by using the framework of semantic communication [5][6][7][8][9] for the exchanged data among the agents, we avoid classical error-free digital transmission of data but open up the possibility for a swarm to "learn" a semantic understanding of the communicated messages with respect to the overall exploration objective. By integrating the physical layer directly into the exploration task and thus possibly excluding higher

1. Modeling of the physical process by means of Factor Graphs (FGs) and design of ML-based "communication-aware" swarm exploration algorithms that follow active inference principles [10].
2. Investigation and design of "exploration-aware" wireless communication methods and algorithms in the framework of ML. Here, we focus on the meaning, i.e., semantics, of the messages to be transmitted between robots instead of the raw data. To link exploration to communication, a promising approach is the framework of semantic communication [5][6][7][8][9] and in particular [11], as it may enable a tight integration.

State-of-the-Art
In the following section, we give a brief overview of state-of-the-art techniques in the areas of ML-based exploration and communications to highlight our contribution.

Distributed Multi-Agent Exploration
Distributed exploration requires cooperative computational techniques, which are also referred to as "in-network processing" [12]. The estimation is done such that each node conducts "local" computations and shares intermediate results with its neighboring nodes. The key to these computations is a decomposition of a network-global objective function into a sum of "local" sub-objectives, typically with additional constraints that ensure a network-wide convergence to a specific solution. A special class of such algorithms is called consensus-based algorithms, see, e.g., [13][14][15][16][17][18]. This class of algorithms enforces consensus over the whole network, i.e., each node converges to the same solution. Here, the Alternating Directions Method of Multipliers (ADMM) [19] has gained popularity for in-network processing due to its ability to handle different types of constraints on model parameters.
As an alternative, diffusion-based approaches (see, e.g., [20][21][22] and references therein) have been proposed that estimate a quantity in a distributed fashion within a network without enforcing consensus. Such approaches are also based on solving an optimization problem in which the network objective function is decomposable. One application of interest for swarm exploration is seismic imaging of subsurface structures. In particular, distributed subsurface imaging techniques based on full waveform inversion and traveltime tomography have been proposed recently that can be directly applied to decentralized multi-agent networks, see [22,23]. Full waveform inversion is a high-resolution geophysical imaging method based on the wave equation [24]. For a distributed implementation of this method, a global cost function is decomposed over the receivers, and local gradients and subsurface images are computed. Following the diffusion-based information exchange, these gradients and images are exchanged among the receivers in order to obtain a global estimate of the subsurface image.
For the exploration of complex physical processes that are described in terms of Partial Differential Equations (PDEs), classical approaches typically do not provide a direct assessment of statistical information about the quality of estimated parameters. In contrast, Bayesian inference methods postulate randomness of the parameters of interest and are from the domain of machine learning [25]. As such, instead of a point estimate, parameter distributions are computed. FGs can be used to describe probabilistic relationships between all model parameters [26] and parameter estimation is then realized using message passing schemes [27]. Bayesian tools have been used in the past for inverse PDE problems (see, e.g., [3,28]). In [3], the authors use FGs for inverse PDE modeling in a distributed setting and to localize gas sources based on concentration measurement samples. In essence, random variables are used to represent the gas concentration distribution in each mesh cell of the discretized PDE. An FG is then applied to capture temporal and spatial dependencies between concentration variables.
Having inferred the model parameters, one can then design a movement planning strategy that exploits the statistics of the estimated model parameters to optimally guide agents to new, more informative sampling locations and thus accelerate the exploration process. We categorize such strategies in the realm of exploitation. For example, methods of optimal experiment design [29] can be used for optimal selection of sampling locations; in [30] these were used for planning optimal and safe trajectories for the movement of multiple agents. The work of [31] proposes information-driven approaches that guide agents based on mutual information or entropy. Furthermore, some swarm exploration approaches make use of (deep) Reinforcement Learning (RL) for the movement strategy of the agents [32][33][34]. However, the success of these methods relies heavily on the availability of suitable training data to learn an adequate movement strategy. Especially in applications with scarce training data, such approaches are likely to fail or perform unreliably in real environments: The use of synthetic training data introduces a model mismatch that is learned by the system. Furthermore, the learned behavior cannot be easily corrected a posteriori because the structure of the Deep Neural Network (DNN) is not interpretable. Therefore, in our framework, we will focus on model-based approaches for exploration in scarce data regimes, as there are typically fewer parameters to infer compared to purely data-driven methods.
All aforementioned methods for distributed exploration and path planning rely heavily on agent-to-agent communication of the exchanged data or messages. Hence, the quality of the inter-agent communication links has a direct impact on the exploration result. However, the majority of state-of-the-art methods for distributed exploration do not sufficiently take into account the erroneous nature of the communication links. Studies that consider erroneous inter-agent links mostly do so by integrating noise and link failures into the link model, see, e.g., [35,36]. The algorithmic solutions are then adapted to these erroneous communication links.
So far, no framework has been proposed that unifies both communication and distributed exploration and optimizes both realms with respect to the overall exploration target. It is here that we propose a framework for a joint design of the inter-agent communication and the exploration target in order to develop more robust and flexible algorithmic solutions for distributed exploration.

Machine Learning for Communications
The probabilistic view often used in exploration is vital for the field of communications. With description by probabilistic models, we are able to connect exploration and communications closely. Since Claude Elwood Shannon laid the theoretical foundation of communications and information theory [37], probabilistic models have found their way not only into exploration but also into one prominent example of recent research interest: Artificial Intelligence (AI), in particular its subdomain Machine Learning (ML).
In the last decade, ML saw the emergence of powerful (probabilistic) models known as Deep Neural Networks (DNNs). Thanks to their ability to approximate functions arbitrarily well and to learn abstract features, they have led to several breakthroughs in research areas where there is no explicit domain knowledge but data can be collected, e.g., pattern recognition, generative modeling, and RL [38]. DNNs were previously considered intractable to optimize; automatic differentiation on dedicated Graphics Processing Units (GPUs) and innovative architectures now enable their data-driven training.
The impressive results showing equal or superhuman performance have not gone unnoticed by the communications community. Thus, much of the recent literature focuses on the data-driven design of the physical layer with DNNs, e.g., for wireless, molecular, and fiber-optical channels [38]. One prominent early example of such an approach is the Auto Encoder (AE) where a complete communication system is interpreted as one DNN and trained end-to-end [39].
In wireless communications, a number of channel models have been proposed and are widely used, so that key gains from using ML are expected in approximating optimal algorithmic structures that are otherwise numerically too complex (algorithm deficit) to be realized. For example, the computational complexity of Maximum A-Posteriori (MAP) decoding of large block-length codes or MAP detection, e.g., in massive Multiple Input Multiple Output (MIMO) systems, grows exponentially with code/system dimensions. In fact, e.g., using plain DNNs for decoding enables lowering of decoding complexity while approximately maintaining MAP error rate [40]. To improve generalization and reduce training complexity, more recent works focus on the idea of deep unfolding [41][42][43]. In deep unfolding, the parameters of a model-based iterative algorithm with a fixed number of iterations are untied and enriched with additional weights as well as nonlinearities. The resulting DNN can be optimized for performance improvements in MIMO detection [42,44] and belief propagation decoding [42]. An example of an algorithm deficit on a higher level beyond the physical layer is resource allocation, where it is difficult to analytically express the true objective function or to find the global optimum. Thus, Deep RL has proven to be a proper means [45].

Semantic Communication
In contrast to wireless channels, a model deficit holds for molecular and fiber-optical channels. Note that it applies in particular to the example of this article: integration of semantic context, here exploration, into communication system design. The idea of semantic communication emerged in the early 1950s [46][47][48] but has seen a lot of research interest only recently with the rise of ML application to the physical layer [5][6][7][8][9].
Its notion traces back to Weaver [46], who reviewed Shannon's information theory [37] in 1949 and amended considerations w.r.t. the semantic content of messages. Often quoted is his statement that "there seem to be [communication] problems at three levels" [46]:
A. How accurately can the symbols of communication be transmitted? (The technical problem).
B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem).
C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem).
Weaver saw the broad applicability of Shannon's theory back in 1949 and argued for the generality of the theory at Level A for all levels [49].
The generic model of Weaver was revisited by Bao, Basu et al. in [48,50] where the authors define semantic information sources and semantic channels. In [48], the authors consider joint semantic compression and channel coding at Level B with the classic transmission system, i.e., Level A, as the (semantic) channel. By this means, the authors can derive semantic counterparts of the source and channel coding theorems.
Recently, drawing inspiration from Weaver, Bao, Basu et al. [46,48,50] and enabled by the rise of ML in communications research, DNN-based natural language processing techniques, i.e., transformer networks, were introduced in AEs for the task of text and speech transmission [11,51,52,53]. The aim of these techniques is to learn compressed hidden representations of the semantic content of sentences to improve communication efficiency, but exact recovery of the source (text) is the main objective. This leads to performance improvements in semantic metrics, especially at low Signal-to-Noise Ratio (SNR), compared to classical digital transmissions.
In summary, we note that both model and algorithm deficits are true for the open research topic of joint modeling or integrated design of communications and exploration. Therefore, we think that the capability of ML-based design to handle such model deficits is crucial to develop a first prototype of tight integration, as outlined in our main contribution. Further, the inherent flexibility of DNNs should allow for quick design.

Distributed Exploration Problem
In the following, we give a brief description of the exploration problem, which requires data exchange for a distributed solution and therefore motivates a unified framework with communications. Consider a multi-agent system (a swarm) of L autonomous mobile agents. Their objective is to learn model parameters of an unknown process u(δ, t) by taking samples of u(δ, t) at different locations and times (see Figure 1). Here, δ is a spatial coordinate vector and t is time. Exploration is understood as an inference of all (or some) relevant process parameters, such as positions of physical sources and material or medium parameters that cannot be directly observed from measurements. In this work, we assume that the process of interest u(δ, t) is represented by a PDE. Hence, the physical quantity at position δ and time t can be modeled by a function F(κ, δ, t), where κ is a parameter vector of the PDE and the function F computes the forward solution of the PDE for u(δ, t). For instance, in the wave equation the parameter κ can describe the spatial distribution of the wave velocity; in the diffusion equation it can be the location of diffusive sources. Then, the exploration problem can be formulated in generic terms as an inverse parameter estimation or optimization problem over all agents:

$$\min_{\kappa_1, \ldots, \kappa_L} \; \sum_{l=1}^{L} J_l(\kappa_l), \qquad (1)$$

where the variable κ_l is the parameter estimate of κ at agent l. The function J_l is the local cost at agent l, and usually evaluates a residual error between measured samples of u(δ, t) and estimated samples that are generated using the forward solution of the PDE F(κ_l, δ, t).
For a consensus solution, one usually adds the constraint κ_l = κ_k, ∀ l, k = 1, ..., L. Doing so enforces convergence to the same parameter estimate for all agents in the network and results in iterative updates that require data from neighboring agents. Hence, solutions to the corresponding optimization problem naturally require agents to cooperate, which includes baseband/physical layer communication of processing results between the agents. As an example, for the distributed full waveform inversion proposed in [22], the local cost J_l(κ_l) is the squared residual between the measured seismic response and synthetically generated seismic data based on the local model κ_l. To evaluate this residual, synthetic seismic data needs to be generated, i.e., the wave equation needs to be solved in a forward manner by computing F(κ_l, δ, t). Then, to enable a distributed estimation of the global model κ, gradients ∂J_l(κ_l)/∂κ_l and models κ_l are exchanged among connected agents and fused locally.
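To make the exchange-and-fuse idea concrete, the following minimal sketch lets three agents minimize a sum of local least-squares costs. It is an illustration under strong assumptions, not the algorithm of [22]: the forward model is replaced by a hypothetical linear map A_l κ, each agent observes only two of the three parameters, the data are noise-free, and the names `local_gradient`, `cooperative_descent`, and the mixing matrix `W` are chosen here for illustration only. Each agent takes a local gradient step and then averages its estimate with those of its neighbors.

```python
import numpy as np

def local_gradient(kappa_l, A_l, y_l):
    """Gradient of the local cost J_l = ||y_l - A_l kappa_l||^2."""
    return 2.0 * A_l.T @ (A_l @ kappa_l - y_l)

def cooperative_descent(A, y, W, step=0.25, iters=200):
    """Each agent takes a local gradient step, then averages with its neighbors."""
    n_agents, dim = len(A), A[0].shape[1]
    kappa = np.zeros((n_agents, dim))  # row l holds the local estimate kappa_l
    for _ in range(iters):
        adapted = np.stack([
            kappa[l] - step * local_gradient(kappa[l], A[l], y[l])
            for l in range(n_agents)
        ])
        kappa = W @ adapted  # exchange over the links and fuse locally
    return kappa

def demo():
    kappa_true = np.array([1.0, -2.0, 0.5])
    eye = np.eye(3)
    # Assumed toy forward models: every agent sees only two of three parameters.
    A = [eye[[0, 1]], eye[[1, 2]], eye[[0, 2]]]
    y = [A_l @ kappa_true for A_l in A]  # noise-free measurements
    # Assumed mixing weights for a fully connected swarm (rows sum to one).
    W = np.full((3, 3), 0.25) + 0.25 * np.eye(3)
    return kappa_true, cooperative_descent(A, y, W)

if __name__ == "__main__":
    truth, estimates = demo()
    print(truth)
    print(estimates)
```

No single agent can recover κ from its own data in this toy setting, since one parameter is always unobserved; the estimates agree with the true parameter only because the intermediate results are exchanged. With imperfect links, the averaging step is exactly where communication errors would enter.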
As mentioned earlier, the main objective of this study is the design of a joint framework that targets the exploration problem while considering the underlying imperfect communication conditions between the agents. To this end, we integrate both communications and exploration into a unifying framework by using FGs in a probabilistic setting.

Proposed Framework
As the main contribution of this article, we now describe in more detail a new design approach that considers communications and exploration jointly. To make the exploration "communication-aware" and communications "exploration-aware", we use tools of probabilistic learning and FGs [26,27].

Design Approach: Factor Graphs
We first illustrate how FGs can be used to solve inverse PDE problems in a distributed fashion. In [55], the PDE given by the diffusion equation has been modeled by a FG, and the parameter of interest, namely the position of a diffusive source, has been inferred by a message-passing algorithm over the FG. To enable a distributed implementation in a multi-agent network, the FG has been split over its connections such that each agent infers within a specific geometric region. An illustration of this approach is shown in Figure 3: In Figure 3a, the complete FG over a geometric region Ω of interest is shown that models the gas concentration at a specific position. The multiple layers of nodes describe the probabilistic relationship of the gas diffusion for the respective grid cell, while skin-colored nodes at the top indicate that measurements of gas concentration have been taken at the respective positions. Details regarding the probabilistic modeling are described in Section 5.1.1. To enable a distributed inference over this FG, the FG is split over several sub-graphs, as can be seen in Figure 3b. Each sub-graph is then assigned to one agent, which then infers the parameters of interest using approximative message passing schemes, see, e.g., [56]. The red arrows between the sub-graphs indicate where information between agents needs to be exchanged. Now, to connect both realms of exploration and communication with each other, we propose the use of FGs for each exploration and each communication block, respectively. This is illustrated in Figure 4: Each agent owns a local graph that is part of a global factor graph distributed over all agents, which is analogous to Figure 3b. The global FG models spatial and temporal interrelations of a PDE over the geometric area Ω. Hence, each agent itself is responsible for the inference of the PDE over a sub-region of Ω. In the local FG of each agent, a variable node s contains the quantity to be exchanged with neighboring agents. 
On each edge between two agents, data needs to be exchanged, and hence, a block representing the communication FG is placed between two connected agents. Since the communication block is also described by means of an FG (see [57] for an example), it can be readily placed between the FGs of two agents. It should be noted that the specific structure of the FG depends on whether an agent is a transmitting or a receiving agent. This is due to the fact that a variable node is always connected to a factor node and vice versa. With this approach, it remains to show how to specifically implement and integrate the communications FG.

Integration of Communication as Factor Node
To give a possible integration of the communications FG into the exploration FG, we consider the following: In the distributed setting, each agent implements a message passing algorithm over its local FG. In general, inference on FGs requires the exchange of two types of messages: messages m_{s→f} from a variable node s to a factor node f, and messages m_{f→s} from a factor node f to a variable node s (see Figure 5). Now, we include the physical transmission of a variable-to-factor message by incorporating a factor node g into the FG. This factor represents the full communication chain between two agents. To properly account for the new factor, we modify the graph as shown in Figure 5a: A message m_{s→g} is now sent from variable node s to the communication factor node g between Agent 1 and Agent 2. At the receiver side, the variable ŝ is used to represent the received message. This variable is set to the belief of the received and decoded message m_{g→ŝ}. Communication of the message m_{f→s} is constructed similarly, yet this time the transmitter is augmented with a "copy" of a variable s that effectively belongs to the receiver FG of Agent 2 (see Figure 5b). By designing the modified FG as shown in Figure 5, the communication is fully integrated into the exploration FG. The whole communications chain including the transmitter, channel, and receiver is hence modeled by a factor node g that reflects the communication uncertainties in a probability density function (pdf) p(s, ŝ). Besides point estimates ŝ, this information facilitates exploration by making the estimation process more informative, e.g., by exchanging approximations of p(ŝ) or bounds on learned parameters. All of these enable the exploration to be "communication-aware".
For transmission of variable s, we propose to use semantic communication that is aware of the importance or relevance of s. For instance, if data is transmitted over a highly unreliable communication channel between two agents, the receiving agent should be aware of the poor data quality and consider this information for its local inference. The agent can either assign a very low priority to the received data for the inference procedure or discard it completely. Bad data quality can also indicate that the agent should change its position to improve the condition of the communication channel. If received data from another agent is of high relevance, the receiving agent should adjust its position such that the communication condition is kept stable or improved. As such, intelligent exploration strategies can be designed to select new sampling locations subject to the optimization of relevance/confidence of the variables of interest.
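The effect of an unreliable link on local inference can be illustrated with a small sketch. It assumes, purely for illustration, that the communication factor g acts like an additive Gaussian error, i.e., p(ŝ | s) is Gaussian with mean s and a known link variance, and that all messages are scalar Gaussians given as (mean, variance) pairs. The function names and the `discard_below` threshold are hypothetical and not part of the proposed framework.

```python
def receive_over_link(mean, var, link_noise_var):
    """Belief about s_hat after the link: marginalizing s out of
    p(s_hat | s) * m(s) keeps the mean and adds the variances."""
    return mean, var + link_noise_var

def fuse(local, received, discard_below=0.0):
    """Combine the local belief with a received one (product of Gaussians).

    Returns the fused mean, the fused variance, and the relative weight
    given to the received message. Messages whose weight falls below
    `discard_below` are dropped entirely.
    """
    (m_loc, v_loc), (m_rx, v_rx) = local, received
    total_precision = 1.0 / v_loc + 1.0 / v_rx
    w_rx = (1.0 / v_rx) / total_precision
    if w_rx < discard_below:
        return m_loc, v_loc, 0.0
    fused_mean = (m_loc / v_loc + m_rx / v_rx) / total_precision
    return fused_mean, 1.0 / total_precision, w_rx

if __name__ == "__main__":
    local = (2.0, 1.0)   # belief of the receiving agent
    sent = (4.0, 1.0)    # message of the neighboring agent
    print(fuse(local, receive_over_link(*sent, link_noise_var=0.0)))
    print(fuse(local, receive_over_link(*sent, link_noise_var=9.0)))
    print(fuse(local, receive_over_link(*sent, link_noise_var=9.0), discard_below=0.2))
```

Over an ideal link both beliefs are weighted equally, whereas over the poor link the received message contributes less than ten percent and is discarded once the threshold is set. The same weight could also serve as a trigger for repositioning the agent.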

Exploration
In the design approach from Section 4, we proposed the use of FGs to model the underlying PDE of the physical process and to incorporate inter-agent communication as factor and variable nodes into the exploration procedure. From the proposed design approach, we derive the following two key challenges w.r.t. a "communication-aware" exploration:

1. Probabilistic description of PDE by factor graphs: How to describe the PDE model in a Bayesian framework and conduct inference using FGs?
2. Process prediction for exploitation: How to design a "communication-aware" exploration objective to determine new measurement positions for the agents?
These challenges give rise to the following possible approaches for the exploration framework.

Probabilistic Description of PDE by Factor Graphs
For the exploration of the physical process u(δ, t), we first need to solve the corresponding PDE numerically that describes this process. To this end, a broad variety of methods are available in the literature from the domain of Finite-Difference Methods (FDM) and Finite Element Methods (FEM). FEM allows for higher flexibility in spatial resolution since it is a mesh-based discretization. Based on the discretized PDE, one can employ a FG in the Bayesian framework to model and eventually solve the PDE. Here, variable nodes represent mesh parameter estimates at the corresponding mesh vertices (when using Lagrange elements in FEM). Factor nodes describe the spatial interrelation between two mesh vertices as modeled by the PDE. The complexity of the mesh directly influences the size of the FG since the number of mesh vertices corresponds to the number of variable nodes. Hence, it is crucial to optimize the FEM mesh by reducing, e.g., redundant vertices. Here, adaptive discretization methods can be investigated that are trained to change the mesh depending on the required spatial resolution of the physical process u(δ, t) in certain areas.
As an example of how an inverse PDE problem can be described and solved using FGs, we consider the exploration of a gas field. Details can be found in [55]. The underlying PDE describing the gas field is given by the diffusion equation:

$$\frac{\partial u(\delta, t)}{\partial t} - \chi \, \Delta u(\delta, t) = d(\delta, t), \qquad (2)$$

where u(δ, t) is the gas concentration at position δ and time t, χ is a gas diffusion coefficient, d(δ, t) is the gas source distribution and ∆ is the spatial Laplace operator. The problem in gas exploration is to determine both u(δ, t) and d(δ, t) from gas concentration measurements at distinct sampling positions. To this end, the diffusion Equation (2) needs to be solved numerically first. As mentioned before, FDM or FEM can be used for this purpose. The authors of [55] used FDM, yet independent of the discretization used, we obtain a system of linear equations that needs to be solved. In the case of FDM, the unknown parameters in this system represent concentration and source strengths in each cell of the discretized spatial domain. In probabilistic terms, the discretized PDE relates the concentration u[n] at time step n to the previous concentration u[n − 1] and the source d[n] through a conditional pdf p(u[n] | u[n − 1], d[n]). Additionally, we define two prior pdfs, one for the initial gas concentration u[0] and one for the source, p(d[n]). For the initial gas concentration u[0], we use a Gaussian pdf with zero mean and high variance, since in the beginning no information about the gas concentration is available. For the source prior p(d[n]), we include a sparsity assumption on the spatial distribution of gas sources and make use of sparse Bayesian learning techniques from the domain of compressed sensing [58]. To this end, hyperparameters γ are introduced that describe the precision of the prior of the source distribution in each Cell c. They are also random variables that need to be estimated. In other words, for p(d_c[n] | γ_c[n]) in Cell c, a Gaussian pdf with variance γ_c^{−1} is assumed. For p(γ_c[n]), a Gamma pdf is selected. A detailed derivation of this approach is given in [55]. In short: the hierarchical prior favors source distributions that are sparse.
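As a rough illustration of the forward model behind this example, the sketch below advances a one-dimensional finite-difference discretization of the diffusion equation, assuming the common form ∂u/∂t = χ∆u + d. It is not the discretization of [55]: the explicit Euler time stepping, the zero-flux boundaries, the grid, and the function names `diffusion_step` and `simulate` are assumptions made here for brevity.

```python
import numpy as np

def diffusion_step(u, d, chi, dt, dx):
    """One explicit Euler step of du/dt = chi * Laplacian(u) + d on a 1D grid."""
    padded = np.pad(u, 1, mode="edge")  # zero-flux (reflecting) boundaries
    laplacian = (padded[:-2] - 2.0 * padded[1:-1] + padded[2:]) / dx**2
    return u + dt * (chi * laplacian + d)

def simulate(u0, d, chi=1.0, dt=0.2, dx=1.0, steps=50):
    """Forward solution: concentration after `steps` time steps."""
    if chi * dt / dx**2 > 0.5:
        raise ValueError("explicit scheme is unstable: need chi*dt/dx^2 <= 0.5")
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        u = diffusion_step(u, d, chi, dt, dx)
    return u

if __name__ == "__main__":
    cells = 21
    source = np.zeros(cells)
    source[10] = 1.0                      # a single (sparse) source in cell 10
    u = simulate(np.zeros(cells), source)
    print(int(np.argmax(u)), u.sum())     # peak location and total released gas
```

Each time step is a linear map of the previous concentration and the source, which is the kind of relation that the conditional pdf between consecutive time steps encodes. The inverse problem runs this model backwards: given a few noisy samples of u, infer where `source` is non-zero.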
Based on these pdfs, one can then formulate the desired posterior pdf p(u, d, γ|y) using Bayes' theorem:

$$p(u, d, \gamma \mid y) \propto p(u[0]) \prod_{n=1}^{N} p(y[n] \mid u[n]) \, p(u[n] \mid u[n-1], d[n]) \prod_{c=1}^{C} p(d_c[n] \mid \gamma_c[n]) \, p(\gamma_c[n]), \qquad (3)$$

for a total of N time steps and C mesh cells in the discretized spatial domain, where p(y[n] | u[n]) is the likelihood of the measurements y[n]. The posterior pdf can be graphically modeled via an FG, as depicted in Figure 6. To finally enable inference on the FG to obtain the posterior pdf, one can use the message passing algorithm. Here, messages that represent pdfs are exchanged between factor and variable nodes. Based on an iterative exchange of such messages, outgoing messages of variable nodes eventually converge to the marginal distributions of the respective variables. For the FG model from Figure 6, the messages that are connected to factors R_c and Y_c can be calculated using the sum-product algorithm. These messages are Gaussian pdfs, which are parameterized by mean and variance. In particular, the mean and precision of the message can be computed in closed form and are the only quantities that need to be communicated over the edges. We summarize the messages or their characterizing parameters in a vector m. In contrast, messages m for factor nodes G_c and H_c are not computable in closed form. However, variational message passing [56] can be used to obtain an analytical approximation of these specific messages, cf. [55]. To enable a distributed inference, the complete FG that covers all mesh cells can be separated into several parts. Each part corresponds to a different 2D region of the spatial domain, as shown previously in Figure 3.
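The closed-form Gaussian messages can be sketched for a single scalar variable. The example below is a simplified stand-in for the factors of Figure 6, not their actual definition in [55]: it assumes a linear transition factor u[n] = a·u[n − 1] + source + noise and a measurement factor y = u[n] + noise, and represents every message by a (mean, variance) pair. All function names are hypothetical.

```python
def through_transition(msg, a=1.0, process_var=0.0, source=0.0):
    """Sum-product message through a factor encoding
    u[n] = a * u[n-1] + source + w, with w ~ N(0, process_var)."""
    mean, var = msg
    return a * mean + source, a * a * var + process_var

def from_measurement(y, noise_var):
    """Message from a measurement factor encoding y = u[n] + v, v ~ N(0, noise_var)."""
    return y, noise_var

def multiply(msg_a, msg_b):
    """Product of two Gaussian messages at a variable node:
    precisions add, means are precision-weighted."""
    (m_a, v_a), (m_b, v_b) = msg_a, msg_b
    precision = 1.0 / v_a + 1.0 / v_b
    return (m_a / v_a + m_b / v_b) / precision, 1.0 / precision

if __name__ == "__main__":
    prior = (0.0, 100.0)                       # zero mean, high variance for u[0]
    predicted = through_transition(prior)      # message arriving at u[1]
    posterior = multiply(predicted, from_measurement(2.0, 1.0))
    print(posterior)                           # close to the measurement, variance just below 1
```

Only the two numbers per message need to cross an edge, which is why the mean and precision suffice as the content of the vector m for these factors.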


Process Prediction for Exploitation
The second key challenge considers the design of a communication-aware exploration criterion to decide on the optimal sampling positions of the agents. In particular, the exploration criterion needs to consider the reliability of inter-agent communication to guide the agents to measurement positions that are both informative about the physical process and reliable in terms of communication. Hence, both exploration objectives and communication constraints need to be respected by this criterion. To this end, we intend to utilize the FG to make predictions about process properties or message/variable certainties at arbitrary measurement locations. These predictions are essential for exploitation and should include uncertainty measures from communications. In fact, these predictions form a formal basis for determining new measurement locations of the agents. Such uncertainty measures are contained in the pdf p(s, ŝ) that can be extracted from the joint pdf p(m, m̂) of transmitted and received messages m and m̂. This joint pdf p(m, m̂) is provided by the semantic communication system to the respective receiving agent, e.g., by marginalization or point estimate. Thus, by properly exploiting pdf p(s) one can optimally decide on new sampling positions of the agents. Specifically, given (i) a trained FG, denoted FG(·), with estimated marginal pdfs and (ii) measurement data y, we aim at finding new measurement positions δ̂_1, ..., δ̂_L for L agents as a solution of the following optimization problem:

$$\hat{\delta}_1, \ldots, \hat{\delta}_L = \arg\min_{\delta_1, \ldots, \delta_L} \mathrm{EC}(\delta_1, \ldots, \delta_L \mid \mathrm{FG}, y), \qquad (4)$$

where EC(δ_1, ..., δ_L | FG, y) is some chosen exploration criterion. Different choices for EC(δ_1, ..., δ_L | FG, y) can be investigated, such as information-theoretic measures like mutual information, entropy, or entropy rate. The choice of information-theoretic measures is mainly motivated by the fact that it leads to an indirect optimization (maximization) of the information gathered by the swarm.
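One possible instance of such a criterion can be sketched for a single agent and scalar Gaussian quantities. The example below is an assumption-laden illustration, not the criterion of this framework: it takes the predictive variance at each candidate position as given (in practice it would come from the FG marginals), models the link to the neighboring agent as additional noise variance that grows with distance, and uses the negative Gaussian mutual information 0.5·log(1 + prior variance / effective noise variance) as EC. All names and numbers are hypothetical.

```python
import numpy as np

def exploration_criterion(prior_var, meas_noise_var, link_noise_var):
    """Negative mutual information between the process value and what the
    neighbor actually receives (measurement noise plus link noise)."""
    effective_noise = meas_noise_var + link_noise_var
    return -0.5 * np.log1p(prior_var / effective_noise)

def select_position(candidates, prior_var, meas_noise_var, link_noise_var):
    """Grid search for the candidate position that minimizes the criterion."""
    ec = exploration_criterion(prior_var, meas_noise_var, link_noise_var)
    return candidates[int(np.argmin(ec))], ec

if __name__ == "__main__":
    candidates = np.arange(11.0)          # positions; the neighbor sits at 0
    prior_var = 1.0 + candidates          # far away, the process is less known
    link_noise = 0.05 * candidates**3     # far away, the link gets worse
    aware, _ = select_position(candidates, prior_var, 1.0, link_noise)
    unaware, _ = select_position(candidates, prior_var, 1.0, np.zeros(11))
    print(unaware, aware)                 # unaware goes farthest, aware stays closer
```

Ignoring the link sends the agent to the most uncertain position at the edge of the grid, whereas the communication-aware variant settles for a closer position where the measurement can still be delivered reliably. For L agents the search would run jointly over all positions, which this sketch does not attempt.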

Semantic Communication
Following the design approach of Section 4.2, communication is modeled as a factor node that consists of the main communication blocks shown in Figure 7. As input, we have a semantic Random Variable (RV) $m \in \mathcal{M}_m^{N_m \times 1}$ from domain $\mathcal{M}_m$ of dimension $N_m$ set to the message $m_{s \to g}$.
For the remainder of the article, note that the domain $\mathcal{M}$ of all RVs may be either discrete or continuous. Further, we note that the definition of entropy for discrete and continuous RVs differs. For example, the differential entropy of continuous RVs may be negative whereas the entropy of discrete RVs is always non-negative [59]. Without loss of generality, we will thus assume all RVs either to be discrete or to be continuous. In this work, we avoid notational clutter by using the expected value operator: Replacing the integral by summation over discrete RVs, the equations are also valid for discrete RVs and vice versa.
The message $m$ could, e.g., consist of samples of the pdf of $s$ or of its parameters, i.e., mean $\mu$ and variance $\sigma^2$ in the case of a Gaussian distribution. These parameters themselves are subject to stochastic perturbations when computed or measured. After encoding with the transmitter $p_\theta(x|m)$, we transmit signals $x \in \mathcal{M}_x^{N_{\mathrm{Tx}} \times 1}$ over the wireless channel $p(y|x)$ to the receiver side. There, we infer an estimate $\hat{m}$ set to the message $m_{f \to s}$ based on the received signals $y \in \mathcal{M}_y^{N_{\mathrm{Rx}} \times 1}$ with the decoder $q_\phi(m|y)$. Communication is modeled with its distribution $p(m, \hat{m})$. Since the semantic context is included in communication system design with the RV $m$, we enter into the field of semantic communication, which has seen a lot of research interest recently [6,7,9,11,46–48,53]. We will adapt and modify here the promising approach of [11] where the authors originally designed a semantic communication system for the transmission of written language/text similar to [51] using transformer networks. As an alternative idea, we could also follow the approach of [49]: There, the authors model semantics by means of hidden random variables and define the semantic communication task as the data-reduced and reliable transmission of a communications source over a communication channel such that semantics is best preserved. The authors cast this task as an end-to-end Information Bottleneck problem, allowing for compression while preserving relevant information. As a solution approach, the authors propose the ML-based semantic communication system SINFONY and analyze its performance in a distributed multipoint scenario where the meaning behind image sources is to be transmitted, revealing a tremendous rate-normalized SNR shift of up to 20 dB compared to classically designed communication systems. Adapted to our scenario, this means we would aim to reconstruct the exploration RV $s$ directly instead of $m$.
How this translates into a different integration strategy is an open question, and we leave further elaboration for future work.
To make communications semantic/application-aware, we have to master two key challenges:

1. Exploration integration: How can the meaning of exploration variables be exploited in communications when using tools of probabilistic learning?
2. Exploration interface design: Which information should be passed to the exploration to reflect the uncertainty or reliability of communication, and how can we design this output by using tools from probabilistic learning?
We now propose how these challenges can be approached.

Model Selection
In order to tackle a semantic design of both transmitter and receiver in the considered exploration scenario, we first need to define a well-suited communication or machine learning model. From a probabilistic ML viewpoint, this design is equivalent to an unsupervised learning problem. Since we want to learn a hidden representation $y$ of our input data $m$, our aim is to learn a probabilistic encoder or discriminative model $p_\theta(y|m)$ parametrized by a parameter vector $\theta$. It includes both transmitter $p_\theta(x|m)$ and channel model $p(y|x)$. Note that $p_\theta(x|m)$ is probabilistic here but usually assumed to be deterministic since we aim for uncertainty reduction at the receiver, and that $p(y|x)$ is independent of $\theta$.
As transmitters, application-adapted DNNs $p_\theta(x|m)$ are preferably analyzed in the literature [11,53]. Also for exploration, we propose to use DNNs mainly for a generic and flexible design of ML-based prototype transceivers, since sufficiently large DNNs can approximate a broad class of functions arbitrarily well (universal approximation theorem) and can be easily optimized using automatic differentiation frameworks. But we point out that alternative forms of learning are not excluded from future implementations.
The channel model $p(y|x)$ in mobile communications is typically assumed to be a frequency-selective Rayleigh fading channel. Oftentimes, this assumption is further abstracted to Additive White Gaussian Noise (AWGN) or Multiple Input Multiple Output (MIMO) channels for basic research. In contrast, time selectivity can be neglected on Earth for slowly moving agents with long coherence time. Hence, we expect mainly frequency-selective channels, an assumption that should remain valid for the exploration of, e.g., an extraterrestrial environment like the Moon or Mars, assuming rural area models [60,61].
With this likelihood model $p_\theta(y|m)$ and the semantic prior distribution $p(m)$ in mind, the whole generative model is given by the joint probability distribution function $p_\theta(m, y) = p_\theta(y|m) \cdot p(m)$ usually assumed in communication systems. Furthermore, we will assume $p_\theta(m, y)$ to belong to the exponential family. This leads to efficient learning and inference algorithms.
The last remaining communications component is the receiver. From a Bayesian perspective, the receiver infers $m$ given the received signal $y$ based on the posterior distribution $p_\theta(m|y)$, which can be inferred from $p_\theta(m, y)$ using Bayes' theorem. Now, we are able to give an Answer to Challenge 1: Exploration Integration. It is the prior $p(m)$ that allows modelling the message $m_{s \to g}$ (see, e.g., Figure 5a). In data-driven transceiver design, samples of the prior $p(m)$ are used as training data. In other words, through this prior, the design of the communication system becomes "exploration-aware".

Learning of the Semantic Communication System
To define an unsupervised learning optimization criterion for our discriminative model or encoder $p_\theta(x|m)$, it is useful to follow the infomax principle from the information-theoretic perspective [59]. This means our aim is to find a representation $y \sim p_\theta(y|m)$ that retains a significant amount of information about the input, i.e., maximization of the mutual information $I(m; y)$ w.r.t. the encoder $p_\theta(x|m)$ [49,62]:

$$\arg\max_{p_\theta(x|m)} I(m; y) \tag{6}$$
$$= \arg\max_{\theta} \mathrm{E}_{m,y \sim p_\theta(m,y)}\left[\ln \frac{p_\theta(m, y)}{p(m)\, p_\theta(y)}\right] \tag{7}$$
$$= \arg\max_{\theta} H(m) - H\left(p_\theta(m, y), p_\theta(m|y)\right) \tag{8}$$

where $H(m) = \mathrm{E}_{m \sim p(m)}[-\ln p(m)]$ is the entropy and $\mathrm{E}_{x \sim p(x)}[f(x)]$ denotes the expected value of $f(x)$ w.r.t. either discrete or continuous RVs $x$. Here we also used the fact that $H(m)$ is independent of $\theta$ and that $p_\theta(m|y) \propto p_\theta(y|m) \cdot p(m)$. Further, note that the form of $p_\theta(y|m)$ has to be constrained to avoid learning a trivial identity mapping. For communications, the channel $p(y|x)$, e.g., AWGN, indeed constrains $p_\theta(y|m)$.
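The step from the mutual information to a criterion on the posterior can be spelled out explicitly. The following derivation is a sketch that uses only the factorization $p_\theta(m, y) = p_\theta(m|y)\, p_\theta(y)$, the entropy $H(m)$, and the cross-entropy notation $H(p, q) = \mathrm{E}_{p}[-\ln q]$; its last line is our own restatement of the argument and is not copied from a numbered equation.

```latex
\begin{align*}
I(m; y)
  &= \mathrm{E}_{m,y \sim p_\theta(m,y)}\!\left[\ln \frac{p_\theta(m, y)}{p(m)\, p_\theta(y)}\right]
   = \mathrm{E}_{m,y \sim p_\theta(m,y)}\!\left[\ln \frac{p_\theta(m \mid y)}{p(m)}\right] \\
  &= \underbrace{\mathrm{E}_{m \sim p(m)}\!\left[-\ln p(m)\right]}_{H(m)}
   - \underbrace{\mathrm{E}_{m,y \sim p_\theta(m,y)}\!\left[-\ln p_\theta(m \mid y)\right]}_{H\left(p_\theta(m,y),\, p_\theta(m \mid y)\right)} \\
\Rightarrow \quad
\arg\max_{\theta} I(m; y)
  &= \arg\max_{\theta} \mathrm{E}_{m,y \sim p_\theta(m,y)}\!\left[\ln p_\theta(m \mid y)\right]
\end{align*}
```

The last line holds because $H(m)$ does not depend on $\theta$: maximizing the mutual information amounts to making the true posterior assign high probability to the transmitted message on average.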
If the calculation of the posterior $p_\theta(m|y)$ in (9) is intractable, we are able to replace it by a variational distribution $q_\phi(m|y)$ with parameters $\phi$. Similar to the transmitter, DNNs are usually used in semantic communication literature [11,53] for the design of the approximate posterior $q_\phi(m|y)$ at the receiver. To enhance the performance-complexity trade-off, the application of deep unfolding can be considered, a model-driven learning approach that introduces model knowledge of $p_\theta(y, m)$ to construct $q_\phi(m|y)$ [43,44].
With $q_\phi(m|y)$, we are able to define a Mutual Information Lower BOund (MILBO) [62] similar to the well-known Evidence Lower BOund (ELBO) [38]. Optimization of $\theta$ and $\phi$ can now be done w.r.t. this lower bound, i.e.,

$$\arg\max_{\theta,\phi} \mathrm{E}_{m,y \sim p_\theta(m,y)}\left[\ln q_\phi(m|y)\right] \tag{11}$$
$$= \arg\max_{\theta,\phi} -\mathrm{E}_{y \sim p(y)}\left[H\left(p_\theta(m|y), q_\phi(m|y)\right)\right] \tag{12}$$
$$= \arg\min_{\theta,\phi} \mathcal{L}^{\mathrm{CE}}_{\theta,\phi} \tag{13}$$

There, $H(p(x), q(x)) = \mathrm{E}_{x \sim p(x)}[-\ln q(x)]$ is the cross entropy between two pdfs $p(x)$ and $q(x)$. We note that the MILBO in (10) is equivalent to the negative amortized cross-entropy $\mathcal{L}^{\mathrm{CE}}_{\theta,\phi}$ in (12). This means that approximate maximization of the mutual information justifies the minimization of the cross entropy in the Auto Encoder (AE) approach [39] oftentimes seen in recent semantic communication literature [11,53]. Thus, the idea is to learn parametrizations of the transmitter discriminative model and of the variational receiver posterior, e.g., by AEs or RL.
This means that optimization of the MILBO balances minimization of the Kullback-Leibler (KL) divergence $D_{\mathrm{KL}}\left(p_\theta(m|y) \,\|\, q_\phi(m|y)\right)$ and maximization of the mutual information $I_\theta(m; y)$. The former criterion can be seen as a regularization term that favors encoders with high mutual information, for which decoders can be learned that are close to the true posterior.
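The balance between KL divergence and mutual information can be checked numerically on a small discrete example. The sketch below is illustrative only: the alphabet sizes, the random joint distribution, and the deliberately mismatched decoder `q` are assumptions. The lower bound is written here as $H(m)$ minus the cross entropy, which is one common form and has the same maximizer as the cross-entropy criterion because $H(m)$ does not depend on the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model with 4 messages m (rows) and 5 channel outputs y (columns).
joint = rng.random((4, 5))
joint /= joint.sum()                  # joint pmf p(m, y)
p_m = joint.sum(axis=1)               # marginal p(m)
p_y = joint.sum(axis=0)               # marginal p(y)
post = joint / p_y                    # true posterior p(m | y); columns sum to 1

# Arbitrary, mismatched variational decoder q(m | y).
q = rng.random((4, 5))
q /= q.sum(axis=0)

entropy_m = -np.sum(p_m * np.log(p_m))
mutual_info = np.sum(joint * np.log(joint / np.outer(p_m, p_y)))
cross_entropy = -np.sum(joint * np.log(q))        # E_{m,y}[-ln q(m|y)]
expected_kl = np.sum(joint * np.log(post / q))    # E_y[D_KL(p(m|y) || q(m|y))]

# Lower bound on the mutual information obtained with the mismatched decoder.
lower_bound = entropy_m - cross_entropy

print("I(m; y)                 :", mutual_info)
print("H(m) - cross entropy    :", lower_bound)
print("gap = expected KL       :", mutual_info - lower_bound, expected_kl)
```

The printed gap equals the expected KL divergence, i.e., the cross entropy decomposes into $H(m) - I(m; y) + \mathrm{E}_y[D_{\mathrm{KL}}]$: lowering it either increases the mutual information or moves the decoder closer to the true posterior.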
Indeed, our communications design and optimization based on probabilistic modeling can, in general, also be applied to semantics-agnostic settings. But in this article, we introduce semantics, i.e., messages $m$, into the exploration-tailored design and aim for accurate analog transmission of $m$. In fact, we adopt and modify the idea of [11] where the authors originally designed a semantic communication system for the transmission of written language/text similar to [51] using transformer networks:

1. We replace the text with the messages $m$.

2. The objective in [11] is to reconstruct $m$ (sentences) as accurately as possible while preserving as much information of $x$ in $y$ as possible. Optimization is done w.r.t. a loss function consisting of two parts: the cross entropy between language input $m$ and output $\hat{m}$, as well as an additional scaled mutual information term between transmit signal $x$ and received signal $y$. We omit the latter in our approach (12).
This view resembles JSCC and is backed by the extensive survey in [9]. After optimization, the authors measure semantic performance with BiLingual Evaluation Understudy (BLEU) and semantic similarity [11].

We note that computation of the MILBO (10) (or cross-entropy (12)) leads to similar problems as for the ELBO [59]: If the expected value cannot be calculated analytically or is computationally intractable, we can approximate it using Monte Carlo sampling techniques. For stochastic gradient descent-based optimization, i.e., the AE approach, the gradient w.r.t. $\phi$ can then be calculated by application of the backpropagation algorithm in automatic differentiation frameworks like TensorFlow. Computation of the so-called REINFORCE gradient w.r.t. $\theta$ leads to a high variance of the gradient estimate since we sample w.r.t. the pdf $p_\theta(y|m)$, which depends on $\theta$. Typically, the reparametrization trick is used to overcome this problem [59].
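The variance issue can be illustrated on a scalar toy problem that is unrelated to the actual transceiver: we estimate the gradient of $J(\theta) = \mathrm{E}_{y \sim \mathcal{N}(\theta, 1)}[y^2] = \theta^2 + 1$ by Monte Carlo sampling, once with the score-function (REINFORCE) estimator and once with the reparametrization $y = \theta + \epsilon$, $\epsilon \sim \mathcal{N}(0, 1)$. The objective, the value of $\theta$, and the sample size are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, num_samples = 1.5, 200_000

# Toy objective J(theta) = E_{y ~ N(theta, 1)}[y^2] = theta^2 + 1, so dJ/dtheta = 2 * theta.
eps = rng.standard_normal(num_samples)
y = theta + eps                      # reparametrized sample y = g(theta, eps)

# Score-function (REINFORCE) estimator: f(y) * d/dtheta ln N(y; theta, 1) = y^2 * (y - theta).
grad_reinforce = y**2 * (y - theta)

# Reparametrization estimator: d/dtheta f(theta + eps) = 2 * (theta + eps).
grad_reparam = 2.0 * y

print("true gradient        :", 2.0 * theta)
print("REINFORCE mean / var :", grad_reinforce.mean(), grad_reinforce.var())
print("reparam.  mean / var :", grad_reparam.mean(), grad_reparam.var())
```

Both estimators are unbiased, but the per-sample variance of the reparametrized gradient is markedly smaller, which is why it is preferred whenever the sampling step can be written as a differentiable function of $\theta$ and parameter-free noise, as is the case for an AWGN channel.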
As a final remark, we arrive at a special case of the infomax principle if we fix the encoder with $p_\theta(y|m) = p(y|m)$ and hence the transmitter. Then, only the receiver approximate posterior $q_\phi(m|y)$ needs to be optimized in (14). Thus, in this case, maximization of the MILBO is equivalent to a supervised learning problem and minimization of the KL divergence between true and approximate posterior [44]. This setup has several benefits: In practice, we avoid the REINFORCE gradient, and in particular we do not need any ideal connection between transmitter and receiver. Further, even today in 5G, we can apply a semantic receiver design to standardized systems with fixed transmitter capabilities to possibly achieve semantic performance gains. As a first step, we thus suggest that research should focus on the design of ML-based receivers, given a non-learning state-of-the-art transmitter. We give a first example in Section 5.3. Finally, we note that such state-of-the-art transmitters were designed for the digital transmission of bits, requiring near-deterministic links. This may not really be needed from a semantic perspective and is then a waste of resources. Hence, it is also worth considering the adaptation of the transmitter to achieve a more efficient use of bandwidth and to increase the data rate.

Interface to the Application
Based on the learned posterior $q_\phi(m|y)$, we have full information about our model at the receiver side but still need to design an interface to the application, e.g., by inferring transmitted messages/beliefs $m$. For the efficient integration as a factor node, the posterior $q_\phi(m|y)$ should hence belong to the exponential family. Further, we can define an interface by finding the optimal estimator $\hat{m}^* = h(y)$ w.r.t. a loss function $L(m, \hat{m})$ which measures the quality of the inference prediction $\hat{m}$. Non-informative loss functions are, e.g., the quadratic error loss or the 0-1 loss, which leads to the MAP estimator [38]. If the variance needs to be computed for the exploration part but is intractable, we can lower bound it by the inverse of the Fisher information (Cramér-Rao bound) [65].
In summary, we are able to give an Answer to Challenge 2: Exploration Interface Design. Possible approaches for output design include equipping the receiver with the capability to learn an approximation of its corresponding posterior p(m|y), e.g., a Gaussian with mean and variance, or to bound the estimator's variance with, e.g., Fisher information.
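One possible realization of such an interface is moment matching: the receiver summarizes its posterior by a Gaussian with matching mean and variance and hands only these two numbers to the exploration part. The sketch below assumes, for illustration, that the posterior is available as a probability vector over a finite set of candidate message values; the function name `gaussian_message` is hypothetical.

```python
import numpy as np

def gaussian_message(values, posterior):
    """Summarize a discrete posterior over candidate values by its mean and variance."""
    values = np.asarray(values, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    mean = np.sum(posterior * values)
    variance = np.sum(posterior * (values - mean) ** 2)
    return mean, variance

# Example: three candidate messages with a symmetric posterior.
mean, variance = gaussian_message([0.0, 1.0, 2.0], [0.25, 0.5, 0.25])
print(mean, variance)
```

The mean serves as the point estimate, while the variance tells the exploration part how much to trust the received message.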

First Numerical Example: Semantic Receiver
In order to show the benefits of an exploration-oriented communication design, we present a first numerical toy simulation using the application example of the distributed full waveform inversion from [22] mentioned in Section 3. There, we make a first change to the error-free digital communication system and introduce a semantics-aware receiver to improve communication efficiency, as proposed in Section 5.2.2.
Note that we will not investigate the exact effect of semantic communication on overall exploration performance, nor the tight integration of communications into exploration with, e.g., an interface as outlined in Section 5.2.3, since both are beyond the scope of this first investigation. Still, we can gain insight into why such a design may improve communication efficiency w.r.t. the exploration task.

Exploration Scenario and Data
As simulation data, we assume local models $\kappa_l$ and gradients $\partial J_l(\kappa_l)/\partial \kappa_l$ of $L = 20$ agents after the second iteration of the distributed full waveform inversion from Section 3 as communicated messages, which are used to compute the global model $\kappa$, i.e., to execute the exploration task [22]. Note that the global model $\kappa$ is equivalent to the exploration RV $s$. Both local models and gradients are summarized in the semantic RV $m \in \mathbb{R}$, being continuous-valued but processed as floating point numbers with $N_b$ bits $b \in \{0, 1\}^{N_b \times 1}$ on digital hardware. The size of the dataset provided from [22] is $N_{\mathrm{train}} = 1{,}147{,}000$. In every iteration of the distributed seismic exploration algorithm, these discrete floating point numbers $b$ are exchanged between the agents and need to be communicated.
Relying on modern digital error-free protocols, each bit would be considered equally important and equally likely. However, with floating point representation and data distribution $p(m)$ or, equivalently, $p(b)$, respectively, this is in fact not the case: The bits are mapped via a weighted sum within the function $m = h(b)$ into the real-valued domain, i.e., the semantic space of the exploration task. To explain what we mean by semantic space, let us consider the example of language. Words $m$ like "neat" and "fine" have similar meanings and are thus close in the semantic space. If we confuse both words, the change in meaning is minor. In our example, this means that there is room for non-perfect transmission of bit sequences as long as their meaning remains close, e.g., $m = 1.53$ and $\hat{m} = 1.54$. With the semantic space in the real-valued domain and without any further detailed knowledge about the exploration task, i.e., utilizing low-level semantics [49], it is reasonable to assume that our receiver estimates $\hat{m}$ should be close to the true transmit value $m$ in the Mean Square Error (MSE) sense. By doing so, we expect that the exploration task should still be completed with high accuracy while increasing communications efficiency. As an important remark, we note that the model of this scenario resembles that proposed in [49], as it distinguishes between communications source $b$ and semantic source $m$ and assumes a deterministic bijective semantic channel $p(b|m)$. Further, we both optimize and measure semantic performance with the MSE metric, in contrast to [11].
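That bits are not equally important in the semantic space can be seen by flipping single bits of a floating point number. The sketch below assumes the IEEE 754 half-precision format (`float16`, 16 bits) as the bit representation, which is an assumption made for illustration; the helper `flip_bit` is hypothetical.

```python
import numpy as np

def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of the float16 encoding of value."""
    raw = np.array([value], dtype=np.float16).view(np.uint16)
    flipped = raw ^ np.uint16(1 << bit)
    return float(flipped.view(np.float16)[0])

m = float(np.float16(1.53))            # nearest value representable in float16

err_lsb = abs(flip_bit(m, 0) - m)      # error in the least significant mantissa bit
err_sign = abs(flip_bit(m, 15) - m)    # error in the sign bit

print("m                         :", m)
print("error after LSB flip      :", err_lsb)
print("error after sign-bit flip :", err_sign)
```

A single bit error thus changes the meaning either negligibly or drastically, depending on its position, which a receiver that treats all bits alike cannot take into account.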

Transmission Model
Since we want to focus on the key aspect of introducing semantics into the communications design at the receiver side, we use a simple abstraction of the digital transmission system in this first investigation, neglecting details, i.e., modern communication protocols with, e.g., strong LDPC or Polar codes: We assume an uncoded Binary Phase Shift Keying (BPSK) transmission of the bits $b$ of each floating point number over an AWGN channel $p(y|x)$ with noise variance $\sigma_n^2$ to a receiver.
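The transmission model can be simulated in a few lines. The following sketch makes several assumptions that are not fixed by the text: the bit mapping 0 to +1 and 1 to -1, real-valued noise of variance $\sigma_n^2$, equiprobable random bits, and an SNR defined as $1/\sigma_n^2$. It compares the simulated bit error rate of a hard-decision detector with the textbook expression $Q(1/\sigma_n)$.

```python
import numpy as np
from math import erfc, sqrt

def bpsk_awgn(bits, noise_var, rng):
    """Map bits {0, 1} to symbols {+1, -1} and add real Gaussian noise of variance noise_var."""
    x = 1.0 - 2.0 * bits
    return x + sqrt(noise_var) * rng.standard_normal(bits.shape)

rng = np.random.default_rng(2)
snr_db = 6.0
noise_var = 10.0 ** (-snr_db / 10.0)             # SNR = 1 / sigma_n^2 (assumed definition)

bits = rng.integers(0, 2, size=(100_000, 16))    # 16 bits per transmitted number
y = bpsk_awgn(bits, noise_var, rng)

bits_hat = (y < 0).astype(int)                   # hard decision for equiprobable bits
ber = np.mean(bits_hat != bits)
ber_theory = 0.5 * erfc(1.0 / sqrt(2.0 * noise_var))   # Q(1 / sigma_n)

print("simulated BER  :", ber)
print("theoretical BER:", ber_theory)
```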

Methodology
With this given transmitter, we focus now on the design of the receiver, as explained in Section 5.2. Based on the statistics/prior $p(m)$ or, equivalently, $p(b)$ of the simulation data, we are able to compute the ideal posterior $p(m|y) = p(y|m) \cdot p(m)/p(y)$, with $p(y)$ obtained by marginalization.
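For a small number of bits, the ideal posterior can be evaluated by brute force over all bit patterns. The sketch below is a toy version under stated assumptions: only 4 bits, a hypothetical fixed-point weighted-sum mapping $h(b)$ instead of a floating point format, a uniform prior instead of the data statistics, and BPSK over real AWGN. It compares the posterior mean with MAP detection in terms of MSE.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
num_bits, noise_var, num_samples = 4, 0.5, 20_000

# All 2^num_bits bit patterns and a toy weighted-sum mapping m = h(b).
patterns = np.array(list(itertools.product([0, 1], repeat=num_bits)), dtype=float)
weights = 2.0 ** -np.arange(num_bits)              # hypothetical fixed-point weights
values = patterns @ weights                        # candidate messages m
prior = np.full(len(values), 1.0 / len(values))    # assumed uniform prior p(m)
symbols = 1.0 - 2.0 * patterns                     # BPSK symbols for every pattern

def posterior(y):
    """Return p(m | y), proportional to p(y | m) p(m), one row per received vector."""
    dist = ((y[:, None, :] - symbols[None, :, :]) ** 2).sum(axis=-1)
    log_post = -dist / (2.0 * noise_var) + np.log(prior)
    log_post -= log_post.max(axis=1, keepdims=True)    # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)      # normalization = division by p(y)

idx = rng.choice(len(values), size=num_samples, p=prior)
y = symbols[idx] + np.sqrt(noise_var) * rng.standard_normal((num_samples, num_bits))
post = posterior(y)

m_mean = post @ values                     # posterior mean estimate
m_map = values[post.argmax(axis=1)]        # MAP detection
mse_mean = np.mean((m_mean - values[idx]) ** 2)
mse_map = np.mean((m_map - values[idx]) ** 2)

print("MSE posterior mean:", mse_mean)
print("MSE MAP detector  :", mse_map)
```

The cost of this exact computation grows with $2^{N_b}$ candidates per received vector, which illustrates why the resolution has to be limited for tractability.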
From our simulations, we note that for computational tractability, the resolution needs to be lower than $N_b = 16$ bits.
To reduce computational complexity, we can introduce an approximate posterior $q_\phi(m|y)$. Assuming a Gaussian approximate posterior $q_\phi(m|y)$ from the exponential family, minimization of the cross entropy (12), i.e., maximization of the mutual information (6), reduces to minimization of the MSE loss w.r.t. receiver parameters $\phi$ [59]. Thus, from the latter general perspective, the choice of the MSE loss for both semantic receiver optimization and semantic performance metric is well motivated.
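Why the Gaussian assumption turns the cross entropy into an MSE loss can be seen in a short derivation. The sketch assumes a scalar Gaussian approximate posterior with mean $\mu_\phi(y)$ (e.g., the output of a DNN) and a fixed variance $\sigma_q^2$ that is not learned; both symbols are introduced here for illustration only.

```latex
\begin{align*}
q_\phi(m \mid y) &= \mathcal{N}\!\left(m;\, \mu_\phi(y),\, \sigma_q^2\right) \\
-\ln q_\phi(m \mid y) &= \frac{\left(m - \mu_\phi(y)\right)^2}{2 \sigma_q^2}
  + \frac{1}{2} \ln\!\left(2 \pi \sigma_q^2\right) \\
\mathrm{E}_{m,y \sim p(m,y)}\!\left[-\ln q_\phi(m \mid y)\right]
  &= \frac{1}{2 \sigma_q^2}\,
     \mathrm{E}_{m,y \sim p(m,y)}\!\left[\left(m - \mu_\phi(y)\right)^2\right]
   + \mathrm{const.}
\end{align*}
```

Since neither the scaling factor nor the constant depends on $\phi$, minimizing the cross entropy w.r.t. $\phi$ is the same as minimizing the MSE between $m$ and the predicted mean $\mu_\phi(y)$.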
We examine here the following approaches for the final decision/estimation of $\hat{m}$ (see Figure 7) based on the computed posterior $p(m|y)$ or $q_\phi(m|y)$:

• Mean estimator
• MAP detector
• Single-bit detector: [...] and detected separately. We assume that the prior probability $p(b_i)$ of every single bit $b_i$ is known. Subsequently, we estimate $\hat{m} = h(\hat{b})$.
• Analog transmission: Analog transmission of $m$ over the AWGN channel is used as a reference curve. We assume $N_b$ power-normalized channel uses with subsequent averaging for a fair comparison.
• DNN estimator: For approximate estimation, we set the mean of a Gaussian approximate posterior $q_\phi(m|y)$ to a small DNN with input $y \in \mathbb{R}^{N_b \times 1}$, 2 dense intermediate ReLU layers of width $2 \cdot N_b$, and a linear output layer for estimation of $m$. We take the mean, i.e., the output of the DNN, as the estimate $\hat{m}$.
We trained the DNN with MSE loss for $N_e = 44$ epochs with the stochastic gradient descent variant Adam and a batch size of $N_b = 500$. To optimize the receiver over a wider SNR range, we choose the SNR to be uniformly distributed within $\mathrm{SNR}_{\mathrm{train}} \in [6, 16]$ dB where $\mathrm{SNR} = 1/\sigma_n^2$ with noise variance $\sigma_n^2$.
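The SNR randomization during training can be sketched as follows. Several details are assumptions rather than facts from the text: the SNR is drawn uniformly on the dB scale (not on the linear scale), one SNR value is drawn per training example (not per batch), and the bit mapping is 0 to +1 and 1 to -1. The function name `sample_training_batch` is hypothetical.

```python
import numpy as np

def sample_training_batch(bits, rng, snr_db_range=(6.0, 16.0)):
    """Draw one SNR per example uniformly in dB and return noisy BPSK observations."""
    snr_db = rng.uniform(*snr_db_range, size=(bits.shape[0], 1))
    noise_var = 10.0 ** (-snr_db / 10.0)          # SNR = 1 / sigma_n^2
    x = 1.0 - 2.0 * bits                          # BPSK mapping (assumed)
    y = x + np.sqrt(noise_var) * rng.standard_normal(bits.shape)
    return y, noise_var

rng = np.random.default_rng(4)
bits = rng.integers(0, 2, size=(500, 16))         # one batch of 500 examples, 16 bits each
y, noise_var = sample_training_batch(bits, rng)

print("receiver input shape:", y.shape)
print("noise variance range:", noise_var.min(), noise_var.max())
```

Each batch then contains observations from the whole SNR range, so that a single set of receiver parameters is fitted to all noise levels at once.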

Results
In Figure 8, we show the Normalized MSE (NMSE) performance of the considered (sub)optimal receiver approaches as a function of SNR for a floating point resolution of $N_b = 16$ bits.

Figure 8. Normalized MSE (NMSE) as a function of Signal-to-Noise Ratio (SNR) for different non-semantic and semantic exploration data receiver approaches: mean estimator, MAP detector, single-bit detector, DNN estimator ($\mathrm{SNR}_{\mathrm{train}} \in [6, 16]$ dB), and analog transmission. We assume uncoded digital BPSK transmission of gradients and models $m$ of the distributed full seismic waveform inversion [22] over an AWGN channel.
The classic single-bit approach is clearly inferior in the considered SNR range. Even the approximate DNN estimator clearly outperforms it. Notably, we observe only a 2 dB SNR shift of the DNN estimator compared to the mean estimate, i.e., the optimal approach, but with much lower computational complexity.

Discussion
By just adapting the receiver to account for semantics in this first investigation of a simple digital transmission scheme, we achieved a notable semantic performance gain. Further, we are able to achieve near-optimal semantic performance with a DNN of low complexity and hence with small training and inference time, possibly allowing for a real-time implementation. Thus, we conclude that a semantic communication design is profitable and can be realized with manageable effort. We note that it is still an open question which NMSE is required to achieve satisfactory performance on the task of distributed full waveform inversion. But provided that the given NMSE values of, e.g., $10^{-2}$, are accurate enough, we could avoid the high latency, complexity, and energy consumption of modern communication protocols and increase the data rate even with this simplified transmission scheme and by just adapting the receiver.

Conclusions
In this work, we presented an approach to make both exploration and communications mutually aware. In particular, we proposed to use probabilistic machine learning models to enable a unified description of both exploration and communications in one framework and made a first attempt towards integrating both areas using factor graphs. By using a factor graph description, we can integrate communications as a factor node between two communicating agents and improve communication efficiency in terms of latency, bandwidth, data rate, energy, and complexity. A first numerical example of integrating exploration data into semantic communications showed promising semantic performance gains.
We note that exploration is just an example of the application of the proposed framework. It can naturally be applied to other domains with communication links as well, e.g., in control engineering problems, etc. This philosophy of designing communications explicitly for a particular application lies at the heart of recent research interest in semantic communication. The introduction of the semantic aspect holds the promise of data rate increase in 6G networks. Further research is required to develop first prototype algorithms that lay the foundation for a "tight" integration of exploration and communications. However, we anticipate that this work serves as a reason to stimulate the required research to close the gap between both realms.

Data Availability Statement:
The data is not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: