Article

Feature-Driven Joint Source–Channel Coding for Robust 3D Image Transmission

1
School of Customs and Public Economics, Shanghai Customs College, Shanghai 201204, China
2
College of Electronics and Information Engineering, Tongji University, Shanghai 200092, China
3
School of Mathematics & Statistics, University of Glasgow, University Gardens, Glasgow G12 8QQ, UK
4
School of Computer Science and Technology, Tongji University, Shanghai 200092, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(19), 3907; https://doi.org/10.3390/electronics14193907
Submission received: 19 August 2025 / Revised: 21 September 2025 / Accepted: 28 September 2025 / Published: 30 September 2025
(This article belongs to the Special Issue AI-Empowered Communications: Towards a Wireless Metaverse)

Abstract

Emerging applications like augmented reality (AR) demand efficient wireless transmission of high-resolution three-dimensional (3D) images, yet conventional systems struggle with the high data volume and vulnerability to noise. This paper proposes a novel feature-driven framework that integrates semantic source coding with deep learning-based Joint Source–Channel Coding (JSCC) for robust and efficient transmission. Instead of processing dense meshes, the method first extracts a compact set of geometric features—specifically, the ridge and valley curves that define the object’s fundamental structure. This feature representation, built from the extracted anatomical curves, is then processed by an end-to-end trained JSCC encoder, which maps the semantic information directly to channel symbols. This synergistic approach drastically reduces bandwidth requirements while leveraging the inherent resilience of JSCC for graceful degradation in noisy channels. The framework demonstrates superior reconstruction fidelity and robustness compared to traditional schemes, especially in low signal-to-noise ratio (SNR) regimes, enabling practical and efficient 3D semantic communications.

1. Introduction

The rapidly increasing demand for wireless transmission of high-resolution three-dimensional (3D) image and video signals poses a significant challenge to current communication systems [1]. Emerging applications such as the metaverse, augmented/virtual reality (AR/VR), remote surgery, and vehicular-to-everything (V2X) require robust transmission and realistic reconstruction of 3D data in fast-varying wireless environments with limited bandwidth resources [2,3,4,5]. State-of-the-art (SOTA) digital communication systems are designed based on Shannon’s source–channel separation theorem, which is optimal only under asymptotic and idealized assumptions [6]. In practical scenarios with finite block-lengths and non-ergodic channel statistics, these separation-based systems are suboptimal [7,8] and suffer from the “cliff effect”—a sudden drop in quality when the channel signal-to-noise ratio (SNR) falls below a specific threshold [9].
Joint source–channel coding (JSCC) has long been studied as a powerful alternative to improve end-to-end performance in practical systems [10,11]. Rather than separating source and channel coding, JSCC schemes create a direct mapping from the source data to the channel input symbols. More recently, in the context of semantic communications [12], deep learning-based JSCC (DeepJSCC) methods have shown remarkable results [13,14,15]. By learning the end-to-end mapping from data, these methods exhibit strong resilience to channel variations and provide graceful performance degradation, making them ideal for unreliable wireless channels [16,17,18,19,20].
While JSCC provides a powerful framework for how to transmit data robustly, a critical and complementary question is what data to transmit to maximize efficiency, especially for high-dimensional sources like 3D images. Transmitting raw 3D point clouds or dense meshes is often prohibitively expensive in terms of bandwidth. A more semantic approach is to extract and transmit only the most salient geometric features that define the object’s shape. In nature, shapes are often characterized by common features like curves that define bending and direction [21]. Well-chosen 3D curves can provide a much richer and more compact geometric description than a sparse set of 3D landmarks [22].
Significant research has been dedicated to estimating such characteristic curves from 3D data. These methods can be grouped into two categories: estimation from point clouds and from triangulated meshes. For point clouds, techniques like Robust Moving Least Squares (RMLS) have been used to identify points of high curvature, which are then connected to form feature curves [21,23]. Ridges and valleys detected on meshes are powerful features for describing shape [24,25,26]. Others have used principal curvatures to select and smooth feature lines [27]. However, a common challenge with these methods is ensuring the connectivity of the estimated curves, which is crucial for reconstructing a coherent shape [25,27,28].
More recently, machine learning and deep learning have provided effective methods for feature extraction. Deep Neural Networks (DNNs) have been used to extract object contours, and 3D leaf reconstruction methods have integrated 2D image segmentation with 3D curve techniques [29]. These methods, while powerful, often require massive training datasets to achieve high accuracy and can be computationally intensive [30]. The field of mesh segmentation also offers relevant techniques for partitioning a mesh into meaningful parts, although often without focusing on the smoothness of the partition boundaries [31,32,33,34,35]. Specialized tools like FiberMesh have also been developed to manipulate mesh structures along user-defined paths. The work in [36] proposed a DNN-based indoor localization framework that leverages generative adversarial networks (GANs) and semi-supervised learning to mitigate the issue of limited datasets, demonstrating that deep neural architectures can effectively extract features and improve robustness even in data-scarce environments. This line of research highlights that data-driven techniques can complement optimization-based frameworks in wireless communications.
In this paper, we propose a novel framework for efficient and robust 3D image transmission that synergistically combines semantic 3D feature extraction with deep JSCC. Our approach deviates from transmitting the entire, dense 3D object. Instead, we first employ a lightweight algorithm to identify and extract salient geometric features—specifically, the ridge and valley curves that constitute the structural skeleton of the object. A formal mathematical analysis of the algorithm’s performance is challenging due to the complex and stochastic nature of the target images, which often include random noise and discontinuities along the curves. Therefore, this study focuses on empirically validating the method’s robustness and flexibility through simulations on these challenging cases, for which theoretical models are often insufficient.
This compact feature representation, which captures the essential geometric information, is then encoded and transmitted over the wireless channel using a purpose-built DeepJSCC scheme. By transmitting only the critical semantic features, our method dramatically reduces the required bandwidth while leveraging the robustness of JSCC to ensure high-quality reconstruction even under poor channel conditions.
To be clear, our proposed framework comprises two distinct and sequential stages: feature extraction and transmission. The scope of the zero-shot estimation is confined to the first stage, where a lightweight computational geometry algorithm acts as a semantic source encoder. It processes the dense 3D surface to extract a compact set of feature curves, operating without any prior training on specific object classes. The scope of the DeepJSCC scheme covers the second stage, where it takes these extracted features as input and learns a robust mapping to channel symbols for transmission. In summary, the zero-shot method determines what critical information to send, while the trained DeepJSCC module determines how to send it resiliently.
The contributions of this paper are outlined as follows.
  • Introduction of a novel zero-shot 3D estimation method that works directly on 3D surfaces with a limited number of landmarks.
  • Demonstration of the method’s flexibility on complex surfaces with noise nearby.
  • Higher quality for the same cost. With identical bandwidth and power, the hybrid method produces a better-quality reconstructed 3D image.
  • Graceful adaptation to channel quality. As the channel connection improves (higher SNR), the hybrid method’s performance smoothly increases, delivering an even better image. In contrast, the standard digital method’s quality is “locked in” by its fixed compression rate and does not benefit from a better connection.
The rest of this paper is organized as follows. Section 2 provides the background fundamentals used in this research. Section 3 presents a novel approach for estimating curves on folded shapes where very few landmarks are available. Section 4 presents the hybrid joint source–channel coding scheme. Finally, Section 5 concludes the paper and outlines future research.

2. Data and Basic Concepts

This research utilizes 3D images of real-world objects captured using the Artec® camera system (Artec 3D, Luxembourg City, Luxembourg), such as the one shown in Figure 1a. A 3D image is constructed from two components: a point cloud consisting of ordered 3D points captured by stereo cameras, as shown in Figure 1b, and a Delaunay triangulation whose consistently ordered triangles use these 3D points as vertices, as shown in Figure 1c. Thus, a 3D shape (model) produced by the camera system is characterized as a triangulated surface in this paper [37].
Traditional local feature descriptors that define correspondences across individuals include landmarks and ridge or valley curves. Landmarks signify clearly defined corresponding locations on multiple objects [39]. Ridge or valley curves will be referred to as “curves” throughout this paper. A 3D curve is effectively described by an ordered collection of consecutive 3D coordinates situated on the object’s surface. The arc length between any two points on a 3D curve is estimated by summing the Euclidean distances between consecutive intermediate points. Whether they are directly measured triangle vertices or calculated intermediate locations on the facets, both landmarks and the points constituting a curve are treated as residing on the surface itself.
Ridge and valley curves are characterized by their principal curvatures [26]. (Note that our definition differs from the reference, as our surface normal vector points outwards, whereas theirs points inwards.) The maximal and minimal principal curvatures, denoted γ_max and γ_min, respectively (with γ_max ≥ γ_min), measure the degree of bending along and across these curves. The principal curvatures at a point are estimated as follows. A local coordinate system is constructed from the normal vector, a nearby edge, and their cross product. Every vertex in the local neighborhood is projected into this system, and the projected coordinates are fitted by a cubic polynomial in the three dimensions. The eigenvalues of the resulting Weingarten matrix are used as the principal curvatures, and its eigenvectors as the principal directions.
The first derivatives of the principal curvatures along these directions are defined as γ′_max = ∂γ_max/∂δ_max and γ′_min = ∂γ_min/∂δ_min, where δ_max and δ_min are the corresponding principal directions.
A curve is identified as a ridge if it satisfies:
γ′_min = 0,  ∂γ′_min/∂δ_min > 0,  γ_min < −|γ_max|
Conversely, a curve is identified as a valley if it meets the conditions:
γ′_max = 0,  ∂γ′_max/∂δ_max < 0,  γ_max > |γ_min|
Physically, the negative curvature γ_min describes the bending across a ridge, and the positive curvature γ_max describes the bending across a valley. A larger absolute value of a principal curvature indicates a more pronounced convex or concave geometry. For convenience, in this paper we also use the notation γ_1 = γ_max and γ_2 = γ_min for the first and second principal curvatures, where γ_1 ≥ γ_2.
The Shape Index (SI) is a dimensionless measure that characterizes the local shape of a surface at any given point p [40]. Unlike the principal curvatures, which measure the magnitude of bending, the SI provides a qualitative description of the local surface type. The index is defined by the formula:
SI = (2/π) · arctan( (γ_max + γ_min) / (γ_max − γ_min) ).
This formula maps the relationship between the maximal (γ_max) and minimal (γ_min) principal curvatures onto the fixed interval [−1, 1]. This mapping facilitates the identification of local geometries; for example, the curve formed by closed lips is a convex feature, which corresponds to an SI value near 1. Other fundamental shapes like cups (concave) and saddles are represented by SI values of −1 and 0, respectively.
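As a minimal sketch, the SI computation can be written as follows; the example curvature values are illustrative, not taken from the paper:

```python
import numpy as np

def shape_index(g_max, g_min):
    """Shape Index from the principal curvatures (g_max >= g_min).

    Division by zero at an umbilic point (g_max == g_min != 0) resolves
    to SI = +/-1 (spherical cap/cup); a flat point (0/0) yields NaN.
    """
    g_max = np.asarray(g_max, dtype=float)
    g_min = np.asarray(g_min, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (2.0 / np.pi) * np.arctan((g_max + g_min) / (g_max - g_min))

# Signs match the text: convex features give positive SI, cups negative,
# saddles zero.
print(shape_index(2.0, 1.0))    # ~0.8: convex
print(shape_index(1.0, -1.0))   # 0: saddle
print(shape_index(-1.0, -2.0))  # ~-0.8: concave (cup-like)
```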

3. Curve Estimation Methodology

As established in Section 2, a 3D curve can be represented by a sequence of discrete points. Since a 3D camera system provides an estimated manifold of the true object, any curve identified on this image is likewise an approximation of the true curve on the object’s surface. A curve can be parameterized by its arc length s, such that a point on the curve is given by x(s) = (x_1(s), x_2(s), x_3(s)) [41]. A key statistical principle, highlighted by Vittert, is that the estimated curve converges to the true curve as the error in the image manifold decreases.
Identifying these curves often relies on landmarks to guide their composition. However, in practice, only a sparse set of landmarks is typically available, as manually adding more is a labor-intensive and time-consuming process. This section, therefore, introduces a novel computational geometry method designed to effectively estimate 3D curves even from limited guiding data.

3.1. Methodology

At any point on a 3D surface, two orthogonal principal directions exist which correspond to the maximum and minimum surface bending (curvature). As detailed in Section 2, for a ridge, the first principal curvature, γ_1 = γ_max, aligns with the tangent along the ridge, while the second, γ_2 = γ_min, is orthogonal to it. Surface creases are formally defined as “the loci of points where the largest in absolute value principal curvature takes a positive maximum (negative minimum) along its corresponding curvature line” [26].
Drawing inspiration from this definition, our novel method identifies the loci of points forming a curve. We use the principal curvature whose direction is transverse to the curve to locate a point where this curvature reaches a local maximum in absolute value. The other principal curvature is then utilized to step to an adjacent location, positioning the search near the next point on the curve.

3.1.1. Algorithm in Detail

To illustrate the algorithm, we consider the identification of a ring-like ridge on the shape shown in Figure 2 (left). The surface is colored according to the second principal curvature, γ_2, where yellow indicates a stronger bend across the ridge. The guiding principle for identifying a ridge is to trace a path that follows the ridge’s direction and is composed of points with the maximum bending size across the ridge (for a ridge, the second principal curvature γ_2 is negative, so we define the bending size as its absolute value, γ = |γ_2|). These identified points form an initial approximation of the curve, which is subsequently smoothed using local neighbors.
As shown by the black point in the left panel of Figure 2, the process is initiated from a seed. The most efficient way to place a seed is manually, using the IDAV Landmark Editor software v3.6 (this software is no longer available for download; Stratovan Checkpoint is an alternative), which visualizes a 3D image and returns the coordinates of a clicked point. Alternatively, especially when dealing with a large number of images, the seed can be found automatically based on its geometric characteristics, such as maximum or minimum curvature. The iterative procedure is as follows:
  • Beginning at a starting point (initially the seed), take a single step along the first principal direction of the surface.
  • From the endpoint of this step, locate the nearest vertex on the surface. At this vertex, generate a two-sided search path along its second principal direction.
  • Along this search path, identify the point with the maximum curvature value γ = |γ_2|. This point becomes a new point on the curve and serves as the starting point for the next iteration.
  • Repeat steps 1–3 until a termination condition is met. For a ring-like curve, this could be returning sufficiently close to the original landmark; alternatively, it could be reaching a predefined total arc length.
Figure 2 provides a visualization of the algorithm [37]. Let the initial landmark or seed be denoted by L_1. The first step, termed a movement, involves selecting a candidate point C_1. This is achieved by advancing a set distance from L_1 along its first principal direction, as illustrated by the blue line. The path is sampled by a “plane cut”, which generates the set of blue points [41]. The step length can be two or three times the length of a triangle edge. Since “plane cut” methods can only generate paths that cross at least one triangle, the step length cannot be smaller than any triangle edge on the image.
Next, to locate the point of maximum local bending near C_1, a second procedure called a comparison is performed. A two-sided “plane cut” (the red dots) is generated along the second principal direction of C_1, indicated by the red line. The point with the highest bending size, γ, along this red path is identified. The nearest vertex on the mesh to this point is then selected as the next point on the curve, L_2. This new point, L_2, subsequently serves as the starting point for finding L_3.
An important adjustment is made to minimize estimation error. As noted in Section 2, only the vertices of the mesh are direct observations from the camera system. Since the “plane cut” procedure can generate points that are not vertices, we enforce a rule to maintain fidelity to the original data. Instead of directly using the point with the maximum γ from the red “plane cut”, we always select its closest vertex on the triangulation. As shown in the central panel of Figure 2, this ensures that all identified points on the curve (L_i, i = 1, 2, 3) are vertices of the original mesh.
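The movement/comparison iteration can be illustrated with a simplified, self-contained surrogate. Instead of mesh “plane cuts” and estimated principal directions, the sketch below traces the ring-like ridge of an analytic height field (cf. Section 3.2), stepping along the tangential direction and searching transversally for the point of maximum height, a stand-in for the maximum bending size γ. The step and search lengths, and the analytic surface itself, are illustrative assumptions, not the paper’s mesh implementation.

```python
import numpy as np

# Height field with a ring-like ridge of radius 1 (cf. Section 3.2).
sigma = 0.3
def z(x, y):
    dc = np.hypot(x, y)
    return (2 * np.pi * sigma**2) ** -0.5 * np.exp(-(dc - 1) ** 2 / (2 * sigma**2))

def trace_ridge(seed, step=0.12, half_width=0.4, n_search=41, n_pts=60):
    """Simplified movement/comparison iteration of Section 3.1.1.

    'Movement': step along the local ridge tangent (the angular direction
    of the ring, standing in for the first principal direction).
    'Comparison': two-sided search along the transverse (radial)
    direction, standing in for the second principal direction, for the
    point of maximum bending, approximated here by maximum height.
    """
    pts = [np.asarray(seed, dtype=float)]
    for _ in range(n_pts - 1):
        p = pts[-1]
        # movement: unit tangent to the circle through p (anticlockwise)
        r = np.hypot(*p)
        tangent = np.array([-p[1], p[0]]) / r
        c = p + step * tangent                    # candidate point C_i
        # comparison: sample a two-sided transverse search path
        radial = c / np.hypot(*c)
        ts = np.linspace(-half_width, half_width, n_search)
        cand = c[None, :] + ts[:, None] * radial[None, :]
        best = cand[np.argmax(z(cand[:, 0], cand[:, 1]))]
        pts.append(best)
    return np.array(pts)

curve = trace_ridge(seed=(1.05, 0.0))
radii = np.hypot(curve[:, 0], curve[:, 1])  # seed at 1.05; traced points settle near 1
```

On the real mesh, each `best` point would additionally be snapped to its nearest triangle vertex, as described above.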

3.1.2. Direction Control

To prevent the identified curve from deviating erratically, a direction–control mechanism is implemented in each iteration. This involves creating a pair of planar “boundaries” by rotating the vector formed by the last two identified points around the normal vector of the current point. These boundaries constrain the search for the next point. For instance, in the second iteration shown in Figure 2, the vector L_1L_2 is rotated around the normal at L_2 to form two new vectors (the black lines). For this anticlockwise ring identification, rotations of π/2 anticlockwise and π/4 clockwise proved most effective. The two planes spanned by these rotated vectors and the normal vector at L_2 serve as the “boundaries”. Any points on the red “plane cut” that fall outside these boundaries are discarded, ensuring the search stays aligned with the direction of L_1L_2.
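The boundary construction can be sketched with Rodrigues' rotation formula. The +π/2 / −π/4 angles follow the ring example in the text; the wedge test that discards out-of-bounds candidates is only summarized in a comment.

```python
import numpy as np

def rotate_about_axis(v, axis, theta):
    """Rodrigues' rotation of vector v about a unit axis by angle theta."""
    axis = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    v = np.asarray(v, dtype=float)
    return (v * np.cos(theta)
            + np.cross(axis, v) * np.sin(theta)
            + axis * np.dot(axis, v) * (1 - np.cos(theta)))

def direction_boundaries(p_prev, p_cur, normal):
    """Boundary vectors of Section 3.1.2: rotate the last step direction
    L_{i-1}L_i about the normal at L_i by +pi/2 (anticlockwise) and
    -pi/4 (clockwise), following the ring example in the text."""
    d = np.asarray(p_cur, dtype=float) - np.asarray(p_prev, dtype=float)
    return (rotate_about_axis(d, normal, np.pi / 2),
            rotate_about_axis(d, normal, -np.pi / 4))

# Candidate points on the red "plane cut" lying outside the wedge between
# the two boundary planes (spanned by b1/b2 and the normal) are discarded.
b1, b2 = direction_boundaries([0, 0, 0], [1, 0, 0], [0, 0, 1])
```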
The result of this process, an “initial” curve composed of discrete points in anticlockwise order, is shown in Figure 3a. For this example, the “movement” distance was set to 10 units per iteration, and the “comparison” search extended five units on each side of the candidate point (C_i). This initial curve serves as input for a subsequent smoothing procedure.
It is important to note that this method is not limited to one shape. Figure 3b demonstrates its application in identifying the edge of a socket-like feature. Although the final curve is a closed circuit, it was generated by identifying each half independently, starting and ending at two fixed anatomical landmarks that were required to be part of the final curve.

3.1.3. Smoothing Procedure

The smoothing procedure is designed to generate a new curve with reduced variability from the initial curve. This is accomplished by calculating a new, adjusted position for each point on the initial curve.
The core principle, illustrated in Figure 3c [37], is to reposition each point L_i based on its neighbors, L_{i−1} and L_{i+1}. First, a “plane cut” representing the shortest path between L_{i−1} and L_{i+1} is created (shown as red dots). The midpoint of this path, M_i, can be considered the average position of the two neighbors. To incorporate the original position of L_i, a second, higher-density “plane cut” (the purple dots) is then generated along the shortest path from this midpoint M_i to the original point L_i. The final smoothed position is the midpoint of this second (purple) path, which then replaces the original L_i.
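A Euclidean surrogate of this smoothing step, using straight-chord midpoints in place of the geodesic “plane cut” paths (which require the mesh), reduces to a simple weighted average; the noisy-ring example below is illustrative.

```python
import numpy as np

def smooth_closed_curve(pts, n_iter=1):
    """Euclidean surrogate of the smoothing in Section 3.1.3.

    On the mesh, M_i is the midpoint of the shortest "plane cut" path
    between L_{i-1} and L_{i+1}, and the new L_i is the midpoint of the
    path from M_i to L_i.  With straight chords instead of surface
    paths this collapses to: new L_i = (L_{i-1} + 2 L_i + L_{i+1}) / 4.
    """
    pts = np.asarray(pts, dtype=float)
    for _ in range(n_iter):
        m = 0.5 * (np.roll(pts, 1, axis=0) + np.roll(pts, -1, axis=0))
        pts = 0.5 * (m + pts)
    return pts

# A noisy ring flattens toward the true circle after a few passes.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
rng = np.random.default_rng(0)
noisy = np.c_[np.cos(t), np.sin(t)] * (1 + 0.05 * rng.standard_normal((100, 1)))
smoothed = smooth_closed_curve(noisy, n_iter=3)
```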
Figure 4a compares the curve before and after smoothing. The smoothed curve (red dots in Figure 4c) is visibly smoother than the initial curve (black dots in Figure 4b), and the final red curve captures the shape of the ring-like ridge effectively. However, this procedure is an approximation whose accuracy is contingent upon the point density of the curve prior to smoothing. If this density is insufficient, a further examination of curvatures across ridges and valleys may be required.

3.2. Simulation Study

A simulation study was conducted to evaluate the method presented in Section 3.1. The evaluation utilizes two primary shapes, a ring-like curve and a cosine curve, depicted in Figure 5a and Figure 5b, respectively. For each shape, the analytically defined true curve is compared against the curve estimated by our algorithm. Furthermore, to assess the method’s robustness, varying levels of random noise were added to the surfaces to analyze the impact on performance.

3.2.1. Create the Surface

The surfaces for the simulation were constructed on a 100 × 100 grid in the x-y plane, spanning a width of two arbitrary units. To form a ring-like ridge, the height (z-coordinate) at each grid point was determined by the following Gaussian function:
z = (2πσ²)^(−1/2) · exp( −(d_c − 1)² / (2σ²) )
where d_c represents the Euclidean distance from any point (x, y) to the grid’s center. This formulation ensures that the true ridge is a circle of radius 1 centered on the grid, with a constant height of z = (2πσ²)^(−1/2). A visualization of this shape, along with the true curve marked by black dots, is provided in Figure 5a.
A Gaussian random field is a mathematical tool used to generate surfaces with natural-looking, random textures, like bumpy terrain or noisy data. A weighted Gaussian random field adds a layer of control, allowing the amount of randomness to vary across the surface so that some areas are rougher while others stay smooth. To simulate shapes similar to those in Figure 3, noise generated from a weighted Gaussian random field was added to the surface. For the ring-like ridge, the weight function is defined as w = 1/exp{D} = e^(−D), where D = (d_c − 1)². This distance-based weighting scheme is based on the Gaussian radial basis function (RBF), a standard method for creating a smooth, localized field of influence [42]. It is designed to preserve the main ridge structure, regardless of the noise level in the surrounding area. Specifically, points located on the ridge have d_c = 1, which results in D = 0 and a maximum weight of w = 1. Points close to the ridge have small values of D and therefore also have high weights. As a result, even when significant noise is applied to the entire surface, the ridge itself remains well-defined, and the points near it are not greatly displaced. An example of the noisy ring-like shape, with the true curve shown as black dots, is depicted in Figure 5c.
A similar process was used to create a cosine-shaped ridge. Its height (z-coordinate) is defined by the equation:
z = (2πσ²)^(−1/2) · exp( −(y − cos(3x/2))² / (2σ²) )
This formula yields a true ridge following a cosine path at a constant height of z = (2πσ²)^(−1/2), as shown in Figure 5b. To add noise, the same weighting approach was applied, with the weight function w = e^(−D) now using D = (y − cos(3x/2))². This ensures the cosine ridge remains intact amidst the noise, as illustrated in Figure 5d.
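The two simulated surfaces can be generated as below. For brevity, the weighted Gaussian random field is approximated by i.i.d. Gaussian noise scaled so that ridge points receive no displacement; this scaling, and the grid extent, are our illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def simulate_surface(shape="ring", sigma=0.3, noise=0.4, n=100, seed=0):
    """Simulated n x n surfaces of Section 3.2.1.

    z follows the Gaussian ridge profile; i.i.d. Gaussian noise (a
    simple stand-in for the weighted Gaussian random field) is scaled
    by 1 - w with w = exp(-D), so ridge points (D = 0) stay unperturbed.
    The grid extent [-1.5, 1.5] is illustrative.
    """
    x, y = np.meshgrid(np.linspace(-1.5, 1.5, n), np.linspace(-1.5, 1.5, n))
    if shape == "ring":
        D = (np.hypot(x, y) - 1.0) ** 2            # D = (d_c - 1)^2
    else:                                          # cosine ridge
        D = (y - np.cos(1.5 * x)) ** 2             # D = (y - cos(3x/2))^2
    z = (2.0 * np.pi * sigma**2) ** -0.5 * np.exp(-D / (2.0 * sigma**2))
    w = np.exp(-D)
    rng = np.random.default_rng(seed)
    return x, y, z + noise * (1.0 - w) * rng.standard_normal(z.shape)

x, y, z = simulate_surface("ring")
```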

3.2.2. Estimate the Ridge Curves

In this simulation study, the analytical coordinates of the true ring-like ridge curve and the cosine ridge curve can be determined. When the ring-like curve is projected onto the x-y plane, it forms a circle that satisfies the equation x² + y² = 1. The third dimension, z, is held constant at the value obtained when d_c = 1 in Equation (4). Thus, the coordinates of the ring-like curve are given by:
( cos(arctan(y/x)), sin(arctan(y/x)), (2πσ²)^(−1/2) )
Similarly, the projection of the true cosine ridge curve onto the x-y plane is the line y = cos(3x/2). The z-coordinate is also held constant at the value obtained when d_c = 1 in Equation (4). Thus, the coordinates of the cosine ridge curve are given by:
( x, cos(3x/2), (2πσ²)^(−1/2) )
The algorithm from Section 3.1 estimates these curves. As the geometric information on the noisy surface is rather complex, the initial seed was placed manually using the IDAV Landmark Editor software. As noted in Step 3 of the method, the estimation relies heavily on curvature calculations, which in turn depend on the size of the local neighborhood used. We therefore investigated the influence of this neighborhood radius, denoted d_r. The top row of Figure 5 shows the identified curves (red dots) versus the true curves (black dots, from Formulas (6) and (7)) on the noise-free simulated shapes.
The relationship between the average estimation error and the neighborhood radius d_r is plotted in Figure 6a. For these two shapes, the choice of d_r in the range 0.1 < d_r < 0.2 has a minimal impact on the results (less than 0.3% variation). Consequently, we selected a radius of d_r = 0.15, which has the smallest average deviance on the cosine curve, for the remainder of the simulation. The effect of noise on estimation accuracy is shown in Figure 6b. The second row of Figure 5 illustrates the impact of a large, 40% noise level. While noise affects the circular curve more significantly, the average deviance per point remains low, at just 3.5% even with a 50% noise level.
Figure 7 details the decomposition of the estimation error for the curves shown in Figure 5. For the circular curve, the deviance is broken down into radius and height components (Figure 7a,b). For the cosine curve, since the x-coordinates are matched, the deviance is decomposed into the second and third dimensions (Figure 7c,d). Even with substantial noise, the error in the third dimension is at most 0.5%, while errors in other dimensions are around 2.5%, which is quite small.
In summary, the method performs well in identifying curves from sparse landmarks, with errors remaining within an acceptable range. As explained in Section 3.1, the algorithm can achieve an accurate estimation as long as significant noise is not present in the immediate vicinity of the curve.

4. Transmission of 3D Point Cloud

In this section, we use the hybrid transmission scheme to investigate the efficiency and robustness of transmitting the extracted 3D features. A zero-loss dissection algorithm [38] reconstructs the triangulation along the estimated curves. The fundamental strategy of the dissection method is to eliminate the triangles intersected by the specified 3D curve and subsequently reconstruct the mesh along the cut. As shown in Figure 8a,b, it produces two complete pieces without discarding any information. Figure 8c shows an example of the nasal area characterised by the estimated curves, and Figure 8d shows the 3D nasal area dissected by the algorithm.

4.1. Model Description

The proposed adaptive model is developed based on Deep Joint Source–Channel Coding (DeepJSCC) [43]. As illustrated in Figure 9, the overall architecture consists of three main components: the encoder, the channel module, and the decoder. The encoder employs a sequence of convolutional layers, activation functions, and a self-attention mechanism, which enables adaptive feature extraction according to the channel quality. In addition, we apply the 2-D Fast Fourier Transform (FFT) to the obtained matrices (i.e., transformed from the original 3D data) to generate different frequency outputs before the attention mechanism. The intuition behind integrating self-attention is that it allows the model to selectively emphasize informative point-to-point and region-to-region relations in the feature space, thereby improving robustness against channel distortions and the reconstruction of fine-grained geometric structures. A detailed architecture of the encoder and decoder is provided in Figure 10.
In this work, the input tensor is generated from preprocessed numerical point cloud data, and the encoder adapts its feature extraction to the channel quality (SNR) as described above. The output feature tensor has a shape of (B, C_out, H, W) with C_out = 24, where H and W are reduced spatial dimensions. This feature tensor can be regarded as the baseband representation of the transmitted signal.
The encoder output is reshaped into complex-valued signals for channel simulation. We adopt a three-hop amplify-and-forward (AF) relay network, where each hop introduces additive white Gaussian noise (AWGN) determined by the given signal-to-noise ratio (SNR). Specifically, the received signal is modeled as
y = s_sig + n_noise,
where s_sig denotes the encoded complex signal and n_noise is the complex Gaussian noise.
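The channel stage can be sketched as follows. The per-hop noise injection follows the additive model above, while the unit-power amplify-and-forward gain at each relay is our assumption, since the text does not specify the amplification rule.

```python
import numpy as np

def awgn(x, snr_db, rng):
    """Add complex AWGN at the given SNR relative to the power of x."""
    p_sig = np.mean(np.abs(x) ** 2)
    p_noise = p_sig / 10.0 ** (snr_db / 10.0)
    n = np.sqrt(p_noise / 2.0) * (rng.standard_normal(x.shape)
                                  + 1j * rng.standard_normal(x.shape))
    return x + n

def af_relay_chain(s, snr_db, hops=3, rng=None):
    """Three-hop amplify-and-forward chain: each hop adds AWGN, and the
    relay renormalises to unit average power before retransmitting
    (an assumed AF gain rule)."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.asarray(s)
    for _ in range(hops):
        y = awgn(y, snr_db, rng)
        y = y / np.sqrt(np.mean(np.abs(y) ** 2))
    return y

# Unit-power complex baseband symbols, standing in for the encoder output.
rng = np.random.default_rng(1)
s = (rng.standard_normal(1024) + 1j * rng.standard_normal(1024)) / np.sqrt(2)
y = af_relay_chain(s, snr_db=20.0)
```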
The noisy received signal is converted back into a real-valued feature tensor and fed into the decoder, which mirrors the encoder architecture with transposed convolution layers, activation functions, and a self-attention mechanism. This enables the decoder to adapt its reconstruction process based on channel quality. The final output is a reconstructed image tensor.

4.2. Training Strategy and Metrics

The primary model is a regular DeepJSCC with a bandwidth ratio of 1/6. In addition, we apply the 2-D FFT to the obtained matrices (i.e., transformed from the original 3D data) to generate different frequency outputs before the attention mechanism. We adopt a fine-tuning strategy: pretrained weights are loaded, and the parameters are updated using the Adam optimizer with a learning rate of 1 × 10^(−4). The maximum number of epochs is set to 100.
To comprehensively assess model performance, we adopt the Peak Signal-to-Noise Ratio (PSNR) metric to evaluate objective reconstruction quality. We first transform the 3D point cloud data into a 2D form: if point cloud data exist at a 3D coordinate, its position is recorded as valid data in a 2D matrix. This makes the data suitable for model input and evaluation. The model is evaluated under AWGN channels with varying SNR values.
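The 3D-to-2D mapping and the PSNR metric can be sketched as below; the projection plane, grid size, and the synthetic ring of points are illustrative assumptions.

```python
import numpy as np

def points_to_occupancy(points, n=64, lo=-1.0, hi=1.0):
    """Record each 3D point as 'valid' in a 2D matrix, as described above.

    Points are projected onto the x-y plane (an assumed choice; any fixed
    plane works) and quantised onto an n x n grid over [lo, hi]^2.
    """
    grid = np.zeros((n, n))
    ij = np.round((points[:, :2] - lo) / (hi - lo) * (n - 1)).astype(int)
    ij = ij[((ij >= 0) & (ij < n)).all(axis=1)]   # drop out-of-range points
    grid[ij[:, 1], ij[:, 0]] = 1.0
    return grid

def psnr(ref, rec, peak=1.0):
    """Peak Signal-to-Noise Ratio (dB) between two equal-size matrices."""
    mse = np.mean((ref - rec) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Example: a synthetic ring of ~500 points, similar in size to an
# extracted lip curve.
t = np.linspace(0, 2 * np.pi, 500)
pts = np.c_[0.8 * np.cos(t), 0.8 * np.sin(t), np.zeros_like(t)]
g = points_to_occupancy(pts)
```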

4.3. Results and Findings

The simulation of transmitting 3D images was based on our own database of 82 facial images. The lip feature of each image was extracted, and each lip image contains roughly 500 points. The performance of the proposed communication framework was evaluated under different channel SNRs in Figure 11. As the SNR increases from 0 dB to 10 dB, the reconstruction quality improves, with an average PSNR gain of approximately 1.5 dB over the separate design. The joint codec better matches the coding rate between the source and the channel, so the reconstructed results are more robust than those of a separate design, and the advantage of the proposed scheme is more pronounced in poor channel conditions. This is in line with the expected performance of joint source–channel coding schemes reported in other literature [43,44].
Figure 12 shows the simulation results for the structural similarity index measure (SSIM) across SNR = 0–10 dB. To calculate the SSIM, we transformed the original 3D point cloud data into 2D matrices with the same format as an image. The SSIM values ranged from 0.56 to 0.62.
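For reference, a simplified single-window SSIM over the whole 2D matrix can be computed as follows. This global variant is only a sketch: the standard metric averages SSIM over local Gaussian windows, which is what library implementations such as scikit-image do:

```python
import numpy as np

def ssim_global(x, y, peak=1.0):
    """Single-window SSIM over the full matrix (simplified sketch of the
    standard luminance/contrast/structure comparison)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # standard stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

a = np.random.default_rng(0).random((32, 32))
print(round(ssim_global(a, a), 4))  # identical inputs give SSIM = 1.0
```

SSIM is bounded above by 1.0 for identical inputs, so values of 0.56–0.62 on the binarized projections indicate moderate structural agreement between transmitted and reconstructed matrices.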

5. Conclusions and Future Work

We proposed a novel 3D image transmission scheme that combines state-of-the-art digital communication with emerging semantic communication based on extracted features. The results show that the hybrid scheme offers the dual benefits of conserving bandwidth and degrading gracefully, with performance improving steadily as the channel SNR increases.
Future work on improving the efficiency of transmitting 3D images could include extensions to transmitting a full set of features and reconstructing the whole mesh model. Other advanced feature extraction modules tailored to 3D image structures could be integrated into our framework, potentially leveraging transformer-based or graph neural network architectures to better capture spatial correlations and semantic dependencies. Additionally, adaptive coding strategies that cater to real-time channel conditions and application requirements could be considered to further enhance robustness.

Author Contributions

Conceptualization, A.B. (3D shapes), H.X. and W.C. (semantic communications); methodology, software, formal analysis, investigation, resources and data curation, A.B.; semantic validation, W.C.; validation, Y.L. and A.B.; writing—original draft preparation, Y.L.; writing—review and editing, A.B.; visualization, Y.L. and W.C.; supervision, A.B.; project administration and funding acquisition, Y.L. and H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Shanghai Customs College under project no. 2315027A2021. This research was also supported by Shanghai Institute of Intelligent Science and Technology, Tongji University; the Fundamental Research Funds for the Central Universities; Shanghai Rising-Star Program; Shanghai Pujiang Program.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to Kevin Parsons and Kirsty McWhinnie from the University of Glasgow for their invaluable support in collecting the data on fish mandibles. We appreciate their willingness to grant permission for the use of this data in our paper. Their contributions have been instrumental in advancing our research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gündüz, D.; Qin, Z.; Aguerri, I.E.; Dhillon, H.S.; Yang, Z.; Yener, A.; Wong, K.K.; Chae, C.B. Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications. IEEE J. Sel. Areas Commun. 2023, 41, 5–41. [Google Scholar] [CrossRef]
  2. Al-fuqaha, A.A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. IEEE Commun. Surv. Tutor. 2015, 17, 2347–2376. [Google Scholar] [CrossRef]
  3. El-Gazzar, N.A.H.R. A Survey of Vehicle-to-Everything (V2X) Communications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3647–3663. [Google Scholar]
  4. Schick, D. The Metaverse: A New Frontier for Communication. J. Commun. 2022, 72, 1–10. [Google Scholar]
  5. Lee, H.K.; Lee, J.; Kim, H.Y. Remote Robotic Surgery: A Comprehensive Review. Int. J. Med. Robot. Comput. Assist. Surg. 2021, 17, e2198. [Google Scholar]
  6. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  7. Vembu, S.; Verdu, S.; Steinberg, Y. The Source-Channel Separation Theorem Revisited. IEEE Trans. Inf. Theory 1995, 41, 44–54. [Google Scholar] [CrossRef]
  8. Kostina, V.; Verdú, S. Lossy Joint Source-Channel Coding in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2013, 59, 2545–2575. [Google Scholar] [CrossRef]
  9. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  10. Goldsmith, A. Joint Source/Channel Coding for Wireless Channels. In Proceedings of the IEEE Vehicular Technology Conference, Chicago, IL, USA, 25–28 July 1995; pp. 614–618. [Google Scholar]
  11. Zhai, F.; Eisenberg, Y.; Katsaggelos, A.K. Joint Source-Channel Coding for Video Communications; Elsevier Inc.: Amsterdam, The Netherlands, 2005; Available online: https://scispace.com/pdf/joint-source-channel-coding-for-video-communications-51nk9fjkhr.pdf (accessed on 19 August 2025).
  12. Qin, Z.; Ye, H.; Li, G.Y.; Juang, B.H.F. Deep Learning in Physical Layer Communications. IEEE Wirel. Commun. 2019, 26, 93–99. [Google Scholar] [CrossRef]
  13. Kurka, D.B.; Gündüz, D. Bandwidth-Agile Image Transmission with Deep Joint Source-Channel Coding. IEEE Trans. Wirel. Commun. 2021, 20, 8081–8095. [Google Scholar] [CrossRef]
  14. Tung, T.Y.; Gündüz, D. DeepWiVe: Deep-Learning-Aided Wireless Video Transmission. IEEE J. Sel. Areas Commun. 2022, 40, 2570–2583. [Google Scholar] [CrossRef]
  15. Yang, M.; Bian, C.; Kim, H.S. OFDM-Guided Deep Joint Source Channel Coding for Wireless Multipath Fading Channels. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1328–1340. [Google Scholar] [CrossRef]
  16. Shao, Y.; Gunduz, D. Semantic Communications with Discrete-Time Analog Transmission: A PAPR Perspective. IEEE Wirel. Commun. Lett. 2023, 12, 510–514. [Google Scholar] [CrossRef]
  17. Wu, H.; Shao, Y.; Mikolajczyk, K.; Gündüz, D. Channel-Adaptive Wireless Image Transmission with OFDM. IEEE Wirel. Commun. Lett. 2022, 11, 2400–2404. [Google Scholar] [CrossRef]
  18. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; Available online: https://proceedings.mlr.press/v37/sohl-dickstein15.html (accessed on 19 August 2025).
  19. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; Available online: https://dl.acm.org/doi/abs/10.5555/3495724.3496298 (accessed on 19 August 2025).
  20. Wang, X.; Wang, S.; Zhang, S. A Survey on Deep Learning Based Joint Source-Channel Coding. arXiv 2021, arXiv:2101.06406. [Google Scholar]
  21. Lee, S. Robust Moving Least Squares for Feature Curve Extraction from Point-Sampled Surfaces. Comput.-Aided Des. 2005, 37, 1055–1065. [Google Scholar]
  22. Dryden, I.L.; Mardia, K.V. Statistical Shape Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1998; Available online: https://download.e-bookshelf.de/download/0007/9139/12/L-G-0007913912-0014803393.pdf (accessed on 19 August 2025).
  23. Alexa, M.; Behr, J.; Cohen-Or, D.; Fleishman, S.; Levin, D.; Silva, C.T. Computing and Rendering Point Set Surfaces. IEEE Trans. Vis. Comput. Graph. 2003, 9, 3–15. [Google Scholar] [CrossRef]
  24. Yoshizawa, H.; Belyaev, A.; Seidel, H.P. A Feature-Based Approach to Subdivision Surface Reconstruction. Comput. Graph. Forum 2003, 22, 417–426. [Google Scholar]
  25. Belyaev, A.G.; Ohtake, Y. A Survey of Methods for Analyzing and Processing Feature Lines on Surfaces. Comput. Graph. Forum 2010, 29, 1827–1845. [Google Scholar]
  26. Ohtake, Y.; Belyaev, A.; Seidel, H.P. Ridge-valley Lines on Meshes via Implicit Surface Fitting. ACM Trans. Graph. (TOG) 2004, 23, 609–612. [Google Scholar] [CrossRef]
  27. Li, Z.K.G.S. An Algorithm for Extracting Feature Lines from Triangular Mesh Models. J. Comput.-Aided Des. Comput. Graph. 2010, 22, 637–642. [Google Scholar]
  28. Sederberg, T.W.; Zheng, J.P.R. Implicit Representation of Parametric Curves and Surfaces. Comput. Aided Geom. Des. 1985, 2, 450–461. [Google Scholar]
  29. Yu, X.F.; Meng, S. 3D Leaf Edge Reconstruction from a Single Image. In Proceedings of the International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017. [Google Scholar] [CrossRef]
  30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 18 August 2025).
  31. Shamir, A. A Survey on Mesh Segmentation Techniques. Comput. Graph. Forum 2008, 27, 1539–1556. [Google Scholar] [CrossRef]
  32. Tierny, J.; Pascucci, V.; Favelier, G.P.; Carr, H. A Survey of Mesh Parameterization. In Topological Methods in Data Analysis and Visualization; Pascucci, V., Tricoche, X., Hagen, H., Tierny, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 263–288. [Google Scholar] [CrossRef]
  33. Golovinskiy, A.; Funkhouser, T. Randomized Cuts for 3D Mesh Analysis. ACM Trans. Graph. 2008, 27, 1–12. [Google Scholar] [CrossRef]
  34. Lai, K.; Xiao, J.; Fei-Fei, L.; Savarese, S. A Large-Scale Hierarchical Multi-View RGB-D Object Dataset. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011. [Google Scholar] [CrossRef]
  35. Ben-Chen, M.; O’Brien, J.F.; Aben, I.S. Variational Harmonic Maps for Space Deformation. ACM Trans. Graph. 2008, 28, 1–11. [Google Scholar] [CrossRef]
  36. Njima, W.; Bazzi, A.; Chafii, M. DNN-Based Indoor Localization Under Limited Dataset Using GANs and Semi-Supervised Learning. IEEE Access 2022, 10, 69896–69909. [Google Scholar] [CrossRef]
  37. Liu, Y. Statistical Modelling of Local Features of Three-Dimensional Shapes. Ph.D. Thesis, University of Glasgow, Glasgow, UK, 2021. Available online: https://theses.gla.ac.uk/81948/ (accessed on 18 August 2025).
  38. Liu, Y.; Bowman, A.; Xu, H.; Duan, J. An Algorithm of Three-Dimensional Shape Dissection with Mesh Reconstruction. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2023, Brisbane, Australia, 10–14 July 2023; pp. 399–404. [Google Scholar] [CrossRef]
  39. Katina, S.; Mcneil, K.; Ayoub, A.; Guilfoyle, B.; Khambay, B.; Siebert, P.; Sukno, F.; Rojas, M.; Vittert, L.; Waddington, J.; et al. The definitions of three-dimensional landmarks on the human face: An interdisciplinary view. J. Anat. 2016, 228, 355–365. [Google Scholar] [CrossRef]
  40. Koenderink, J.J.; van Doorn, A.J. Surface shape and curvature scales. Image Vis. Comput. 1992, 10, 557–564. [Google Scholar] [CrossRef]
  41. Vittert, L.; Bowman, A.W.; Katina, S. A hierarchical curve-based approach to the analysis of manifold data. Ann. Appl. Stat. 2019, 13, 2539–2563. [Google Scholar] [CrossRef]
  42. Carr, J.C.; Beatson, R.K.; Cherrie, J.B.; Mitchell, T.J.; Fright, W.R.; McCallum, B.C.; Evans, T.R. Reconstruction and representation of 3D objects with radial basis functions. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’01, New York, NY, USA, 12–17 August 2001; pp. 67–76. Available online: https://www.cs.jhu.edu/~misha/Fall05/Papers/carr01.pdf (accessed on 18 August 2025).
  43. Bourtsoulatze, E.; Burth Kurka, D.; Gunduz, D. Deep Joint Source-Channel Coding for Wireless Image Transmission. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 567–579. [Google Scholar] [CrossRef]
  44. Kurka, D.B.; Gündüz, D. DeepJSCC-f: Deep Joint Source-Channel Coding of Images with Feedback. IEEE J. Sel. Areas Inf. Theory 2020, 1, 178–193. [Google Scholar] [CrossRef]
Figure 1. A 3D image of a human lip captured by an Artec® camera system [38].
Figure 2. (Left): A ring-like ridge surface with a seed; (Middle): Visualization of the algorithm; (Right): Surface coloured according to the bending size.
Figure 3. Examples of the identified curves on fish jaw bones.
Figure 4. A comparison between the smoothed and the unsmoothed curve.
Figure 5. Comparison between the true curves (black dots) and the estimated curves (red dots) on two types of shapes in the simulation study.
Figure 6. (a) Influence of the neighbourhood on the estimation of curvatures; (b) Influence of the noise scale on the estimation of curvatures.
Figure 7. Decomposition of the deviance of the estimated curve (red solid line) from the true curve (black dotted line) in Figure 5.
Figure 8. Examples of dissection.
Figure 9. The transmission model for 3D feature with DeepJSCC.
Figure 10. The detailed encoder and decoder architecture.
Figure 11. Peak signal-to-noise ratio (PSNR) comparison between the proposed joint source–channel coding (JSCC) and conventional separated design.
Figure 12. 2D SSIM evaluation results across SNR = 0–10 dB.
