A Multi-Shot Approach for Spatial Resolution Improvement of Multispectral Images from an MSFA Sensor

Multispectral imaging technology has advanced significantly in recent years, allowing single-sensor cameras with multispectral filter arrays to be used in new scene acquisition applications. Our camera, developed as part of the European CAVIAR project, uses an eight-band MSFA to produce mosaic images that can be decomposed into eight sparse images. These sparse images contain only pixels with similar spectral properties and null pixels. A demosaicing process is then applied to obtain fully defined images. However, this process faces several challenges in rendering fine details, abrupt transitions, and textured regions due to the large number of null pixels in the sparse images. Therefore, we propose a sparse image composition method to overcome these challenges by reducing the number of null pixels in the sparse images. To achieve this, we increase the number of snapshots by simultaneously introducing a spatial displacement of the sensor by one to three pixels on the horizontal and/or vertical axes. The set of snapshots acquired provides a multitude of mosaics representing the same scene with a redistribution of pixels. The sparse images from the different mosaics are added together to get new composite sparse images in which the number of null pixels is reduced. A bilinear demosaicing approach is applied to the composite sparse images to obtain fully defined images. Experimental results on images projected onto the response of our MSFA filter show that our composition method significantly improves image spatial resolution and minimizes reconstruction errors while preserving spectral fidelity.


Introduction
Multispectral images are useful in a wide range of applications: facial recognition [1], remote sensing [2], medical imaging [3], and precision agriculture [4], among others. Multispectral image acquisition systems offer great diversity, particularly with scanning-mode acquisition systems that acquire the multispectral image in multiple frames. They are divided into three categories: tunable filter cameras, tunable illumination cameras, and multi-camera systems. Tunable filters, such as the LCTF (Liquid Crystal Tunable Filter) [5] and the AOTF (Acousto-Optical Tunable Filter) [6], use electronic techniques to capture each multispectral band. Although these systems produce fully defined multispectral images, their acquisition time is beyond the scope of a real-time acquisition system.
On the other hand, instantaneous acquisition systems, or snapshots, capture the MS image in a single shot. They include single-sensor or multi-sensor multispectral systems, which are divided into several classes: multispectral filter arrays (MSFAs), interferometers, tunable sensors, and filtered lens arrays [7].
The acquisition system based on a single-sensor one-shot camera coupled with an MSFA provides a compact, low-cost, real-time solution for multispectral image acquisition.
The camera can capture all the necessary spectral bands in a single snapshot [8]. To achieve this, an MSFA is positioned in front of the sensor to capture mosaic images where each pixel location contains information from a single spectral band. An interpolation method is applied to the mosaic image to obtain the fully defined multispectral image [9].
The MSFA plays a crucial role in multispectral imaging by filtering the light entering the sensor. For a given MSFA size, increasing the number of bands reduces the number of pixels assigned to each band. A greater number of spectral bands in the MSFA allows a more precise spectral analysis of the observed scene, but this results in a decrease in the spatial resolution of the image. Indeed, with more spectral bands, the distance between spectrally similar pixels increases [10]. The main weakness of single-sensor one-shot cameras is their ability to efficiently reconstruct a complete multispectral image from a mosaic image, especially when the mosaic contains non-homogeneous areas, abrupt transitions, and textured regions [11].
Previous works [4,12] have detailed the design process of our single-shot multispectral camera, specifically designed to operate in the visible range. This camera has a 4 × 4 MSFA moxel with eight spectral bands selected by a genetic algorithm. Each spectral band receives two pixels per moxel, where the mosaic arrangement is a moxel assembly over a monochrome sensor [13]. After a snapshot, the camera provides a mosaic that is decomposed into eight sparse images, each containing pixels with the same spectral properties and null pixels. Thus, the sparse images have a very high number of null pixels: for our camera, in a 16-pixel moxel window, 14 pixels are null. This deficit can cause problems during image demosaicing, affecting image quality and visual fidelity and causing a loss of spatial resolution.
To address these issues, we propose a method for reducing the number of null pixels in sparse images. Our approach combines sparse images from multiple acquisitions. To achieve this, we combine camera displacements along both the vertical and horizontal axes. At each displacement, the camera captures an image of the observed scene, generating a mosaic of the scene with a spatial redistribution of pixels with similar spectral properties. Next, the set of sparse images from each post-displacement acquisition is summed with those obtained without displacement to obtain new composite sparse images. The new composite sparse images are finally demosaiced.
In this study, we present the following contributions:
• Setting up a dataset for our experiments, which consists in transforming images from a 31-band database into 8 bands to simulate our 8-band MSFA moxel; these images are then mosaicked with our MSFA filter to simulate a snapshot from our camera;
• Development of a new composition method with a multi-shot approach to reduce the number of null pixels in sparse images while maintaining the same number of spectral bands;
• Visual and analytical comparisons using validation metrics to evaluate our experiments, demonstrating the improvement in the spatial resolution of the final image obtained after demosaicing.
The remainder of this article is organized as follows: Section 2 presents the state of the art in improving the spatial resolution of MSFA images. Section 3 details the materials and methods used in our approach. Section 4 presents the experiments carried out and the results obtained. Section 5 discusses the results. Finally, Section 6 presents our conclusion.

Related Works on Improving the Spatial Resolution of MSFA Images
Much research has demonstrated the value of improving the spatial resolution of multispectral images from an MSFA sensor.
Monno et al. [11,14] proposed a multispectral demosaicing method using a guided filter. This method is used in multispectral imaging to improve color reproduction and computer vision applications. The proposed method uses a guided filter to interpolate spectral components in a multispectral color filter array. The technique addresses the challenge of undersampling in multispectral imaging and shows promising results for practical applications. Its effectiveness relies on an MSFA pattern with a dominant green band.
Wang et al. [15] proposed a method to improve the quality of images reconstructed from multispectral filter arrays while minimizing the computational cost. It addresses the challenge of estimating missing data in images acquired by these arrays using adaptive frequency-domain filtering (AFDF). This technique combines the design of a frequency-domain filter to eliminate artifacts with spatial averaging filtering to preserve spatial structure. By incorporating adaptive weighting, AFDF improves the quality of reconstructed multispectral images while maintaining high computational efficiency.
Rathi and Goyal [16] proposed a weighted directional interpolation method for estimating missing pixel values. They exploit both the spectral and spatial correlations present in the image to intelligently select interpolation schemes based on the properties of binary-tree-based MSFA models. By computing directional estimates and using edge amplitude information, the method progressively estimates missing pixel values and updates pixel arrangements according to the band's point of arrival (PoA) in the binary tree structure.
Zhang et al. [17] proposed a method that integrates a deep convolutional neural network with a channel attention mechanism to improve the demosaicing process. In this method, a mean square error (MSE) loss function is used to improve the accuracy of estimated pixel values in image processing. In addition, a contour loss is introduced to improve the sharpness and richness of textured images using high-frequency subband analysis in the wavelet domain. The method uses the TT-59 database [18] for training and evaluation. Multispectral images are processed to synthesize radiance data to demonstrate the effectiveness of the demosaicing technique.
Mihoubi et al. [19] proposed a demosaicing method called PPID based on the generation of a pseudo-panchromatic image (PPI). To ensure robustness to different lighting conditions, an adjustment of the value scale in the raw image is proposed before estimating the PPI, with the aim of mitigating biases caused by differences in the spectral illumination distribution between channels. The remaining steps include calculating the spectral differences [20] between the original raw image and the PPI, using local directional weights for interpolation [21], and, finally, combining the PPI with the differences to estimate each channel of the final image.
Jeong et al. [22] proposed a method to improve image quality by estimating a pseudo-panchromatic image using an iterative linear regression model. It then performs directional demosaicing, a technique that combines the pseudo-panchromatic image with spectral differences to produce a final interpolated image. The process includes steps such as directional interpolation using the BTES method [23] and the calculation of weights to improve the accuracy of the final multispectral image.
Rathi and Goyal [9] proposed a method that uses the concept of the pseudo-panchromatic image and the spectral correlation between spectral bands to efficiently generate a complete multispectral image. It involves estimating a pseudo-panchromatic image from a mosaic image using convolution filters based on the probability of appearance of each spectral band [24] and binary masks. This pseudo-panchromatic image is then used to interpolate each spectral band to produce a multispectral image. The process iteratively improves the quality of the multispectral image by updating the pseudo-panchromatic image and estimating the spectral bands multiple times.
Liu et al. [25] proposed a new deep learning framework for multispectral demosaicing using pseudo-panchromatic images. The framework consists of two networks, the Deep PPI Generation Network (DPG-Net) and the Deep Demosaic Network (DDM-Net), which are used to generate and refine the PPI to improve image quality and recover high-frequency information in the demosaicing process. DPG-Net specifically focuses on improving the sharpness of the preliminary PPI to improve image resolution by learning the differences between the actual PPI and Mihoubi's blurred version [19], which ultimately leads to the production of the final refined PPI. DDM-Net uses bilinear interpolation to estimate missing pixel values in fragmented bands, followed by a neural network architecture that extracts color and texture features to improve image quality. By combining convolutional layers and loss functions, DDM-Net aims to minimize reconstruction errors and produce high-quality demosaiced images.
Zhao et al. [26] proposed a neural network model with two branches: adaptive features (DDMF) and edge infusion (PPIG). The proposed architecture combines weighted bilinear interpolation [21], used to generate initial demosaiced images, with adaptive adjustments of the pixel values in the reconstructed multispectral images. It uses a DDMF module to generate convolution kernel weights that adapt to spatial and spectral changes, thus improving the accuracy of the demosaicing process. In addition, the PPIG edge infusion sub-branch integrates edge information to improve demosaicing accuracy in terms of spatial precision and spectral fidelity.
Most of the methods proposed to improve the spatial resolution of a multispectral image rely on complex steps during the demosaicing process. Our paper proposes a new approach, based on a multi-shot method, that takes place before the demosaicing process.

The MSFA Moxel
The MSFA moxel is a grid of optical filters placed in front of the sensor of a multispectral camera to filter the incoming light into different spectral bands. Each pixel in the captured image is associated with a specific filter in the MSFA moxel, allowing light intensity to be measured in different parts of the electromagnetic spectrum. The MSFA allows the simultaneous acquisition of multispectral information during image acquisition by distributing the pixels on the image sensor according to their spectral sensitivity. The choice of MSFA size and the number of bands is essential for the acquisition and reconstruction of multispectral images. The MSFAs commonly used in the literature generally have one of the following two main characteristics:
• Redundancy [27]: a band can have a probability of appearance greater than 1/n, where n represents the linear size of the MSFA moxel;
• Non-redundancy [21]: each band has a probability of appearance of 1/n.
In the case of bands with redundancy, the following two types of behavior can be observed:
• Dominant bands: the probability of appearance of certain bands in the MSFA moxel is higher than that of others;
• Non-dominant bands: all bands in the MSFA moxel have the same probability of appearance.
These characteristics of the MSFA moxel directly affect the quality and resolution of the multispectral images obtained after the acquisition and reconstruction process.The selection of the appropriate MSFA moxel depends on the specific application requirements, such as the desired spectral resolution, sensitivity to different wavelengths, and camera hardware constraints.
Our camera uses a 4 × 4 filter with equal probability of band appearance to acquire mosaic images, where each band is sampled by two pixels. This moxel was chosen to balance the spatial distribution of pixels in the sparse images [28]. The design is based on the color shade approach [12], which optimizes the spectral response of the filters and improves the quality of images acquired during a shot. Figure 1a illustrates the spectral band arrangement of our MSFA moxel. This moxel is used throughout our study to construct mosaic images and to demosaic multispectral images.

Figure 1b shows the spectral response of the filters in our MSFA moxel. The spectral response refers to how well the sensor detects and measures light in the different spectral bands; it is given over the visible interval [400 nm, 790 nm].

Dataset
In our simulation, we project the 31 image bands from the TokyoTech database (TT-31) [11] onto 8 bands corresponding to our MSFA. This projection is performed on the response of the MSFA filters of our camera. The use of this projection is important because it allows us to work with data that reflect the conditions we encounter in the real world when making acquisitions with our camera. It also allows us to reduce the dimensionality of the images while preserving the most relevant spectral information. The steps of the projection process are as follows:
• Determination of the desired number of bands for the resulting multispectral image; in our study, eight bands.
• Definition of the Gaussian filter full width at half maximum (FWHM) in nanometers; in our study, this width is 30 nm.
• Calculation of the standard deviation of the Gaussian filter corresponding to the defined FWHM, which is necessary because the shape of a Gaussian is determined by its standard deviation.
• Calculation of the central wavelength of each Gaussian filter. We start at a distance of 3 times the standard deviation from the start wavelength, move at a calculated interval between filters, and end at a distance of 3 times the standard deviation from the end wavelength. The values are then rounded to the nearest integer and sampled at the desired spectral interval.
• Creation of the Gaussian filters using a Gaussian function. Each filter weight is computed from the similarity between the spectral wavelength and the central wavelength of the filter: the greater the similarity, the higher the weight. Filters are normalized to ensure that their sum equals 1.
• Recovery of the original 31-band image data, followed by filtering with the created Gaussian filters.
• Multiplication of the Gaussian filters with the weighted data to perform the 8-band multispectral transformation, selecting the appropriate spectral bands.
In this approach, it is assumed that there is no change in the inclination of the illuminance.
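The projection steps above can be sketched as follows (a minimal NumPy illustration; the function names, the uniform spacing of the filter centers, and the 400-700 nm sampling grid are our assumptions, not the paper's exact implementation):

```python
import numpy as np

def gaussian_filter_bank(wavelengths, n_bands=8, fwhm=30.0):
    """Build a bank of normalized Gaussian spectral filters.

    `wavelengths` are the sampled wavelengths of the source image,
    e.g. 400..700 nm in 10 nm steps for a 31-band dataset.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std dev
    # Centers span from 3*sigma after the start wavelength to
    # 3*sigma before the end wavelength, rounded to integers.
    start = wavelengths[0] + 3.0 * sigma
    end = wavelengths[-1] - 3.0 * sigma
    centers = np.round(np.linspace(start, end, n_bands))
    # One Gaussian per target band; weight grows with spectral similarity.
    bank = np.exp(-((wavelengths[None, :] - centers[:, None]) ** 2)
                  / (2.0 * sigma ** 2))
    bank /= bank.sum(axis=1, keepdims=True)  # each filter sums to 1
    return bank, centers

def project_bands(cube31, bank):
    """Project an (H, W, 31) cube to (H, W, 8) with the filter bank."""
    return np.tensordot(cube31, bank, axes=([2], [1]))
```

The normalization step guarantees that a spectrally flat scene keeps its intensity after projection.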

Mosaicking Process to Obtain Sparse Images
A mosaic image captured by our camera produces 8 sparse images after grouping pixels with similar spectral properties. Since we will be working with fully defined images, we use our MSFA moxel to generate mosaics from them. Figure 2 illustrates the mosaicking process with our MSFA moxel and the grouping of pixels with similar spectral properties into sparse images. Figure 3 shows the spatial distribution of pixels in the sparse image of spectral band B1. The gray areas represent the available pixels, while the white areas represent the null pixels. Our approach is to reduce the number of null pixels in these sparse images. We expect that reducing the number of null pixels will reduce reconstruction errors during the demosaicing process.
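The mosaicking and pixel-grouping steps can be sketched as below. The 4 × 4 band arrangement used here is a hypothetical placeholder with two pixels per band, not the actual pattern of Figure 1a:

```python
import numpy as np

# Hypothetical 4x4 moxel: band indices 0..7, each appearing twice per moxel.
MOXEL = np.array([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [2, 3, 0, 1],
                  [6, 7, 4, 5]])

def moxel_pattern(h, w):
    """Tile the moxel over an h x w sensor."""
    reps = (h // 4 + 1, w // 4 + 1)
    return np.tile(MOXEL, reps)[:h, :w]

def mosaic(cube):
    """Sample an (H, W, 8) multispectral cube into a single-channel mosaic."""
    h, w, _ = cube.shape
    pattern = moxel_pattern(h, w)
    rows, cols = np.indices((h, w))
    return cube[rows, cols, pattern]

def sparse_images(mosaic_img):
    """Split a mosaic into 8 sparse images with binary masks (null pixels are 0)."""
    pattern = moxel_pattern(*mosaic_img.shape)
    return [np.where(pattern == i, mosaic_img, 0.0) for i in range(8)]
```

Because the masks partition the sensor, the eight sparse images sum back to the mosaic, and each one keeps only 2 of every 16 pixels.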

Conceptualization of the Method
Let us define B = {B1, B2, B3, B4, B5, B6, B7, B8}, the original spectral bands that contain the fully defined information (pixels) of the eight bands obtained after projection.
Let us define I_MSFA as the mosaic obtained after the first snapshot, without sensor displacement. Synthetically, it is obtained by applying our MSFA moxel to the bands Bi.
Let us define I_MSFA^{kj,Dj} as the mosaic obtained with a camera displacement of kj pixels, where kj ∈ {1, …, 3}, along the axis Dj, which can be either horizontal (H) or vertical (V). Synthetically, these mosaics are obtained by shifting the bands Bi by kj pixels along the Dj axis. This produces bands B'ij, which are mosaicked with our MSFA moxel. Figure 4 illustrates the different mosaics obtained with a one-pixel camera displacement on the vertical axis (k1 = 1 and D1 = V) and a one-pixel camera displacement on the horizontal axis (k2 = 1 and D2 = H). For a displacement of kj pixels, the shifted bands have the following shape:

B'ij(x, y) = Bi(x + kj, y) if Dj = H; B'ij(x, y) = Bi(x, y + kj) if Dj = V. (1)

The process of grouping pixels with similar spectral properties involves separating a mosaic image into different spectral bands using a binary mask mi, defined such that mi(x, y) = 1 if the MSFA moxel assigns band i to the pixel (x, y), and mi(x, y) = 0 otherwise. For each mosaic, we obtain a set of sparse images, Ĩi, by applying the following formula:

Ĩi(x, y) = I_MSFA(x, y) · mi(x, y). (2)

For any camera displacement, we obtain the sparse images Ĩi^{kj,Dj} with i ∈ {1, …, 8}, kj ∈ {1, …, 3}, and Dj ∈ {H, V}, where i represents the index of a band of the MSFA moxel and kj represents the displacement scalar along the horizontal (H) or vertical (V) axis.
Figure 5 shows the density of pixels that are spectrally similar in band B1 of the mosaics I_MSFA, I_MSFA^{k1,V}, and I_MSFA^{k2,H}. The gray areas represent the pixels available in the sparse image Ĩ1 of band B1; the yellow areas represent those available in the sparse image Ĩ1^{k1,V} due to the camera's displacement of k1 pixels on the vertical axis; and the blue areas represent those available in the sparse image Ĩ1^{k2,H} due to the camera's displacement of k2 pixels on the horizontal axis.
The positions of the non-null pixels vary in each sparse image, and these pixels have the same spectral properties. Therefore, the sparse images can be combined (composition method), i.e., added together, to increase the number of non-null pixels and reduce the number of null pixels. The pixels are redistributed according to the camera displacement combinations.
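The density gain from a displacement can be checked numerically. The sketch below counts how many new non-null positions one band gains for a given shift, using a hypothetical 4 × 4 moxel arrangement (not the actual Figure 1a pattern) and wrap-around borders as a simplification of the real sensor shift:

```python
import numpy as np

# Hypothetical 4x4 moxel arrangement, two pixels per band.
MOXEL = np.array([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [2, 3, 0, 1],
                  [6, 7, 4, 5]])

def band_mask(band, h=8, w=8):
    """Binary mask m_i of the positions holding `band` over an h x w tile."""
    return np.tile(MOXEL, (h // 4, w // 4)) == band

def new_pixels_after_shift(band, k, axis):
    """Count non-null positions gained for one band by a k-pixel displacement."""
    base = band_mask(band)
    shifted = np.roll(base, k, axis=0 if axis == 'V' else 1)
    return int(np.count_nonzero(shifted & ~base))
```

With this particular arrangement, a one-pixel vertical shift adds two new positions per moxel for every band, doubling the available pixels after one composition, while a two-pixel shift on both axes maps each band's positions back onto themselves, which illustrates the kind of overlap that rules out certain displacement combinations.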

Sparse Image Composition
The sparse image composition method is performed in three steps, as shown in Figure 6. The first step is to take an initial snapshot of a scene. This snapshot provides a mosaic image I_MSFA, which is decomposed into sparse images Ĩi using Formula (2). Then we set the number N of compositions we want to make by specifying the displacement scalars kj and the axes Dj. Finally, we obtain composite sparse images Ĩi^c, which contain more available pixels. The symbol "?" in the composite sparse images Ĩi^c indicates the areas where new pixels can appear depending on the displacement combination. This composition method reduces the distance between two non-null pixels and is limited to three compositions; beyond three compositions, implementing such a method can be very time-consuming. The separation into sparse images is performed on the mosaics I_MSFA and I_MSFA^{kj,Dj} with Formula (2), resulting in sparse images Ĩi and Ĩi^{kj,Dj}.


The addition of the two sparse images is then performed, such that Ĩi^c = Ĩi + Ĩi^{kj,Dj}.
Figure 7 shows a composition of bands from the I_MSFA and I_MSFA^{kj,Dj} mosaics. The eight sparse images globally have the same pixel distribution, which varies according to the parameters kj and Dj. Thus, for a given camera displacement, the pixel distribution in each composite sparse image Ĩi^c is the same, which justifies commenting only on spectral band B1 of each composition.
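A minimal sketch of the two-image composition (NumPy; `compose` is our illustrative name — for a valid displacement the non-null positions of the two images are disjoint, and the sketch keeps the first image's value defensively where they would overlap):

```python
import numpy as np

def compose(sparse_a, sparse_b):
    """Composite sparse image: element-wise sum of two sparse images of the
    same band, where null pixels are stored as 0."""
    overlap = (sparse_a != 0) & (sparse_b != 0)
    # For a valid displacement the non-null positions are disjoint,
    # so this reduces to sparse_a + sparse_b.
    return np.where(overlap, sparse_a, sparse_a + sparse_b)
```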

Cases of Sparse Images Greater Than Two
For more than two bands, we obtain more than 30 possible compositions for the different values of the displacement scalars k j on the axes D j .The following algorithm shows how the composition of N bands is achieved where 2 < N ≤ 4: 1.
The camera takes a first snapshot from which a mosaic I MSFA is obtained.

2.
The separation into sparse images is performed on the mosaics I MSFA using Formula (2), resulting in the sparse images As long as j ≤ N:

The initialization step sets the values of
• The camera moves along the D j axis by k j pixels from its position (0, 0) and takes a snapshot, and a new mosaic is obtained.
• The new mosaic I i k j D j is decomposed using Formula (2), resulting in sparse images • The above sparse image is added to the previous sparse image , and the value of j is incremented.

5.
In the end, we get composite sparse images ∼ I i c . Figure 9 illustrates the spatial distribution of pixels of certain three-and four-band compositions.The blue area represents the H-axis displacement and the yellow area represents the V-axis displacement.2. The separation into sparse images is performed on the mosaics  using Formula (2), resulting in the sparse images  .3. The initialization step sets the values of  to  and j to 1.

As long as j ≤ N:
 The camera moves along the Dj axis by kj pixels from its position (0, 0) and takes a snapshot, and a new mosaic  is obtained.


The new mosaic  is decomposed using Formula (2), resulting in sparse images  . The above sparse image is added to the previous sparse image    , and the value of j is incremented.
5. In the end, we get composite sparse images  .The composition method redistributes pixels to provide more information and reduce the number of pixels to interpolate.It is important to note that with our MSFA moxel it is not possible to achieve a three-band composition with a displacement of two pixels on both the horizontal and vertical axes (k1 = k2 = 2 on the H and V axes).This would cause a problem with overlapping pixels at certain positions of the composite sparse image.
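The accumulation loop of the algorithm can be sketched in a few lines of Python (a minimal illustration with hypothetical names, not the camera pipeline itself); the assertion mirrors the overlap constraint on the displacement scalars noted above:

```python
import numpy as np

def compose_bands(main_sparse, shifted_sparse_list):
    """Steps 3-5 of the algorithm: initialize the composite sparse image
    with the main snapshot's sparse image, then add the (realigned)
    sparse image obtained after each of the N-1 camera displacements."""
    composite = main_sparse.copy()        # step 3: initialization, j = 1
    for sparse_j in shifted_sparse_list:  # step 4: as long as j <= N
        # two snapshots must never measure the same scene position,
        # which is why k1 = k2 = 2 is excluded for three bands
        assert not np.any((composite != 0) & (sparse_j != 0)), "overlapping pixels"
        composite += sparse_j             # add and increment j
    return composite                      # step 5: composite sparse image
```

Each entry of `shifted_sparse_list` is assumed to be already separated with Formula (2) and expressed in the coordinates of the main snapshot.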

Bilinear Interpolation
To generate a fully defined image, we use bilinear interpolation on the sparse images to deduce the null pixels according to Algorithm 1. The final composite sparse images are demosaiced using a bilinear method to obtain the fully reconstructed images.
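As an illustration, a null pixel can be deduced as the weighted mean of its defined neighbours. The sketch below (a hypothetical helper, not necessarily the paper's Algorithm 1) uses the classic 3 × 3 bilinear stencil; with an 8-band moxel, where non-null pixels can lie more than one pixel apart, a larger stencil or repeated passes would be needed:

```python
import numpy as np

def bilinear_fill(sparse, mask):
    """Fill null pixels with the weighted average of defined 8-neighbours
    (weights 0.5 for edge neighbours, 0.25 for diagonals); measured
    pixels are kept unchanged. `mask` is 1 at measured positions."""
    weights = np.array([[0.25, 0.5, 0.25],
                        [0.5,  0.0, 0.5],
                        [0.25, 0.5, 0.25]])
    vals = np.pad(sparse * mask, 1)          # zero-padded measured values
    wgts = np.pad(mask.astype(float), 1)     # zero-padded validity map
    h, w = sparse.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            num += weights[dy, dx] * vals[dy:dy + h, dx:dx + w]
            den += weights[dy, dx] * wgts[dy:dy + h, dx:dx + w]
    out = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
    return np.where(mask.astype(bool), sparse, out)
```

Normalizing by the summed weights of the *defined* neighbours keeps the estimate unbiased near borders and in sparsely sampled regions.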

Figure 10. General architecture of our projection, composition, and interpolation method.

Experiments and Results
Our experiments aim to demonstrate how reducing the number of null pixels in sparse images can improve the quality of the spatial resolution obtained after interpolation. To achieve this, we will compare the fully reconstructed images of sparse images and composite sparse images. We will use the TokyoTech dataset TT-31 projected onto the response of our MSFA filter, on which we will perform qualitative and quantitative analyses to determine the impact of this approach.

Metric
To evaluate our results, we use four quantitative metrics, namely:
• PSNR (Peak Signal to Noise Ratio) [29]: PSNR is a widely used metric to assess the quality of a reconstructed or compressed image compared to the original image. It measures the ratio between the maximum power of the signal (the peak signal) and the power of the noise that degrades the quality of the image representation (the corrupting noise). Higher PSNR values indicate better image quality because they represent a higher ratio of signal power to noise power. In the PSNR formula, n is the number of spectral bands in the MSFA moxel.
• SAM (Spectral Angle Metric) [30]: SAM calculates the angle between two spectral vectors in a high-dimensional space. Each spectral vector represents the spectral reflectance or irradiance of a pixel over several spectral bands. The smaller the angle between two spectral vectors, the more similar the spectra are considered to be.
• SSIM (Structural Similarity Index Measure) [31]: SSIM measures the similarity between two images by comparing their structural information, luminance, and contrast, taking into account the characteristics of the human visual system. Compared to simpler metrics such as Mean Square Error (MSE) or PSNR, SSIM provides a more comprehensive assessment of image similarity by considering perceptual factors. The SSIM value ranges from 0 to 1, where 1 indicates perfect similarity between images and 0 indicates no similarity. We use the structural_similarity function of the Python skimage.metrics module to compute this metric.
• RMSE (Root Mean Square Error) [32]: RMSE evaluates the accuracy of predictions by measuring the average size of the errors between predicted and actual values. The metric is expressed in the same unit as the target value: an RMSE of 10 indicates that the predicted value deviates on average by ±10 from the actual value. The formula for calculating the RMSE is as follows:
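For reference, PSNR, SAM, and RMSE can be sketched directly in NumPy (conventions such as the peak value and band averaging are our assumptions; SSIM is delegated to skimage's structural_similarity, as stated above):

```python
import numpy as np

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB, averaged over all pixels and bands.
    `peak` is the maximum possible signal value (assumed 1.0 here)."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def sam(ref, rec, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectral vectors.
    `ref` and `rec` have shape (H, W, n) with n spectral bands."""
    dot = np.sum(ref * rec, axis=-1)
    norm = np.linalg.norm(ref, axis=-1) * np.linalg.norm(rec, axis=-1)
    return float(np.mean(np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0))))

def rmse(ref, rec):
    """Root mean square reconstruction error, in the unit of the pixel values."""
    return float(np.sqrt(np.mean((ref - rec) ** 2)))
```

A smaller SAM means more similar spectra; a constant scaling of the spectrum leaves the angle at zero, which is why SAM complements magnitude-based metrics such as RMSE.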

Quantitative Evaluation
We have conducted tests on 20 images from the TokyoTech database, and the quantitative results of the PSNR, SAM, SSIM, and RMSE metrics are presented in Tables 1-4. The header "Snapshots" indicates the number of snapshots taken by the camera. For example, taking p snapshots, where p ∈ {2, . . ., n}, means taking one snapshot without displacement of the camera and taking p − 1 other snapshots by displacements of the camera along the specified axes. The "Displacements" header of the tables specifies the different configurations of sparse image compositions, identified by letter codes, where
• 'a' corresponds to a snapshot without displacement;
• 'b' corresponds to a snapshot after a displacement of 1 pixel on the H-axis;
• 'c' corresponds to a snapshot after a displacement of 1 pixel on the V-axis;
• 'd' corresponds to a snapshot after a displacement of 2 pixels on the H-axis;
• 'e' corresponds to a snapshot after a displacement of 2 pixels on the V-axis;
• 'f' corresponds to a snapshot after a displacement of 3 pixels on the H-axis;
• 'g' corresponds to a snapshot after a displacement of 3 pixels on the V-axis.
The 'abc' displacements, for example, represent three different snapshots: the first is taken without any displacement, the second after a horizontal displacement of one pixel, and the third after a vertical displacement of one pixel. The values in each cell of the table represent the average of the eight spectral bands. Using this quantitative evaluation method, we can compare the values obtained from composite sparse images with the values obtained from sparse images without any composition. Note that the tables do not cover all possible combinations of camera displacements but only a selection. The results indicate that images reconstructed from composed bands are of higher quality than those reconstructed without band composition. However, the quality of the reconstructed image depends not only on the specific composition used but also on the individual image.
Figures 11-13 show the reconstructions of fine details in images containing abrupt transitions, non-homogeneous areas (Figures 11 and 12), and textured regions (Figure 13). We will compare the quality of images reconstructed from sparse and composite sparse images by selecting and zooming in on a 60 × 60 pixel area from "Butterfly8", a 155 × 165 pixel area from "Butterfly", and a 130 × 104 pixel area from "Party". The selected areas are indicated by red boxes in the original images. The quality of the reconstructions improves from two snapshots to four. We also note a significant correlation between the spatial distribution of pixels in the sparse image and the reconstruction results.
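The letter codes above can be captured in a small lookup table (a hypothetical sketch of the notation, pairing each letter with its axis and displacement in pixels):

```python
# Mapping from the letter codes used in Tables 1-4 to (axis, pixels)
# displacements; None marks the main snapshot without displacement.
DISPLACEMENTS = {
    "a": None,
    "b": ("H", 1), "c": ("V", 1),
    "d": ("H", 2), "e": ("V", 2),
    "f": ("H", 3), "g": ("V", 3),
}

def decode(config):
    """Decode a configuration string such as 'abc' into its snapshots."""
    return [DISPLACEMENTS[letter] for letter in config]
```

For instance, `decode("ad")` describes two snapshots: the main one and one after a 2-pixel horizontal displacement.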


Discussion
The study's results show a direct correlation between the number of compositions and the spatial resolution of the reconstructed image, especially when reconstructing abrupt transitions, non-homogeneous areas, and textured regions. The more compositions performed, the better the reduction of the distance between the non-null pixels of the sparse images, leading to a better spatial resolution after demosaicing. For each level of composition, there are differences in the qualitative and quantitative results depending on the values of the displacement scalars.
Several observations can be made about compositions involving two bands, where only one camera displacement is required. For certain images in Figures 11 and 12, there is a preference for horizontal shifts, while for others in Figure 13, there is a preference for vertical shifts. Depending on the type of image, there is a clear improvement in abrupt transitions, non-homogeneous areas, and textured regions. Two-pixel displacements significantly improve local structures such as edges, textures, and patterns compared to the image obtained without band compositing. This improvement is manifested in higher SSIM values (Table 3), a lower spectral similarity angle according to SAM (Table 2), and a lower reconstruction error according to RMSE (Table 4). However, reconstruction with less noise is observed with 1-pixel or 3-pixel displacements, as shown by PSNR (Table 1). The study highlights a significant correlation between the spatial distribution of the pixels in the sparse images and the quantitative and qualitative results after reconstruction. Indeed, a displacement of 2 pixels better reduces the distance between two non-null pixels of the sparse images, leading to less overlapping in abrupt transitions and improved visual restitution, as shown by the displacements (ad, ae) in Figures 11-13. In conclusion, vertical shifts, especially those of 2 pixels, offer a good compromise between improving local structures and reducing noise in the reconstructed images. The study highlights the importance of considering the spatial distribution of pixels when planning camera shifts for optimal reconstruction.
In compositions with three bands and two camera displacements, there are 14 possible combinations of displacements on the horizontal and vertical axes. According to PSNR, 1-pixel or 3-pixel displacements on both axes result in a less noisy reconstruction. SSIM shows that the structural reconstruction is almost equivalent in most cases. Moving along the same axis results in higher spectral similarity and fewer reconstruction errors, as indicated by SAM and RMSE. Visual results show increased sharpness for displacements on the same axis, but decreased sharpness for displacements of 1 pixel on both axes and 3 pixels on both axes. In conclusion, displacements on the same axis provide an optimal compromise between the structural and spectral quality of the reconstruction. At the same time, other configurations offer specific advantages and disadvantages in terms of noise reduction and visual sharpness.
The visual results obtained are very close to the reference image for four-band compositions with three camera displacements, with 10 possible combinations. This suggests a satisfactory ability to reconstruct images with a high level of visual fidelity, although the metrics show weaker results than in the case of the three-band composition. However, the implementation of this type of shift is not directly feasible in a real-time acquisition system due to the increased complexity of the camera shift. Therefore, the use of this type of composition is not necessary in real-time acquisition systems. Nevertheless, the displacements of this type of composition on the same axis show excellent visual results. This observation suggests that a limited camera displacement for this type of composition may be sufficient to significantly improve the quality of reconstructed images without requiring the excessive complexity of a bi-axial composition. In conclusion, four-band compositions can produce satisfactory visual results, but their practical implementation in a real-time acquisition system is limited due to their displacement complexity. However, simpler strategies, such as moving along the same axis, can provide significant improvements while reducing the difficulty of operational feasibility.
In practice, the implementation of our method is possible, in particular by using a tri-CCD system to capture and restore a motion scene of objects [33]. This acquisition system has a beam splitter to split the light into two other axes. The prism redirects light to three sensors that capture a mosaic of the same scene with different observations, providing three mosaics of the same scene with different spatial information distributions. For static objects, a micron-precision camera translation system would be required to capture and restore the fully defined image. Figure 14 illustrates the operation of a tri-CCD system where each sensor is equipped with an MSFA.
The first MSFA filter is mounted on top of sensor 1 to obtain a mosaic with no information shift. The second MSFA filter is mounted on top of sensor 2 to obtain a mosaic with information shifted by 1 pixel on the horizontal axis. Finally, a third MSFA filter is mounted on top of sensor 3 to obtain a mosaic with information shifted by 1 pixel on the vertical axis.


Conclusions
This paper presents a first prototype simulation approach to improve the spatial resolution of multispectral images acquired by our MSFA single-shot camera, with particular emphasis on reproducing fine details, abrupt transitions, and textured regions. Our approach proposes a method of camera displacement along horizontal and/or vertical axes to capture multiple snapshots, thus generating different mosaics for the same observed scene. We then proceed to assemble the spectrally similar pixels of these mosaics to increase the number of non-null pixels in the sparse images. The results of our experiments carried out on TT-31 images projected on the response of our MSFA filter show a qualitative and quantitative improvement in the reconstruction based on composite sparse images, with better results validated by the PSNR, SAM, SSIM, and RMSE metrics. The next step will be implementing our multi-shot prototype on our camera by installing a micron-precision camera motion device. This will allow us to perform experiments on real images and propose a new demosaicing method based on composite sparse bands.

Figure 1b shows the filters' spectral response in our MSFA model. Spectral response refers to how well the sensor detects and measures light in different spectral bands. This spectral response is given in the visible spectral interval [400 nm, 790 nm].

J. Imaging 2024, 10, 140
Figure 2 illustrates the mosaicking process with our MSFA moxel and the grouping of pixels with similar spectral properties into sparse images.

Figure 2. Process of mosaicking and grouping pixels with similar spectral properties.

Figure 3. Spatial distribution of pixels in the sparse image of spectral band B1.


Figure 4. Mosaics obtained before and after camera displacement. (a) I MSFA mosaic obtained with the main snapshot. (b) I MSFA 1V mosaic obtained with displacement of the camera on the vertical axis of 1 pixel. (c) I MSFA 1H mosaic obtained with displacement of the camera on the horizontal axis of 1 pixel.

Figure 5. Spatial distribution of pixels in the sparse image of the spectral band B 1 . (a) ∼ I 1 is the sparse image of spectral band B1. (b) ∼ I 1 k 1 V is the sparse image of spectral band B′11 with the camera displacement on the vertical axis of k1 pixels. (c) ∼ I 1 k 2 H is the sparse image of spectral band B′12 with the camera displacement on the horizontal axis of k2 pixels. The gray areas represent the available pixels in the sparse image ∼ I 1 of the band B 1 ; the yellow areas represent those available in the sparse image ∼ I 1 k 1 V of the band B′11 due to the camera's displacement on the vertical axis of k1 pixels; and the blue areas represent the pixels available in the sparse image ∼ I 1 k 2 H of the band B′12 due to the camera's displacement on the horizontal axis of k2 pixels.

Figure 6. Architecture of our composition method.


3.5.1. Case of the Composition of Two Sparse Images
For two bands, we obtain six possible compositions for the different values of the displacement scalar on the two axes H and V. The following algorithm shows how the composition of two bands is achieved:
• The camera takes a first snapshot from which we obtain a mosaic I MSFA ;
• The camera moves k pixel(s) on the D axis and takes a second snapshot, from which a second mosaic I MSFA kD is obtained;
• The separation into sparse images is performed on the mosaics I MSFA and I MSFA kD with Formula (2);
• The addition of the two sparse images is performed, such that the composite sparse image ∼ I i c is obtained.
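Under simplifying assumptions (a hypothetical 2 × 4 eight-band moxel, perfect one-pixel registration, and periodic borders via np.roll), the two-band composition can be simulated end to end:

```python
import numpy as np

MOXEL = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])  # hypothetical 2 x 4 eight-band moxel

def moxel_map(shape):
    """Tile the moxel pattern over a sensor of the given shape."""
    h, w = shape
    reps = (h // MOXEL.shape[0] + 1, w // MOXEL.shape[1] + 1)
    return np.tile(MOXEL, reps)[:h, :w]

def snapshot(scene_band, dy=0, dx=0, band=1):
    """Sparse image of `band` recorded after displacing the camera by
    (dy, dx) pixels: sensor pixel (y, x) observes scene (y+dy, x+dx).
    The result is realigned to scene coordinates so it can be added."""
    mos = moxel_map(scene_band.shape)
    observed = np.roll(scene_band, (-dy, -dx), axis=(0, 1))
    sparse = np.where(mos == band, observed, 0.0)
    return np.roll(sparse, (dy, dx), axis=(0, 1))

# Two-band composition 'ab': main snapshot plus a 1-pixel H displacement.
scene = np.random.default_rng(0).random((8, 8))
composite = snapshot(scene) + snapshot(scene, dx=1)
```

The composite sparse image holds twice as many non-null pixels as either snapshot alone, each equal to the true scene value at its position.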

Figure 7. Composition scheme of the two sparse images of the I MSFA and I MSFA 1H mosaics.

Figure 8 shows the six possible compositions of band B1 with different values of the displacement scalar k j on the V and H axes. The new composition allows for more pixels and better redistribution to minimize the non-null pixel distance in the composite sparse images ∼ I i c .


Figure 8. Pixel distribution in the composite sparse image ∼ I 1 c .

Figure 9 illustrates the spatial distribution of pixels of certain three- and four-band compositions. The blue area represents the H-axis displacement and the yellow area represents the V-axis displacement.

Figure 9. Pixel distribution in the composite sparse image ∼ I 1 c .


Figure 11. Qualitative assessment of the Butterfly8 image according to different compositions.

Figure 12. Qualitative assessment of the Butterfly image according to different compositions.

Figure 13. Qualitative assessment of the Party image according to different compositions.

Figure 14. Tri-CCD system diffuses light to three sensors.


Table 1. PSNR comparison between single and multiple snapshots.

Table 2. SAM comparison between single and multiple snapshots.

Table 3. SSIM comparison between single and multiple snapshots.

Table 4. RMSE comparison between single and multiple snapshots.