Tomographic Reconstruction: General Approach to Fast Back-Projection Algorithms

Addressing contemporary challenges in computed tomography (CT) demands precise and efficient reconstruction. This necessitates the optimization of CT methods, particularly by improving the algorithmic efficiency of the most computationally demanding operators: forward projection and backprojection. Every measurement setup requires a unique pair of these operators. While fast algorithms for calculating forward projection operators are adaptable across various setups, they fall short in three-dimensional scanning scenarios. Hence, fast algorithms are imperative for backprojection, an integral part of all established reconstruction methods. This paper introduces a general method for calculating backprojection operators in any measurement setup: a versatile technique for transposing summation-based algorithms, which rely exclusively on addition operations. The proposed approach transforms algorithms designed for forward projection into algorithms for backprojection, with the latter maintaining the asymptotic algorithmic complexity of the former. Using this method, fast algorithms for both forward projection and backprojection have been developed for 2D few-view parallel-beam CT as well as for 3D cone-beam CT. The theoretically substantiated complexity values for the proposed algorithms align with their experimentally derived estimates.


Introduction

Problem Relevance
Computed tomography (CT) is a non-destructive X-ray technique used to probe the internal morphological structure of objects [1,2]. It plays a pivotal role in diverse fields such as medicine [3][4][5], industry [6][7][8], safety protocols [9,10], and scientific exploration, particularly in scrutinizing the morphology of promising functional materials [11]. Despite over 80 years of development in computed tomography algorithms [12,13], the pursuit of innovation and optimization remains ongoing. This is driven by various factors, including the imperative to reduce radiation dose in medical applications [14,15], augment sensitivity in screening setups [16], and refine spatial resolution in scientific [17], industrial [18], and medical [19,20] applications. Notably, CT has recently been harnessed for real-time imaging of dynamic processes [21,22]. This breakthrough is made possible only by improving classical tomographic reconstruction methods.
Modern challenges addressed by CT impose strict constraints on both reconstruction time and accuracy. Increasing spatial resolution leads to an augmented dataset, demanding enhanced computational speed for reconstruction algorithms. This goal can be pursued through two routes: optimizing the reconstruction algorithms and/or increasing computational resources. Given the cost implications of bolstering computational power in tomographic systems, improving algorithmic efficiency stands out as the more attractive avenue.
The core of tomographic reconstruction algorithms lies in forward projection and backprojection. While both of them are linear operators, they depend on the CT setup. Various measurement setups, detailed in [23], are employed in tomograph designs. Each scheme corresponds to its specific pair of matched forward projection and backprojection operators. The scheme's parameters encompass the scanning trajectory, probing beam shape, and the dimensions and layout of the position-sensitive detector. The scanning trajectory refers to the path followed, depending on whether the object or the emitter-detector pair is stationary. The probing beam can be conical [24], fan-shaped [25], or parallel [26]. Circular [27] and helical [28] trajectories are the most commonly used for the emitter-detector pair. Some specialized tomographs utilize more intricate scanning methods [29], and non-planar detectors are also utilized [30].
Calculating the forward projection and backprojection operators is considered to be the most computationally demanding aspect of tomographic reconstruction techniques. Streamlining this process holds great promise for speeding up the entire reconstruction process. The acceleration of classical CT reconstruction algorithms is an ongoing problem, with main efforts directed towards effective GPU implementations [31,32], although new fast algorithms are being developed as well [33,34]. There are fast algorithms for calculating the forward projection operator for various tomographic setups. For example, a fast approximate computation algorithm with asymptotic complexity of Θ(N^3 √N) addition operations, where N is the linear size of the reconstruction, has been proposed for three-dimensional computed tomography [35]. Notably, this algorithm can be adapted to any scanning scheme. Regrettably, these findings, on their own, do not facilitate the development of fast tomographic reconstruction methods for cone-beam and other three-dimensional scanning setups. This is due to the fact that while there are reconstruction algorithms that bypass the use of the forward projection operator (such as the Feldkamp algorithm [36]), there are no algorithms that do not employ the backprojection operator. To speed up reconstruction methods, fast algorithms for the forward projection operator, akin to those in [35], must be paired with equally fast algorithms for backprojection. These algorithms should be applicable across all scanning schemes. To address this challenge, this paper introduces a general approach to the construction of fast algorithms for calculating the transpose of operators. The proposed fast transposition algorithm efficiently handles a category of linear operators represented as binary matrices, encompassing cases where the application result relies solely on addition operations.

Related Works
Since the advent of CT, various approaches to speeding up tomographic reconstruction have emerged. They can be broadly categorized into three groups: those utilizing the Fast Fourier Transform (FFT), methods employing hierarchical decomposition of the image to be recovered, and approaches applying hierarchical decomposition of linear integrals. Many researchers have proposed incorporating the FFT in tomographic reconstruction algorithms. These algorithms can be fast due to projection and convolution operations in Fourier space, provided the transition to Fourier space does not completely offset this advantage. Achieving precise reconstruction through the Fourier transform demands an infinite number of projections [37]. In practice, however, the available number of projections is finite, thus the accuracy of the result depends on the approximation schemes embedded in the algorithms. As noted by Natterer [38], the presence of artifacts in reconstructed images, stemming from the Fourier transform application, reduces their practical effectiveness. Natterer delves into an analysis of the origins of these distortions in his work [38].
Various approximation schemes are tailored to maximize the accuracy of the resulting reconstruction, depending on the specific instrument configuration. In the case of parallel beams, where the three-dimensional problem can be efficiently deconstructed into a series of independent two-dimensional ones, Mersereau and Oppenheim [37] advocated for reducing the approximation error by substituting the uniform polar coordinate grid in Fourier space with a grid of concentric squares. For cone beams, aiming to refine reconstruction precision, Dusaussoy [39] proposed employing a grid of concentric cubes. Averbuch et al. [40] later introduced the pseudo-polar Fourier transform for fast computation of both forward and inverse Radon transforms, highlighting the Radon transform's potential in addressing parallel beam tomography challenges. In parallel schemes, an optimized Kaiser-Bessel interpolation was suggested for an algorithm utilizing the non-uniform fast Fourier transform [41]. Sullivan [42] proposed improving reconstruction precision through the application of gridding methods in algorithms tailored for scenarios involving both parallel and fan beams. Subsequently, Arcadu et al. [43] further refined this algorithm.
The widely used FDK (Feldkamp, Davis and Kress) reconstruction algorithm [36] is the go-to fast method for cone beams. Its primary limitation lies in the need for a gradual change in morphological structure along the rotational axis [39]. However, incorporating iterative strategies alongside the FDK algorithm [44] offers some flexibility in this regard. In conclusion, two key points emerge. Firstly, there is currently no universal approach to speeding up algorithms with the FFT. Secondly, all known methods in this category are approximate, and their precision is a subject of scrutiny.
The following two approaches to fast reconstruction rely on hierarchical decomposition. In the first, the recoverable volume is decomposed and then the sub-volumes are aggregated. The second approach employs hierarchical decomposition of linear integrals [45,46] along with repeated use of common subsums. For example, in a parallel beam scheme [47], the square recoverable image is split into four images by vertical and horizontal lines through the center. Each of these images is reconstructed individually. This halves the linear size of the image and reduces the necessary projections by half compared to reconstructing a full-sized image, thereby reducing the required operations. This approach is applied recursively. The computation speed depends on the stopping point in the hierarchical decomposition of the volume to be recovered. Reconstruction accuracy is crucially affected by the stopping point, the accuracy of the interpolation used in the projection aggregation steps for reconstructing the small images, and the method used to merge the reconstructed small images. The hierarchical volume decomposition approach is not universally applicable. Different algorithms are needed and have been proposed for various measurement schemes. For a circular fan beam CT system, the algorithm is detailed in [48]. For a circular cone beam CT system, refer to [49,50]. Additionally, for a helical cone beam CT system, the algorithm is outlined in [51].
The hierarchical decomposition of linear integrals [45,46] presents a fundamentally different approach. It speeds up the computation of the linear operators for both forward projection and backprojection, crucial elements in reconstruction algorithms. The key concept is based on the observation that in discrete space, lines share common subsequences of pixels, referred to as patterns, rather than intersecting at a single point. Clearly, reusing partial sums from these pattern intersections can significantly reduce the computational complexity of the linear operators. To achieve fast reconstruction algorithms, it suffices to compute partial sums for a subset of possible patterns. The choice of patterns depends on the specific measurement scheme, emphasizing the need for universal transposing methods. In 1992, M. Brady and W. Yong introduced a method for fast computation of an approximate discrete Radon transform (Hough transform) on a plane, based on repeated use of common sub-sums [45]. This method, credited to Brady and Yong, appears to have been independently developed by Walter Götz, who published it in his 1993 dissertation in German. Götz's work is referenced in an English-language article by Götz and Druckmüller published in 1995 [46]. Although the Brady-Yong algorithm may not provide the most accurate approximation of straight lines by discrete patterns, it maintains a bounded approximation error. Brady and Yong's original work [45] provided an initial estimate for the deviation of pattern coordinates from the ideal straight line during summation. Subsequently, refined hypotheses were proposed [52,53], leading to the precise upper bound of the approximation error [54]. A recent algorithm, ASD2, has emerged for fast and accurate computation of the discrete Radon transform on a plane [55]. This approach [55] shares similarities with the Brady-Yong method but incorporates the most accurate discretization of lines (among possible discretizations), resulting in improved accuracy compared to the Brady-Yong algorithm. In 1998, T.-K. Wu and M. Brady first extended the Brady-Yong algorithm to the three-dimensional case [56]. The work [57] marked the first exploration of the precision of dyadic approximations in three-dimensional space. In 2020, a significant advancement was made by applying the Method of Four Russians, which modified the Wu-Brady technique to calculate sums over Θ(N^3) lines, rather than the excessive Θ(N^4) lines in the original Wu-Brady algorithm designed for three-dimensional tomography tasks (where N denotes the linear size of the reconstruction) [35]. This resulted in a fast algorithm for computing the forward projection operator. A notable advantage of this algorithm [35] lies in its versatility across CT schemes.
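The dyadic merging at the heart of the Brady-Yong recursion can be sketched as follows. This is a minimal illustrative variant, not the authors' implementation: each pattern with total vertical shift s over n columns is assembled from two half-width patterns with internal shift ⌊s/2⌋, the right half being displaced by the remaining ⌈s/2⌉ rows; a cyclic shift (`np.roll`) is used at the image boundary for simplicity.

```python
import numpy as np

def fht(cols):
    """Dyadic FHT sketch. cols[x][y] is the pixel at column x, row y.

    Returns h with h[s][y] = sum over x of the pixels on the dyadic
    pattern with total vertical shift s, started at row y (cyclic in y).
    The image width len(cols) must be a power of two.
    """
    n = len(cols)
    if n == 1:
        return [cols[0]]
    left = fht(cols[: n // 2])
    right = fht(cols[n // 2:])
    out = []
    for s in range(n):
        s_half = s // 2            # shift accumulated inside each half
        disp = s - s_half          # extra displacement of the right half
        out.append(left[s_half] + np.roll(right[s_half], -disp))
    return out

# For a 4x4 identity image, the pattern with maximal shift (s = 3)
# started at row 0 follows the main diagonal.
img = np.eye(4)
h = fht([img[:, x] for x in range(4)])
assert h[3][0] == 4.0
```

Each of the log N merging levels performs Θ(N^2) additions, giving the familiar Θ(N^2 log N) total for all N directions of one slope sign.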

Contributions
This work presents a novel general method for developing efficient algorithms to compute both forward projection and backprojection linear operators. Our method transforms fast algorithms designed for forward projection, reliant solely on addition operations, into efficient algorithms for backprojection. It is noteworthy that the computational complexities of the forward projection and backprojection operators are comparable. Furthermore, this method is versatile and applicable to computing the transpose of any operator represented by a binary matrix. The algorithm for computing the transpose of the operator exhibits a similar asymptotic computational complexity to that of the forward operator.
This work combines the Method of Four Russians with the Brady-Yong algorithm for fast computation of the forward projection operator in arbitrary measurement schemes in the 2D case. By integrating these approaches with a universal method for transposing summing algorithms, the study devises a fast algorithm for calculating backprojection operators for both parallel and fan beams within the circular CT scheme (2D case). The computational complexity of these algorithms depends on the chosen stopping point in calculating partial sums over patterns. Experimental results validate the theoretically grounded potential for fast reconstruction methods that involve computing the backprojection operator using the proposed approach of transposing algorithms.
In previous research [35], a combination of the Wu-Brady algorithm and the Method of Four Russians was utilized for the circular cone beam CT system (3D case). However, the construction of the backprojection operator was not addressed in that work. The proposed method, which transforms fast algorithms for computing forward projection operators into efficient ones for computing backprojection operators, was applied to the algorithm described in the same work [35]. An assessment of the complexity of the transposed algorithm from [35] was conducted. Implementing our method for transposing accumulative algorithms opens the door to obtaining efficient algorithms for computing backprojection operators from those for computing forward projection operators, applicable to any CT scheme in both two-dimensional and three-dimensional scenarios.

Paper Outline
The subsequent section defines the reconstruction problem and gives an overview of the employed methodologies. In the third section, we detail and justify a novel method for transposing summing algorithms. This method allows for deriving an algorithm to compute the transpose of an operator represented by a binary matrix from the original operator computation algorithm. The evaluation of the algorithmic complexity of the forward and transposed operator computations is presented within Section 3.3. In the fourth section, we apply this transposition method to established fast algorithms for computing forward projection operators in both 2D and 3D scenarios. Section 4 introduces novel fast algorithms for computing both forward projection and backprojection operators based on the fast Hough transform (FHT), accompanied by a complexity assessment. The 2D case is dealt with in Section 4.2 and the 3D case is the subject of Section 4.4. These algorithms are derived through the application of our proposed method for transposing algorithms, and a thorough complexity analysis is performed. The fifth section delves into the experimental validation of our proposed method of algorithm transposition. Finally, the paper concludes with a discussion highlighting the advantages of the proposed approach.

CT Reconstruction: Problem Statement and Solutions Overview
CT reconstruction entails creating a digital representation of an object based on a collection of measured X-ray images or tomographic projections. This digital depiction represents a discrete spatial distribution of the linear attenuation coefficient of X-ray radiation.

Projection Image Model in Parallel-Beam Circular CT
The classical mathematical model for generating a CT projection image is based on the following assumptions:
- Monochromatic radiation is used for scanning;
- The imaging process is approximated with geometric optics;
- There are no radiation sources (primary or secondary) within the examined sample;
- Radiation attenuation depends on the absorptive properties of the material and follows the Bouguer-Beer-Lambert law;
- The detector only records the attenuated probing radiation; the detector cell has an infinitesimal size, a linear response function, and unit sensitivity.
The basic scheme for measuring projections with a parallel beam on an object is illustrated in Figure 1. When an object is probed with a parallel beam and the axis of rotation is perpendicular to the beam, we can analyze the CT problem on a plane without loss of generality. Consider a parallel set of planes perpendicular to the rotation axis. Measurements in each of these planes are independent. Let us take advantage of this and focus on a single plane. The Cartesian coordinate system x0y in the plane aligns with the center of rotation of the source-detector system, and the spatial distribution of the linear attenuation coefficient µ(x, y) of the probing radiation is described by a finite function. In this coordinate system, we will use the standard parameterization of a line: (ρ, φ). All rays in the probing beam at one projection position share the same φ coordinate. The image is captured by a planar position-sensitive detector located perpendicular to the probing direction. The ρ coordinate precisely locates the detector cell. For an infinitely thin X-ray beam at an angle φ reaching the detector at ρ, the radiation attenuation law through the object is expressed as:

I = I_0 exp(−∫_{L(ρ,φ)} µ(x, y) dl), (1)

where I_0 is the source's radiant exitance, I is the irradiance of the detector cell, and L(ρ, φ) is the line along which the beam propagates. Normalizing by I_0 and taking the logarithm yields the function g(ρ, φ), referred to as the ray integral:

g(ρ, φ) = ln(I_0 / I) = ∫_{L(ρ,φ)} µ(x, y) dl. (2)

Through tomographic measurements, we acquire linear integrals of the function µ(x, y) along each straight line L(ρ, φ). This process, known as the Radon transform, maps a function from Euclidean space to its set of linear integrals. Radon introduced an explicit inversion formula for an infinite set of straight lines in 1917 [58]. Building on this, the problem of tomographic reconstruction involves recovering the function µ(x, y) when linear integrals are known for a finite set of straight lines. The arrangement of these lines is referred to as the measurement scheme, which is determined by the tomograph's design, including the shape of the probing beam and the scanning scheme. Consequently, the reconstruction process must align with the chosen measurement scheme.
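As a quick numeric check of this linearization, a short sketch with made-up values for the source exitance and the attenuation samples along one ray:

```python
import numpy as np

# Hypothetical values for illustration only.
I0 = 1000.0                                  # source radiant exitance
mu = np.array([0.0, 0.2, 0.5, 0.2, 0.0])     # attenuation samples along the ray
dl = 1.0                                     # sampling step along the ray

I = I0 * np.exp(-mu.sum() * dl)              # Bouguer-Beer-Lambert law
g = np.log(I0 / I)                           # ray integral g(rho, phi)

assert np.isclose(g, mu.sum() * dl)          # linearization recovers the integral
```

Taking the logarithm of the normalized intensity turns the multiplicative attenuation model into the linear integral that all subsequent reconstruction steps operate on.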
In practice, synchrotron radiation sources primarily employ monochromatic probing. Medical and industrial tomographs, on the other hand, utilize polychromatic radiation. Different approaches have been suggested to linearize the relationship between measurement results and the function describing attenuation. These methods vary from those needing extra calibration measurements [59,60] to fully automated techniques [61,62]. Note that using parallel beams for scanning is primarily a feature of synchrotrons. In laboratory settings, achieving a parallel scheme necessitates the incorporation of extra optical components in the tomograph setup. Conversely, when employing an X-ray tube to generate the beam without extra optical elements, a cone beam is formed. In this setup, the source is situated at the apex of the cone, while the detector rests in the plane of its base. The object being probed is located within the cone. The collection of rays composing a tomographic projection under the current angle in a cone scheme differs from that in a parallel scheme for the same angle.
Consider a circular scanning path. In tomographic research, measurements are taken along a specified finite number of directions C, which determine the set of projection angles Φ = {φ_i}, i ∈ Z_{1,C}. For each projection angle, the linearized measurements are recorded as a vector g(φ) = (g_1, ..., g_N)^T, where N represents the number of cells in the detector array (in the two-dimensional scenario). The measurements for all angles can be consolidated into a single vector G = (g(φ_1), ..., g(φ_C))^T = (g_i : i ∈ Z_{1,N·C})^T.
To perform reconstruction, we first need to discretize the measured data. For the discretization of µ, the reconstructed area is divided into a regular grid of square pixels. The number of pixels in the horizontal direction is the same as in the vertical direction, and it matches the number of detector cells N. This grid contains a total of N² elements, which can be expressed as a vector M = (µ_{1,1}, ..., µ_{1,N}, µ_{2,1}, ..., µ_{N,N})^T = (µ_i : i ∈ Z_{1,N²})^T.
The vector M represents a 2D slice of the reconstructed 3D image. Ray integral values (2) are approximated using sums. The model for generating projection data is expressed in matrix form:

G = W M, (3)

where W is the projection matrix of size N·C × N². The elements 0 ≤ w_{i,j} ≤ 1 in this matrix specify each pixel's contribution to the sum. This matrix W defines the measurement setup.
The dimensions of the projection matrix depend on the number of detector cells and the quantity of projection angles. Its configuration is influenced by factors like the scanning path, probe shape, the number of projection angles, and their absolute values.
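To make the discretization concrete, here is a toy sketch with a hypothetical N = 2 image and C = 2 projection angles, using a hand-built binary W (these particular values are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2x2 image (N = 2), flattened row-wise into the vector M.
M = np.array([1.0, 2.0, 3.0, 4.0])

# Binary projection matrix W of size N*C x N^2 for C = 2 angles:
# rows 0-1 integrate along image rows (rays at 0 degrees),
# rows 2-3 integrate along image columns (rays at 90 degrees).
W = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)

G = W @ M                      # projection data, model (3)
assert np.array_equal(G, [3.0, 7.0, 4.0, 6.0])
```

Each row of W picks out the pixels crossed by one ray, so applying W to the flattened image reproduces the discretized ray sums.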

CT Reconstruction Algorithms
In the formulation (3), the CT problem involves reconstructing a digital image of the studied object M given the known projection matrix W and the vector of linearized measured values G.
The matrix W is a sparse non-negative matrix, with elements representing weights calculated based on the chosen ray-pixel intersection model [23]. We will treat the projection matrix W as a binary matrix for the forward projection operator in the natural basis of the introduced Cartesian coordinate system x0y. The value w_{i,j} of the projection matrix element at indices i, j equals 1 if and only if the X-ray beam intersects the pixel of the reconstructed object's cross-section at coordinates i, j.
Various approaches have been proposed for reconstructing images from measured projections. Integral methods rely on the numerical implementation of the Radon transform inversion formulas. The most widely used method, found in most medical tomographs, is the convolution and backprojection technique, known as Filtered Back-Projection (FBP) [63]. Buzug et al. [23] demonstrated that with the matrix representation from (3), the following relationship can be derived:

M = (W^T W)^{−1} W^T G = W^T (W W^T)^{−1} G,

where W^T represents the backprojection operator matrix. The matrices (W^T W)^{−1} and (W W^T)^{−1} are filtering operators that can be applied either after or before the backprojection, respectively. The convolution and backprojection method is a fast two-step approach. Initially, measured projections are convolved with a filter [64]. Subsequently, a backprojection operation is conducted. Proposed optimizations have targeted hardware solutions; Myagotin et al. presented an implementation of the method on multi-core processors [65]. The method operates with low memory requirements due to its sequential execution. However, it requires a substantial number of projections. To ensure accurate reconstruction, projection angles should be evenly distributed within the 0 to 180-degree range, with the number of angles meeting the Nyquist criterion. Otherwise, additional data interpolation may be necessary [66], potentially impacting reconstruction accuracy. In cases of limited projections, an algebraic reconstruction approach has demonstrated effectiveness.
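The matrix relationship can be checked numerically on a small random example. The sketch below assumes a full-column-rank binary W (an assumption made for the check, not guaranteed for a real setup) and verifies only the first identity, backprojection followed by filtering:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random binary projection matrix; retry until it has full column rank.
W = (rng.random((12, 4)) < 0.5).astype(float)
while np.linalg.matrix_rank(W) < 4:
    W = (rng.random((12, 4)) < 0.5).astype(float)

M = rng.random(4)
G = W @ M                                     # noiseless projections

# Filtering after backprojection: M = (W^T W)^{-1} W^T G
M_rec = np.linalg.inv(W.T @ W) @ (W.T @ G)
assert np.allclose(M_rec, M)
```

In the noiseless, overdetermined case this least-squares form recovers the image exactly; practical FBP replaces the explicit filtering matrix with a convolution kernel.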

Algebraic Reconstruction: SIRT (Simultaneous Iterative Reconstruction Technique)
The algebraic approach to CT reconstruction, used alongside FBP [67], treats (3) as a system of linear algebraic equations, seeking a solution for M. Due to the system's high dimensionality, direct methods for solving linear systems of equations are not feasible. Specialized algorithms have been proposed for this purpose [68,69]. In contrast to the algebraic reconstruction techniques proposed by Gordon et al. [68], where solution vector values are updated sequentially, the SIRT method, introduced by Gilbert in 1972 [69], updates all coordinates of the solution vector at each step. For each coordinate, the current value is calculated, taking into account all registered contributions. The solution at each iteration in matrix form is updated as follows:

M^{(p+1)} = M^{(p)} + λ_p Q W^T R (G − W M^{(p)}),

where λ_p is the relaxation parameter [70] and W^T represents the matrix of the backprojection operator. The diagonal matrices R and Q are of dimensions N·C × N·C and N² × N², respectively. Their elements are computed using the following formulas: r_{ii} = 1/∑_j w_{ij} and q_{jj} = 1/∑_i w_{ij}. When the number of projection angles and/or the angle range are restricted, employing the Total Variation (TV) method [71] for regularization yields a highly precise result. In such cases, the reconstruction is acquired by solving the convex optimization problem:

M* = argmin_M ( ||W M − G||²₂ + α TV(M) ),

where α is the regularization parameter [71]. The optimized expression, and thus the iteratively obtained solution, includes both forward projection and backprojection operations.
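A minimal SIRT sketch following the update rule above, run on a hypothetical 4-ray system (with a rank-deficient W the iterate converges to a solution of W M = G rather than to a unique image):

```python
import numpy as np

def sirt(W, G, n_iter=200, lam=1.0):
    """SIRT update: M <- M + lam * Q W^T R (G - W M)."""
    R = np.diag(1.0 / W.sum(axis=1))   # r_ii = 1 / sum_j w_ij
    Q = np.diag(1.0 / W.sum(axis=0))   # q_jj = 1 / sum_i w_ij
    M = np.zeros(W.shape[1])
    for _ in range(n_iter):
        M = M + lam * (Q @ (W.T @ (R @ (G - W @ M))))
    return M

W = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
G = W @ np.array([1.0, 2.0, 3.0, 4.0])

M_rec = sirt(W, G)
assert np.allclose(W @ M_rec, G)       # measured projections are reproduced
```

Note that each iteration applies the forward projection W once and the backprojection W^T once, which is exactly why fast algorithms for both operators govern the cost of the whole method.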
While the algebraic approach to CT reconstruction demands more computational resources compared to the convolution and backprojection method, its significant advantage lies in its lack of strict limitations on the number and quality of tomographic projections, as well as its independence from a uniform distribution of observed projection angles. Fast reconstruction using iterative algorithms can be achieved through two approaches. The first involves augmenting computational resources and devising methods for parallel processing [72].
The second focuses on designing and utilizing algorithms with reduced computational complexity. While, for specific measurement schemes, there are efficient algorithms for computing the forward projection operator [52], fast computation of the backprojection operator is not guaranteed.

Neural Network-Based Reconstruction Techniques
The development of neural network-based approaches to tomographic reconstruction began in 1995 with the landmark work by Kerr et al. [73]. They demonstrated that a neural network could produce tomographic images of comparable accuracy to those used in its training. This was based on the neural network's capacity to discern and evaluate intricate functional relationships. The neural network was trained using reconstructions obtained through the FBP method. Nowadays, the integration of neural network models into reconstruction algorithms facilitates high-quality reconstructions in few-view CT [74], especially in cases when the full range of angles is limited [75,76], or when the measured projections are noisy [77]. The method's groundbreaking applications, such as tomographic measurements with nanometer resolution, real-time CT, and the study of dynamic processes, pose challenges for traditional methods. In neural network approaches, the operations of forward projection and backprojection remain the mathematical cornerstone.

Fast Approximation Schemes
Efficient CT reconstruction algorithms can be built with fast approximate computational schemes that utilize the Fast Hough Transform (FHT), also known as the Fast Discrete Radon Transform [45,[78][79][80][81]. The Brady-Yong algorithm [45] approximates straight lines using discrete dyadic patterns. While these patterns enable optimized summation calculations, they exhibit some degree of approximation inaccuracy. For comprehensive insights into the general properties and practical applications of the FHT in 2D cases, refer to [82,83].
In [84], the FHT was proposed for speeding up the Simultaneous Algebraic Reconstruction Technique (SART) [85] in 2D reconstruction. The authors applied the FHT both for the forward projection operator calculation of residuals and for the transposed operator calculation of corrections in the iterative scheme. This led to a reduction in the asymptotic complexity of a single iteration from Θ(N^3) to Θ(N^2 log N). Additionally, in [86], a modification of the FBP algorithm, which employs the FHT to speed up the most computationally intensive step (backprojection), is introduced. This modification allows for image reconstruction with Θ(N^2 log N) addition operations and Θ(N^2) multiplication operations.
In 3D CT reconstruction schemes, ref. [35] demonstrated that summing over Θ(N^3) patterns can be computed in Θ(N^{7/2}) addition operations by utilizing a generalization of the FHT for 3D introduced by Ershov et al. [87]. However, ref. [35] provides only a theoretical estimate and a general outline of the algorithm without a specific implementation, and the algorithm for the fast computation of the transposed operator was not discussed.
Thus, reducing the constant in the computation complexity of the forward projection algorithm and devising fast schemes for the transposed operator will substantially improve the efficiency of various CT reconstruction algorithms.

Notations and Definitions
Let B represent a Boolean matrix defined as

B = (b_{y,x}), b_{y,x} ∈ {0, 1}, y ∈ Z_{1,H}, x ∈ Z_{1,W},

where y denotes the index of the row in matrix B, x represents the index of the column in matrix B, H ≡ H(B) signifies the number of matrix rows, and W ≡ W(B) indicates the number of matrix columns. The number of ones in matrix B is denoted as C_1(B) and can be computed using the formula:

C_1(B) = ∑_{y=1}^{H} ∑_{x=1}^{W} b_{y,x}.

The count of summations for a Boolean matrix B is denoted as C_Summ(B) and is calculated as follows:

C_Summ(B) = C_1(B) − H(B).

The number of duplications for a Boolean matrix B is denoted as C_Copy(B) and is determined as

C_Copy(B) = C_1(B) − W(B).

Let dec_p(•) represent a predefined decomposition of a Boolean matrix into the product of p Boolean matrices:

dec_p(B) = (B_p, B_{p−1}, ..., B_1), B = B_p · B_{p−1} · ... · B_1,

where B_i, i ∈ Z_{1,p}, are Boolean matrices of compatible dimensions. In general, we assume dec_1(B) ≡ B.
Similar related functions C_1(dec_p(•)), H(dec_p(•)), W(dec_p(•)), C_Summ(dec_p(•)) and C_Copy(dec_p(•)) for the decomposition of a Boolean matrix into the product of p Boolean matrices are computed by summing the corresponding function over all matrices in the decomposition:

C_1(dec_p(B)) = ∑_{i=1}^{p} C_1(B_i), H(dec_p(B)) = ∑_{i=1}^{p} H(B_i), W(dec_p(B)) = ∑_{i=1}^{p} W(B_i),

C_Summ(dec_p(B)) = ∑_{i=1}^{p} C_Summ(B_i), C_Copy(dec_p(B)) = ∑_{i=1}^{p} C_Copy(B_i).
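These counts are straightforward to compute directly from a Boolean matrix; a small sketch with illustrative helper names:

```python
import numpy as np

def C1(B):     return int(B.sum())               # number of ones
def CSumm(B):  return C1(B) - B.shape[0]         # C_1(B) - H(B)
def CCopy(B):  return C1(B) - B.shape[1]         # C_1(B) - W(B)

def dec_count(dec, f):
    """Sum a per-matrix count over all factors of a decomposition."""
    return sum(f(B) for B in dec)

B = np.array([[1, 1, 0],
              [0, 1, 1]])
assert (C1(B), CSumm(B), CCopy(B)) == (4, 2, 1)
assert dec_count([B, B.T], C1) == 8
```

For instance, each row of the 2 × 3 example contains two ones, so computing its row sums takes one addition per row, matching C_Summ(B) = 4 − 2 = 2.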

Correspondence of Decompositions for Forward and Transposed Operators in Boolean Matrix Products
Let us consider the approximation of ray sum values in matrix form as shown in Equation (3). We use a linear operator

B : R^m → R^n, M ↦ B M, (17)

where B is the n × m matrix of the operator B. The matrix B of this operator is Boolean within the previously introduced Cartesian coordinate system x0y (n, m ∈ Z_{1,∞}). Without loss of generality, we further assume that the matrix B of the operator B and all matrices dec_p^i(B) in the decomposition do not contain zero rows and columns. This is because, if they did, the algorithmic complexity of computing B M could be reduced by excluding a portion of the input data M from consideration. The transposed operator of B defined by (17), denoted as B^T, is given by the matrix B^T:

B^T : R^n → R^m.

For every decomposition dec_p(B) of the matrix B representing the forward operator B, there is a corresponding decomposition dec_p(B^T) of the matrix B^T for the transposed operator B^T. This is computed by considering the transposition of matrix products:

B^T = (B_p · B_{p−1} · ... · B_1)^T = B_1^T · B_2^T · ... · B_p^T, so dec_p(B^T) = (B_1^T, B_2^T, ..., B_p^T).

Let us label an algorithm as summation-based if it relies solely on addition operations for data manipulation.
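The correspondence between the two decompositions can be checked directly. The sketch below uses a hypothetical two-factor decomposition; the factors are Boolean, though their product may contain multiplicities, and only the transposition identity matters here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical decomposition B = B2 . B1 of the operator matrix.
B1 = (rng.random((5, 4)) < 0.5).astype(int)
B2 = (rng.random((3, 5)) < 0.5).astype(int)
B = B2 @ B1

# The corresponding decomposition of the transposed operator reverses
# the factor order and transposes each factor: B^T = B1^T . B2^T.
assert np.array_equal(B.T, B1.T @ B2.T)
```

Reversing the factor order is what turns a forward summation scheme into a backprojection scheme: the first matrix applied becomes the last, with rows and columns exchanged.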
The algorithm for computing the result BM of applying the operator B defined in (17) to the input vector M, expressed as $BM = B_p(B_{p-1}(\cdots(B_1 M)))$, employs a sequential (row-wise) multiplication of the matrix components dec_p^i(B) ≡ B_i, i ∈ Z_{1,p}, in the decomposition dec_p(B) of the operator's matrix B with the input vector M. This process employs only summation; hence its algorithmic complexity is C_Summ(dec_p(B)).
In this algorithm, the j-th component of the vector $V^{(i)} = B_i V^{(i-1)}$ is computed as the product of the j-th row of the matrix B_i and the column vector $V^{(i-1)}$. This computation requires one addition operation less than the number of non-zero elements in the j-th row of the matrix B_i. Consequently, to compute all components of the vector $V^{(i)}$, $C_1(B_i) - H(B_i) = C_{Summ}(B_i)$ operations are performed. In total, the algorithm requires $\sum_{i=1}^{p} C_{Summ}(B_i) = C_{Summ}(dec_p(B))$ addition operations.
In a similar manner, the algorithm for computing the result of applying the transposed operator B^T to the input vector, achieved through sequential (row-wise) multiplication of the matrix components in the decomposition dec_p(B^T) of the matrix B^T for the operator B^T, is also summation-based, with an algorithmic complexity of C_Summ(dec_p(B^T)).
Now, let us determine the complexity relationship between the algorithms for computing the results of applying the forward and transposed operators when utilizing a pair of corresponding decompositions. Since the number of non-zero elements in the corresponding matrix components of the decompositions dec_p(B) and dec_p(B^T) for the matrices B and B^T of the forward and transposed operators is the same (i.e., C_1(dec_p(B)) = C_1(dec_p(B^T))), we obtain
$$C_{Summ}(dec_p(B^T)) - C_{Summ}(dec_p(B)) = \sum_{i=1}^{p} H(B_i) - \sum_{i=1}^{p} W(B_i). \quad (21)$$
For the decomposition dec_p(B), the relationships H(dec_p^i(B)) = W(dec_p^{i+1}(B)), i ∈ Z_{1,p−1}, hold due to the compatibility of the dimensions of the multiplied matrix components B_j, j ∈ Z_{1,p}, so equality (21) can be extended:
$$\sum_{i=1}^{p} H(B_i) - \sum_{i=1}^{p} W(B_i) = H(B_p) - W(B_1) = H(B) - W(B). \quad (22)$$
Combining (21) and (22), we derive the following:
$$C_{Summ}(dec_p(B^T)) = C_{Summ}(dec_p(B)) + H(B) - W(B). \quad (23)$$
Hence, the difference in algorithmic complexities when computing the results of applying forward and transposed operators using corresponding Boolean decompositions is a constant determined by the dimensions of the input and output vectors. This result is illustrated in Figure 2.
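The constant gap between the forward and transposed complexities can be checked numerically. The sketch below is our illustration (the helper names c1, c_summ, and c_copy mirror C_1, C_Summ, and C_Copy); it builds a toy two-factor decomposition of Boolean matrices and confirms that transposing every factor changes the summation count by exactly H(B) − W(B).

```python
import numpy as np

def c1(B):
    """C_1(B): number of ones in the Boolean matrix B."""
    return int(B.sum())

def c_summ(B):
    """C_Summ(B) = C_1(B) - H(B): additions in a row-wise product B @ v."""
    return c1(B) - B.shape[0]

def c_copy(B):
    """C_Copy(B) = C_1(B) - W(B): duplications of input entries."""
    return c1(B) - B.shape[1]

rng = np.random.default_rng(0)
# A toy decomposition dec_2(B): B = B2 @ B1, factors applied right to left.
B1 = (rng.random((6, 4)) < 0.5).astype(int)
B2 = (rng.random((5, 6)) < 0.5).astype(int)
dec = [B1, B2]                 # forward factors
dec_T = [B2.T, B1.T]           # corresponding decomposition of B^T

fwd = sum(c_summ(Bi) for Bi in dec)
rev = sum(c_summ(Bi) for Bi in dec_T)
H, W = B2.shape[0], B1.shape[1]          # H(B), W(B) for B = B2 @ B1
assert rev - fwd == H - W                # the constant gap in Eq. (23)
assert all(c_summ(Bi.T) == c_copy(Bi) for Bi in dec)
```

The last assertion records a useful identity: the additions performed by the transposed algorithm are exactly the duplications of the forward one, C_Summ(B^T) = C_Copy(B).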
Given that the precise computation of the forward or transposed operator requires no fewer than n addition operations, formula (23) implies that the asymptotic algorithmic complexities for the precise computation of forward and transposed operators are of the same order. The described method is a versatile approach for transposing the algorithm used to compute the forward operator. It can be implemented in the form of sequential multiplications of Boolean matrices, derived from a known computationally efficient decomposition of the forward operator matrix, with an input data vector. It enables the derivation of an algorithm for computing the transposed operator with a well-defined asymptotic complexity, which is on par with that of computing the forward operator.
Next, we will explore the practical implementations of this generalized transposition method in the context of summation-based algorithms for computing backprojection operators in CT reconstruction.

4. Inverting the Projection Operator Calculation Algorithm (FHT Approximation)

4.1. Fundamental 2D FHT Concepts
Consider a standard rectangular Cartesian coordinate system denoted as X0Y associated with an image. This image can be represented as a 2^n × 2^n, n ∈ Z_{1,∞}, matrix I_{2^n}. The bottom-left corner of the image corresponds to the origin 0, and each element of the image, known as a pixel, is depicted as a square with a side length of 1 (see Figure 3).
Let us consider the parametrization (s, t) of straight lines on the plane X0Y [81]. This defines a continuous straight line passing through points on the lines that bound the image (refer to Figure 3): -A predominantly vertical straight line (PVL) passes through points (s_C, 0) and (s_C + t_C, 2^n), situated on the straight lines that bound the image from above and below; -A predominantly horizontal straight line (PHL) passes through points (0, s_C) and (2^n, s_C + t_C), situated on the straight lines that bound the image from the left and right.
For simplicity (similar to [45]), we focus exclusively on predominantly vertical straight lines with a negative slope, t ≤ 0. A discrete straight line approximating such a continuous line is specified by integer shifts along the image boundaries. Here, s_D ∈ Z_{0,2^n−1} represents the integer shift at the pattern's beginning, and t_D ∈ Z_{0,2^n−1} denotes the integer shift of the pattern's end relative to its beginning (see Figure 3). Continuous straight lines extending beyond the image are approximated by the straight-line pattern defined as the pattern for the original continuous straight line within the region of the extended zero-padded image.
The creators of the FHT algorithm [45] proposed using a dyadic pattern as a discrete straight line, which can be defined both recursively [45] and analytically [53]. The properties of this pattern, including estimates of its maximum deviation from a continuous straight line, have been explored in [46,53,54].
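The recursive construction of a dyadic pattern can be sketched in a few lines. This is a minimal illustration under one common splitting convention (the lower half inherits shift ⌊t/2⌋, the upper half is the same sub-pattern displaced by ⌈t/2⌉); the function name dyadic_pattern is ours.

```python
def dyadic_pattern(n, t):
    """Per-row x-offsets of a dyadic pattern of height 2**n with total shift t."""
    if n == 0:
        return [0]
    half = dyadic_pattern(n - 1, t // 2)            # lower half: shift floor(t/2)
    return half + [d + (t - t // 2) for d in half]  # upper half: displaced by ceil(t/2)

p = dyadic_pattern(3, 5)
assert len(p) == 8 and p[0] == 0 and p[-1] == 5  # joins (0, 0) to (t, 2**n - 1)
assert all(b >= a for a, b in zip(p, p[1:]))     # offsets are monotone
```

Because each half recursively carries half the total shift, the pattern's endpoints match the continuous line exactly, while intermediate rows deviate by a bounded amount (see the deviation estimates in [46,53,54]).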

Two-Dimensional Operator Computation Algorithms
In the development of CT reconstruction algorithms, particularly in cases with a small number of angles (few-view CT) [88] and/or monitored reconstruction (monitored CT) [89], the process of summing over all directions may prove to be excessive (refer to the analysis in Section 4.2.5). Hence, we will now explore a more adaptable approach. This entails computing sums for patterns of length 2^k, k < n, followed by aggregating these partial sums to derive totals along all discrete straight lines in the specified set of directions.

Two-Dimensional Projection Algorithm with FHT Termination and Aggregation (for Arbitrary Set of Straight Lines)
The original Brady-Yong algorithm [45] calculates sums for all discrete straight lines passing through the image. The algorithm dirPFHT (Directed Partial Fast Hough Transform) is a modification of the Brady-Yong algorithm. For an input image I_{2^n} with dimensions 2^n × 2^n, it computes the output vector of sums s = (s_j), j ∈ Z_{0,q−1}, for q discrete straight lines parameterized in (s, t), defined as a set L = {(x_j, a_j)}, j ∈ Z_{0,q−1}. Main computation stages:
1. (Right) zero padding of the original image I_{2^n} to R_0;
2. Computing partial sums along patterns of length 2^k (using dirPFHTk);
3. Aggregating sums for each discrete straight line from L (using dirSUMk).
Algorithm 1 provides pseudocode for the two-dimensional projection algorithm with termination at the k-th iteration and aggregation.
Algorithm 1 Algorithm for forward FHT with termination and aggregation
1: procedure dirPFHT(L, I, n, k)
2:   R_0 ← zeropad(I)                                ▷ copy the original data
3:   R_k ← dirPFHTk(R_0, n, k)                       ▷ compute sums along patterns with length 2^k
4:   s_j ← dirSUMk(x_j, a_j, R_k, n, k) ∀j ∈ Z_{0,q−1}  ▷ aggregation along lines from L
5:   return s

Inside dirPFHTk, for each pattern length 2^i, i ∈ Z_{1,k}, and each end shift a, the loops run for y = 0 to 2^n − 2^i step 2^i (along the 0Y shift of the pattern beginning) and for x = 0 to 2^{n+1} − 1 (along the 0X shift of the pattern beginning), combining the two half-pattern sums through the circularly shifted index x_2 ← (x − ⌈a/2⌉) mod 2^{n+1}. The aggregation procedure dirSUMk(x, a, R_k, n, k) returns recSUMk(x, 0, a, n, R_k, n, k). The Algorithm 4 recSUMk recursively aggregates the precomputed partial sums over patterns of length 2^i, i ∈ Z_{k,n}, in R_k along a discrete straight line given by (x, a).

Algorithm 4 Aggregation
1: procedure recSUMk(x, y, a, i, R_k, n, k)
2:   if i = k then
3:     s ← R_k(x, y, a)
4:   else
5:     x_2 ← (x − ⌈a/2⌉) mod 2^{n+1}  ▷ circular shift
6:     s ← recSUMk(x, y, ⌊a/2⌋, i − 1, R_k, n, k) + recSUMk(x_2, y + 2^{i−1}, ⌊a/2⌋, i − 1, R_k, n, k)
7:   return s

We will now introduce a matrix representation for the algorithm dirPFHT with specified values of n, k, L.
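To make the interplay of the partial-sum and aggregation stages concrete, below is a compact NumPy sketch of ours (not the paper's implementation), using one splitting convention, lower half with shift ⌊a/2⌋ and upper half displaced by ⌈a/2⌉. Here partial_sums plays the role of dirPFHTk and rec_sum_k the role of recSUMk; the check confirms that terminating the FHT at iteration k and then aggregating reproduces the naive sums along the dyadic patterns.

```python
import numpy as np

def partial_sums(P, n, k):
    """dirPFHTk-style pass: R[a, y, x] holds the sum along the dyadic
    sub-pattern of length 2**i starting at (x, y) with end shift -a."""
    W = 2 ** (n + 1)
    R = P[np.newaxis, :, :].astype(np.int64)           # R_0: only a = 0
    for i in range(1, k + 1):
        Rn = np.zeros((2 ** i, 2 ** n, W), dtype=np.int64)
        for a in range(2 ** i):
            sh = a - a // 2                            # ceil(a/2)
            for y in range(0, 2 ** n, 2 ** i):
                lower = R[a // 2, y]
                upper = np.roll(R[a // 2, y + 2 ** (i - 1)], sh)  # circular shift
                Rn[a, y] = lower + upper               # one addition per x
        R = Rn
    return R

def rec_sum_k(R, x, y, a, i, n, k):
    """recSUMk: aggregate the ray sum for the line (x, a) from R_k."""
    W = 2 ** (n + 1)
    if i == k:
        return int(R[a, y, x])
    sh = a - a // 2
    return (rec_sum_k(R, x, y, a // 2, i - 1, n, k)
            + rec_sum_k(R, (x - sh) % W, y + 2 ** (i - 1), a // 2, i - 1, n, k))

def offsets(i, a):
    """Reference dyadic-pattern offsets, same splitting convention."""
    if i == 0:
        return [0]
    h = offsets(i - 1, a // 2)
    return h + [d + (a - a // 2) for d in h]

n, k = 3, 1
rng = np.random.default_rng(1)
img = rng.integers(0, 10, (2 ** n, 2 ** n))
P = np.zeros((2 ** n, 2 ** (n + 1)), dtype=np.int64)   # right zero padding
P[:, : 2 ** n] = img
R = partial_sums(P, n, k)
for a in range(2 ** n):
    off = offsets(n, a)
    for x in range(2 ** (n + 1)):
        naive = sum(int(P[j, (x - off[j]) % 2 ** (n + 1)]) for j in range(2 ** n))
        assert rec_sum_k(R, x, 0, a, n, n, k) == naive
```

The merge step costs one addition per stored entry, and aggregation adds 2^{n−k} − 1 additions per line, which is exactly the operation count analyzed in the next subsection.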

Matrix Representation of the 2D Projection Algorithm with FHT Termination and Aggregation
For i ∈ Z_{1,k}, the element R_i(x, y, a) of the tensor R_i stores the sum value along a pattern of length 2^i along the 0Y axis. Here, the pattern initiates at point (x, y), and the pattern's end is shifted by −a along the 0X axis. The tensor R_0 is expanded through zero-padding of the original image. Let us denote the set of indices used in the tensor R_i by $XYA^{(i)} = \{(x, y, a) : x \in Z_{0,2^{n+1}-1},\ y \in \{0, 2^i, \ldots, 2^n - 2^i\},\ a \in Z_{0,2^i-1}\}$. It is worth noting that $|XYA^{(i)}| = 2^{n+1} \cdot 2^{n-i} \cdot 2^i = 2^{2n+1}$. For all i ∈ Z_{0,n}, each element of R_i can be represented in terms of V^{(i)}. More precisely, let J^{(i)} be the bijective function mapping an index (x, y, a) ∈ XYA^{(i)} of an element of the tensor R_i to the index j of the corresponding element of the vector V^{(i)}; the inverse mapping is denoted H^{(i)} = (J^{(i)})^{-1}. A 2^{2n+1} × 2^{2n+1} Boolean matrix B_i allows for the calculation of $V^{(i)} = B_i V^{(i-1)}$. The value of an element B_i(p, t) for (p, t) ∈ Z_{0,2^{2n+1}−1} × Z_{0,2^{2n+1}−1} equals 1 if the element of V^{(i−1)} with index t is one of the two half-pattern sums of length 2^{i−1} that constitute the pattern sum with index p, and 0 otherwise (33), where (x_p, y_p, a_p) = H^{(i)}(p).
In each row and each column of the matrix B_i, there are exactly two ones; therefore, $C_1(B_i) = 2^{2n+2}$. According to the definition (9), the number of summations for the matrix B_i is $C_{Summ}(B_i) = 2^{2n+2} - 2^{2n+1} = 2^{2n+1}$. For some n, k, the discrete straight line given by (x, a) defines a set U_n^k(x, a) of indices of the elements in R_k whose values are employed to aggregate sums during the recursive call recSUMk. The aggregating matrix S is of size q × 2^{2n+1} and allows for the computation of the vector of sums $s = S V^{(k)}$. The values of the elements in S are determined by the set of discrete straight lines L: S(j, t) = 1 if t corresponds under J^{(k)} to an index in U_n^k(x_j, a_j), and 0 otherwise. According to the algorithm, each row of the matrix S contains precisely 2^{n−k} ones; hence $C_1(S) = q \cdot 2^{n-k}$, and following the definition in (9), the count of summations for the matrix S is $C_{Summ}(S) = q(2^{n-k} - 1)$. The sequential application of (32) and (37) leads to the matrix representation of dirPFHT, which takes the form $s = S B_k B_{k-1} \cdots B_1 V^{(0)}$. The algorithm's computational complexity, considering both (35) and (40), is determined as $k \cdot 2^{2n+1} + q(2^{n-k} - 1)$. Given that projection directions are uniformly distributed, let the parameter α ∈ (0, 1], α ∈ R, determine the proportion of directions compared to the maximum possible number (2^n for each type of lines). This results in $q = \alpha \cdot 2^{2n}$ distinct discrete straight lines for each type (α·2^n directions, each with 2^n shifts). Considering the four types of lines in the 2D scenario, the total number of summations, denoted as T^B(n, k, α), in the implementation (42) is $T^B(n, k, \alpha) = 4(k \cdot 2^{2n+1} + \alpha \cdot 2^{2n}(2^{n-k} - 1))$; for naive summation along all discrete lines, considering image padding, it is $4\alpha \cdot 2^{2n}(2^n - 1)$; and for the full FHT in the Brady-Yong method (without aggregation), it is $T^B(n, n, 1) = n \cdot 2^{2n+3}$. Next, let us explore how k and α influence the number of operations, introducing the notation δ = n − k. We will analyze the normalized difference ∆_n^B(δ, α) between the complexity of the proposed algorithm T^B(n, n − δ, α) and the original Brady-Yong algorithm T^B(n, n, 1), normalized by the factor 2^{2n+2} (representing the number of discrete straight lines for all types and shifts): $\Delta_n^B(\delta, \alpha) = (T^B(n, n-\delta, \alpha) - T^B(n, n, 1))/2^{2n+2} = \alpha(2^{\delta} - 1) - 2\delta$. Without aggregation (δ = 0), the complexities coincide: ∆_n^B(0, α) = 0.
To analyze the function's properties, let us take the second derivative of ∆_n^B(δ, α) with respect to δ. This leads to $\partial^2 \Delta_n^B / \partial \delta^2 = \alpha \ln^2 2 \cdot 2^{\delta} > 0$, so the function is convex. Setting the first derivative $\alpha \ln 2 \cdot 2^{\delta} - 2$ to zero, we determine the minimum of the expression with respect to δ. Therefore, the function ∆_n^B(δ, α) attains its minimum at $\delta^{*} = \log_2 \frac{2}{\alpha \ln 2}$. The optimal value, δ_opt, can be approximated by rounding δ* to the nearest integer. The optimal iteration for terminating the summation is k_opt = n − δ_opt. Table 1 provides the values of δ_opt for various α. Our analysis demonstrates the advantage of the proposed 2D projection algorithm with FHT termination and aggregation compared to the classical Brady-Yong algorithm. Moreover, as α, the proportion of projections used in reconstruction, decreases, the potential computational gain grows.
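The optimum can be inspected numerically. The sketch below uses our reading of the normalized difference, ∆(δ, α) = α(2^δ − 1) − 2δ, and the stationary-point condition α·2^δ·ln 2 = 2; treat the exact constants as reconstructed, since only their structure survives in the text.

```python
import math

def delta_B(delta, alpha):
    """Normalized complexity difference (reconstructed form)."""
    return alpha * (2 ** delta - 1) - 2 * delta

def delta_star(alpha):
    """Continuous minimizer from alpha * 2**delta * ln 2 = 2."""
    return math.log2(2.0 / (alpha * math.log(2)))

alpha = 0.1
best = min(range(11), key=lambda d: delta_B(d, alpha))
assert best == round(delta_star(alpha)) == 5   # discrete optimum matches rounding
assert delta_B(0, alpha) == 0                  # no aggregation: complexities coincide
assert delta_B(best, alpha) < 0                # aggregation saves additions
```

Consistent with the analysis, the saving |∆| at the optimum grows as α shrinks, which is why the technique pays off most in the few-view setting.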

Transposition of the 2D Projection Algorithm with FHT Termination and Aggregation
To develop a fast algorithm for calculating the transposed operator, we analyze the 2D projection algorithm with FHT termination and aggregation (see Section 4.2.1) as a sequential application of partial-sum computations for sub-patterns followed by aggregation. In line with the general transposition scheme for summation-based algorithms, the transposition of this algorithm involves executing the transposed stages in reverse order: (1) "spreading" the value of each ray sum along each pattern across all sub-patterns of length 2^k that constitute it; (2) transposing the algorithm for computing partial sums.
The implementation of this algorithm, denoted as revPFHT (Reversed Partial Fast Hough Transform), computes the transposed operator for the vector of ray sums s = (s_i), i ∈ Z_{0,q−1}, and the set of discrete straight lines L = {(x_i, a_i)}, i ∈ Z_{0,q−1}, within the specified range of line directions. Key stages:
1. Preparation of the tensor R_k;
2. Employing revPFHTk (the transpose of the algorithm dirPFHTk for computing sums over patterns of length 2^k).
Algorithm 5 provides pseudocode for the transposition of the two-dimensional projection Algorithm 1.

Algorithm 5 Transpose of the algorithm dirPFHT for calculating sums along patterns of length 2^k
1: procedure revPFHT(L, s, n, k)
2:   R_k ← 0  ▷ zero-initialized tensor of partial sums
3:   R_k ← revSUMk(s_j, x_j, a_j, R_k, n, k) ∀j ∈ Z_{1,q}  ▷ transpose of the aggregation algorithm
4:   R_0 ← revPFHTk(R_k, n, k)  ▷ transpose of the summation algorithm
5:   return R_0

Given n and k, the revSUMk Algorithm 6, for a discrete straight line specified by (x, a), stores the value s in the tensor R_k corresponding to the sums over sub-patterns of length 2^k, using a recursive call to the function recSPRk.

Algorithm 6 Transpose of the aggregation algorithm dirSU Mk
The Algorithm 7 recSPRk recursively "spreads" (adds) the value s to all elements of the tensor R_k that are used to calculate the sum along the discrete straight line specified by (x, a).

Algorithm 7 Algorithm for recursive spreading for computing the transposed operator
1: procedure recSPRk(s, x, y, a, i, R_k, n, k)
2:   if i = k then
3:     R_k(x, y, a) ← R_k(x, y, a) + s
4:   else
5:     x_2 ← (x − ⌈a/2⌉) mod 2^{n+1}  ▷ circular shift
6:     R_k ← recSPRk(s, x, y, ⌊a/2⌋, i − 1, R_k, n, k)
7:     R_k ← recSPRk(s, x_2, y + 2^{i−1}, ⌊a/2⌋, i − 1, R_k, n, k)
8:   return R_k

To transpose the algorithm for calculating partial sums over sub-patterns of length 2^k, we "reverse" the loop with respect to the variable k and replace the sum accumulation with "spreading". The Algorithm 8 revPFHTk completes the process of computing the fully transposed operator for the tensor R_k, which contains the results of transposing the aggregation algorithm (the results of spreading over sub-patterns of length 2^k).
Algorithm 8 Completing the process of computing the fully transposed operator for the matrix R_k
1: procedure revPFHTk(R_k, n, k)
2:   for i = k down to 1 do  ▷ reversed loop over pattern lengths
3:     for a = 0 to 2^i − 1 do
4:       for x = 0 to 2^{n+1} − 1 do
5:         for y = 0 to 2^n − 1 step 2^i do
6:           spread R_i(x, y, a) to the two half-pattern elements of R_{i−1}
7:   return R_0
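The gather/scatter duality that underlies Algorithms 5-8 can be shown in isolation: a summation-based forward pass gathers the input entries indexed by each output, and its transpose spreads each input value back to the same entries. A minimal sketch of ours (gather and spread are illustrative names, equivalent to multiplication by a Boolean matrix S and by S^T):

```python
import numpy as np

def gather(idx, v, m):
    """Forward pass: output j sums the input entries listed in idx[j] (y = S @ v)."""
    y = np.zeros(m)
    for j, sources in enumerate(idx):
        y[j] = sum(v[t] for t in sources)
    return y

def spread(idx, s, n):
    """Transpose: each value s[j] is added back to the same entries (r = S.T @ s),
    the pattern used by recSPRk / revSUMk."""
    r = np.zeros(n)
    for j, sources in enumerate(idx):
        for t in sources:
            r[t] += s[j]
    return r

idx = [[0, 2, 3], [1, 2], [0, 4]]        # index lists of a Boolean matrix S
S = np.zeros((3, 5))
for j, src in enumerate(idx):
    S[j, src] = 1.0

rng = np.random.default_rng(2)
v, s = rng.random(5), rng.random(3)
assert np.allclose(gather(idx, v, 3), S @ v)
assert np.allclose(spread(idx, s, 5), S.T @ s)
```

Note that the additions performed by spread equal the ones in S minus the number of touched outputs, matching the C_Summ/C_Copy bookkeeping of the general scheme.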

Analyzing the Complexity of the Transposed 2D Projection Algorithm with FHT Termination and Aggregation
When transposing the 2D projection algorithm with FHT termination and aggregation, from (19) and (41), the matrix representation of the transposed operator is $V^{(0)} = B_1^T B_2^T \cdots B_k^T S^T s$. In general, the matrix S may have zero columns; consequently, S^T may have zero rows. The number of summations in the transposed aggregation algorithm is defined as $C_{Summ}(S^T) = C_1(S) - C_{NZR}(S^T)$, where C_NZR(S^T) > 0 represents the number of non-zero rows in the matrix S^T. Let us introduce $\widetilde{C}_{Summ}(S^T) = C_1(S)$ as an upper-bound estimate for the number of summations. Considering (21) and (35), we can calculate the number of summations in the transposed algorithm for partial-sum calculation over sub-patterns: since each B_i is square with two ones in every row and column, it equals $k \cdot 2^{2n+1}$ per line type. By taking into account (39) and (43), we can compute $\widetilde{C}_{Summ}(S^T) = \alpha \cdot 2^{2n} \cdot 2^{n-k}$ per line type. Substituting (62) and (63) into (60), we get $\widetilde{T}^{B^T}(n, k, \alpha) = 4(k \cdot 2^{2n+1} + \alpha \cdot 2^{2n} \cdot 2^{n-k})$. When transposing the original Brady-Yong algorithm, aggregation is not performed (k = n, i.e., δ = 0), so the complexity estimate is $\widetilde{T}^{B^T}(n, n, 1) = n \cdot 2^{2n+3} + 2^{2n+2}$. Similarly to the analysis of the forward operator calculation algorithm (Section 4.2.3), let us examine the normalized difference $\Delta_n^{B^T}(\delta, \alpha) = (\widetilde{T}^{B^T}(n, n-\delta, \alpha) - \widetilde{T}^{B^T}(n, n, 1))/2^{2n+2} = \alpha \cdot 2^{\delta} - 2\delta - 1$ between the numbers of required operations. Analytically (similarly to Section 4.2.3), we can demonstrate that this difference is negative at the optimal aggregation depth whenever α < 1. Hence, when α < 1, the proposed algorithm (revPFHT) for computing the transposed 2D projection operator is computationally more efficient than the transposed Brady-Yong algorithm.

Fundamental 3D FHT Concepts
For detailed definitions of discrete patterns and extensions of the Brady-Yong algorithm to 3D straight lines, please refer to [87]. In 3D, the image is represented by a cubic matrix, where each element, called a voxel, is a unit cube with a side length of 1. The (s, t) parameterization classifies 3D lines into three types based on their alignment with the respective coordinate axes, as outlined in [87]. A straight line is deemed predominantly aligned with a specific axis if the acute angle between it and that axis is less than or equal to the angle between this straight line and the other axes. In the following discussion, we focus on lines primarily aligned with the 0Z axis. Each of these straight lines is defined by the two points (s_X, s_Y, 0) and (s_X + t_X, s_Y + t_Y, 2^n − 1) through which it passes (see Figure 6). Moreover, lines within each type are further classified into four subtypes based on the signs of the parameters t_X and t_Y, determined by their orientation relative to the coordinate axes 0X and 0Y. In this study, we consider cases with a negative slope along both axes, i.e., t_X < 0, t_Y < 0. In [35,87], a technique based on FHT termination and aggregation was introduced for the efficient computation of the projection operator in 3D, akin to the dirPFHT algorithm detailed in Section 4.2.1. Now, let us discuss the implementation of the dirPFHT3D Algorithm 9, introduced in [35]. Its primary stages involve padding the original image I with zeros, computing partial sums, and aggregating the sum for each straight line from L.

Algorithm 9 Forward projection operator computation in 3D
1: procedure dirPFHT3D(L, I, n, k)
2:   R_0 ← zeropad(I)
3:   R_k ← dirPFHTk3D(R_0, n, k)  ▷ compute partial sums (along patterns with length 2^k)
4:   s_j ← dirSUMk3D(x_j, y_j, a_j, b_j, R_k, n, k) ∀j ∈ Z_{0,q−1}, (x_j, y_j, a_j, b_j) ∈ L  ▷ aggregation of sums along discrete straight lines from L
5:   return s
The tensor element R_i(x, y, z, a, b) encapsulates the summation over a pattern of length 2^i. The pattern commences at point (x, y, z), and its terminus is shifted along the 0X axis by −a and along the 0Y axis by −b. Tensor R_0 represents the original image padded with zeros. The set of voxel indices employed at the i-th step of the algorithm in the tensor R_i is defined as $XYZAB^{(i)} = \{(x, y, z, a, b) : x, y \in Z_{0,2^{n+1}-1},\ z \in \{0, 2^i, \ldots, 2^n - 2^i\},\ a, b \in Z_{0,2^i-1}\}$, so that $|XYZAB^{(i)}| = 2^{3n+2+i}$. The Algorithm 10 for computing partial sums, denoted as dirPFHTk3D, mirrors the previously described 2D version.

2: for i = 1 to k do  ▷ along the pattern length 2^i
3:   for a = 0 to 2^i − 1 do  ▷ along the absolute 0X shift of the pattern end
4:     for b = 0 to 2^i − 1 do  ▷ along the absolute 0Y shift of the pattern end
5:       for x = 0 to 2^{n+1} − 1 do  ▷ along the 0X shift of the pattern beginning
6:         for y = 0 to 2^{n+1} − 1 do  ▷ along the 0Y shift of the pattern beginning
7:           for z = 0 to 2^n − 2^i step 2^i do  ▷ along the 0Z shift of the pattern beginning
8:             x_2 ← (x − ⌈a/2⌉) mod 2^{n+1}  ▷ circular shift along 0X
9:             y_2 ← (y − ⌈b/2⌉) mod 2^{n+1}  ▷ circular shift along 0Y
10:            R_i(x, y, z, a, b) ← R_{i−1}(x, y, z, ⌊a/2⌋, ⌊b/2⌋) + R_{i−1}(x_2, y_2, z + 2^{i−1}, ⌊a/2⌋, ⌊b/2⌋)
11:
12: return R_k
The Algorithm 11 dirSUMk3D operates on the tensor R_k, containing sums over sub-patterns of length 2^k. It conducts aggregation of sums along a discrete 3D straight line with parameters (x, y, a, b), using a recursive call to recSUMk3D.

Algorithm 11 Algorithm for 3D sums aggregation

The Algorithm 12 recSUMk3D recursively accumulates sums along the discrete straight line given by (x, y, a, b) from the precomputed partial sums over patterns of length 2^i, i ∈ Z_{k,n}, stored in R_k.

Algorithm 12
Recursive call for sums aggregation over patterns in the 3D case; the procedure returns the aggregated sum s.

Next, we will establish the matrix representation for the algorithm dirPFHT3D given n, k, L.
For any i ∈ Z_{0,n}, we can represent R_i as the vector V^{(i)}. Let the index (x, y, z, a, b) ∈ XYZAB^{(i)} of a tensor element R_i and the index j of the corresponding element in the vector V^{(i)} be bijectively mapped by J^{(i)}; the inverse mapping is denoted H^{(i)} = (J^{(i)})^{-1}. The Boolean matrix B_i of size 2^{3n+2+i} × 2^{3n+1+i} allows us to compute $V^{(i)} = B_i V^{(i-1)}$. The value of an element B_i(p, t) for (p, t) ∈ Z_{0,2^{3n+2+i}−1} × Z_{0,2^{3n+1+i}−1} equals 1 if the element of V^{(i−1)} with index t is one of the two half-pattern sums constituting the pattern sum with index p, and 0 otherwise, where (x_p, y_p, z_p, a_p, b_p) = H^{(i)}(p). Each row of the matrix B_i contains exactly 2 ones; hence $C_1(B_i) = 2^{3n+3+i}$, and by substituting into (9), we arrive at $C_{Summ}(B_i) = 2^{3n+3+i} - 2^{3n+2+i} = 2^{3n+2+i}$. The aggregation matrix S has dimensions q × 2^{3n+2+k} and is employed to calculate the vector of sums $s = S V^{(k)}$. In each row of the matrix S, there are exactly 2^{n−k} ones; thus, based on (9), we find $C_{Summ}(S) = q(2^{n-k} - 1)$. The successive application of (72) and (76) yields the matrix representation $s = S B_k B_{k-1} \cdots B_1 V^{(0)}$.

4.4.3. Number of Addition Operations in the 3D Forward Projection Operator Calculation Algorithm with Aggregation
For one type of straight lines, the number of unique discrete straight lines is determined by the range of different origin and end shifts, which in 3D amounts to $2^{2n} \cdot 2^{2n} = 2^{4n}$. Let the proportion of considered straight lines be α, so that q = α·2^{4n} per type. Since there are 12 types of straight lines, and taking into account (75) and (77), the number of addition operations T^B(n, k, α) when computing the forward operator (Section 4.4.1) can be calculated as $T^B(n, k, \alpha) = 12\left(\sum_{i=1}^{k} 2^{3n+2+i} + \alpha \cdot 2^{4n}(2^{n-k} - 1)\right) = 12(2^{3n+3}(2^{k} - 1) + \alpha \cdot 2^{4n}(2^{n-k} - 1))$. Fast summation for all directions without aggregation (k = n) has a complexity of Θ(2^{4n}). In 3D, forward projection requires summation along Θ(N^3) straight lines, corresponding to q = Θ(2^{3n}), which, when substituted, yields $T^B = \Theta(2^{3n+k} + 2^{3n} \cdot 2^{n-k})$. Naive summation corresponds to k = 0, and the total number of summations during the operator calculation is then Θ(2^{4n}) = Θ(N^4). In [35], a theoretical estimate of the asymptotically optimal k = n/2 is provided for the summation algorithm along Θ(2^{3n}) lines; substituting it, we get Θ(2^{7n/2}). The complexity estimate derived from the analysis of the matrix decomposition of the algorithm proposed in [35], Θ(2^{7n/2}), aligns with the theoretical estimate provided by the authors of [35].
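The balance between the partial-sum and aggregation stages can be checked numerically. The sketch below uses our reconstructed reading of the 3D operation count (the constant factors are our reconstruction; only the scaling is asserted):

```python
def t3d(n, k, q):
    """Additions for the 3D forward operator with termination at iteration k:
    partial sums over 12 line types plus aggregation along q lines."""
    return 12 * 2 ** (3 * n + 3) * (2 ** k - 1) + q * (2 ** (n - k) - 1)

n = 8
q = 12 * 2 ** (3 * n)                      # Theta(N^3) lines, N = 2**n
k_best = min(range(n + 1), key=lambda k: t3d(n, k, q))
assert abs(k_best - n // 2) <= 2           # optimum sits near k = n/2

# Theta(2**(7n/2)) scaling: stepping n by 2 multiplies the count by ~2**7 = 128
r = t3d(10, 5, 12 * 2 ** 30) / t3d(8, 4, q)
assert 100 < r < 160
```

The two terms trade off geometrically in k, so the discrete optimum sits within a constant offset of n/2 regardless of the exact prefactors, which is all the asymptotic claim Θ(N^3 √N) requires.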
Next, we will construct the algorithm for calculating the transposed operator, achieved by transposing the algorithm introduced in [35] for computing the forward operator in 3D (FHT with aggregation). The transposition of the algorithm for computing the forward projection operator in 3D with aggregation closely follows the process in the 2D case (see Section 4.2.4). It entails the sequential application of the transposed aggregation algorithm, referred to as revSUMk3D, and the transposed algorithm for computing sums over patterns of length 2^k, denoted as revPFHTk3D. Although the line parameterization becomes more complex and the aggregating tensor's dimensionality increases, the fundamental structure of the algorithm remains unchanged.
Implemented as a function called revPFHT3D, Algorithm 13 (Reversed Partial Fast Hough Transform 3D) computes the transposed operator. Given parameters n, k, it takes a vector of ray sums s = (s_i), i ∈ Z_{0,q−1}, for q directions defined as a set of discrete straight lines L = {(x_i, y_i, a_i, b_i)}, i ∈ Z_{0,q−1}, and returns a 2^n × 2^n × 2^n image I.

Algorithm 13 Algorithm for computing the transposed operator in 3D
1: procedure revPFHT3D(L, s, n, k)
2:   R_k ← 0
3:   R_k ← revSUMk3D(s_j, x_j, y_j, a_j, b_j, R_k, n, k) ∀j ∈ Z_{0,q−1}  ▷ transposed algorithm for aggregation in 3D
4:   I ← revPFHTk3D(R_k, n, k)  ▷ transposed algorithm for summation in 3D
5:   return I
The Algorithm 14 revSUMk3D is the transposed aggregation algorithm. Given parameters n and k, it adds the value s to the elements of the tensor R_k, corresponding to partial sums over sub-patterns of length 2^k, along the discrete straight line defined by the starting point (x, y) and the shift of the end (a, b).

Algorithm 14
Transposed algorithm for aggregation when computing the transposed operator in 3D
1: procedure revSUMk3D(s, x, y, a, b, R_k, n, k)
2:   R_k ← recSPRk3D(s, x, y, 0, a, b, n, R_k, n, k)
3:   return R_k

The recursive procedure recSPRk3D spreads the value s over the half-patterns via the circularly shifted index x_2 ← (x − ⌈a/2⌉) mod 2^{n+1} (and similarly along 0Y), returning R_k. The Algorithm 16 revPFHTk3D implements the transposed algorithm for computing partial sums along patterns of length 2^k.

Number of Addition Operations in the 3D Transposed Operator Algorithm with Aggregation
Just as in the 2D case (Section 4.2.5), considering (59), the complexity estimate T^{B^T} for the algorithm revPFHT3D computing the transposed operator in 3D can be expressed as $T^{B^T} = \sum_{i=1}^{k} C_{Summ}(B_i^T) + C_1(S) - C_{NZR}(S^T)$, where C_NZR(S^T) > 0 denotes the count of non-zero rows in the matrix S^T. Taking into account the matrix structure with the substitution (74), we derive $C_{Summ}(B_i^T) = C_1(B_i) - W(B_i) = 2^{3n+3+i} - 2^{3n+1+i}$. The transposed algorithm for fast summation across all directions, without aggregation, bears a complexity of Θ(2^{4n}). To provide an upper-bound estimate $\widetilde{T}^{B^T}$ for the complexity of computing the transposed projection operator, let us define $\widetilde{C}_{Summ}(S^T) = C_1(S) = q \cdot 2^{n-k}$. By substituting (75), (77), and (86), we find $\widetilde{T}^{B^T} = \Theta(2^{3n+k} + q \cdot 2^{n-k})$. In 3D, CT reconstruction methods necessitate the calculation of the operator for Θ(N^3) straight lines, corresponding to q = Θ(2^{3n}). Substituting, we obtain $\widetilde{T}^{B^T} = \Theta(2^{3n+k} + 2^{4n-k})$. The transposed algorithm of [35] does not implement aggregation, which means k = 0, so the number of summations is Θ(2^{4n}) = Θ(N^4). When employing aggregation (similar to the estimation for the forward operator, see Section 4.4.3), we adopt the asymptotically optimal value of k_opt = n/2, and by substituting it, we acquire Θ(2^{7n/2}). Hence, the proposed and analyzed algorithm for fast computation of the transposed projection operator in 3D, obtained by transposing the algorithm proposed in [35], in the practical case of summing over Θ(N^3) straight lines, exhibits an asymptotic complexity no less favorable than that of the algorithm outlined in [35] for fast computation of the forward operator, Θ(N^3 √N) (with N denoting the linear size of the reconstruction).

Experiments and Discussion
We tested the proposed general transposition method for summation-based algorithms by implementing efficient forward projection and backprojection operators for the two-dimensional scenario. The implementation, conducted in C++, followed the computational framework outlined in Section 4.2: FHT computation with termination and aggregation. The optimized implementation of the algorithms prompted numerical experiments to gauge the speed and quality of their performance. The method was realized using our closed-source libraries for fast image processing to achieve realistic execution times and an adequate comparison to classical methods. For this reason, unfortunately, the implementation cannot be provided in open access, although we hope that our detailed pseudocode would be enough to reproduce the results.
We compared the quality of reconstruction in the case of parallel-beam circular CT. As reference methods for CT reconstruction, we employed the classical FBP method [63] and the accelerated FBP for computed tomography image reconstruction (referred to as HFBP) from [86]. The modified HFBP method, utilizing fast forward and backprojection operators obtained through the proposed transposition method, is denoted as HFBP-L. The reconstructions were acquired using the Smart Tomo Engine v.2.0.0 software [90].
For our experiments, we selected the Shepp-Logan phantom model (see Figure 7) with dimensions of N × N, N = 1024. The set of projections simulated measurements with a detector size of N for 4N − 3 source rotation angles around the object's center, with a uniform angle step from 0 to π. The projections were simulated in the parallel geometry of the experiment as a simple forward projection, which can be reproduced by any open-source software for CT simulation. The reconstruction results, obtained using the listed methods, are displayed in Figure 7. The entire set of projection data was utilized for the reconstruction. It is important to note that in the case of the HFBP-L method, we applied the optimal aggregation depth determined by the corresponding formulas for δ_opt outlined in Sections 4.2.3 and 4.2.5 for both the forward projection and backprojection operators (with α = 1.0).
To evaluate the performance of the reconstruction methods (refer to Table 2), we employed numerical metrics including NRMSE, SSIM [91], and STRESS [92]. The NRMSE metric is the normalized root-mean-square error between the vector of pixel values of the reconstructed image (r_i : i ∈ Z_{1,N})^T and the vector of pixel values of the ideal phantom image. The consistency in accuracy metrics between the HFBP and HFBP-L reconstruction methods is not coincidental: it confirms the validity of our algorithm implementation and of the proposed aggregation approach. The algorithm's accuracy should not be influenced by the aggregation level δ, and the reconstruction outcome should be indistinguishable from the result of the same algorithm without aggregation, as evidenced by the matching accuracy metrics. At the same time, according to the performance measurements, both HFBP and HFBP-L fall slightly behind the classical FBP algorithm in accuracy, as they employ a less precise straight-line approximation for the sake of speed. Nonetheless, the visual quality of reconstruction using HFBP and HFBP-L is quite satisfactory, with no significant distortions observed in the images (refer to Figure 7). When comparing algorithm speed, it is crucial to highlight that the distinction between the HFBP and HFBP-L methods lies solely in the computation of the backprojection operator: HFBP employs the transposed Brady-Yong algorithm without aggregation, whereas HFBP-L utilizes the revPFHT algorithm (refer to Section 4.2.5). When discussing execution time, the number of operations, or the algorithms' asymptotic complexity, we specifically focus on the projection operators, excluding additional costs common to both methods.
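For reference, a small sketch of the NRMSE metric is given below. The exact normalization used in the paper is not recoverable from the text, so this implementation assumes range normalization, one common convention; the function name nrmse is ours.

```python
import numpy as np

def nrmse(recon, ideal):
    """Normalized RMSE between a reconstruction and the ideal phantom.
    Assumption: RMSE divided by the ideal image's value range (one common
    convention; the paper's exact normalization is not specified here)."""
    err = np.sqrt(np.mean((recon - ideal) ** 2))
    value_range = ideal.max() - ideal.min()
    return float(err / value_range)

ideal = np.zeros((64, 64))
ideal[16:48, 16:48] = 1.0            # toy stand-in for a phantom
noisy = ideal + 0.01                 # uniform offset of 0.01
assert nrmse(ideal, ideal) == 0.0
assert abs(nrmse(noisy, ideal) - 0.01) < 1e-12
```

Lower values indicate a closer match to the phantom; identical NRMSE (together with SSIM and STRESS) for HFBP and HFBP-L is precisely the aggregation-invariance the text argues for.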
Both HFBP and HFBP-L share identical asymptotic algorithmic complexities. However, HFBP-L displays superior speed, characterized by a constant factor. For instance, at α = 1.0, HFBP-L requires at least 0.1N^2 fewer addition operations than HFBP, and at α = 0.1, this difference is at least 22.5N^2 (assuming the optimal aggregation depth as per formula (55)).
Experimental measurements of algorithm execution time were conducted on a Windows 10 system with x64 architecture, driven by an AMD Ryzen 7 2700X processor clocked at 3.70 GHz, featuring a 16 MB L3 cache, and 64 GB of RAM. All measurements were executed in a single thread, and each measurement was repeated 1000 times. The recorded time values were averaged with an assessment of the standard deviation. The recorded time corresponds to the computation duration of the projection operation. Additionally, although computing the forward projection operator is not essential for the HFBP and HFBP-L algorithms, the corresponding values were measured for completeness and for comparison with theoretical dependencies. These values are also annotated in reference to their respective algorithm.
The computation time of both the forward projection and backprojection operators for the HFBP method is independent of α and can be represented by a single value. Specifically, the computation time for the forward projection operator is t_HFBP_dir = 44.0 ± 1.1 ms, while for the backprojection operator, it is t_HFBP_rev = 79.8 ± 1.8 ms. For the HFBP-L method, the computation time is influenced by both the number of projections, denoted by the parameter α, and the aggregation depth δ. The computation times for the forward projection and backprojection operators are detailed in Tables 3 and 4, respectively. The case δ = 0 corresponds to computing the operator fully using the Brady-Yong method without termination.
As the number of projection directions increases, the overall computation time for the forward projection operator also rises. In the range of 0 < α ≤ 0.3, aggregation leads to time savings (refer to Table 3). The most significant reduction in computation time for the forward projection operator, by 15.8%, is achieved at α = 0.01, δ = 3.
In the computation of the backprojection operator within the HFBP-L method, optimizing the aggregation depth speeds up the process for α values within the range of 0 < α ≤ 0.4 for the considered phantom (see Table 4). The highest acceleration, 21.5%, is attainable at α = 0.01, δ = 3 for computing the backprojection operator.
It is worth noting that even with a zero aggregation level (δ = 0), the computation time of the operators is influenced by the number of projections, despite the absence of aggregation. This arises from additional overhead costs related to the explicit formation of the straight-line list. Table 3 gives the time (in milliseconds) of computing the forward projection operator for a phantom with N = 1024, where α signifies the proportion of the total (4N − 3) projections; Table 4 gives the corresponding time of computing the backprojection operator during reconstruction using the HFBP-L algorithm. In both tables, the minimum computation time, corresponding to the experimental optimal aggregation depth, is shown in bold. The time values for computing the forward projection and backprojection operators, provided in Tables 3 and 4, are represented as plots in Figure 8a,b. The plots illustrate the difference in time compared to δ = 0 for the corresponding α value. This facilitates comparison with the theoretical plots in Section 4.2. The key distinction from the theoretical calculations lies in the fact that the experimentally optimal aggregation depth δ_opt deviates from the theoretical value (refer to Section 4.2). Moreover, for α ≥ 0.5, there is no integer aggregation level that would lead to a time improvement in computing the forward projection and backprojection operators; essentially, in these scenarios, δ = 0 proves to be optimal.
In summary, the conducted experiments validate the theoretically grounded potential for accelerating reconstruction methods. This is achieved through a fast algorithm for computing the forward projection operator using FHT with termination and aggregation, and a fast algorithm for computing the backprojection operator, obtained by applying a common transposition technique for summation-based algorithms to the forward projection calculation. Time is saved by reducing the number of projection frames used in the reconstruction process. The proposed methods enable accelerations of up to 15% for the forward projection operator and 21% for the backprojection operator when compared to fast calculation methods based on the Brady-Yong algorithm. This acceleration is most pronounced when dealing with a small number of angles and utilizing the optimal aggregation depth δ = δ opt. This outcome holds particular significance in the realm of few-view CT.

Conclusions
This work introduces a versatile method for transposing summation-based algorithms, which rely exclusively on addition operations. This method facilitates the efficient computation of the transpose of linear operators represented as Boolean matrices, assuming a known fast computation algorithm is available. Importantly, the transposition preserves asymptotic complexity: the original and derived algorithms share the same asymptotic order.
The summation-based transposition method holds significant promise in computed tomography, particularly in algorithms for computing the forward projection and backprojection operators, whose matrices are transposes of one another. By applying this method to a known algorithm for computing the forward projection operator, we can derive an algorithm for computing the backprojection operator. The number of addition operations in the resulting algorithm equals the number of copy operations in the original, and vice versa.
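As a minimal illustration of this transposition rule (our own toy sketch, not the paper's implementation), a summation-based algorithm given as a chain of Boolean factors is transposed by transposing each factor and reversing the order of application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Boolean decomposition dec_p(B) = B2 B1 (p = 2); the 0/1 factors stand in
# for the sparse addition/copy stages of a summation-based algorithm.
B1 = (rng.random((6, 4)) < 0.4).astype(float)
B2 = (rng.random((5, 6)) < 0.4).astype(float)
B = B2 @ B1

def forward(x):
    # original algorithm: apply the factors right-to-left
    return B2 @ (B1 @ x)

def transposed(y):
    # derived algorithm: transposed factors, applied in reverse order
    return B1.T @ (B2.T @ y)

x = rng.random(4)
y = rng.random(5)
assert np.allclose(forward(x), B @ x)      # chain computes B
assert np.allclose(transposed(y), B.T @ y) # reversed transposed chain computes B^T
```

In the paper's analysis the addition count of the transposed chain matches the copy count of the original; this sketch only verifies the algebraic fact that the reversed transposed chain computes B^T.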
In this study, we present, for the first time, fast summation-based algorithms for computing the forward projection and backprojection operators in 2D tomographic reconstruction, tailored for parallel-beam CT and especially suitable for few-view CT (FVCT). Although the asymptotic complexity remains Θ(n^2 log n) (where n represents the linear size of the reconstruction), consistent with the Brady-Yong algorithm, our proposed algorithms are superior in terms of constant factors.
We also extend the method to cone-beam 3D CT. Previously, a forward projection algorithm with a complexity of Θ(n^(7/2)) was introduced for this setup [35], where the number and linear dimensions of the projections scale proportionally with n, but a fast algorithm for backprojection was lacking. In this study, we devised such an algorithm, also with a complexity of Θ(n^(7/2)). Therefore, the summation-based transposition method serves as a versatile approach for developing fast algorithms in CT reconstruction, applicable across various measurement schemes in both two- and three-dimensional scenarios. Combining this method with existing forward projection algorithms adaptable to diverse measurement schemes, such as [35], facilitates the creation of fast algorithms for computing the backprojection operator in any measurement setup.
We implemented the computation of the backprojection operator by transposing the algorithm for computing the forward projection operator based on FHT with termination and aggregation. This allowed us to assess the accuracy and speed of the new algorithm, comparing it with theoretical dependencies and experimental baselines. The results confirm the theoretically justified potential for accelerating classical reconstruction methods using this transposition method. They also reveal a time minimum that depends on the chosen aggregation level. Notably, applying the transposed algorithm, which employs FHT with termination and aggregation, to a fraction of 0.01 of the considered projection directions during reconstruction led to a 15% increase in speed for computing the forward projection operator and a 21% increase for the backprojection operator, with visually insignificant degradation of the result.
Further research into the dynamics of the parameters, particularly the fraction of projections used for reconstruction for which aggregation yields time savings, is considered worthwhile; this should be explored for various linear sizes of the reconstruction. Moreover, the potential of applying the proposed summation-based transposition method to algorithms computing the forward projection operator, particularly those based on state-of-the-art accurate fast Hough transform algorithms, warrants attention. Employing the method in tandem with precise, fast algorithms such as those in [55] could offer a promising avenue for achieving both speed and accuracy in reconstruction simultaneously.

Figure 1. A schematic illustration of parallel-beam CT. The red point in the center illustrates the rotation axis, dashed lines are the X-ray propagation directions, φ is the rotation angle, ρ is the detector cell coordinate, and the blue line is the effective signal registered by the detector.

3.3. Summation-Based Algorithms and Matrix Decompositions for Forward and Transposed Operators as the Product of Boolean Matrices; Algorithmic Complexity in Computing Forward and Transposed Operators

Figure 2. Relations between the dimensions of the Boolean decomposition factors and the count of summations for the forward and transposed decompositions.

3.4. Universal Method for Deriving Efficient Transposed Operator Calculation Algorithms

Consider a Boolean matrix B representing the linear operator B, which can be decomposed into the product of Boolean matrices as dec_p(B) = B_p B_(p−1) … B_1, corresponding to an established fast algorithm for computing the operator B. The decomposition derived from dec_p(B), denoted dec_p(B^T) = B_1^T B_2^T … B_p^T, of the matrix B^T of the transposed operator B^T then yields an efficient algorithm for computing B^T. Its algorithmic complexity is asymptotically of the same order as that of computing the original operator B.

Figure 3. Illustration of the X0Y axes, I_(2^4), and the structure of the dyadic pattern approximating a straight line (s = 10, t = −5). The black dots indicate the centers of the dyadic pattern pixels. The discrete set Ω = Z_(0,2^n−1) × Z_(0,2^n−1) (24) defines the coordinates of the pixels (x, y) ∈ Ω. A discrete straight line (straight-line pattern) is a subset of Ω that approximates a continuous straight line. It is defined by a pair of pixels with coordinates (s_D, 0) ∈ Ω and (s_D − t_D, 2^n − 1) ∈ Ω along the image boundaries. Here, s_D ∈ Z_(0,2^n−1) represents the integer shift at the pattern's beginning, and t_D ∈ Z_(0,2^n−1) denotes the integer shift of the pattern's end relative to its beginning (see Figure 3). Continuous straight lines extending beyond the image are approximated by the straight-line pattern defined for the original continuous straight line within the region of the extended zero-padded image. The creators of the FHT algorithm [45] proposed using a dyadic pattern as the discrete straight line, which can be defined both recursively [45] and analytically [53]. The properties of this pattern, including estimates of its maximum deviation from a continuous straight line, have been explored in [46,53,54].
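For concreteness, the recursive construction of a dyadic pattern can be sketched as follows (a minimal illustration in our own naming, not the implementation of [45]): the pattern for an end shift t over 2^n columns is the pattern for ⌊t/2⌋ over the left half, followed by the same half-pattern lifted by t − ⌊t/2⌋ over the right half.

```python
def dyadic_pattern(n, t):
    """Vertical offsets of a dyadic pattern over 2**n columns,
    starting at offset 0 and ending at offset t (0 <= t < 2**n)."""
    if n == 0:
        return [0]
    half = dyadic_pattern(n - 1, t // 2)      # left half: end shift floor(t/2)
    lift = t - t // 2                         # remaining shift, ceil(t/2)
    return half + [s + lift for s in half]    # right half: same pattern, lifted

# e.g. a length-8 pattern with end shift 5
print(dyadic_pattern(3, 5))  # -> [0, 1, 1, 2, 3, 4, 4, 5]
```

Sharing the same half-pattern between many full patterns is what lets the FHT reuse partial sums.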

Algorithm 2. dirPFHTk computes and stores the sums for the input tensor R_0 in the form of a tensor R_k, summing over sub-patterns of length 2^k. When k = n, the result conforms to the output of the Brady algorithm [45]. Forward FHT with termination:
1: procedure dirPFHTk(R_0, n, k)
2: for i = 1 to k do ▷ along the exponent range of the pattern length
3: for a = 0 to 2^i − 1 do ▷ along the absolute 0X shift of the pattern end
4: …

Algorithm 3. dirSUMk aggregates the sum along a pattern with specified parameters (x, a), where x ∈ Z_(0,2^n−1), a ∈ Z_(0,2^n−1), for the tensor R_k with sums along sub-patterns of length 2^k. It does so through a recursive call to recSUMk. Aggregation for the forward projection operator:
1: procedure dirSUMk(x, a, R_k, n, k)
2: …
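To make the pairwise-merging structure of the FHT concrete, here is a compact single-quadrant sketch (our own illustrative code, not the paper's dirPFHTk; it uses a cyclic boundary via np.roll for brevity, whereas the article zero-pads). Blocks of width w holding sums over patterns with end shifts 0..w−1 are merged pairwise, doubling the pattern length at each of the n stages, for Θ(N^2 log N) additions in total.

```python
import numpy as np

def merge(L, R):
    """Combine sums over two adjacent width-w column blocks into width-2w sums.
    L[t, x], R[t, x]: pattern end shift t, pattern start row x (cyclic)."""
    w, N = L.shape
    out = np.empty((2 * w, N))
    for t in range(2 * w):
        d = t - t // 2  # vertical lift of the right half-pattern, ceil(t/2)
        out[t] = L[t // 2] + np.roll(R[t // 2], -d)
    return out

def fht_quadrant(img):
    """Sums over dyadic patterns for one quadrant of directions.
    img: 2^n x 2^n array; returns H with H[t, x] = sum along the
    pattern with end shift t starting at row x (cyclic boundary)."""
    N = img.shape[0]
    blocks = [img[:, j][None, :] for j in range(N)]  # width-1 column blocks
    while len(blocks) > 1:
        blocks = [merge(blocks[2 * j], blocks[2 * j + 1])
                  for j in range(len(blocks) // 2)]
    return blocks[0]

img = np.arange(16.0).reshape(4, 4)
H = fht_quadrant(img)
assert np.allclose(H[0], img.sum(axis=1))   # shift 0: row sums
assert np.isclose(H[3, 0], np.trace(img))   # full shift: main diagonal
```

Terminating the merge loop after k of the n stages and aggregating the remaining partial sums on demand is the idea behind the article's termination-and-aggregation variant.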

Figure 6. Illustration of the parametrization of a predominantly 0Z-axis-oriented line in 3D, t_X < 0, t_Y < 0. Differently coloured dashed lines demonstrate the values of the parameters used in the parametrization.

4.4. Fast 3D Forward Projection and Back-Projection Algorithm with FHT and Aggregation
4.4.1. A Fast 3D Forward Projection Operator Algorithm for an Arbitrary Set of Straight Lines with Termination at the k-th Iteration and Aggregation

The accumulation of partial summation results is accomplished through a recursive call to Algorithm 15, recSPRk3D.
Algorithm 15. Recursive call for accumulating partial summation results:
1: procedure recSPRk3D(s, x, y, z, a, b, i, R_k, n, k)
2: if i = k then
3: R_k(x, y, z, a, b) ← R_k(x, y, z, a, b) + s
…

Figure 7. The Shepp-Logan phantom with N = 1024 and the reconstruction results using the considered methods, along with the difference maps compared to the ideal phantom image. The entire set of projection data was utilized for the reconstruction (α = 1.0). The HFBP-L method employed an optimal aggregation depth δ, determined by the formulas for δ opt from Sections 4.2.3 and 4.2.5 for both the forward projection and backprojection operators.

Figure 8. The computation time for the forward projection and backprojection operators with FHT and aggregation at a specified depth level (the HFBP-L method), shown as the difference from the time at δ = 0 for the corresponding α value. (a) Dependency for the forward projection operator; (b) dependency for the backprojection operator.

Table 1. Dependence of the values δ opt on α.

Table 3. The average time t_dir^(HFBP-L) of computing the forward projection operator for a phantom with N = 1024. Here, α signifies the proportion of the total (4N − 3) projections. Time is measured in milliseconds. The minimum computation time for the forward projection operator, which corresponds to the experimental optimal aggregation depth, is emphasized in bold.

Table 4. The average computation time t_rev^(HFBP-L) for computing the backprojection operator during reconstruction using the HFBP-L algorithm for a phantom with N = 1024. The parameter α corresponds to a fraction of the total (4N − 3) number of projections. Time is measured in milliseconds. The minimum computation time for the backprojection operator, corresponding to the experimental optimal aggregation depth, is shown in bold.

Notation:
(·)_(1,n) — row vector of length n
a^(i) — state of the variable a at the i-th iteration step
N = 2^n — number of linear detector cells
C — number of projection images (observed distinct projection angles)
n — exponent of the image linear size (input image is of size 2^n × 2^n)
k — exponent of the pattern length (pattern is of length 2^k ≤ 2^n)
I_(2^n) — image of size 2^n × 2^n
R_k — tensor of sums over patterns of length 2^k
q — number of ray sums
(x, y) — point or pixel coordinates in the coordinate system X0Y
a — shift of the pattern's end (coordinate)
α — number of projection images expressed as a proportion of C_max = 2^(n+2)
Θ(·) — asymptotic computational complexity (O-symbology)