Toward Precision Radiotherapy: A Nonlinear Optimization Framework and an Accelerated Machine Learning Algorithm for the Deconvolution of Tumor-Infiltrating Immune Cells

Tumor-infiltrating immune cells (TIICs) form a critical part of the ecosystem surrounding a cancerous tumor. Recent advances in radiobiology have shown that, in addition to damaging cancerous cells, radiotherapy drives the upregulation of immunosuppressive and immunostimulatory TIICs, which in turn impacts treatment response. Quantifying TIICs in tumor samples could provide an important predictive biomarker for guiding patient stratification and the design of radiotherapy regimens and combined immune-radiation treatments. Owing to several limitations of experimental methods for quantifying TIICs, and given the availability of extensive gene sequencing data, deconvolution-based computational methods have emerged as a suitable alternative for quantifying TIICs. Accordingly, we introduce and discuss a nonlinear regression approach (remarkably different from the traditional linear modeling approach of current deconvolution-based methods) and a machine learning algorithm for approximating the solution of the resulting constrained optimization problem. This way, the deconvolution problem is treated in its natural state, given that the gene expression levels of pure and heterogeneous samples do not have a strictly linear relationship. When applied across transcriptomics datasets, our approach, which also allows the coupling of different loss functions, yields results that closely match ground-truth values from experimental methods and exhibits superior performance over popular deconvolution-based methods.


Introduction
In recent times, the continuous rise in the global cancer burden has emphasized the need for increased efforts in cancer treatment strategies. In 2020 alone, there were 19.3 million new cases and almost 10 million cancer deaths worldwide, with the number of new cases projected to climb to 28.4 million by 2040 [1]. This projection is not far-fetched, given the 36.9% increase in the number of new cases from 2012 to 2020 [2]. With up to half of all cancer cases receiving radiotherapy during their treatment, radiotherapy remains a vital cancer treatment strategy [3][4][5].
In simple terms, radiotherapy involves using ionizing radiation, usually high-energy X-rays, to kill cancer cells or, at least, limit their proliferation by damaging their genetic material, deoxyribonucleic acid (DNA). In doing this, the goal is to achieve tumor control without introducing severe damage to the surrounding normal tissues. Existing deconvolution-based methods produce different approximate solutions to the underlying inverse problem because each method is conceptually different in its choice of loss function (or objective function) and in the setting of the optimization problem (constrained or unconstrained). It is against this backdrop that packages such as Immunedeconv [27], TIMER2.0 [28], and TumorDecon [29] have sought to provide a unified platform that allows the different methods to be applied to the same dataset so that results can be compared or complemented. Accordingly, the strengths of each method can be harnessed to obtain more robust and comprehensive estimates. Nevertheless, this approach remains susceptible to the potential issues associated with traditional linear modeling [30], given that "the relationship between the expression levels of pure and heterogeneous samples is not strictly linear" [14]. Moreover, dealing with large transcriptomics datasets calls for computationally efficient methods with fast convergence rates and short runtimes [30].
Therefore, the main aim of this paper is to introduce and discuss a mathematical formulation that permits the TIIC deconvolution inverse problem to be handled in its natural state, alongside an accelerated machine learning algorithm for approximating its solution. Through rigorous mathematical analysis, we show that the algorithm converges to an optimal solution of the inverse problem for various loss functions. More specifically, a globally optimal solution is guaranteed for convex loss functions. Furthermore, we use numerical experiments to show that the algorithm exhibits faster convergence and shorter runtimes than other traditionally used algorithms. When applied across transcriptomics datasets, our results closely match values from experimental methods and show superior performance over popular TIIC deconvolution-based methods. We end with a note on the detailed science behind these observations and an explanation of how this framework can be applied to similar inverse problems in biology, medical physics, and oncology.

Formulation and Discussion of the Deconvolution Problem
Let N denote the number of different cell types forming a mixture sample, and let M be the number of genes whose expressions are measured in the sample. Let B = (b_1, b_2, ..., b_M) ∈ R^M be the measurements of gene expression in the sample, S ∈ R^(M×N) be the corresponding reference expression matrix of the M genes from the N constituent cell types, and P = (p_1, p_2, ..., p_N) ∈ R^N be the unknown mixing proportions of the different cell types. An operator r can model the relationship of B, S, and P as

    B = r(S, P). (1)

The deconvolution problem is concerned with the inverse problem of estimating P, given B and S. This inverse problem can be formulated mathematically as the following equivalent constrained optimization problem:

    min_{P ∈ C} L(B, r(S, P)), (2)

where L is a loss function measuring model fitness, and C is the set of constraints on P arising naturally from its definition as proportions, i.e.,

    C = { P ∈ R^N : p_i ≥ 0 for all i = 1, ..., N, and p_1 + p_2 + ... + p_N = 1 }.

Many of the existing deconvolution methods (see, for example, [21][22][23][31][32][33][34][35] and references therein) consider B, S, and P to be linearly related in the form

    B = SP + e, (3)

where e is a random error. Several issues with the linear framework have been identified [30]. More recently, the authors of [36] showed problems associated with different scales of gene expression within the linear framework and then proposed a hybrid model of the form

    log(B) = log(SP + e) + d, (4)

where d accounts for systemic technical variation. Equivalently,

    B = exp(d)(SP + e). (5)

We note that exp(d) in Equation (5) is a constant factor adjustment across all genes. However, more than this constant factor adjustment may be required, because such systemic technical variations affect genes differently [37]. Thus, one may consider a more general model of the form

    B = D(SP + e), (6)

where D is a diagonal matrix whose diagonal entries are gene-specific factor adjustments generated from some known distribution. Accordingly, Equations (3) and (5) become special cases of Equation (6).
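To make Equation (2) concrete in its linear special case (Equation (3)), the sketch below solves min_{P∈C} ||B − SP||^2 over the proportion constraints with an off-the-shelf solver. The matrices, sizes, and proportions are synthetic placeholders for illustration, not data from this study:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
M, N = 50, 4                       # genes x cell types (toy sizes)
S = rng.uniform(0, 10, (M, N))     # synthetic reference expression matrix
P_true = np.array([0.1, 0.2, 0.3, 0.4])
B = S @ P_true                     # noiseless synthetic bulk measurements

# Minimize the squared error loss over the proportion constraint set C
loss = lambda P: np.sum((B - S @ P) ** 2)
res = minimize(
    loss,
    x0=np.full(N, 1.0 / N),                        # start at the simplex center
    bounds=[(0, 1)] * N,                           # p_i >= 0
    constraints=[{"type": "eq", "fun": lambda P: P.sum() - 1.0}],
)
print(np.round(res.x, 3))  # recovers proportions close to P_true
```

On this noiseless toy problem, the constrained solver recovers the mixing proportions essentially exactly, which is the behavior the linear framework relies on when the linearity assumption holds.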
Nevertheless, such a linear transformation may not efficiently describe nonlinear patterns. Consequently, we formulate a generic nonlinear framework for the deconvolution problem by considering the operator r in Equation (1) to be a nonlinear transformation involving S and P. This is because generalized nonlinear regression methods have been shown to yield better prediction accuracy for complex nonlinear patterns where traditional linear regression models may fail [38]. Specifically, our nonlinear operator r is given in Equation (7). The nonlinearity of Equation (7) depends on the value of θ; note that Equation (7) reduces to the linear framework for θ = 1. Therefore, for an arbitrary sequencing dataset, it is worthwhile to determine which values of θ (different from one) closely describe the not strictly linear relationship between the gene expression profiles of pure and heterogeneous samples.
Rigorous data assimilation techniques are useful for making such determinations from arbitrary datasets. However, we refrain from such nontrivial analytical examinations, as they are beyond the aims of this work. Instead, we choose our θ values using an empirical approach, for the purpose of demonstrating the proof of principle that is the subject of this work. The empirical approach is based on our hypothesis that, for some θ ∈ (0, 1) ∪ (1, 2), suitable values can be found that satisfy the description of the relationship between the gene expression profiles of pure and heterogeneous samples. More specifically, we hypothesize that such a value might be slightly less than or slightly greater than one, on the order of a few decimal places. This hypothesis is guided by our preferred interpretation of the expression "not strictly linear".

Review of Some Commonly Used Loss Functions
Several research surveys of cell-type deconvolution methods (see, for example, [14,19,39]) have identified the quadratic/squared error loss and the ε-insensitive loss as the most commonly used loss functions for reference-based cell-type deconvolution within the linear framework. On the one hand, the squared error loss (SEL) is formulated on the basis of squared deviations as

    L_SEL(P) = Σ_{i=1}^{M} (b_i − S_i P)^2, (8)

where S_i denotes the i-th row of S. It is the most common choice of loss function due to its simplicity; notably, however, the SEL is highly susceptible to outliers. This susceptibility can be reduced by insensitive losses such as the soft insensitive loss function (SILF) given in Equation (9), with parameters 0 < ρ ≤ 1 and ε > 0. In [41], the authors remarked that this function (Equation (9)) is smooth and inherits most of the desirable characteristics of several robust techniques, including insensitivity to outliers. They further demonstrated in great detail the computational efficiency and competitiveness of SILF compared to other well-respected techniques. Remarkably, these loss functions have been a dominating paradigm in the deconvolution problem, mainly due to their convexity.
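For readers who want these losses in executable form, the snippet below implements the SEL and one common piecewise form of the SILF (flat near zero, quadratic in a transition band, linear in the tails). The exact branch boundaries are our transcription of the form in [41] and should be checked against that reference:

```python
import numpy as np

def sel(B, S, P):
    """Squared error loss: sum of squared residuals b_i - S_i P."""
    r = B - S @ P
    return np.sum(r ** 2)

def silf(residual, eps=0.1, rho=0.5):
    """Soft insensitive loss: zero inside the insensitive zone,
    quadratic in the transition band, linear in the tails."""
    a = np.abs(residual)
    quad = (a - (1 - rho) * eps) ** 2 / (4 * rho * eps)
    lin = a - eps
    return np.where(a < (1 - rho) * eps, 0.0,
                    np.where(a <= (1 + rho) * eps, quad, lin))
```

The quadratic band makes the loss smooth where the plain ε-insensitive loss has kinks, while the linear tails keep it far less sensitive to outliers than the SEL.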
Nevertheless, it has been shown in recent times that nonconvex loss functions improve the generic applicability and robustness of learning, especially in situations where the data and noise distributions are unknown [42]. One such loss function is given in Equation (10) as a Cauchy kernel risk-sensitive loss (CKRSL), derived using a Gaussian kernel-adapted operator and following methods similar to those in [43,44].
We remark that the abstract formulation presented in Section 2.1 has the advantage of accommodating several loss functions including those reviewed in this subsection.

Specification of the Loss Function for this Study
For the present study, we use a weighted squared error loss (WSEL) on the relationship operator r defined by Equation (7); the resulting loss is expressed in Equation (11).

Accelerated Machine Learning Algorithm (AMLA)
An optimal solution to the constrained optimization problem of Equation (2) can be approximated by the accelerated machine learning algorithm (AMLA) given in Equation (12), for some predefined values of the adaptive momentum parameter α_j and step size (or learning rate) λ. Here, A denotes the gradient of the desired loss function L, and ℘_C is the projection operator onto C. The projection operator ℘_C locates the point in C with the least distance to a given point, while v_0 and v_1 are initialization points. AMLA converges to a solution of the deconvolution problem (Equation (2)) for various loss functions, including those reviewed in Section 2.2. A detailed mathematical analysis establishing this convergence is presented in Appendix A, starting with the preliminary mathematical tools, followed by the lemmas, theorems, and their proofs.
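The precise update rule and parameter conditions are those analyzed in Appendix A. Purely to convey the general shape of such an inertial projected-gradient scheme, here is a hedged sketch of our own: it uses a fixed momentum α and the standard sort-based Euclidean projection onto the simplex rather than the alternating projections of [45], so it is an illustration, not the authors' verbatim algorithm:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {p : p_i >= 0, sum p_i = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    tau = (css[k] - 1) / (k + 1)
    return np.maximum(v - tau, 0.0)

def amla_like(grad, v0, v1, lam, alpha, n_iter=200):
    """Inertial projected-gradient iteration: extrapolate, step, project."""
    v_prev, v = v0, v1
    for _ in range(n_iter):
        y = v + alpha * (v - v_prev)            # momentum extrapolation
        v_prev, v = v, project_simplex(y - lam * grad(y))
    return v
```

In practice the step size λ would be tied to a Lipschitz constant K of the gradient (for example, on the order of 1/K), in the spirit of Theorem 12 in Appendix A.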
For our choice of loss function (Equation (11)), the gradient A is the vector defined in Equation (13). Furthermore, the control parameters α_j and λ are determined by a Lipschitz constant K of A (see Theorem 12 in Appendix A). For A defined by Equation (13), a suitable K is given in Equation (14). Lastly, the projection operator onto C, denoted by ℘_C, is computed using the alternating projections method [45].

Validation Datasets
Our validation datasets, which come from published findings in [22,23], consisted of experimentally measured immune cell-type proportions from tumor samples, bulk RNA sequencing data of tumor samples, and a gene expression reference profile. The gene expression reference profile was a signature matrix of eight melanoma cell subsets derived from scRNA-seq (SMART-Seq2) (see Supplementary Table S2e in [22]). These melanoma subsets are listed across 3121 genes. They include five TIICs (B cells, CD8 T cells, CD4 T cells, NK cells, and macrophages) and other cell subsets, including endothelial cells, malignant cells, and cancer-associated fibroblasts (CAFs).
Experimentally measured immune cell-type proportions from tumor samples and bulk RNA sequencing data of tumor samples were obtained from [23]. In [23], the authors divided the single-cell suspensions collected from the lymph nodes of four patients with metastatic melanoma into two portions. For one portion, flow cytometry was used to measure the percentages of live cells, including four TIICs (B cells, CD4 T cells, CD8 T cells, and NK cells), malignant cells, and other cells made up of primarily stromal and endothelial cells (see Supplementary Table S3A in [23]). The fractions of each of these cell types for each patient are presented in Table 1 below. We refer to these cell-type fractions as the ground-truth values from the experiment (GTVEs).
The other portion of the single-cell suspensions was used for bulk RNA sequencing (RNA-seq). We downloaded the RNA-seq data for the four melanoma patients from the "example data from EPIC" link on the EPIC web application (http://epic.gfellerlab.org), accessed on 28 September 2022. It can also be accessed from the Gene Expression Omnibus (GEO) repository [46] through the accession number GSE93722. This RNA-seq data consists of 49,902 genes quantified in transcripts per million (TPM) for each of the four melanoma samples.
¹ These consist mostly of stromal (for example, cancer-associated fibroblasts (CAFs)) and endothelial cells.

Deconvolution Workflow
Our deconvolution workflow consists of partly sequential steps necessary to achieve efficient and accurate estimation of TIICs from bulk RNA-seq data. As illustrated in Figure 1, the input data comprise tab-delimited text files of bulk RNA-seq samples and gene reference profiles. These inputs are first processed using a simple data filter algorithm, which identifies the genes common to both inputs and passes the values of these genes in each input across the filter. These values are then fed into the respective variables of our nonlinear framework. Thereafter, the cell fractions per sample are computed using AMLA. We remark that suitable values of the parameters θ and δ could be estimated using a rigorous, nontrivial pattern analysis of the bulk RNA-seq data, as indicated in Figure 1, although no attempt was made in this direction in the present work.
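The filtering step amounts to intersecting the gene indices of the two tab-delimited inputs. A minimal pandas sketch is shown below; the file names and column layout are illustrative assumptions, not the package's actual interface:

```python
import pandas as pd

def filter_common_genes(bulk, reference):
    """Keep only the genes (row index) present in both data frames,
    in a consistent order, so their rows align for deconvolution."""
    common = bulk.index.intersection(reference.index)
    return bulk.loc[common], reference.loc[common]

# Typical use with tab-delimited inputs (paths are illustrative):
# bulk = pd.read_csv("bulk_rnaseq.txt", sep="\t", index_col=0)
# ref = pd.read_csv("reference_profile.txt", sep="\t", index_col=0)
# bulk_f, ref_f = filter_common_genes(bulk, ref)
```

Returning both frames re-indexed on the same gene list guarantees that row i of the bulk data and row i of the reference matrix refer to the same gene downstream.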


Software Used
We implement the deconvolution workflow described in Figure 1 as a Python package. The package was developed in the PyCharm Community Edition 2021.2.1 IDE (build 212.5080.64; JetBrains s.r.o., Prague, Czech Republic), running Python 3.9.7 (64-bit). The package contains three custom-made modules: the filtering algorithm, our nonlinear framework, and AMLA. The Pandas library was used to handle the import of sequencing data and reference profiles and the export of estimated fractions for visualization.
To compare our method with two popular cell-type deconvolution methods, we also generated results from CIBERSORTx [22] and EPIC [23] using the web application versions of their software, available at https://cibersortx.stanford.edu/ (accessed on 30 September 2022) and http://epic.gfellerlab.org (accessed on 28 September 2022), respectively. We performed all these software activities on an Intel® Core™ i5-6300U CPU at 2.40 GHz with 8 GB RAM on a 64-bit Windows 10 Pro operating system.

Estimating Cell-Type Fractions in Four Melanoma Samples Using Our Nonlinear Framework, EPIC, and CIBERSORTx
We considered the bulk RNA-seq dataset of four melanoma patients and the gene reference profile containing eight cell subsets, as described in Section 2.5. By inputting tab-delimited text files of these two datasets into the filtering algorithm, we obtained 2928 genes common to both datasets. The values for these specific genes in the respective datasets were passed through the filtering algorithm into the nonlinear framework. We estimated the eight cell-type fractions present in each of the four melanoma samples using four different versions of our nonlinear framework. These versions, named according to the value of the hyperparameter θ in Equation (11) and the procedure for applying AMLA, are described below.

1. Equivalent linear model (ELM), expressed for θ = 1, such that Equation (11) reduces to the linear form in Equation (15). AMLA is then applied to approximate the solution.
2. Linearized nonlinear model (LNM), expressed for θ = 0.92, such that Equation (11) becomes Equation (16). However, we do not apply AMLA directly to Equation (16); rather, we linearize it to obtain the form in Equation (17) and thereafter apply AMLA to approximate the solution.
3. Nonlinear model one (NM1), expressed for θ = 0.92, with AMLA applied directly to the resulting nonlinear form of Equation (11).
4. Nonlinear model two (NM2), expressed for θ = 1.08, such that Equation (11) becomes Equation (18). AMLA is then applied to approximate the solution.

For all four versions enumerated above, we set the variable δ such that δ = 1. Moreover, we initialized AMLA with distinct set values of v_0 and v_1, creating a cartesian product for the patient series. Table 2 summarizes the hyperparameter values δ and θ chosen for the named versions of our nonlinear framework.

Table 2. Hyperparameter values for named versions of our nonlinear framework (Equation (11)).

Version    θ       δ
ELM        1.00    1
LNM        0.92    1
NM1        0.92    1
NM2        1.08    1

We have already emphasized that the nonlinearity of our model (Equation (11)) strictly depends on the value of θ, which must be different from one. Furthermore, we remarked that the selection and tuning of this parameter can be achieved analytically on the basis of the input gene sequencing datasets using rigorous data assimilation techniques. However, our primary goal in this work was to demonstrate that a nonlinear regression approach to the TIIC deconvolution problem can yield significantly more accurate estimates of the fractions of cell types, including TIICs, from the bulk gene expression data of tumor samples. Thus, we favored an empirical approach for selecting the hyperparameter θ, in line with this goal.

Our empirical approach relies on the observation that the relationship between the gene expression profiles of pure and heterogeneous samples is not strictly linear. Guided by this, we interpret the expression "not strictly linear" as meaning slightly different from one, where the difference can lie on either side of one, i.e., greater than or less than one. Because θ is strictly positive, we considered θ ∈ (0, 1) ∪ (1, +∞). Guided by our hypothesis presented in Section 2.1, we randomly selected 0.92 from the interval (0, 1). We then selected 1.08 from the interval (1, +∞) by taking the symmetric distance of the previous selection from one. The hyperparameter δ is a smoothing parameter. The literature contains several rigorous techniques for smoothing parameter estimation; again, for the same reasons as with θ, we chose to assume a default value of one.
We present two results for estimating the cell-type fractions in the four melanoma samples using EPIC. The first result, which we denote as "EPIC1", was obtained from the default setting of the EPIC web application (http://epic.gfellerlab.org), accessed on 30 September 2022. The default setting comprises tab-delimited inputs of the bulk RNA-seq dataset described in Section 2.5. Furthermore, it includes a reference profile of seven cell subsets (B cells, CAFs, CD4 T cells, CD8 T cells, endothelial cells, macrophages, and NK cells), built from tumor-infiltrating cells from TPM-normalized scRNA-seq data (see Supplementary Table S2A in [23]). This reference profile contains 23,684 genes. The second result, which we denote as "EPIC2", was obtained using the datasets described in Section 2.5.
Furthermore, we estimated the cell-type fractions in the four melanoma samples using the CIBERSORTx web application (https://cibersortx.stanford.edu/), accessed on 30 September 2022. The gene expression reference profile used was as described in Section 2.5. Here, we present two results for our cell-type fraction estimation using CIBERSORTx. The first result, which we denote as "CIBERSORTx1", was obtained by checking the batch correction box, selecting B-mode, and then running CIBERSORTx after the tab-delimited bulk RNA-seq and reference profile files were uploaded. The second result, which we denote as "CIBERSORTx2", was obtained following the same procedure, with the only exception being that the batch correction box was unchecked. In both estimations using CIBERSORTx, quantile normalization was disabled, and the number of permutations for significance analysis was set to 100. We present these results in Table 3. It is easy to notice that many values for the deconvolution methods NM1 and NM2 are identical across Table 3. Although NM1 and NM2 had different θ values of 0.92 and 1.08, respectively, both values lie at the same symmetric distance of 0.08 from 1.00. This observation directly suggests that the scalar θ (for θ ≠ 1) in our nonlinear framework exhibits some symmetry about one, such that we can expect identical results for two choices of θ at the same symmetric distance from 1.00.

Estimated vs. Experimentally Measured Cell-Type Fractions in the Four Melanoma Samples
We directly compare the estimated cell-type fractions in the four melanoma samples presented in Table 3 with the ground-truth values from experiment (GTVE) presented in Table 1. To do this, we ignored the EPIC1 results (due to the absence of malignant fractions) and the NM2 results (as they are almost identical to the NM1 results). Then, we aggregated the macrophage, endothelial cell, and CAF fractions together as "other cells", since they satisfy the definition of other cells in Table 1. For EPIC2, we also added the values of "other cells" specified in the footnote below Table 3. The comparison is readily visualized in Figure 2.
It is clear from Figure 2 that estimates from NM1 most closely matched the GTVE across all four samples, as indicated by the close resemblance of NM1 and GTVE stacks in all four samples. Even so, stacks from other versions of our nonlinear framework (ELM and LNM) appeared to more closely resemble the GTVE stacks in all four samples than the stacks from EPIC and CIBERSORTx.
However, to quantitatively describe the extent of these resemblances, we considered a general-purpose error metric known as the root-mean-squared error (RMSE). RMSE is well suited to comparing the prediction errors of different models for a specific variable and is therefore a good measure of model accuracy. A perfectly accurate model would have an RMSE of zero, which is rarely attainable in practice; model accuracy is therefore judged by how close the RMSE is to zero, relative to the scale of the observations or predictions.
Here, we calculated the RMSE for NM1, LNM, ELM, CIBERSORTx1, CIBERSORTx2, and EPIC2 for all cell types of the four melanoma samples and then for TIICs only. In the latter, we also included the RMSE calculation for EPIC1. We present these results in Figure 3. We calculated the RMSE values using the formula

    RMSE = sqrt( (1/N) Σ_{i=1}^{N} (x_i − x̂_i)^2 ), (19)

where N is the total number of analyzed cell-type subsets, x_i is the GTVE of the analyzed cell-type subset, and x̂_i is the estimated value of the analyzed cell-type subset from a deconvolution method. Values from Tables 1 and 3 were utilized in these calculations. NM1 can be seen to have the lowest RMSE in both charts of Figure 3. Remarkably, its RMSE value was significantly lower than that of all the other deconvolution methods compared and was extremely close to zero, lying in the range of 0-0.02. This indicates that NM1 outperformed the other deconvolution methods and gave more accurate estimates that closely matched the GTVE. The accuracy of NM1 can be directly attributed to the nonlinear framework used and to the choice of the value of θ. Thus, we validated our hypothesis that a selection of θ slightly greater than or less than one, on the order of a few decimal places, accurately captures the not strictly linear relationship between the bulk gene expression and the reference profile. By using the same reference profile in the direct comparison of the deconvolution methods, we successfully limited any chance that the results were a direct consequence of a factor other than the setting or framework of the deconvolution problem.
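Computationally, Equation (19) is a one-liner. The fractions below are illustrative only and are not taken from Tables 1 and 3:

```python
import numpy as np

def rmse(ground_truth, estimate):
    """Root-mean-squared error between GTVE and estimated fractions."""
    x, x_hat = np.asarray(ground_truth), np.asarray(estimate)
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))

# Illustrative cell fractions only (not from Tables 1 and 3):
gtve = [0.10, 0.25, 0.05, 0.60]
est  = [0.12, 0.23, 0.05, 0.60]
print(round(rmse(gtve, est), 4))  # → 0.0141
```

Because RMSE is computed on the same scale as the fractions themselves, a value near 0.01 corresponds to estimates within about one percentage point of the ground truth on average.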

Discussion
The linear framework has been the dominant paradigm in the computational deconvolution of TIICs and other tumor cell subsets from the bulk gene expression data of tumor samples. Efforts toward improving the accuracy of cell fraction estimates have centered on modifications still built around the basic linear framework. For instance, in both charts of Figure 3, CIBERSORTx1 had a slightly lower RMSE value than CIBERSORTx2, indicating improved accuracy. Similar to the conclusion in [47], we attribute this improvement to the batch correction in CIBERSORTx1, which aims to reduce data variability resulting from technical differences between samples. In another instance, in Figure 3b, EPIC1 had a lower RMSE value than EPIC2, CIBERSORTx1, CIBERSORTx2, and ELM, possibly because of its use of a specific TIIC reference profile. On the other hand, in Figure 3a, ELM, the linear version of our nonlinear framework, had a lower RMSE value (higher accuracy) than EPIC2, CIBERSORTx1, and CIBERSORTx2. A likely reason is that the computational algorithm AMLA used in ELM projects directly onto the natural constraint set, in contrast to the computation in CIBERSORTx and EPIC, which is applied on some broadly defined constraint set and followed by a renormalization of the obtained values.
It is clear from Figure 3 that these modifications would only yield minor improvements in accuracy, since they are all based on the linear framework. Notably, the RMSE values from these linear framework-based models fell within the narrow ranges of 0.11-0.18 and 0.08-0.11 in Figure 3a and Figure 3b, respectively. We would not expect results markedly different from these ranges if we were to analyze these datasets using other deconvolution methods, for example [24][25][26][31][32][33], because they are all based on the linear model, with variations only in the choice of loss function or other technical modifications.
As shown in Figure 3, the RMSE of NM1 is markedly lower, with a substantial gap between its RMSE range and those of the linear models. This observation demonstrates the considerable gains in accuracy associated with modeling the deconvolution problem within a nonlinear framework, as this truly represents the natural state of the problem. We also emphasize, through the LNM results, that any attempt to linearize the nonlinear framework before applying the solution algorithm (in this case, AMLA) would largely diminish these gains, although the outcome may still be slightly better than that of traditional linear modeling. As evident from Figure 3, the LNM RMSE values range from 0.07 to 0.10, which is far from the NM1 RMSE range (0-0.02) but much closer to the linear models' RMSE ranges. For this reason, AMLA was specifically designed to approximate solutions directly from the nonlinear framework, without the need to linearize first.
Furthermore, AMLA's design allows it to exhibit faster convergence and shorter runtimes than other algorithms traditionally used in machine learning, which is highly advantageous when handling enormous amounts of bulk gene expression data from many tumor series. Verifying this with simple numerical experiments on R (the set of real numbers), using loss functions commonly employed in regression analysis, is straightforward. We consider the log-hyperbolic loss, squared error loss, Cauchy loss, and ε-insensitive loss, given, for example, by Equations (20)-(23), respectively.
For each of Equations (20)-(23), we approximated the solutions using AMLA, the well-known classical gradient descent algorithm (CGDA), and Nesterov's accelerated gradient (NAG), which is widely used in machine learning. Since the optimal solutions of the equations are known, we measured convergence using the sequence (‖x_n − x̄‖)_{n∈ℕ}, where x_n is the iterate obtained at the n-th iteration, and x̄ is the optimal solution. We plot this measure of convergence against the number of iterations in Figure 4.
Figure 4. Convergence of AMLA, CGDA, and NAG for (a) the log-hyperbolic loss (Equation (20)), (b) the squared error loss (Equation (21)), (c) the Cauchy loss (Equation (22)), and (d) the ε-insensitive loss (Equation (23)).
From Figure 4, we observe that AMLA converged in significantly fewer iterations than CGDA and NAG for all four loss functions considered. This implies that AMLA has a higher order of convergence and, consequently, a faster rate of convergence, since the "order of convergence defines the rate of convergence" [48]. Figure 4 also affirms the robustness of AMLA. A robust algorithm is one that is theoretically guaranteed to converge "starting from any initial design estimate" [49], "such that their correctness is not destroyed by round-off errors" [50]. We show in Appendix A, using rigorous mathematical analysis, that AMLA is guaranteed to converge to a solution of the constrained optimization problem (Equation (2)) for a variety of loss functions satisfying the stated conditions. The four loss functions presented in Figure 4 satisfy these conditions, which explains their convergence in Figure 4 even when CGDA and NAG did not converge (see Figure 4a,c). Moreover, as shown in Figure 4a-d, once AMLA converged, it maintained the flat zero line, highlighting its insensitivity to round-off errors and thus affirming its high robustness.
A particularly noteworthy affirmation of AMLA's robustness can be seen in Figure 4d. The ε-insensitive loss introduces the extra parameters ρ and ε, whose selection can critically affect the convergence and robustness of any machine learning algorithm. As shown in Figure 4d, with ρ = 0.5 and ε = 0.2, AMLA converged in fewer than 20 iterations, while CGDA had not yet converged after 20 iterations. With ρ = 0.8 and ε = 0.4, AMLA converged in about seven iterations, while CGDA converged in about nine. These observations indicate that the values of ρ and ε are significant determinants of the rate of convergence of AMLA: their selection can either increase or decrease the convergence rate. However, it does not affect the robustness of AMLA, which is guaranteed to converge regardless of the values of ρ and ε. Notably, AMLA still outperformed CGDA for both parameter choices.
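The effect of the (ρ, ε) pair on the number of iterations to convergence can be illustrated with a toy subgradient descent run. This sketch assumes the scaled parametrization ρ·max(|x − x̄| − ε, 0) of the ε-insensitive loss (the exact role of ρ in Equation (23) may differ) and counts iterations until the iterate enters the ε-tube around x̄ = 3, where the loss vanishes.

```python
def iters_to_tube(rho, eps, x0=10.0, lr=0.5, max_iter=1000):
    """Subgradient descent on rho * max(|x - x_bar| - eps, 0).

    Returns the iteration count at which the iterate first lies
    inside the eps-insensitive tube around x_bar.
    """
    x, x_bar = x0, 3.0
    for n in range(1, max_iter + 1):
        r = x - x_bar
        if abs(r) <= eps:  # inside the insensitive tube: loss is zero
            return n
        g = rho * (1.0 if r > 0 else -1.0)  # subgradient of the loss
        x -= lr * g
    return max_iter

# The two parameter pairs used in Figure 4d of the paper.
for rho, eps in [(0.5, 0.2), (0.8, 0.4)]:
    print(f"rho={rho}, eps={eps}: {iters_to_tube(rho, eps)} iterations")
```

Consistent with the observation above, the larger (ρ, ε) pair converges in fewer iterations here: a larger ρ yields bigger steps, and a larger ε widens the target tube.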
Overall, this proof of principle demonstrates that accurate and efficient computational deconvolution of tumor bulk gene expression data, for estimating the proportions of cell-type fractions including TIICs, is best achieved using a nonlinear optimization framework whose solution can be approximated by an accelerated machine learning algorithm (AMLA). This nonlinear optimization framework truly captures the natural state of the deconvolution problem. Consequently, in the future, we will implement the entire deconvolution workflow, described in Figure 1, as a cloud-based tool with a user-friendly graphical interface, which we shall call NECSTGEP (Naturally Estimating Cell-type Subsets from Tumor Gene Expression Profiles). NECSTGEP will be equipped with an additional rigorous machine learning algorithm that automatically fixes the model parameters and the initialization values for AMLA, using pattern analysis of the input bulk gene expression profiles as well as the reference profile. This is essential, as this proof of principle has shown that parameter selection plays a crucial role in yielding highly accurate estimation results. Similar problems in biology, oncology, and medical physics, such as the optimal scheduling of combined cancer therapies and the reconstruction of gene regulatory networks, present comparable levels of complexity; thus, they too can benefit from the technique described here.

Conclusions
We introduced and discussed a nonlinear constrained optimization framework for the computational deconvolution of TIICs and other tumor cell-type subsets from tumor bulk gene expression profiles, together with an accelerated machine learning algorithm (AMLA) for directly approximating its solution. Our analysis using real tumor transcriptomics datasets showed that this nonlinear approach yields values closely matching ground-truth values from experimental methods, because it treats the problem in its natural state. Models NM1 and NM2 produced the "best" values for the estimated cell-type fractions in this study, which differed significantly from those obtained using the traditional linear modeling approach. One main limitation of the study is the empirical choice of model hyperparameters, which will be addressed in future studies. This study therefore heralds a paradigm shift away from the traditional linear modeling of the TIIC deconvolution problem.

Acknowledgement:
The authors are grateful for the comments of the two anonymous reviewers, which helped to improve this work.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
Let C be a nonempty, closed, and convex subset of a Hilbert space H.

Definition 2. A mapping Γ : C → H is said to be monotone on C if and only if ⟨Γx − Γy, x − y⟩ ≥ 0 for all x, y ∈ C.
A monotone operator is said to be maximal monotone if it has no proper monotone extension.

Lemma 7.
[54] Let {u_n}_{n≥0} ⊆ H be a bounded sequence in H. Then, there exists a subsequence {u_{n_j}}_{j≥0} ⊆ {u_n}_{n≥0} that converges weakly (denoted u_{n_j} ⇀ u*, say) in H. Note that, for H = ℝ^N, strong and weak convergence coincide.
Lemma 8. Let {u_n}_{n≥0} and {z_n}_{n≥0} be real sequences such that u_n ≥ 0 for all n ≥ 0, lim_{n→∞} Σ_{i=0}^{n} z_i ∈ ℝ, and u_{n+1} ≤ u_n + z_n for all n ≥ 0. Then, lim_{n→∞} u_n exists in ℝ.
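A quick numerical illustration of Lemma 8 (a sanity check of the statement, not part of the proof): take u_{n+1} = u_n + z_n with z_n = (−1)^n/(n+1)², so that Σ z_n converges (to π²/12) and the hypotheses hold with equality; the lemma then guarantees that lim u_n exists.

```python
# Build the sequence u_{n+1} = u_n + z_n with a summable alternating z_n.
u = 5.0  # u_0 >= 0, as the lemma requires
values = []
for n in range(10000):
    z = (-1) ** n / (n + 1) ** 2  # sum_{n>=0} z_n = pi^2 / 12
    u = u + z                     # satisfies u_{n+1} <= u_n + z_n with equality
    values.append(u)

# The tail settles toward the limit u_0 + sum z_n = 5 + pi^2/12 ~ 5.8225.
print(values[-1], abs(values[-1] - values[-2]))
```

The successive differences shrink like 1/n², and the sequence visibly converges, as the lemma predicts.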
Theorem 12. Let g be a real-valued mapping on C, bounded below, whose Fréchet derivative, denoted by A, is K-Lipschitz. Let {v_j}_{j≥0} be the sequence generated iteratively by AMLA (see Equation (12)), where the control parameters α_j and λ are chosen in ℝ such that

Proof. According to Lemma 6, we have that

Thus,

Define the sequence {f(v_{j+1})}_{j≥1} as

Then, {f(v_{j+1})}_{j≥1} is bounded below, and it follows that

Therefore,

Theorem 13. Let the assumptions of Theorem 12 hold. Suppose further that A is monotone, and denote by S_C(g) the set of critical points of g. Let v* ∈ S_C(g), and let {v_j}_{j≥0} be the sequence generated by AMLA. Then, the following hold:
• {v_j}_{j≥0} is bounded;
• {v_j}_{j≥0} converges weakly to a critical point v̄ of g; moreover, if g is convex, then v̄ is a minimizer of g.

Proof. Let u_j := w_j − λAv_j and τ_j := 2λ(g(v_j) − g(v_{j+1})) + λK‖v_{j+1} − v_j‖². Now, using the definitions of v_j and w_j, Definition 1, the fact that v* ∈ S_C(g), and the monotonicity of A, we obtain the estimate

Next, according to Lemma 6,

Therefore, {v_j}_{j≥0} is bounded.
Since

we have that

Now, we show that every weak sequential limit of {v_j}_{j≥0} is in S_C(g). Let W_s(v_j) denote the set of weak sequential limits of {v_j}_{j≥0}. Then, the boundedness of {v_j}_{j≥0} guarantees that W_s(v_j) is nonempty (see Lemma 7). Let v̄ ∈ W_s(v_j); given Definition 4, it suffices to show that ⟨v − v̄, Av̄⟩ ≥ 0 for all v ∈ C. By the definition of W_s(v_j), there exists a subsequence {v_{j_k}}_{k≥0} ⊆ {v_j}_{j≥0} such that

where T_C is the normal cone operator. According to Lemma 9, G is maximal monotone, and 0 ∈ G(v) if and only if v ∈ S_C(g). Thus, it suffices to show that (v̄, 0) ∈ graph(G) = {(v, u) ∈ H × H : v ∈ dom(G), u ∈ G(v)}. We recall that, for any maximal monotone operator Γ, if ⟨Γx − z, x − y⟩ ≥ 0 for all x ∈ dom(Γ) and z ∈ H, then z ∈ Γ(y).
Thus, let (v, u*) ∈ graph(G) be arbitrary; showing ⟨u*, v − v̄⟩ ≥ 0 then yields 0 ∈ G(v̄). Hence, we only need to verify that ⟨u*, v − v̄⟩ ≥ 0. Now, (v, u*) ∈ graph(G) implies u* − Av ∈ T_C(v), which implies ⟨u* − Av, v − u⟩ ≥ 0 for all u ∈ C. However, by definition, v_{j+1} = ℘_C(w_j − λAv_j); thus,

Taking the limit as k → ∞, we get ⟨u*, v − v̄⟩ ≥ 0. Therefore, v̄ ∈ S_C(g) and W_s(v_j) ⊆ S_C(g). Finally, we show that v_j ⇀ v̄ ∈ S_C(g). To verify this, it is enough to show that W_s(v_j) is a singleton. We proceed by contradiction. Suppose there exist v′, v″ ∈ W_s(v_j) with v′ ≠ v″; then, there are subsequences {v_{j_k}}_{k≥0} and {v_{j_i}}_{i≥0} of {v_j}_{j≥0} such that v_{j_k} ⇀ v′ and v_{j_i} ⇀ v″. Therefore, according to Lemma 10, we have that

This is a contradiction, since lim_{j→∞} ‖v_j − v′‖ cannot be less than itself.
Hence, the sequence {v_j}_{j≥0} converges weakly to a critical point of g. Furthermore, if g is convex, then, from 0 ≤ g(v) − g(v̄) − ⟨v − v̄, Av̄⟩ and ⟨v − v̄, Av̄⟩ ≥ 0 for all v ∈ C, we obtain g(v̄) ≤ g(v) for all v ∈ C. Thus, v̄ is a minimizer of g.

Theorem 14.
Let the assumptions of Theorem 12 hold for H = ℝ^n. In addition, assume that C is bounded and g possesses the KL property. Then, the sequence {v_j}_{j≥0} generated by AMLA converges to an element of S_C(g).
Proof. By definition, v_j ∈ C for all j ≥ 0. Hence, the sequence {v_j}_{j≥0} is bounded, and W_s(v_j) ≠ ∅. We now show that W_s(v_j) ⊆ S_C(g). Let v̄ ∈ W_s(v_j) be arbitrary; then, there is a subsequence {v_{j_k}}_{k≥0} of {v_j}_{j≥0} such that v_{j_k} → v̄ as k → ∞. Using Theorem 12(b) and the definition of w_j, we get

This implies that w_{j_k} → v̄ and v_{j_k+1} → v̄ as k → ∞. Furthermore, the continuity of A guarantees that Av_{j_k} → Av̄ as k → ∞. Now, let v ∈ C be arbitrary; then, by the definition of v_{j+1},

Taking the limit as k → ∞, we have that ⟨v − v̄, λAv̄⟩ ≥ 0, which implies ⟨v − v̄, Av̄⟩ ≥ 0 for all v ∈ C. Hence, W_s(v_j) ⊆ S_C(g).
Next, we make use of Lemma 11 to prove that v_j → v̄. For this, we define the map h : ℝ^n × ℝ^n → ℝ ∪ {+∞} as h(x, y) = g(x) + ω‖x − y‖² + I_C(x) := F(x, y) + I_C(x), where I_C(x) = 0 if x ∈ C and I_C(x) = +∞ otherwise; then, h has the KL property on dom(h).
Moreover, the boundedness of {v_j}_{j≥0} and the continuity of h on C imply that there exists a subsequence {x_{j_k}}_{k≥1} of {x_j}_{j≥1} such that x_{j_k} → x̄ = (v̄, v̄) and h(x_{j_k}) → h(x̄) as k → ∞, thus verifying H3 of Lemma 11. Next, we show that H2 of Lemma 11 also holds. First, we recall that

An estimation using the definition of w_m and the Lipschitz continuity of A gives

Therefore, Lemma 11 guarantees that x_j → x̄ = (v̄, v̄). Since x_j = (v_j, v_{j−1}), we have that v_j → v̄ ∈ W_s(v_j) ⊆ S_C(g).