Method for Accurate Detection of Amino Acids and Mycotoxins in Planetary Atmospheres

We present a systematic analysis of a large number of mass spectra accumulated as the number of ion fragments recorded in unit mass-to-charge detector channels. The method retrieves the abundances of detected species using an efficient deconvolution algorithm, which relies on fragment pattern recognition, mass calibration, and background correction. The abundance analysis identifies the target species, amino acids and mycotoxins, through their characteristic fragmentation patterns in the presence of an increasing number of interfering species. The method offers robust and efficient retrieval of the abundances of metabolic molecules in complex mixtures obscured by a wide range of toxic compounds.


Introduction
The in situ exploration of organic environments in the Solar System [1] as signatures of life outside the boundaries of Earth is the most recurring scientific objective in past [2][3][4], recent [5], and future missions [6][7][8]. The search for traces of organic life within our Solar System relies on identifying the building blocks of life, proteins, which are directly linked to the synthesis of amino acids by living organisms [9]. These biosignatures are not expected to be abundant on most target planets [10][11][12]. Thus, instruments used to investigate them must be sensitive enough to distinguish them unambiguously. In addition, identifying the signature of life amidst organic matter is a difficult task due to sampling site selection [13], biological chirality [14], and molecular diversity [15,16].
Mass spectrometers (MSs) are the instruments of choice for in situ composition analysis of planetary atmospheres and planetary body exospheres [17]. These instruments are classified as ion MSs and neutral MSs, depending on the charge state of the analyzed molecules. Ion MSs rely on the studied molecules being electrically charged outside the MS sensor, whereas neutral MSs generate ions internally. Both MS types, regardless of ionization method, differentiate ions by their mass-to-charge ratio. In contrast to Earth-bound MSs, their spaceflight counterparts are much smaller. Scientific payloads comprise less than 10% of the total spacecraft weight, so mass spectrometers are subject to the ever-present need to reduce their size by orders of magnitude relative to state-of-the-art laboratory instruments. The scientific requirements often balance this scale-down so that most analytical capabilities are retained.
This study will not discuss the main differences between MSs in detail, nor will it address the sample collection and preparation methods. Our starting point is the sample as neutral gas, at a given temperature, within an MS. This is a realistic scenario for any current or future investigation in which one measures the mass-to-charge ratio signature of an unknown sample and tries to identify or untangle all the species within, all while keeping in mind that in space-exploration scenarios the sample is minute, the time allotted is limited,

Methodology
Governed by the notion that organisms tend to minimize the metabolic cost of protein biosynthesis and, at the same time, maximize the number of amino acid combinations, Krick et al. [9] deduced the relative probabilities $P^{(c)}$ with which the twenty amino acids listed in Table 1 are contained within proteins. These protein-coding amino acids are indexed as target compounds (c) in the first column of Table 1. Their names, with unique three-letter codes in parentheses, are given in the second column (name), followed by the NIST [37] mass spectrum identification number (NIST EII#) relevant for species fragmentation due to electron impact ionization (EII). We note that our previous experimental MS/MS study [28] used the combination of EII and soft chemical ionization in liquid mixtures of alanine, glycine, methionine, phenylalanine, and serine. The fourth column (formula) gives the chemical formula, followed by the metabolic probability ($P^{(c)}$) with which the amino acid is likely to be found in living organisms. These probabilities indicate that the least abundant amino acids are tryptophan and cysteine, whereas leucine is the most abundant. The last column in Table 1 shows the fragmentation similarity, $A^{(c)}$, of the given amino acid compared to all others. This descriptor takes values from zero (no fragments in common with any other amino acid) up to 19 (one less than the total number of investigated amino acids). The upper limit, $A^{(c)} = 19$, means that the fragmentation of a given amino acid (c) is identical to that of all others.

Generation of Mixture Reference Mass Spectrum
Each species listed in Table 1 can be uniquely represented as a multi-dimensional vector, $\vec{f}^{(c)} = \sum_m \alpha^{(c)}_m \vec{e}_m$, where $\alpha^{(c)}_m$ is a measure of the likelihood for the compound (c) to contribute an ion fragment of mass $m$. The basis vectors $\vec{e}_m$ are mutually orthogonal ($\vec{e}_m \cdot \vec{e}_{m'} = 0$, $m \neq m'$) and normalized to unity ($\vec{e}_m \cdot \vec{e}_{m'} = 1$, $m = m'$). The fragmentation probabilities, $\pi^{(c)}_m = \alpha^{(c)}_m \cdot \alpha^{(c)}_m$, are derived from the NIST EII database [37] such that $\vec{f}^{(c)}$ is a mass spectrum for the species (c) normalized to unit dot product ($\vec{f}^{(c)} \cdot \vec{f}^{(c)} = \sum_m \pi^{(c)}_m = 1$). All unpopulated mass channels $m$ within the given species are assigned zero fragmentation weight, $\alpha^{(c)}_m = 0$. For example, only asparagine contributes the ion fragment $m = 24$ Da, and none of the amino acids have fragments in the $20 \le m \le 23$ Da mass range. The global mass range is selected to contain all masses between the minimum and the maximum populated mass channel within the given set of compounds, which in this study is from 12 to 206 Da. The $\vec{f}^{(c)}$ spectra at mass resolution $\Delta m = 1$ Da for all twenty amino acids are illustrated in Figure 1a. They are used to decide whether an ion fragment will be added to a particular mass channel $m$, in an automated procedure using the TrapParticle module from the CITA package [21]. We provide the $\vec{f}^{(c)}$ spectra and a prescribed number of fragments $N_f$ as input to the TrapParticle module. Here we describe the decision procedure for distributing a prescribed number of fragments $N_f$ originating from any single species (c) in Table 1 over the global mass range $12 \le m \le 206$ Da.
Figure 1. Amino acids listed in Table 1 are characterized by: (a) fragmentation probabilities, $\pi^{(c)}_m$, and their metabolic abundances $P^{(c)}$ used in the preparation of the reference mass spectrum $\vec{R}_{\mathrm{ref}}$ shown in (b). Insets show increasing complexity of the mass spectra above 100 Da due to the total number of detected ion fragments, $N_f$. See text for details.
For each mass channel, $m$, we find the fragments that contribute to it, $\pi^{(c)}_m > 0$, and we draw a uniform random number, $r$, with values between 0 and 1. If $r \le \pi^{(c)}_m$, we assign one ion fragment from the compound (c) to this mass channel and store it in the "ion cloud" format, which uniquely describes the sampling time, mass ($m$) and charge ($q$), position vector, thermal velocity vector, and its compound ancestry (c). We repeat the process until the total number of fragments, $N_f$, is distributed over the global mass range, $12 \le m \le 206$ Da. For a large number of fragments, $N_f > 10^6$, even the least-probable ion fragments, for which $\pi^{(c)}_m < 10^{-6}$, may appear in the sampled "ion cloud" if the condition $N_f \cdot \pi^{(c)}_m > 1$ is satisfied. The "ion cloud" is then binned into $\Delta m = 1$ Da wide bins according to the mass-to-charge ratio $m/q$, such that each bin holds the number of ion fragments $N^{(c)}_m$ recorded in mass channel $m$. The procedure described above for creating the reference mass spectrum for a single compound (c) can be extended to arbitrary compound mixtures, with the abundances normalized such that $\sum_c \eta^{(c)}_{\mathrm{ref}} = 1$, and such that $\eta^{(c)}_{\mathrm{ref}}$ represents the relative abundance of compound (c) in the reference mixture. One example of such a mixture is given in Table 1, where $\eta^{(c)}_{\mathrm{ref}} = P^{(c)}$ represents the metabolic cost probability that a given amino acid (c) is present in living organisms [9].
The generation of the mixture reference mass spectrum for a priori known compound abundances $\eta^{(c)}_{\mathrm{ref}}$ is automated by using the TrapParticle tool from the CITA suite of codes [21]. As previously described, for each mass channel, $m$, we find all compounds (c) that contribute to this mass channel, $\eta^{(c)}_{\mathrm{ref}}\,\pi^{(c)}_m > 0$, and we draw a random number $r$ from the uniform distribution, $0 \le r \le 1$. If the obtained random value satisfies $r \le \eta^{(c)}_{\mathrm{ref}}\,\pi^{(c)}_m$, one ion fragment from compound (c) is added to this mass channel. [...] Table 1. Less probable fragments (for $m > 100$ Da) will be suppressed due to insufficient count statistics when $N_f < 10^3$. The issue of mass spectrum similarity among different compounds is partly due to the unit mass resolution, $\Delta m = 1$ Da, where fine differences in the masses of neighboring isobars (fragments with similar masses) disappear when these mass peaks merge into 1 Da wide mass bins. At this low mass resolution, the dissimilarity between compounds is improved by increased counting statistics, $N_f > 10^6$. In this case, all less-probable fragments start appearing in the reference mass spectrum $\vec{R}_{\mathrm{ref}}$ and contribute to the dissimilarity of compounds, as illustrated in the insets of Figure 1b. Therefore, throughout this study, we repeat the analysis for a large number of mixtures that form the reference mass spectra, each containing an increasing number of ion fragments, $N_f = 10^2$ to $10^6$, distributed in mass channels $m$ with statistical uncertainty $N_m \pm \sqrt{N_m}$. Since each ion fragment of mass $m$ present in the "ion cloud" mixture has its own ancestry (c), we a priori know how many of them are due to each compound (c), $N^{(c)}_m$, and thus the total number of ion fragments due to each compound (c), $N^{(c)} = \sum_m N^{(c)}_m$. Furthermore, we also know the total number of ion fragments, $N_f = \sum_c N^{(c)}$, present in the reference mass spectrum $\vec{R}_{\mathrm{ref}}$, and hence we know the abundances $\eta^{(c)}_{\mathrm{ref}} = N^{(c)}/N_f$ of each compound that was used to make the mixture.
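The accept/reject sampling described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the TrapParticle implementation: the function name `sample_ion_cloud`, the two toy compounds, and the use of the product of abundance and fragmentation probability as the acceptance threshold are all assumptions made here, and the "ion cloud" is reduced to a table of counts that keeps the compound ancestry.

```python
import random

def sample_ion_cloud(pi, eta, n_fragments, seed=0):
    """Accept/reject sampling of ion fragments, keeping compound ancestry.

    pi  : list of rows; pi[c][m] is the fragmentation probability of compound c
          in mass channel m (each row sums to 1)
    eta : list of reference abundances (sums to 1)
    Returns counts[c][m], the number of accepted fragments of compound c in channel m.
    """
    rng = random.Random(seed)
    # Assumed acceptance probability for every populated (compound, channel) pair.
    accept = {(c, m): eta[c] * p
              for c, row in enumerate(pi)
              for m, p in enumerate(row) if eta[c] * p > 0}
    counts = [[0] * len(pi[0]) for _ in pi]
    total = 0
    while total < n_fragments:
        for (c, m), prob in accept.items():
            if rng.random() <= prob:      # draw r and compare with the probability
                counts[c][m] += 1
                total += 1
                if total == n_fragments:
                    break
    return counts

# Toy example: two hypothetical compounds over four mass channels.
pi = [[0.50, 0.50, 0.00, 0.00],
      [0.00, 0.25, 0.25, 0.50]]
eta_ref = [0.3, 0.7]
counts = sample_ion_cloud(pi, eta_ref, n_fragments=5000)

spectrum = [sum(col) for col in zip(*counts)]             # binned counts per channel
n_f = sum(spectrum)
eta_from_ancestry = [sum(row) / n_f for row in counts]    # N^(c) / N_f
print(spectrum, eta_from_ancestry)
```

Because every accepted fragment is tallied under its parent compound, the abundances recovered from the ancestry counts approach the input abundances as the number of fragments grows, which is the a priori knowledge the retrieval is later compared against.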

Deconvolution of Reference Mass Spectrum
The deconvolution consists of evaluating different trial mixtures of candidate compounds, with respect to the trial abundance, $\eta^{(c)}$, by using the iterative constrained least-squares random walk method [38].
The standard iterative random-walk procedure starts with zeroed initial abundances for each compound, $\eta^{(c)}_{\mathrm{init}} = 0$, and successively updates their values, $\eta^{(c)} \to \eta^{(c)} \pm \delta$, with the fixed step size $\delta$ until the dissimilarity distance $\Delta R$ reaches the global minimum. Upon convergence, we report the final abundances, $\eta^{(c)}_{\mathrm{final}}$, for each compound in the reference mixture and compute the converged retrieval errors, $\varepsilon$ [...], as a function of the total number of fragments, $N_f$ [...]. We repeat the random walk retrieval procedure ten times for several fixed numbers of fragments, $10^2 \le N_f \le 10^6$, and fit the converged retrieval errors, $\varepsilon$ [...] Table 2 together with their standard uncertainties.
Table 2. Values of fitting parameters for the retrieval errors, see Equation (1), of amino acid abundances listed in Table 1. Standard deviations are enclosed in brackets.
[...] The same error is reduced to 4.42(85)% for the retrieval of cysteine from the spectrum with $N_f = 10^6$ fragments. It is evident from Table 2 that the mass spectrum with $N_f = 10^4$ fragments can be deconvoluted to better than 16% for most amino acids except cysteine, and for the increased number of fragments, $N_f = 10^6$, the retrieval accuracy is better than 5% for all amino acids studied here.
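A fixed-step random walk of this kind can be illustrated with the minimal sketch below. It is one plausible reading of the procedure, not the algorithm of Ref. [38]: the names `residual` and `random_walk_retrieval`, the three toy spectra, the non-negativity constraint, and the rule of accepting only steps that lower the squared residual are assumptions made for illustration. Spectra here are rows of fragmentation probabilities, and the reference spectrum is normalized to unit total.

```python
import random

def residual(F, eta, R):
    """Squared distance between spectrum R and the trial mixture sum_c eta[c] * F[c]."""
    return sum((R[m] - sum(eta[c] * F[c][m] for c in range(len(F)))) ** 2
               for m in range(len(R)))

def random_walk_retrieval(F, R, delta=1e-3, n_trials=30000, eta_init=None, seed=0):
    """Fixed-step random walk; a step is kept only if it lowers the residual."""
    rng = random.Random(seed)
    eta = [0.0] * len(F) if eta_init is None else list(eta_init)
    best = residual(F, eta, R)
    for _ in range(n_trials):
        c = rng.randrange(len(F))                       # pick one compound at random
        step = delta if rng.random() < 0.5 else -delta  # try eta -> eta +/- delta
        if eta[c] + step < 0.0:                         # constraint: abundances stay >= 0
            continue
        eta[c] += step
        trial = residual(F, eta, R)
        if trial < best:
            best = trial
        else:
            eta[c] -= step                              # reject the step
    return eta, best

# Toy example: three hypothetical compounds over six mass channels.
F = [[0.6, 0.3, 0.1, 0.0, 0.0, 0.0],
     [0.0, 0.1, 0.2, 0.5, 0.2, 0.0],
     [0.1, 0.0, 0.0, 0.1, 0.3, 0.5]]
eta_true = [0.2, 0.5, 0.3]
R = [sum(eta_true[c] * F[c][m] for c in range(3)) for m in range(6)]

eta_hat, res = random_walk_retrieval(F, R)
print(eta_hat, res)
```

Passing a non-zero `eta_init` shortens the walk, since the number of accepted steps scales with the distance between the starting point and the final abundances divided by the step size.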
In this study we explored a novel approach to initiating the standard iterative random-walk procedure using an estimate for the initial values of the unknown abundances, such that $\eta^{(c)}_{\mathrm{init}} \neq 0$. For any two compounds, (c) and (c'), we compute their fragmental similarity value, $A_{c,c'}$ [...] Table 1.
The explicit form for the square of the residual function, $\Delta R^2$, is minimized with respect to the unknown abundances $\eta^{(c)}$ by finding the first derivatives $\partial \Delta R^2 / \partial \eta^{(c)}$ and setting them to zero for every compound (c) in Table 1, which reduces to a linear system of equations, Equation (2), where the coefficients, $B^{(c)} = \vec{R}_{\mathrm{ref}} \cdot \vec{f}^{(c)}$, are a priori known for the given mixture of amino acids from Table 1. The largest contributions to the residual function $\Delta R^2$ come from fragmentally similar compounds ($0.6 \le A_{c,c'} \le 1$), and to quantify the uniqueness of each mass spectrum shown in Figure 1a, we compute the overlap weights, $A^{(c)} = \sum_{c'} A_{c,c'}$, where $c' \neq c$, and list them in the last column of Table 1. Compounds with the smallest $A^{(c)}$ weights (trp, tyr, his, phe, pro) form a core subset for estimating the initial trial abundances, $\eta^{(c)}_{\mathrm{init}}$, to which other compounds can be added as long as Equation (2) [...] [38,39], which we used here to retrieve the final $\eta^{(c)}_{\mathrm{final}}$ abundances iteratively from the reference mass spectra shown in Figure 1b. Namely, by inverting Equation (2) for increasing values of the total number of fragments ($N_f \le 10^6$), the initial retrieval errors [...] remain below 35% for most amino acids listed in Table 1, except for cys (131%), his (74%), met (53%), and trp (185%). Using these initial estimates, $\eta^{(c)}_{\mathrm{init}}$, we proceed with the standard random walk minimization algorithm, which now needs fewer iterations to converge to the same final $\eta^{(c)}_{\mathrm{final}}$ abundances. The overall speedup depends on the iterative step size, $\delta$, and the total number [...] (2) will be reported elsewhere, and our preliminary findings suggest speedups of at least an order of magnitude for a large number of ion fragments ($N_f > 10^6$). Acceleration of convergence is useful when the chemical composition of atmospheric samples needs to be reported once per second, as is the case onboard the International Space Station, where the QITMS instrument monitors the cabin air composition [25].
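The minimization step can be written out as a short worked derivation. The explicit residual and the linear system were lost in this copy of the text, so the form below is an assumption: it takes the residual to be the squared distance between the reference spectrum and the trial mixture, and it takes the similarity value $A_{c,c'}$ to be the dot product of the two normalized spectra, which is at least consistent with its stated range of 0 to 1 and with the 0 to 19 range of the overlap weights.

```latex
% Assumed residual: squared distance between reference and trial mixture
\Delta R^2 = \Big| \vec{R}_{\mathrm{ref}} - \sum_{c'} \eta^{(c')} \vec{f}^{(c')} \Big|^2

% Setting the first derivative to zero for every compound (c)
\frac{\partial \Delta R^2}{\partial \eta^{(c)}}
  = -2\, \vec{f}^{(c)} \cdot \Big( \vec{R}_{\mathrm{ref}} - \sum_{c'} \eta^{(c')} \vec{f}^{(c')} \Big) = 0

% gives one linear equation per compound
\sum_{c'} A_{c,c'}\, \eta^{(c')} = B^{(c)}, \qquad
A_{c,c'} = \vec{f}^{(c)} \cdot \vec{f}^{(c')}, \qquad
B^{(c)} = \vec{R}_{\mathrm{ref}} \cdot \vec{f}^{(c)}

% with A_{c,c} = 1 from the normalization, and overlap weights
A^{(c)} = \sum_{c' \neq c} A_{c,c'}, \qquad 0 \le A^{(c)} \le 19
```

Under this reading, restricting the system to compounds with small overlap weights keeps the matrix $A_{c,c'}$ close to the identity and therefore easy to invert, which would explain why those compounds are a natural core subset for the initial estimate.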

Robustness Tests
For extraterrestrial mass spectrometry applications, where an increasing number of interfering species obscures the detection of life-bearing amino acids, the important metric is the robustness of the multi-dimensional Monte Carlo random walk algorithm [38,39]. Interfering species, hereafter called confounders, are molecules that have the same parent mass as the target compounds but may have different fragmentation patterns due to differences in chemical structure. The simple case of a few target species and a small number of corresponding confounders is given in Table 3. Target compounds (t) are mixed with their respective confounders (c-n) to form the reference mixtures used in deconvolution studies. In a single reference mixture, each confounder enters with constant unit weight ($\omega_{c-n} = 1$), whereas the corresponding target compound is added according to prescribed weights ($\omega_t$ = 0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, and 1). Every reference mixture is described by its unique ion fragment probability distribution $\vec{R}_{\mathrm{mix}}$ (mass spectra with a variable number of ion fragments, $N_f = 10^3$, $10^4$, $10^5$, or $10^6$), which is a weighted sum of the individual fragment distributions $\vec{f}^{(c)}$. In this manner, the decision of whether the given mass channel $m$ in the reference mixture mass spectrum $\vec{R}_{\mathrm{mix}}$ is to be populated by another single ion fragment is governed by the a priori known reference [...]. The decision of whether an ion fragment will be added to a particular mass channel $m$ is automated using the TrapParticle module from the CITA package [21]. Namely, each accepted ion fragment is stored in the "ion cloud" format, which uniquely describes the sampling time, mass ($m$) and charge ($q$), position vector, thermal velocity vector, and its ancestry ((t) for the target or (c-n) for a confounder, see Table 3). Due to the statistical nature of the sampling, no two ion clouds of the same size $N_f$ are the same for any prescribed target mixing ratio, $\omega_t$.
Individual fragment distributions for compounds in Table 3 were generated using canonical SMILES codes as input to the CFM-ID [40] algorithm. [...] and $\vec{f}^{(c-n)}$ at 10, 20, and 40 eV relative collision energies, and we use them in Equation (3) to prepare reference mixtures $\vec{R}_{\mathrm{mix}}$ with equipartial confounders ($\omega_{c-n} = 1$) and variable target weights ($\omega_t$ = 0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, and 1). We then apply the random walk algorithm [38,39] to retrieve the target abundances $\eta^{(t)}$ and compare them to the reference target abundances contained in $\vec{R}_{\mathrm{mix}}$. The efficiency of retrieval is illustrated in Figure 3 for tyrosine (tyr) and its five confounders (c-1, ..., c-5). Each confounder is represented by its CID fragmentation pattern $\vec{f}^{(c-n)}$ and mixed equipartially with the other confounders, resulting in the mass spectrum shown in Figure 3 as grey bars. Consequently, each mass channel belonging to tyrosine is obscured by the contributions of several different confounders. When the reference mixture contained $N_f$ = 51,000 ion fragments, 1023 fragments belonged to tyrosine (1:10 mix ratio to any confounder, i.e., $\omega_{t=\mathrm{tyr}} = 0.1$). These a priori known reference ion fragments are marked as black caps in Figure 3. The random walk program retrieved 932 tyrosine ion fragments (see green bars in Figure 3) from the reference mass spectrum $\vec{R}_{\mathrm{mix}}$, which corresponds to an accuracy of 8.9% with a precision of 3.3%. When the reference mixture contained tyrosine in a 1:100 mix ratio with respect to any confounder ($\omega_{t=\mathrm{tyr}} = 0.01$), and the mass spectrum contained $N_f$ = 501,000 ion fragments (1006 due to tyrosine), the random walk program retrieved 1048 tyrosine fragments, an accuracy of 4.2% with a precision of 3%.
The retrieval error decreases with improved counting statistics $N_f$, as illustrated by the example of citrulline and its confounders (third row in Table 3). With a mixture containing $N_f = 10^6$ ion fragments and a 1:1 mixing ratio for citrulline ($\omega_{t=\mathrm{cit}} = 1$), the random walk retrieved 166,019 out of the initially created 166,490 citrulline fragments, which corresponds to an accuracy of 0.28% with a precision of 0.25%.
Similar retrieval accuracies were obtained for other target species found in Table 3. For example, when the reference mass spectra contained $N_f$ = 51,000 ion fragments and the target mixing ratio was 1:10 to all confounders ($\omega_t = 0.1$), the retrieval accuracies were: 13.2% (pal), 2.9% (arg), 1.1% (lys), 6.1% (orn), 5.8% (gly), and 3.5% (ser). Generating CID fragmentation patterns, $\vec{f}^{(c-n)}$, for an increasing number of confounders (n > 10) using the CFM-ID [40] algorithm is tedious and requires knowledge of the canonical SMILES codes for each interfering molecule. In that respect, the NIST [37] database containing EII (70 eV) fragmentation patterns offers an automated method for compiling a large number of confounders by simply searching the library for molecules with the same mass as a target compound.
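The fragment counts and percentages quoted for tyrosine and citrulline can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the "accuracy" is interpreted as the relative difference between retrieved and reference fragment counts, and the "precision" is compared against a Poisson counting estimate, $1/\sqrt{N}$, which is an inference made here and is not stated in the text.

```python
import math

def expected_target_count(n_fragments, w_target, n_confounders, w_confounder=1.0):
    """Expected number of target fragments in a mixture built from relative weights."""
    return n_fragments * w_target / (w_target + n_confounders * w_confounder)

def relative_error(retrieved, reference):
    """Relative difference between retrieved and reference fragment counts."""
    return abs(retrieved - reference) / reference

def counting_precision(count):
    """Relative Poisson uncertainty, 1/sqrt(N), of a fragment count (assumed)."""
    return 1.0 / math.sqrt(count)

# (label, reference fragments, retrieved fragments) as quoted in the text
cases = [("tyr 1:10", 1023, 932),
         ("tyr 1:100", 1006, 1048),
         ("cit 1:1", 166490, 166019)]
for label, reference, retrieved in cases:
    print(f"{label}: error {100 * relative_error(retrieved, reference):.2f}%, "
          f"counting precision {100 * counting_precision(retrieved):.2f}%")

# Tyrosine is mixed with five confounders, each entering with unit weight.
print(expected_target_count(51_000, 0.1, 5), expected_target_count(501_000, 0.01, 5))
```

With five unit-weight confounders, both tyrosine mixtures are expected to hold about 1000 target fragments, so the quoted 1023 and 1006 sit within ordinary counting fluctuations. The relative errors evaluate to about 8.9%, 4.2%, and 0.28%, and the $1/\sqrt{N}$ estimates come out close to the quoted precisions.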
Figure 3. Deconvoluted CID (40 eV) mass spectrum with $N_f$ = 51,000 ion fragments due to the mixture of tyrosine and its five confounders, shown in both the linear (left) and the logarithmic (right) scale. The tyrosine reference spectrum is shown as black caps, whereas the retrieved tyrosine spectrum is marked with green bars. The equipartial mixture of interfering confounders and tyrosine in a 1:10 mix ratio is shown as gray bars. See text for details.
As an example of how well the random walk retrieval algorithm performs against an increasing number of confounders (n > 10), we used EII fragmentation patterns of four mycotoxins: citrinin (NIST#: 241948), patulin (NIST#: 53239), ochratoxin-B (NIST#: 64340), and zearalenone (NIST#: 290624). Mycotoxins are secondary metabolites of fungi, e.g., Fusarium, commonly present as hazardous contaminants in cereal-growing regions [41]. We use them here as representatives of biosignatures recently hypothesized by Limaye and collaborators [42,43] to be dissolved in acidic aerosols that form the haze and the cloud layer of Venus. We generated 165 reference mass spectra $\vec{R}_{\mathrm{mix}}$ for each target mycotoxin as mixtures of up to n = 15 confounders, all present with unit weights ($\omega_{c-1} = \cdots = \omega_{c-n} = 1$) but with variable target weights ($\omega_t$ = 0, 0.1, 0.2, ..., 0.9, 1.0). Figure 4 shows that the retrieval method yields no false positives. Namely, when the target mycotoxins are absent from the reference mass spectrum, $\omega_t = 0$, the random walk algorithm correctly reports zero abundance, $\eta^{(t)}_{\mathrm{final}} = 0$. As the target mycotoxin reference weight increases towards equipartial mixtures, $\omega_t = 1$, retrieval errors remain below 3% only if the number of confounders is n ≤ 6, and retrieval errors tend to stay below 6% for n ≤ 15.
Similar results were obtained for amino acid targets with an increasing number of confounders included in the reference mass spectrum $\vec{R}_{\mathrm{mix}}$ with $N_f$ = 10,000 ion fragments. Figure 5 illustrates that the maximum retrieval errors for alanine, asparagine, glutamine, and serine remain below 3.6% for n ≤ 15 confounders. A further increase in the number of interfering species (n ≤ 24, n ≤ 48, and n ≤ 96) was studied only for the citrulline and ornithine targets from Table 3, but using their EII (70 eV) fragmentation patterns $\vec{f}^{(c)}$. Under these stress conditions, we use the a priori known fragmentation probabilities $\alpha^{(t)}_m$ for targets when generating the reference mass spectra $\vec{R}_{\mathrm{mix}}$ with $N_f = 10^6$ fragments. Targets and confounders were mixed under equipartial conditions ($\omega_t = \omega_{c-n} = 1$). However, to increase the stress during the retrieval procedure, we enforced an additional 15% random noise on each fragmentation probability $\alpha^{(c)}_m$ belonging only to the confounding species. This modification represents the uncertainty with which potential interfering species are known in advance in extraterrestrial atmospheres. In addition, by randomly perturbing the counts in each confounder mass channel $m$, we introduce the background noise that may be present in an experimental mass spectrum. Results for citrulline show that the retrieval error changes from 0.27% for 24 confounders and 0.6% for 48 confounders to 3.8% for 96 confounders. In the case of ornithine, these errors were 0.2%, 5.2%, and 8.9%, respectively.
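One way to impose such noise on a confounder pattern is sketched below. The text does not specify the noise model, so everything here is an assumption made for illustration: the name `perturb_confounder`, the choice of multiplicative noise drawn uniformly within ±15% per mass channel, and the renormalization of the perturbed pattern to unit sum.

```python
import random

def perturb_confounder(pi, rel_noise=0.15, seed=0):
    """Scale each fragmentation probability by a random factor in [1 - rel_noise, 1 + rel_noise].

    Unpopulated channels (probability 0) stay unpopulated; the perturbed
    pattern is renormalized so that it still sums to 1.
    """
    rng = random.Random(seed)
    noisy = [p * (1.0 + rel_noise * rng.uniform(-1.0, 1.0)) for p in pi]
    total = sum(noisy)
    return [p / total for p in noisy]

# Hypothetical confounder pattern over six mass channels.
confounder = [0.05, 0.00, 0.20, 0.40, 0.25, 0.10]
noisy_confounder = perturb_confounder(confounder)
print(noisy_confounder)
```

In a stress test of this kind, the reference spectrum would be generated from the perturbed patterns while the retrieval keeps using the unperturbed ones, so the mismatch stands in for imperfect prior knowledge of the interfering species.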

probabilities
( ) for targets when generating the reference mass spectra ⃗ with Nf = 10 6 fragments. Targets and confounders were mixed under equipartial conditions ( = − = 1). However, to increase the stress during the retrieval procedure, we enforced an additional 15% random noise on each fragmentation probability ( ) belonging only to the confounding species. This modification represents the uncertainty with which potential interfering species are known in advance in extraterrestrial atmospheres. In addition, by randomly perturbing counts in each confounder mass channel , we introduce the background noise that may be present in the experimental mass spectrum. Results for citrulline show that retrieval error changes from 0.27% for 24 confounders and 0.6% for 48 confounders to 3.8% for 96 confounders. In the case of ornithine, these errors were 0.2%, 5.2%, and 8.9%, respectively. The fragmentation process of parent molecules differs for different ionization methods. In contrast to the EII, where cross sections for electron ionization at impact energies of 70 eV are standardized, the electrospray ionization (ESI) methods vary. ESI converts solutionphase parent analytes into gas-phase ions, which are then electrostatically extracted into MS at various voltages (10 V, 20 V, and 40 V). Assuming parent gas-phase ions are singlycharged and dependent on 10 eV, 20 eV, or 40 eV kinetic energies, they undergo different fragmentation scenarios in collisions with the solvent vapor and buffer gas (CID), including protonation. Therefore, the distribution of CID fragmentation patterns, as predicted by the CFM-ID [40] tool, changes with the relative collision energies of parent ions and the pressure of buffer gas. The success of the Random Walk retrieval method will always depend on how well fragmentation patterns are known for the given ionization strategy. Most target compounds listed in Table 3 at 10 eV collision energies yield fewer than seven fragments. 
Still, at 20 eV, additional smaller fragments start to appear, such that at 40 eV number of ion fragments is usually around 30. This behavior is illustrated in Figure 6a, where palmitic acid has fragments over a wide mass range at all three collision energies.
In contrast, ornithine predominantly dissociates in smaller fragments as the collision energy increases. A similar trend is observed for all confounders listed in Table 3, and thus their degree of interference is additionally dependent on the relative collision energy. The absolute error with which Random Walk retrieves the target compound from the 10:1 mixtures of corresponding confounders is shown in Figure 6b. Three largest retrieval errors at collision energy of 10 eV are found for citrulline (9.2%), tyrosine (5.2), and palmitic acid (4.5%). Citrulline also exhibits high retrieval errors at 20 eV (6.9%) and 40 eV (4.3%), followed by tyrosine and palmitic acid (3.9%) at 20 eV, tyrosine (6.9%), and ornithine (3%) at 40 eV. For all other target compounds and collision energies, retrieval errors remain below 3%. Glycine has retrieval errors below 1% mainly because it was obscured by only three confounders and thus was twice as abundant in 10:1 mixtures than other target compounds. The shift of fragment distribution to smaller fragments seen in lysine at 40 eV collision energy improves its retrieval accuracy mostly because its confounders do not follow the similar redistribution, thus making lysine the least similar molecule in the mixture. This is in stark contrast to ornithine and arginine, both of which have retrieval errors between 2.5% and 3.2%, mainly due to a weak propensity for further fragmentation once collision energies exceed 20 eV. cally extracted into MS at various voltages (10 V, 20 V, and 40 V). Assuming parent gas-phase ions are singly-charged and dependent on 10 eV, 20 eV, or 40 eV kinetic energies, they undergo different fragmentation scenarios in collisions with the solvent vapor and buffer gas (CID), including protonation. Therefore, the distribution of CID fragmentation patterns, as predicted by the CFM-ID [40] tool, changes with the relative collision energies of parent ions and the pressure of buffer gas. 
Figure 6. The effect of CID relative collision energies on fragmental distribution and retrieval errors: (a) fragmentation probabilities tend to shift to smaller fragments at 40 eV for most target compounds in Table 3; (b) absolute errors remain below 10% when a mass spectrum with 10:1 mixtures contains Nf = 51,000 fragment ions.
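The role of fragment redistribution can be illustrated with a small numerical sketch. The example below assumes cosine similarity as the measure of overlap between two fragmentation patterns; this is one common choice and is not necessarily the measure used in this work. The three patterns and the function name cosine_similarity are hypothetical and chosen only to mimic a target whose fragments shift to low mass at high collision energy while its confounder does not.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two fragmentation patterns."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Hypothetical relative intensities in six unit-mass channels,
# ordered from the lowest to the highest mass-to-charge ratio.
target_low_energy = [0, 0, 1, 1, 4, 4]    # mostly large fragments
target_high_energy = [4, 4, 1, 1, 0, 0]   # shifted to small fragments
confounder = [0, 1, 1, 2, 3, 3]           # assumed not to redistribute

sim_low = cosine_similarity(target_low_energy, confounder)
sim_high = cosine_similarity(target_high_energy, confounder)

print(f"similarity to confounder at low energy:  {sim_low:.3f}")
print(f"similarity to confounder at high energy: {sim_high:.3f}")
```

In this toy case the overlap with the confounder drops sharply once the target redistributes to small fragments, which is the qualitative situation described for lysine at 40 eV.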

Summary
We demonstrated a computational method for retrieving the relative abundances of amino acids and mycotoxins from complex mixtures containing a large number of interfering species. The results are encouraging: target species associated with life, as well as products of microbial metabolism, can be detected with accuracies better than 10%, given sufficient counting statistics and the sensitivity readily achievable with modern mass spectrometers. A novel contribution of this study is a method for shortening computational convergence times. The speedup comes from inverting the fragmental similarity matrix, which provides an optimal starting point for the standard random walk procedure used in previous studies. Future studies will focus on expanding the number of species to include fatty acids and products of microbial metabolism.
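The two-stage idea summarized above can be sketched in a few lines of Python. The sketch rests on several assumptions that are not taken from this paper: the similarity matrix is read as the matrix of dot products between fragmentation patterns, its inversion is read as a least-squares solve, the random walk accepts only moves that lower a squared-residual misfit, and a 10:1 mixture is read as ten parts of confounders to one part of target. All patterns, names (similarity_start, random_walk, misfit), and parameter values are hypothetical.

```python
import random

def normalized(spectrum):
    """Convert channel counts to relative intensities."""
    total = sum(spectrum)
    return [c / total for c in spectrum]

def misfit(patterns, abundances, spectrum):
    """Sum of squared residuals between model and normalized spectrum."""
    y = normalized(spectrum)
    model = [sum(x * p[ch] for x, p in zip(abundances, patterns))
             for ch in range(len(y))]
    return sum((a - b) ** 2 for a, b in zip(model, y))

def similarity_start(patterns, spectrum):
    """Solve S x = b, where S[i][j] is the dot product of patterns i and j.

    Assumes the patterns are linearly independent, so S is invertible.
    """
    k = len(patterns)
    y = normalized(spectrum)
    aug = [[sum(a * b for a, b in zip(patterns[i], patterns[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(patterns[i], y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    x = [max(aug[i][k] / aug[i][i], 0.0) for i in range(k)]  # clip negatives
    norm = sum(x)
    return [v / norm for v in x]

def random_walk(patterns, spectrum, start, steps=2000, step_size=0.005, rng=None):
    """Move small amounts of abundance between species; keep improvements."""
    rng = rng or random.Random(0)
    x = list(start)
    best = misfit(patterns, x, spectrum)
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)
        delta = min(step_size * rng.random(), x[i])  # stay non-negative
        trial = list(x)
        trial[i] -= delta
        trial[j] += delta
        score = misfit(patterns, trial, spectrum)
        if score < best:
            x, best = trial, score
    return x

rng = random.Random(42)
patterns = [
    [0.6, 0.3, 0.1, 0.0, 0.0],  # hypothetical target
    [0.1, 0.2, 0.4, 0.3, 0.0],  # hypothetical confounder 1
    [0.0, 0.1, 0.1, 0.3, 0.5],  # hypothetical confounder 2
]
true_abundances = [1 / 11, 5 / 11, 5 / 11]  # assumed reading of a 10:1 mixture
mixture = [sum(x * p[ch] for x, p in zip(true_abundances, patterns))
           for ch in range(5)]
draws = rng.choices(range(5), weights=mixture, k=51000)  # simulated ion counts
spectrum = [draws.count(ch) for ch in range(5)]

start = similarity_start(patterns, spectrum)
refined = random_walk(patterns, spectrum, start, rng=rng)
print("start:  ", [round(v, 4) for v in start])
print("refined:", [round(v, 4) for v in refined])
print("absolute error on target (%):",
      round(100 * abs(refined[0] - true_abundances[0]), 2))
```

Because the walk begins from the least-squares estimate rather than from an arbitrary guess such as equal abundances, it starts close to the minimum of the misfit and needs only small corrections, which is the intuition behind the reported speedup.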

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.