Evaluation of PAC and FASP Performance: DIA-Based Quantitative Proteomic Analysis

The aim of this study was to compare filter-aided sample preparation (FASP) and protein aggregation capture (PAC) starting from a three-species protein mix (Human, Soybean and Pisum sativum) and two different starting amounts (1 and 10 µg). Peptide mixtures were analyzed by data-independent acquisition (DIA) and raw files were processed by three commonly used software tools: Spectronaut, MaxDIA and DIA-NN. Overall, the highest number of proteins (mean value of 5491) was identified by PAC (10 µg), while the lowest number (4855) was identified by FASP (1 µg). The latter experiment displayed the worst performance in terms of both specificity (0.73) and precision (0.24). Other tested conditions showed better diagnostic accuracy, with specificity values of 0.95–0.99 and precision values between 0.61 and 0.86. In order to provide guidance on the data analysis pipeline, the diagnostic accuracy of the three software tools was investigated: (i) the highest sensitivity was obtained with Spectronaut (median of 0.67), highlighting the ability of Spectronaut to quantify low-abundance proteins, (ii) the best precision value was obtained by MaxDIA (median of 0.84), but with a reduced number of identifications compared to Spectronaut and DIA-NN data, and (iii) the specificity values were similar (between 0.93 and 0.99). The data are available on ProteomeXchange with the identifier PXD044349.


Introduction
In mass spectrometry-based proteomics, sample processing encompasses several steps to extract, solubilize and digest the proteins [1]. The ideal proteomics protocol should be (i) compatible with different extraction buffers, (ii) suitable for low sample inputs and (iii) composed of a low number of steps and automatable in order to reduce the variability associated with the operator [2,3]. Though the choice of sample preparation protocol in mass spectrometry-based proteomics will largely depend on the goals of the study (e.g., the study of specific post-translational modifications or of specific subcellular compartments), a few approaches have been extensively adopted by proteomics laboratories and core facilities for whole-cell proteomic analysis. Among the most popular approaches which, at least partially, satisfy the characteristics listed above for the ideal proteomics protocol, it is worth mentioning filter-aided sample preparation (FASP) [4–6] and protein aggregation capture (PAC) [7]. The two protocols show considerable versatility even for challenging samples such as biological fluids, so evaluating their performance is of great interest.
FASP is a widely used protocol for proteomic sample preparation because it exhibits several strengths such as compatibility with different extraction buffers [8], the removal of all interferents prior to enzymatic digestion and, last but not least, the ability to concentrate diluted protein solutions on the filter unit, ensuring optimal conditions for the enzyme/substrate interaction. Given its potential, the protocol has been made amenable to automation [9], leading to two important advantages: increased throughput and reduced variability associated with the operator. Despite these promising features, the classical FASP protocol shows limitations with low-input samples since there appears to be an optimum ratio between the filter size and the amount of protein loaded. For this reason, a smaller filter size (Well-Plate µFASP) [10] has been used to improve the interaction between proteins and the filter surface for low-concentration samples, overcoming this limitation.
PAC, on the other hand, is a protocol based on magnetic beads. Proteins are precipitated on the beads by adding an organic solvent and subsequently digested. In detail, in order for proteins to settle on the surface of the beads, it is important (i) to use the correct bead-protein ratio, (ii) to operate at the optimal salt concentration and (iii) to remove cellular debris and nucleic acids which could interfere with the interaction. The lack of any one of these conditions inevitably leads to the loss of material [11]. Following the on-bead protein precipitation, all of the steps of the protocol are performed in a single test tube placed on the magnetic rack, which favors the separation of the bead/protein aggregate from the supernatants (lysis buffer plus washing). Repetitive washings of magnetic beads allow for the effective removal of detergents. Thus, after tryptic digestion, the peptide mixture is suitable for direct analysis by mass spectrometry without additional purification steps. Although the two protocols were developed to digest tens of micrograms of proteins, some modifications have allowed both protocols to be used for even smaller quantities. Nevertheless, specific methods are needed for the analysis of extremely small amounts of starting material. These have been developed by laboratories with expertise in low-input proteomics [12].
FASP and PAC sample preparation protocols along with the in-StageTip (iST) protocol developed by Mann and co-workers [13] were compared in a paper by Sielaff and colleagues [14]. The comparison was performed primarily on a qualitative basis (proteome coverage), as it did not include the use of mixed proteomes for probing diagnostic accuracy. Sielaff et al. evaluated the performance of the three protocols from the perspective of proteome coverage and quantitative precision (CVs) in the low-microgram range, finding similar performance at the high end of the interval (20 µg) but better quantitative reproducibility for SP3 and iST at the low end of the tested range. In the work by Sielaff et al., MS detection was performed in UDMSE mode, a specific type of data-independent acquisition (DIA). DIA analysis has become widely adopted in recent years, following great improvements in the sequential window acquisition mode and in data analysis software [15–17], in order to overcome the missing value problem arising from the stochastic nature of data-dependent acquisition (DDA) [18,19]. DIA is a sensitive method able to select and sequence even peptides deriving from low-abundance proteins. The main problem encountered with DIA data is the complexity of the spectra, since a single MS2 spectrum in a DIA file is not associated with a single peptide sequence, but is the result of the fragmentation of multiple co-isolated precursors in a given m/z range. For this reason, while DIA overcomes the problem of missing values by providing a deep proteomic profile [20,21], it also generates very complex files. Thanks to the design of sophisticated software, however, the analysis of DIA data is today no longer a major obstacle.
The aim of this study was to compare two widely adopted sample preparation protocols, FASP and PAC, in terms of proteome coverage and diagnostic accuracy using a three-species protein mix (Human, Pisum sativum and Soybean). Two different quantities of starting protein were processed by both protocols: a "routine" amount (10 µg) and a more challenging amount (1 µg). MS detection was performed in DIA mode. Since data analysis is a key step of every proteomics pipeline, especially in the case of DIA mode, raw data were processed using three of the most commonly used software tools: Spectronaut, MaxDIA [22] and DIA-NN [23]. Nevertheless, extensive benchmarking of DIA data analysis software was not the purpose of this study and can be found elsewhere [24].

Results and Discussion
The main objective of this study was to evaluate the diagnostic accuracy of two different digestion protocols combined with DIA analysis [24,25]. For this purpose, two different amounts (1 µg and 10 µg) of PmixA and PmixB were digested in quadruplicate. The repeatability of the two protocols was calculated through the Pearson coefficient (Figures S1 and S2), highlighting a good agreement between replicates across the entire dataset.
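As an illustration of the repeatability check, the following minimal sketch computes the Pearson coefficient between two replicates. It assumes that the correlation is taken on log2 intensities of proteins quantified in both runs; the intensity values and the names `pearson` and `replicate_correlation` are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep_a, rep_b):
    """Correlate log2 intensities of proteins quantified in both replicates.

    rep_a and rep_b map a protein identifier to its (positive) intensity.
    Assumption: proteins missing in either replicate are simply skipped.
    """
    shared = [protein for protein in rep_a if protein in rep_b]
    xs = [math.log2(rep_a[p]) for p in shared]
    ys = [math.log2(rep_b[p]) for p in shared]
    return pearson(xs, ys)


# Hypothetical protein intensities for two replicates of the same condition
rep1 = {"P1": 1.0e6, "P2": 4.0e6, "P3": 1.6e7, "P4": 2.0e5}
rep2 = {"P1": 1.1e6, "P2": 3.8e6, "P3": 1.5e7, "P5": 9.0e4}
r = replicate_correlation(rep1, rep2)
```

Working on the log scale keeps a handful of very abundant proteins from dominating the coefficient, which is why it is a common choice for intensity data.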
In our experience, FASP-derived peptide solutions might contain residual amounts of SDS or other detergents which might compromise LC performance in the long run; for this reason, we routinely perform StageTip SCX purification before LC-MS/MS analysis for FASP-digested samples. On the contrary, PAC-digested samples could, in principle, be injection-ready [26]. Considering that the micropurification step could add imprecision to the FASP method, thus favoring the PAC protocol, PAC samples were analyzed in two different modalities: (i) direct injection and (ii) injection after StageTip SCX purification. In this way, it was possible to assess both the advantage of direct injection for PAC and the effect of the purification step on precision and proteome coverage. After sample processing, all 48 samples (16 FASP and 16 PAC-1 with purification, and 16 PAC-2 without purification) were analyzed by LC-MS/MS in DIA mode. DIA raw files were processed by Spectronaut, MaxDIA and DIA-NN. For the first two data analyses, we used a spectral library built by fractionating a total of 20 µg of peptides (pool of PmixA and PmixB) into 10 fractions. The obtained results were investigated at two different levels: (i) qualitative analysis, to indicate the condition able to identify the highest number of proteins (Table S1), and (ii) quantitative analysis, to evaluate the diagnostic accuracy of the DIA method.
From a qualitative point of view, the PAC-1 protocol (with purification) performed on 10 µg of starting material yielded the highest number of protein identifications (mean value of 5491), whereas the lowest number was obtained from the FASP protocol starting from 1 µg (mean value of 4855; Figure 1). Furthermore, although SCX purification adversely affects peptide recovery, in this experiment, the comparison between PAC-1 and PAC-2 showed a significant (p << 0.001) increase in the number of identified proteins (about 200 more proteins, on average; Table S2) regardless of the data analysis software used. The qualitative analysis also included the evaluation of missing values since, even if the DIA method overcomes the stochastic nature of DDA, it does not completely solve the problem. As can be seen in Figure S3, data completeness at the level of protein groups was higher than 94% in all conditions, suggesting that quantitative data would not have been substantially affected by the choice of the imputation strategy.
After outlining the main aspects of the qualitative data, our attention shifted to the quantitative data provided by FASP and PAC through the use of three different software tools. DIA data were analyzed by Spectronaut, MaxDIA and DIA-NN, and true positives (TPs), true negatives (TNs), false positives (FPs) and false negatives (FNs) were calculated (Table 1). Importantly, the very few proteins identified as regulated but showing a trend opposite to that expected (e.g., a soybean protein found downregulated in A) were removed from the TP count.
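The counting scheme can be sketched as follows. This is only an illustration under stated assumptions: the expected directions follow the PmixA/PmixB ratios given in the Protein Blend Preparation section (1 for Human, 3 for Soybean, 0.5 for Pisum sativum), the `significant` flag stands for whatever statistical test was applied, and the separate "excluded" label for wrong-direction hits is a hypothetical way of keeping them out of the TP count. The function names are likewise hypothetical.

```python
from collections import Counter

# Expected sign of log2(PmixA/PmixB): 0 = unchanged, +1 = higher in A, -1 = lower in A
EXPECTED_SIGN = {"Human": 0, "Soybean": +1, "Pisum sativum": -1}


def classify(species, log2_fc, significant):
    """Label one protein as TP, TN, FP, FN or 'excluded'."""
    expected = EXPECTED_SIGN[species]
    if expected == 0:
        # Background proteome: any significant change is a false positive
        return "FP" if significant else "TN"
    if not significant:
        # Spiked proteome that was not detected as changed
        return "FN"
    observed = 1 if log2_fc > 0 else -1
    # Regulated in the wrong direction: kept out of the TP count (assumed label)
    return "TP" if observed == expected else "excluded"


def confusion_counts(rows):
    """rows: iterable of (species, log2 fold change, significant) tuples."""
    return Counter(classify(*row) for row in rows)


# Hypothetical test results for six proteins
rows = [
    ("Human", 0.05, False),
    ("Human", 0.90, True),
    ("Soybean", 1.50, True),
    ("Soybean", -1.20, True),
    ("Pisum sativum", -0.90, True),
    ("Pisum sativum", -0.20, False),
]
counts = confusion_counts(rows)
```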
Table 1. Total identified proteins for Human, Pisum sativum and Soybean. TPs (Pisum sativum and Soybean proteins found changed), TNs (Human proteins found unchanged), FPs (Human proteins found changed), and FNs (Pisum sativum and Soybean proteins found unchanged) are also reported.
Regarding specificity and precision, the lowest average values were obtained for FASP 1 µg (0.84 and 0.32, respectively), while the highest values were obtained from PAC-1 10 µg (0.99 and 0.82, respectively) and PAC-2 10 µg (0.99 and 0.86; Figure 2 and Table S3).
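For reference, sensitivity, specificity and precision follow from the four counts through their standard definitions. The sketch below uses hypothetical counts, not values from Table 1, and the function name `diagnostic_metrics` is an assumption made for illustration.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity and precision from confusion counts.

    sensitivity = TP / (TP + FN)   fraction of spiked proteins detected as changed
    specificity = TN / (TN + FP)   fraction of background proteins left unchanged
    precision   = TP / (TP + FP)   fraction of reported changes that are real
    """
    def ratio(numerator, denominator):
        # Avoid ZeroDivisionError when a class is empty
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical counts for one binary comparison
metrics = diagnostic_metrics(tp=8, tn=90, fp=2, fn=2)
```

Because the Human background is much larger than the spiked proteomes, a small number of FPs barely moves specificity but can lower precision considerably, which is consistent with the wider spread reported for precision than for specificity.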

Of all scenarios presented regarding sensitivity, the extremely low value found for FASP 1 µg was certainly noteworthy (Figure 3). This result was already predictable from the particularly high value of FNs.
After considering the accuracy of the method in relation to the experimental conditions, the performance of the three software tools for data analysis was investigated. Overall, the specificity values returned by each software were similar, with the highest values, equal to 0.99, obtained by MaxDIA and DIA-NN (median values across the six different binary comparisons made). However, there was one important aspect to consider: DIA-NN returned nearly twice as many identifications as MaxQuant. In addition, it also yielded a lower number of FPs. The same situation was not observed for sensitivity, where the best result (0.67) was obtained by processing the data with Spectronaut without the aid of a spectral library. In this case, Spectronaut showed its ability to better quantify low-abundance proteins (Soybean and Pisum sativum) compared to the other software, thus detecting fewer FNs.
By observing the values shown in Table S3, it can be seen that the highest precision was obtained by MaxDIA (median precision equal to 0.84). Nevertheless, in the light of the lower number of identifications detected by MaxDIA, the obtained precision needed some consideration. Among the different software used, MaxDIA identified fewer proteins. Thus, it is reasonable to postulate that these were the most abundant proteins, i.e., the ones quantifiable with the lowest error, returning a good precision value. On the other hand, DIA-NN showed a compelling precision value (median of 0.83), especially when considering that it quantified a much higher number of low-abundance proteins, achieving a deeper proteome coverage with respect to MaxDIA.
All other values, especially those obtained from Spectronaut (0.54 for Direct-DIA and 0.57 for the analysis with the spectral library), were unacceptable for a quantitative analysis. Since Spectronaut showed very high sensitivity, in order to reduce false positives and improve precision, a new statistical analysis was performed on the best condition only (PAC-1 10 µg). In detail, without modifying any parameter in the Spectronaut software, the matrix with the quantified proteins was analyzed in Perseus using an FDR of 0.01 (Table 2). As reported in the table, this strategy led to a reduction in the absolute number of regulated proteins but provided a more accurate list of candidates, with the precision improved from 0.82 to 0.94. Regarding the sensitivity, the decrease from 0.75 to 0.56 is related to the increase in FNs.
Finally, to further investigate our results, we verified whether the accuracy of the method could be improved by quantifying the proteins with a minimum of two peptides. The aim of this new analysis was to understand whether improving the data quality (more accurate measurements with more peptides) improved the accuracy of the quantitative method. In the light of these considerations, for the software that allowed us to modify the quantitative strategy (Spectronaut and MaxDIA), the numbers of TPs, TNs, FPs and FNs were calculated (Table 3). By performing a comparison between the two analyses (minimum of one peptide vs. minimum of two peptides), Spectronaut showed small variations for both specificity and precision (Table S4), though sensitivity generally improved due to the decrease in the FN rate. Compared to Spectronaut, MaxDIA showed a similar value for specificity but a reduced sensitivity (0.69); moreover, a worsening of precision was observed for the small quantity (1 µg).

Materials and Methods
All chemicals used in the experiments here described were purchased from Sigma-Aldrich (St. Louis, MO, USA) unless otherwise indicated.

Sample Preparation
HEK 293 cells were lysed with RIPA buffer and the protein concentration was estimated by using the Bradford Protein Assay. In detail, a volume of 200 µL of HEK 293 cell lysate with a concentration equal to 19 µg/µL was centrifuged to discard the pellet. A volume of 105 µL of the lysate (2 mg of protein) was brought to 400 µL with 100 mM Tris buffer at pH 8.0 and 1% SDS (v/v), obtaining an approximate concentration of 5 µg/µL. Pisum sativum and Soybean powders (5 mg) were dissolved in 2 mL of the same buffer and vortexed (approximate concentration equal to 2 µg/µL); a total of 400 µL was used for each proteome.
To reduce and alkylate disulfide bonds, 40 µL of 100 mM dithiothreitol (DTT) was added to 400 µL of each solution, followed by the addition of 48 µL of 200 mM iodoacetamide (IAA); each step included 1 h of incubation at 37 °C with shaking (650 rpm on a Thermomixer). Finally, 8 µL of 100 mM DTT was added to quench residual iodoacetamide and the incubation was allowed to proceed at 37 °C for 30 min.
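The volumes above translate into approximate working concentrations as sketched below. The calculation assumes additive volumes and ignores reagent consumption during the reactions; the helper name `final_conc_mM` is hypothetical.

```python
def final_conc_mM(stock_mM, added_uL, volume_before_uL):
    """Concentration of a reagent right after it is added (volumes assumed additive)."""
    return stock_mM * added_uL / (volume_before_uL + added_uL)


# Lysate dilution: 105 µL at 19 µg/µL brought to 400 µL
protein_ug = 105 * 19                      # 1995 µg, i.e. about 2 mg
protein_conc = protein_ug / 400            # about 5 µg/µL

# Reduction: 40 µL of 100 mM DTT added to 400 µL
dtt_reduction = final_conc_mM(100, 40, 400)      # about 9.1 mM in 440 µL

# Alkylation: 48 µL of 200 mM IAA added to the 440 µL from the previous step
iaa_alkylation = final_conc_mM(200, 48, 440)     # about 19.7 mM in 488 µL

# Quenching: 8 µL of 100 mM DTT added to 488 µL
final_volume_uL = 400 + 40 + 48 + 8              # 496 µL

# Total amounts in nmol (mM x µL = nmol)
dtt_total_nmol = 100 * (40 + 8)                  # 4800 nmol
iaa_total_nmol = 200 * 48                        # 9600 nmol
```

On these numbers IAA is in roughly twofold molar excess over the total DTT added, so the final DTT addition reduces, rather than fully consumes, the residual alkylating agent.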

Preliminary Protein Quantification
Before creating the protein mixtures, approximately 10 µg of reduced and alkylated proteins was digested by using the PAC protocol [27] (explained in detail below) in order to estimate the protein amount of each stock solution (Human, Soybean and Pisum sativum); the estimated ratio of trypsin to protein was 1 to 50 (Sigma-Aldrich, product no. T6567). After tryptic digestion, the peptide solutions were 100 µL; from each solution, a 20 µL aliquot was withdrawn, combined with 30 µL of solution A (0.1% formic acid (FA) and 2% acetonitrile (ACN)) and analyzed by LC-MS/MS.
To estimate the protein quantity, a calibration line was constructed by preparing four different solutions (5, 10, 25 and 50 ng/µL) of a HeLa digest stock (100 ng/µL, Thermo Fisher Scientific); 2 µL of each solution was analyzed by LC-MS/MS. For all samples, the same acquisition method was used. Using the HeLa samples as external standards, an estimation of the protein concentration of the three protein mixtures was obtained by interpolation using, as a measurement, the log10 of the area under the curve (AUC) calculated from the LC-MS/MS files. The following approximate concentrations were obtained: (i) Human proteins equal to 1.6 µg/µL, (ii) Soybean equal to 100 ng/µL and (iii) Pisum sativum equal to 640 ng/µL.
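A minimal sketch of such an external calibration is given below. The text specifies only that log10(AUC) was used as the measurement; regressing it against log10 of the injected amount is an assumption made here, and the AUC values and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount_ng(auc, slope, intercept):
    """Invert the calibration line to get the injected amount in ng."""
    return 10 ** ((math.log10(auc) - intercept) / slope)


# HeLa standards: 2 µL of 5, 10, 25 and 50 ng/µL -> 10, 20, 50 and 100 ng injected
injected_ng = [2 * conc for conc in (5, 10, 25, 50)]
# Hypothetical AUC values for the four standards
standard_auc = [2.0e9, 4.1e9, 9.8e9, 2.05e10]

slope, intercept = fit_line(
    [math.log10(a) for a in injected_ng],
    [math.log10(a) for a in standard_auc],
)

# Hypothetical AUC of an unknown sample, interpolated within the calibrated range
sample_ng = estimate_amount_ng(6.0e9, slope, intercept)
```

The stock concentration then follows from the injected amount by accounting for the injection volume and the dilutions described above (20 µL aliquot diluted to 50 µL, taken from a 100 µL digest).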

Protein Blend Preparation
After protein estimation, the three proteomes (Human, Soybean and Pisum sativum) were mixed in order to create two different blends, as shown in Table 4. The two blends created a protein fold-change ratio (PmixA/PmixB) of exactly 1 for Human proteins, 3 for Soybean and 0.5 for Pisum sativum proteins.

Protein Digestion
In order to compare the performance of the FASP and PAC protocols, two different amounts of proteins (1 and 10 µg) for PmixA and PmixB were digested in quadruplicate, thus processing a total of 32 samples (16 per protocol).

FASP Protocol
Before being loaded into the filter, the mixtures of reduced and alkylated proteins were prepared, as reported in Table 5. These dilutions allowed us to load the same volume of protein solution for each sample, and thus to perform all FASP digestions in parallel in the same batch. Since each condition was performed in quadruplicate, overall, 16 samples were digested by using the FASP protocol (Figure 4). Except for the step of cysteine alkylation, which in this case was performed in solution, the FASP protocol was carried out as previously reported [28]. Protein reduction and alkylation outside the filter had a negligible effect in terms of proteome coverage (data not shown) but allowed us to minimize potential differences and biases between the two protocols. Protein digestion was performed by adding 200 ng of trypsin to both the 1 µg and the 10 µg mixes. The choice of adding the same trypsin amount to both mixes was dictated by the fact that increasing the enzyme/substrate ratio is beneficial in cases of lower substrate concentrations [29].
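The enzyme/substrate ratios implied by a fixed 200 ng of trypsin can be checked with a short calculation; the function name below is hypothetical and the ratio is expressed by mass (w/w).

```python
def protein_per_trypsin(trypsin_ng, protein_ug):
    """Return N in the trypsin:protein mass ratio 1:N (w/w)."""
    return protein_ug * 1000 / trypsin_ng


# 200 ng of trypsin added to the 1 µg and 10 µg protein mixes
ratios = {amount_ug: protein_per_trypsin(200, amount_ug) for amount_ug in (1, 10)}
# ratios[1] == 5.0  -> 1:5 for the 1 µg samples
# ratios[10] == 50.0 -> 1:50 for the 10 µg samples
```

The 10 µg samples therefore sit at the 1:50 ratio used for the preliminary quantification, while the 1 µg samples receive a tenfold higher relative enzyme load.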

PAC Protocol
The PAC protocol [27] requires the right proportions between beads, protein amounts and aqueous and organic solvents. In our laboratory procedure, the protein solution starting volume was 20 µL. For this reason, samples were diluted as reported in Table 6. Proteins were digested as follows. A total of 5 µL of MagReSyn Hydroxyl beads (100 µg of beads, ReSyn Biosciences), previously conditioned with 70% ACN (v/v), was added to all samples. Protein precipitation on the beads was promoted by bringing the ACN concentration to 70% and subsequently incubating the suspension in a thermomixer at room temperature with stirring at 1100 rpm for ten minutes. The samples were then placed in a magnetic rack and subjected to several washing steps (all performed on the rack): three washes with 100% ACN and one wash with 70% ethanol. The last supernatant was removed; then, 51 µL of digestion buffer (50 mM triethylammonium bicarbonate, TEAB, containing 200 ng of trypsin) was added to the beads. The suspension was incubated overnight at 37 °C and 1100 rpm (Figure 5).

SCX Purification
In our experience, the peptide mixture obtained after FASP digestion may contain residual amounts of detergent, and for this reason, all samples processed by the FASP protocol were purified through strong cation exchange (SCX) purification [30].PAC samples, on the other hand, allow for direct injection (without a purification step), but to perform an equal comparison, PAC digests were analyzed both directly (without purification) and after SCX purification.
In detail, half of the eluate from the FASP and PAC protocols was purified by SCX StageTips for the removal of traces of residual detergent. In order to prepare the StageTips, a piece of Empore™-3M SCX (Millipore) resin was withdrawn using a blunt-ended syringe needle (gauge 16). For StageTip purification, 100 µL of FASP and 25 µL of PAC eluates were diluted 4-fold in wash solution 2 (0.5% formic acid (FA) and 80% ACN); since salts interfere with the binding of peptides to the SCX stationary phase, this dilution step was critical to reduce the salt concentration below 5 mM. Diluted samples were loaded on the StageTips and washed with 50 µL of wash solution 1 (0.5% FA and 20% ACN) and 50 µL of wash solution 2. Peptides were then eluted in 10 µL of 500 mM ammonium acetate containing 20% ACN and dried at 30 °C in a speed-vac; the samples with the starting protein amount of 1 µg (theoretical 500 ng of purified peptides) were resuspended in 25 µL of solution A, while the samples with the starting protein amount of 10 µg (theoretical 5 µg of purified peptides) were resuspended in 100 µL of solution A.

High pH Reversed-Phase C18 Fractionation
To build the DIA library, 20 µg of total peptides (PmixA and PmixB) was loaded on a high-capacity StageTip containing a greater amount of C18 stationary phase (threefold) to perform basic reversed-phase fractionation, as previously reported [31]. In detail, the peptide mixture was acidified with 0.1% trifluoroacetic acid (TFA) in order to achieve a pH lower than 3. Before loading the diluted sample, the stationary phase was washed and conditioned with 50 µL of solution A (0.1% TFA and 50% ACN) and 50 µL of solution B (0.1% TFA), respectively. The stationary phase was washed with solution B after sample loading, and finally, the peptides were fractionated using solutions composed of 10 mM TEAB, 0.2% ammonium hydroxide and increasing concentrations of ACN (4, 8, 12, 16, 20, 24, 28, 32, 40 and 80%).
Each fraction was eluted in 30 µL and dried at 30 °C in a speed-vac; the fractions were resuspended in 40 µL of solution A.
Ten fractions were analyzed in DDA mode, and the identified peptides were used to draw up the DIA spectral library.

LC-MS/MS Analysis
Peptides were separated by using an Easy nLC-1000 chromatographic instrument coupled to an Exploris 480 mass spectrometer (both from Thermo Scientific, Bremen, Germany).
For the generation of the spectral library, 2 µL from each basic RP fraction was separated using a linear gradient of 63 min at a flow rate of 300 nL/min on a 15 cm, 75 µm i.d. column, in-house packed with 3 µm C18 silica particles (Dr. Maisch). The gradient was generated using mobile phase A (0.1% FA and 2% ACN) and mobile phase B (0.1% FA and 80% ACN). Peptide separation was achieved at a flow rate of 300 nL/min using the following gradient: from 4% B to 12% B in 16 min, from 12% B to 36% B in 16 min and from 36% B to 100% B in 8 min; the column was cleaned for 5 min with 100% B. For the analysis of fractions, the mass spectrometer operated in DDA mode using a top-12 method. In detail, the MS full scan was 375–1400 m/z with a resolution of 60,000, an AGC target of 1 × 10⁶ and a maximum injection time of 50 ms. The mass window for the isolation of the precursor was 1.6 m/z, with a resolution of 30,000, an AGC target of 1 × 10⁶ and a "custom" maximum injection time; HCD fragmentation was set at a normalized collision energy of 30 and a dynamic exclusion of 10 s.
For the analysis of FASP and PAC individual samples, 100 ng of peptide mixtures was separated using the same gradient described above, while the mass spectrometer operated in DIA mode. The DIA method was composed of 32 consecutive MS2 windows acquired at 30,000 resolution, with an AGC target of 5 × 10⁵ and a maximum injection time of 50 ms. In detail, the DIA method comprised 24 windows with an isolation window of 15 m/z, 5 windows with an isolation window of 30 m/z and 3 windows with an isolation window of 50 m/z; the overlap for each window was equal to 0.5 m/z. The resulting m/z range was from 350 to 1000.

DIA Data Processing
The raw files were analyzed using three different software tools: MaxQuant (version 2.1.3.0, Max-Planck-Institute of Biochemistry 2021), Spectronaut (version 18.4, Biognosys AG, Switzerland) and DIA-NN (version 1.8.1). DIA analysis in MaxQuant and Spectronaut was performed using our experimental spectral library, while DIA-NN was run in library-free mode by performing deep learning-based library generation. In addition, Spectronaut analysis was also performed using the Direct-DIA mode.
The following databases were used for all analyses: Human (79,684 sequences downloaded on 30 May 2022), Pisum sativum (64,176 sequences downloaded on 18 October 2023) and Soybean (74,863 sequences downloaded on 18 October 2023).

MaxQuant
Raw files of the ten fractions were imported in MaxQuant (version 2.1.3.0) to create our spectral library using the MaxQuant algorithm; this process provided three different files: msms.txt, evidence.txt and peptides.txt. The parameters used were as follows: protein databases (Human, Pisum sativum and Soybean), first and main search peptide tolerance of 20 ppm and 4.5 ppm, respectively, trypsin/P as the enzyme, and two missed cleavages. Carbamidomethylation of cysteines was set as a static modification, and oxidation of methionine and protein N-terminal acetylation were allowed as variable modifications. The FDR was set to 0.01 and only peptides with >7 amino acid residues were selected for identification; only unique peptides were used for protein quantification.
DIA runs were analyzed by the MaxDIA algorithm by loading the three spectral library files (msms.txt, evidence.txt and peptides.txt), generated from the DDA experiments, in the appropriate tab; the other settings were the same as reported above for library processing.

Spectronaut
To build the spectral library, raw files from high-pH reversed-phase fractionation were analyzed in Spectronaut (version 18.4), setting the Q-value cut-off to 0.01, with a minimum of 3 and a maximum of 6 fragment ions. DIA runs were loaded in Spectronaut, and the obtained identifications were filtered at a Q-value of 0.01. Protein quantification was performed using "Only Protein Group Specific", and missing values for precursors identified in at least 35% of runs were imputed using global imputing. For protein quantification in the "major grouping" panel, the minimum number of peptides was set to 1 and the maximum to 3. Runs were normalized through global normalization.
DIA files were also analyzed in Direct-DIA mode, i.e., without a spectral library, using the same settings for quantitative analysis. Direct-DIA was performed with the "Direct-DIA+Fast" option.

DIA-NN
Protein sequences of the three proteomes were imported into DIA-NN and a spectral library was predicted using the deep learning algorithms implemented in DIA-NN (version 1.8.1). The parameters used were as follows: trypsin as the enzyme, allowing for up to one missed cleavage site, charge states of 1–4 for peptides consisting of 7–40 amino acids, carbamidomethylation of cysteines, oxidation of methionine and an FDR of 1% for precursor identifications; the quantification strategy was set to "Any LC, high accuracy", whereas normalization was set to global. Matching between runs was activated.
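To make the library-prediction search space concrete, the following is a minimal Python sketch of an in-silico tryptic digestion with up to one missed cleavage and a 7–40 residue length filter. It is an illustration only, not DIA-NN's own code: the function name `tryptic_peptides` and the example sequence are hypothetical, and the cleavage rule is simplified to "cut after every K or R".

```python
def tryptic_peptides(sequence, missed_cleavages=1, min_len=7, max_len=40):
    """Return the set of peptides obtained by cutting after every K or R.

    Simplified rule (assumption): no proline exception, no N-terminal
    methionine excision, no modifications.
    """
    # Split the protein into fully cleaved fragments.
    fragments, start = [], 0
    for pos, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:pos + 1])
            start = pos + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Join up to `missed_cleavages` + 1 consecutive fragments and
    # keep only peptides within the allowed length range.
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the databases used here.
    for peptide in sorted(tryptic_peptides("MAAAAAAKGGGGGGGRCCK")):
        print(peptide)
```

Each peptide in the resulting set would then be considered at every allowed precursor charge state when the library is predicted.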

Statistical Analysis
To perform statistical analysis, the matrices obtained from the different software packages were imported into Perseus (version 2.0.6.0, Max-Planck-Gesellschaft, München). In detail, protein intensity values were transformed to the logarithmic scale (log2); only proteins quantified in at least three replicates of at least one sample group (PmixA or PmixB) were kept, while missing values were imputed using default settings (width of 0.3 SD; down shift of 1.8 SD).
Significantly different proteins between PmixA and PmixB were detected by Student's t-test corrected for multiple hypothesis testing with a permutation-based FDR of 0.05. An S0 value of 0.2 was used. Histograms were generated using Numbers (v 13.2).
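The pre-processing steps above can be sketched in a few lines of Python. This is a simplified approximation of what Perseus does, not its implementation. It assumes that imputation is done per sample column by drawing from a normal distribution centred 1.8 SD below the mean of the observed values with a width of 0.3 SD, and that the sample standard deviation is used. All function names are hypothetical.

```python
import math
import random
import statistics


def log2_transform(intensities):
    """Convert intensities to log2; zero or absent values become None."""
    return [math.log2(x) if x else None for x in intensities]


def passes_valid_filter(row, groups, min_valid=3):
    """True if at least one group has >= min_valid quantified values.

    row:    dict mapping sample name -> log2 intensity or None
    groups: dict mapping group name -> list of sample names
    """
    return any(
        sum(row[sample] is not None for sample in samples) >= min_valid
        for samples in groups.values()
    )


def impute_column(values, width=0.3, down_shift=1.8, rng=None):
    """Replace None with draws from a down-shifted, narrowed normal.

    Assumption: mean and SD are taken from the observed values of the
    same column (one sample), as in a per-column imputation.
    """
    rng = rng or random.Random(0)
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [
        v if v is not None else rng.gauss(mean - down_shift * sd, width * sd)
        for v in values
    ]
```

The down shift places imputed values in the low-intensity tail, reflecting the assumption that values are missing mainly because the protein was below the detection limit.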
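The role of S0 can be illustrated with a short Python sketch. It assumes the SAM-style modification of the two-sample t-statistic, in which a constant s0 is added to the standard error in the denominator; the exact Perseus implementation may differ, and the permutation-based FDR step is not reproduced here. The function name `s0_t_statistic` and the example values are hypothetical.

```python
import math
import statistics


def s0_t_statistic(group_a, group_b, s0=0.2):
    """Two-sample t-like statistic with an s0 'fudge factor'.

    d = (mean_a - mean_b) / (standard_error + s0)
    With s0 = 0 this reduces to the equal-variance Student's t-statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled variance, as in the equal-variance Student's t-test.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff / (std_err + s0)


if __name__ == "__main__":
    # Hypothetical log2 intensities for one protein in the two mixes.
    pmix_a = [20.1, 20.3, 19.9, 20.2]
    pmix_b = [21.0, 21.2, 20.9, 21.1]
    print(s0_t_statistic(pmix_a, pmix_b, s0=0.0))
    print(s0_t_statistic(pmix_a, pmix_b, s0=0.2))
```

A positive s0 shrinks the statistic most strongly for proteins with very small variance, so that a small fold change with a tiny standard error is less likely to be called significant.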

Diagnostic Accuracy of the Method
Sensitivity, specificity and precision were calculated for each tested condition to evaluate the diagnostic accuracy of the method. Since the composition of the protein mix was known, it was possible to evaluate the following parameters: (i) false positives (FPs), i.e., the Human proteins found to be regulated; (ii) true negatives (TNs), i.e., the Human proteins classified as unchanged; (iii) true positives (TPs), i.e., the Soybean and Pisum sativum proteins found to be regulated; (iv) false negatives (FNs), i.e., the Soybean and Pisum sativum proteins classified as unchanged. All proteins assigned to more than one proteome were excluded.
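The classification and the three metrics can be written compactly in Python, using the standard definitions sensitivity = TP/(TP + FN), specificity = TN/(TN + FP) and precision = TP/(TP + FP). This is a minimal sketch: the function names are hypothetical and the counts in the usage example are invented for illustration, not taken from the results of this study.

```python
def classify(species, regulated):
    """Assign a protein to TP/TN/FP/FN from its proteome and test outcome.

    Human proteins are expected to be unchanged; Soybean and
    Pisum sativum proteins are expected to be regulated.
    """
    if species == "Human":
        return "FP" if regulated else "TN"
    return "TP" if regulated else "FN"


def diagnostic_accuracy(tp, tn, fp, fn):
    """Return sensitivity, specificity and precision from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    metrics = diagnostic_accuracy(tp=80, tn=950, fp=50, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Because Human proteins make up the TN and FP counts, a modest drop in specificity can correspond to a large drop in precision whenever Human proteins greatly outnumber the spiked-in species.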

Conclusions
Through benchmarking experiments, the diagnostic accuracy of two widely used sample preparation protocols was evaluated, also considering the data analysis pipeline.
Our data indicated that the regular FASP protocol, compared with PAC, yielded a lower number of identifications and worse diagnostic accuracy at low starting amounts. SCX purification after PAC digestion had no negative effects on either proteome coverage or diagnostic accuracy. However, direct injection into the LC-MS system after PAC digestion represents an important advantage over FASP and, for this reason, should be considered the preferred route.

Figure 3. Volcano plots for the worst result (A) and for the best result obtained (B); the FDR threshold was set to 0.05.

Figure 4. The figure summarizes the information relative to the FASP protocol. Created with BioRender.com.

Figure 5. The figure shows the information about the PAC protocol. Created with BioRender.com.

Table 2. The table reports the values obtained by applying different FDR thresholds (0.05 and 0.01) in Perseus to the Spectronaut matrix. This strategy was applied only to the best condition (PAC-1 10 µg).

Table 3. The table shows the number of TPs, TNs, FPs and FNs calculated starting from a matrix composed of proteins quantified by a minimum of 2 peptides.

Table 5. Dilution of protein mixes before FASP digestion.

Table 6. Dilution of protein mixes before PAC digestion.
