#### 2.1. Study Area and Data Sources

The study lakes cover a total area of more than 5400 km^{2}, are located in the mid-lower Yangtze River Basin (28°30′ N~31°40′ N, 112°33′ E~121°00′ E), and have a warm and humid subtropical monsoon climate (Figure S1). The biological data in this study were collected by our research group during field surveys over the past 20 years (1998~2019). We compiled 116 lake-years of data from 91 shallow lakes disconnected from the Yangtze mainstem (1207 samples in total, Table S1). Sampling stations at each lake were set systematically in the offshore zones according to lake area, with the number of stations per lake ranging from 1 to 58. We analyzed the effects of sampling effort (Figure S2) to confirm that the number of sampling stations in each lake was appropriate and sufficient. To ensure comparability, all biological data selected for this study were quantitative measurements from spring and autumn, except for a few lakes that were surveyed in only one season.

All macrobenthos samples (the 1207 samples mentioned above) were collected with one grab per sampling station using a modified Peterson grab (1/16 m^{2}), washed gently through a 425 μm sieve, and preserved in 10% formalin. After the samples were rinsed with water in the laboratory, all individuals were sorted, counted, and identified to the lowest practical taxonomic level: aquatic oligochaetes to genus, polychaetes and leeches to family or genus, molluscs and arthropods to genus, and the remaining taxa to family [24,25]. Submersed macrophytes (B_{Mac}) were sampled just above the sediment with scythes (1/5 m^{2}), with 2–4 replicates at each sampling point [26].

Physico-chemical parameters, including water temperature, pH, conductivity, mean water depth (Z_{M}), and Secchi depth (Z_{SD}), were measured in situ during each visit. A water sample (1 L) was collected from each site during each visit and brought back to the laboratory to measure the concentrations of total nitrogen (TN), total phosphorus (TP), and phytoplankton chlorophyll a (Chl a) [27]. We matched the spring and autumn environmental data to the corresponding biological data.

Landscape variables (climate, land cover) were extracted for each lake in the World Geodetic System 1984 (WGS-1984) coordinate system at a grid resolution of 30 arc-seconds (ca. 1 km). We obtained mean air temperature and mean precipitation data (http://www.worldclim.org/, accessed on 30 December 2020) for each lake and land-cover information (http://www.globallandcover.com/, accessed on 30 December 2020) for a 500 m buffer along the lake shoreline. We classified a lake as urban when artificial surfaces covered >50% of the shoreline buffer and as cultivated when cultivated land covered >50% of the shoreline buffer. Urban and cultivated lakes were used to test the discrimination power of the indices. Landscape data were extracted using the zonal tool in ArcGIS 10.6.
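
The classification rule above can be sketched in a few lines of Python. This is only an illustration of the >50% thresholds stated in the text, not the ArcGIS workflow itself; the function name `classify_lake` and its inputs (land-cover proportions of the 500 m buffer, expressed as fractions between 0 and 1) are assumptions made for the example.

```python
from typing import Optional


def classify_lake(artificial_fraction: float, cultivated_fraction: float) -> Optional[str]:
    """Classify a lake from land-cover fractions of its 500 m shoreline buffer.

    Both arguments are hypothetical inputs: proportions (0 to 1) of the buffer
    covered by artificial surfaces and by cultivated land.
    Returns "urban", "cultivated", or None when neither class exceeds 50%.
    """
    for fraction in (artificial_fraction, cultivated_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("land-cover fractions must lie in [0, 1]")
    if artificial_fraction + cultivated_fraction > 1.0 + 1e-9:
        raise ValueError("land-cover fractions cannot sum to more than 1")

    # The two classes are mutually exclusive because both thresholds are >50%.
    if artificial_fraction > 0.5:
        return "urban"
    if cultivated_fraction > 0.5:
        return "cultivated"
    return None
```

Lakes for which the function returns `None` have a buffer dominated by neither class and would not enter the urban versus cultivated comparison.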

#### 2.2. Index Development

Four indices were developed: the O/E-_{SA} index, the O/E-_{RF} indices, B-IBI, and ASPT (full names in Table S3). The O/E-_{SA} index derives the expected species richness from a species (S_{O})-area (A) model based on lake area and the observed species richness of each lake. The O/E-_{RF} indices are RIVPACS (River Invertebrate Prediction and Classification System) indices in which the expected species richness is the sum of the probabilities of capture (Pc) of taxa predicted by random forest (RF) modeling. The indices were constructed as follows.

- (1) O/E-_{SA} index

A species (S_{O})-area (A) model was developed based on lake area to predict the expected species richness (S_{E}), and an observed-to-expected index (O/E-_{SA}) was calculated as the S_{O}/S_{E} ratio. We used percentage error (PE) to compare the predictive power of three regression forms: a linear model (S/A), a semi-log model (S/log_{10}A), and a power model (log_{10}S/log_{10}A). The formula was PE = ∑|P/O − 1| × 100/n, where P is the expected value, O is the observed value, and n is the number of observations [28]. Quantile regression models were then developed to identify the optimal and simplest model for predicting the expected species richness (S_{E}). The optimal conditional quantiles were selected based on model fit (pseudo R^{2}) and on the distributions of two parameters (slope and intercept). The biological condition of each lake was evaluated as the ratio of the observed value (S_{O}) to the expected value (S_{E}).
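
As a rough illustration of the model comparison described above, the following Python sketch fits the three species-area forms and computes PE for each. It is a simplification under stated assumptions: ordinary least squares stands in for the quantile regression used in the study, and the helper names (`ols`, `percentage_error`, `fit_species_area`) and the example data are hypothetical.

```python
import math


def ols(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def percentage_error(predicted, observed):
    """PE = sum(|P / O - 1|) * 100 / n, as defined in the text."""
    n = len(observed)
    return sum(abs(p / o - 1.0) for p, o in zip(predicted, observed)) * 100.0 / n


def fit_species_area(area, richness):
    """Fit linear, semi-log, and power forms; return {name: (predict_fn, PE)}."""
    log_a = [math.log10(a) for a in area]
    log_s = [math.log10(s) for s in richness]
    a1, b1 = ols(area, richness)   # linear:   S = a + b * A
    a2, b2 = ols(log_a, richness)  # semi-log: S = a + b * log10(A)
    a3, b3 = ols(log_a, log_s)     # power:    log10(S) = a + b * log10(A)
    models = {
        "linear": lambda A: a1 + b1 * A,
        "semi-log": lambda A: a2 + b2 * math.log10(A),
        "power": lambda A: 10 ** (a3 + b3 * math.log10(A)),
    }
    return {
        name: (predict, percentage_error([predict(A) for A in area], richness))
        for name, predict in models.items()
    }


# Hypothetical lake areas (km^2) and observed richness, for illustration only.
areas_km2 = [1.0, 10.0, 100.0, 1000.0]
observed_richness = [5.0, 10.0, 20.0, 40.0]
results = fit_species_area(areas_km2, observed_richness)
best_form = min(results, key=lambda name: results[name][1])  # lowest PE
predict_expected = results[best_form][0]
oe_sa = [s / predict_expected(a) for s, a in zip(observed_richness, areas_km2)]
```

The last line computes S_{O}/S_{E} for each lake under the form with the lowest PE; with quantile regression, `predict_expected` would instead come from the chosen conditional quantile.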

- (2) O/E-_{RF} indices

O/E-_{RF} indices were developed following established procedures [19,29,30]. First, we identified 21 reference lakes according to lake status, available physico-chemical data, and professional judgment. These reference lakes (marked in Table S1) met the following conditions: (1) the lakeside zone was largely maintained in a natural state and part of the littoral area supported submerged macrophytes with an average biomass greater than 200 g/m^{2}; (2) there was little or no diffuse-source pollution around the lake; and (3) there was little or no fishery disturbance (annual fishery yield less than 15 t). Pollution status was estimated qualitatively, and fishery disturbance was measured by the annual fishery yield, which ranged from 0 to 540 t. Second, we clustered the reference lakes by applying the β-flexible clustering technique (β = −0.5) to pairwise Sørensen dissimilarities based on the presence and absence of macroinvertebrate taxa across the reference lakes. Third, we developed an RF model to predict cluster membership from natural environmental predictors (Table S4) and used the cluster membership probabilities predicted by the RF model to weight taxon occurrence frequencies within reference site clusters, yielding taxon-specific probabilities of capture (Pc). We calculated O/E based on taxa with Pc ≥ 0.5 (hereafter O/E_{50}) and Pc ≥ 0 (hereafter O/E_{0}). In addition, we developed null O/E models in which the Pc of each taxon was set to be equal across all sites. We used the randomForest package to develop the random forest (RF) models, with 1500 trees per model [31].
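
The weighting step that turns cluster membership probabilities into Pc values, and Pc values into an O/E score, can be sketched as follows. This is a minimal illustration and does not reproduce the randomForest workflow: the membership probabilities are taken as given, and the function names, taxon labels, and numbers are all hypothetical.

```python
def capture_probabilities(membership, cluster_freq):
    """Pc of each taxon at one site.

    membership   : list of predicted probabilities that the site belongs to
                   each reference cluster (assumed to sum to 1).
    cluster_freq : list of dicts, one per cluster, mapping taxon name to its
                   occurrence frequency (0 to 1) among that cluster's
                   reference sites.
    """
    taxa = set().union(*cluster_freq)
    return {
        taxon: sum(p * freq.get(taxon, 0.0) for p, freq in zip(membership, cluster_freq))
        for taxon in taxa
    }


def oe_score(observed_taxa, pc, threshold=0.5):
    """O/E for one site, using only taxa with Pc >= threshold.

    E is the sum of Pc over the retained taxa; O is the number of retained
    taxa that were actually observed at the site.
    """
    retained = {taxon: p for taxon, p in pc.items() if p >= threshold}
    expected = sum(retained.values())
    if expected == 0:
        raise ValueError("no taxa retained at this threshold")
    observed = sum(1 for taxon in retained if taxon in observed_taxa)
    return observed / expected


# Hypothetical example: two reference clusters and three taxa.
membership = [0.75, 0.25]
cluster_freq = [
    {"taxon_a": 1.0, "taxon_b": 0.8, "taxon_c": 0.2},
    {"taxon_a": 1.0, "taxon_b": 0.0, "taxon_c": 0.6},
]
pc = capture_probabilities(membership, cluster_freq)
oe_50 = oe_score({"taxon_a", "taxon_c"}, pc, threshold=0.5)
oe_0 = oe_score({"taxon_a", "taxon_c"}, pc, threshold=0.0)
```

A null model in the sense used above would replace `pc` with the same set of taxon probabilities at every site, for example each taxon's occurrence frequency across all reference lakes.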

- (3) Other indices

We also calculated ASPT and B-IBI scores. The ASPT is based on the tolerance values of individual families to organic pollution [32]. It represents the average tolerance of organisms at the family level and is obtained by dividing the Biological Monitoring Working Party (BMWP) index score by the number of families present. The BMWP system also considers the tolerance of macroinvertebrates to organic pollution: each family is assigned a score between 1 and 10 according to its tolerance, and the BMWP score is the sum of the scores of all families present in the sample [33]. The final ASPT score ranges between 1 and 6, with lower values representing higher tolerance to organic pollution (e.g., Oligochaeta have the highest tolerance and score 1).
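
The relationship between BMWP and ASPT can be made concrete with a short Python sketch. Only the score of 1 for Oligochaeta comes from the text; the other family labels and scores in the example table are placeholders, and the function names are assumptions.

```python
def bmwp_score(families, family_scores):
    """Sum of tolerance scores over the scored families present in a sample."""
    return sum(family_scores[f] for f in set(families) if f in family_scores)


def aspt_score(families, family_scores):
    """ASPT = BMWP score divided by the number of scored families present."""
    scored = [f for f in set(families) if f in family_scores]
    if not scored:
        raise ValueError("sample contains no scored families")
    return bmwp_score(scored, family_scores) / len(scored)


# Placeholder score table: only the Oligochaeta value is taken from the text.
example_scores = {"Oligochaeta": 1, "family_B": 3, "family_C": 6}
sample = ["Oligochaeta", "family_B", "family_C", "Oligochaeta"]
sample_bmwp = bmwp_score(sample, example_scores)
sample_aspt = aspt_score(sample, example_scores)
```

Repeated records of the same family are counted once, since both indices depend on family presence rather than abundance.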

For B-IBI development [34], we started with 42 candidate metrics in five categories: taxonomic richness, taxonomic composition, tolerance, functional feeding group, and habitat quality. Metrics were selected using range, discrimination power, and redundancy tests. First, metrics with a median of 0 for reference lakes were eliminated by the range test. Second, the discrimination power of each metric was assessed as the degree of inter-quartile overlap between the box plots of reference and test sites. Third, metric redundancy was assessed using Spearman correlations between all candidate metrics, and metrics with high correlation (|r| > 0.7, p < 0.05) were removed. Finally, we selected the following four metrics to calculate the final score: total number of taxa, Biotic Index (BI), % Gastropoda individuals, and % collector-gatherer individuals. We used general taxa pollution tolerance values to calculate the BI. Metrics that decreased in response to stressors were scored as the metric value divided by the 95th percentile value. Metrics that increased in response to stressors were scored as the ratio of the difference between the maximum value and the metric value to the difference between the maximum value and the 5th percentile value. The final B-IBI score was calculated by summing the scores of the four metrics.
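
The two scoring rules can be written out as a small Python sketch. Several details are assumptions rather than statements from the text: the percentiles and maximum are computed over whatever set of site values is passed in, percentiles use linear interpolation, and scores are not capped to [0, 1]. All function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def score_decreasing(value, all_values):
    """Metric that decreases with stress: value / 95th percentile."""
    return value / percentile(all_values, 95)


def score_increasing(value, all_values):
    """Metric that increases with stress: (max - value) / (max - 5th percentile)."""
    top = max(all_values)
    return (top - value) / (top - percentile(all_values, 5))


def b_ibi(metric_scores):
    """Final B-IBI score: the sum of the individual metric scores."""
    return sum(metric_scores)
```

For example, with metric values spread evenly from 0 to 100, a decreasing metric of 47.5 and an increasing metric of 52.5 would both score 0.5. Some implementations also cap each score to the range 0 to 1; that step is not stated in the text and is therefore left out here.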

The performance of all indices was compared in terms of precision, bias, responsiveness, and sensitivity [35]. To facilitate comparisons among indices, we standardized O/E-_{SA}, ASPT, and B-IBI scores by dividing raw scores by the mean of the reference site scores so that reference site scores were centered on one. We used linear regression and Spearman rank correlation analysis to examine the relationship between each pair of biological indices. We then compared the effectiveness of these indices by analyzing the relationships between macrobenthos indices and eutrophication metrics using quantile regression. The ability of the indices to discriminate between urban and cultivated lakes was tested using a Wilcoxon test [36].
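
The standardization step can be illustrated with a brief Python sketch; the function name `standardize` and the example scores are hypothetical.

```python
from statistics import mean


def standardize(scores, is_reference):
    """Divide raw index scores by the mean score of the reference sites.

    scores       : raw index scores, one per site.
    is_reference : booleans of the same length flagging reference sites.
    After standardization, the reference site scores average exactly one.
    """
    reference_scores = [s for s, ref in zip(scores, is_reference) if ref]
    if not reference_scores:
        raise ValueError("at least one reference site is required")
    reference_mean = mean(reference_scores)
    return [s / reference_mean for s in scores]


# Hypothetical raw scores for two reference sites and one test site.
raw_scores = [4.0, 6.0, 2.5]
reference_flags = [True, True, False]
standardized = standardize(raw_scores, reference_flags)
```

Putting every index on this common scale means that a value below one indicates a site scoring lower than the average reference lake, whichever index is used.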