Single-Cell Atlas of the Drosophila Leg Disc Identifies a Long Non-Coding RNA in Late Development

The Drosophila imaginal disc has been an excellent model for the study of developmental gene regulation. In particular, long non-coding RNAs (lncRNAs) have gained widespread attention in recent years due to their important role in gene regulation. Their specific spatiotemporal expressions further support their role in developmental processes and diseases. In this study, we explored the role of a novel lncRNA in Drosophila leg development by dissecting and dissociating w1118 third-instar larval third leg (L3) discs into single cells and single nuclei, and performing single-cell RNA-sequencing (scRNA-seq) and single-cell assays for transposase-accessible chromatin (scATAC-seq). Single-cell transcriptomics analysis of the L3 discs across three developmental timepoints revealed different cell types and identified lncRNA:CR33938 as a distal specific gene with high expression in late development. This was further validated by fluorescence in-situ hybridization (FISH). The scATAC-seq results reproduced the single-cell transcriptomics landscape and elucidated the distal cell functions at different timepoints. Furthermore, overexpression of lncRNA:CR33938 in the S2 cell line increased the expression of leg development genes, further elucidating its potential role in development.


Introduction
Long non-coding RNAs (lncRNAs) are defined as RNAs longer than 200 nucleotides and not translated into functional proteins. Human GENCODE (v40) identifies 17,748 lncRNA genes, which roughly equates to the number of protein-coding genes (19,988) signifying the importance of lncRNAs. The majority of lncRNAs are transcribed by RNA polymerase II and are often 5 -end 7-methyl guanosine (m7G) capped, 3 -end polyadenylated, and spliced similarly to mRNAs. They are often classified based on their position relative to neighboring genes (divergent, convergent, intergenic, antisense, sense, enhancer, intronic, and miRNA host), transcript length (long intergenic, very long intergenic, and macroRNA), association with annotated protein-coding genes, association with other DNA elements, protein-coding RNA resemblance, association with repeats, association with a biochemical pathway, sequence and structure conservation, biological state, association with subcellular structures, and function [1,2]. As lncRNAs provide supportive roles by fine-tuning gene expression levels at the epigenetic, transcriptional, and post-transcriptional levels, they are implicated in various biological processes and diseases. The contribution of lncRNAs to organ development in several mammalian species has revealed a transition of broadly expressed lncRNAs towards an increasing number of spatiotemporal-specific and conditionspecific lncRNAs [3]. The role of lncRNAs in cancer has been studied extensively, but they are also involved in many other human diseases from neurological disorders to cardiovascular issues [4]. Notably, lncRNA expression is generally spatiotemporal specific, indicating the unique functions and probable pharmacological targeting of lncRNA.
Drosophila melanogaster (fruit fly) is an ideal model organism to study developmental and cellular processes in higher eukaryotes, including humans, because a wide range of genetic tools can be applied and its genome has been extensively studied [5]. In fact, the D. melanogaster genome is 60% homologous to that of humans and nearly 75% of human disease-causing genes are believed to have functional homologs in the fruit fly [6]. Furthermore, its short generation time, high fecundity, and low maintenance as well as the abundance of publicly available fly stocks and databases also make D. melanogaster an appealing model organism.
Despite different taxonomic origins, the Drosophila larval leg disc, which develops into the adult leg, is an ideal model for studying the complex vertebrate limb because it is relatively simple and amenable to genetic manipulations. Research on fly imaginal discs has revealed the tissue compartments and organ-specific regulator genes critical to development, and has generated established models for the study of cellular interactions and complex genetic pathways [7]. Moreover, the easy accessibility of imaginal discs further supports their utility.
Advances in the past decade on single-cell RNA sequencing (scRNA-seq) and related computational analysis pipelines have allowed scientists and bioinformaticians to understand the cellular heterogeneity of tissues at an unprecedented level, from manually selecting a single-cell under the microscope to plate-based and droplet-based high throughput methods with multimodal capabilities [8]. Since the publication of the first single-cell transcriptome study based on a next-generation sequencing platform, the number of publications on scRNA-seq addressing development, disease, and bioinformatics tool improvement has exponentially grown [9][10][11][12]. Many of these publications have focused on developmental biology, often involving single-cell studies, as it represents a crucial period during which cells first begin to differentiate [13]. Single-cell transcriptomics studies on Drosophila larval imaginal wing and eye-antennae discs have emerged since 2018 [14][15][16][17][18][19] and shown that single cells could be mapped to the distinct subregions of their respective imaginal discs, thus confirming the spatial expression of genes determined by previous immunostaining methods.
While the Fly Cell Atlas recently performed single-nucleus RNA-sequencing (snRNAseq) on adult Drosophila legs [20], single-cell transcriptomics and epigenomics studies on the developing leg imaginal disc remain lacking due to the challenges of its dissection compared to the larger wing and eye-antennae discs. We thus report the single-cell transcriptomic and epigenomic landscapes of w 1118 third leg discs (L3) across three time points of development of third-instar larvae. We identified and validated a novel, highly expressed lncRNA in the distal epithelial cells that changes its spatial expression at various stages of development and confirms its importance in leg development.

Single-Cell RNA-Sequencing Identifies Four Main Cell Types in L3 Discs
To study the cellular heterogeneity of developing L3 discs, collected embryos were dissected for L3 discs at 121 h (T1), 133 h (T2), and 168 h (T3) after egg laying (AEL) ( Figure 1A) for scRNA-seq. Sequencing statistics showed similar data quality amongst the three samples, including the percentage of mapped reads, percentage of mapped reads aligned to genes, number of cells, and mean reads per cell (Table S1). The L3 disc was identified as a trio of discs on either side of the larval body that differed from the wing and haltere discs in morphology and patterning ( Figure 1B). Cell preparation workflow involved dissection and dissociation of L3 discs into single cells, after which a portion of the cells was used for scRNA-seq and the remaining cells having their nuclei isolated for singlecell assay for transposase-accessible chromatin (scATAC-seq) ( Figure 1C). Both assays used the 10× Genomics platform and the prepared libraries were subjected to sequencing and subsequent data analysis. The integrated dataset overlayed T1, T2, and T3 individual samples and identified four distinct clusters ( Figure 1D). The largest cluster represented the leg disc epithelium, which expressed epithelial markers Fasciclin 3 (Fas3) and narrow (nw) ( Figure 1E) [21]. Expression of Sp1 and Ultrabithorax (Ubx) confirmed that the cells originated from L3 discs [22,23]. The second-largest cluster represented muscle cells, which expressed the muscle markers twist (twi), Holes in muscle (Him), Secreted protein, acidic, cysteine-rich (SPARC), tenectin (tnc), cut (ct), Amalgam (Ama), and terribly reduced optic lobes (trol) [17,24,25]. The identity of the immune cell cluster was determined by the expression of regucalcin, Hemolectin (Hml), Peroxidasin (Pxn), Transferrin 1 (Tsf1), and reversed polarity (repo) [19,26,27]. The smallest cluster represented the neuronal cells, which expressed found in neurons (fne) and couch potato (cpo) [15,28]. The relative expression levels of the marker genes were tabulated (Table A1). The third leg disc was differentiated from other leg discs as it occurred as a mid-size disc with a concentric ring-like pattern at its center within a trio of discs on bilateral sides of the larvae which also included the wing and haltere discs. (C) Flowchart of scRNA-seq and scATAC-seq experiments. Dissected leg discs were dissociated into single cells, a portion of which were used for scRNA-seq with the remaining cells having their nuclei isolated for scATAC-seq. Both scRNA-seq and scATAC-seq used the 10× Genomics Chromium Controller and proceeded with their respective library preparation protocols, sequencing, and data analysis. (D) UMAP visualizations of the scRNA-seq data show that T1, T2, and T3 overlay each other, although the four identified cell types were quite segregated.

Subclustering of the Main Cell Types Reveals Cell Subtypes
The muscle cell cluster was composed of early and late muscle cell subclusters ( Figure 1F). The early cells expressed tenectin (tnc), terribly reduced optic lobes (trol), cut (ct), maternal gene required for meiosis (mamo), Thor, kin of irre (kirre), roughest (rst), and rolling pebbles (rols) ( Figure 1G) [17,19,24,25,29]. The late cells expressed Holes in muscle (Him), twist (twi), Myocyte enhancer factor 2 (Mef2), muscleblind (mbl), and Fasciclin 2 (Fas2) [19,30]. Early muscle cells increased expression of late muscle cell marker Fas2 over time in terms of both expression level and the number of cells that expressed this gene ( Figure S1). Late muscle cell marker Mef2, a skeletal muscle differentiation transcription factor, similarly increased expression in the late muscle cell subcluster over time in terms of both expression level and the number of cells that expressed the gene. The heatmap of the most upregulated genes in the early and late muscle cells showed a distinction in upregulated genes between the two subclusters ( Figure S2).
The leg disc epithelium cluster was subclustered into six cell subtypes, including the distal, medial, and proximal cells as well as stem cell-like cells, such as those of the proximaldistal-axis (PD axis) and anterior-posterior-axis (AP axis), and cells of undetermined fate ( Figure 2A). Identification of fate undetermined cells was based on their top upregulated DEGs and gene ontology analysis of the DEGs (Tables A2 and A3). The distal cells expressed the markers aristaless (al), C15, and Distal-less (Dll) ( Figure 2B) [22]. The medial cells expressed dachshund (dac) and Dll, while the proximal cells expressed teashirt (tsh) and homothorax (hth) [22]. The PD axis cells expressed vestigial (vg), spalt-related (salr), and spalt major (salm) [39], and the AP axis cells expressed hase und igel (hui) (FlyBase ID FBgn0033968).

Identification of a Long Non-Coding RNA of Unknown Function in Distal Cells
The most upregulated genes in each leg disc epithelium subcluster are shown in a heatmap ( Figure 2C). The genes colored black represent known markers for their respective subclusters, those colored blue represent genes with known functions as potential markers for their respective subclusters, and the genes colored red represent genes with unknown functions as potential markers for their respective subclusters. lncRNA (lncRNA:CR33938) is unique because the 10× 3 gene expression kit uses oligo(dT) primers to detect polyAtailed transcripts, which mostly include mRNAs. However, lncRNA:CR33938 expression was observed in this study ( Figure 2D). Indeed, lncRNA:CR33938 was identified in other studies by using polyA + bulk RNA-sequencing [40,41]. Upon splitting the integrated data into its respective samples (T1, T2, and T3), lncRNA:CR33938 expression was negligible in T1, appeared more widespread in T2 and became specific to the distal cells in T3. Note that while most distal cells expressed lncRNA:CR33938, a small subset of medial and proximal cells also expressed the lncRNA.

Experimental Validation of lncRNA:CR33938 Expression in L3 Discs
Fluorescence in-situ hybridization (FISH) of lncRNA:CR33938 in T1, T2 and T3 L3 discs was performed alongside region-delineating controls Dll and dac ( Figure 2E). Expression of only Dll represented the distal cells, while co-expression of Dll and dac or only dac represented the medial cells. LncRNA:CR33938 expression in T1 L3 discs did not occur. LncRNA:CR33938 expression in T2 L3 discs was present in the proximal, medial, and distal cells, while lncRNA:CR33938 expression in T3 L3 discs was most prominent in distal cells, although medial cells also showed more limited expression. More specifically, six regions of large punctated lncRNA:CR33938 expression were observed in T3, including four regions of high expression and two regions of lower expression. Three of the six regions were within the distal cells, while the other three regions were outside of the distal cells, suggesting expression of lncRNA:CR33938 in cells other than the distal cells, namely the medial cells. The proximal cells also had a low level of lncRNA:CR33938 expression, as displayed by a tint of red fluorescence peripheral to the medial cells delineated by the dac marker. In fact, Figure 2D did show other cells expressed the lncRNA, but these cells represented only small subsets of the subclusters. Thus, these FISH results corroborated the scRNA-seq data. Note that the punctated regional expressions may suggest localized lncRNA:CR33938 function in aggregates.

Conservation of lncRNA:CR33938 in Insect Species
The conservation state of lncRNA:CR33938 across 124 insect species revealed that the lncRNA had a high conservation level in exon regions ( Figure 3A). Moreover, a comparison with the conservation state of all 2258 lncRNAs annotated in the reference annotation suggested that lncRNA:CR33938 was more conserved than 90% of the other lncRNAs ( Figure 3B). These concordances reflected a critical regulatory role of lncRNA:CR33938 in insect development. across insects is greater than 0.8. (C) Overexpression of lncRNA:CR33938 in S2 cells produced an increase in the expression of leg development genes, including PD axis genes, distal leg tarsal discor, and medial leg tibial dac according to qPCR. There was no effect on proximal leg femur genes. *, ***, **** equate to p-values of less than 0.05, 0.001 and 0.0001, respectively.

Overexpression of lncRNA:CR33938 in S2 Cells
Transient overexpression of full-length lncRNA:CR33938 in S2 cells produced an approximately 40,000-fold increase in expression level compared to the empty vector control ( Figure 3C) according to qRT-PCR. Correspondingly, expression of the PD axis genes (Hh, wg, and dpp) showed an increasing trend upon lncRNA:CR33938 overexpression. While there was no effect on the expression of genes controlling proximal leg femur growth, expression of distal leg tarsal disco-r and medial leg tibial dac significantly increased with lncRNA:CR33938 overexpression. This corroborated the scRNA-seq and FISH data that lncRNA:CR33938 more greatly affected (and was normally expressed in) the distal end of the leg. The sample-wide integrated scATAC-seq dataset showed an overlaying of the T1, T2, and T3 individual samples ( Figure 4A) and identified twelve distinct clusters based on differences in chromatin accessibility ( Figure 4B). A heatmap distinguished the proportion of cells in each cluster at each timepoint and showed differences in chromatin accessibility and cell composition across three timepoints ( Figure 4C). For example, cluster 6 (C6) showed greater than 80% of the cells in T1, a small proportion of the cells in T2, and nearly no cells in T3. Similarly, T1 had many cells from cluster 12 (C12) and cluster 2 (C2). Upon integration of the chromatin accessibility data with the gene expression data, seven cell types identified in scRNA-seq were transferred to the scATAC-seq clusters ( Figure 4D). These cell types corresponded to the cell subtypes of the PD axis of the leg disc epithelium (proximal, medial, and distal cells) as well as those of the muscle, neuronal, and immune cells.
The cell type identities were confirmed by an inferred gene score of chromatin accessibility for a list of known marker genes specific to the cell types ( Figure 4E). Similar to the gene expression data in scRNA-seq, high gene scores of Sp1 and Ubx confirmed that the cells originated from L3 discs. All cells that composed the leg disc epithelium (proximal, medial, and distal cells) showed markers Fas3 and nw. The presence of Dll only (without dac), al, and C15 confirmed the identity of the distal cells. Dll (with dac) and dac only confirmed the identity of the medial cells. Similarly, tsh and hth were markers for the proximal cells, while Him and twi represented the muscle cells. The neuronal cells showed high gene scores for nervana 3 (nrv3) and complexin (cpx), and the immune cells produced high scores for Hml (for hemocytes) and Pxn (for plasmatocytes) markers.
The most enriched motifs for each cell type are shown as a heatmap ( Figure 4F). The GATA motifs were evident in the immune cells, with several GATA family members observed. The PRDM9 and HIF2a.bHLH motifs were highly enriched in the medial and distal cells, respectively. While the NRF motif was enriched in the proximal cells, its enrichment was more evident in the neuronal cells. The muscle cells were enriched in many motifs, including Maz, KLF14, ZNF, Egr2, Olig2, Egr1, KLF10, and Klf9. The neuronal cells were also enriched for many motifs, including MyoD, Myf5, E2A, PAX5, MyoG, Tcf12, EKLF, Ascl1, and NRF.

Chromatin Accessibility Differentiated the T1, T2, and T3 Distal Cell Functions
Gene set enrichment of T2 genes relative to T1 and T3 genes relative to T1 showed differences in cellular processes ( Figure 4G). The T2 distal cells were more involved in metabolic processes, while the T3 distal cells had a larger role in chitin-based larval cuticle development.
Fragment coverage within the 40,000 base pairs on either side of the distal cell marker gene Dll showed increased coverage in the distal cells with marked co-accessibility in neighboring genes ( Figure 4H). Distal cell marker gene C15 similarly displayed increased coverage in the distal cells with marked co-accessibility in neighboring genes.

Discussion
We used scRNA-seq and scATAC-seq to explore the Drosophila L3 disc transcriptomic and epigenomic landscapes, respectively, at three timepoints of development. The multiomics datasets corroborated each other and showed similar cell types that delineated the various regions of the leg disc, namely, those along the PD axis. Moreover, scRNA-seq identified an experimentally validated late-stage distal-specific and conserved lncRNA (lncRNA:CR33938) that, upon further characterization by overexpression studies, promoted distal leg growth gene expression. In addition, differences in chromatin accessibility determined by scATAC-seq indicated the disparate functions of early-and late-stage distal cells.
Given that the three legs of Drosophila differ in their developmental programs, their underlying differences cannot be ignored when studying leg disc development [23]. Subsequently, we specifically isolated the third leg disc to provide a more coherent single-cell atlas.
Simultaneous multi-omics library preparation methods, where the same cell or nuclei are used for different single-cell assays, were not available at the time these experiments were completed. As a result, the same cell suspension was used for both scRNA-seq and scATAC-seq to minimize biological variation. Furthermore, the limited number of cells extracted per leg disc prevented the execution of multiple experiments of biological replicates, in which one experiment consisted of one replicate. Rather, a single assay comprised of many biological replicates was conducted for each time point.
The computationally determined assignment of cell types to clusters depended upon the most upregulated genes in each cluster and prior information about cell type-specific marker genes. In addition to the prominent distal, medial, and proximal cell types, cells that did not express explicit marker genes denoting specific cell types represented early developing cells with undetermined fate, which we referred to as "stem-cell like cells".
We found a large cluster of epithelial cells and a smaller cluster of muscle cells in the L3 discs, which corroborated previous studies that have shown the presence of many epithelial cells and accompanying muscle cells in the wing discs of third-instar larva [15][16][17]. Previous studies have suggested that the epithelial cells of the wing disc can be mapped to distinct subregions, including the pouch, hinge, notum, and peripodial membrane [16]. Similarly, the epithelial cells of the leg disc could be mapped to distinct proximal, medial, and distal subregions. Regarding muscle cells, research has shown that they can be subcategorized into direct and indirect flight muscles [17]. Given this finding, we also subcategorized the L3 disc muscle cells based on early versus late muscle development genes.
Our results also demonstrated the presence of neuronal and immune cells in L3 discs, which corroborates the recent single-nucleus transcriptomics study on the adult fruit fly leg by the Fly Cell Atlas showing the presence of various differentiated neurons as well as hemocytes and glial cells [20]. This illustrated that the neuronal cells in the developing leg disc have not yet differentiated, though they can subsequently differentiate.
Despite the publication of several works on single-cell transcriptomic landscapes of the Drosophila wing and eye-antennae imaginal discs [14][15][16][17][18][19], this study is the first to describe the transcriptomic and epigenomic landscape of the leg disc, specifically the third leg disc, at single-cell resolution. The Fly Cell Atlas study determined the singlenucleus transcriptomic atlas of the adult fruit fly leg, but it was based on fully differentiated tissues [20]. Conversely, our work was based on developing tissue and characterized the importance of an identified lncRNA. lncRNAs tend to have lower expression levels than protein-coding genes [42]. The detection of lncRNA:CR33938 by our polyA-tailed single-cell transcriptomics assay indicated that it had a robust level of expression and suggested that it had an important physiological function in leg development given that lncRNA expression is environment-specific [42].
Our work highlighted the spatiotemporal expression of lncRNA:CR33938. It was largely distal-specific, as suggested by the scRNA-seq and FISH results, despite some cells other than the distal cells also showing expression. We also provided evidence that lncRNA:CR33938 may have an important role in leg development. We used a previously published list of larval stage genes that establish the PD axis of the Drosophila leg [43]. LncRNA:CR33938 promoted tarsal leg growth gene expression upregulation, namely disco-r. The expression of its paralog, disco, is maintained by Dll, and disco gene function is also required for the maintenance of Dll expression [44]. Given the important function of disco in maintaining a key gene in PD axis development, this suggests that lncRNA:CR33938 has an important potential role in leg development. Note that dac was also upregulated by lncRNA:CR33938 overexpression. This was not surprising, as the lncRNA was also expressed in the medial cells in T2. This suggests that lncRNA:CR33938 exhibits a spa-tiotemporal role, in which it first modulates medial leg cell fate early in development and then distal leg cell fate modulation later in development. This hypothesis could be tested by using the GAL80 temperature-sensitive and GAL4 with UAS system of flies to spatiotemporally overexpress lncRNA:CR33938 and to assess leg phenotype, such as leg length. Prior to this study, lncRNA:CR33938 did not have an annotated function, but our study indicated that it may influence late-stage distal, and perhaps mid-stage medial, leg growth.

Fly Maintenance and Stocks
The w 1118 Drosophila melanogaster fly line was obtained as a gift from Prof. Edwin Chan's lab at The Chinese University of Hong Kong. All flies were maintained at room temperature in regular light-dark cycles in vials containing standard cornmeal agar medium (Nutri-fly, #66-112).

Fly Breeding Schedule for T1, T2, and T3
Male and female w 1118 flies were allowed to mate for 2 h at room temperature in a clear plastic cup with an attached petri dish containing apple juice agar. After this time had elapsed, embryos were transferred from the apple juice agar plate to a vial containing standard cornmeal agar medium. They were then allowed to grow for 121, 133 or 168 h, corresponding to T1, T2, and T3, after which the third leg discs of these third-instar larvae (L3) were dissected.

Third Leg Disc Dissection and Single Cell Dissociation
At least 70 L3 discs were dissected for T1, and at least 50 L3 discs each were dissected for T2 and T3. These discs were collected in an Eppendorf tube containing phosphatebuffered saline (PBS) with 0.04% bovine serum albumin (BSA) on ice. After pipetting out the PBS from briefly centrifuged samples, we added TrypLE Select Enzyme (10×) (ThermoFisher, Waltham, MA, USA, #A1217702). The discs were then incubated in a thermomixer shaken at 500 rpm for 25 min at 37 • C (with the tube being flicked every five minutes). S2 medium (Gibco, Waltham, MA, USA, #21720) supplemented with 10% fetal bovine serum and 2% penicillin/streptomycin were then added to stop the dissociation reaction. Finally, the isolated single cells were washed and resuspended in PBS + 0.04% BSA.

DNA Library Preparation and Sequencing
The complementary DNA (cDNA) libraries for T1, T2, and T3 were prepared according to the 3 scRNA-seq library preparation protocol (v3.1) of 10× Genomics. In summary, a microfluidics chip was used to produce GEMs (Gel Bead-in-Emulsions), which are droplets that each contain a single microbead with attached oligonucleotides that include a unique cell barcode, a single cell, and reverse transcription reagents. When the single cell lyses within the intact GEM, the cellular polyA-tailed transcript sequences become exposed, reverse transcription occurs, and each cDNA transcript within the same cell receives the same cell barcode with a different UMI (unique molecular identifier). Subsequently, the droplets lyse and the cell-barcoded cDNA from all cells are pooled and amplified. cDNA library construction involved fragmentation, end-repair, A-tailing, double-sided size selection, and sample index incorporation. Quality control and qualitative analysis of the final library were performed on an Agilent Bioanalyzer DNA High Sensitivity chip (Beijing, China). Sequencing of the libraries was completed on the Illumina NovaSeq6000 platform by Novogene (Beijing, China).

scRNA-seq Raw Data Processing, Quality Assessment, and Filtering
The raw paired-end sequencing data files (Fastq) were processed using the Cell Ranger pipeline v4.0.0 with default settings. Read alignment and UMI counts were based on a BDGP6 genome reference fasta file and annotated by a BDGP6.28 gtf file devel-oped by Ensembl. Cell-UMI count tables were loaded into Seurat v4.0 [45]. Cells with 1000-250,000 UMI counts and less than 5% mitochondrial genes were used as filtering gates to select cells for downstream analysis. We only kept genes with at least 20 UMI counts in all cells. Qualimap (v2.2.1) was further used to assess the percentage of mapped reads and percentage of mapped reads aligned to genes for comparison between T1, T2, and T3. 4.6. scRNA-seq Data Integration, Clustering, and Cell Type Identification The T1, T2 and T3 filtered single-cell datasets were merged and integrated using Seurat (v4.0.) Batch effects between samples were corrected using Harmony (v1.0) prior to clustering analysis. PCA was used to determine the optimal dimension for dimensionality reduction, and clustering was performed based on K-nearest neighbor (KNN) graphs with a resolution of 0.02 before UMAP visualization of the single-cell data in two dimensions. The major cell types of the clusters were identified based on known marker genes, and these marker genes were listed among the most upregulated differentially expressed genes compared to other clusters. All clusters were further subclustered into constituent cells based on known marker genes. Dotplots, featureplots, heatmaps, and UMAPs were then generated. Other than the known marker genes, novel genes were also identified as potential markers.

4.7.
Validation of scRNA-seq Results by FISH and Confocal Imaging w 1118 flies were bred and T1, T2, and T3 L3 discs were dissected as described above. The discs were fixed in 3.7% paraformaldehyde on ice for 30 min. Then, probe hybridization was completed according to the protocol provided by Molecular Instruments. The discs were first permeabilized in a detergent solution containing sodium dodecyl sulfate and Tween-20, before custom-designed probes for Dll, dac, and lncRNA:CR33938 were hybridized to the fixed and permeabilized discs for 20 h at 37 • C. After several washes with 5X SSC-Tween-20, hairpins with different fluorophores for each probe were added and incubated for 16 h in darkness at room temperature. The discs then underwent another several washes with 5X SSC-Tween-20 and were mounted onto a Menzel-Glaser Superforst Plus microscope slide (Thermo Scientific, Waltham, MA, USA, #J1800AMNZ) with a Hydromount mounting medium (National Diagnostics, Charlotte, NC, USA, #HS-106) and a 22 × 50 mm Deckglaser microscope coverglass (VWR, Radnor, PA, USA, #630-1461). The mounts were visualized on a Leica SP8 confocal microscope and each sample was imaged every 0.25 µm along the z-axis. The confocal images were z-stacked and processed with Leica Application Suite software.

Construction of lncRNA:CR33938 Expression Vector for Expression Studies
Total RNA was extracted from D. melanogaster L3 discs using NucleoZOL (Macherey-Nagel, UK, #740404.200) following the manufacturer's protocol. Following RNase-free DNaseI (Thermo Scientific #EN0521) treatment and DNaseI inactivation by EDTA, the purified RNA was subjected to cDNA generation using PrimeScript II (Takara, Japan, #RR036A). The cDNA concentration was measured using the Qubit High Sensitivity double-stranded DNA assay.
lncRNA:CR33938 was amplified with PCR from the cDNA using the following primers with restriction site sequences inserted: forward primer 5 -TTTGGTACCTTGAGTCCGAGAGGTT-3 and reverse primer 5 -CGCTCTAGACTCTTTTTTTGGTAGCCTATT-3 ). The amplicon and the pAc5.1/V5-His B expression vector (Invitrogen, Waltham, MA, USA, #V411020) were digested with KpnI (New England Biolabs, USA, #R3142) and XbaI (New England Biolabs, Ipswich, MA, USA, #R0145) restriction enzymes and subsequently ligated using T4 DNA ligase (Invitrogen #15224017). The ligation mixture was transformed to chemically competent Escherichia coli (Invitrogen, #C404003) and selected using 100 mg/mL of ampicillin. The sequence of the lncRNA:CR33938 construct cloned into the expression vector was verified by Sanger sequencing at the Beijing Genomics Institute. Transfection-ready plasmid DNA was extracted using a Plasmid Miniprep kit (Invitrogen, #K210011).

S2 Cell Culture and Transfection
D. melanogaster S2 cells were provided by Prof. Jerome Hui from the School of Life Sciences of The Chinese University of Hong Kong. S2 cells were cultured in Schneider's Drosophila Medium (Gibco #21720) supplemented with 10% heat-inactivated fetal bovine serum (Gibco #10270) and 1% penicillin-streptomycin antibiotic mixtures (Gibco #15140122) in a 25 • C humidified incubator. The cloned pAc5.1-lncRNA:CR33938 construct was transfected into S2 cells using Effectene (Qiagen, Germantown, TN, USA, #301425) and the pAc5.1 backbone vector was used as a negative control. The cells were then incubated at 25 • C for 48 h prior to RNA extraction for qRT-PCR.

RNA Extraction and qRT-PCR of S2 Cells
RNA was extracted with NucleoZOL (Macherey-Nagel, Allentown, PA, USA, #740404.200). Following RNase-free DNaseI (Thermo Scientific #EN0521) treatment and DNaseI inactivation by EDTA, the purified RNA was subjected to cDNA generation using PrimeScript II (Takara, Tokyo, Japan, #RR036A). The cDNA concentration was measured with a Qubit High Sensitivity double-stranded DNA assay. For qRT-PCR, 1 ng of template cDNA and 1xTB Green II (Takara #RR820) were added to each well of a 96-well plate (Axygen, Union City, CA, USA, #PCR-96-FSC) and covered with an optical adhesive film (Applied Biosystems ABI, Waltham, MA, USA, #4311971) prior to execution on a BioRad CFX96 real-time PCR detection system. Primer sequences for each tested gene are listed in Table S2.

Nuclei Isolation for scATAC-seq
The same suspension of single cells used for scRNA-seq was used for nuclei isolation for scATAC-seq. Nuclei isolation was performed according to a 10× Genomics low input protocol for scATAC-seq with some optimizations. The cell suspension was pelleted and lysed on ice in a buffer containing the detergents Tween-20 and nonidet-P40 (NP40) for 30 s. The isolated nuclei were then washed twice and resuspended in chilled 10× diluted nuclei buffer (provided by 10× Genomics). Trypan blue stained nuclei were observed under the microscope to assess nuclei quality.

DNA Library Preparation and scATAC-seq
DNA library preparation was performed according to the scATAC-seq preparation protocol (v1.1) of 10× Genomics. First, the nuclei suspensions were incubated in a transposition mix that included a transposase that preferentially fragments the DNA in open regions of the chromatin. Simultaneously, adapter sequences were added to the ends of the DNA fragments. As in scRNA-seq, a microfluidics chip was used to produce GEMs (Gel Bead-in-Emulsions), but in this case, the droplets contained a single microbead with attached sequences consisting of a unique cell barcode, a single nucleus, and DNA amplification reagents. Once the DNA from each nucleus was barcoded, all nuclei were pooled for DNA library construction. Because only the histone unbound areas of the genome are cut by the transposase, the library consisted of DNA fragments that represented the open chromatin regions of the genome. Quality control and qualitative analysis of the final library were performed on an Agilent Bioanalyzer High Sensitivity DNA chip. The libraries were sequenced on the Illumina NovaSeq6000 platform by Novogene at PE50 with a sequencing depth of approximately 50,000 read pairs per cell.

scATAC-seq Data Analysis
The raw paired-end sequencing data were processed by Cell Ranger ATAC pipeline v2.0.0 with default settings, using a dm6 UCSC reference generated by the 10× Genomics mkref function. Data processing, filtering, dimensionality reduction, and clustering were performed with ArchR v1.0.1 [46]. UMAP visualizations of the scATAC-seq clusters were created before and after integration with scRNA-seq data. Determination of cell type identities were aided by manual annotation of cell type-specific marker genes based on gene scores estimated from the chromatin accessibility data. Peak calling with MACS2 v2.2.7.1 [47] was performed on each cell cluster. Identification of robust peak sets allowed the prediction of enriched transcription factor motifs for each cluster. Gene ontology analysis was performed using Cluster Profiler to determine the enriched distal process in T2 and T3 relative to T1. Genome browser plots depicting co-accessibility of distal genes with nearby genes were also generated using ArchR.

Data Availability Statement:
The sequencing data presented in this study are openly available in NCBI SRA with BioProject ID PRJNA831899.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A Table A1. Relative expression levels of marker genes. The average log2FC expression levels of the marker genes displayed in Figures 1 and 2