Large-Scale Dataset for the Analysis of Outdoor-to-Indoor Propagation for 5G Mid-Band Operational Networks

Understanding radio propagation characteristics and developing channel models is fundamental to building and operating wireless communication systems. Among other uses, channel characterization and modeling can be used for coverage and performance analysis and prediction. Within this context, this paper describes a comprehensive dataset of channel measurements performed to analyze outdoor-to-indoor propagation characteristics in the mid-band spectrum identified for the operation of 5th Generation (5G) cellular systems. Previous efforts to analyze outdoor-to-indoor propagation characteristics in this band relied on measurements collected on dedicated, mostly single-link setups. Hence, measurements performed on deployed and operational 5G networks are still lacking in the literature. To fill this gap, this paper presents a dataset of measurements performed over commercial 5G networks. In particular, the dataset includes measurements of channel power delay profiles from two 5G networks in Band n78, i.e., 3.3–3.8 GHz. These measurements were collected at multiple locations in a large office building in the city of Rome, Italy, by using the Rohde & Schwarz (R&S) TSMA6 network scanner during several weeks in 2020 and 2021. A primary goal of the dataset is to give researchers the opportunity to investigate a large set of 5G channel measurements and to analyze the corresponding propagation characteristics, toward the definition and refinement of empirical channel propagation models.


Introduction
The New Radio (NR) standard of 5G cellular networks promises to connect a massive number of heterogeneous devices, accommodate high data rates, and support low latency applications [1,2]. The 5G NR supports a large variety of use cases with diverse requirements and in particular enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and massive Machine-Type Communications (mMTC) [3]. In addition to traditional broadband services, novel applications that can benefit from 5G include logistics and shipping, smart grids, smart factories, agriculture, extended reality, healthcare, mission-critical services, and autonomous driving [4][5][6].
The 5G NR operates over two different frequency ranges. Frequency Range 1 (FR1) includes sub-6 GHz frequency bands, which are also referred to as low-band (up to 2 GHz) and mid-band (up to 6 GHz) [7]. Frequency Range 2 (FR2) includes the millimeter-wave (mm-wave) band, from 24.25 to 52.6 GHz.
A significant number of commercial 5G networks are currently under deployment in the mid-band, and in particular in the 3.3-3.8 GHz spectrum portion, called Band n78, which offers a good trade-off between coverage and capacity [8][9][10][11][12]. Therefore, understanding signal propagation characteristics in the mid-band is particularly important especially in the urban scenario where most of the 5G deployments are taking place.
Various 5G use cases target indoor scenarios, such as smart buildings with different types of sensors, factory automation, object tracking in a factory, remote control of robotic machinery, and automated vehicles in logistic applications, to mention a few [13][14][15][16]. Currently, these scenarios are likely to be addressed via outdoor deployments of 5G Base Stations (BS). This emphasizes the need for understanding propagation characteristics in outdoor-to-indoor scenarios.
This paper addresses the above need by openly releasing a dataset of outdoor-to-indoor measurements collected on commercial 5G mid-band systems deployed in an urban scenario. To the best of our knowledge, this is the first work to provide a large-scale dataset of measurements performed over commercially deployed 5G mid-band networks in an outdoor-to-indoor scenario.
The paper is organized as follows. The experimental setup and data collection methodology are discussed in Section 2. Section 3 describes the collected data, including an explanation of the dataset structure, while Section 4 draws conclusions and identifies future work made possible by the use of the provided dataset.

Measurement System and Data Collection Methodology
This section provides information regarding the adopted measurement system and the methodology for data collection as well as a description of the locations where measurements were performed.

Measurement System
The measurement campaigns were performed using the Rohde & Schwarz (R&S) TSMA6 toolkit [37], which includes the R&S TSMA6 network scanner and a laptop running the ROMES software. ROMES was used for visual inspection of the ongoing measurement campaigns and for exporting the corresponding collected data. The R&S TSMA6 network scanner is an integrated system comprising a Radio Frequency (RF) omnidirectional antenna and a Global Positioning System (GPS) antenna. The RF antenna was used for passive monitoring of downlink control signals transmitted by 3rd Generation Partnership Project (3GPP) access technologies, i.e., 5G NR, Long-Term Evolution (LTE), and Narrowband Internet of Things (NB-IoT), working at different licensed frequency bands. The GPS antenna was used to record the geographic position of the measurements being collected. As shown in Figure 1, the measurement system included an R&S TSMA6 toolkit, RF and GPS antennas, ROMES software, and a 5G-capable device running performance measurements. Note that performance-related data collected via the 5G-capable device are not included in the present dataset, which instead contains coverage-related data collected via the TSMA6 scanner, as further detailed in Section 3. In more detail, the R&S TSMA6 scanner was used to measure 5G NR Synchronization Signal Blocks (SSBs) transmitted by the 5G BSs active in the area covered by the measurement campaigns. In particular, the TSMA6 was used to (a) provide Automatic Channel Detection (ACD), in order to detect downlink signals from cellular systems operating at different channel frequencies, and (b) decode Physical Cell Identifier (PCI)/SSB information, ultimately providing quantitative coverage measures for all the detected signals. The TSMA6 was also used to provide estimated positions of the detected 5G BSs, which were identified by the corresponding PCIs.

Data Collection Methodology
A set of measurement campaigns was performed on the second floor of the Department of Information Engineering, Electronics, and Telecommunications (DIET) of Sapienza University of Rome, Italy. All measurement campaigns were conducted statically at different indoor locations. Figure 2 shows the map of the second floor of the DIET Department, along with the eight specific locations where measurement campaigns were performed. Location 1 and Location 4 are laboratories, Location 8 is a study hall, and all other locations are offices. Green, blue, and red lines represent walls, windows, and doors, respectively. In total, 18 measurement campaigns were performed at the eight locations. During these measurement campaigns, multiple 5G PCIs from four different operators were detected in Band n78. However, the present dataset focuses on the two 5G operators that yielded the highest number of data samples. We refer to these two operators as Op1 and Op2, operating at carrier frequencies of 3649 MHz and 3725 MHz, respectively. Figure 3 shows the PCIs detected at each location for both operators and included in the dataset. Once the measurement campaigns were executed, the corresponding data were exported using ROMES [38]. Both operators were observed to deploy a 5G NR system with a sub-carrier spacing of 30 kHz. Op1 adopts a beam-based transmission scheme in which each PCI transmits multiple SSBs (up to eight). In contrast, each PCI of Op2 transmits a single SSB. A detailed discussion on beam-based transmission is given in [39]. For Op1, because of different propagation conditions, a different number of SSBs was detected at different locations, with a maximum of eight SSBs detected. Moreover, the received power of each SSB also differs because of the direction of the beams associated with the SSBs.
Figure 4 shows an example of the received power measured at Location 1 for all the SSBs of the same PCI of Op1 and the unique PCI/SSB pair of Op2. For Op1, the average and median of the received power of SSB 6 are higher than all the other SSBs of Op1. For this reason, SSB 6 is defined as the strongest SSB of Op1 at Location 1.
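The selection of the strongest SSB can be illustrated with a minimal Python sketch. It assumes that the received-power readings of one PCI at one location are available as (SSB index, power in dBm) pairs and that the average is taken directly over the dBm values; the text does not state whether averaging is done in the dB or in the linear domain, so this choice, the function name, and the sample readings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def strongest_ssb(samples):
    """Return (ssb_index, average_power_dbm) of the SSB with the highest average.

    samples: iterable of (ssb_index, power_dbm) pairs for a single PCI
    at a single location. Averaging over dBm values is an assumption.
    """
    by_ssb = defaultdict(list)
    for ssb_index, power_dbm in samples:
        by_ssb[ssb_index].append(power_dbm)
    averages = {ssb: mean(values) for ssb, values in by_ssb.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Hypothetical readings, not taken from the dataset.
readings = [(0, -105.0), (6, -92.0), (6, -94.0), (3, -99.0), (0, -103.0), (3, -101.0)]
best_ssb, best_avg = strongest_ssb(readings)
```

With these illustrative readings, SSB 6 is selected because its average power is the highest of the three SSBs.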
Following the above definition of strongest SSB, the dataset in this paper provides data on the strongest SSBs for each PCI of Op1 at each location, i.e., those SSBs having the highest average Reference Signal Received Power (RSRP) at each measurement location. For Op2, since each PCI transmits only one SSB, the dataset provides data on a single PCI/SSB pair.
Ultimately, the dataset is composed of so-called channel power delay profiles (PDPs). Each PDP represents a channel measurement at a given location and time. Since the PDP is measured over SSB signals, each PDP sample in the dataset corresponds to a detected PCI/SSB pair. Moreover, since the propagation channel is composed of several propagation paths, each PDP includes its paths with their absolute propagation time in microseconds (µs) and received power in dBm. The total PDP in-band power level is also provided in the dataset (in dBm). From the PDP, various propagation parameters can be calculated, such as the number of paths, inter-arrival times, root mean square delay spread, and mean excess delay. A comprehensive discussion and analysis of such propagation parameters, also including a correlation analysis, can be found in [39].
Table 1 summarizes the performed measurement campaigns for each operator and location in terms of the number of measurement campaigns and the corresponding number of collected PDPs. Note that the dataset includes measurements of PCIs/SSBs for which the number of PDPs, collected at a given location during an entire measurement campaign, was more than 50. We set this threshold empirically after observing that a reasonable estimate of the propagation parameters mentioned above can be obtained when the number of PDPs is at least equal to 50. Therefore, PCI/SSB pairs with fewer than 50 PDPs are not provided in the dataset.
Subfolder "(2) Detected Operators" has two further subfolders named "Op1_PDP" and "Op2_PDP" for PDP measurement data of Op1 and Op2, respectively.
Furthermore, subfolder "Op1_PDP" contains five subfolders, one per measurement location, named "Location_1", "Location_2", "Location_3", "Location_4", and "Location_5". Each of these folders contains comma-separated values (CSV) files of PDP measurements. Similarly, subfolder "Op2_PDP" contains seven subfolders, one per measurement location, named "Location_1", "Location_2", "Location_4", "Location_5", "Location_6", "Location_7", and "Location_8". Each of these folders contains CSV files of the measurement data for each conducted campaign and detected PCI. The structure of the CSV file format is explained in Figure 6. Finally, subfolder "(3) Additional Information" contains a text file with the transmit power and estimated position of each PCI. The estimated Signal to Interference plus Noise Ratio (SINR) values for each PCI and measurement location are also given in the text file. A description of the variables in the CSV files in subfolders "Op1_PDP" and "Op2_PDP" is given below:
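The folder layout described above can be traversed with a short Python sketch. It relies only on the subfolder names quoted in the text ("Op1_PDP", "Op2_PDP", "Location_N"); the function name, the assumption that the CSV files carry a ".csv" extension, and the file names in the demonstration are hypothetical, and the demonstration runs on a throwaway directory tree rather than on the dataset itself.

```python
import tempfile
from pathlib import Path


def index_pdp_files(detected_operators_dir):
    """Map (operator folder, location folder) to the sorted list of CSV files found there."""
    index = {}
    root = Path(detected_operators_dir)
    for operator_dir in sorted(root.glob("Op*_PDP")):
        for location_dir in sorted(operator_dir.glob("Location_*")):
            csv_files = sorted(location_dir.glob("*.csv"))
            if csv_files:
                index[(operator_dir.name, location_dir.name)] = csv_files
    return index


# Demonstration on a temporary tree that mimics the described layout.
# The CSV file names below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("Op1_PDP/Location_1/campaign_a.csv", "Op2_PDP/Location_8/campaign_b.csv"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True)
        path.write_text("")
    demo_index = index_pdp_files(tmp)
    demo_keys = sorted(demo_index)
```

When applied to the real dataset, the argument would be the path of the "(2) Detected Operators" subfolder, and the resulting index gives one entry per operator and location.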

Data Description
• ID represents the sequence number of PDPs.
• Longitude and Latitude show the GPS coordinates of the current measurement location.
• PCI refers to the Physical Cell Identity. PCI is a cell identifier used to distinguish the different cells of a 5G system. Cellular BSs are most often referred to in terms of their PCI for their identification.
As a reference example, an excerpt of the dataset containing two collected PDPs is shown in Table 2. Note that each PDP is identified by a unique ID. It follows that in the excerpt reported in Table 2, the first PDP (ID = 1) is composed of three paths, while the second PDP (ID = 2) is composed of seven paths.
Table 2. An example of two PDPs collected at the same location and carrier frequency of 3649 MHz. The first PDP (ID = 1) includes three paths, while the second PDP (ID = 2) includes seven paths.
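The propagation parameters mentioned in Section 2 (number of paths, inter-arrival times, mean excess delay, and root mean square delay spread) can be derived from the per-path delay and power values of a single PDP. The following Python sketch uses the standard power-weighted definitions, with path powers converted from dBm to linear scale and excess delays measured from the first arriving path; these are assumptions, since the exact definitions adopted by the authors are given in [39] and not in this paper. The function name and the sample values are hypothetical.

```python
import math


def pdp_parameters(delays_us, powers_dbm):
    """Compute basic propagation parameters for one PDP.

    delays_us: absolute propagation time of each path, in microseconds.
    powers_dbm: received power of each path, in dBm.
    """
    paths = sorted(zip(delays_us, powers_dbm))         # order paths by arrival time
    delays = [d for d, _ in paths]
    weights = [10 ** (p / 10) for _, p in paths]       # dBm -> mW (linear scale)
    total = sum(weights)
    excess = [d - delays[0] for d in delays]           # delay relative to first path
    mean_excess = sum(w * t for w, t in zip(weights, excess)) / total
    second_moment = sum(w * t * t for w, t in zip(weights, excess)) / total
    rms_spread = math.sqrt(max(second_moment - mean_excess ** 2, 0.0))
    inter_arrival = [b - a for a, b in zip(delays, delays[1:])]
    return {
        "num_paths": len(delays),
        "inter_arrival_us": inter_arrival,
        "mean_excess_delay_us": mean_excess,
        "rms_delay_spread_us": rms_spread,
    }


# Hypothetical three-path PDP with equal path powers, not taken from the dataset.
params = pdp_parameters([10.0, 10.5, 11.0], [-90.0, -90.0, -90.0])
```

For this equal-power example, the mean excess delay is the midpoint of the excess delays (0.5 µs), which gives a quick sanity check of the weighting.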

Conclusions
This paper presents a dataset of measurements performed over commercial 5G networks operating in the mid-band. To the best of our knowledge, this is the first 5G dataset collected and made available for operational 5G mid-band networks. The provided dataset contains power delay profiles, and related information, for measurements in outdoor-to-indoor scenarios, such as the number of paths, delay and power of each path, and total received power. Measurements originate from multiple campaigns and were collected at different locations. The dataset can serve as the basis for multiple analyses, including comprehensive channel propagation characterization and modeling, based on real deployed networks and standardized signals. The presented dataset can also be used for coverage evaluation of 5G communication systems deployed in urban environments. In general, measurements performed in dedicated laboratory setups do not always ensure a proper characterization and modeling of the channel propagation experienced in real-world scenarios. On the other hand, measurements performed on real deployed networks, such as the ones made available in the present dataset, can lead to channel characterization and modeling that better represent real-world scenarios. As future work, we plan to adopt the same measurement setup to perform new measurement campaigns on 5G networks, focusing on outdoor scenarios under mobility (e.g., user walking and driving). We also plan to make the corresponding dataset available in order to trigger further analysis of the propagation characteristics of mid-band 5G systems.

Conflicts of Interest:
The authors declare no conflict of interest.