Official Statistics and Big Data Processing with Artificial Intelligence: Capacity Indicators for Public Sector Organizations

Efficient monitoring and achievement of the Sustainable Development Goals (SDGs) have increased the need for a variety of data and statistics. The massive increase in data gathered through social networks, traditional business systems, and Internet of Things (IoT)-based sensor devices raises real questions about the capacity of national statistical systems (NSS) to utilize big data sources. Moreover, public sector organizations now capture big data through sensor-based systems. To gauge the capacity of public sector institutions in this regard, this work provides an indicator to monitor the processing capacity of public sector organizations within a country (Pakistan). Some of the indicators related to measuring the capacity of the NSS were captured through a census-based survey, and convex logistic principal component analysis was used to develop scores and relative capacity indicators. The findings show that most organizations hesitate to disseminate data because of data privacy concerns and that the IT personnel of public sector organizations are unable to deal with big data sources to generate official statistics. Artificial intelligence (AI) techniques can help overcome these challenges by automating data processing, improving data privacy and security, and enhancing the capabilities of IT human resources. This research helps design capacity-building initiatives for public sector organizations in weak dimensions, focusing on leveraging AI to enhance the production of quality and reliable statistics.


Introduction
Public sector organizations' adoption of advanced information technology, including artificial intelligence, is driving an increase in digital administrative data sources (DADs). This digitalization exposes various processing, storage, and privacy issues in the host departments and in the country's national statistical institutions (NSIs) [1]. Administrative data is one of the six main categories into which large datasets are classified, and it has been argued that administrative data fulfill the characteristics of big data if their recording velocity is high [2]. The United Nations Economic Commission for Europe (UNECE) defines three main classes of big data [3]. The first is social networks (human-sourced information); the second is traditional business systems (process-mediated data), i.e., data produced by public agencies and businesses, which belong to the class of "administrative data" that we have called DADs; and the third is the Internet of Things (IoT), i.e., machine-generated data [4,5]. The IoT connects the physical and digital worlds through devices linked via the Internet and several other protocols. These devices generate a significant amount of data that contain essential knowledge about the physical world. Among many plausible data sources in the IoT, wireless sensor networks (WSNs) are rich sources of big data, with sensor nodes in large-scale networks producing big data [6,7]. With the increasing number of IoT sensing devices available, data generated by public sector organizations are expected to grow exponentially, and these digital administrative data fall under the definition of "big data" [8,9]. DADs are vital for measuring a country's or region's performance on various economic and population indicators.
For an in-depth review of the literature regarding methodological tools and operationalization for big data in official statistics from national and international organizations, one may refer to Abbas et al. [10].
The demand to "Leave No One Behind" in the 2030 Agenda and its SDGs has increased the need for a variety of data and statistics to support informed policy and decision making in all countries [11]. A strong official statistical system is vital for a country to formulate, implement, and monitor its development policies, and it also assists the government in evidence-based planning [12]. United Nations (UN) agencies are also helping countries strengthen their statistical capacity to produce and use data, including through artificial intelligence, for better policy formulation and for implementing the 2030 development agenda at the national level. Several capacity development projects are underway to empower the NSS of countries worldwide, including workshops, direct country assistance, development of training and guidance material, establishment of specialized networks, and study visits [13]. Details of these projects are available on the UN Statistics Division's website.
Rogge, Agasisti, and De Witte [14] presented a global overview of statistical capacity development in PARIS21's Partner Report on Support to Statistics (PRESS). In 2016, donor and development cooperation agencies (UNICEF, IMF, European Commission/Eurostat, UNFPA, World Bank) provided USD 623 million in support for statistical capacity building [10]. Lebada [15] argued that prioritizing the data requirements of the SDG monitoring framework over the development of the NSS could be a miscalculation. Rather, countries should prioritize developing an effective NSS that is sufficiently flexible, responsive, and cost-effective to meet the enormous demands of the SDG monitoring framework and national information needs. This enormous expansion in scope and scale raises serious concerns about the capacity of national statistical systems, or what others have termed the "Data eco-system", to implement such a massive monitoring framework [10]. The complexity and ambition of this challenge led Mogens Lykketoft, President of the UN General Assembly, to describe it as an "Unprecedented Statistical Challenge" [15].
The rising demand for, and importance of, good-quality, independent official statistics provides a unique opportunity to make a real and long-lasting investment in improving the NSS [16]. Moreover, it is important to improve the NSS at the grassroots level and to strengthen it in grey areas. Assessing the strengths and weaknesses of a country's NSS is vital before launching capacity-building activities. However, no scale, indicator, or index exists to gauge and rate the efficiency of public sector organizations across a nation [11–17]. Only the World Bank's statistical capacity indicator provides a nationwide overview, covering about 140 developing nations. This indicator is based on a diagnostic methodology that evaluates a nation's capability using metadata, which are typically accessible for most nations, and it is used to track statistical capacity development over time. The framework of the World Bank's statistical capacity indicator is composed of four elements: statistical methodology, data source, periodicity, and timeliness [18,19]. Complementing it, we propose a micro indicator that provides a comprehensive capacity assessment of the official statistics system within a country, in line with the World Bank's macro statistical capacity indicator. Several factors (Table 1) relating to the official statistics system and big data processing are considered in this study when measuring the capacity of public sector organizations.
Systems 2023, 11, 424 3 of 17
Convex logistic principal component analysis (PCA) addresses data privacy concerns while playing a significant role in the field of AI. It allows organizations to mine high-dimensional datasets for insightful information without jeopardizing the privacy of individual users: dimensionality reduction and pattern recognition remain possible while sensitive information stays confidential. Convex logistic PCA helps keep data anonymous, reducing the chance of privacy violations, and thus offers a workable approach to using AI while protecting privacy, thanks to its capacity to strike a balance between data value and privacy protection. The proposed capacity indicator gives a micro-level picture of the capacity of within-country public sector organizations to process large amounts of data through artificial intelligence and to produce official statistics from their DADs. A census-based survey, the "Survey of Official Statistics Pakistan (SOS-Pak)," was introduced at the national level in Pakistan to monitor capacity indicators. All federal and provincial government organizations were contacted by postal mail, with email and telephone follow-up. Responses were received from 171 provincial and federal public sector organizations, roughly one-fourth of those contacted, covering different aspects of big data processing and official statistics. Convex logistic PCA was used as a dimensionality reduction tool to compute capacity scores. These scores were then transformed into relative capacity indicators (RCIs), which compare and assess the NSS on various dimensions at the organizational level.
In this paper, we measure the statistical and big data processing capacity of Pakistan's public sector organizations through a set of measures. Section 2 covers the methodology, discussing the data collection approach, the questionnaire, and the measures used to compute the statistical capacity indicator. Section 3 presents the descriptive survey results for these measures, the scores calculated from them using convex logistic PCA, and the resulting statistical capacity indicator for public sector organizations. Finally, Section 4 concludes the paper.

Methodology
In this article, we develop statistical and big data processing capacity indicators in comparison with the World Bank's statistical capacity indicator. The indicators are based on primary data collected from 171 public sector organizations through the "Survey of Official Statistics Pakistan" (SOS-Pak), for which a questionnaire was developed to collect the required measures. Section 2.1 describes the survey methodology, explaining the key modules of the questionnaire and the measures covered in each. Section 2.2 explains the data collection process, Section 2.3 discusses the key official statistics and big data capacity measures together with descriptive findings, and Section 2.4 discusses the AI-based dimensionality reduction tools used to develop the official statistics and big data processing capacity indicators.

Survey Methodology
A questionnaire, SOS-Pak, was designed to obtain measures of official statistics production and of big data utilization for that production (see Supplementary Materials file S1). For public sector organizations operating under the NSS of Pakistan, the survey covered several topics, including data collection, data dissemination, data privacy concerns, reported and unreported data sources, big data literacy, working IT human resources and infrastructure, and rationalization of statistical and data processing human resources. SOS-Pak was carried out at the national level in collaboration with the Bureau of Statistics (BoS) Punjab, Pakistan, to determine the capacity of public sector organizations in Pakistan to handle big data and to produce official statistics. The survey tool was a self-explanatory questionnaire with four modules:

RC-Rationalization of Statistical Cadre
The IP module covers the department's name, the respondent's title, basic pay scale, contact information, and organization size. The second module, OS, addresses data privacy issues, reported and unreported data sources, and data collection/recording methods for data product dissemination. Abbas et al. [20] reviewed the key aspects of this module dealing with reported and unreported data sources.
The third module, BD, was created to examine how well public sector organizations can use big data sources to produce official statistics. Its measures were designed with reference to studies by UN institutions, namely UNECE [21] and UNSD [22,23], and by business firms, such as Mark et al. [24,25].
The fourth module, RC, was designed to monitor the statistical and non-statistical human resources of public sector organizations. Rationalizing the statistical cadre and data processing human resources requires a review of employment positions, approved and vacant posts, work duties, and the activities performed by the relevant cadres. The target population of the study comprised all public sector organizational units (OUs) working under the Federal Government (FG) and the Provincial (Punjab) Government (PG). This census-based survey used a frame of 472 federal and 286 provincial government OUs.

Data Collection
Data were collected by contacting each OU in the frame through postal mail, with email and telephone follow-up (see Supplementary Materials file S2). A postage-paid return envelope was sent to all 758 OUs. The reference period for data collection was March to August 2017. Any well-informed officer of the organization was asked to complete the IP, OS, and RC modules of the questionnaire, whereas an IT professional of the responding organization was asked to complete the BD module. The official statistics capacity indicator comprises three dimensions with thirteen measures.

Official Statistics and Big Data Processing Capacity Measures
In this section, scores are created with convex logistic PCA from several measures of official statistics and big data processing capacity. Scores for the OS and BD capacity measures are created separately. These scores help identify the available potential and the weak areas that require capacity growth in the production of official statistics (POS) from contemporary data sources. The measures listed in Table 1 are used to create the scores. The measures are divided into major categories so that the full OS and BD models are split into partial (sub-dimension) models, allowing the resulting scores to explain the deviance within each dimension.

Official Statistics Capacity Measures
Here, the competence of public sector organizations to produce statistics effectively, from data collection and recording to data dissemination, is assessed. This capacity is divided into three partial groups (sub-dimensions).

(a) Data collection/recording for dissemination
This group includes measures related to creating and disseminating official data. The main measures in this first partial category of OS capacity scores are data collection, recording, and storage; data-gathering functionality; data generation; and product dissemination.

(b) Liaison with other departments on data
One of the primary measures in this second subcategory of OS capacity scores is the extent to which organizations engage with other departments on data.

(c) Data Privacy
This category covers data privacy safeguards. It includes measures of the framework for addressing privacy issues, the degree of confidentiality and the significance of data sources, and the need-based collection of statistics.

Big Data Processing Capacity Measures
Several indicators are used to assess how effectively public sector organizations can use big or large data sources to produce statistical products or to supply the data needed to produce those products. These indicators are divided into four subcategories here, which are detailed below.
(a) Big data 3Vs
The three key aspects of big data (volume, velocity, and variety), together with data recording and storage, are assessed to determine the extent of big data usage.

(b) Big data literacy
Scores that can be used to verify and compare the level of big data literacy in public sector organizations are produced from measures of knowledge of big data, its value for POS, and its utility in organizational planning and decision-making.
(c) Big data workings
This dimension covers the search for potential big data sources, public-private collaboration in processing big data, and collaboration with other groups to process and handle big data. It also covers how public sector organizations use big data, whether currently or in plans for the near future.

(d) Big data skills
This dimension is the main sub-component of the big data processing capacity measures. The measures used here to compute BD capacity scores are the availability of adequate IT infrastructure and IT human resources, the statistical expertise of IT professionals, and the training requirements of IT professionals in public sector organizations. The rationalization of the statistical cadre in public sector organizations is also explained in detail.

The big data processing capacity indicator thus includes four dimensions with 18 measures: big data 3Vs, big data literacy, big data workings, and big data skills.

Official Statistics Capacity Indicator
PCA is a well-known technique for data visualization, feature analysis, and compression. Logistic PCA, a version of this method for binary data, was developed in [26]. As in standard PCA, the PC scores in logistic PCA are linear combinations of the natural parameters of the saturated model. Logistic PCA and convex logistic PCA are based on the same ideas; the only distinction is the set over which the deviance is minimized. Logistic PCA minimizes over rank-k projection matrices, whereas convex logistic PCA minimizes over the Fantope, the convex hull of the rank-k projection matrices.
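The Fantope can be made concrete with a short computation. The Python sketch below is an illustration, not code from this study: it computes the Euclidean projection of a symmetric matrix onto the rank-k Fantope (symmetric matrices with eigenvalues in [0, 1] summing to k) by shifting and clipping eigenvalues, a step typically used when optimizing over this convex set. The function name `project_to_fantope` is hypothetical.

```python
import numpy as np


def project_to_fantope(a, k):
    """Euclidean (Frobenius) projection of a symmetric matrix onto the Fantope
    F_k = {P : 0 <= eigenvalues(P) <= 1, trace(P) = k},
    i.e. the convex hull of the rank-k projection matrices."""
    a = np.asarray(a, dtype=float)
    a = (a + a.T) / 2.0                       # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(a)
    if not 0 < k <= eigvals.size:
        raise ValueError("k must satisfy 0 < k <= matrix dimension")
    # Bisection for a shift theta with sum(clip(eigvals - theta, 0, 1)) == k.
    # The sum is non-increasing in theta: it equals the dimension at the lower
    # bracket and 0 at the upper bracket.
    lo, hi = eigvals.min() - 1.0, eigvals.max()
    for _ in range(100):                      # 100 halvings exhaust float precision
        theta = (lo + hi) / 2.0
        if np.clip(eigvals - theta, 0.0, 1.0).sum() > k:
            lo = theta
        else:
            hi = theta
    gamma = np.clip(eigvals - (lo + hi) / 2.0, 0.0, 1.0)
    # Reassemble V diag(gamma) V^T with the shifted, clipped eigenvalues.
    return (eigvecs * gamma) @ eigvecs.T


# A diagonal example: the projection keeps fractional eigenvalues, so the
# result lies inside the Fantope rather than being a rank-2 projection matrix.
p = project_to_fantope(np.diag([0.9, 0.6, 0.2, 0.1]), 2)
print(np.round(np.diag(p), 2))
```

For this input the shift is -0.05, giving eigenvalues 0.95, 0.65, 0.25, and 0.15 (trace 2). Rank-2 projection matrices are the extreme points of the set; relaxing to the whole Fantope is what makes the optimization convex.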
The scores are generated separately for the OS and BD capacity models using convex logistic PCA and are then converted into RCIs through a linear transformation of the score vector:
$$\mathrm{RCI}_i = \frac{k_i}{m} \times 100, \qquad (1)$$
where $k_i$ is the PC score of the $i$th entity and $m$ is the upper record value (the maximum) of the score vector. This index evaluates each organization relative to the one with the highest score: an RCI of 100 identifies the most competent organization among a group of OUs on a given measure.
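The rescaling to relative capacity indicators can be sketched in a few lines. This minimal Python example assumes the ratio form RCI_i = 100 * k_i / m implied by the definitions of k_i (PC score of entity i) and m (upper record value of the score vector); the function name and the input scores are hypothetical, not survey results.

```python
import numpy as np


def relative_capacity_indicator(scores):
    """Rescale PC scores so that the top-scoring organization gets 100.

    Assumes RCI_i = 100 * k_i / m, where m is the maximum (upper record) score.
    """
    k = np.asarray(scores, dtype=float)
    m = k.max()
    if m <= 0:
        # The ratio form only makes sense when the best score is positive.
        raise ValueError("maximum score must be positive")
    return 100.0 * k / m


# Hypothetical scores for four OUs (illustrative only).
print(relative_capacity_indicator([3.2, 2.8, 1.6, 0.4]))
```

PC scores can be negative and their sign is arbitrary, and the text does not say how this is handled; any implementation would first need to orient the scores so that higher means more capacity before applying the ratio.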

Results and Findings
This section contains key descriptive findings and results of AI-based dimensionality reduction tools for the development of official statistics and big data processing capacity indicators.

Key Descriptive Findings
The survey achieved a response rate of 23% of OUs; i.e., the results are based on a net response of 171 public sector OUs. The respondents were the top managers of their respective organizations. The net sample includes OUs of mixed size, from small (<50 employees) to large (>1000 employees).
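The frame sizes given in the survey methodology (472 federal and 286 provincial OUs) and the 171 responses reconcile with the quoted 23% as shown in this short Python sketch, which only reproduces that arithmetic; the variable names are illustrative.

```python
federal_ous, provincial_ous = 472, 286
frame = federal_ous + provincial_ous      # OUs contacted in the census-based survey
responded = 171                           # net response used in the analysis

rate = 100 * responded / frame
print(f"{frame} OUs, response rate {rate:.1f}%")  # 758 OUs, response rate 22.6%
```

The unrounded rate is about 22.6%, which the text reports as 23% and elsewhere as roughly one-fourth.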
The key findings are presented below. They cover several measures of Pakistani public sector organizations' capacity to produce official statistics and to process big data.
(a) Data recording and storage system
Most OUs collect, document, and store data in some way. The findings show that 94% of OUs record the data they acquire electronically, compared with only 6% that record data manually. The value of DADs depends on how easily their storage system can be accessed. To analyze digital data storage in the public sector, respondents from the IT department were asked to describe the system implemented in their organizations. Figure 1 compares the ways in which data are stored at the FG and PG levels.

(b) 3Vs of data
The 3Vs hold new opportunities for developing new official statistics and restructuring existing ones. High volume may yield more accurate and detailed statistics, high velocity may give more frequent and timely statistics, and high variety may lead to multidimensional official statistics. Here, we captured these characteristics in the public sector organizations of Pakistan. Table 2 presents a crosstab of the volume and velocity of the administrative data held by public sector departments. Overall, big data sources are absent in 23.6% of OUs, certainly present in 33.8%, and possibly present in 42.6%, pending further clarification. Regarding the variety of data, 90% (141/157) of OUs reported recording numerical data, 82% (128/157) text data, 50% (79/157) graphic data, and 4.5% (7/157) other data types.
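Because an OU can record several data types, the variety shares above are computed against the same base of 157 responding OUs and need not sum to 100%. The minimal Python sketch below reproduces the reported percentages from the counts given in the text.

```python
# Counts from the text: OUs recording each data type, out of 157 responding OUs.
n_responding = 157
counts = {"numerical": 141, "text": 128, "graphic": 79, "other": 7}

# Multi-response item: every share uses the same base, so the total can exceed 100%.
shares = {kind: 100 * c / n_responding for kind, c in counts.items()}
for kind, pct in shares.items():
    print(f"{kind:>9}: {pct:.1f}%")
print(f"sum of shares: {sum(shares.values()):.1f}%")
```

The unrounded shares are about 89.8%, 81.5%, 50.3%, and 4.5%, matching the rounded figures quoted above, and they sum to about 226%. By contrast, the three big data presence classes (23.6%, 33.8%, 42.6%) are mutually exclusive and sum to 100%.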

(c) Big data literacy
IT personnel in public sector organizations were asked to complete a survey to determine their level of big data literacy. The respondents were given a broad questionnaire and were asked whether they had heard the term "big data" before, with the question phrased as "Do they ever hear the term BIG DATA before now?" In total, 101 of 154 OUs (66%) said they had heard of big data. The three-point Likert scale results in Figure 2 demonstrate that the public sector is fully aware of the value of using big data to save time and money while enhancing the accuracy of the production of official statistics (POS).
The questionnaire asked respondents to describe their past, present, and future use of big data. Currently, 8% (13/159) of OUs use both administrative and non-administrative big data sources. The preferences of the FG and PG OUs for potential big data sources, as classified by UNECE, are listed below.
Opinion Records (19%, 31/160)
Because of its high potential, usability, and applicability in public sector functioning, the administrative record is the top preference of public sector organizations. Notably, the next five classifications after "Administrative Data" typically rely on unstructured data collections.

(e) IT personnel to handle Big Data
It is essential to have competent IT staff to manage big data sources. Of the 153 OUs that responded, 44 indicated that they had highly trained IT staff. The IT human resources in the individual organizations employ various data processing tools. Figure 3 shows the percentage of different data processing technologies that IT HR possess and employ.

(f) IT infrastructure to deal with big data
Adequate IT infrastructure and knowledgeable IT staff are crucial to fulfill modern data processing requirements. Of the 153 OUs examined, 36 (23.5%) were found to have well-equipped IT infrastructure and sufficient resources to meet their big data processing needs. Most FG and PG OUs were observed to rely on structured databases (SQL DBMS).

(g) Statistical capacity of IT human resources
This question was posed to determine the statistical capacity of IT human resources in public sector organizations. The shares of various statistical skills among IT employees in public sector OUs clearly illustrate the paucity of advanced statistical abilities (Figure 4). Data visualization, the first step towards data insights, is a skill in which only 19% of IT employees are proficient. This highlights the urgent need for public sector organizations to strengthen the statistical capacity of IT HR.

(h) Training needs of IT human resources
To investigate public sector organizations' demand for training in data processing technologies, participants were asked to indicate their needs by checking the relevant answers. In total, 76% (117/154) of OUs aim to train their IT employees in various data processing tools in the near or distant future to meet today's data processing challenges. Figure 5 shows the percentage distribution of training needs for various data processing software among public sector organizations. Other requirements were identified for GIS, BI, ACL Analytics, Oracle, and STATA.

(i) IT and Statistical outsourcing
Organizations in the public sector were found to have lower IT and statistical capabilities than those in the private sector. Public-private partnerships are essential if the government system is to operate more effectively. We looked into this and found that 19% (29/153) of the organizations work with external IT companies to meet their data solution needs.
(j) Liaison with other departments on data
Liaison between public sector OUs may help them improve their functionality and overcome the challenges of contemporary data processing requirements. We found that 19.3% (29/150) of OUs work with other statistical and non-statistical organizations to collect, process, compile, store, and analyze data, write reports, and disseminate them. Most of this collaboration was with statistical bodies.
(k) Reasons for lacking big data use
To identify potential causes of the limited use of contemporary data sources, the IT staff of the participating public sector OUs were given a list of thirteen reasons, one of which was open-ended, to rate on a five-point Likert scale (strongly disagree = 1 to strongly agree = 5). Figure 6 shows a stacked bar graph of the ratings.
The main causes of the slow adoption of big data sources include a lack of advanced statistical and data processing skills, a lack of research and research environments, low awareness of big data, and technical stagnation. Removing these obstacles may enable public sector organizations to use contemporary data processing methods.

Selection of Optimum Data Reduction Approach
Several data reduction approaches were applied to the complete and dimensional capacity groups to assess the ability of public sector OUs to process official statistics (OS) and big data (BD). R software was used to compare logistic PCA, convex logistic PCA, and exponential family PCA, as previously described by Landgraf and Lee [18]. Table 3 displays the deviance explained by the different methods; the explained deviance shows that convex logistic PCA effectively calculates OS and BD capacity scores. Based on the measurements gathered for the various dimensions, convex logistic PCA was therefore used to produce the OS and BD capacity scores, and Equation (1) was used to calculate the RCIs. The R package "logisticPCA" was used to calculate scores for both the full and partial models.
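To show what "deviance explained" means for binary capacity measures, here is a minimal Python sketch of (non-convex) logistic PCA in its projection form, fitted by majorization-minimization. It is not the R routine used in the study and not the convex Fantope variant. Simplifying assumptions: the saturated natural parameters are m * (2x - 1) with m = 4, the main effects are fixed at the logit of the column means, and the function names are hypothetical.

```python
import numpy as np


def _sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def bernoulli_deviance(x, theta):
    """Deviance (-2 x log-likelihood) of binary data x under natural parameters theta."""
    # log(1 + exp(theta)) evaluated stably with logaddexp
    return 2.0 * np.sum(np.logaddexp(0.0, theta) - x * theta)


def logistic_pca(x, k, m=4.0, n_iter=200):
    """Minimal logistic PCA (rank-k projection form) fitted by majorization-minimization.

    x : (n, d) 0/1 matrix, e.g. organizations x yes/no capacity measures
    k : number of components
    m : scale of the saturated natural parameters theta_tilde = m * (2x - 1)
    Simplification: main effects mu are fixed at the logit of the column means.
    Returns (scores, loadings, fraction of deviance explained vs. the mu-only model).
    """
    x = np.asarray(x, dtype=float)
    p_bar = np.clip(x.mean(axis=0), 1e-3, 1 - 1e-3)
    mu = np.log(p_bar / (1.0 - p_bar))
    a = m * (2.0 * x - 1.0) - mu                        # centered saturated parameters
    u = np.linalg.svd(a, full_matrices=False)[2][:k].T  # start from ordinary PCA
    for _ in range(n_iter):
        theta = mu + a @ u @ u.T
        # Quadratic majorizer of the deviance (uses sigma * (1 - sigma) <= 1/4)
        z = theta + 4.0 * (x - _sigmoid(theta))
        b = z - mu
        # Minimizing ||a U U^T - b||_F over orthonormal U: top-k eigenvectors of this matrix
        mat = a.T @ b + b.T @ a - a.T @ a
        u = np.linalg.eigh(mat)[1][:, -k:]
    theta = mu + a @ u @ u.T
    null_dev = bernoulli_deviance(x, np.broadcast_to(mu, x.shape))
    explained = 1.0 - bernoulli_deviance(x, theta) / null_dev
    return a @ u, u, explained


# Hypothetical random binary responses (not survey data).
rng = np.random.default_rng(1)
x_demo = (rng.random((40, 6)) < 0.4).astype(float)
scores, loadings, dev_expl = logistic_pca(x_demo, k=2)
print(scores.shape, round(dev_expl, 3))
```

Comparing `explained` across methods, or across numbers of components, is the kind of criterion a table of explained deviance summarizes. Each MM step cannot increase the deviance, but the non-convex fit may stop at a local optimum. Avoiding that problem is the motivation for the convex relaxation over the Fantope.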

Official Statistics Relative Capacity Indicator (OSRCI)
Using convex logistic PCA scores based on the reported measures, the OSRCI was designed to assess public sector organizations' competence in producing official statistics, both overall and for the sub-dimensions presented in Table 3. An organization's OSRCI indicates how it compares with the most active organization on the particular measure. As an illustration, Tables 4 and 5, based on a net sample of 171 OUs, list the ten federal government (FG) and provincial government (PG) departments with the highest OSRCI. Among all FG public sector organizations, the State Bank of Pakistan (SBP) had the highest official statistics capacity score (OSRCI = 100), while the Bureau of Statistics Punjab ranked first among all provincial public sector organizations, also with an OSRCI of 100. These organizations also score 100 in each of the three sub-dimensions, indicating that they are effectively meeting the POS standards. PPARC, which had an OSRCI of 87.1, had RCIs of 100, 82.3, and 75.8 for Sub-D1, Sub-D2, and Sub-D3, respectively. The ACO division of the Pakistan Bureau of Statistics (PBS) had an OSRCI of 64.6 relative to SBP. Its Sub-D1 RCI of 100 shows that this wing's methods for gathering and disseminating data are on par with those used by SBP. However, its Sub-D2 RCI based on the reported measures was very low (RCI = 5.2), indicating that the wing's data sharing with other departments (statistical or non-statistical) is insufficient.
The low RCI for Sub-D2 implies that the key Sub-D2 metrics of the organization require improvement or capacity growth.
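Equation (1) is not reproduced in this section. Consistent with the description above, in which the most active organization on a measure receives an RCI of 100, the Python sketch below assumes a min-max rescaling of capacity scores to a 0 to 100 range. This is an assumed form, not necessarily Equation (1), and the organization names and scores are hypothetical.

```python
def relative_capacity_indicator(scores, flip_sign=False):
    """Rescale capacity scores to a 0-100 relative capacity indicator (RCI).

    ASSUMED form (not necessarily Equation (1)):
        RCI = 100 * (s - min(s)) / (max(s) - min(s))
    so the highest-scoring organization receives 100 and the lowest 0.
    Principal component signs are arbitrary, so set flip_sign=True when
    larger raw scores indicate lower capacity.
    """
    s = {org: (-v if flip_sign else v) for org, v in scores.items()}
    lo, hi = min(s.values()), max(s.values())
    if hi == lo:
        raise ValueError("all scores are equal; the RCI is undefined")
    return {org: 100.0 * (v - lo) / (hi - lo) for org, v in s.items()}


# Hypothetical first-component scores for four organizational units.
scores = {"OU-A": 2.4, "OU-B": 1.1, "OU-C": -0.8, "OU-D": 0.3}
ranked = sorted(relative_capacity_indicator(scores).items(), key=lambda kv: -kv[1])
for org, rci in ranked:
    print(f"{org}: {rci:.1f}")
```

If Equation (1) instead divides by the maximum score only, the lowest organization would not be pinned at 0; the ranking of organizations is the same under either form.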
As shown in Table 5, the PG OUs with low Sub-D2 scores also have low OSRCI ratings. These include the Department of Literacy and Non-Formal Basic Education, the Faisalabad Institute of Cardiology, the Excise Taxation Department, and the Narcotics Control Department. Both FG and PG agencies are weak in Sub-D2, which suggests that OUs are not sharing data effectively with other OUs, whether statistical or non-statistical.
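Comparing sub-dimension RCIs, as done above to identify the Sub-D2 weakness, can be automated for every OU. In this Python sketch the RCIs are those reported in the text for PPARC and the PBS ACO wing (the ACO wing's Sub-D3 RCI is not reported here and is therefore omitted), while the cut-off of 50 is a hypothetical threshold chosen only for illustration.

```python
def weak_subdimensions(sub_rci, threshold=50.0):
    """Return, per organization, sub-dimensions with RCI below `threshold`, weakest first.

    Organizations with no sub-dimension below the threshold are omitted.
    """
    flagged = {}
    for org, dims in sub_rci.items():
        weak = sorted((rci, dim) for dim, rci in dims.items() if rci < threshold)
        if weak:
            flagged[org] = [dim for _, dim in weak]
    return flagged


# Sub-dimension RCIs reported in the text; ACO's Sub-D3 RCI is not given here.
sub_rci = {
    "PPARC": {"Sub-D1": 100.0, "Sub-D2": 82.3, "Sub-D3": 75.8},
    "PBS ACO": {"Sub-D1": 100.0, "Sub-D2": 5.2},
}
print(weak_subdimensions(sub_rci))  # {'PBS ACO': ['Sub-D2']}
```

A list of this kind could serve as a starting point for targeting capacity-building programs at each OU's weakest sub-dimensions.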

Big Data Processing Relative Capacity Indicator (BDRCI)
The BDRCI measures an organizational unit's overall big data processing capacity by considering all eighteen associated metrics. The RCI for Sub-D1 refers to the 3Vs (volume, variety, and velocity) of an organization's data; a high Sub-D1 RCI is therefore expected for an organization that manages and produces data of high volume, velocity, and variety. The RCI for Sub-D2 reflects the OU's level of big data literacy and comprehension. The Sub-D3 RCI captures the capacity of public sector organizations to work with big data sources, currently or in the future. Finally, the Sub-D4 RCI gives a view of the organizations' big data processing capabilities.
The BDRCI demonstrates the ability of public sector organizations to manage big data sources, their knowledge of how big data works, and whether they have the skills needed to process big data sources. The capacity scores derived from convex logistic PCA were used to build the BDRCI for the top 10 FG and PG OUs, shown in Tables 6 and 7, respectively.
As Table 6 shows, the State Bank of Pakistan again tops the BDRCI list and therefore serves as the yardstick for assessing other FG organizations' big data processing capabilities. PPRCI, which had an OSRCI of 100, has a BDRCI of only 49.5 owing to its weak position on the Sub-D1, Sub-D3, and Sub-D4 measurements. Because of poor performance in Sub-D2 and Sub-D3, it was indexed at RCI = 58.0.
According to the BDRCI for PG organizations (Table 7), the Provincial Disaster Management Authority (PDMA) has the largest big data processing capacity in the Punjab provincial government. PRP and CFM are weak in Sub-D3, with an RCI of 49.8. Tables 6 and 7 reveal that FG OUs are weak in Sub-D3 (big data workings) and PG OUs are weak in Sub-D4. Figure 7 displays violin plots of the RCIs for overall capacity, official statistics capacity, big data processing capacity, and the big data processing sub-capacity dimensions; each violin combines a boxplot with a kernel density estimate. The violin plot for the overall model indicates that FG and PG OUs have an average RCI of 40 with an approximately normal kernel density. The OS capacity violin has a flat kernel density curve around 40. Big data processing capacity averages 50, with less variation. The lowest RCIs, in the OS Sub-D2 module, indicate that data sharing between departments is limited. The Sub-D2 violin in the BD plot highlights the importance and value of big data sources to public sector organizations.
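Each violin in Figure 7 is a mirrored kernel density estimate of the RCIs. The Python sketch below computes a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth, 0.9 · min(sd, IQR/1.34) · n^(-1/5). It assumes a Gaussian kernel (the default kernel of R's density() function); the RCI values are hypothetical, and the settings actually used to draw Figure 7 may differ.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    if spread == 0:
        raise ValueError("values have zero spread; bandwidth is undefined")
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values)
            for g in grid]


# Hypothetical RCIs for ten organizational units (not study results).
rci = [12.0, 25.5, 33.0, 38.7, 40.2, 41.9, 47.3, 55.0, 62.8, 100.0]
grid = list(range(0, 101, 5))
density = gaussian_kde(rci, grid)
for x, d in zip(grid, density):
    print(f"RCI {x:3d}: {d:.4f}")
```

Mirroring these density values about a vertical axis gives the outline of one violin; a wider bandwidth would flatten the curve, which is one reason density shapes should be compared at a common bandwidth.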

Conclusions
The growing importance of, and demand for, high-quality, independent official statistics provides an excellent opportunity to invest in strengthening national statistical systems in a meaningful and long-lasting way. To meet the modern data requirements of effective government administration, national statistical institutes are producing an increasing range of official statistics from both conventional and contemporary sources, including through the use of artificial intelligence. It is even more crucial to strengthen the NSS at the ground level by bringing all data producers, whether statistical or non-statistical institutions, into the system; in other words, to build the capacity of all public sector data producers in their grey areas. UN agencies are also working to strengthen their statistical capacity to produce and use data for better policy formulation and for approaches to implementing the 2030 development agenda.
In this case, a SWOT analysis of the national statistical system is required to examine potential grey areas before capacity-building programs begin. However, there is no scale that can be used to gauge and rank a nation's public sector organizations according to various criteria. Only the World Bank's statistical capacity index provides an overview of the statistical capacity of around 140 developing nations, and only at the national level. Along the lines of the World Bank's macro-level statistical capacity indicator, we created a micro-level indicator that provides a comprehensive evaluation of the capacity of a nation's official statistics system by taking into account several dimensions related to official statistics and big data processing, including the use of artificial intelligence.
Pakistan launched a national census-based survey (SOS-Pak) to gather data from public sector organizations on several different metrics. One in every three public sector organizations holds large (big) digital administrative data sources; however, only 7% of organizations currently use these big data sources. The survey also found that organizational administrative data are among the public sector's preferred data requirements, owing to their significant potential, applicability, and usability in governing governmental operations. Nevertheless, one-third of Pakistan's public sector organizations are worried about data disclosure controls, and these concerns account for 25% of the data they produce or record remaining in storage rather than being used for national statistics. This points to a general need for these organizations to increase their capacity in advanced statistical methods and data disclosure control. About half of public sector organizations plan to work with big data sources in the future, yet roughly three out of every four lack the IT infrastructure and human resources necessary to manage contemporary data sources. A statistical knowledge gap among current IT personnel also raises concerns about whether digital data sources can be put to their best use. Responding organizations also stated their training preferences for both structured and unstructured databases. The study's findings further point to an urgent need for public sector organizations to work together on data-related projects. The use of big data sources and the production of quality official statistics are hindered by several issues, including insufficient awareness of big data, technological limitations, limited resources, and inadequate skills in advanced statistical and data processing techniques.