An Overview of Economics and Econometrics Related R Packages

Michelaki, Despina; Tsagris, Michail; Adam, Christos

doi:10.3390/stats8040085

Open Access Review

by Despina Michelaki, Michail Tsagris * and Christos Adam

Department of Economics, University of Crete, 74100 Rethymnon, Greece

* Author to whom correspondence should be addressed.

Stats 2025, 8(4), 85; https://doi.org/10.3390/stats8040085

Submission received: 8 August 2025 / Revised: 16 September 2025 / Accepted: 22 September 2025 / Published: 26 September 2025

Abstract

This study provides a systematic overview of 207 econometrics-related R packages identified through CRAN and the Econometrics Task View. Using descriptive and inferential statistics and text mining to compute the word frequency and association among words (n-grams and correlations), we evaluate the development patterns, documentation practices, publication outcomes, and methodological scope. The findings reveal that most packages are created by small-to-mid-sized teams in Europe and North America, with mid-sized collaborations and packages including vignettes being significantly more likely to achieve journal publication. While reverse dependencies indicate strong ecosystem integration, they do not predict publication, and Bayesian or dataset-only packages remain underrepresented. Growth has accelerated since 2010, but newer packages exhibit fewer updates, raising concerns about sustainability. These findings highlight both the central role of R in contemporary econometrics and the need for broader participation, methodological diversity, and long-term maintenance.

Keywords:

economics; econometrics; R statistical software

1. Introduction

As Ragnar Frisch explained in the first editorial note of Econometrica (1933), ‘econometrics is the unification of statistics, economic theory, and mathematics’. A modern rephrasing could be, ‘Application of statistical and mathematical methods to economic data, guided by economic theory’. Each perspective is necessary but insufficient for understanding quantitative relationships in modern economic life. Econometrics aims to give empirical content to economic relationships. The three key ingredients are economic theory, economic data, and statistical methods. Neither ‘theory without measurement’ nor ‘measurement without theory’ is sufficient for explaining economic phenomena. It is, as Frisch emphasized, their union that is the key to success in the future development of econometrics [1].

Defining R precisely is not straightforward. R is both a software environment for quantitative analysis and a programming language oriented toward data science. It evolved from the S language, originally developed at Bell Laboratories for quantitative analysis. The most appropriate description seems to be that R is an environment, built around a programming language, that focuses on the processing and statistical analysis of data and offers extensive visualization capabilities. Important features of the R environment are its flexibility and the possibility of constantly expanding its applications. This is most evident in the large number of additional packages available in the official repository.

Previous systematic reviews of R packages have focused on fields unrelated to the present study. One review focused on the application of exploratory factor analysis (EFA), examining the implementation and methodological aspects of this statistical technique within R [2]. Another review pertained to sports statistics and sports analytics, evaluating R packages designed for analyzing and interpreting data in the context of sports performance and related metrics [3].

Despite R’s increasing adoption in econometrics, systematic evaluation of econometrics-related R packages remains largely unexplored. This paper addresses that gap by offering a structured overview of 207 R packages relevant to economic and econometric analysis. (A list of the packages with information is provided as electronic Supplementary Material.) The objective is to classify, summarize, and analyze these packages with respect to their statistical features, metadata, and publication profiles. Moreover, we assess trends in package development, identify methodological concentrations (e.g., Bayesian vs. frequentist), and explore regional contributions to the software ecosystem. To assess development trends, documentation, and publication patterns, we utilize text mining techniques. This overview is, to the best of our knowledge, the first attempt to delineate the packages dedicated to the field of economics in general. By linking the methodological scope to documentation and dissemination practices, the study provides new empirical insights into the structure and reproducibility of econometric software, highlighting both the strengths and challenges of R’s role in modern econometric research.

The remainder of this paper is structured as follows: Section 2 outlines the methodology used to identify and select relevant R packages from CRAN and the Econometrics Task View. Section 3 presents the results of the statistical analyses, including descriptive summaries and inferential relationships. Its subsections investigate author profiles, update behaviors, documentation practices, and citation networks. Section 3.11, specifically, highlights key findings from a text mining analysis of the package descriptions. Finally, Section 4 concludes with a summary of this paper’s contributions and potential directions for future work.

2. Methods

To systematically identify econometrics-related R packages, a structured and reproducible search strategy was employed. The goal was to compile a comprehensive set of packages that are actively used in econometric analysis, ensuring relevance across multiple subfields such as time series analysis, panel data modeling, causal inference, regression methods, Bayesian econometrics, and data collection through web scraping.

The search began with a CRAN keyword query using ‘econom’ to identify packages explicitly linked to econometrics. This approach ensured the inclusion of packages whose titles, descriptions, or metadata reference econometric applications. Next, the scope was expanded to include additional relevant areas, incorporating packages that contribute to econometric modeling, statistical inference, and computational methods widely used in the discipline. Packages covering time series, panel data, causal inference, regression, Bayesian econometrics, or web scraping were included. A total of 64 packages were included in this process.

To further refine the selection, the CRAN Task View on Econometrics was reviewed. All packages listed under this category were included in the dataset, except for one package that was deemed irrelevant to econometrics. The Task View provided a curated collection of tools recommended by domain experts, ensuring that the dataset covered key methodologies and widely used econometric techniques. After this process, 145 packages were also included.

The search was conducted between 1 and 31 January 2025. A total of 209 packages were identified; however, 2 packages were excluded from the analysis. The first, lotterybr, was omitted as it contains data related to Brazilian lottery games, focusing on the exploration of their dynamics and outcomes. The second, xts, was excluded due to its primary function of facilitating the uniform handling of R’s various time-based data classes by extending the zoo package. This extension preserves native format information, enables user-level customization and extension, and enhances cross-class interoperability. Following these exclusions, a total of 207 packages remained for further analysis. This process is visually depicted in Figure 1, illustrating the systematic approach adopted for this overview.
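The selection logic described above (a keyword match on package metadata, a union with the Task View list, and the removal of two packages) can be sketched in Python. This is only an illustration under stated assumptions: the record layout, the helper names `matches_keyword` and `select_packages`, and the toy records are hypothetical and are not the authors' actual workflow. Only the stem 'econom', the two excluded package names, and the counts come from the text.

```python
# Minimal sketch of the selection logic; record layout and helper names are hypothetical.

def matches_keyword(record, stem="econom"):
    """Return True if the stem occurs in the title or description (case-insensitive)."""
    text = (record["title"] + " " + record["description"]).lower()
    return stem in text

def select_packages(keyword_hits, task_view, excluded):
    """Union of the two sources, minus the excluded names, sorted for reproducibility."""
    return sorted((set(keyword_hits) | set(task_view)) - set(excluded))

# Toy records (invented for illustration, not real CRAN metadata).
records = [
    {"name": "pkgA", "title": "Econometric Tools", "description": "Panel models."},
    {"name": "pkgB", "title": "Plot Helpers", "description": "Colour palettes."},
    {"name": "pkgC", "title": "Macro Data", "description": "Economic indicators."},
]
keyword_hits = [r["name"] for r in records if matches_keyword(r)]

# Counts reported in the text: 64 + 145 = 209 identified, 2 excluded, 207 analyzed.
identified = 64 + 145
analyzed = identified - len({"lotterybr", "xts"})
```

Matching on a stem rather than a full word lets a single query cover 'economics', 'econometric', and 'economy'.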

Once the relevant packages were identified, detailed metadata were recorded for each package to facilitate subsequent analysis. The extracted metadata included package name, first release year, creator country, number of authors, number of updates, availability of vignettes, presence of built-in datasets, and references in journal and book publications. Additionally, to assess the broader impact and usage of each package, reverse dependencies (imports, suggests, enhances) were documented, along with whether the package was specifically designed for Bayesian analysis or web scraping. Lastly, the gender of the package creator was noted where possible, allowing for demographic insights into the development community.

By implementing this structured search strategy, a total of 207 R packages were identified and systematically analyzed. This methodology ensured that the selection was both comprehensive and relevant, providing a solid foundation for evaluation of the role of R in contemporary econometric research.

3. Statistical Analysis

3.1. Main Characteristics

Table 1 summarizes the main characteristics of the collected dataset. In terms of time period, most packages were produced between 2009 and 2018 (46%), with fewer being produced in recent years (2019–2024) (28%) and even fewer between 1999 and 2008 (26%). Most creators are based in Europe (56%) and North America (27%), with less representation from other continents.

Most packages were created by small teams; 56% had one or two authors, while only 4% had more than ten. Update frequency followed a similar trend: 54% of packages received 1–10 updates, while 9% underwent extensive revisions (51–216 updates). In terms of content and documentation, 68% of the packages contained data, while 46% included vignettes, indicating a moderate level of user support and reproducibility. Regarding publication types, about 45% were linked to journals, while only 19% were associated with books. Thus, packages are more often disseminated through journal articles than through books.

Most packages (93%) had at least one reverse dependency, i.e., they were imported, suggested, or enhanced by other packages. Furthermore, 94% of the creators in the dataset were male. Only 7% of the packages were ‘dataset-only’, the same percentage used Bayesian methods, and 8% involved web scraping. Finally, 73% of packages were categorized under the CRAN Task View for Econometrics, reflecting a strong focus on econometric applications.

3.2. Journal Publication and Author Group

Of the 94 published packages, 46 (48.9%) were created by teams of three to ten authors, compared to only 36 (31.9%) of the 113 unpublished packages, as indicated in Table 2. Packages created by smaller groups of one or two authors were more likely to remain unpublished (73, 64.6%) than to be published (44, 46.8%). Only 8 packages, evenly divided between published and unpublished, were created by large teams of 11–26 authors.

The $\chi^{2}$ test of independence was used to assess the relationship between author group size and publication status. The test indicated a significant association (p-value = 0.035) between team size and the likelihood of journal publication. Notably, mid-sized collaborations (3–10 authors) were more likely to achieve academic publication than smaller or larger groups.
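As a check on the reported result, the test statistic can be recomputed from the counts quoted above (published: 44, 46, 4; unpublished: 73, 36, 4). The sketch below is plain Python and computes the Pearson statistic; the function name is ours. With two degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Rows: published, unpublished. Columns: 1-2, 3-10, 11-26 authors.
team_size = [[44, 46, 4],
             [73, 36, 4]]
stat, dof = pearson_chi2(team_size)

# For dof == 2 the chi-square upper-tail probability is exactly exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```

The statistic comes out at about 6.72 on 2 degrees of freedom, which gives a p-value of about 0.035, in line with the value reported above.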

This suggests that group size reflects not only the scope of collaboration but also the professional networks and institutional support that facilitate publication in peer-reviewed journals. In the context of computational econometrics, collaborative teams may benefit from the division of labor in package development, increased methodological rigor, and improved documentation—factors that align with academic standards emphasizing transparency, replicability, and the collective vetting of analytical tools [4,5]. Thus, author team composition may serve as an indirect indicator of both the scholarly orientation and the publication potential of econometrics-related R packages.

3.3. Vignettes and Journal Publication

An analysis of vignette inclusion and journal publication revealed notable differences among the econometrics R packages. As shown in Table 3, of the 94 packages with a peer-reviewed journal publication, 59 (62.8%) included a vignette. By contrast, only 37 of the 113 unpublished packages (32.7%) included a vignette. This suggests that vignettes—extended documentation or tutorials that improve reproducibility, usability, and adoption—are more common in packages with academic support [6].

The $\chi^{2}$ test revealed a significant association between publication status and vignette inclusion (p-value < 0.001). This suggests that academic standards, which emphasize transparency and reproducibility in quantitative research, as well as sound software engineering practices, encourage the creation of vignettes [4,5].

In econometrics, vignettes signal methodological robustness, as reproducibility of empirical models and procedures is essential for both validation and extension. Packages with vignettes are therefore more likely to be published, because they align with the evolving norms in computational social sciences, which value openness and replicability [7].

3.4. Reverse Imports/Suggests/Enhances and Journal Publication

Of the 94 published packages, 90 (95.7%) had reverse dependencies, while only 4 (4.3%) did not (Table 4). Similarly, 103 of the 113 unpublished packages (91.2%) had at least one reverse dependency, while 10 (8.8%) had none. Although published packages showed a slightly higher proportion of reverse dependencies, the difference was small.

No significant association was found between being referenced by other packages and journal publication (p-value = 0.300). This indicates that network-based popularity or reuse, measured through reverse dependencies, does not predict whether an econometrics R package is published in an academic journal. This may suggest that scholarly publication and community participation follow different logics; one is academically focused on peer review, and the other is practically driven by utility and reuse within the R ecosystem [8]. Although reverse dependencies often indicate a package’s importance within a software ecosystem, their lack of association with publication suggests differing incentives and audiences in academic and developer communities.

3.5. Reverse Imports/Suggests/Enhances and Updates

The ecosystem of econometrics-related packages shows a notable pattern of interconnectedness: packages with reverse imports or suggestions tend to form new dependencies and undergo more frequent updates. As shown in Table 5, of the 195 packages with at least one reverse dependency, 100 (51.3%) had 1–10 updates, 75 (38.5%) had 11–50 updates, and 18 (9.2%) had 51–216 updates. In contrast, among the 14 packages without reverse dependencies, 12 (85.7%) had only 1–10 updates, and none had more than 50.

A significant association was found (p-value = 0.045), indicating that update behavior is related to a package’s degree of integration within the R ecosystem. This suggests that packages with reverse dependencies are updated more frequently, likely because they face higher user demand, greater visibility, or pressure to maintain backward compatibility. In econometric software, where reliable computation is essential, closely connected packages often serve as infrastructure for other tools. The tendency to update these packages more often aligns with software engineering best practices and reflects community dynamics where release schedules are shaped by user feedback and inter-package dependencies [4,5,9].

3.6. Continent of Creator and Year of Creation

Temporal and geographic patterns reveal how contributions to econometrics R packages have evolved over time. According to Table 6, the first packages (1999–2008) were primarily developed in Europe (29) and North America (19), with minimal contributions from other continents. Between 2009 and 2018, European contributions rose sharply (55), and geographic diversity expanded, with notable input from Asia (7), Oceania (4), and South America (8). In 2019–2024, Europe continued to dominate (33), while contributions from other continents remained comparatively low.

No significant association was found between release period and geographic origin (p-value = 0.322). This suggests that, although Europe has remained the dominant contributor, the relative proportions across regions have not changed significantly over time. The raw counts are nonetheless consistent with a gradual globalization of open-source development, especially in more technical disciplines such as econometrics. While Europe and North America remain centers for statistical software production, contributions from Asia and South America point to growing inclusion in the R ecosystem. As open-source channels lower barriers to participation, the geographic diversity of software development may increase further with the broader democratization of scientific computing.

3.7. Year of Creation and Updates

Analysis of the relationship between release period and update frequency revealed a notable pattern in software maintenance over time. As shown in Table 7, packages released between 1999 and 2008 received more updates, with 36 (58.1%) in the moderate range (11–50) and 14 (22.6%) in the high range (51–216). By contrast, most packages from 2009–2018 (n = 64, 66.7%) had only 1–10 updates, with fewer showing moderate (28, 29.2%) or high (4, 4.2%) numbers of revisions. Recent packages (2019–2024) showed an even stronger tendency toward infrequent updates: 44 of 57 (77.2%) had only minimal maintenance, and none had more than 50 updates.

There is a strong association between package age and maintenance intensity (p-value < 0.001). Older packages tend to receive more consistent long-term updates, likely due to established user bases, integration into workflows, and sustained feedback. This trend reflects the general principles of scientific software lifecycles, where long-term maintenance, community effort, and adaptability determine longevity and impact [10,11]. By contrast, newer packages may be in early adoption phases or designed for niche applications, and thus show limited post-release activity. This raises questions about the long-term sustainability of newer software projects and the institutional incentives for maintaining research code [12,13].

3.8. Continent of Creator and Bayesian Analysis

The relationship between the inclusion of Bayesian analysis features and the geographic location of package authors was examined, and only slight regional differences were found. According to Table 8, Bayesian techniques appeared in only 15 packages across all regions. In North America, 3 of 55 packages (5.5%) included Bayesian methods, compared to 10 of 117 in Europe (8.5%). Asia, Oceania, and South America contributed at most one package each with Bayesian features.

The regional disparities were not statistically significant (p-value = 0.845), indicating no relationship between the inclusion of Bayesian features and the creator’s region. This suggests that, while Bayesian analysis remains marginal in econometric R software, it is not concentrated in any particular region.

The slow adoption of Bayesian methods in econometrics may reflect a persistent disciplinary bias toward the frequentist approach or perceptions of Bayesian inference as complex and computationally demanding [14,15]. However, as computational resources expand and Bayesian methods become more common in empirical studies, future research could revisit this issue to track regional adoption trends.

3.9. Dataset Availability and Journal Publication

A contingency table was used to assess whether dataset-only econometrics R packages (those without analytical functions or methodological implementations) were more or less likely to be published in academic journals. As shown in Table 9, of the 14 dataset-only packages, only 2 (14.3%) were published, while 12 (85.7%) were not. In contrast, of the 193 packages with analytical functions or models, 92 (47.7%) had associated journal publications.

A statistically significant association was found between dataset-only status and publication (p-value = 0.032). These results indicate that dataset-only packages are significantly less likely to be published in peer-reviewed journals. This reflects academic norms in econometrics, where methodological innovation, algorithmic development, or model implementation is typically required for publication [16,17]. While dataset packages may serve important pedagogical or empirical roles, they often do not meet the innovation threshold required by scholarly journals. This aligns with broader trends in computational social science, where publication incentives prioritize methodological contributions over data curation, despite the recognized value of open data [18].
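The reported p-value can be approximately reproduced from the counts given in the text. The sketch below assumes a chi-square test with Yates' continuity correction, which is the default for 2 × 2 tables in R's `chisq.test`; the source does not state which variant was used, so this choice is an assumption, and the function name is ours. For one degree of freedom the upper-tail probability equals erfc(sqrt(x/2)), which the Python standard library provides.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: dataset-only, other packages. Columns: published, unpublished.
stat, p_value = chi2_yates_2x2(2, 12, 92, 101)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Under this assumption the statistic is about 4.60 and the p-value about 0.032, matching the value above. Applying the same function to the vignette counts of Section 3.3 (59, 35, 37, 76) gives a p-value well below 0.001.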

3.10. Growth of Packages

Figure 2 presents two cumulative line plots that collectively illustrate the temporal and geographic evolution of econometrics-related packages from 2000 to 2025. These visualizations provide insight into both the overall growth of the field and the global distribution of its development. This pattern mirrors broader trends in international scientific collaboration and knowledge production, where once-dominant regions are being increasingly joined by emerging contributors across the Global South [19,20].

Figure 2a presents the cumulative number of packages created over time. The horizontal axis provides information about the year of creation while the vertical axis provides the cumulative count. The curve is approximately exponential: growth began in the early 2000s, accelerated after 2010, and continued steeply into the 2020s. This indicates that the ecosystem of econometric tools in R is maturing and rapidly expanding. Also, there is a growing dependence on R as a key computational platform in academic and applied econometrics [21,22]. The sharp growth demonstrates how R has become an essential platform for the distribution of econometric tools within the open-source community.
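A cumulative curve of this kind is a running sum of yearly release counts. The following Python sketch shows the computation on invented release years; the data and the function name are hypothetical and do not reproduce Figure 2.

```python
from collections import Counter
from itertools import accumulate

def cumulative_counts(release_years):
    """Return (year, cumulative number of packages) pairs, including years with no release."""
    per_year = Counter(release_years)
    years = list(range(min(release_years), max(release_years) + 1))
    totals = accumulate(per_year.get(y, 0) for y in years)
    return list(zip(years, totals))

# Invented first-release years for six hypothetical packages.
curve = cumulative_counts([2001, 2003, 2003, 2004, 2006, 2006])
```

Years without a release are kept in the output so that the curve stays flat there instead of skipping ahead on the horizontal axis.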

Figure 2b disaggregates the cumulative count by continent—Asia, Europe, North America, Oceania, and South America—revealing important regional dynamics in the development of econometrics packages. North America and Europe dominate the landscape, contributing the largest number of packages and showing strong growth, particularly in the post-2010 period. Asia demonstrates a growing presence, with a steady rise in contributions pointing to expanding research and development capacity in econometrics. Although Oceania and South America contributed fewer packages overall, both regions show steady growth, reflecting the global diffusion of econometric expertise and R development.

Together, the two panels suggest that econometric practice is becoming globalized through open-source development. This development is led by a few dominant regions, but is becoming increasingly inclusive over time. The steep overall trajectory in the left panel is primarily driven by North America and Europe, as shown in the right panel, but the emergence of contributions from Asia and other regions signifies a broadening base of participation.

This combination implies two key outcomes:

Democratization of Econometric Tools: The open-source nature of R enables researchers and practitioners worldwide to contribute and access sophisticated econometric methods, reducing traditional barriers associated with proprietary software [23,24].

Decentralization of Innovation: While historical hubs remain influential, the growth in contributions from underrepresented regions reflects a shift toward a more decentralized model of methodological innovation and software dissemination in the econometrics community [19].

Thus, Figure 2 illustrates not only the growth of R packages but also the changing geography of knowledge production in quantitative economics, enabled by the open-source ethos and collaborative structures of the R ecosystem.

3.11. Text Analysis of Descriptions

Textual analysis of the descriptions of these 207 R packages was performed. The descriptions were pre-processed: text was converted to lowercase, and punctuation, stop words, non-informative words, hyperlinks, numbers, and references were removed. The words were lemmatized using the TreeTagger tool [25,26] via the package [27]. The purpose of lemmatization is to reduce words to their base or dictionary form, producing valid words that facilitate the organization and analysis of the text. TreeTagger is widely used, and its tagging model is based on decision trees. Wordclouds of the most frequent words and phrases were then produced [28]. The most frequent phrases were extracted with the phm R package [29]. The phm method takes a corpus of text, splits it into blocks at punctuation, and extracts and sorts the unique n-grams of words, subject to a list of words that a phrase may not start or end with (such as stop words), a list of phrases to exclude, a minimum frequency threshold, and the avoidance of overlapping phrases. The output is a list of the unique most frequent phrases.
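The pre-processing and phrase-counting steps described above can be illustrated with a small standard-library Python sketch. It is not the phm or TreeTagger implementation: lemmatization and the overlap-avoidance step are omitted, the stop-word list is a tiny illustrative set, the toy descriptions are invented, and all function names are our own.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}  # illustrative only

def split_blocks(text):
    """Lowercase, drop hyperlinks and numbers, then split into word blocks at punctuation."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"\d+", " ", text)
    blocks = re.split(r"[.,;:!?()\[\]]+", text)
    return [re.findall(r"[a-z]+", block) for block in blocks]

def frequent_phrases(descriptions, n=2, min_freq=2):
    """Count n-grams within blocks that neither start nor end with a stop word."""
    counts = Counter()
    for text in descriptions:
        for words in split_blocks(text):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep only phrases that reach the minimum frequency, most frequent first.
    return [(p, c) for p, c in counts.most_common() if c >= min_freq]

# Toy descriptions (invented for illustration).
docs = [
    "Estimation of panel data models, see https://example.org for details.",
    "Tests for panel data with 2 fixed effects.",
    "Time series tools; panel data utilities.",
]
phrases = frequent_phrases(docs)
```

Splitting at punctuation before forming n-grams prevents phrases from spanning clause boundaries, which is why 'tools panel' is never counted in the third toy description.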

The wordclouds produced from the above procedures are illustrated in Figure 3, and the frequencies of the most frequent 20 words and phrases are shown in Table 10 and Table 11, respectively. According to Figure 3a and Table 10, the main purposes of the packages found at the word level are estimation and fitting (estimate, likelihood, fit), regression models (linear, probit, logit), inference and testing (test, statistic, inference), and data management (data, variable, testing). Regarding Figure 3b and Table 11, the main categories at phrase level are time series (time series, impulse response), classical estimation (maximum likelihood estimation, least square, two step), linear and generalized regression (linear regression, generalized linear, beta regression, quantile regression), panel analysis (panel data, fixed effect, random effect), causal inference and policy evaluation (instrumental variable, synthetic control, treatment effect), and system/simultaneous equations (discrete choice, multinomial probit).

4. Conclusions

This overview provides a comprehensive analysis of 207 econometrics-related R packages, offering new insights into the structure, development patterns, and dissemination of econometric tools within the R ecosystem. By integrating descriptive statistics, inferential analyses, and text mining techniques, this study contributes to a clearer understanding of the current state of computational econometrics in R.

The results show that most packages are developed by small-to-mid-sized teams in Europe and North America, with minimal but growing contributions from Asia and South America. Mid-sized teams (3–10 authors) have the highest likelihood of journal publication. Bayesian methods and dataset-only packages remain underrepresented, and newer packages receive fewer updates, raising sustainability concerns.

Several key findings have emerged. First, the majority of packages are created by small-to-medium-sized teams, with those developed by mid-sized groups being more likely to appear in academic journals. Second, packages with thorough documentation, especially those including vignettes, are significantly more likely to be published, underscoring the role of reproducibility and usability in scholarly dissemination. Third, although the presence of reverse dependencies is high across packages, this network connectivity does not predict publication, suggesting divergent academic and practical incentives. Finally, while most packages originate from Europe and North America, there is a growing, though still modest, contribution from other regions, indicating an ongoing globalization of econometric software development.

The text mining analysis further revealed that R packages cover a broad range of econometric techniques, including time series, panel data, causal inference, and estimation procedures, confirming the versatility and methodological richness of the R ecosystem. However, Bayesian methods and dataset-only packages remain underrepresented, and recent packages tend to be updated less frequently, raising questions about sustainability and support for newer tools. Overall, the findings imply that, while R has become central to econometric research, its ecosystem reflects both strong regional concentration and methodological biases, with broader participation and long-term maintenance emerging as key challenges for future development.

Future research can build on this work in several ways: First, a deeper functional benchmarking of package performance beyond metadata and description analysis would enhance practical decision making for users. Second, more attention could be given to the role of community dynamics and user feedback in shaping the development of packages. Finally, extending the overview to include GitHub-hosted packages and those not yet available on CRAN would provide a fuller picture of innovation and experimentation in computational econometrics.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/stats8040085/s1.

Author Contributions

Conceptualization: M.T.; methodology: D.M.; formal analysis: D.M. and C.A.; investigation: D.M.; validation: M.T.; writing (original draft preparation): D.M.; writing (review and editing): D.M., C.A. and M.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Frisch, R. Editorial. Econometrica 1933, 1, 1–4. [Google Scholar]
Govindasamy, P.; Isa, N.J.M.; Mohamed, N.F.; Noor, A.M.; Ma, L.; Olmos, A.; Green, K. A systematic review of exploratory factor analysis packages in R software. Wiley Interdiscip. Rev. Comput. Stat. 2024, 16, e1630. [Google Scholar]
Casals, M.; Fernández, J.; Martínez, V.; Lopez, M.; Langohr, K.; Cortés, J. A systematic review of sport-related packages within the R CRAN repository. Int. J. Sport. Sci. Coach. 2023, 18, 621–629. [Google Scholar] [CrossRef]
Gentzkow, M.; Shapiro, J.M. Code and Data for the Social Sciences: A Practitioner’s Guide. J. Econ. Perspect. 2014, 28, 191–206. [Google Scholar]
Marwick, B.; Boettiger, C.; Mullen, L. Packaging Data Analytical Work Reproducibly Using R (and Friends). Am. Stat. 2018, 72, 80–88. [Google Scholar] [CrossRef]
Stodden, V. Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science. Comput. Sci. Eng. 2010, 12, 8–12. [Google Scholar]
Peng, R.D. Reproducible Research in Computational Science. Science 2011, 334, 1226–1227. [Google Scholar] [CrossRef] [PubMed]
Eghbal, N. Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure; Ford Foundation: New York, NY, USA, 2016. [Google Scholar]
Bartoń, K. Package ‘MuMIn’: Multi-Model Inference. R Package Version 1.47.1. 2023. Available online: https://cran.r-project.org/web/packages/MuMIn/ (accessed on 7 August 2025).
Hinsen, K. Dealing with software collapse in computational science: The need for software engineering education and training. Comput. Sci. Eng. 2019, 21, 104–109.
Wilson, G.; Aruliah, D.A.; Brown, C.T.; Hong, N.P.C.; Davis, M.; Guy, R.T.; Haddock, S.H.D.; Huff, K.D.; Mitchell, I.M.; Plumbley, M.D.; et al. Best Practices for Scientific Computing. PLoS Biol. 2014, 12, e1001745.
Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 2016, 533, 452–454.
Katz, D.S.; Hong, N.P.C.; Howison, J.; Löffler, F.; Hwang, L.; Crick, T.; Turk, M. Recognizing the Value of Software: A Software Citation Guide. F1000Research 2021, 9, 1257.
Koop, G. Bayesian Econometrics; Wiley: Hoboken, NJ, USA, 2003.
Rossi, P.E.; Allenby, G.M.; McCulloch, R. Bayesian Statistics and Marketing; John Wiley & Sons: Hoboken, NJ, USA, 2005.
Zeileis, A.; Koenker, R. Econometrics in R: Past, present, and future. J. Stat. Softw. 2009, 27, 1–5.
Leamer, E.E. Let’s Take the Con out of Econometrics. Am. Econ. Rev. 1983, 73, 31–43.
Tenopir, C.; Allard, S.; Douglass, K.; Aydinoglu, A.; Wu, L.; Read, E.; Manoff, M.; Frame, M. Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 2011, 6, e21101.
Chen, K.; Zhang, Y.; Fu, X. International research collaboration: An emerging domain of innovation studies? Res. Policy 2019, 48, 149–168.
Wagner, C.S.; Leydesdorff, L. Network structure, self-organization, and the growth of international collaboration in science. Res. Policy 2005, 34, 1608–1618.
Croissant, Y.; Millo, G. Panel data econometrics in R: The plm package. J. Stat. Softw. 2008, 27, 1–43.
Zeileis, A.; Kleiber, C.; Jackman, S. Regression models for count data in R. J. Stat. Softw. 2008, 27, 1–25.
Stallman, R.M. Free Software, Free Society: Selected Essays of Richard M. Stallman; GNU Press: Boston, MA, USA, 2002.
Von Hippel, E. Democratizing Innovation; MIT Press: Cambridge, MA, USA, 2005.
Schmid, H. Probabilistic part-of-speech tagging using decision trees. In New Methods in Language Processing; Routledge: Abingdon, UK, 1994; Volume 12, pp. 1–9.
Schmid, H. Improvements in Part-of-Speech Tagging with an Application to German; Springer: Dordrecht, The Netherlands, 1999; pp. 13–25.
Michalke, M. koRpus: Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity. R Package Version 0.13-8. 2021. Available online: https://cran.r-project.org/web/packages/koRpus/index.html (accessed on 7 August 2025).
Fellows, I. wordcloud: Word Clouds. R Package Version 2.6. 2018. Available online: https://cran.r-project.org/web/packages/wordcloud/index.html (accessed on 7 August 2025).
Small, E.; Cabrera, J. Principal phrase mining: An automated method for extracting meaningful phrases from text. Int. J. Comput. Appl. 2025, 47, 84–92.

Figure 1. Flowchart of the selection of the R packages for the overview.

Figure 2. Cumulative growth of R packages.

Figure 3. Most frequent words and phrases in R package descriptions.

Table 1. Description of the collected variables.

Variable	Category	Percentage
Year of creation	1999–2008	26%
	2009–2018	46%
	2019–2024	28%
Continent of creator	Europe	56%
	North America	27%
	South America	5%
	Asia	7%
	Oceania	5%
Number of authors	1–2	56%
	3–10	40%
	11–26	4%
Number of updates	1–10	54%
	11–50	37%
	51–216	9%
Contains data	Yes	68%
	No	32%
Contains vignette	Yes	46%
	No	54%
Journal publication	Yes	45%
	No	55%
Book publication	Yes	19%
	No	81%
Reverse imports/suggests/enhances	Yes	93%
	No	7%
Gender of the creator	Male	94%
	Female	6%
Datasets only	Yes	7%
	No	93%
Bayesian analysis	Yes	7%
	No	93%
Web scraping	Yes	8%
	No	92%
CTV Econometrics	Yes	73%
	No	27%

Table 2. Journal publication by author group.

	Journal Publication
Author Group	No Publication	Publication
1–2	73	44
3–10	36	46
11–26	4	4
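
As a rough check on how the publication share varies across author groups, the counts in Table 2 can be turned into row shares and a Pearson chi-square statistic. The sketch below is illustrative only: the variable and function names are hypothetical, and the choice of an uncorrected Pearson test is an assumption, not necessarily the procedure used in the paper. Note that the 11–26 row has expected counts below 5, so the chi-square approximation is coarse for this table.

```python
import math

# Counts copied from Table 2: (no journal publication, journal publication).
table2 = {
    "1-2": (73, 44),
    "3-10": (36, 46),
    "11-26": (4, 4),
}


def publication_share(counts):
    """Share of packages in a row that have a journal publication."""
    no_pub, pub = counts
    return pub / (no_pub + pub)


def pearson_chi_square(rows):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


shares = {group: publication_share(c) for group, c in table2.items()}
chi2 = pearson_chi_square(list(table2.values()))

# A 3 x 2 table has (3 - 1) * (2 - 1) = 2 degrees of freedom; for 2 degrees
# of freedom the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

for group, share in shares.items():
    print(f"{group}: {share:.1%} with a journal publication")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```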

Table 3. Vignette presence by journal publication.

	Vignette
Journal Publication	No	Yes
No	76	37
Yes	35	59
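
The strength of the association in Table 3 can be summarised with an odds ratio. The following is a minimal sketch under stated assumptions: the Wald interval on the log scale and the 1.96 normal quantile are illustrative choices, and all variable names are hypothetical; the paper may report a different measure or test.

```python
import math

# Counts copied from Table 3.
no_pub_no_vignette, no_pub_vignette = 76, 37
pub_no_vignette, pub_vignette = 35, 59

# Odds of journal publication with a vignette divided by the odds without one.
odds_with_vignette = pub_vignette / no_pub_vignette
odds_without_vignette = pub_no_vignette / no_pub_no_vignette
odds_ratio = odds_with_vignette / odds_without_vignette

# Wald standard error of log(odds ratio): square root of the summed
# reciprocals of the four cell counts.
se_log_or = math.sqrt(
    1 / no_pub_no_vignette
    + 1 / no_pub_vignette
    + 1 / pub_no_vignette
    + 1 / pub_vignette
)

z = 1.96  # approximate 97.5% quantile of the standard normal distribution
ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"odds ratio = {odds_ratio:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 is in line with the association between vignettes and journal publication described in the abstract.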

Table 4. Reverse imports/suggests/enhances by journal publication.

	Reverse Imports/Suggests/Enhances
Journal Publication	No	Yes
No	10	103
Yes	4	90

Table 5. Updates by reverse imports/suggests/enhances.

	Updates
Reverse Imports/Suggests/Enhances	1–10	11–50	51–216
No	12	2	0
Yes	100	75	18

Table 6. Continent of creator by year of creation.

	Continent of Creator
Year of Creation	Asia	Europe	North America	Oceania	South America
1999–2008	2	29	19	4	0
2009–2018	7	55	22	4	8
2019–2024	5	33	14	3	2

Table 7. Updates by year of creation.

	Number of Updates
Year of Creation	1–10	11–50	51–216
1999–2008	4	36	14
2009–2018	64	28	4
2019–2024	44	13	0
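
Because the three creation periods contain different numbers of packages, the counts in Table 7 are easier to compare as row percentages. The sketch below only rearranges the published counts; the names are hypothetical. These shares do not adjust for package age, so newer packages have simply had less time to accumulate updates.

```python
# Counts copied from Table 7; columns are the update bins listed below.
update_bins = ("1-10", "11-50", "51-216")
table7 = {
    "1999-2008": (4, 36, 14),
    "2009-2018": (64, 28, 4),
    "2019-2024": (44, 13, 0),
}


def row_percentages(counts):
    """Convert one row of counts into percentages of the row total."""
    total = sum(counts)
    return [100 * count / total for count in counts]


row_pct = {period: row_percentages(counts) for period, counts in table7.items()}

for period, percentages in row_pct.items():
    cells = ", ".join(
        f"{label}: {pct:.1f}%" for label, pct in zip(update_bins, percentages)
    )
    print(f"{period} -> {cells}")
```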

Table 8. Bayesian analysis by continent of creator.

	Bayesian Analysis
Creator Continent	No	Yes
Europe	107	10
North America	52	3
South America	10	0
Asia	13	1
Oceania	10	1

Table 9. Journal publication by dataset availability.

	Journal Publication
Datasets Only	No	Yes
No	101	92
Yes	12	2

Table 10. Twenty most frequent words.

	Word	Frequency
1	data	100
2	estimation	83
3	regression	78
4	linear	65
5	effect	64
6	test	57
7	base	50
8	variable	50
9	time	45
10	fit	43
11	economic	42
12	estimate	41
13	generalize	36
14	panel	36
15	spatial	30
16	two	30
17	choice	29
18	design	29
19	estimator	29
20	series	29

Table 11. Twenty most frequent phrases.

	Phrase	Frequency
1	time series	21
2	maximum likelihood estimation	16
3	fixed effect	10
4	instrumental variable	10
5	confidence interval	8
6	generalized linear	7
7	linear regression	7
8	panel data	7
9	synthetic control	7
10	cross sectional	6
11	data set	6
12	beta regression	5
13	call sur	5
14	data drive	5
15	generalized additive	5
16	least square	5
17	sample selection	5
18	treatment effect	5
19	two step	5
20	average treatment effect	4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

