# Evaluating Research Trends from Journal Paper Metadata, Considering the Research Publication Latency

## Abstract

## 1. Introduction

- A novel methodology, including the new nsaMK method, that identifies term trends from journal paper metadata while accounting for the inherent time lag between research completion and paper publication;
- A definition of the research publication latency and an empirical formula for the number of prediction steps that the proposed method uses to counteract the effect of the journal review and publication process on research trend evaluation;
- An evaluation of the new nsaMK method in an electronic design automation case study, comparing it with the classical MK trend test. The superiority of nsaMK is confirmed by a 45% reduction in the mean square error of the Sen’s slope estimates and by a 66% increase in correct term trend indications.

## 2. Preliminaries

#### 2.1. Time-Series ARIMA Model Prediction. Auto-ARIMA Method

#### 2.2. Mann–Kendall Trend Test with Sen’s Slope Estimator

## 3. Proposed Research Term Trend Evaluation

#### 3.1. Proposed Methodology

**Definition 1.**

**Definition 2.**

- Phase I: identify the number of steps N to be predicted. This number depends on the research publication latency and on the moment in time for which the research trends are computed and can be obtained using the following formula: $$N=\lfloor t_{RPL}+\tau \rceil \quad \left[\mathrm{years}\right].$$
- Phase II: form the annual time series for a specified key term by counting its occurrences in paper metadata (i.e., title, keywords, and abstract) for each year. The following procedure can be used: collect each paper’s metadata, automatically or manually; concatenate the title, keywords, and abstract into a text document; feed that document into an entity-linking procedure (e.g., TagMe [28], AIDA [29], Wikipedia Miner [30]) to obtain the list of terms that characterizes the paper; and count, for each year, the number of papers in which the key term occurs.
- Phase III: apply the proposed n-steps-ahead Mann–Kendall procedure for the annual time series containing the occurrences of the specified key term.
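The three phases above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the authors' implementation: all function names are hypothetical, the rounding operator is taken to round halves up, each paper is assumed to be a `(year, set_of_terms)` pair already produced by entity linking, the classical MK test (without tie or autocorrelation correction) is used rather than the Yue and Wang variant, and a simple linear extrapolation stands in for the auto-ARIMA forecaster.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def prediction_steps(t_rpl, tau):
    """Phase I: N = round-to-nearest(t_RPL + tau), in years (halves round up, an assumption)."""
    return math.floor(t_rpl + tau + 0.5)


def annual_series(papers, term, years):
    """Phase II: per year, count the papers whose linked-term set contains `term`."""
    counts = Counter(year for year, terms in papers if term in terms)
    return [counts.get(year, 0) for year in years]


def mann_kendall(series):
    """Classical MK test with Sen's slope; returns (z, two-sided p-value, slope)."""
    n = len(series)
    pairs = list(combinations(range(n), 2))
    # S statistic: sum of the signs of all pairwise differences
    s = sum((series[j] > series[i]) - (series[j] < series[i]) for i, j in pairs)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S without tie correction
    if s == 0:
        z = 0.0
    else:
        z = (s - 1 if s > 0 else s + 1) / math.sqrt(var_s)  # continuity correction
    p = math.erfc(abs(z) / math.sqrt(2))
    # Sen's slope: median of all pairwise slopes
    slope = median((series[j] - series[i]) / (j - i) for i, j in pairs)
    return z, p, slope


def linear_forecast(series, n_steps):
    """Stand-in forecaster (NOT auto-ARIMA): extend the last value along Sen's slope."""
    _, _, slope = mann_kendall(series)
    return [series[-1] + slope * (k + 1) for k in range(n_steps)]


def nsa_mk(series, n_steps, forecast=linear_forecast):
    """Phase III: run MK on the observed series extended by n_steps predicted values."""
    extended = list(series) + list(forecast(series, n_steps))
    return mann_kendall(extended)


# Hypothetical toy corpus: (publication year, linked terms)
papers = [(2018, {"fpga"}), (2018, {"fpga", "cmos"}), (2019, {"cmos"})]
series = annual_series(papers, "fpga", range(2018, 2020))
n = prediction_steps(0.8, 0.3)
z, p, slope = nsa_mk([1, 2, 3, 4, 5], n)
```

Passing the forecaster as an argument keeps the trend test independent of the prediction model, so an ARIMA-based predictor could replace `linear_forecast` without touching the rest of the sketch.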

#### 3.2. N-Steps-Ahead Mann–Kendall Method

## 4. Experimental Results

#### 4.1. Data Acquisition and Preprocessing

#### 4.2. nsaMK Method Evaluation

- Using the Yue and Wang variant of the MK test for journal papers published between 2011 and 2020 (MK2020). The MK2020 results are taken as the ground truth.
- Using our nsaMK method on journal papers published between 2010 and 2019 together with the predicted values for 2020 (nsaMK2020).
- Using the Yue and Wang variant of the MK test for journal papers published between 2010 and 2019 (MK2019).
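A comparison of this kind can be sketched as follows. The snippet is an illustration only: the helper names are hypothetical, the significance level of 0.05 is an assumption, and the input is limited to the first four rows of the results table (integrated circuit, optimization, computer architecture, algorithm), so it does not reproduce the aggregate figures reported for all 24 terms.

```python
def trend_label(z, p, alpha=0.05):
    """Map a (z, p) pair to a trend indication; alpha = 0.05 is an assumed threshold."""
    if p >= alpha:
        return "none"
    return "increasing" if z > 0 else "decreasing"


def slope_mse(estimated, reference):
    """Mean square error between two lists of Sen's slope values."""
    return sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference)


def agreement(estimated, reference, alpha=0.05):
    """Fraction of terms whose trend indication matches the reference."""
    hits = sum(
        trend_label(ze, pe, alpha) == trend_label(zr, pr, alpha)
        for (ze, pe, _), (zr, pr, _) in zip(estimated, reference)
    )
    return hits / len(reference)


# (z, p-value, slope) for the first four terms of the results table
nsa_mk_2020 = [(-5.057, 4.2e-7, -0.008), (8.494, 0.0, 0.005),
               (4.856, 1.2e-6, 0.009), (0.0, 1.0, 0.000)]
mk_2019 = [(-2.715, 0.00662, -0.005), (5.964, 2.4e-9, 0.005),
           (4.059, 4.9e-5, 0.009), (-2.047, 0.04060, -0.002)]
mk_2020 = [(-4.809, 1.5e-6, -0.008), (10.738, 0.0, 0.006),
           (6.828, 8.6e-12, 0.012), (1.061, 0.28868, 0.001)]  # ground truth


def slopes(rows):
    """Extract the slope column from a list of (z, p, slope) rows."""
    return [row[2] for row in rows]


mse_nsa = slope_mse(slopes(nsa_mk_2020), slopes(mk_2020))
mse_mk = slope_mse(slopes(mk_2019), slopes(mk_2020))
agree_nsa = agreement(nsa_mk_2020, mk_2020)
agree_mk = agreement(mk_2019, mk_2020)
```

On these four rows the nsaMK2020 slopes lie closer to the ground truth than the MK2019 slopes, and MK2019 disagrees with the ground truth only on "algorithm", where it flags a decreasing trend that MK2020 does not confirm.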

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

Rank | Term | ndf | Rank | Term | ndf |
---|---|---|---|---|---|
1 | integrated circuit | 0.211 | 13 | neural network | 0.064 |
2 | optimization | 0.163 | 14 | low power | 0.059 |
3 | computer architecture | 0.136 | 15 | hybrid | 0.055 |
4 | algorithm | 0.130 | 16 | system on chip | 0.055 |
5 | logic gates | 0.125 | 17 | mathematical model | 0.053 |
6 | computational modeling | 0.121 | 18 | power | 0.044 |
7 | latency | 0.094 | 19 | convolutional neural network | 0.044 |
8 | fpga | 0.090 | 20 | logic | 0.044 |
9 | task analysis | 0.084 | 21 | memory management | 0.044 |
10 | energy efficiency | 0.073 | 22 | real time systems | 0.042 |
11 | machine learning | 0.071 | 23 | cmos | 0.042 |
12 | ram | 0.067 | 24 | nonvolatile memory | 0.041 |

Rank | Term | nsaMK2020 z | nsaMK2020 p-Value | nsaMK2020 Slope | MK2019 z | MK2019 p-Value | MK2019 Slope | MK2020 (ground truth) z | MK2020 (ground truth) p-Value | MK2020 (ground truth) Slope | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | integrated circuit | −5.057 | $4.2\times 10^{-7}$ | −0.008 | −2.715 | 0.00662 | −0.005 | −4.809 | $1.5\times 10^{-6}$ | −0.008 | ✓ |
2 | optimization | 8.494 | 0 | 0.005 | 5.964 | $2.4\times 10^{-9}$ | 0.005 | 10.738 | 0 | 0.006 | ✓ |
3 | computer architecture | 4.856 | $1.2\times 10^{-6}$ | 0.009 | 4.059 | $4.9\times 10^{-5}$ | 0.009 | 6.828 | $8.6\times 10^{-12}$ | 0.012 | ✓ |
4 | algorithm | 0 | 1 | −0.000 | −2.047 | 0.04060 | −0.002 | 1.061 | 0.28868 | 0.001 | ✓ |
5 | logic gates | 0.597 | 0.54995 | 0.001 | 1.257 | 0.20871 | 0.002 | 0.300 | 0.76357 | 0.000 | ✓ |
6 | computational modeling | 0.590 | 0.55509 | 0.001 | −0.522 | 0.60141 | −0.000 | 1.714 | 0.08649 | 0.001 | ✓ |
7 | latency | 1.459 | 0.14438 | 0.001 | 2.027 | 0.04261 | 0.001 | 4.459 | $8.2\times 10^{-6}$ | 0.004 | ✓ |
8 | fpga | 3.910 | $9.2\times 10^{-5}$ | 0.006 | 1.842 | 0.06533 | 0.003 | 3.571 | 0.00035 | 0.008 | ✓ |
9 | task analysis | 1.862 | 0.06250 | 0 | 1.816 | 0.06932 | 0 | 2.031 | 0.04216 | 0.001 | ✓ |
10 | energy efficiency | 5.235 | $1.6\times 10^{-7}$ | 0.007 | 4.005 | $6.1\times 10^{-5}$ | 0.004 | 4.894 | $9.8\times 10^{-7}$ | 0.007 | ✓ |
11 | machine learning | 5.856 | $4.7\times 10^{-9}$ | 0.003 | 4.512 | $6.3\times 10^{-6}$ | 0.003 | 5.025 | $5.0\times 10^{-7}$ | 0.005 | ✓ |
12 | ram | 4.346 | $1.3\times 10^{-5}$ | 0.002 | 4.232 | $2.3\times 10^{-5}$ | 0.003 | 5.118 | $3.0\times 10^{-7}$ | 0.004 | |
13 | neural network | 1.938 | 0.05250 | 0.002 | 0.907 | 0.36428 | 0 | 2.892 | 0.00382 | 0.004 | ✓ |
14 | low power | 0 | 1 | 0.000 | 1.712 | 0.08684 | 0.001 | 1.910 | 0.05607 | 0.001 | |
15 | hybrid | 3.298 | 0.00097 | 0.003 | 2.580 | 0.00987 | 0.002 | 4.213 | $2.5\times 10^{-5}$ | 0.004 | ✓ |
16 | system on chip | 3.637 | 0.00027 | 0.003 | 6.111 | $9.8\times 10^{-10}$ | 0.004 | 3.694 | 0.00022 | 0.003 | ✓ |
17 | mathematical model | −5.016 | $5.2\times 10^{-7}$ | −0.005 | −0.924 | 0.35519 | −0.004 | −3.686 | 0.00022 | −0.004 | |
18 | power | −4.734 | $2.1\times 10^{-6}$ | −0.002 | −3.678 | 0.00023 | −0.001 | −2.532 | 0.01133 | −0.001 | |
19 | convolutional neural network | 3.842 | 0.00012 | 0.001 | 3.275 | 0.00105 | 0.001 | 3.361 | 0.00077 | 0.002 | ✓ |
20 | logic | 0.497 | 0.61901 | 0.000 | −0.685 | 0.49291 | −0.001 | 2.419 | 0.01553 | 0.001 | ✓ |
21 | memory management | 5.262 | $1.4\times 10^{-7}$ | 0.003 | 5.672 | $1.4\times 10^{-8}$ | 0.003 | 9.513 | 0 | 0.004 | ✓ |
22 | real time systems | 0 | 1 | 0 | 2.307 | 0.02102 | 0.002 | 1.859 | 0.06289 | 0.003 | |
23 | cmos | 7.518 | $5.5\times 10^{-14}$ | 0.003 | 7.646 | $2.0\times 10^{-14}$ | 0.004 | 6.133 | $8.6\times 10^{-10}$ | 0.002 | ✓ |
24 | nonvolatile memory | 2.058 | 0.03958 | 0.001 | 2.701 | 0.00690 | 0.002 | 5.340 | $9.2\times 10^{-8}$ | 0.003 | |

