Research Trends, Hotspots and Frontiers of Ozone Pollution from 1996 to 2021: A Review Based on a Bibliometric Visualization Analysis
Round 1
Reviewer 1 Report (Previous Reviewer 3)
I greatly appreciated this version of the manuscript. It now reflects with precision the real path of the study of atmospheric ozone, giving the reader the opportunity to understand its evolution and the reasons for its current relevance.
The structure of the paper is very well organised, and it is enjoyable to read.
Author Response
Thank you very much!
Author Response File: Author Response.pdf
Reviewer 2 Report (Previous Reviewer 1)
This revised manuscript is a significant improvement over the previous version, though it still needs additional work prior to publication. The authors addressed most of my comments or gave reasonable arguments as to why they chose not to make changes, with one exception that is noted below.
The exception is that this manuscript continues to include figures that are not of publication quality, either because important information is unreadable or because they contain unnecessary and distracting information output by the software. In response to this criticism in my previous review, the authors state that the figures cannot be changed because this is what the software outputs. This is not an acceptable excuse for using figures that are not of publication quality, so these figures will need to be reworked prior to publication, either by using appropriate image editing software or by using different software to prepare them. In addition, there are a few editorial errors that need to be corrected.
Specific problems or suggestions are discussed below.
Undefined term:
I do not understand what "silhouette" means and where it is defined. It should be defined when it is first mentioned.
Minor Errors:
Figure 1 contains an error in the caption for the "y" axis. It should be "number of papers", not "Canada".
On line 159 it is stated that the most cited papers listed in Table 1 are all related to the United States. However, although most of the entries in this table give "USA" in the "country" column, two of them do not, including the highest ranked paper.
Figure problems:
Figure 2: The labels on the "X" axis in the plots need to have larger font so they are more readable. The font size for the type of plot that is given below each plot also needs to be increased.
Figures 3 and 6: The size of the three networks shown on Figure 3 should be increased so they are at least as large as that shown on Figure 6. Both figures need a legend showing the colors of the lines and what the circles mean. Although it is stated in the text that the line colors indicate the year according to their "warmth", not everyone knows what that means in this context, and in any case it is best that figures have all information about colors and symbols in the legend or caption. The software information on the figures is unreadable because of the small font, and in any case it is unnecessary and distracting and should be removed.
Figure 4 and 9: These figures are only useful if one can see which keywords are being analyzed. Both of these figures are unusable because the keywords are in too small a font in Figure 4 (with most being buried under lines) and are missing from Figure 9. Also, the year should be shown in a larger font and there should be a legend showing what the line colors mean, which is not obvious in this case since they link items with different years. If the authors can't give the keywords and make them readable without excessive effort, the figures should be deleted and any conclusions drawn from them stated in the text, or perhaps the most important co-occurrences could be summarized in tables if that is considered necessary.
Figure 5: The "year" column has "1996" in all rows and should be deleted.
Figure 7 and 8: The size of the font for the year needs to be increased. The software text should be removed because it is unnecessary, the font is unreadable, and it gets in the way.
Suggestions:
Although Table 5 is useful, it might be more useful if it had a column with one or two words stating the subjects of the papers.
One of the things I complained about in my previous review was that the cluster identification numbers (e.g., #0, etc) differed throughout the text. The authors' response was that these are output by the software and refer to the order in terms of numbers of items in the clusters (#0 the most, #1 the next, etc), which varies with the type of cluster. However, I couldn't find that explanation in the text (though it might be there and I didn't notice it). Including one or two sentences saying that these cluster numbers refer to the order of the clusters in terms of numbers of items would give these identification numbers more meaning to the reader.
Author Response
This revised manuscript is a significant improvement over the previous version, though it still needs additional work prior to publication. The authors addressed most of my comments or gave reasonable arguments as to why they chose not to make changes, with one exception that is noted below.
The exception is that this manuscript continues to include figures that are not of publication quality, either because important information is unreadable or because they contain unnecessary and distracting information output by the software. In response to this criticism in my previous review, the authors state that the figures cannot be changed because this is what the software outputs. This is not an acceptable excuse for using figures that are not of publication quality, so these figures will need to be reworked prior to publication, either by using appropriate image editing software or by using different software to prepare them. In addition, there are a few editorial errors that need to be corrected.
Response: Thank you for your suggestions. We have done our best to improve all the figures (Minor Errors).
Specific problems or suggestions are discussed below.
Response: All the specific problems have been corrected and the suggestions adopted.
Undefined term:
I do not understand what "silhouette" means and where it is defined. It should be defined when it is first mentioned.
Response: Thanks! We have defined it in the footnote to Table 3: “Silhouette is a measure of the similarity between a node and its cluster compared with other clusters. The value range is -1 to 1. The larger the value, the more this node matches its cluster rather than its adjacent cluster. It is an evaluation method for the clustering effect.” (lines 462-464)
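The definition quoted above matches the standard silhouette coefficient, s = (b - a) / max(a, b), where a is the mean distance from an item to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The following is a minimal sketch under that assumption; it is not taken from the manuscript or from CiteSpace, whose exact computation is not described here, and the function name and toy data are hypothetical.

```python
from statistics import mean


def silhouette(i, points, labels, dist):
    """Standard silhouette of item i: (b - a) / max(a, b)."""
    own = labels[i]
    same = [j for j in range(len(points)) if labels[j] == own and j != i]
    if not same:
        # Singleton cluster: the silhouette is conventionally set to 0.
        return 0.0
    # a: mean distance to the other members of the item's own cluster.
    a = mean(dist(points[i], points[j]) for j in same)
    # b: smallest mean distance to the members of any other cluster.
    b = min(
        mean(dist(points[i], points[j])
             for j in range(len(points)) if labels[j] == other)
        for other in set(labels) if other != own
    )
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


# Hypothetical toy data: two well-separated clusters on a line.
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
scores = [silhouette(i, points, labels, lambda x, y: abs(x - y))
          for i in range(len(points))]
```

Values near 1 mean an item sits much closer to its own cluster than to the nearest other cluster; values near -1 suggest it would fit a neighbouring cluster better.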
Minor Errors:
Figure 1 contains an error in the caption for the "y" axis. It should be "number of papers", not "Canada".
Response: We have corrected the "y" axis label in Figure 1 by changing “Canada” to “number of papers” (line 162).
On line 159 it is stated that the most cited papers listed in Table 1 are all related to the United States. However, although most of the entries in this table give "USA" in the "country" column, two of them do not, including the highest ranked paper.
Response: Thank you. Following your suggestion, we have added an explanation of Table 1: “Most of the most cited papers were related to the United States, which showed that the United States has an important position in the field of ozone pollution; this information can be proved in the following country cluster analysis. In addition, the most cited article comes from the Netherlands and Finland, while the tenth most cited article comes from England.” (lines 166-170)
Figure problems:
Figure 2: The labels on the "X" axis in the plots need to have larger font so they are more readable. The font size for the type of plot that is given below each plot also needs to be increased.
Response: Thank you. In Figure 2, all fonts have been enlarged: the original 18-point font has been changed to a 22-point font (line 214).
Figures 3 and 6: The size of the three networks shown on Figure 3 should be increased so they are at least as large as that shown on Figure 6. Both figures need a legend showing the colors of the lines and what the circles mean. Although it is stated in the text that the line colors indicate the year according to their "warmth", not everyone knows what that means in this context, and in any case it is best that figures have all information about colors and symbols in the legend or caption. The software information on the figures is unreadable because of the small font, and in any case it is unnecessary and distracting and should be removed.
Response: Thank you for your advice. We have modified Figure 3 according to your suggestion and added corresponding legends to all figures to indicate the meaning of the line colors. In addition, the size of the numbers was adjusted and the software information was deleted (lines 261, 263, 268, 405).
Figure 4 and 9: These figures are only useful if one can see which keywords are being analyzed. Both of these figures are unusable because the keywords are in too small a font in Figure 4 (with most being buried under lines) and are missing from Figure 9. Also, the year should be shown in a larger font and there should be a legend showing what the line colors mean, which is not obvious in this case since they link items with different years. If the authors can't give the keywords and make them readable without excessive effort, the figures should be deleted and any conclusions drawn from them stated in the text, or perhaps the most important co-occurrences could be summarized in tables if that is considered necessary.
Response: Thank you for your advice. Some of the cluster names in Figures 4 and 9 are hidden in order to make the figures more standardized. The corresponding names are introduced in the description and discussion of the figures, so we think this will not interfere with reading. Following your suggestion, we have adjusted the year font size and added a legend to show the meaning of the line colors (lines 369, 835).
Figure 5: The "year" column has "1996" in all rows and should be deleted.
Response: Thank you! Accordingly, we have deleted the "year" column in Figure 5 (line 386).
Figure 7 and 8: The size of the font for the year needs to be increased. The software text should be removed because it is unnecessary, the font is unreadable, and it gets in the way.
Response: We have adopted your advice. Thank you! (lines 407, 819)
Suggestions:
Although Table 5 is useful, it might be more useful if it had a column with one or two words stating the subjects of the papers.
Response: We have adopted your advice. Thank you! We added a "Themes" column to Table 5 to state the subjects of the papers (line 769).
One of the things I complained about in my previous review was that the cluster identification numbers (e.g., #0, etc) differed throughout the text. The authors' response was that these are output by the software and refer to the order in terms of numbers of items in the clusters (#0 the most, #1 the next, etc), which varies with the type of cluster. However, I couldn't find that explanation in the text (though it might be there and I didn't notice it). Including one or two sentences saying that these cluster numbers refer to the order of the clusters in terms of numbers of items would give these identification numbers more meaning to the reader.
Response: Following your suggestion, we have added a corresponding explanation to the article: “And the previous identifier of the cluster is given according to the size of the cluster (the number of documents contained or the frequency of being mentioned), and #0 indicates the maximum number of documents contained in the cluster.” (lines 331-334)
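The numbering rule described in this exchange (#0 for the largest cluster, #1 for the next, and so on) can be sketched as a simple ranking by size. This is only an illustration of the stated rule, not the software's actual code; the cluster names are borrowed from labels mentioned in the reviews, while the sizes and the function name are hypothetical.

```python
def number_clusters(cluster_sizes):
    """Assign identifiers #0, #1, ... in order of decreasing cluster size."""
    ordered = sorted(cluster_sizes.items(), key=lambda kv: kv[1], reverse=True)
    return {f"#{rank}": name for rank, (name, _size) in enumerate(ordered)}


# Hypothetical sizes (number of documents per cluster).
sizes = {
    "air pollution": 95,
    "tropospheric ozone": 120,
    "stratospheric intrusion": 40,
}
ids = number_clusters(sizes)
```

Because the ranking is recomputed for every analysis, the same identifier such as #0 can point to a different topic in each figure, which is why the reviewer asked for the rule to be stated in the text.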
Author Response File: Author Response.pdf
Reviewer 3 Report (New Reviewer)
I consider that this is a very well-presented paper. I have some minor recommendations concerning the content:
0. It is not clear whether this kind of research already exists in the field.
1. Scientifically, there is a set of research questions recommended for this kind of research in the literature. This paper didn't identify a research question assigned to the goal(s). From this perspective, the article is incomplete, and I believe it is possible to improve this aspect. This is just a recommendation and could be the decision of the authors to improve this methodological aspect.
2. I recommend giving a motivation for the decision to select only articles from WoSCC. Why didn't you analyze the Scopus database, and why are articles from conference proceedings not included? As a reader, I expect a short explanation of this matter.
3. Regarding the co-occurrence analysis, the explanation is superficial. The authors could consider bibliographic references on this subject, even from the Sustainability journal. As it stands, it is not very clear what co-occurrence means and why it is useful for this domain.
For example, the following explanation is unclear: "Keyword co-occurrence refers to the frequency of two related keywords appearing in the same article." Please explain very clearly what this means and support it with a respected reference in the field. Besides, there are journals with stringent rules for establishing the list of keywords.
4. The same applies (as in point 3) to co-cited references.
Pay attention to the in-text reference. See line 292!
5. We consider that the Conclusions can be improved with respect to the field's conventions.
Author Response
I consider that this is a very well-presented paper. I have some minor recommendations concerning the content:
0. It is not clear whether this kind of research already exists in the field.
Response: Thank you. We have explained this in the introduction: “In addition, as far as we know, no scholar has analyzed the ozone pollution research in the way of bibliometrics.” (lines 78-79)
1. Scientifically, there is a set of research questions recommended for this kind of research in the literature. This paper didn't identify a research question assigned to the goal(s). From this perspective, the article is incomplete, and I believe it is possible to improve this aspect. This is just a recommendation and could be the decision of the authors to improve this methodological aspect.
Response: Thank you for your suggestions. There are indeed some review papers that discuss relevant issues, but it is difficult to find a review paper covering the whole ozone pollution field. Similar papers are usually relatively specific. We also considered these comparisons in the process of research. However, we found that the relevant review papers could not give the latest (recent months) literature hotspots. Therefore, we chose to use the bibliometric analysis of the papers in the latest months to verify the research hotspots obtained by the bibliometric analysis before 2021. This verification is more specific and intuitive.
2. I recommend giving a motivation for the decision to select only articles from WoSCC. Why didn't you analyze the Scopus database, and why are articles from conference proceedings not included? As a reader, I expect a short explanation of this matter.
Response: Thank you for your suggestion. We chose WoSCC data instead of the Scopus database because the former contains all the bibliometric parameters required by the CiteSpace software. At the same time, the relevant parameters of conference papers are usually incomplete, which may lead to errors in the visualization of the bibliometric analysis. We have added the necessary explanation to the data acquisition section: “We choose WoSCC data because it contains all bibliometric parameters required by CiteSpace software. At the same time, the relevant parameters of conference papers are often incomplete, which may lead to errors in the visualization of bibliometric analysis.” (lines 93-95)
3. Regarding the co-occurrence analysis, the explanation is superficial. The authors could consider bibliographic references on this subject, even from the Sustainability journal. As it stands, it is not very clear what co-occurrence means and why it is useful for this domain.
For example, the following explanation is unclear: "Keyword co-occurrence refers to the frequency of two related keywords appearing in the same article." Please explain very clearly what this means and support it with a respected reference in the field. Besides, there are journals with stringent rules for establishing the list of keywords.
Response: We have added some explanations in the corresponding sections. “Co-occurrence analysis is an analysis method that quantifies co-occurrence information in various information carriers. It can reveal the content association of information and the co-occurrence relationship implied by feature items. The keyword co-occurrence analysis method uses the common occurrence of keywords in the literature set to determine the relationship between the keywords in the research field represented by the literature set. The keywords with a high frequency of occurrence mean that their status in the research field is more important. According to the centrality of different keywords in the co-occurrence analysis, hot spots and breakthroughs in the research field can also be found.”(lines 304-311)
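The co-occurrence counting described in this response can be illustrated with a short sketch: each pair of distinct keywords that appears on the same paper adds one to that pair's count. This is an assumed, simplified reading of the method, not the CiteSpace implementation, and the keyword lists and function name below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how many papers list each unordered pair of keywords together."""
    counts = Counter()
    for keywords in papers:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists for three papers.
papers = [
    ["ozone", "air pollution", "NOx"],
    ["ozone", "air pollution"],
    ["ozone", "VOCs"],
]
cooc = keyword_cooccurrence(papers)
```

The resulting counts serve as edge weights in a keyword network, on which measures such as centrality can then be computed.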
4. The same applies (as in point 3) to co-cited references.
Response: We have added some explanations in the corresponding sections: “Co-cited references refer to two (or more) papers cited by one or more later papers at the same time, and it is said that these two papers constitute a co-cited relationship. In short, the co-cited reference analysis is very similar to the co-occurrence keyword analysis. The co-cited relationship of the references will change with time. The development and evolution of a discipline can be explored through the research of the co-cited references network.” (lines 390-395)
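To make the time dependence mentioned in this response concrete, the sketch below counts co-cited reference pairs separately for each citing year. It assumes that two references are co-cited whenever they appear together in the reference list of one citing paper; the data and function name are hypothetical and are not drawn from the manuscript.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cocitation_by_year(citing_papers):
    """Map each citing year to a Counter of co-cited reference pairs."""
    by_year = defaultdict(Counter)
    for year, references in citing_papers:
        # Every unordered pair in one reference list is co-cited once.
        for pair in combinations(sorted(set(references)), 2):
            by_year[year][pair] += 1
    return by_year


# Hypothetical citing papers: (publication year, cited references).
citing_papers = [
    (2015, ["A", "B"]),
    (2015, ["A", "B", "C"]),
    (2020, ["B", "C"]),
]
result = cocitation_by_year(citing_papers)
```

Comparing the per-year counters shows how a co-citation link strengthens or fades over time, which is the kind of change a co-citation timeline is meant to display.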
Pay attention to the in-text reference. See line 292!
Response: Thank you for pointing this out.
5. We consider that the Conclusions can be improved with respect to the field's conventions.
Response: Thank you for your suggestion. We adjusted the conclusion part to better conform to the specifications of the paper and the readers' understanding.
Author Response File: Author Response.pdf
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
This paper describes a survey of publications in the field of ground-level ozone research carried out since the mid 1990's. I found the Discussion section very interesting and informative in terms of contributions of publications in different countries and the subject matters studied and how they varied from year to year. I think this makes a contribution and could be useful to researchers in the field, especially those just starting their career. Unfortunately, the Methods section was inadequate, some important concepts were poorly defined, and some parts and figures in the Results section were difficult to understand and therefore probably not useful to the general researcher in this field. Some of the figures and tables could be improved significantly. This is discussed further below.
Research on ground-level ozone pollution did not begin in the 1990's, contrary to the statement in the first paragraph in the Introduction (Line 35). I started working in this field in the mid 1970's and ozone pollution was an active area of research since the time of Haagen-Smit in the 1960's. The authors need to justify why they restricted their literature to papers published after 1995. The Web of Science core collection website states that it has citations going back to the 1900's, so data availability can't be the reason (or is it?). There were a number of changes in trends and emphasis in ground-level ozone pollution research between the '60's and '70's and 1996, and information about that might help put the changes after 1995 in better context. The fact that this work covers such a limited time period in the history of ozone pollution research should at least be mentioned in the "limitations" section, if not emphasized as the most important limitation.
It would have been useful to present information on what subject matters were most studied in the different countries. Some subjects would primarily be from work in a particular country (i.e., most of the studies about pollution in China are probably done by Chinese researchers), and others would be of interest in many countries. I do not remember seeing information on this.
The title should add the qualifier "Ground Level" or "Tropospheric" before "Ozone". This paper does not discuss research in stratospheric ozone (other than in the context of intrusion), which I would think would also be an active area of research. Perhaps papers in this area did not show up in their searches because stratospheric ozone isn't considered pollution, but the title should still reflect that the focus is on ground-level and not stratospheric ozone.
The Methods section just describes the software tool, and is not very informative for those not familiar with that software or with cluster analysis. Information about the methods needs to be given in terms of concepts that general researchers in the air pollution field are more familiar with. This should be the intended audience for this paper.
The concepts of "betweenness", "centrality", "burst detection", "burst strength" and "co-cited" are poorly defined. What definitions they give are limited to the concepts used in cluster analysis and the software they employ, and are not familiar to general researchers in this field (or at least to me). They should give more explanation about exactly what they mean in the context of air pollution research, using examples that can be followed by general researchers who are not as familiar with bibliographic data or cluster analysis as the authors are. This would make the Results section more understandable, and better support the useful discussion.
The use of the same identifiers (e.g., #0) for different types of clusters, or for the same type of clusters in different figures, makes the paper more difficult to follow. For example, #0 refers variously to "tropospheric ozone" for keywords on Figure 3, "eastern China" for "co-cited reference clusters" on Figure 6, "stratospheric intrusion" for "co-cited reference clusters" on Figure 7, "air pollution" for keywords on Figure 8, and "short term effects" for keywords on Figure 9. They need to have each type of cluster they are discussing have the same identifier. Or at least use consistent identifiers when referring to the same type of cluster.
Figure 1 is very interesting, but I don't know why they list the years backwards, contrary to what one would normally expect. The dramatic increase in publications since around 2010 was surprising to me until I realized it is likely due to the Chinese being much more active in the field starting around then. Figure 1 would have been more informative if it used different colors or plots for publications from different countries. It probably would show the dramatic increase is mainly due to China, with the other countries increasing at much more moderate levels. But it might have revealed interesting trends for the other countries besides China. I would have thought that work on ozone in the U.S. might be decreasing in recent years because of the increased emphasis on PM research, though maybe that started before 1996.
Table 1 would benefit by including a column giving the cluster IDs associated with the paper given in other tables, assuming they follow the recommendation of having unique symbols for each type of cluster. It might be useful to also indicate the country of the publication on the table.
Table 2 is poorly presented. One expects entries in each of the rows to be related, but the table is actually 5 independent tables. Having separate tables for "Country", "Institution", etc would be easier to read, and eliminate the need to force information into too-narrow columns. They probably wouldn't take up that much more journal space than the present table if formatted properly, or with some tables side-by-side yet clearly separated.
I do not understand the network visualization diagrams on Figures 2, 5, and 9 at all. What does the position of the points on the 2D plane mean, and what does it mean when some clusters overlap and others do not, with some on opposite sides of the diagrams? Do the colors of the lines indicate the years, as they do on the other figures? (If so, this may be useful information and should be pointed out.) I do not think these figures are necessary to understand the main points they make in the Discussion section, so maybe they can be deleted without affecting the utility of this paper. But if they are to be useful, they need to be explained better and not use font that is too small to see clearly. This may take considerable journal space and may not be worth it. Also, it looks like Figure 3a is the same as Figure 3c. The figure caption says Figure 3a refers to countries, but the figure gives author names. This must be an error.
It looks like Table 3 may be another case where they have unrelated information in the columns and are actually two tables in one. They should be split into two tables, though locating them side-by-side is ok as long as they are clearly separated.
Figures 3, 6, 7, and 8 are useful, but need to be better captioned. It is stated in the text what the different colors mean (at least in some cases), but not in the figure caption. The year is given in much too small a font, and vertical lines showing the year might be useful. What do the highlighted points (and their colors) on Figure 3 mean? Are the colors on Figure 3 consistent with those used on the other figures? Is there some reason for the ordering of the categories? Is Figure 7 the same as Figure 6 except for a shorter time frame and different clusters? Most but not all of the categories in Figure 7 are the same as those in Figure 6.
Figure 4 is potentially interesting but is not well presented. There is no explanation of the colored lines under "1996-2021", but I suppose they refer to the years when the keywords were most frequently used. In that case, what is the point of the columns giving the "begin" and "end" years, since this is the same information? The column "year" is pointless since it is "1996" in each row.
Table 5 needs to include an indication of the subject matters of the listed references. Readers need to look up each citation number in the reference list for the table to be meaningful, assuming they understand what "burst strength" and what the "begin" and "end" years mean (which I do not).
The English in the paper is generally good, but there are a few places where awkward phrasing is used. It probably could benefit by review by an English language editor.
My recommendation is that they need to clarify the concepts, methods, and terms used, and improve the identifiers, tables and figures as discussed above, perhaps removing figures that may not be useful without much more extensive explanation. The paper would have made a greater contribution if it covered a longer time period, but it is still publishable limited to this period. However, the fact that ozone pollution was a very active area of research before 1996 should be emphasized.
Reviewer 2 Report
My review is attached.
Comments for author File: Comments.pdf
Reviewer 3 Report
I'm sorry to say that this manuscript is misleading with regard to ozone research and is inadequately referenced. From 1996 to 2021 a huge number of scientists, not reported here, contributed strongly to the advancement of knowledge in the sector (e.g., ozone transport and deposition: Millan Millan; ozone effects on vegetation: Paoletti; and many others).
On page 1, line 35, there is a sentence I absolutely cannot accept: "Accordingly, the research on Ozone pollution begins in the 1990s". In my opinion, the authors have developed a very strange view of the background on ozone: Lyons and Cole produced one of the landmark studies on ozone transport within the breeze system during the 1970s.
I'm sure the authors meant to write something else and did not intend to erase more than 40 years of research and researchers.
I strongly recommend that the authors take a step back and rethink their manuscript within a different framework, or focus on only one country where, probably, some of their findings could be justified.
Thus, I encourage the authors to rethink the manuscript with a deeper and better-referenced view of this fascinating problem.