Analyzing Sustainability Literature in Maritime Studies with Text Mining

Since the world’s first Earth Summit in Rio de Janeiro in 1992, sustainability has become a focal point of significant debate for industry, government, and international organizations. As a result, research on the sustainability of maritime logistics is on the rise, yet it remains fragmented in terms of conceptual development, empirical testing and validation, and theory building. The aim of this paper is therefore two-fold: first, to present a literature review of key journal articles in the field of maritime studies published between 1993 and 2017 using topic modeling; and second, to provide future research directions with respect to major topics, themes and co-authorship patterns. Sustainability issues are mapped and consolidated using a generative probabilistic text-mining technique, latent Dirichlet allocation (LDA), which discovers latent topics and the relationships among text documents. In addition, bibliometric analysis is conducted to visualize the landscape of sustainability research. Based on the results, a new intellectual structure of sustainability research is created, the underlying themes are identified, key trends and patterns are extracted, and future research trajectories are mapped for the field of maritime studies.


Introduction
The new paradigm of development that combines growth with sustainability is theoretically challenging, yet desirable for attaining a sustainable future. In September 1997, the International Maritime Organization (IMO) adopted an International Convention Protocol to achieve "sustainable maritime development" [1], which is aligned with the notion of "sustainable development" that "meets the needs of the present without compromising the ability of future generations to meet their own needs" [2]. Sustainability requires business development to consider three dimensions: environmental, economic, and social. The environmental dimension requires a reduction in environmental impacts, while the economic dimension concerns cost minimization or business continuity. The social dimension relates to community well-being, including the protection of human rights, better working conditions, and improved labor regulations [3]. Accordingly, sustainability has three main objectives: to minimize economic costs; to mitigate negative environmental impacts, including air and water pollution; and to enhance social justice by protecting human rights and improving working conditions for labor.
In recent decades, the maritime industry has taken sustainability seriously, implementing policies and strategies to reduce shipping miles, promoting […] similarity/dissimilarity matrix would help formalize a framework that reflects the meaning, scope and application of the concept of sustainability in maritime studies.
This paper primarily aims to identify the key concepts and terms used to denote the notion of maritime sustainability. It employs topic modeling techniques, such as latent Dirichlet allocation (LDA), together with other computational approaches to extract and visualize the underlying thematic groupings, trends and patterns in sustainability research in the field of maritime studies. We text-mine 155 research papers on maritime sustainability published in peer-reviewed journals between 1993 and 2017.
The rest of the paper is structured as follows. The next section describes the data utilized. Section 3 presents the methods employed to identify specific patterns and trends in textual data. Findings and results are presented in Section 4. Finally, Section 5 concludes this study by highlighting research limitations, implications and future research.

Data Collection
Maritime-related journals were first selected from the transportation, logistics, shipping, and port-related journals listed in the Science Citation Index (SCI), Science Citation Index Expanded (SCIE), and Social Sciences Citation Index (SSCI). The search was carried out in both the Web of Science and Scopus databases, with the two databases used to cross-check each other. Among papers related to "sustainability" or "sustainable", we identified those containing "port", "ports", "seaport", "seaports", "container terminal", "container terminals", "container port", "container ports", "shipping", or "sea transport" in the title, keywords, or abstract. The search covered the 25-year period from 1993 to 2017 and, excluding book chapters, editorials, conference proceedings papers, and working papers, yielded 155 papers published in 48 peer-reviewed academic journals (Appendix A, Table A1). Table 1 and Figure 1 show a significant increase since 2012 in the number of papers on sustainability in the field of maritime studies. Our analysis shows that the term "sustainability" was first used in maritime studies by Brooks (1993), who considered it a "competitive advantage" for ocean container carriers. Most of these papers addressed seaport-related topics, followed by shipping; a small proportion addressed maritime logistics.
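The search criterion above amounts to a Boolean filter: (any maritime term) AND (a sustainability term), restricted to 1993-2017. The following is a minimal Python sketch of that logic, not the query syntax of Web of Science or Scopus. The record fields (`title`, `keywords`, `abstract`, `year`) are hypothetical, and applying the sustainability criterion to the same three fields is an assumption.

```python
import re

# Search strings quoted in the text.
MARITIME_TERMS = ["port", "ports", "seaport", "seaports", "container terminal",
                  "container terminals", "container port", "container ports",
                  "shipping", "sea transport"]
SUSTAINABILITY_TERMS = ["sustainability", "sustainable"]


def contains_any(text: str, phrases: list[str]) -> bool:
    """Case-insensitive whole-word match, so that e.g. "report" does not match "port"."""
    return any(re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE) for p in phrases)


def matches_query(paper: dict, first_year: int = 1993, last_year: int = 2017) -> bool:
    """(maritime term) AND (sustainability term) in title/keywords/abstract, within the period.

    `paper` is a hypothetical record with string fields "title", "keywords",
    "abstract" and an integer "year".
    """
    text = " ".join(paper.get(field, "") for field in ("title", "keywords", "abstract"))
    in_period = first_year <= paper.get("year", 0) <= last_year
    return in_period and contains_any(text, MARITIME_TERMS) and contains_any(text, SUSTAINABILITY_TERMS)


papers = [
    {"title": "Green port policy and sustainable growth", "year": 2015},
    {"title": "A report on sustainable finance", "year": 2015},
]
print([p["title"] for p in papers if matches_query(p)])
# ['Green port policy and sustainable growth']
```

Word boundaries are used so that substrings such as "report" are not counted as "port"; how the commercial databases tokenize their fields may differ.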

Pre-Processing
The input data format for building topic models is a document-term matrix (DTM), whose rows correspond to the documents (papers) and whose columns correspond to the terms (words found in each paper's title, abstract, and keywords), as shown in Figure 2. To generate the DTM, the raw textual data must first be preprocessed [41] in several preparatory steps: tokenizing the text into terms (words), lowercasing, stemming, and removing stop words.
First, the corpus, that is, the collection of documents (papers), is tokenized: the text is split into individual word tokens, yielding a bag of words. Second, lowercasing prevents the same word with different capitalization from being treated as two different words. Third, stemming removes plurals and other suffixes and normalizes different tenses and variants of the same word (e.g., select, selects, selection, selecting, and selected all reduce to the stem "select"). Lowercasing and stemming thus put words into a simpler, more uniform form for text mining. Finally, stop words (such as "and", "but", and "or") are removed, because they carry no information needed for the analysis and can impair the accuracy of topic modeling; a predefined stop-word list is created and its words are deleted [41]. Terms such as "study", "analysis", and "research", which are common in abstracts, are also omitted. Additional preprocessing steps remove numbers, punctuation characters, and white space. Furthermore, some terms with limited discriminatory value in the corpus, such as "model", "management", and "strategy", are excluded; because these terms feature in almost all topics, they would make the output harder to interpret.
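The preprocessing steps can be sketched in Python as follows. The stop-word list is a small illustrative subset, and `stem` is a deliberately crude suffix stripper standing in for a real stemmer such as Porter's; both are assumptions rather than the study's actual resources. Stop words are removed before stemming here so that the list can be written in unstemmed form.

```python
import re

# Illustrative general stop words (a tiny subset; the study's list is not given).
STOPWORDS = {"a", "an", "and", "but", "or", "the", "of", "in", "on", "for",
             "to", "is", "are", "with", "this"}
# Corpus-specific terms that the text says were dropped.
CUSTOM_STOPWORDS = {"study", "analysis", "research", "model", "management", "strategy"}
ALL_STOPWORDS = STOPWORDS | CUSTOM_STOPWORDS

# Crude suffix list, checked in order; a stand-in for a real (e.g., Porter) stemmer.
SUFFIXES = ("ions", "ion", "ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    text = text.lower()                      # lowercasing
    text = re.sub(r"[^a-z\s]", " ", text)    # remove numbers and punctuation
    tokens = text.split()                    # tokenize; also discards extra white space
    tokens = [t for t in tokens if t not in ALL_STOPWORDS]  # stop-word removal
    return [stem(t) for t in tokens]         # stemming


print(preprocess("The selection of 3 green ports, and port selection strategy."))
# ['select', 'green', 'port', 'port', 'select']
```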

Formation of the Document Terms Matrix
Generating the DTM is the most basic task in all forms of text mining, including topic modeling. Topic modeling relies on the "bag-of-words" format, which represents the text of a paper (document) by the occurrences of its terms (words), disregarding word order. Each row of the DTM represents a document (N = 155 in Figure 2), each column represents a term (M = 576 in Figure 2), and each cell records how often the term occurs in the document. The DTM must be constructed before LDA is applied, because topic modeling requires its input in this format. Representing text as a numeric matrix in this way allows the data to be analyzed with matrix algebra.
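A minimal sketch of DTM construction from already-preprocessed token lists, using only the Python standard library (in the study this step was carried out in R). Sorting the columns alphabetically is an arbitrary choice made here.

```python
from collections import Counter


def build_dtm(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return (vocab, dtm) where dtm[i][j] counts occurrences of vocab[j] in docs[i]."""
    vocab = sorted({term for doc in docs for term in doc})  # columns: M distinct terms
    index = {term: j for j, term in enumerate(vocab)}
    dtm = []
    for doc in docs:                                        # rows: N documents
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        dtm.append(row)
    return vocab, dtm


vocab, dtm = build_dtm([["port", "emiss", "port"], ["ship", "emiss"]])
print(vocab)  # ['emiss', 'port', 'ship']
print(dtm)    # [[1, 2, 0], [1, 0, 1]]
```

For the corpus described above, the same construction would give N = 155 rows and M = 576 columns.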
The DTM is used not only to generate the word cloud (Appendix A, Figure A1), which graphically represents word frequency in the source text, but also to generate a dendrogram (Appendix A, Figure A2), which reflects the hierarchical clustering and classification of words. However, these two figures do not provide enough information to determine the intellectual structure of the sustainability literature. Topic modeling, by contrast, offers a convenient way to summarize and understand large unclassified texts by discovering hidden themes and topical patterns in them, and because it works algorithmically, it is well suited to analyzing large collections of textual information.
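The two DTM-based summaries mentioned above can be sketched as follows: column sums give the term frequencies that a word cloud displays, and agglomerative clustering of the term columns gives the merge sequence that a dendrogram draws. The text does not state the distance measure or linkage rule behind Figure A2; Euclidean distance and single linkage here are illustrative assumptions.

```python
import math


def term_frequencies(vocab, dtm):
    """Column sums of the DTM: total occurrences of each term (what a word cloud sizes by)."""
    return {term: sum(row[j] for row in dtm) for j, term in enumerate(vocab)}


def single_linkage_merges(vocab, dtm):
    """Agglomerative clustering of terms; returns the (cluster, cluster, distance) merge sequence."""
    columns = {term: [row[j] for row in dtm] for j, term in enumerate(vocab)}
    clusters = [frozenset([term]) for term in vocab]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for k in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of member terms,
                # using Euclidean distance between their DTM columns.
                d = min(math.dist(columns[a], columns[b])
                        for a in clusters[i] for b in clusters[k])
                if best is None or d < best[0]:
                    best = (d, i, k)
        d, i, k = best
        merges.append((sorted(clusters[i]), sorted(clusters[k]), d))
        merged = clusters[i] | clusters[k]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, k)] + [merged]
    return merges


vocab = ["emiss", "port", "ship"]
dtm = [[1, 2, 0], [1, 0, 1]]
print(term_frequencies(vocab, dtm))    # {'emiss': 2, 'port': 2, 'ship': 1}
for left, right, d in single_linkage_merges(vocab, dtm):
    print(left, right, round(d, 3))    # ['emiss'] ['ship'] 1.0, then ['port'] ['emiss', 'ship'] 1.414
```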
In the following sections, this paper applies a topic modeling approach, based on the DTM and on Bayesian statistics, to categorize the corpus into a set of "topics", each consisting of meaningful words.

Method: Topic Modeling
Topic modeling belongs to the field of unsupervised machine learning [41]: its algorithms identify patterns in text without the coding rules and training data that supervised machine learning requires. Topic modeling algorithms analyze the words in documents with statistical methods to discover the topics they contain.
Topic modeling automatically categorizes documents according to their underlying thematic structure [42]. LDA is the simplest probabilistic topic model for documents [36,42]. Figure 3 presents the concept behind the LDA model (Blei [43]; Kuhn [35]). K represents the number of topics. Researchers often choose K from practical experience rather than on statistical grounds; this study instead determines K statistically, following Newton and Raftery [44] and Griffiths and Steyvers [45]. Each topic is a distribution over words drawn from a Dirichlet distribution, and the topic model algorithm estimates these distributions from the DTM. The harmonic mean method suggested by these authors estimates the log-likelihood of the data for each candidate K; it thereby measures the goodness-of-fit of the model and identifies the optimal number of topics as the K with the maximum log-likelihood (Appendix A, Figure A3). For the port papers, the harmonic mean method gives the maximum log-likelihood at K = 21 (Appendix A, Figure A3). However, too many topics generate noise in the form of unwanted topic groups, because similar topic groups are split apart to produce a larger number of topics. We therefore selected K = 10 for ports, the value at which the slope of the maximum log-likelihood curve begins to flatten; for shipping and maritime logistics, K was set to 14 and 9, respectively. Note that k and K denote different quantities: K is the specified number of topics, whereas k is an index over topics.
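The harmonic mean estimator of Newton and Raftery [44], as used by Griffiths and Steyvers [45], approximates the marginal likelihood p(w | K) by the harmonic mean of the likelihoods p(w | z_s, K) computed from S posterior samples z_s of the topic assignments: log p̂(w | K) = log S − log Σ_s exp(−ℓ_s), where ℓ_s = log p(w | z_s, K). A minimal Python sketch follows, assuming the per-sample log-likelihoods have already been obtained from a fitted sampler; the numbers in the example are hypothetical. It implements only the "pick the maximum" rule, not the elbow criterion used above to settle on K = 10 for ports.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """log of the harmonic mean of exp(ll_s): log S - log(sum_s exp(-ll_s)).

    Computed with the log-sum-exp trick, because the ll_s (log p(w | z_s, K))
    are large negative numbers whose exponentials would underflow.
    """
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(log_likelihoods)) - log_sum


def best_k(samples_by_k):
    """Candidate K with the largest harmonic-mean log-likelihood."""
    return max(samples_by_k, key=lambda k: log_harmonic_mean(samples_by_k[k]))


# Hypothetical per-sample log-likelihoods for three candidate values of K.
samples = {5: [-1210.0, -1205.0], 10: [-1150.0, -1152.0], 15: [-1160.0, -1158.0]}
print(best_k(samples))  # 10
```

Because the harmonic mean is dominated by the smallest likelihoods, the estimate is known to be unstable; plotting it against K, as in Figure A3, and inspecting where the curve flattens is one way to temper this.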
$\alpha$ is the Dirichlet parameter that determines $\theta_d$. The topic distribution of each paper is given by a Dirichlet distribution with parameter $\alpha$ ($\alpha > 0$), which is generally set to $\alpha = 50/K$, as suggested by Griffiths and Steyvers [45]. The value of $\alpha$ therefore depends on $K$ (e.g., for ports, $K = 10$ gives $\alpha = 5$). $d$ indexes the individual documents: 72 papers on ports, 70 papers on shipping, and 13 papers on maritime logistics concern sustainability. $\theta_d$, the topic proportions of the $d$th document, is a random variable drawn with parameter $\alpha$, as shown in Figure 3:
$$\theta_d \sim \mathrm{Dirichlet}(\alpha)$$
$Z_d$, the topic assignments for the $d$th document, is then determined by $\theta_d$; $Z_{d,n}$ is the topic assignment of the $n$th word of document $d$ (the range of $n$ depends on the document), drawn as $Z_{d,n} \sim \mathrm{Multinomial}(\theta_d)$. $\eta$ is the parameter that determines $\beta_k$: whereas $\theta_d$ represents topic proportions, each $\beta_k$ represents a distribution over words (terms), and the value of $\eta$ governs these word distributions:
$$\beta_k \sim \mathrm{Dirichlet}(\eta)$$
The value of $\eta$ is set to 0.1, as suggested by Griffiths and Steyvers [45].
For each topic, indexed by $k$, $\beta_k$ is the model parameter giving the word proportions within that topic. Finally, $W_{d,n}$, the $n$th word of document $d$, is the observed variable, drawn from the multinomial distribution $p(W_{d,n} \mid Z_{d,n}, \beta_k)$, that is, from the word distribution of its assigned topic, $\beta_{Z_{d,n}}$.
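To make the roles of α, θ_d, Z_{d,n}, η, β_k and W_{d,n} concrete, the following Python sketch simulates LDA's generative process with α = 50/K and η = 0.1, the values cited above. It produces synthetic data only; the function names and the corpus sizes in the example are hypothetical, and the Dirichlet draws use the standard construction of normalizing independent Gamma variates.

```python
import random


def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing independent Gamma(param, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(doc_lengths, vocab_size, K, eta=0.1, seed=0):
    """Simulate LDA's generative process; returns theta, beta, z (topics) and w (word ids)."""
    rng = random.Random(seed)
    alpha = 50 / K                      # Griffiths-Steyvers setting, e.g. 5.0 when K = 10
    # beta_k ~ Dirichlet(eta): one word distribution per topic
    beta = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(K)]
    theta, z, w = [], [], []
    for length in doc_lengths:
        theta_d = sample_dirichlet([alpha] * K, rng)              # theta_d ~ Dirichlet(alpha)
        z_d = rng.choices(range(K), weights=theta_d, k=length)    # Z_{d,n} ~ Multinomial(theta_d)
        # W_{d,n} ~ Multinomial(beta_{Z_{d,n}})
        w_d = [rng.choices(range(vocab_size), weights=beta[k])[0] for k in z_d]
        theta.append(theta_d)
        z.append(z_d)
        w.append(w_d)
    return theta, beta, z, w


theta, beta, z, w = generate_corpus(doc_lengths=[5, 7, 4], vocab_size=20, K=4)
print([round(sum(t), 6) for t in theta])  # each theta_d sums to 1
```

Fitting LDA runs this process in reverse: only w is observed, and θ, β and z are inferred.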
This study performs topic modeling in R version 3.4.4 (R Development Core Team, Vienna, Austria), which is available as free software under the terms of the Free Software Foundation's GNU General Public License (www.r-project.org) and provides a convenient environment for computational data science [41]. The next section presents the results obtained in R with the topicmodels package.
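The text does not say which estimation method was used within the topicmodels package, which offers both variational EM and Gibbs sampling. As one illustration of how the parameters can be inferred from a DTM, here is a minimal Python sketch of the collapsed Gibbs sampler of Griffiths and Steyvers [45], which repeatedly resamples each word's topic from p(Z_{d,n} = k | all other assignments) ∝ (n_{d,k} + α)(n_{k,w} + η)/(n_k + Vη), where V is the vocabulary size. It is a teaching sketch without burn-in handling or convergence checks, not the authors' implementation; the toy corpus is hypothetical.

```python
import random


def gibbs_lda(docs, vocab_size, K, alpha=None, eta=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, vocab_size).

    Returns point estimates (theta, beta) computed from the final sample.
    """
    rng = random.Random(seed)
    if alpha is None:
        alpha = 50 / K
    n_dk = [[0] * K for _ in docs]               # words in document d assigned to topic k
    n_kw = [[0] * vocab_size for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                                 # words assigned to topic k overall
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current assignment from the counts ...
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... and resample from p(Z_{d,n} = j | all other assignments).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + eta) / (n_k[j] + vocab_size * eta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][n] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    beta = [[(n_kw[k][w] + eta) / (n_k[k] + vocab_size * eta) for w in range(vocab_size)]
            for k in range(K)]
    return theta, beta


# Toy corpus of word ids: two "themes" (words 0-1 and words 2-3).
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 1, 0, 1]]
theta, beta = gibbs_lda(docs, vocab_size=4, K=2, iters=100)
```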

Topics on Port Issues in Sustainability
Table 2 lists and visualizes the 10 major topics, ordered by the most highly distributed terms (words) in each paper, together with the top five terms for each topic. The right panel in Table 2 presents a broader view of the topic model. The size of circles represents the distance between each topic in two-dimensional space. Each topic was named by manual examination, based on the researchers' prior knowledge. Many studies related to harbors and ports deal with the following topics: green ports [46][47][48][49], green performance [50], greening of ports [31,51], green port policy [52], employees' green behaviors in ports [11], and green concerns [53]. Topic #3 deals with emissions, policy, pollutants, and cost; examples include emissions in ports [10,54,55], the environmental costs of port-related emissions [56], carbon emission evaluation [57], Emission Control Areas (ECAs) [58,59], and emission tax policy [10]. As Sislian et al. [30] indicate, sustainable development in ports usually focuses on environmental issues. Topics #2, #5, and #7 address environmental issues in industry and port cities, such as the sustainable development of port cities/areas [60], environmental reform [49], environmental performance [61], environmental sustainability in seaports [46,62], environmental management systems [63], environmental risk perceptions [64], environmental management [65], and environmental efficiency [66].
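Tables such as Table 2 are typically built by reading off the highest-probability terms of each β_k, after which topic names are assigned by hand, as described above. A minimal Python sketch, assuming the fitted β (topic-word probabilities) and θ (document-topic proportions) are available as nested lists; the vocabulary and probabilities in the example are hypothetical.

```python
def top_terms(beta, vocab, n=5):
    """The n highest-probability terms of each topic's word distribution beta_k."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: beta_k[w], reverse=True)[:n]]
            for beta_k in beta]


def dominant_topic(theta):
    """Index of the most probable topic in each document's topic proportions theta_d."""
    return [max(range(len(t)), key=t.__getitem__) for t in theta]


# Hypothetical stemmed vocabulary and topic-word probabilities for two topics.
vocab = ["green", "port", "emiss", "polici", "cost", "citi"]
beta = [[0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
        [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]]
print(top_terms(beta, vocab, n=2))             # [['green', 'port'], ['emiss', 'polici']]
print(dominant_topic([[0.7, 0.3], [0.2, 0.8]]))  # [0, 1]
```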

Topics on Shipping Issues in Sustainability
Since the value of the harmonic mean was stable around 14, the optimal number of topics is determined to be 14 as shown in Table 3.

Topics on Maritime Logistics Issue in Sustainability
Using the same method, nine was ascertained to be the optimal number of topics for maritime logistics issues, as shown in Table 4. Topic #1 (logistics, improvement, environmental) can be explained by the title: the logistics system and flow for improving sustainable maritime logistics [6,57,84]. Topics #3 and #4 share the core word "quality", indicating quality management in supply chains [85] and service quality analysis for logistics companies [86]. Methodologically, quality function deployment (QFD) was employed to analyze the sustainable maritime supply chain. Topics #8 and #9, located in quadrant IV (sustainable supply chain design and strategy), can be explained by supply chain design for the sustainability of maritime logistics [87][88][89].

Co-Occurrence Analysis
Co-occurrence analysis [90] identifies key hotspots from high-frequency keywords. Bibliometric methods were used to investigate these hotspots on the basis of the keywords given by the author(s). Eighteen high-frequency keywords (more than five occurrences) related to maritime sustainability were detected in the 155 papers collected for 1993-2017. Apart from "sustainability" itself, "management" had the highest number of occurrences, followed by "port", "emission", "impact", and "performance". "Impact" and "emission" are the most significant nodes, forming the basis through which other terms are linked to maritime sustainability, as shown in Table 5.
Figure 4 shows the network of keywords co-occurring more than five times in maritime sustainability papers, consistent with Table 5. The thickness of the line surrounding each circle indicates the keyword's betweenness centrality [91] (p. 311). "Port" has the highest betweenness centrality of all keywords, although it has only four connections. In other words, "port" has a low degree of connection but high betweenness centrality: its few connections link otherwise separate keywords together, giving it influence across the whole network. Regarding authors' keywords, many sustainability papers on maritime topics use keywords such as emissions, management, impact, and performance.
Figure 5 (as in Figure 4) indicates a high degree of connectivity in the network, while the yellow rim of the circle indicates a low degree of connectivity. The size of the circles indicates the number of papers.
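The keyword network and the betweenness centrality discussed for Figure 4 can be sketched as follows. The betweenness of a node v is the sum, over pairs of other nodes s and t, of the fraction of shortest s-t paths that pass through v; the code computes it with Brandes' algorithm for unweighted graphs. Interpreting "above 5" as "appears in at least six papers", and linking keywords that co-occur in at least one paper, are assumptions; the text does not name the bibliometric tool used, and the keyword lists in the example are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def cooccurrence_graph(keyword_lists, min_papers=6):
    """Undirected keyword graph over keywords found in at least `min_papers` papers.

    Two kept keywords are linked if they appear together in at least one paper.
    """
    freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, count in freq.items() if count >= min_papers}
    graph = {kw: set() for kw in kept}
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws) & kept), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness(graph):
    """Brandes' algorithm: betweenness centrality of each node in an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        preds = {v: [] for v in graph}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted from both ends.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical author-keyword lists, one per paper.
papers = [["port", "emission"]] * 6 + [["port", "impact"]] * 6 + [["port", "rare term"]]
graph = cooccurrence_graph(papers)
print(betweenness(graph))  # "port" scores 1.0: it lies on the only emission-impact path
```

The example mirrors the pattern described for "port": a node with few links can still score highest when it bridges keywords that are otherwise unconnected.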
The keywords related to maritime sustainability in recent years, namely governance, corporate social responsibility, and supply chain management, differ from those of the earlier phase of sustainability research. Figure 6 provides information on co-authorship by country, affiliation, and collaboration between authors. Authors from China have conducted the most research on maritime sustainability. In the timeline view, authors from western countries, including England, Germany, the Netherlands, Spain, Italy, and the USA, have conducted maritime sustainability research. Authors from Asian countries, including Singapore and Taiwan, were among those who undertook research on maritime sustainability at an earlier stage. More recently, researchers from China and South Korea have produced many studies on maritime sustainability, as shown in Figure 6a. A possible explanation is that Asian researchers have begun to study sustainable development in its social, economic, and environmental dimensions, beyond traditional economic growth in the maritime industry. Regarding affiliation, three universities (Nanyang Technological University, Singapore; The Hong Kong Polytechnic University, Hong Kong; and National Taiwan University, Taiwan) are at the center of the research network, with various connections to other academic affiliations, as shown in Figure 6b. Lam and Acciaro are the authors with the most published papers and the most connections to other authors on sustainability in the field of maritime studies (Figure 6c). Figures 6c and 6d were generated from the author information and citation network of the 155 research papers that contain maritime-related keywords together with "sustainability" or "sustainable" in the title, keywords, or abstract. We acknowledge that the work of some other prominent scholars in the field might have been overlooked in this filtering process.

Discussion and Conclusions
This study conducted a comprehensive literature review on sustainability in the field of maritime studies. From SCI/SSCI journals, 155 academic papers related to ports, shipping, and maritime logistics were extracted. Sustainability issues were consolidated by applying latent Dirichlet allocation (LDA) to discover latent topics and the relationships among the text documents. The landscape of sustainability research was illustrated using bibliometric analysis. In this study, a new intellectual structure of the sustainability literature was created, current trends and key co-authorship patterns were mapped, and future research development trajectories in the field of maritime studies were projected.
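Because LDA is the core technique behind these results, a compact illustration of how it infers topics may help readers unfamiliar with it. The sketch below fits LDA with a minimal collapsed Gibbs sampler in plain Python, one standard way of fitting the model. It is not the study's implementation: the toy corpus, the number of topics, the hyperparameters `alpha` and `beta`, and the iteration count are all illustrative assumptions.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word tokens.
    Returns (vocab, theta, phi): per-document topic proportions theta and
    per-topic word distributions phi; every row sums to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]       # document-topic counts
    n_kw = [[0] * V for _ in range(K)]   # topic-word counts
    n_k = [0] * K                        # tokens assigned to each topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
           for t in range(K)]
    return vocab, theta, phi

def top_words(vocab, phi, n=3):
    """The n highest-probability words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

# Hypothetical toy corpus: "green port" abstracts versus "shipping route" abstracts.
docs = [
    ["green", "port", "emission", "policy", "green", "port"],
    ["port", "emission", "green", "policy", "emission"],
    ["route", "network", "cost", "shipping", "route"],
    ["shipping", "cost", "network", "route", "cost"],
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)
print(top_words(vocab, phi))
```

On such a clearly separated toy corpus one would typically expect one topic to be led by words such as "green", "port", and "emission" and the other by "route", "cost", and "network", although the exact outcome depends on the random seed and iteration count.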
The text-mining results reveal the common sustainability issues facing ports and shipping businesses. Broadly, these issues relate to green ports/shipping, carbon emissions/climate change, and region-specific environmental regulation/management. An additional sustainability issue for shipping is the need to optimize shipping routes/networks in order to reduce carbon/greenhouse gas emissions, shorten distances, and lower logistics costs.
For maritime logistics, sustainability issues generally relate to achieving optimal logistics systems, sustainable supply chain design, and service quality management. The co-occurrence analysis identified high-frequency keywords, including sustainability, management, port, emission, impact, performance, model, logistics, system, and framework. More recently, the keywords have shifted to include governance, corporate social responsibility, and supply chain management. The visualization of the mapped data also shows a shift in publications on sustainability from predominantly OECD (Organization for Economic Co-operation and Development) countries, with Singapore and Taiwan as the non-OECD exceptions, toward China and South Korea.
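Keyword co-occurrence can be counted by treating each paper's keyword list as a set and tallying every pair of keywords that appear together in the same paper; keyword frequencies come out of the same pass. The sketch below assumes simple normalization (lower-casing and de-duplication within a paper), and the keyword lists are hypothetical. Bibliometric software may apply further thesaurus merging or count normalization.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(keyword_lists):
    """Return keyword frequencies and co-occurrence counts of keyword pairs.

    Keywords are lower-cased and de-duplicated within each paper, so each
    paper contributes at most one count per keyword and per keyword pair.
    Pairs are stored as alphabetically sorted tuples.
    """
    frequency = Counter()
    cooccurrence = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.strip().lower() for k in keywords})
        frequency.update(unique)
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence

# Hypothetical author-keyword lists for three papers.
papers = [
    ["Sustainability", "Port", "Emissions"],
    ["sustainability", "Governance", "Corporate social responsibility"],
    ["Sustainability", "Port", "Performance"],
]
frequency, cooccurrence = keyword_cooccurrence(papers)
print(frequency.most_common(2))
print(cooccurrence[("port", "sustainability")])
```

The pair counts are the link weights of a co-occurrence network such as Figure 5, and the keyword frequencies correspond to node size.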
A notable finding of this study is that most of the sustainability issues related to ports, shipping, and maritime logistics concern the economic and environmental dimensions of sustainability. Social aspects, such as labor laws and standards, working conditions, maritime employment, regional growth and development, and the livability of communities within port regions, have attracted relatively little academic investigation within the field of maritime studies. Arguably, the sustainability of the maritime industry requires that all three dimensions be addressed equally.
These findings have both theoretical and policy implications. From a theoretical perspective, this study has developed a broader understanding of the major themes and conceptual models to help theorize the notion of sustainability in the context of maritime studies. It enables researchers to see the value of text-mining tools for synthesizing diverse results, to understand the inconsistencies in the extant literature, and to develop new theories and models of maritime sustainability. From a policy perspective, this study provides a grounded platform to help researchers develop maritime sustainability assessment and improvement frameworks for identifying and evaluating the key sustainability principles, guidelines, and measurements employed in the field of maritime studies.
Future research could pay equal attention to the social aspects of maritime management, including the effectiveness of community planning, labor laws and regulations, and strategic policy-making. From a supply chain perspective, single port-to-port chains have received limited attention and require further investigation; competitive port supply chains could therefore be incorporated into the future research agenda. The sustainability of the broader region within which a port operates needs to be integrated into the evaluation of sustainability measures for port functions, shipping services, and maritime logistics operations. In addition, new case studies from developing countries and emerging economies could provide wider insights into sustainability issues in a globalized marketplace. Furthermore, journals could better promote interdisciplinary research across institutions and nations to enhance cross-cultural learning across different business settings and industry practices. Such research would help preserve the natural environment and mitigate the deleterious impacts of maritime operations on natural habitats, while generating growth and employment opportunities for local port communities.

(d) Maritime logistics issue papers in sustainability (n = 13).

By using a document-term matrix (DTM), a word cloud showing the most frequently used terms in the collected documents can be generated; the size of each word corresponds to the frequency of the term across the 155 documents. As illustrated, the stemmed terms "/envirntl-/", "green", "/emiss-/", "/economy-/", "/perform-/", "/polici-/", "/strategy-/", and "/effect-/" are the eight most frequent terms and point to prominent research topics in the field of maritime studies.
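The document-term matrix behind such a word cloud can be sketched as follows: each row is a document, each column a term, and each cell the term's count in that document; summing a column gives the corpus-wide frequency that sets the word's size. The tokenizer, the small stop-word list, and the example abstracts below are hypothetical simplifications, and the study's stemmed terms (e.g., "/emiss-/") came from a stemming step that this sketch omits.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "in", "for", "a", "on", "to"}  # illustrative only

def tokenize(text):
    """Lower-case, keep alphabetic tokens, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def document_term_matrix(texts):
    """Return (terms, matrix) where matrix[d][j] counts terms[j] in texts[d]."""
    token_lists = [tokenize(t) for t in texts]
    terms = sorted({tok for tokens in token_lists for tok in tokens})
    index = {term: j for j, term in enumerate(terms)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(terms)
        for tok in tokens:
            row[index[tok]] += 1
        matrix.append(row)
    return terms, matrix

def top_terms(terms, matrix, n):
    """Sum each column over all documents and return the n most frequent terms."""
    totals = Counter({term: sum(row[j] for row in matrix) for j, term in enumerate(terms)})
    return totals.most_common(n)

# Hypothetical abstracts (not from the study's corpus).
texts = [
    "Green port policy and the reduction of emissions",
    "Emissions performance of green shipping",
    "Port strategy for green logistics",
]
terms, matrix = document_term_matrix(texts)
print(top_terms(terms, matrix, 3))
```

The column totals returned by `top_terms` are exactly the frequencies a word-cloud renderer would map to font size.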
Figure A2. Cluster dendrogram. The cluster tree was generated by measuring each word's similarity to the other words across the 155 documents relating to maritime studies. The y-axis of the dendrogram represents the distance between clusters: the greater the height at which two clusters merge, the greater the difference between them.
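A term dendrogram of this kind can be produced by agglomerative hierarchical clustering: each term starts in its own cluster, the two closest clusters are merged repeatedly, and the distance (height) of each merge is recorded. The sketch below assumes cosine distance between term-count vectors (columns of a document-term matrix) and complete linkage; the caption does not state which distance or linkage the study used, and the term counts are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def agglomerative_clustering(vectors):
    """Complete-linkage agglomerative clustering.

    vectors: dict mapping each term to its count vector across documents.
    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where height is the distance at which the two clusters were joined.
    """
    clusters = [(term,) for term in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                dist = max(cosine_distance(vectors[a], vectors[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical term counts in four documents.
vectors = {
    "green":    [3, 2, 0, 0],
    "emission": [2, 3, 0, 0],
    "route":    [0, 0, 3, 2],
    "cost":     [0, 0, 2, 3],
}
for left, right, height in agglomerative_clustering(vectors):
    print(left, right, round(height, 3))
```

Here the two within-theme merges occur at a height of about 0.077 and the final merge at 1.0; plotted as a dendrogram, these heights form the y-axis, with higher merges joining more dissimilar clusters.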