Educ. Sci. 2016, 6(1), 2; doi:10.3390/educsci6010002
Abstract: This paper describes developments in Welsh-language terminology within the education system in Wales. Following an outline of historical terminology work, it concentrates on the consolidation of terminology standardization at the Language Technologies Unit, Bangor University, with particular reference to two projects, one concerned with terminology for school-age and further education, the second concerned with higher education. The developments described include the adoption of international standards in terminology standardization and their incorporation in an online terminology standardization environment and dissemination platform that enable access to the centralized terminological dictionaries via a number of sophisticated websites, portals and mobile apps featuring rich dictionary entries. Some of the issues in managing large term collections are explored, and usage statistics are presented for the resources described.
1.1. The Case for Terminology Standardization in Wales
Wales is one of the constituent countries of the United Kingdom. At the time of the most recent census (2011) it had a population of 3 million, with 562,016 (19%) having indicated that they were Welsh-speakers. Its education system, for which the Welsh Government has devolved responsibility, provides education through the medium of both English and Welsh, up to and including university level. According to the latest figures, over one in five primary, middle and secondary school pupils in Wales are enrolled in Welsh-medium education. As a result, Welsh-language versions of classroom materials, course specifications and examination papers are produced for a large proportion of the subjects taught up to secondary level. Most of these resources are produced by the examination board, the Welsh Joint Education Committee (WJEC), and by a number of independent suppliers based primarily in Wales, usually with financial assistance from the Welsh Government. In further and higher education, Welsh-medium provision is increasing due to investment from the Welsh Government and the establishment of a virtual Welsh-medium national college, the Coleg Cymraeg Cenedlaethol, which has branches in universities across Wales.
Welsh-medium education therefore encompasses a broad variety of subject domains and age groups, and involves multiple stakeholders and participants, many of whom are involved in producing Welsh-language resources. Terminology standardization, that is, the development and adoption of technical terms by an authoritative body for a specific purpose, is thus an essential consideration as it is important that Welsh-medium students experience a continuity of terminology from course book to examination paper, from subject to subject, and from one educational stage to the next. A lack of continuity would place Welsh-medium students at a disadvantage in comparison to those studying the same course through the medium of English, where the technical terms may be more established.
Bilingual terminological dictionaries differ from traditional bilingual general-language dictionaries in that they do not attempt to list all the meanings and possible target-language equivalents which relate to a source-language word, leaving the user to decide which of the equivalents to use. Rather, terminological dictionaries endeavour to prescribe a single term as the preferred label for a specific concept within a certain domain or subject. Therefore, they are to be used in the context of language for special purposes, rather than for everyday language.
1.2. Historical Terminology Work
Awareness in the education sector of a need for Welsh terminological dictionaries dates back to the early twentieth century, when efforts were being made to bring the language into the school curriculum. A report published in 1927, Welsh in Education and Life, revealed not only a general shortage of resources for teaching Welsh in schools and for teaching other subjects through the medium of the language at university, but also a lack of dictionaries and other terminological resources and the problems this created. There was some way to go before the concept of standardizing terminology would arrive in Wales; however, a year later a standardized orthography for Welsh, Orgraff yr Iaith Gymraeg, was published, stabilizing the development of the language and setting the scene for subsequent terminology work in education. By 1971 close to thirty lexicons were available for a range of subjects including biology, geography, history and physics (pp. 29–35). These were produced by the WJEC and the University of Wales Press, the former concentrating on subjects taught in schools and the latter on university subjects, with some overlap between the two (p. 44). These were followed in 1973 by the first modern Welsh-language terminological dictionary, Geiriadur Termau, published by the University of Wales Press and reflecting “the effort of many people engaged in education in Wales to produce lists of terms required for the teaching of a number of school subjects through the medium of Welsh” (p. vi). This marked a major step forward because it was bi-directional: the previous term lists were in one direction only, reflecting the priorities of the time, namely the need to translate material from English to Welsh (pp. 44–45). It also included information such as grammatical gender, which had been lacking in some older term lists deemed “inadequate” by teachers due to this omission (p. ix).
The introduction to the volume mentions that developing terminology in education was also a problem for other languages not previously widely used in education, but it does not refer in more detail to this, nor does it refer in its methodology to processes used elsewhere or to the concept of standardization in terminology work  (p. 45);  (pp. vi–xii).
In the years between the publication of Geiriadur Termau and the early 1990s, many other terminological dictionaries appeared, produced by various institutions involved in Welsh-medium education, such as the aforementioned University of Wales Press and the WJEC (for a comprehensive list see ). However, as many of these volumes were conceived and published independently within separate institutions, competing terminological dictionaries for the same domain occasionally appeared, with some having been in concurrent development  (p. ii).
1.3. Consolidation of Terminology Standardization in Education
In 1993, the School Curriculum and Assessment Authority (the body then responsible for the tasks and statutory tests which were used to assess pupils at ages 7, 11 and 14 years) put out to tender the work of standardizing terminology for all key stages and all subjects in the National Curriculum (p. v); (p. 46). This new standardized curriculum, introduced in the wake of the Education Reform Act of 1988, was to be taught in state-funded schools in England and Wales. The brief for the terminology project was to “establish objective criteria for standardizing Welsh terminology” and to “develop computer-based databases to manage and store the terminology data” (p. 47). The tender was won by the School of Education at the University of Wales, Bangor (now Bangor University), which set up the Canolfan Safoni Termau, a new Centre for the Standardization of Terminology, in order to carry out the work. This centre was later amalgamated into what is today known as the Language Technologies Unit (LTU), one of five units which make up Canolfan Bedwyr, a Welsh language centre at Bangor University.
To fulfil the first part of the brief, the Canolfan Safoni Termau looked beyond Wales and decided to base its objective criteria for standardization on the work of the International Organization for Standardization (ISO) and specifically on three of its standards: ISO 704 Terminology Work—Principles and Methods , ISO 860 Terminology work—Harmonization of concepts and terms  and ISO/TR 12618 Computational aids in terminology—creation and use of terminological databases and text corpora ;  (p. 47). The policy of adapting these standards to suit the needs of the Welsh language has been described as “crucial in mainstreaming Welsh terminology work and establishing standardization criteria in step with current international best practice”  (p. 194).
The project culminated in the release in 1998 of a single volume combining the terms from all subjects then taught in the National Curriculum. The volume, Y Termiadur Ysgol , built on and standardized the work already undertaken in the many separate subject-specific terminological dictionaries and lists that had been released in the previous decades, with the data for the first time being stored in a database form, albeit on a local machine. This facilitated the creation of a searchable software version of the terminological dictionary which was included on CD with the printed volume. The software version featured sophisticated lemmatization functionality which allowed users to search using inflected forms and still find the appropriate entry.
Y Termiadur Ysgol was followed in 2006 by an expanded edition, Y Termiadur , featuring thousands of additional terms and amendments to some of the terms that had failed to gain purchase in the language. It included, for the first time, terms for vocational and A Level courses. In addition to the print and software edition, Y Termiadur now appeared in website form as a fully searchable terminological dictionary with many of the advanced search facilities found in the software edition.
Work on the third iteration of the Termiadur series, Y Termiadur Addysg , began in 2011, following the publication of the Welsh Government’s Welsh-medium Education Strategy which explicitly included terminology standardization as one of its objectives  (p. 18). In each iteration of the dictionary, the title was changed. The first title, Y Termiadur Ysgol, translates as “the School Terminological Dictionary”, the second Y Termiadur as simply “the Terminological Dictionary”, and the third Y Termiadur Addysg as “the Education Terminology Dictionary”. The reference to education was dropped in the second iteration of the dictionary due to its widespread use among translators and others outside the school system  (p. 161). The current iteration, still ongoing at the time of writing, continues in digital format only. Y Termiadur Addysg is available online and as part of the Cysgliad software package for Microsoft Windows. In 2012, it was made available within apps for Amazon, Android and Apple iOS. Funded by the Welsh Government, its remit has expanded to include terminology from further education courses and domains, in addition to those found in primary, middle and secondary education.
The development of terminology for Welsh-medium higher education (HE) did not take place in parallel to that of terminology for schools in the 1990s as consistent funding for terminology in this sector only began in earnest in 2009. Prior to this, a few subject-specific HE terminological dictionaries were published and funded by different bodies. In 2004, a Dictionary of Terms for Psychology  was printed, followed by a Dictionary of Terms for Woodland Management in 2005 . These were funded by Bangor University in conjunction with the Wales branch of the British Psychological Society and the Forestry Commission Wales, respectively. A new funding partner to Bangor University became involved in 2008 with the publication of a Dictionary of Legal Terms  and the online publication of a Dictionary of Terms for the Creative Industries . This partner was the Centre for Welsh-medium Higher Education (CWMHE). For some years previously, there had been calls for the establishment of a “federal college” to better support and consolidate Welsh-medium provision in HE. Strategies and proposals for its creation coalesced into the founding of the CWMHE, and ultimately led to the establishment in 2011 of a virtual Welsh-medium federal college, the Coleg Cymraeg Cenedlaethol (“Welsh National College”). The aim of the CWMHE and later the Coleg Cymraeg Cenedlaethol was to promote, develop and broaden Welsh-medium provision within Wales’ existing universities by funding Welsh-medium PhDs and lectureship posts within them (in addition to those already funded by the universities themselves) as well as extending the range of subjects available to students in Welsh. In order to increase collaboration between different institutions involved in the development of Welsh-medium provision in specific domains, the CWMHE created “subject panels”, whose members included lecturers from every Welsh university. 
Subject panels could apply to the CWMHE for grants to develop terminology, and the Dictionary of Terms for the Creative Industries was published with one such grant. In 2009 the CWMHE, predicting that the demand for terminology would only increase in future with growing numbers of Welsh-medium lecturers and students, determined that a more efficient and cost-effective way of fulfilling terminological needs in the HE sector would be to fund a single long-term terminology project with a dedicated full-time terminologist. This would avoid funding applications for multiple, parallel terminology projects and would ensure that all terminology work in the sector would be coordinated and would follow the same methodology and dissemination strategies. Conscious of the need to ensure consistency between the terminology taught in schools and that taught at university, the CWMHE chose to base its project in the LTU at Bangor University, where work on the Termiadur series was ongoing. This would not only foster collaboration between both education terminology projects but also enable them to share resources. It also fitted neatly with objective 5.5 of the 2010 Welsh-medium Education Strategy, which noted that the HE sector should contribute to joint working arrangements for the development and standardization of terminology in education  (p. 43). Initially funded for a period of two years, with the inception of the new Coleg Cymraeg Cenedlaethol, the project secured recurrent funding and is ongoing at the time of writing. The first steps taken were to prepare the three print HE terminological dictionaries from 2004 and 2008 for online publication and add these and the fourth online HE terminological dictionary to a new HE terminology portal, where users could search university terminology all in one place. 
These early HE terminological dictionaries included definitions, and it was determined that definitions would continue to be included as an integral part of future terminology work at HE level. More subject-specific terminological dictionaries were added in fields including international politics, sports science, chemistry, mathematics and physics. In 2015, it was decided that these would all be rebranded as one new terminological dictionary, entitled Geiriadur Termau’r Coleg Cymraeg Cenedlaethol, the “Coleg Cymraeg Cenedlaethol Terminological Dictionary” (referred to henceforth as the Coleg Cymraeg dictionary in this article) . At this point, subject fields began to be displayed as sub-fields of the main terminological dictionary (work to tag entries by subject field is underway in the Y Termiadur Addysg project). Currently, the Coleg Cymraeg dictionary features 14 subjects agreed upon by the project board in consultation with subject panels and with reference to the priorities outlined in the Coleg Cymraeg Cenedlaethol’s Academic Plan. In comparison, Y Termiadur Addysg covers terminology for over 70 courses in approximately 35 subjects (the exact number of subjects is open to interpretation as some courses such as Humanities may include an element of a number of different subjects and similar courses, whilst film studies could be considered an aspect of media studies even though both are offered as separate A Level courses.) As a result, Y Termiadur Addysg prioritizes breadth of coverage, while the Coleg Cymraeg dictionary prioritizes depth of knowledge. The Coleg Cymraeg dictionary with definitions is available online and, as of 2015, within apps available for Amazon, Android and Apple iOS. 
Although both the Y Termiadur Addysg project and the Coleg Cymraeg dictionary project standardize terms for the education sector and each employs a single terminologist with additional editorial and technical support, they differ in some respects with regard to their priorities and target users. Whilst the Coleg Cymraeg dictionary is aimed at a fairly homogeneous audience of lecturers, researchers, in-house university translators and undergraduate and post-graduate students, the audience of Y Termiadur Addysg is more diverse. In addition to professionals such as teaching staff and the translators of course materials and examination papers, Y Termiadur Addysg is intended to be used by students of all ages including primary school pupils, and the format and content of the terminological dictionary must therefore be accessible to those studying at all academic levels. Another issue in a bilingual country such as Wales is that many of the parents of students attending Welsh-medium schools are not themselves Welsh-speakers, and there may be varying degrees of Welsh-language proficiency amongst the students themselves. This means that Y Termiadur Addysg must present its information to the user in a simpler manner than that which is required of the Coleg Cymraeg dictionary and it must provide additional user aids, such as text-to-speech, for those to whom the Welsh language is less familiar.
Unlike most of Y Termiadur Addysg’s intended audience, intended users of the Coleg Cymraeg dictionary have chosen to specialize in a specific field and consequently a higher level of expertise can be assumed when designing terminological dictionary entries intended for their use. HE students are required to possess a more detailed understanding of the concepts relating to their studies, and it is important that definitions of the concepts taught in Welsh-medium HE are available in Welsh.
2. Experimental Section: Meeting the Needs
The theoretical basis for Welsh terminology work in education is the ISO 704 standard, which requires that a term be linguistically correct, accurate, concise and monosemous and that it should, ideally, give rise to derivatives. This means that not only should a term comply with linguistic norms, which in the case of Welsh means following the spelling guidelines found in the latest edition of Orgraff yr Iaith Gymraeg (1987), but that it should also reflect, as far as is possible, the characteristics of the concept, and that the term should refer to one concept only. Other ISO standards have informed the work in Wales over the years, and these have formed the basis of a number of guidelines produced by the LTU for various government bodies, the most recent of which appeared in 2007. However, perhaps the second most important ISO document with regard to Welsh term formation and selection is ISO 15188 on Project management guidelines for terminology standardization, which emphasizes the importance of a consensus-based approach in the validation and adoption of terms within subject-specialist communities. As a result, both projects seek the opinion of subject specialists for candidate terms where there is concern over the appropriateness of one word over another as a label for a concept. In the Coleg Cymraeg project, many of the terms to be standardized are submitted to the project by the subject specialists themselves. These concepts must then be defined and the specialists play a major role in the process, with feedback being given either by a small number of individuals or, less frequently, a larger selection of lecturers from the relevant subject panel. This is possible as the number of terms dealt with in this project is comparatively small, with the result that a large percentage of terms is discussed with subject specialists.
These specialists include individuals and organizations external to the HE sector, as was the case with terms for the creative industries where Welsh media professionals and broadcasting companies were involved in standardization. In the Y Termiadur Addysg project the percentage of terms brought to the attention of subject specialists is comparatively smaller, as the terms used in many subject fields have been well established through decades of teaching, and are lifted primarily from published works that have gone through an editorial process by publishers such as the WJEC. However, Y Termiadur Addysg also receives ad hoc enquiries regarding terminology from education stakeholders, including subject specialists, which are then incorporated into the terminological dictionary.
2.1. Term Collection
One of the main tasks facing the terminologist during the creation of a terminological dictionary is that of collecting the terms relevant to a particular subject domain. This includes source language terms (which in the case of Wales means English terms), and candidate target language (Welsh) terms which require standardization. These can be gathered from a variety of different resources of varying provenances, which generally include existing reference works, specific documents and support materials  (p. 116) which belong to the domain in question. Within the context of Y Termiadur Addysg these correspond to historic bilingual terminological dictionaries, English course books and their Welsh translations, annual bilingual examination papers and bilingual course specifications (these specifications describe to educators what the students are required to learn on the course and are therefore a valuable source of terminology.) Past and yet-to-be-published examination papers and course specifications are key resources for the project as it is vital that terms used in an examination are not encountered by the student for the very first time in the examination itself.
Given that such materials have been published for Welsh-medium schools over a period of several decades and continue to be published regularly today, there is a wealth of parallel texts at the disposal of the Y Termiadur Addysg project, from which the terminologist can select English terms and candidate Welsh terms. Subjects covered by these resources are mostly from traditional fields (mathematics, religious education, history), many of which have long been taught through the medium of Welsh. Vocational fields, which were only recently added to the remit of the project, have fewer such resources.
In contrast, in HE, it is only relatively recently that study materials and research have begun to be published in Welsh in some fields (such as solar physics, genetics and sports science). There is the added complication that the more scientific the area of study, the more likely it is that researchers will publish in English, both in order to appeal to a larger readership and in order to publish in prestigious international journals, an important consideration when preparing for the Research Excellence Framework (REF) which assesses the quality of research carried out in HE institutions in the UK.
There are very few bilingual resources available to the Coleg Cymraeg dictionary project for term collection work. The most commonly encountered resource is term lists supplied by lecturers, which include English source terms and the lecturer’s own suggestion for candidate Welsh terms. Another, much rarer possibility, is that an academic might author and submit for editing and publication an entire terminological dictionary for a particular subject field. This has occurred once in the history of the Coleg Cymraeg dictionary project, with submission in 2015 of an earth sciences terminological dictionary by geologist Dyfed Elis-Gruffydd .
Candidate Welsh terms may also be sourced from yet-to-be-published HE course books and published or yet-to-be-published articles from Welsh scholarly journals. A difficulty, however, is that the source English term is not provided and, given that the Welsh candidate term is often a neologism coined by the author, the terminologist must work backwards from the Welsh and decipher which concept is in question in order to arrive at a source term. The more closely the candidate term reflects the characteristics of the concept, the easier it is to achieve this. Articles used for such purposes are primarily published in Gwerddon, a peer-reviewed interdisciplinary e-journal published twice annually and also financed by the Coleg Cymraeg Cenedlaethol . Although the Coleg Cymraeg dictionary project has been involved in the creation of relatively few course books, in 2015, it was involved in the standardization of terms for a book on the foundations of public law. This is the first in a series of law course books, again funded by the Coleg Cymraeg Cenedlaethol, which are set to be published in the coming years. Welsh candidate terms were collected from the volume and standardized in collaboration with its author, the project lead, the Welsh Government’s Chief Jurilinguist and its First Legislative Counsel from 2007 to 2010. The involvement of others involved in terminology standardization in government helps ensure that terms used in the education sector are consistent with those recommended by BydTermCymru, the government’s terminology website for translators . This heralds a new working method for the Coleg Cymraeg dictionary project where the terminologist is involved in the creation of new course material from their initial stages until their publication, offering terminological aid throughout the process.
As the HE terminology project uses materials funded by the same body as that which funds the terminology work, namely the Coleg Cymraeg Cenedlaethol, it avoids a problem which has affected Y Termiadur Addysg, and that is gaining access to the academic texts in which candidate Welsh terms appear. Obtaining printed copies of all such materials produced for schools is a costly venture and, having bought these, many are subject to copyright, leading to difficulties in using the most efficient means of candidate term extraction. Obtaining digital copies from producers is, for the time being, unlikely.
The Coleg Cymraeg dictionary project sources English terms not only from the materials previously mentioned but also, where copyright permits, from HE terminological dictionaries produced for a different pair of languages (as long as English is one of these and acts as a pivot language for the term list). This has occurred once in the Coleg Cymraeg dictionary project, when biologist Adam Oliver Brown gave permission to adapt his French-Language Glossary of Biological Terminology for the Welsh language. This was produced for students of the University of Ottawa, and includes French definitions and English equivalent terms.
In cases where an English term list has been collected yet no equivalent candidate Welsh terms have surfaced in any of the materials mentioned above, the terminologist must find a solution. New Welsh technical terms may be coined, adapted from Welsh general-language words or borrowed from another language entirely. Finding a new Welsh term using one of these methods is rarely problematic. Welsh is a well-developed language that has been used as a medium of literature for fifteen centuries (p. x). It has long-established conventions for combining prefixes, suffixes and word elements, and a tradition spanning nearly two millennia of borrowing from Latin, a language from which many technical terms draw inspiration or are derived. Conventions for the transliteration of chemical names, for instance, have long been established, so that “ethylenediaminetetraacetic acid” would unambiguously be rendered as “asid ethylendeuamintetraasetig”. Welsh terminologists may also look to other Celtic languages for inspiration. If an equivalent term for a given concept already exists, for example in Irish, this may provide a useful pattern to follow, as these languages share common roots and structures with Welsh.
More problematic, perhaps, than creating new terms is achieving consensus regarding the appropriate term candidate where a number of potential equivalents already exist, especially if several of them are in current use. Where a term candidate has gained currency and is linguistically and conceptually acceptable, it is considered best practice (in keeping with the international standards) for the term candidate in question to be selected as the preferred term. Whilst terms in English generally gain currency and become well-established through their continued use within a certain discourse, the discourse in less-resourced languages may not gain enough participants for such a process to reach a conclusion organically. For this reason, less-resourced languages can find themselves required to engage in a greater degree of prescription, as opposed to description, than is the case for better-resourced languages.
Problems include the use of competing forms for a single concept, a single general-language form being used for similar but distinct concepts within the same subject domain, or subject specialists deeming the term candidate(s) inappropriate for conveying the meaning of the concept. For example, the declaration in a court of law of a person’s guilt, which in English is represented by the term “conviction”, appears in Welsh variously as euogfarnu and collfarnu. A familiar, general-language word can come to be used for multiple technical concepts in Welsh because translators prioritize familiarity over technical accuracy. One such example has been the use of ffrwythloni artiffisial (equivalent to “artificial fertilization”) for “artificial insemination”, as strict equivalents for “insemination” such as mewnsemenu are often seen as alien and unfamiliar. However, in this case a “fertilization” equivalent is inappropriate as the process of insemination does not guarantee fertilization. A candidate term deemed inappropriate by subject specialists was gorchudd iâ for “ice sheet” in geology, as the word gorchudd refers to something which “covers” something else, and an ice sheet (properly llen iâ) does not necessarily cover all land features, as mountains may protrude above the ice. A lesser problem is that candidate terms may also be found not to comply with linguistic norms, although usually this is simply caused by orthographic matters such as the incorrect use of accented characters or hyphens and is therefore easily rectified.
The sources used for term collection purposes in Welsh terminology work in the education sector differ according to the project in question. In the same way, the methodology of term collection also differs, with manual or semi-automated extraction procedures being used.
2.2. Term Extraction
Welsh term extraction, as previously mentioned, should in this context be considered to be term candidate extraction, as no judgement is made during the extraction stage as to the termhood of the recorded word form. The extraction method employed within the Y Termiadur Addysg project depends on a number of considerations including the availability of digital versions of resources as well as their copyright status. Where digital copies are available and the copyright situation is favourable, semi-automated term extraction is possible using natural language processing (NLP) techniques. With Welsh texts, the first step is identifying unrecognised word forms that are not present in the LTU’s lexicon of Welsh word forms. The texts are converted into a categorised NLTK corpus where information such as the subject domain and academic level of the text can be retained. Named entities such as personal names, place names and product names are filtered out at this stage, along with common misspellings, English words that are not also Welsh words, boilerplate text (for example “Student Name:”), and so forth. A simple software interface is then used to assist the terminologist in manually classifying the remaining unrecognised word forms into categories such as unrecognised Welsh word, unrecognised English word, word fragment, unrecognised place name, and so on. Word forms are classified in order of their frequency within the corpus, and can then be added to the general lexicon and assigned a part of speech, plural form, conjugation pattern, and so on. In this manner, unrecognised forms in the corpus can be processed relatively quickly into term candidates. Once these unrecognised forms have been added to the lexicon, the corpus can be lemmatized using the LTU’s lemmatizer. Lemmatization is the process of converting inflected forms such as mice and swam in English to their canonical forms of mouse and swim.
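The first step described above, finding word forms that are absent from the lexicon and presenting them in frequency order, can be sketched in a few lines of Python. This is only an illustration and not the LTU's software: the function name `unrecognised_forms`, the toy lexicon and the filter sets are assumptions made for the example, and a plain token list stands in for the categorised NLTK corpus.

```python
from collections import Counter


def unrecognised_forms(tokens, lexicon, named_entities=frozenset(),
                       boilerplate=frozenset()):
    """Return (form, count) pairs for forms missing from the lexicon.

    Named entities and boilerplate tokens are dropped first, and the
    remaining unknown forms are listed with the most frequent first,
    which is the order in which a terminologist would classify them.
    """
    counts = Counter(
        token.lower()
        for token in tokens
        if token not in named_entities
        and token not in boilerplate
        and token.lower() not in lexicon
    )
    return counts.most_common()


# Toy data (hypothetical): a tiny lexicon and one tokenised sentence.
lexicon = {"mae", "yn"}
tokens = ["Mae", "ffotosynthesis", "yn", "Bangor",
          "ffotosynthesis", "cloroffyl"]
queue = unrecognised_forms(tokens, lexicon, named_entities={"Bangor"})
```

Each form in `queue` would then be classified by hand (Welsh word, English word, word fragment, and so on) before being added to the lexicon.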
This process is made more complicated in Welsh by the fact that a word’s beginning can inflect as well as its ending, a phenomenon known as initial consonant mutation. Following lemmatization, statistical techniques using bigrams, trigrams and tf-idf can be used to identify likely term candidates and, within a categorised corpus, to help determine the subject domain to which a term belongs. English term extraction follows a similar approach, with the benefit that English NLP tools are more widely available. Parallel bilingual subject domain corpora have also been created and are used as research tools by the terminologists. Although the Coleg Cymraeg dictionary has not yet made much use of NLP techniques, permission has been obtained to convert the archived issues of Gwerddon, the Coleg Cymraeg Cenedlaethol’s Welsh-medium journal, into a corpus for use in concordance searches and term candidate recognition. Another useful resource with implications for the Coleg Cymraeg dictionary project is the corpus of searchable academic texts being created and expanded through the DECHE Digitising, E-publishing and Language Corpus project. The LTU provides a number of additional, publicly available online monolingual and bilingual searchable corpora as research tools, although these are limited to texts whose copyright terms allow for this usage, or whose copyright has expired.
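The statistical step described above can be illustrated with a minimal sketch. It is not the LTU's implementation: it assumes the corpus has already been lemmatized and tokenized, treats each subject domain as a single document, and uses one common tf-idf variant (relative frequency multiplied by the natural log of the inverse domain frequency). The function names and the tiny English sample corpus are hypothetical.

```python
import math
from collections import Counter

def bigrams(tokens):
    """Return adjacent token pairs from a list of (already lemmatized) tokens."""
    return list(zip(tokens, tokens[1:]))

def tfidf_by_domain(domain_tokens):
    """Score every bigram in every domain, treating each domain as one document."""
    counts = {d: Counter(bigrams(toks)) for d, toks in domain_tokens.items()}
    n_domains = len(counts)
    # Domain frequency: in how many domains does each bigram occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    scores = {}
    for domain, c in counts.items():
        total = sum(c.values()) or 1
        scores[domain] = {
            gram: (freq / total) * math.log(n_domains / df[gram])
            for gram, freq in c.items()
        }
    return scores

# Hypothetical sample data, in English for readability.
corpus = {
    "biology": "cell membrane surround cell wall for example cell membrane".split(),
    "physics": "electric field surround electric charge for example electric field".split(),
}
scores = tfidf_by_domain(corpus)
top_biology = max(scores["biology"], key=scores["biology"].get)
print(top_biology)  # ('cell', 'membrane')
```

A bigram such as ("for", "example") occurs in both domains and therefore scores zero, whereas ("cell", "membrane") is frequent in one domain only and ranks highest there. This is the sense in which such scores can suggest both termhood and subject domain; the ranked output would still need review by a terminologist.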
Unfortunately, the use of natural language processing techniques is not always feasible. With printed publications, terms must often be extracted manually, as copyright or general availability issues mean that digital copies are often not available. Where copyright issues exist, the creation of digital copies through scanning and optical character recognition (OCR) would not be legal. Where this is not an issue, modern OCR can be used quite effectively with Welsh, as with English, allowing for successful digitization. However, the complicated layouts of many course books, with multiple columns, images and boxouts, can make post-editing OCR output too time-consuming to be worthwhile. One of the problems encountered when extracting new terms manually from a text is that the same term may arise many times. Without technological support it is difficult to track whether a term has already been recorded, leading to repeated manual recording of the same term. To accelerate the manual term extraction process in the Y Termiadur Addysg project, an autocomplete feature was added to the term extraction interface. This is linked to the list of term candidates previously recorded and allows the operator to establish with only a couple of keystrokes whether a term has already been collected, and to record the specific instance of the term, should term frequencies need to be recorded.
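One way such an autocomplete lookup could work is sketched below. The project's actual interface is not described in detail, so the class name, its methods and the sample terms are assumptions made for illustration. The sketch keeps recorded candidates in a sorted list so that prefix matches can be found by binary search, and counts each recorded instance.

```python
import bisect

class CandidateIndex:
    """Sorted store of recorded term candidates with prefix lookup (hypothetical)."""

    def __init__(self):
        self._terms = []   # kept sorted so prefixes can be located by bisection
        self._counts = {}  # term -> number of instances recorded so far

    def record(self, term):
        """Record one instance of a term and return its running count."""
        key = term.strip().lower()
        if key not in self._counts:
            bisect.insort(self._terms, key)
            self._counts[key] = 0
        self._counts[key] += 1
        return self._counts[key]

    def complete(self, prefix, limit=5):
        """Return up to `limit` recorded terms beginning with `prefix`."""
        key = prefix.strip().lower()
        start = bisect.bisect_left(self._terms, key)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(key) or len(matches) == limit:
                break
            matches.append(term)
        return matches

index = CandidateIndex()
for term in ["ffotosynthesis", "ffoton", "morlo", "ffotosynthesis"]:
    index.record(term)

print(index.complete("ffo"))  # ['ffoton', 'ffotosynthesis']
```

Typing a short prefix shows at once whether a candidate is already in the list, and calling `record` again for an existing term only increments its count, which preserves term frequencies without creating duplicate records.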
2.3. Ensuring Consistency
Consistency in terminology between levels of education, subjects and educational institutions is a high priority, as it is vital to ensure that children moving from one stage of education into the next are not confronted with new terms which differ greatly from terms used for related concepts already discussed in earlier years. Although in Wales there are only two terminological dictionary projects which are specific to the education sector, many other terminological dictionaries have been commissioned over the years by other bodies in other sectors, and these contain terminology which is relevant to students and others involved in education. In 2010, the LTU launched its Welsh National Terminology Portal, which features 18 terminological dictionaries developed by the unit itself and its approved partners. Many of these had previously been available only in hard copy. The portal allowed users, for the first time, to search all of these dictionaries simultaneously using a single search box. Although this new, powerful search option was a boon to terminologists, translators and many others, an unexpected side-effect was that it served in some instances to highlight inconsistencies between terminological dictionaries that were meant to be consistent. Such inconsistencies included two different preferred terms being given for a single concept in two terminological dictionaries, or differing parts of speech being recorded for the same term. As a result, a program was developed within the LTU to identify potential examples of these inconsistencies and bring them to the attention of the terminologists.
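The kind of check such a program might perform can be sketched as follows. This is an illustration only: the LTU program itself is not described here, and the entry fields (`dictionary`, `en`, `sense`, `cy`, `pos`), the dictionary names and the placeholder terms are all hypothetical. Entries are grouped by English term and disambiguator, and a group is flagged when more than one dictionary covers it and the Welsh preferred term or part of speech differs.

```python
from collections import defaultdict

def find_inconsistencies(entries):
    """Flag concepts whose Welsh term or part of speech differs across dictionaries."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry["en"].lower(), entry.get("sense", ""))].append(entry)

    report = []
    for concept, group in sorted(groups.items()):
        dictionaries = {e["dictionary"] for e in group}
        if len(dictionaries) < 2:
            continue  # nothing to compare against
        welsh = {e["cy"] for e in group}
        pos = {e["pos"] for e in group}
        if len(welsh) > 1 or len(pos) > 1:
            report.append({
                "concept": concept,
                "cy": sorted(welsh),
                "pos": sorted(pos),
                "dictionaries": sorted(dictionaries),
            })
    return report

# Hypothetical entries; "term-x" and "term-y" are placeholders, not real Welsh terms.
entries = [
    {"dictionary": "Dictionary A", "en": "seal", "sense": "sea mammal", "cy": "morlo", "pos": "n"},
    {"dictionary": "Dictionary B", "en": "seal", "sense": "sea mammal", "cy": "morlo", "pos": "n"},
    {"dictionary": "Dictionary A", "en": "example term", "cy": "term-x", "pos": "n"},
    {"dictionary": "Dictionary B", "en": "example term", "cy": "term-y", "pos": "n"},
]
report = find_inconsistencies(entries)
```

Here only "example term" is reported, since the two dictionaries agree on "seal". The output is a list of potential inconsistencies for a terminologist to review, not an automatic correction, because a difference may be legitimate where two dictionaries treat genuinely different concepts.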
Term collection and standardization are common to both the terminology projects currently underway in the Welsh education sector. However, a further element of terminology work, namely definition writing, is unique to the HE project. The decision to include definitions depends on project priorities and resources. The priority for the Termiadur series has, since the beginning, been to standardize a great many terms within the timeframe of the project, in order to fill a considerable gap in the terminology required for the education sector. The Coleg Cymraeg dictionary, however, was developed later, when the groundwork for terminology in education had already been carried out. It was therefore possible to concentrate on a smaller number of concepts, often within a more specialized field, and to provide in-depth information about them through the inclusion of definitions.
In the Coleg Cymraeg dictionary, initial drafts of definitions are prepared either by the terminologist, using reference books and articles, websites and other terminological dictionaries as a guide, or by the subject specialist, using his or her own knowledge of the concept. The draft is then discussed by the terminologist and the subject specialists and fine-tuned to ensure that it is accurate and clear, and that it complies with ISO 704. The definition is, in most cases, translated so that it is available in both Welsh and English. This process tends to highlight any unclear or ambiguous phrasing, which is then corrected; providing the definition bilingually therefore often increases the clarity of the concept. Compliance with ISO 704 means that the following problematic structures are avoided:
incomplete definitions which are too broad in scope;
negative definitions, which explain what the concept is not, without explaining what it is;
circular definitions which repeat the term within the definition and do not add to the reader’s understanding of the concept.
Definitions include the essential characteristics of the concept and provide sufficient information for the student to identify a concept and differentiate between it and other similar concepts, as well as to understand the relationship between related concepts. Definitions often include rich text features made possible by recent technical developments to the in-house system, Maes T. These include clickable cross-references to related concepts, graphical elements such as diagrams and photographs, and mathematical formulae. More than one method of enabling these was explored; the one which best suited the needs of the Coleg Cymraeg dictionary relies on extensive use of Markdown, together with LaTeX and a small subset of HTML elements, such as the “img” tag.
3. Results and Discussion
3.1. Dissemination of Terms
Recent years have seen great changes in the technologies used to construct, standardize and distribute terminological resources, especially in the field of web-based services and mobile connectivity, increasing the technological demands placed on those developing the resources. In order to deal with these demands, the Welsh terminology projects for the education sector share resources and technical personnel. A software developer is part-funded by both projects to develop and improve the in-house terminology standardization and dissemination platform, Maes T, the Welsh terminology app, and the web-based dictionaries.
The Maes T system is an online terminology development interface for creating, editing and publishing dictionary entries. One of the main drivers for its creation was the need to enable teams of geographically dispersed subject specialists to contribute to standardization, in order to adhere to the consensus-based, concept-led principles that underpin modern terminology standardization work. Using this shared platform ensures consistency in standardization methodology across both the Y Termiadur Addysg and the Coleg Cymraeg projects. It is used for storing all the required term data, including collected source terms and candidate target terms, definitions and linguistic information. An invaluable feature of its design is that it allows discussions about term candidates and definitions to take place between subject specialists and terminologists, archiving all such information for future reference. Members of the public do not have access rights to this system; only published terms and certain data fields (excluding discussions about the suitability of candidate terms) are visible to end users of the dictionaries. Maes T serves as a platform from which to disseminate dictionaries to websites belonging to commissioners of terminology work, to the Welsh National Terminology Portal, and to apps.
Y Termiadur Addysg and the Coleg Cymraeg dictionary are disseminated exclusively in digital format, for a number of reasons. Firstly, digital editions eliminate the cost of publishing print dictionaries. Secondly, they allow the dictionaries’ content to be updated: new entries may be added, and amendments made to any terms which have changed over time due to the adoption and use of a different term by the public. This situation arises in the case of terms from fields such as IT and sports, where terms trickle down to the general public, unlike terms in domains which remain primarily the preserve of subject specialists. Such amendments are tagged in the database and added to a list of amendments on the website. Thirdly, publishing online and in apps allows users instant access to terms on devices they carry with them everywhere, making the dictionaries much more convenient and portable. The move towards publishing in apps is a more recent development and was driven by two factors. Firstly, students and lecturers were requesting that content be made available in this medium; secondly, unlike websites, apps allow content to be stored on a device and accessed without an internet connection. The Maes T platform, developed in house by the LTU, is used for publication of dictionary entries.
Y Termiadur Addysg is disseminated to its own dedicated dictionary website, while the Coleg Cymraeg dictionary is disseminated to the Coleg Cymraeg Cenedlaethol’s institutional website. Each website hosts its own separate terminological dictionary in a fully searchable form powered by the LTU’s terminology distribution platform (for more detail see ). The search facility is enhanced with lemmatization so that users searching with an inflected word can find the appropriate dictionary headword. This is an especially useful feature for learners and for the non-Welsh-speaking parents of Welsh-medium students, who may not recognise the relationship between an inflected form and the canonical form, especially when (due to initial consonant mutation) those forms do not begin with the same letter. In addition to searching for the required terms, users can also browse the terms in alphabetical order. Whilst both websites share much of the underlying architecture and layout, entries differ in some regards, owing to the differing needs of their users and the nature of the content found in each dictionary. Y Termiadur Addysg’s entries, for example, feature audio versions of the terms that can be listened to by clicking on a button within the dictionary entry, a feature not shared by the Coleg Cymraeg dictionary. This facility was requested by the Welsh Government specifically to help learners and non-Welsh speakers, as it can be difficult for those not accustomed to Welsh orthography to associate the written form of certain letters with their sound. For example, the Welsh digraph “dd” corresponds to the voiced dental fricative written “th” in English words such as “the”. The audio entries were produced using text-to-speech software developed by Ivona for the Royal National Institute of Blind People (RNIB) as part of a Welsh Government grant. 
Whilst the synthesized speech is not always perfect, it represents the most natural-sounding Welsh synthetic speech engine currently available, and the software can be obtained for free for non-commercial purposes from the RNIB. Whilst Y Termiadur Addysg’s entries feature audio versions of the headwords, they do not feature definitions, which are found in the majority of the Coleg Cymraeg dictionary’s entries. This reflects the previously mentioned difference in priorities between the two projects. In the absence of definitions, Y Termiadur Addysg uses disambiguation text to differentiate between multiple concepts that share the same word form. For example:
seal (=piece of wax etc.) sêl
seal (=sea mammal) morlo
Whilst the Coleg Cymraeg dictionary also uses disambiguation texts to some degree, definitions and the tagging of distinct subject domains make their use less necessary.
In addition to being available within both these websites, the terminological dictionaries are also aggregated within the Welsh National Terminology Portal. For many users, this makes the portal rather than the project website the first port of call for terminological enquiries as it obviates the need for identical searches on multiple websites and displays the resulting dictionaries’ entries together on the same screen for ease of comparison.
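The benefit of a mutation-aware search on the dictionary websites can be illustrated with a small sketch. It is not the LTU's lemmatizer, which is not described here. It only reverses the standard Welsh initial consonant mutations (soft, nasal and aspirate) to propose possible radical forms, and keeps those that are actual headwords. The function names and the tiny headword set are assumptions, and suffix inflection such as plurals and verb endings is not handled.

```python
# (mutated initial, radical initial) pairs for soft, nasal and aspirate mutation.
REVERSALS = [
    ("ngh", "c"), ("mh", "p"), ("nh", "t"), ("ng", "g"),  # nasal
    ("m", "b"), ("n", "d"),                               # nasal
    ("ph", "p"), ("th", "t"), ("ch", "c"),                # aspirate
    ("b", "p"), ("d", "t"), ("g", "c"),                   # soft
    ("f", "b"), ("f", "m"), ("dd", "d"),                  # soft
    ("l", "ll"), ("r", "rh"),                             # soft
    ("", "g"),                                            # soft: initial g is dropped
]

def radical_candidates(form):
    """Return the form itself plus every form obtained by undoing one mutation."""
    form = form.lower()
    candidates = [form]
    for mutated, radical in REVERSALS:
        if form.startswith(mutated):
            candidate = radical + form[len(mutated):]
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates

def lookup(form, headwords):
    """Keep only the candidates that are real headwords."""
    return [c for c in radical_candidates(form) if c in headwords]

# Hypothetical headword list: cat, seal, garden, head.
headwords = {"cath", "morlo", "gardd", "pen"}
for query in ["gath", "nghath", "chath", "forlo", "ardd", "mhen"]:
    print(query, "->", lookup(query, headwords))
```

The reversal step deliberately over-generates (for instance, "forlo" yields both "borlo" and "morlo"), and the headword filter discards the spurious forms. A user who searches for "nghath" or "ardd" is therefore still led to "cath" or "gardd", even though the search word and the headword begin with different letters.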
In 2012, Y Termiadur Addysg was made available along with a general-language dictionary as part of an app entitled Ap Geiriaduron (“Dictionaries App”) for Google’s Android operating system, Amazon OS and Apple iOS. This app was created by a graduate developer who was funded to work at the LTU by a Graduate Opportunities Wales grant. It proved so popular that the dedicated Y Termiadur Addysg app was later discontinued as users preferred to install a single dictionaries app, especially on devices with limited available memory. In 2015 the Coleg Cymraeg’s dictionary was added to the Ap Geiriaduron with the update also introducing the features unique to that project, namely definitions with images, cross references and support for complex mathematical formulae using MathJax. In addition to offline searching of the installed dictionaries, when connected to WiFi users can choose to search the Welsh National Terminology Portal from within the app.
An issue which had to be addressed when moving from online dissemination to apps was the low storage space on devices such as mobile phones. This was not a problem initially as no definitions were included in the first version of the Ap Geiriaduron. With the move to include the Coleg Cymraeg dictionary, however, the amount of space required for downloading definitions had to be considered. This was resolved by optimizing the database schema and keeping the size of definition data to a minimum by post-processing Markdown and LaTeX on the device rather than on the server (since Markdown and LaTeX are compact and efficient formats). Despite such challenges, the development of an app interface for the terminological dictionaries has proved opportune as the use of mobile devices to access digital media has increased significantly in recent years, a tendency which is visible in the usage statistics.
3.2. Evaluating the Work
The main concern in terminology standardization work carried out for Welsh-medium education is that the required terms be rigorously standardized and disseminated to stakeholders as quickly as possible. Evaluating the impact of this work on the stakeholders and chronicling developments in Welsh terminology for other terminologists outside Wales is a secondary consideration.
Having said that, some qualitative and quantitative evaluation has been carried out. A case study entitled “Welsh Lexicography and Terminology” was submitted to the REF 2014 Impact Assessment exercise as part of the Bangor University School of Linguistics’ submission. It included Y Termiadur Addysg as one of its two principal outputs. Together with the LTU’s work on digitizing a general-language dictionary entitled The Welsh Academy English-Welsh Dictionary: Geiriadur yr Academi (1995), it contributed directly to Bangor University gaining second place overall in the UK in terms of social, economic and cultural impact in the Modern Languages and Linguistics Unit of Assessment.
With regard to quantitative evaluation, some statistics on the usage of terminological dictionaries for the education sector have been collected, but it is not always possible to draw direct comparisons between them all. The following table presents an overview of the number of searches recorded on the Y Termiadur Addysg website, the Coleg Cymraeg dictionary website and on the Welsh National Terminology Portal, which includes both these dictionaries (see Table 1). The figures date from 18 September 2015 and are taken from Prys, Prys and Jones (p. 357).
| Website | Launch Date | Total Searches | Average Searches/Month |
|---|---|---|---|
| Y Termiadur Addysg | August 2011 | 568,136 | 11,595 |
| Coleg Cymraeg Dictionary | March 2010 | 18,824 | 285 |
| Welsh National Terminology Portal | March 2010 | 836,414 | 12,673 |
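The monthly averages in the table are consistent with dividing each total by the number of whole months between the launch month and September 2015. The short check below assumes that this is how the averages were derived; the source does not state the method, and because only the launch month is given, the day of the month used here is arbitrary.

```python
from datetime import date

def months_between(launch, snapshot):
    """Whole calendar months from the launch month to the snapshot month."""
    return (snapshot.year - launch.year) * 12 + (snapshot.month - launch.month)

SNAPSHOT = date(2015, 9, 18)  # date of the figures in the table

# (launch month, total searches); the day of the month is an arbitrary placeholder.
sites = {
    "Y Termiadur Addysg": (date(2011, 8, 1), 568_136),
    "Coleg Cymraeg Dictionary": (date(2010, 3, 1), 18_824),
    "Welsh National Terminology Portal": (date(2010, 3, 1), 836_414),
}

averages = {
    name: round(total / months_between(launch, SNAPSHOT))
    for name, (launch, total) in sites.items()
}
print(averages)
# {'Y Termiadur Addysg': 11595, 'Coleg Cymraeg Dictionary': 285,
#  'Welsh National Terminology Portal': 12673}
```

The periods are 49 months for Y Termiadur Addysg and 66 months for the other two sites, and the rounded results match the table.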
These figures are very encouraging for both the Y Termiadur Addysg and the National Portal websites. Several factors may account for the disparity between the search figures for these two sites and those for the Coleg Cymraeg dictionary website. The number of school-aged pupils who study through the medium of Welsh far exceeds that of Welsh-medium university students. The Termiadur series is, in addition, a longer-running and therefore better-known resource. University students who are already familiar with Y Termiadur Addysg may continue using it, or use the National Terminology Portal to access it and the Coleg Cymraeg dictionary simultaneously. Anecdotal evidence also suggests that users such as translators prefer to use the National Portal for speed and convenience, as it eliminates the need to search multiple terminological dictionary sites. A final consideration is that the Y Termiadur Addysg website and the National Portal are dedicated terminological dictionary sites, whereas the Coleg Cymraeg website features a host of valuable resources for university students, so much so that students may not always be aware of the exact range of resources available. Perhaps increased marketing of the Coleg Cymraeg dictionary website would increase traffic to it; in fact, in a survey of the Coleg Cymraeg terminology service and resources carried out in 2015, the majority of the 111 respondents (primarily lecturers and students) believed the terminological dictionary website to be insufficiently marketed. With the launch of the dictionary in the Ap Geiriaduron, recent marketing efforts have concentrated on this platform. Such efforts include advertising the app on Twitter and Facebook and, in Welsh universities, advertising on large information screens in locations such as libraries and student service departments, as well as on screensavers used in student computer rooms.
It is not possible to make direct comparisons between the statistics of the websites above and those available for the Ap Geiriaduron as information about the searches undertaken by users on their devices is not sent back to the LTU’s servers (due to security and privacy considerations). Since the launch of the app in late 2012, however, it has been downloaded over 50,000 times, the vast majority of these downloads occurring in the UK. This is a significant figure, given the total number of Welsh speakers. Approximately 70% of the downloads were for iOS (the operating system found on iPads and iPhones), 28% for Android, and 2% for Amazon’s version of Android, FireOS.
These usage statistics, together with the REF results with regard to social impact, are clear indications that terminology for all levels of education is not only deemed vital by those who aim to develop Welsh-medium provision (i.e., commissioners of terminological dictionaries), but is also considered a key resource by all those involved in education, be they teachers, lecturers, pupils, students, parents or translators.
4. Conclusions
Important steps have been taken in terminology standardization for Welsh-medium education since the early 1990s. These include the adoption of international standards and their implementation within the collaborative terminology development environment for Welsh, and the creation of a dissemination platform to deliver standardized Welsh-medium terminological dictionaries in a number of different websites and apps to cater for the varied needs of different clients and users. The consolidation of terminology standardization work at the LTU and the continued funding of both the Y Termiadur Addysg project and the Coleg Cymraeg dictionary project have played a key role in these successes. However, the creation of standardized terminological dictionaries and the establishment of the underlying support infrastructure is but a first step. Fields of study change, and resources such as examination papers and updated course specifications are produced in regular cycles. New subjects such as music technology may be offered in the language for the first time. This creates a steady demand for new and updated terminology, reflecting the fact that the standardization of terminology is a continuous process rather than a task that can eventually be brought to a permanent conclusion.
Ensuring consistency of terminology and efficiency in term candidate identification is a challenge when managing multiple terminological dictionaries that include thousands of entries. This is increasingly being addressed with the use of natural language processing techniques. However, the development of NLP is central to the remit of neither the Y Termiadur Addysg project nor the Coleg Cymraeg dictionary project, and the focus of both projects must therefore remain on the creation and standardization of terminological dictionary entries. Another issue is that of successfully engaging with all of the potential users, as the projects receive no dedicated marketing budget. Despite this, the usage statistics are encouraging, especially those for the app, and it is hoped that the projects will be able to build on the solid foundation that has been laid by continuously expanding the number of terms available, keeping abreast of the relevant technologies and improving the marketing of resources so that all potential users are aware of the terminological dictionaries available to them.
Author Contributions
The authors contributed equally to this work. Information relating to the Coleg Cymraeg project was provided by Tegau Andrews and that relating to the Y Termiadur Addysg project was provided by Gruffudd Prys.
Conflicts of Interest
The authors declare no conflict of interest.
References and Notes
- StatsWales. Welsh Speakers by Local Authority, Gender and Detailed Age Groups. 2011 Census. Available online: https://statswales.wales.gov.uk/Catalogue/Welsh-Language/WelshSpeakers-by-LocalAuthority-Gender-DetailedAgeGroups-2011Census (accessed on 18 November 2015).
- Statistics for Wales. Key Education Statistics Wales 2015. Available online: http://gov.wales/docs/statistics/2015/150429-key-education-statistics-2015-en.pdf (accessed on 18 November 2015).
- Great Britain. Welsh in Education and Life; Her Majesty’s Stationery Office: London, UK, 1927.
- Welsh Language Board. The Panel for Official Welsh: Report. Available online: http://www.byig-wlb.org.uk/English/publications/Publications/4848.pdf (accessed on 20 February 2009).
- Prys, D. Setting the Standards: Ten Years of Welsh Terminology Work. In Terminology, Computing, and Translation; ten Hacken, P., Ed.; Narr: Tübingen, Germany, 2006; pp. 41–55.
- Williams, J.L., Ed.; Geiriadur Termau/Dictionary of Terms, 1st ed. reprinted; University of Wales Press: Cardiff, UK, 1991.
- Prys, D.; Jones, J.P.M.; ap Emlyn, H. Llyfryddiaeth Geiriaduron Termau; University of Wales: Bangor, UK, 1995.
- Hughes, E., Ed.; Termau Amaethyddiaeth a Milfeddygaeth; University of Wales Press: Cardiff, UK, 1994.
- Prys, D., Jones, J.P.M., Eds.; Y Termiadur Ysgol; ACCAC: Cardiff, UK, 1998.
- UK Parliament. The Evolution of the National Curriculum: From Butler to Balls. Available online: http://www.publications.parliament.uk/pa/cm200809/cmselect/cmchilsch/344/34405.htm (accessed on 18 November 2015).
- International Organization for Standardization. ISO 704: Terminology Work—Principles and Methods; International Organization for Standardization: Geneva, Switzerland, 2000.
- International Organization for Standardization. ISO 860: Terminology Work—Harmonization of Concepts and Terms; International Organization for Standardization: Geneva, Switzerland, 1996.
- International Organization for Standardization. ISO/TR 12618: Computational Aids in Terminology—Creation and Use of Terminological Databases and Text Corpora; International Organization for Standardization: Geneva, Switzerland, 1994.
- Prys, D. Providing the Terms: Standardizing Terms for Education in Wales. In Speaking in Tongues: Languages of Lifelong Learning; Davidson, I., Murphy, D., Piette, B., Eds.; University of Wales, Bangor: Bangor, UK, 2003; pp. 193–226.
- Prys, D.; Jones, J.P.M.; Davies, O.; Prys, G. Y Termiadur; ACCAC: Cardiff, UK, 2006.
- Prys, G., Prys, D., Eds.; Y Termiadur Addysg. Available online: www.termiaduraddysg.org (accessed on 18 November 2015).
- Welsh Assembly Government. Welsh-medium Education Strategy. Available online: http://gov.wales/docs/dcells/publications/100420welshmediumstrategyen.pdf (accessed on 18 November 2015).
- Prys, D. Developing National Terminology Policies: A Case Study from Wales. In Magyar Terminológia Journal of Hungarian Terminology; Akadémiai Kiadó: Budapest, Hungary, 2011; Volume 4, pp. 160–168.
- Spencer, Ll., Edwards, M., Prys, D., Thomas, E., Eds.; Geiriadur Termau Seicoleg/Dictionary of Terms for Psychology; University of Wales: Bangor, UK, 2004.
- Pommerening, A., Prys, D., Eds.; Geiriadur Termau Rheoli Coetiroedd/Dictionary of Terms for Woodland Management; University of Wales: Bangor, UK, 2005.
- Jones, D.Ll., Prys, D., Davies, O.L., Eds.; Geiriadur Termau’r Gyfraith/Dictionary of Legal Terms; Bangor University: Bangor, UK, 2008.
- Prys, D., Davies, O. Ll., Eds.; Geiriadur y Diwydiannau Creadigol/Dictionary of Terms for the Creative Industries; Centre for Welsh Medium Higher Education: Carmarthen, UK, 2008.
- Andrews, T., Prys, D., Eds.; Geiriadur Termau’r Coleg Cymraeg Cenedlaethol. Available online: http://www.colegcymraeg.ac.uk/termau (accessed on 18 November 2015).
- Redknap, C.; Lewis, W.G.; Williams, S.R.; Laugharne, J. Welsh-medium and Bilingual Education. Available online: http://www.bangor.ac.uk/addysg/publications/Welsh_mediumBE.pdf (accessed on 18 November 2015).
- Prys, D.; Jones, D.B. Guidelines for the Standardization of Terminology for the Welsh Assembly Government and the Welsh Language Board; Welsh Language Board: Cardiff, UK, 2007.
- International Organization for Standardization. ISO 15188: Project Management Guidelines for Terminology Standardization; International Organization for Standardization: Geneva, Switzerland, 2001.
- Cabré, M.T. Terminology: Theory, Methods, and Applications; e-book; John Benjamins Publishing Company: Amsterdam, The Netherlands; Philadelphia, PA, USA, 1999.
- Elis-Gruffydd, D. Geiriadur Daearegol/Geomorffolegol Sylfaenol. Unpublished as a standalone terminological dictionary; incorporated in the Coleg Cymraeg Cenedlaethol Terminological Dictionary, 2015.
- Gwerddon. Available online: http://www.gwerddon.cymru/cy/gwerddon/ (accessed on 18 November 2015).
- Welsh Government. BydTermCymru. Available online: http://cymraeg.gov.wales/btc/?lang=en (accessed on 18 November 2015).
- Brown, A.O. Glossaire de Termes Biologiques/French-Language Glossary of Biological Terminology. Available online: http://adamoliverbrown.com/glossary/ (accessed on 18 November 2015).
- Fiontar. Téarma.ie: National Terminology Database for Irish. Available online: http://www.tearma.ie/Home.aspx (accessed on 18 November 2015).
- Welsh National Language Technologies Portal. Available online: http://techiaith.cymru/?lang=en (accessed on 18 November 2015).
- Roberts, M., Ed.; DECHE Corpus of Welsh Scholarly Writing. Available online: http://corpws.cymru/deche/ (accessed on 18 November 2015).
- Andrews, T.; Prys, G.; Jones, D.B.; Prys, D. Distributing Terminology Resources Online: Multiple Outlet and Centralized Outlet Distribution Models in Wales. In Proceedings of CHAT 2012: Creation, Harmonization and Application of Terminology Resources, Madrid, Spain, 22 June 2012; Gornostay, T., Ed.; Linköping University Electronic Press: Linköping, Sweden, 2012; pp. 37–40.
- Andrews, T.; Prys, G. The Maes T System and its use in the Welsh-Medium Higher Education Terminology Project. In Proceedings of CHAT 2011: Creation, Harmonization and Application of Terminology Resources, Riga, Latvia, 11 May 2011; Gornostay, T., Vasiļjevs, A., Eds.; NEALT: Tartu, Estonia, 2011; pp. 49–50.
- Bangor University. REF 2014. Available online: http://www.bangor.ac.uk/cbless/ref_2014.php.en (accessed on 18 November 2015).
- Prys, D.; Prys, G.; Jones, D.B. Quantifying the Use of Digital Welsh-language Language Resources. In Proceedings of the Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, 27–29 November 2015; Vetulani, Z., Mariani, J., Eds.; Fundacja Uniwersytetu im. Adama Mickiewicza w Poznaniu: Poznań, Poland, 2016; pp. 355–359.
- Coleg Cymraeg Cenedlaethol. Results of the Terminological Dictionary Questionnaire. Unpublished survey carried out online, 2015.
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).