Web Mining of Online Resources for German Labor Market Research and Education: Finding the Ground Truth?

The labor market is highly dependent on vocational and academic education, training, retraining, and further education in order to master challenges such as advancing digitalization and sustainability. Further training is a key factor in ensuring a qualified workforce, the employability of all employees, and, thus, national competitiveness and innovation. In the contribution at hand, we explore an innovative way to derive knowledge about learning pathways by connecting the dots from different data sources of the German labor market. In particular, we focus on the web mining of online resources for German labor market research and education, such as online advertisements, information portals


Introduction
The web mining of data about the German labor market offers new ways to derive knowledge about learning pathways by connecting the dots between different data sources. The labor market is a domain with a variety of data structures connected to a variety of related applications (e.g., recommending suitable jobs to job seekers, listing skills for occupational titles, or selecting suitable candidates for a job). Labor market research is mostly based on traditional methods such as surveys or the analysis of official statistical data (e.g., [1]). In the paper at hand, we explore a different approach: sourcing, analyzing and linking open data on various aspects of the labor market through the web mining of online resources. As a proof of concept, we build and extend a network of education pathways.
This is an important issue, as there is no ground truth for this type of network. Not all programs are officially regulated by the state or federal government, and chambers do not always publish relevant data. In addition, not all education pathways are formalized; some emerge from the offers on the market (i.e., rather informal learning pathways, cf. [2]).
There are a number of web resources offered by the government and official institutes. For example, the Federal Employment Agency (BA) offers comprehensive information about professions as well as related forms of (further) vocational education and training on its BERUFENET information portal (see Figure 1). In addition, the BA lists current offers for vocational training (AUSBILDUNGSSUCHE), further education (WEITERBILDUNGSSUCHE), study programs (STUDIENSUCHE), and job-related language support (SPRACHFÖRDERUNG), as well as coaching and activation measures (COACHINGUNDAKTIVIERUNG) on an information portal called KURSNET. It also provides information on typical salaries (ENTGELTATLAS) and professional reorientation (NEWPLAN), lists online job advertisements (JOBSUCHE), and maintains an applicant directory (BEWERBERBÖRSE).
Last but not least, there are a number of classification frameworks related to German qualifications and occupations, e.g., the classification of occupations (Klassifikation der Berufe, KldB), its European counterpart, the "European Skills/Competences, Qualifications and Occupations" (ESCO, [3]), and the International Standard Classification of Occupations (ISCO-08). Notably, official statistics on the labor market are often available only in aggregated form, e.g., based on the Classification of Economic Activities (Klassifikation der Wirtschaftszweige 2008, WZ-08). Connectable taxonomies such as ESCO, see [3], are a good example of the central role of ontologies and, in particular, knowledge graphs in this field. However, single taxonomies such as ESCO cannot provide all details of local labor market needs and do not provide direct links to other hierarchies of skills, vocational education and training (VET), and continuing vocational education and training (CVET) data. Fortunately, there are official tables relating ESCO classifications to KldB codes.
In general, the labor market relies heavily on vocational education and training, retraining, and advanced vocational qualification to meet challenges such as the ongoing digitalization [4] or sustainability [5]. In particular, the German education system offers special pathways for skilled workers in tertiary education [6], and distinguishes between initial vocational education (training, "Ausbildung" or retraining, "Umschulung") and continuing vocational education, which includes continuing vocational training ("Weiterbildung") and upgrading training ("Fortbildung"). In this regard, upgrading training is usually formally regulated (e.g., at the federal level [BBiG/HwO] or by the federal states [7,8]; this type of accreditation can also be found in other countries and enables quality assurance, leading to official recognition and approval by the relevant legislative or professional authorities). There are 1004 (re-)trainings that are regulated by companies or by the Crafts and Trade Code, 542 of which are regulated by the German state. The number of informal courses is much higher.
Understanding the differences between formal education and informal learning pathways (but also between official statistics and the labor market represented by online job or training ads) is, therefore, crucial. In this respect, the mass of data available online could be used to bridge the gap between the relatively slow traditional research using survey data and official regulations on the one hand and the rapid changes in the labor market on the other.
In this paper, we explore an innovative approach to generate knowledge on both formal and informal education pathways: sourcing, analyzing and linking open data on different aspects of the labor market through the web mining of online resources. As a proof of concept, we build and extend a network of educational pathways based on the data available online.
Specifically, we examine how the different data sources can be related to each other and how knowledge about the German labor and vocational training market can be generated from them. (1) Based on official information available through web mining, we will look at the relationships between occupations and how to find (a) entry requirements and (b) training opportunities for each occupation in order to derive knowledge about educational pathways. (2) In addition, we will explore how different types of data and classifications can be related to each other based on existing identifiers and taxonomies (e.g., based on BERUFENET IDs, KldB codes, ESCO classes and ISCO classes) in order to obtain further information about all occupations. In terms of linking data sources, we focus on two research questions. Our first research question (RQ1) is: What are common data structures that can be used to make crawled data interoperable? Our second research question (RQ2) is: What kind of methods can be used for data, entity, event, and relationship extraction from German online labor market resources?
The remainder of this paper is organized as follows: The next section provides an overview of related works. Section 3 presents our methodological approach to querying and linking the data, including an overview of data schemas, web resources and methods. Section 4 is devoted to the results, where we discuss BIBB and BA education pathways and their interoperability and provide some illustrative examples of the kind of knowledge we were able to derive from the data. The final section contains our conclusions and outlook.
All URLs in this paper were accessed in December 2023.

Figure 1. [...] (Right) Continuing education programs that provide links to KURSNET and can be crawled using the API.

Related Work
Over the last decade, there has been an increasing interest in mining data from the web, e.g., educational databases, advertisements, and information systems [9-11]. Web mining refers to the application of data-mining techniques to discover and extract patterns and knowledge from web data (e.g., [5]). Supporting decision making and process management in education is key. The generic challenges are usually the automated extraction of knowledge from data (typically interpreted passages from texts) and the mapping to existing datasets. However, there are still several challenges related to the data and data integration [12]. Research questions that have been addressed with web mining techniques cover a wide range, for example, occupational inequality [13], questions of migration and language skills [14], discrimination [15], and students and their later occupation [16].
With regard to the linkage of datasets on the labor market, Ortmann et al. used data from BERUFENET to quantify the similarity of jobs based on job competences and to relate this information via KldB code to a separate dataset on job changes from the national education panel [17]. For instance, they analyzed the proportion of job changes between different categories of the KldB by the amount of similarity of the jobs (distinguishing similar, related and complete career changes) and found job changes between 5-digit KldB codes to be more likely for completely dissimilar job changes (49 percent) than for related (30 percent) or similar job changes (21 percent).
Another interesting area of research is the classification of online advertisements with regard to skills and taxonomies: Skill concepts have been widely used for the analysis of online job advertisements (OJAs) and provide a good starting point for matching open positions with corresponding employees [18,19]. OJAs are usually published in an online database such as Monster or Stepstone but also in databases of official organizations like the Federal Employment Agency (BA) in Germany. They contain various data about the hiring company, the position, and the requirements for the employee. OJAs are a well-studied topic, especially in the English language [15,20-22], and even historical advertisements have been studied [23]. So far, there is little research on German OJAs [24,25], although some work has been conducted, in particular, focusing on qualification development [26,27] and in the context of the greening of jobs [28,29]. The proposed technologies for skills extraction range from the automated mapping of search terms to the classification of skills [30] to complex applications of large language models (e.g., SkillGPT by [31]). Special attention has been paid to multi-label classification frameworks [32], building skill taxonomies [33], and, in general, the reflection on big data technologies [34], also for German OJAs [35]. While some authors treat competences, skills and knowledge as synonyms, we follow the KSAO model of competency proposed by Fischer and Neubert [36]: Knowledge, Skills, Abilities, and Other components (KSAO) are distinct components and prerequisites of competency, a context-specific disposition to perform well (cf. [37]). Therefore, competences and skills refer to related but distinct concepts.
Thus, while there are still some open questions with regard to common data structures and methods for data extraction (see Table 1), we can build on the experience with BERUFENET, OJAs and the existing taxonomies and structures for skills in German texts, for example, the KldB [38,39]. See Section 3.1 for more details. Specifically, we note several research gaps: First, to the best of our knowledge, no labor market research has been conducted on linking a wide range of online data sources. Second, the German labor market (in Germany, Austria and Switzerland) has several specific requirements, and only very limited work has focused on German texts regarding these requirements. Third, no work has been performed to link official training regulations to CVET advertisements, which would cover the majority of non-regulated training programs, see [40].
Since we can only rely on very limited previous work, we will first provide information on the data and continue with a detailed discussion of the methods used for our approach.

Data Schemata
Labor markets are complex fields with diverse data structures and multiple applications (for example, connecting job seekers to the right training or job [44]). As described above, the European ontology ESCO cannot provide all details of local labor market needs and does not sufficiently link to other hierarchies of skills. For example, in German-speaking countries, other taxonomies of occupations and skills are widely used. Thus, when discussing data for occupational qualifications and certificates, we need to consider multiple data schemas and their relation to several relevant taxonomies.
In the context of occupations, the International Standard Classification of Occupations (ISCO) was developed by the International Labour Organization (ILO) (see https://www.ilo.org/public/english/bureau/stat/isco/isco08/) and was published in 1958, 1968, 1988, and, in its most recent version, 2008. It was also used within the European Union (EU), and some German-speaking countries (Germany, Austria, and Switzerland) have linked their specific version to the ISCO 2008. ISCO maps to the ontology "European Skills, Competences, Qualifications and Occupations" (ESCO), which links skills and competences to occupations described in ISCO. Gonzalez et al. state that few works have described the analysis and use of ESCO (see [45]). Some work has been conducted on the semantic interoperability between skills and labor market documents, which was initially promised by ESCO [44]. Other researchers have tried to use data from ESCO and Wikidata for the text mining of the scientific literature (see [45]), or for curriculum analysis (see [46]). Recent research has provided a generic mining and mapping approach [47] and automated ontology alignment for ESCO and the English-language O*NET [48].
In Germany, the classification of occupations ("Klassifikation der Berufe", KldB) (see https://statistik.arbeitsagentur.de/DE/Navigation/Grundlagen/Klassifikationen/Klassifikation-der-Berufe/KldB2010-Fassung2020/KldB2010-Fassung2020-Nav.html) and related document codes (DKZ) are the reference for the IAB (Institut für Arbeitsmarkt- und Berufsforschung) and the German Federal Employment Agency (Bundesagentur für Arbeit, BA). The most recent version is the 2020 revision of KldB 2010, which was completely redeveloped and renders the previous versions from 1988 and 1992 deprecated. It was developed to be compatible with ISCO-08. These data are used by the BA when matching candidates to jobs and are integrated into other IT applications. However, while part "B" of DKZ is dedicated to occupations, part "C" covers continuing professional development, "K" skills, and "A" higher education. All these parts are important to describe the access to education and training.
Formally, KldB codes (five-digit codes) are systematically related to DKZ identifiers: for instance, with regard to the classification of occupations, there are eight-digit DKZ identifiers for each occupation (as well as for related education and training), which extend the corresponding KldB code by three additional digits. According to the BA, these DKZ 8-digits "form a much more dynamic 'sub-hierarchical level', which has a clear relationship to the KldB (each DKZ 8-digit code can be clearly assigned to a KldB 5-digit code), but is not part of the actual classification and can be adapted to changes in the real occupational landscape at high frequency" [49]. In the online edition of the KldB, eight-digit DKZ codes are currently returned when querying individual occupations (e.g., 43104-132 for "Data Scientist"), which also indicates the close connection between KldB and DKZ.
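The relationship between the two code levels can be illustrated with a minimal sketch. It assumes that eight-digit DKZ codes are written as in the examples of this paper ("43104-132", or "B 43233-105" with a leading part letter); the pattern and the names `DkzCode` and `parse_dkz` are hypothetical and not part of any official tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed notation (taken from the examples in the text): optional DKZ part
# letter, five KldB digits, a hyphen, and three additional DKZ digits.
DKZ_PATTERN = re.compile(r"^(?:([A-Z])\s*)?(\d{5})-(\d{3})$")


class DkzCode(NamedTuple):
    part: Optional[str]  # DKZ part letter such as "B" (occupations), if given
    kldb: str            # five-digit KldB code
    extension: str       # three additional digits below the KldB level


def parse_dkz(code: str) -> DkzCode:
    """Split an eight-digit DKZ code into its KldB code and its extension."""
    match = DKZ_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not an eight-digit DKZ code: {code!r}")
    part, kldb, extension = match.groups()
    return DkzCode(part, kldb, extension)
```

With this helper, two DKZ codes that differ only in their last three digits are assigned to the same five-digit KldB code, which is the property that makes the KldB usable as a common key across data sources.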
To summarize, based on detailed classification identifiers such as DKZ digits, occupations can be categorized using different national and international taxonomies.

Web Resources
One important source for up-to-date information on occupations and different forms of (further) vocational education and training are the information portals of the BA, especially BERUFENET. The Application Programming Interfaces (APIs) of BERUFENET and other services of the BA, among many others, have recently been documented by a civil society initiative called bund.dev (see https://bund.dev). For instance, information on the API of BERUFENET is available at https://github.com/bundesAPI/berufenet-api; see also Figure 2.
A complete list of the occupations available on BERUFENET can be obtained by a simple GET-request per page (n = 179), starting with page 0 (Listing 1).
Listing 1. GET-request for a page with occupations from BERUFENET.
    berufe=$(curl -m 60 \
      -H "X-API-Key: d672172b-f3ef-4746-b659-227c39d95acf" \
      "https://rest.arbeitsagentur.de/infosysbub/bnet/pc/v1/berufe?suchwoerter=*&page=0")

Given the ID of an occupation, detailed information can be obtained by another GET-request per occupation, for instance, for BERUFENET-ID 15322 (Listing 2). In this way, it is possible to retrieve detailed information on all occupations (and related forms of education and training) online. Other services of the BA function in a similar way; the interested reader may refer (or even contribute) to the documentation provided online by bund.dev and the first author of this study (see Figure 2).

Figure 2. [...] 2) and how they can be linked to the classification of occupations (KldB): they provide a direct mapping (black arrows), can be mapped directly by string matching (red arrows), or the data are only partially available and require a more complex matching because the naming does not follow the standardized form (dark red arrows).

In the Federal Republic of Germany, the Vocational Training Act (BBiG) of 1969, which was reformed in 2005 and in 2020, was passed to create a political framework for shaping the work of vocational regulation [55].

The official regulations contain many different types of documents. The main components of regulations are (a) an occupation title ("Bezeichnung des Ausbildungsberufes"), (b) the length of the program ("Ausbildungsdauer"), (c) the occupational skills, knowledge and abilities ("beruflichen Fertigkeiten, Kenntnisse und Fähigkeiten"), (d) the structure ("sachliche und zeitliche Gliederung"), and (e) the requirements ("Prüfungsanforderungen").
It seems noteworthy that these official regulations are not static, as the genealogy of vocational education demonstrates (see Figure 3); as a result, people trained under regulations that have since been superseded are available on the labor market. However, while the legal basis and advanced (vocational) training regulations are available for each point in time, they are not always available in a machine-readable form (see Figure 3, middle), and, what is more, the labor market and its demands usually evolve much faster than regulations. Thus, we also want to make labor market data on occupations and CVET advertisements interoperable in order to find a ground truth concerning educational pathways.
The BIBB data are, therefore, mostly complementary to the BA data. However, at the intersection of the two sets of data is vocational education and training, particularly vocational and continuing training programs. The BA data also provide information on academic programs and could also provide further data on informal CVET programs.

Methods
We have compiled a complete list of official (continuing) professional development and training regulations from the BIBB database in order to derive a knowledge graph of education pathways according to the BIBB, see Section 4.1. Similarly, we compiled a complete list of occupations as well as detailed information for each occupation on BERUFENET (n = 3569) via the documented API. From these data, we extracted occupation titles, BERUFENET IDs (numbers with three or more digits), KldB codes (five digits), DKZ codes (eight digits, preceded by the letter "B") as well as entry requirements and training opportunities for each occupation. From the information on entry requirements (entries under "Zugangsberufe/Zugangstätigkeiten") and training opportunities (entries under "Weiterbildung (beruflicher Aufstieg)"), we derived a knowledge graph of the education pathways according to the BA (consisting of the relation "is qualification for" between the nodes linked), in order to identify additional information (see Section 4.2).
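The page-wise compilation of the occupation list can be sketched as follows. This is a minimal illustration rather than the code used in this study: the endpoint and query parameters are taken from Listing 1, whereas the function names and the injected `fetch_json` callable (which would perform the actual HTTP request, including the X-API-Key header) are assumptions made for the example. The structure of the JSON response is deliberately left untouched.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# Endpoint as used in Listing 1.
BASE_URL = "https://rest.arbeitsagentur.de/infosysbub/bnet/pc/v1/berufe"


def page_url(page: int, query: str = "*") -> str:
    """Build the URL of one result page; pages are numbered from 0."""
    params = urlencode({"suchwoerter": query, "page": page}, safe="*")
    return f"{BASE_URL}?{params}"


def collect_pages(fetch_json: Callable[[str], dict], n_pages: int) -> Iterator[dict]:
    """Yield the decoded body of each page in order.

    `fetch_json` is a hypothetical callable that sends the GET request
    and returns the parsed JSON document for a given URL.
    """
    for page in range(n_pages):
        yield fetch_json(page_url(page))
```

Passing the request function as an argument keeps the paging logic testable without network access; for the list described above, `n_pages` would be 179.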
We also extracted (b) information for exemplary entries in SPRACHFOERDERUNG and COACHINGUNDAKTIVIERUNG. On this basis, we inspected the data available and their interoperability (see Section 4.3).
With regard to web resources of the BIBB, we scraped the BIBB "Berufesuche" to obtain information on vocational education and training or continuing vocational education and training (in particular, Ausbildung, Fachpraktiker, Fortbildung/Umschulung, and Pflegeberufe). A list of all data entries could be obtained by appending all possible initial letters (and the letter sequence "xyz") to the URL one after the other. For instance, data entries for occupations starting with x, y, or z could be retrieved by the query https://www.bibb.de/dienst/berufesuche/de/index_berufesuche.php/alphabetical/apprenticeship/xyz. Each individual data entry has a particular ID (e.g., apprenticeship/8234101) and there are data for each ID on a second page (e.g., apprenticeship/8234101?page=2). From these data, we extracted basic information, e.g., KldB codes (five digits) and occupation titles (e.g., "Fachangestellter für Medien- und Informationsdienste/Fachangestellte für Medien- und Informationsdienste - Fachrichtung Bibliothek (Ausbildung)").
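The enumeration of the alphabetical index pages can be sketched as below. The base URL is taken from the example query above; that the remaining index pages use the single letters a to w is an assumption for illustration, and the function names are hypothetical. Only the "apprenticeship" index is covered, since the paths for the other categories are not given here.

```python
import string
from typing import List

# Base URL taken from the example query in the text.
INDEX_URL = (
    "https://www.bibb.de/dienst/berufesuche/de/"
    "index_berufesuche.php/alphabetical/apprenticeship/"
)


def index_suffixes() -> List[str]:
    """Return the assumed index suffixes: letters a-w, then the group 'xyz'."""
    letters = [c for c in string.ascii_lowercase if c not in "xyz"]
    return letters + ["xyz"]


def index_urls() -> List[str]:
    """Build one index URL per suffix, in alphabetical order."""
    return [INDEX_URL + suffix for suffix in index_suffixes()]
```

Each returned URL would then be fetched and parsed for the entry IDs; the HTML structure of these pages is not described in this paper, so parsing is omitted.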

Results
Figure 2 gives an overview of the data sources we considered and how they can be linked to each other via the classification of occupations (KldB).

BIBB Education Pathways
In this section, we describe the results of the web scraping of official (continuing) professional development and training regulations of the BIBB. As described above, the BIBB offers information about regulations for 1004 (re-)trainings (CVET) that are regulated by enterprises or by the Crafts and Trade Code, 542 of which are regulated by the German state, and all regulations for vocational education in Germany. The latter include a KldB code (five digits) stating the occupation resulting from the education. CVET data additionally contain required qualifications for the education. Thus, the mapping from regulations and KldB codes can be obtained using string matching. However, this is not the case for all CVET programs. For example, "AOK-Betriebswirt" (AOK business economist) is special training only offered by AOK. It remains unclear whether a generic mapping to a business economist adequately represents the program.
In Figure 4, we present some results for IT professions. For instance, according to the data from web resources of the BIBB, a "Telecommunications Electronics Technician" can use the CVET program "Computer Scientist (Certified)", which leads to "Professions in Software Development - professionally oriented activities", which in turn gives opportunities for four additional CVET programs: (1) "IT Project Manager (Certified)", leading to "Managers, IT Network Engineering, Coordination, Administration, Organization"; (2) "IT consultant (certified)", leading to "professions in IT application consulting"; (3) "IT economist (certified)" (see also Figure 3), leading to "professions in IT sales"; and (4) "IT developer (certified)", which finally leads to "occupations in IT coordination". In part, the data analyzed in this section suggest some very complex education pathways (see Figure 5). For instance, on the top left of Figure 5, we find the occupations of beekeeping ("Berufe in der Imkerei"), which qualify for further education in nature and landscape conservation ("Berufe in der Natur- und Landschaftspflege") with several specializations like cemetery gardener and professions in nursery gardens. However, the data reported so far only reflect the official regulations and, therefore, may not reflect the realities of the labor or (further) education and training markets.
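To make the notion of a pathway concrete, the IT example above can be written as a small directed graph whose edges express the relation "is qualification for" and then traversed to list all pathways. The sketch below is illustrative only: the node labels follow the description in the text, while the dictionary layout and the function `pathways` are assumptions, not the representation used for Figure 4.

```python
from typing import Dict, Iterator, List, Optional

# Edges read as "is qualification for"; labels follow the text above.
EDGES: Dict[str, List[str]] = {
    "Telecommunications Electronics Technician": ["Computer Scientist (Certified)"],
    "Computer Scientist (Certified)": ["Professions in Software Development"],
    "Professions in Software Development": [
        "IT Project Manager (Certified)",
        "IT Consultant (Certified)",
        "IT Economist (Certified)",
        "IT Developer (Certified)",
    ],
    "IT Project Manager (Certified)": [
        "Managers, IT Network Engineering, Coordination, Administration, Organization"
    ],
    "IT Consultant (Certified)": ["Professions in IT application consulting"],
    "IT Economist (Certified)": ["Professions in IT sales"],
    "IT Developer (Certified)": ["Occupations in IT coordination"],
}


def pathways(
    graph: Dict[str, List[str]], start: str, path: Optional[List[str]] = None
) -> Iterator[List[str]]:
    """Yield every pathway from `start` to a node without further successors."""
    path = (path or []) + [start]
    # Skip nodes already on the path so that cycles cannot loop forever.
    successors = [node for node in graph.get(start, []) if node not in path]
    if not successors:
        yield path
        return
    for node in successors:
        yield from pathways(graph, node, path)
```

Starting from "Telecommunications Electronics Technician", the traversal yields four pathways of five nodes each, one per CVET program listed above.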

BA Education Pathways
In this section, we show how data from the knowledge graph described in Section 4.1 can be extended by data from the BA. In this case, we derived education pathways from information on (a) entry requirements and (b) advancement courses listed for each occupation in BERUFENET. Figure 6 shows the results for an exemplary occupation, namely "IT-Economist (certified)" (BERUFENET-ID 15322, KldB/DKZ-code B 43233-105). As a rule, working as an IT economist requires having passed the corresponding exam, so BERUFENET lists "IT-Economist (certified)" (BERUFENET-ID 15323, KldB/DKZ-code B 43233-903) as an entry requirement for working in this occupation.

Figure 6. An extended version of the BIBB pathways shown in Figure 4, focusing on the leaf "IT Economist (certified)". All outgoing links are scraped from BA BERUFENET. All nodes with a dark red line are also included in the BIBB data, in particular, the CVET program "Computer scientist (certified)". However, we find new links and, in particular, study programs (blue nodes). Green nodes refer to occupations, and red nodes to CVET programs.
Additionally, BERUFENET lists four opportunities for advancement, see Figure 6, each with a short occupation title and BERUFENET ID.
In general, on BERUFENET, each occupation is related to (a) measures of further educational training (including study programs), and to (b) measures of adaptation training. For instance, besides the four measures of further educational training in Figure 6, the occupation "IT-Economist (certified)" is additionally related to seven measures of adaptation training on WEITERBILDUNGSSUCHE. Each measure stores an ID that represents an education goal (which can be used as a value for the parameter "bildungsziel" or "oberknoten" in the WEITERBILDUNGSSUCHE to query current vocational training offers). For instance, the first entry in this list has the ID 122937, which represents offers of adaptation training with regard to IT project management in the WEITERBILDUNGSSUCHE:
• https://web.arbeitsagentur.de/weiterbildungssuche/suche?bildungsziel=122937

Figure 6 shows an example of the benefit of extending the BIBB education pathways based on education pathways derived from BA data. Four observations are noteworthy: First, more dependent occupations and trainings are added to the existing data (e.g., for "Computer scientist (certified)"). Second, it adds a few more continuing training programs that are not regulated at the federal level. Third, it adds study programs. Fourth, it shows a complex network of education rather than a set of mainly linear pathways.
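The step from an education-goal ID to a query for current offers can be sketched as follows. The search URL and the parameter names "bildungsziel" and "oberknoten" are taken from the text; the function name is hypothetical, and whether "oberknoten" is accepted by the same web URL as "bildungsziel" is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Search page of the WEITERBILDUNGSSUCHE as given in the example above.
SEARCH_URL = "https://web.arbeitsagentur.de/weiterbildungssuche/suche"

# Parameter names mentioned in the text.
ALLOWED_PARAMETERS = ("bildungsziel", "oberknoten")


def training_offers_url(education_goal_id: int, parameter: str = "bildungsziel") -> str:
    """Build a search URL for the offers linked to one education goal."""
    if parameter not in ALLOWED_PARAMETERS:
        raise ValueError(f"unsupported parameter: {parameter!r}")
    return f"{SEARCH_URL}?{urlencode({parameter: education_goal_id})}"
```

Applied to the ID 122937, this reproduces the URL shown above, so the IDs stored with each measure on BERUFENET can serve directly as link targets in a knowledge graph.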
In addition, there is a great deal of further information that could be used to further enrich the above-mentioned knowledge graph. For instance, each occupation on BERUFENET is related to a set of competences and related chunks of knowledge and skills; e.g., for the "IT-Economist (certified)", it lists a set of core competences.

An allocation to various taxonomies can also be established via KldB/DKZ codes (eight digits). For instance, with regard to European and international frameworks of classification, the "IT-Economist (certified)" can be considered a narrow match to the ESCO occupation "ICT business development manager" (ESCO-code 2434.2) and automatically classified as ISCO Unit group 2434 "Information And Communications Technology Sales Professionals" based on its KldB/DKZ-code B 43233-105.

Interoperability
In building and extending the knowledge graph based on multiple data sources, we gained several insights:

• The most efficient way of relating information from BERUFENET to other data sources or to classification systems seems to be the KldB/DKZ codes (eight digits), which are stored in BERUFENET and many other data sources (e.g., AUSBILDUNGSSUCHE; see Figure 1); data sources of the BA that do not contain a KldB/DKZ code can often be related to a KldB/DKZ code by matching short occupation titles (although short occupation titles, unlike BERUFENET IDs or the eight-digit variants of KldB/DKZ codes, do not differ for training and for the occupational activity).
• Results of the JOBSUCHE have an attribute "beruf" that contains occupational titles that could be matched with BERUFENET's short occupation titles; the JOBSUCHE API does not seem to provide KldB/DKZ codes.
• Results of the AUSBILDUNGSSUCHE have an attribute "abschlussbezeichnung" that contains training job designations that could (after removing HTML tags) be matched with BERUFENET's short occupation titles.
• Results of the STUDIENSUCHE have an attribute "Studienfaecher", which contains one or more course designations that could be matched with BERUFENET's short occupation titles.

In addition, entries in the BEWERBERBÖRSE provide an attribute "berufe" that can be matched with the short occupation titles of the BERUFENET API. It is also possible to query corresponding results using the "was" parameter and setting it to short occupation titles. The APIs for ENTGELTATLAS and NEWPLAN show the KldB/DKZ code in the results and allow for requesting results based on KldB/DKZ codes via parameters. The APIs of SPRACHFÖRDERUNG and COACHINGUNDAKTIVIERUNG have an attribute "systematiken" but do not contain BERUFENET's short occupation titles or KldB entries.
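The title-based matching described in this section can be illustrated with a minimal sketch. It assumes exact matching after a light normalization (removing HTML tags, resolving HTML entities, collapsing whitespace, ignoring case); the sample title and the raw HTML string are illustrative, and the function names are hypothetical. It is not meant as a description of how the BA APIs encode titles.

```python
import html
import re
from typing import Dict, Optional

TAG_PATTERN = re.compile(r"<[^>]+>")


def normalize_title(title: str) -> str:
    """Strip HTML tags and entities, collapse whitespace, and ignore case."""
    text = html.unescape(TAG_PATTERN.sub("", title))
    return " ".join(text.split()).casefold()


def build_index(short_titles: Dict[str, int]) -> Dict[str, int]:
    """Map normalized short occupation titles to BERUFENET IDs."""
    return {normalize_title(title): ident for title, ident in short_titles.items()}


def match_title(raw_title: str, index: Dict[str, int]) -> Optional[int]:
    """Return the BERUFENET ID for a raw title, or None if nothing matches."""
    return index.get(normalize_title(raw_title))


# Illustrative data: one short title with the BERUFENET ID used in Section 4.2.
INDEX = build_index({"IT-Ökonom/in (Geprüfte/r)": 15322})
```

Because short titles do not distinguish between training and occupational activity, such an index can hold only one ID per title; where that distinction matters, the eight-digit KldB/DKZ code remains the safer key.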

Conclusions and Outlook
Labor markets heavily rely on vocational education and training, re-training and advanced vocational qualification. In this paper, we inspected different sources of data and data schemata to explore the interconnectivity of data on the job market in Germany and to derive knowledge on learning pathways from information on the relation between different jobs and occupations. In order to structure the discussion of our main findings, we would like to take up the two research questions that we posed in the introduction: Our first research question (RQ1) was how to derive knowledge about educational pathways from data on entry requirements and training opportunities. We have found that knowledge about a complex variety of possible educational pathways can be derived from BA and BIBB data, and that linking different data sources can reveal pathways that complement an examination of individual data sources well; e.g., we were able to extend our knowledge tree on education pathways, which was derived from the BIBB information about professional development and training regulations, based on the knowledge graphs we derived from BERUFENET of the BA. As each occupation on BERUFENET can be related to a KldB/DKZ code, knowledge trees can easily be extended by adding further kinds of data points available on BERUFENET (e.g., competences, skills and knowledge) or from different data sources (as we demonstrated in Section 4.2).
Our second research question (RQ2) was how different kinds of data and classifications could be related to each other for data, entity, event, and relationship extraction from German online labor market resources. In general, eight-digit KldB/DKZ codes seemed to be the most reliable way of relating data on occupations between different data sources. Short occupation titles also worked well, at least for data from a single data provider (i.e., within data from the BA), but it seems noteworthy that (a) sources from different providers differed in spelling details such as gender, and (b) the difference between training and occupational activity was found in the eight-digit variants of KldB/DKZ codes but not in the occupational titles of BERUFENET (e.g., "IT-Economist"). It should also be noted that the eight-digit KldB/DKZ code is probably less stable than the five-digit KldB and may be subject to change (cf. [49]). In this regard, a similarity-based classification based on sentence embeddings of occupational titles by large language models (e.g., [2,31]) may be a promising alternative to simple string matching for many use cases.
In addition, it was possible to relate German occupations to European and international classification frameworks, but not at the level of individual occupations, which resulted in an inherent fuzziness: the ISCO-08 classification is not designed for the occupational level, and even the ESCO classification provided only approximate results for many occupations. This implies that international education research is still very much tied to an aggregate level, although sentence embeddings by large language models (cf. [2,31]) could allow for a classification at the level of individual occupations in future studies. In summary, it seems possible to find the ground truth by linking different data sources on the labor market and on (further) vocational education and training, but the data include many domain-specific aspects, and the relationships to existing occupations are not always clear. For example, many offers of further education in the WEITERBILDUNGSSUCHE of the BA are not linked to occupations in BERUFENET.
Looking to the future, many more data sources could be included to create knowledge graphs about careers and vocational (further) education. For the contribution at hand, we focused primarily on two official data sources. To obtain a more comprehensive overview, it would be interesting to include additional data sources, such as those from chambers of crafts or trade, as well as additional sources of CVET advertisements. In this respect, it should be noted that the web resources studied in our work contain a lot of structured data. This cannot be taken for granted when analyzing a wider range of job portals such as Monster.com, StepStone, or Academics, or other data portals such as kununu or Glassdoor.
Future work will, therefore, need to spend a considerable amount of effort on linking unstructured textual information in order to obtain a complete picture of the labor market. Thus, optimizing and extending ontologies such as GLMO could be a promising direction of research in this area. In addition to ontological aspects and some of the above-mentioned innovative ways of linking data sources, there are other research directions to consider: For example, how do these data relate to other sources such as data from chambers of trade or craft? Can statistical data be used to extend the network of educational pathways? Is it possible to model the movement of labor to other occupations?
Many data sources on the labor market can be accessed online, and APIs are well documented thanks in part to civil society initiatives such as bund.dev. Linking between different sources is possible via common classification systems such as KldB/DKZ or via occupation titles. From an educational science perspective, it would be desirable for the data to be linked by researchers or to be directly provided as Linked Open Data (LOD) by the BA or the BIBB. This would lower the barriers for many scientists and make research easier as well as less error-prone. It is worth noting that first steps in this direction have already been taken by the BA, which provides explicit links from BERUFENET occupations to a subset of related education advertisements in WEITERBILDUNGSSUCHE. Such steps should be extended and applied to other APIs as well.

Figure 2. A visualization of the resources considered (see also Table 2) and how they can be linked to the classification of occupations (KldB): they provide a direct mapping (black arrows), can be mapped directly by string matching (red arrows), or the data are only partially available and require a more complex matching because the naming does not follow the standardized form (dark red arrows).
If occupations are newly created or updated, several parties are involved: (a) the companies and chambers (employers), (b) the trade unions (employees), (c) the federal states, and (d) the federal government. Finally, the federal government provides the legal framework for vocational education and training through laws and regulations. The Federal Institute for Vocational Education and Training (BIBB), founded in 1970 on the basis of the Vocational Training Act (BBiG), provides the content of the training regulations online:
• Ausbildungsordnungen (https://www.bibb.de/dienst/berufesuche/de/index_berufesuche.php/) with classification (KldB 2010), statistical data, and a genealogy of occupations;
• Fortbildungsordnungen (https://www.bibb.de/dienst/berufesuche/de/index_berufesuche.php/; see Figure 3 for an example) with classification (KldB 2010) and statistical data;
• Berufesuche (https://www.bibb.de/de/40.php).

Figure 4. The pathway of a "Telecommunications Electronics Technician" via the CVET program "Computer Scientist (Certified)", which leads to "Professions in Software Development - professionally oriented activities", which in turn opens up four additional CVET programs. Yellow and green nodes refer to occupations, and red nodes to CVET programs.

Figure 5. The complexity of career pathways via CVET programs. Yellow and green nodes refer to occupations, and red nodes to CVET programs.

Table 1. Related work in the context of the proposed research questions for the German labor market.

Table 2. Overview of available online data. Some data are available via an API, others only as a file download. Here, the source [DBA] refers to the download portal of the BA at https://download-portal.arbeitsagentur.de/files/ and [BS] to BIBB-Berufesuche at https://www.bibb.de/dienst/berufesuche/de/index_berufesuche.php. Obviously, very few historical data are available. See also Figure 2.