Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings

This paper proposes a classification framework aimed at identifying correlations between job ad requirements and transversal skill sets, with a focus on predicting the necessary skills for individual job descriptions using a deep learning model. The approach involves data collection, preprocessing, and labeling using the ESCO (European Skills, Competences, and Occupations) taxonomy. Hierarchical classification and multi-label strategies are used for skill identification, while augmentation techniques address data imbalance, enhancing model robustness. A comparison between results obtained with English-specific and multi-language sentence embedding models reveals comparable accuracy. The experimental case studies detail neural network configurations, hyperparameters, and cross-validation results, highlighting the efficacy of the hierarchical approach and the suitability of the multi-language model for the diverse European job market. Thus, a new approach is proposed for the hierarchical classification of transversal skills from job ads.


Introduction
The field of text classification, a fundamental subdomain within the natural language processing (NLP) field of machine learning (ML), has witnessed a remarkable evolution in recent years. With the exponential increase in textual data generated across various domains, the need for effective text classification methods has become increasingly pressing. Text classification is the task of assigning predefined labels or categories to textual documents based on their content. This task holds immense importance across various industries and applications, including but not limited to sentiment analysis, spam detection, content recommendation, and news classification. The ability to automatically organize and categorize large volumes of text can streamline information retrieval, enhance decision-making processes, and enable efficient data management.
Traditional text classification methods rely on well-established techniques such as term frequency-inverse document frequency (TF-IDF) representations and traditional ML algorithms. TF-IDF measures the importance of each term within a document relative to a corpus of documents, providing a numerical representation of textual data. Classic ML algorithms, such as k-nearest neighbors, decision trees, naïve Bayes, or random forests, process the TF-IDF vectors to identify patterns and relationships among terms, and have been successfully applied to text classification tasks.
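As an illustration of the TF-IDF weighting described above, the following minimal Python sketch computes one weight per term and document. It assumes whitespace tokenization and the basic, unsmoothed variant (tf = count / document length, idf = log(N / df)); the function name tf_idf and the example sentences are hypothetical, and library implementations often use smoothed formulas instead.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dictionary per document.

    Assumption: tf = count / document length and idf = log(N / df),
    i.e., the basic unsmoothed TF-IDF variant.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical job ad fragments used only for illustration.
docs = [
    "team player with good communication skills",
    "good knowledge of python",
    "communication and team work",
]
vectors = tf_idf(docs)
```

In this sketch, a term that occurs in a single document (such as "python") receives a higher weight than a term shared by several documents (such as "good"), which is the property that classic classifiers exploit.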
While classic methods yielded commendable results, the emergence of deep learning (DL) has ushered in a new era of text classification. DL models, i.e., deep neural networks, have demonstrated unprecedented capabilities in handling the complexity and nuances of natural language. One of the key breakthroughs in DL for text classification lies in the use of word embeddings, which represent words as vectors in high-dimensional spaces, capturing semantic relationships between them. Sentence embeddings extend this idea to encode entire sentences or documents into vector-based representations.
DL models use these embeddings to learn complex patterns and contextual information within text data. Recurrent neural networks, e.g., long short-term memory (LSTM) models, and more advanced architectures such as those based on Transformers have achieved state-of-the-art performance across a wide range of text classification tasks.
The motivation behind employing text classification for the identification of skills from job advertisements is rooted in the ever-evolving job market dynamics and the imperative need for efficient workforce matching. In today's fast-paced and highly competitive job market, where the demand for specific skills is continually changing, exploiting the power of text classification offers several significant advantages.
First, it allows for precise talent matching. Job seekers possess a diverse range of skills, and employers have specific skill requirements for their open positions. Text classification ensures that the right individuals with the exact skill sets are efficiently matched with job opportunities that necessitate those particular competences. This results in reduced skill mismatches, higher job satisfaction, and increased productivity.
Moreover, the job market is a dynamic entity characterized by rapid skill turnover due to technological advancements and evolving industry needs. Text classification offers the ability to perform real-time analysis of job descriptions and skill requirements. By staying up-to-date with the latest trends and demands, organizations can rapidly adapt, ensuring that their workforce remains competitive and aligned with market needs.
Furthermore, text classification optimizes resource allocation for human resources departments and job search platforms. It automates the labor-intensive process of scanning and categorizing skills from large numbers of job listings or CVs, thus saving valuable time and effort.
Lastly, another compelling motivation is the identification of skill gaps. For job seekers, text classification helps individuals pinpoint areas where they may lack necessary skills for a specific role. This knowledge empowers them to proactively seek out relevant training or education, fostering lifelong learning and career development.
Our contribution involves the development of neural network models designed for the classification of job ads based on the ESCO taxonomy. To accomplish this, we create our own dataset, comprising manually labeled job ads that reference the skills mentioned in the respective text. Additionally, our classifiers consider the hierarchical structure outlined in the ESCO platform, wherein skills are categorized into two levels: the higher level comprises more broadly phrased skills with a wider context, while the lower level encompasses sets of subskills for each top-level skill, providing a more detailed specification.
The rest of the paper is structured as follows. In Section 2, we include a short survey of the related literature. Section 3 details the methodology, including data preprocessing and the proposed hierarchical skill classification. In Section 4, we show the experimental results, and in Section 5 we present the conclusions of this work.

Related Work
In general, the biggest obstacles in classifying job ads and identifying relevant skills can be summarized as follows: the availability of data, particularly pre-labeled sets of job ads and corresponding skills that can be used for training supervised classifiers; and the unstructured nature of many job ads, which feature a broad range of formats, phrasings, different manners of expressing similar requirements, and vagueness when formulating job requirements. These aspects cause the problem space to increase disproportionately compared to the available data, making it tedious and difficult to properly preprocess and standardize the training data, as well as to develop reliable classification models. Consequently, skill identification is a multifaceted topic addressed in a variety of ways in the related literature.
One of the more common and reliable methods for skill identification is skill counting. Manual skill counting relies on expert readers to identify relevant skills in job ads. This can be achieved without a knowledge base or using an already available skill set. The drawback of such an approach is that it is time-consuming and tedious; however, the availability of a prior list of skills to choose from makes the task easier for expert staff. The process can also be automated if a skill base is available. The competences can be identified either by Boolean indexing, or using a simple feature such as a word weight, which indicates the importance of the skill's phrasing. A common weighting method makes use of metrics such as TF-IDF [1].
Early work in the direction of skill identification involved finding exact matches of skill labels from existing skill bases in job ad phrasing. Such methods are simple to implement and rely on searching for keywords or keyword combinations [2][3][4]. In the absence of a skill base, skill counting mainly relies on the assessment of expert annotators. The topic of manual or semi-automated content analysis has been a subject of thorough research, since it has the distinctive advantage of also performing a qualitative search alongside a quantitative one. Multiple studies employ content analysis for identifying job market needs, though the general consensus is that manual annotation, while often more reliable, is time-consuming and ineffective for the systematic analysis of large bodies of text [5][6][7][8].
Multiple authors handle the skill search task by treating it as a topic modeling problem. In topic modeling, the main themes of a text are learned in an unsupervised manner, by determining and analyzing word distributions. The works that employ topic modeling algorithms identify the most required skills among the topics of the job ad. A common approach in this sense is to find the formulation of relevant skills in the most frequent keywords of the identified topics. In this context, the authors of [9] use latent semantic analysis (LSA) to carry out skill identification from job ads. LSA involves transforming the job ad set into a term-document matrix. This matrix is then subjected to dimensionality reduction via singular value decomposition, which results in sets of highly correlated keywords and documents. The phrases formed by these keywords are the topics identified in the texts, which are further subjected to data analysis techniques and expert evaluation. Similar works are [10][11][12], where latent Dirichlet allocation (LDA) is used to identify popular keywords in job ads. Each document is transformed into a probability distribution of topics and each topic is treated as a probability distribution of words, all sharing a common Dirichlet prior [13].
Word embeddings have become increasingly popular in generating and classifying text. Similar to word embeddings, skill embedding methods generate vector representations of skill keywords, such that similar skills have high similarity in the corresponding vector space. The aim of most works is to develop embedding spaces that work with simple similarity metrics, such as cosine or Jaccard similarities. In [14][15], the authors employ the Word2vec model [16] to derive vector representations for skill contexts. Subsequently, these vectors serve as inputs for a clustering algorithm, facilitating the grouping of aggregated contexts into clusters.
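For reference, the two similarity metrics mentioned above can be sketched in a few lines of Python. This is only a generic illustration with hypothetical function names and toy inputs, not the implementation used in the cited works: cosine similarity compares two embedding vectors, while Jaccard similarity compares two sets of items such as keywords.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)
```

Two vectors pointing in the same direction have a cosine similarity of 1 regardless of their lengths, which is why this metric is a common choice for comparing embeddings.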
In [17], the authors use Word2vec embeddings to assess the similarity between skills mentioned in job ads and in professional standards. These standards encompass a set of principles, ethics, and behaviors obligatory for members of a specific profession. The training of Word2vec on a job ad corpus enables the model to learn contextual information from job ads, facilitating the clustering of skills based on their presence in the job ad texts.
The authors of [18] first identify skills explicitly, then infer implicit skills from job ads using Doc2vec [19]. The inference process involves identifying similar job descriptions that share common features such as location or company, assuming that they would also share similar skills. Implicit skills, in this context, are those not explicitly stated but considered important for a given position. Using document-level embeddings, the authors incorporate inferred skills into those extracted directly from the job ads based on their similarities.
The authors of [20] introduce Skill2vec, a technique aimed at optimizing candidate skill searches. Also inspired by Word2vec, Skill2vec maps skills into a vector space, revealing skill relationships. Training involves a neural network where skills are treated as words. This method creates a relationship graph among recruitment domain skills, and can help candidates in identifying skill gaps relative to job requirements, guiding them towards suitable training opportunities.
In [21], skill embeddings are determined using FastText [22] trained on job ads, to ensure the coherence of the extracted skills. This approach handles out-of-vocabulary instances, generating representations close to the original word, even when misspellings occur.
In [23], the author addresses the evolving Norwegian job market needs by introducing a method to identify groups of words that represent skills in the text of job listings. The use cases, requirements, data sets, implementation and design of such a skill extraction algorithm are described. The work also mentions some issues related to language ambiguity and semantic differences between datasets, which hinder precise skill extraction, and underscores the need for the computation of semantic similarities to resolve ambiguity effectively. It also suggests leveraging other NLP techniques, such as named-entity recognition (NER) and part-of-speech (POS) tagging.
A popular class of methods involves the use of supervised ML algorithms. In particular, deep neural networks have proven especially useful for NLP tasks. Various implementations and architectures have been developed in this direction, with promising results, often proving superior to unsupervised approaches. While methods based on deep neural networks have systematically demonstrated their reliability in capturing hidden word relationships and exhibit promising outcomes for various NLP tasks, they also have the downside of being data-intensive. Achieving favorable results in the context of skill identification demands large, labeled data sets and meticulous fine-tuning.
In [24], the authors use an LSTM architecture pre-trained to perform NER, i.e., the identification and classification of entities from unstructured text into predefined classes such as names, locations, codes, percentages, and organizations [25]. A common practice in this direction is to rely on manually labeled data sets comprising a large body of job ads, though the accuracy of manual annotation can greatly affect the reliability of the resulting models.
In [26], the authors compare models based on convolutional neural networks (CNN) and LSTM for a sentence classification problem. The CNN model incorporates word order criteria by applying a fixed-size window to the input array, consisting of words and their corresponding word embeddings [27]. The LSTM architecture takes advantage of the sequential nature of the text, addressing long-term dependencies and enabling predictions on variable-length inputs.
The authors of [28] rely on advanced language models such as Bidirectional Encoder Representations from Transformers (BERT) [29] for sentence classification in job ads. BERT is specifically designed to pre-train deep bidirectional representations from unlabeled text, considering both left and right contexts in all layers. Consequently, the pre-trained BERT model can be fine-tuned with a single additional output layer, without significant modifications to the task-specific architecture. Other authors report the successful incorporation of BERT-based models into the classification pipeline. In [30], a BERT-based sentence transformer is used to perform initial feature extraction from job ad texts. Following a dimensionality reduction phase, a combination of NLP techniques and clustering methods is used to classify the job ads in the corresponding vector space. Another application of sentence transformers is presented in [31], where the authors use the multilingual SBERT model [32] to determine vector representations of job ad phrases. They demonstrate that the embedding deduced by the transformer model is significantly more reliable at labeling skills in job ads than alternative unsupervised approaches, while achieving accuracies close to manual annotation.
In [33], the authors opt for a multi-label text classification approach to assign skills to each job description. Rather than classifying individual words in job descriptions, the authors treat the job descriptions as indicators for the binary relevance of multiple skills. To achieve this, they employ a BERT encoder and add an extra layer for multi-label classification. Additionally, a correlation-aware bootstrapping process is introduced, encompassing structured semantic representations of skills and their co-occurrences to account for missing skills mentioned in job ads by augmenting the number of training examples.
Paper [34] addresses the scarcity of timely, comprehensive information on EU employers' skill needs by proposing a system that analyzes online job vacancies. It aims to create a pan-European platform of these vacancies for insights into skill requirements, aiming toward real-time skill demand analysis. This system employs ontologies and ML models to process multilingual job postings across various European Union languages. ML algorithms match job content to predefined terms, refining classification accuracy through expert validation and continuous ontology updates. In the proposed approach, each variable (e.g., occupation, region) and language requires separate ML model training, which is currently focused on occupation classification with plans to extend to other variables.
A few survey papers provide overviews and detailed analyses of the various methods employed in the related literature for finding relevant sources of job ads in academia [35], of knowledge extraction from job ads in the IT job market [36], or of skill identification techniques in terms of methods used, classification granularity, and existing implementations [37].

Methodology
This study aims to propose a framework for identifying meaningful relationships between employers' requirements in job ads and sets of transversal skills. To this end, we generate an original data set consisting of job ads annotated using labels drawn from the ESCO skill base. Additionally, our goal is to create a model that can effectively predict the necessary skills for individual job descriptions. To accomplish this task, our strategy involves the implementation of a deep learning model, considering that DL neural networks, with their ability to capture complex patterns and associations within data, are well-suited for the nature of this problem. Our approach involves a comprehensive experimental study focusing on the application of neural network-based classification models for identifying transversal skills from job ads. The experimental pipeline includes several stages: initially, a preprocessing stage filters the job ads and divides them into sentences. Subsequently, a dataset generation stage creates training and test data using a sentence embedding model. Finally, the process involves generating and fine-tuning hierarchical classification models for the two ESCO skill levels.
Figure 1 summarizes the process described above.

Data Preprocessing
The first step is the data collection and preprocessing phase, using 219 job ads downloaded from the EURES online platform [38]. We devised an automated script that parses the text, eliminates personal information, such as web addresses, and non-standard characters, and ultimately identifies the individual sentences in the ads.
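The kind of parsing described above can be sketched as follows. This is a minimal illustration and not the actual script: the regular expressions, the choice to also strip e-mail addresses, the set of characters treated as standard, and the function name split_into_sentences are all assumptions made for the example.

```python
import re

# Assumed patterns; the real script may use different rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
# Keep letters (including accented ones), digits, whitespace and basic punctuation.
NON_STANDARD_RE = re.compile(r"[^\w\s.,;:!?'()\-]")
# Split after sentence-final punctuation followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(ad_text):
    """Remove web/e-mail addresses and unusual symbols, then split into sentences."""
    text = URL_RE.sub(" ", ad_text)
    text = EMAIL_RE.sub(" ", text)
    text = NON_STANDARD_RE.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return [s.strip() for s in SENTENCE_END_RE.split(text) if s.strip()]


# Hypothetical job ad fragment.
ad = ("We offer a great team! Apply at https://example.com/jobs today. "
      "Good communication skills required.")
sentences = split_into_sentences(ad)
```

A punctuation-based splitter of this kind is deliberately simple; abbreviations and bullet lists in real job ads would likely need additional rules.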
Since the job ads are posted by companies from different European countries, they can be written in their respective national languages. As we will explain in the following sections, two approaches are used. The first one is to translate them into English using an automatic translation tool in order to use an English language sentence embedding model. The second one is to use a multi-language sentence embedding model applied to the original text.
These sentences then underwent manual labeling, for classifying them into skill classes and subclasses, using the skills and competences framework provided by the ESCO website [39]. ESCO, an initiative affiliated with the European Commission, serves as a classification system for the European Skills, Competences, and Occupations. The framework includes six main classes of transversal skills, each with additional subclasses:
T1 - Core skills and competences;
T2 - Thinking skills and competences;
T3 - Self-management skills and competences;
T4 - Social and communication skills and competences;
T5 - Physical and manual skills and competences;
T6 - Life skills and competences.
The utilization of ESCO's well-defined skill and competence framework ensures a structured and standardized approach to skill labeling.By classifying sentences from job ads into these skill classes, the ML models can enhance the job matching process and facilitate better alignment between job seekers and employers in the European job market.
A sample of the results of the automatic parsing and manual classification of the dataset is presented in Figure 2. Each line corresponds to a sentence. The complete set of information has several components, delimited with semicolons. The first component is an identifier of the job ad, and the second one is the identifier of the sentence in the job ad. The third component is the actual sentence in the form of a list of plain words, without punctuation marks or other identifiers. The last component includes the skills that the sentence refers to. One can notice that each sentence can belong to several classes (Tx) or subclasses (Tx.y). Also, there is a large number of sentences that do not represent transversal skills (marked with 0).
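A record with the four semicolon-delimited components described above could be read as in the sketch below. The exact spelling of the labels and their separator inside the last component are not specified here, so the example assumes labels of the form T4 or T4.1 separated by spaces or commas, and it assumes that a subclass label implies its parent class; the LabeledSentence type and the sample lines are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledSentence:
    ad_id: str
    sentence_id: str
    text: str
    classes: set = field(default_factory=set)      # e.g. {"T4"}
    subclasses: set = field(default_factory=set)   # e.g. {"T4.1"}


def parse_record(line):
    """Parse 'ad id;sentence id;sentence;labels' into a LabeledSentence."""
    ad_id, sentence_id, text, labels = [p.strip() for p in line.split(";", 3)]
    record = LabeledSentence(ad_id, sentence_id, text)
    for label in labels.replace(",", " ").split():
        if label == "0":
            continue  # sentence without transversal skills
        if "." in label:
            record.subclasses.add(label)
            # Assumption: a subclass label implies its parent class.
            record.classes.add(label.split(".")[0])
        else:
            record.classes.add(label)
    return record


# Hypothetical records in the assumed format.
with_skill = parse_record("12;3;ability to work in a team and communicate clearly;T4 T4.1")
without_skill = parse_record("12;4;the company was founded in 1990;0")
```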

Hierarchical Skill Classification
We propose a hierarchical classification approach for the sentences extracted from the job ads. Our methodology involves several key steps aimed at enhancing the precision of skill identification and classification. In order to represent each sentence in a standardized manner, we employ a pretrained sentence embedding model implemented in the "Sentence Transformers" library [40][41], based on the BERT architecture. This model transforms each sentence into a 768-element vector that captures its meaning. In this way, the text input is transformed into a numeric representation that can be used for further classification. The total number of sentences (i.e., instances in the data set) is 5208.
For the first level of classification, we separately create individual classification models using neural networks for each type of competence (Tx). This results in the development of six distinct models, each dedicated to determining whether or not a sentence contains a specific type of competence. This addresses the fact that a single sentence can potentially belong to multiple skill classes (Tx).
One of the significant challenges encountered while working with the data set was the great imbalance between sentences without skills and those containing skills. A majority of sentences in job ads typically do not contain explicit references to transversal skills. To mitigate this imbalance, we implement a data augmentation strategy; specifically, we employ sentence cloning to augment the data set by replicating the sentences that contain skills until the numbers of positive and negative samples are approximately equal. This augmentation technique aims to improve the robustness and performance of the classification models.
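The cloning strategy described above amounts to random oversampling of the minority class. The following sketch shows one way to do it for a binary "level 1" problem; it is an illustration under assumptions (random choice of the sentences to replicate, a fixed seed, labels encoded as 0 and 1), and the function name balance_by_cloning is hypothetical.

```python
import random


def balance_by_cloning(samples, labels, seed=0):
    """Replicate minority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if not positives or not negatives:
        return list(samples), list(labels)  # nothing to balance
    minority, majority = sorted((positives, negatives), key=len)
    # Draw clones (with replacement) from the minority class.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = positives + negatives + extra
    rng.shuffle(indices)
    return [samples[i] for i in indices], [labels[i] for i in indices]


# Toy data: 2 sentences with a skill, 8 without.
X = list(range(10))
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_balanced, y_balanced = balance_by_cloning(X, y)
```

When cloning is combined with cross-validation, it is usually applied to the training folds only, so that copies of a test sentence do not also appear in the training data; how this was handled is not detailed here.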
An alternative to simple cloning is paraphrasing, i.e., automatically creating other sentences with different words but the same meaning for a given input sentence. This can be done, e.g., with the "Pegasus" library [42][43]. However, this procedure proved to be time-consuming and the final results did not show improvements compared to the simple cloning technique. Moreover, the paraphrasing model was created only for the English language, and this posed additional limitations for working directly with multi-language models, as opposed to using English translations.
For the second level of the hierarchical classification, we employ a multi-label approach to further refine the classification of sentences into subclasses (Tx.y). While generating multiple single-label models is the more straightforward and easily interpretable approach, it has the downside of not accounting for instances which belong to multiple classes, therefore oversimplifying the problem. Multi-label models have been successfully employed for text classification tasks and, while more complex and difficult to train, provide a more comprehensive representation of the relationships between classes and instances, acknowledging that instances can belong to multiple categories. This level of classification aims to provide more granular information about the specific competences referred to in the sentences. However, this stage introduces several significant challenges. The very limited number of instances available for each subclass affects both the ability to create meaningful distinct models and the generalization capabilities of the models. Also, there is a lack of clear boundaries or distinctions between some of the subclasses, i.e., the same sentence can be classified into multiple subclasses simultaneously. More specifically, we had 133 instances (i.e., sentences) in T1, 238 instances in T2, 378 instances in T3, 395 instances in T4, 20 instances in T5, and 106 instances in T6. Therefore, we could not train separate models for each subclass, as we did in the "level 1" classification.
In order to address these issues, we adopt a multi-label classification strategy that allows a single sentence to be assigned to multiple subclasses simultaneously. One can see some examples in Figure 3. In this case, for each sentence there are a number of no_i non-mutually exclusive outputs, where no_i is the number of subclasses in class i. For the six main classes, (no_1, ..., no_6) = (3, 4, 4, 5, 2, 6). For example, the second line in Figure 3 contains a sentence that belongs to class T1 in the first level, and to T1.1 and T1.3 in the second level. Since no_1 = 3, a binary vector of three elements defines the desired output. Since only T1.1 and T1.3 are relevant, the corresponding output vector is (1, 0, 1).
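The construction of the desired output vector can be illustrated with a short Python sketch. The subclass counts are those given above; the label spelling (T1.1 without spaces) and the function name target_vector are assumptions made for the example.

```python
# Number of subclasses (outputs) for each main class, as listed in the text.
SUBCLASS_COUNTS = {1: 3, 2: 4, 3: 4, 4: 5, 5: 2, 6: 6}


def target_vector(main_class, subclass_labels):
    """Binary multi-label target for the 'level 2' model of one main class."""
    target = [0] * SUBCLASS_COUNTS[main_class]
    for label in subclass_labels:
        cls, sub = label.lstrip("T").split(".")
        if int(cls) != main_class:
            continue  # label belongs to another main class
        target[int(sub) - 1] = 1
    return target


print(target_vector(1, ["T1.1", "T1.3"]))  # [1, 0, 1]
```

Because the outputs are not mutually exclusive, each element of the vector is treated as an independent binary decision rather than as one option of a single multi-class choice.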
In the previous stages of our research, we relied on a sentence embedding model tailored for the English language to process and analyze job ads. However, the next step was to employ a multi-language sentence embedding model directly, to be able to effectively handle job ads written in their original languages. This broader linguistic coverage better reflects the multilingual nature of today's European job market, where job seekers and employers often interact in languages other than English.

The English Language Model
We tested multiple configurations of neural networks for the classification of the vectors representing sentences. In order to assess the performance of a model, cross-validation was used. This method can assess the generalization capabilities of an ML model, ensuring that it can make accurate predictions on unseen data and avoiding overfitting, which occurs when a model is too specific to the training data and performs poorly on new, real-world examples. k-fold cross-validation is a widely used technique for model evaluation and selection. In this method, the dataset is divided into k equally sized subsets or "folds". The model is trained k times, each time using a different fold as the testing set and the remaining folds as the training set. This process helps to evaluate the model performance for different partitions of the data. By averaging the results from the k iterations, one can obtain a more reliable estimate of the model performance. k-fold cross-validation also provides a way to tune the hyperparameters and to assess generalization ability. In our experiments, we used k = 5.
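The k-fold procedure can be sketched in plain Python as follows. This is a generic illustration, not the code used in the experiments: the shuffling, the seed, and the callback train_and_score (which would train a network on the training indices and return its accuracy on the testing indices) are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Average the score returned by train_and_score over the k folds."""
    scores = [train_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# With 5208 sentences and k = 5, each testing fold has 1041 or 1042 sentences.
fold_sizes = [len(test) for _, test in k_fold_indices(5208, k=5)]
```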
The process of training neural networks involves addressing a considerable search space encompassing network architecture, e.g., the number of hidden layers, neurons, and activation functions, as well as hyperparameters such as the choice of the optimizer (the optimization algorithm), learning rate, and regularization details. The multitude of these variables results in an overwhelming number of potential configurations, rendering exhaustive exploration unfeasible. Consequently, employing a heuristic, empirical approach becomes imperative to efficiently identify the best available options within this vast parameter space.
We explored this space by iterating through various configurations, initially favoring simpler ones and progressively trying more complex architectures, and conducting multiple trials with parameter adjustments to estimate their impact.
For each configuration we repeated the training process five times, and we computed the average accuracy values for the testing sets.Although differences exist among different runs because the data are randomly selected in the cross-validation folds, they were typically less than 1% between the minimum and maximum obtained values.
In Table 1, we provide a description of the network architecture and hyperparameters that were tested, the obtained accuracy values and the primary motivation underlying the selection of each parameter combination.
In the column "Architecture and hyperparameters", the network architecture is presented on the first line in the following format:

(number of inputs) : (number of neurons in hidden layer 1) (activation function of the neurons in hidden layer 1) : ... : (number of outputs) (activation function of the neurons in the output layer)    (1)

where "..." stands for the other hidden layers, written in the same way. The activation functions that we used are: sigmoid (the unipolar sigmoid), tanh (the hyperbolic tangent), elu (exponential linear unit), and lrelu (leaky rectified linear unit).
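To make the notation concrete, the sketch below parses an architecture string written in this format. The example string "768 : 100 tanh : 1 sigmoid" is hypothetical (only the 768 inputs come from the embedding size mentioned earlier), and the Layer type and parse_architecture function are illustrative names, not part of the experimental code.

```python
from dataclasses import dataclass

# Activation names listed in the text.
ACTIVATIONS = {"sigmoid", "tanh", "elu", "lrelu"}


@dataclass(frozen=True)
class Layer:
    units: int
    activation: str


def parse_architecture(spec):
    """Parse 'inputs : units act : ... : outputs act' into (n_inputs, layers)."""
    parts = [p.strip() for p in spec.split(":")]
    if len(parts) < 2:
        raise ValueError("expected at least an input size and an output layer")
    n_inputs = int(parts[0])
    layers = []
    for part in parts[1:]:
        units, activation = part.split()
        if activation not in ACTIVATIONS:
            raise ValueError(f"unknown activation: {activation}")
        layers.append(Layer(int(units), activation))
    return n_inputs, layers


# Hypothetical configuration: one hidden layer and a single binary output.
n_inputs, layers = parse_architecture("768 : 100 tanh : 1 sigmoid")
```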

[Table 1. Columns: Architecture and hyperparameters; Motivation.]
Then, the main hyperparameters are mentioned on the second line: n is the number of training epochs, η is the learning rate, and opt represents the optimization algorithm.
For all configurations, the binary cross-entropy loss function was used, as it is well suited to binary and multi-label classification problems, in which each output is an independent yes/no decision.
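For reference, the standard binary cross-entropy averaged over a set of outputs can be written as in the sketch below. The clipping constant eps and the function name are assumptions introduced to keep the example numerically safe; deep learning frameworks provide their own implementations.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all outputs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident correct predictions give a lower loss than hesitant ones.
confident = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
hesitant = binary_cross_entropy([1, 0, 1], [0.6, 0.4, 0.5])
```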
The best results were obtained with balanced configurations, i.e., which are sufficiently complex to capture the underlying patterns of the data but without excessive complexity that may lead to overfitting. This equilibrium extends to both the architectural complexity of the network and the number of training epochs.
Furthermore, it is important to note that the best model varies between the different classes of skills (Tx), and therefore different neural models are used in each specific case.
Figure 4 shows the best cross-validation results for the six "level 1" models corresponding to the six main skill classes (Tx). One can see that the accuracy values for all models are quite high, above 94%, with the best one reaching 99%. We should emphasize that these values represent the averages obtained for the testing sets, not for the training sets. Therefore, we can conclude that the "level 1" classification models are well suited for the given task.

The Multi-language Model
Our evaluation revealed that the results achieved with the multi-language model were remarkably close to those obtained with the English-specific model. In this subsection we show the results for the "level 1" problem, but the hierarchical classification methodology designed here is independent of the specific sentence embedding model. The network architectures and hyperparameters that were tested for the multi-language sentence embeddings are presented in Table 2. The results of the experiments, in terms of accuracy, are presented in Table 3. The "Trial" column corresponds to the configurations in Tables 1 and 2. The remaining columns show the accuracy values obtained for the testing sets in the cross-validation procedure. The best results for each skill class are highlighted in bold red. Still, several models give comparably good results; therefore, the results that are within a 0.2% range of the best one are also marked in bold italics.
As shown in Figures 4 and 5, the multi-language model has accuracy values very close to those of the English language model. In Figure 5, we also include the relative difference between the two models, computed with Equation (2), where a_m is the best accuracy obtained with the multi-language embeddings and a_e is the best accuracy obtained with the English language embeddings.
The multi-language model is slightly better for the classes T_1, T_3 and T_4, and slightly worse for the rest, but the accuracy values are all within a 0.55% range. Given the approximate nature of neural network-based classification, we can conclude that the quality of the results obtained in the two situations is essentially the same. However, the multi-language model has the advantage of flexibility: using it avoids the additional step of detecting the original language and performing an automatic translation, which may distort the original message to some extent. Therefore, it was selected as the default model for the classification of skills, and also for the results of the second level classification presented in the next subsection.
The average training and testing results in terms of accuracy for the k = 5 cross-validation folds of the "level 2" classification are presented in Figure 6. Since this is a multi-label classification, the accuracy value for an instance represents the average of the accuracy values obtained for all outputs. In this case, the testing performance is not as good as for the "level 1" task. The likely root causes are the low number of instances, presented in Table 6, and the ambiguity of the subclass membership.
The experimental results show that our models generally perform well in identifying transversal skills within job ads, considering the limited available data. In particular, the high accuracies of the "level 1" models allow the reliable screening of job ads that lack phrases containing transversal skills, which we found to be the vast majority. Identifying individual transversal skills requires enough data to cover the space of possible skill-related phrasings. Currently, our "level 2" models achieve high accuracies for the T_1 category, which focuses on language skills. In this case, the phrasings are generally consistent (e.g., "Has good knowledge of [language]" or "A good grasp of [language] is beneficial"). High accuracy is also achieved for the T_6 category, where, similarly, the phrasings found in job ads are more consistent than for other categories.
In most cases, some of the best results were obtained using neural networks consisting of two hidden layers with descending sizes. We found that extending the architecture with additional layers of greater sizes did not improve the accuracy, and instead resulted in overcomplicated models. However, especially in the case of the "level 2" models, we surmise that a far greater improvement would be achieved with a broader data set with better coverage of possible phrasings than by tweaking our current models.

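The two evaluation quantities discussed in this subsection can be sketched in a few lines. The per-instance multi-label accuracy follows the description given for the "level 2" results, with each output scored as correct or incorrect and the scores averaged. The body of `relative_difference` is only an assumption: the exact form of Equation (2) is not reproduced here, so the conventional definition (a_m - a_e) / a_e, expressed in percent, is used as a stand-in. All function names are hypothetical.

```python
from typing import Sequence


def instance_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of output labels predicted correctly for a single instance."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multilabel_accuracy(
    Y_true: Sequence[Sequence[int]], Y_pred: Sequence[Sequence[int]]
) -> float:
    """Average of the per-instance accuracies over a data set."""
    scores = [instance_accuracy(t, p) for t, p in zip(Y_true, Y_pred)]
    return sum(scores) / len(scores)


def relative_difference(a_m: float, a_e: float) -> float:
    """ASSUMED form of the relative difference: (a_m - a_e) / a_e, in percent.

    a_m: best accuracy with the multi-language embeddings.
    a_e: best accuracy with the English language embeddings.
    """
    return 100.0 * (a_m - a_e) / a_e
```

Under this assumed definition, a positive value means that the multi-language model is the more accurate of the two for that skill class, and a negative value means the opposite.
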
Conclusions
The present study attempts to create a robust framework for defining the correlations between the requirements of job advertisements and transversal skills, in order to predict the required skills for individual job descriptions. This is achieved by means of a hierarchical classification methodology based on the ESCO taxonomy. The comparison between the results obtained with English-specific and multi-language sentence embeddings reveals comparable performance, validating the adaptability and efficiency of the multi-language embeddings, which were subsequently adopted as the default model due to their flexibility. At the core of our methodology lie the "level 1" and "level 2" hierarchical classifications. Both achieve high accuracy, although "level 2" exhibits lower cross-validation accuracy, which we attribute to subclass ambiguity and the limited number of instances. The main contributions of this approach consist in developing neural network models to classify job ads within the hierarchical skill framework of the European ESCO taxonomy. The key innovations include a self-generated dataset comprising manually labeled job ads referencing specific skills. Their classification aligns with the ESCO hierarchical structure, distinguishing between broader top-level skills and more specialized subskills.
Future research directions may explore refining subclass identification methods to tackle ambiguity issues and improve classification accuracy. Augmenting data instances within subclasses and employing advanced data augmentation techniques could enhance model generalization. Investigating multi-task learning approaches, enabling simultaneous classification of multiple subclasses, could deepen understanding and granularity in skill identification. Exploring domain adaptation techniques to address linguistic variations across job descriptions in different languages remains very important. Additionally, investigating interpretable models or explainable AI methodologies can aid in understanding model decisions, fostering trust and applicability in real-world scenarios. Collaborations with industry partners for larger datasets and validation in diverse job markets would further validate the efficacy of the proposed framework.

Figure 1. Overview of the proposed methodology

Figure 3. A sample result of data preprocessing for the "level 2" classification

Figure 4. Comparison between the performance of classification using English language and multi-language sentence embeddings

Figure 5. Relative difference obtained by the multi-language model compared to the English language model

Figure 6. The results obtained for the "level 2" classification

Table 3. Accuracy values obtained for the evaluated neural network configurations for the first level classification

Table 5. Accuracy values obtained for the evaluated neural network configurations for the second level classification

Table 6. The total number of instances in each subclass