Article

An LLM-Powered Framework for Privacy-Preserving and Scalable Labor Market Analysis

1 School of Economics and Finance, Guangdong University of Science and Technology, Dongguan 523083, China
2 Faculty of Education, University of Macau, Macao 999078, China
3 The Faculty of Data Science, City University of Macau, Macao 999078, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 53; https://doi.org/10.3390/math14010053
Submission received: 25 October 2025 / Revised: 7 December 2025 / Accepted: 8 December 2025 / Published: 23 December 2025
(This article belongs to the Special Issue Privacy-Preserving Machine Learning in Large Language Models (LLMs))

Abstract

Timely and reliable labor market intelligence is crucial for evidence-based policymaking, workforce planning, and economic forecasting. However, traditional data collection and centralized analytics raise growing concerns about privacy, scalability, and institutional data governance. This paper presents a large language model (LLM)-powered framework for privacy-preserving and scalable labor market analysis, designed to extract, structure, and interpret occupation, skill, and salary information from distributed textual sources. Our framework integrates domain-adapted LLMs with federated learning (FL) and differential privacy (DP) to enable collaborative model training across organizations without exposing sensitive data. The architecture employs secure aggregation and privacy budgets to prevent information leakage during parameter exchange, while maintaining analytical accuracy and interpretability. The system performs multi-task inference—including job classification, skill extraction, and salary estimation—and aligns outputs to standardized taxonomies (e.g., SOC, ISCO, ESCO). Empirical evaluations on both public and semi-private datasets demonstrate that our approach achieves superior performance compared to centralized baselines, while ensuring compliance with privacy and data-sharing regulations. Expert review further confirms that the generated trend analyses are accurate, explainable, and actionable for policy and research. Our results illustrate a practical pathway toward decentralized, privacy-conscious, and large-scale labor market intelligence.

1. Introduction

Modern labor markets are undergoing rapid and uneven change, driven by digitization, automation, and shifting sectoral demand. Policymakers, public employment services, and large intermediaries increasingly need timely, fine-grained information on where skills are emerging, which occupations are at risk, and how wages evolve across regions and sectors. Traditional survey-based labor market intelligence provides high-quality but slow and coarse indicators, often with publication lags of several months and limited occupational detail. In contrast, large-scale online vacancy and CV data offer a rich and near-real-time view of skills, jobs, and wages, but raise substantial privacy, governance, and modeling challenges. This work addresses these challenges by combining large language models (LLMs), federated learning (FL), and differential privacy (DP) to enable real-time labor market analytics that are informative for policy and institutional decision making while respecting strict privacy constraints.
The global labor market is undergoing rapid transformation driven by technological innovation, automation, demographic shifts, and post-pandemic economic realignment. In this evolving landscape, understanding labor demand and supply dynamics at fine-grained spatial, temporal, and occupational resolutions has become increasingly critical. Policymakers depend on such insights to design reskilling and employment programs; corporations leverage them for strategic hiring and workforce planning; and researchers analyze them to model structural changes and forecast the emergence of new occupations. Despite this growing need, existing systems often fall short in terms of timeliness, granularity, adaptability, and—critically—data governance.
Existing labor analytics platforms typically rely on keyword heuristics or supervised learning over proprietary datasets. Such approaches are brittle: they fail to generalize across domains, lack explainability, and are not easily adaptable to evolving taxonomies such as O*NET, SOC, or ESCO. More importantly, most rely on centralized data collection, which necessitates transferring sensitive employment or organizational data to a central server. This raises significant privacy, compliance, and ethical concerns under regulations such as GDPR, CCPA, and HIPAA. Institutions are therefore reluctant to share or aggregate data, limiting the scale and representativeness of analytical models. The need for a distributed, privacy-preserving, and generalizable labor analytics framework has never been more urgent.
Recent advancements in large language models (LLMs)—such as GPT-3 [1] and LLaMA [2]—have transformed natural language understanding and generation. LLMs demonstrate strong capabilities in zero-shot classification, information extraction, and summarization, often outperforming task-specific architectures. Applied to labor market intelligence, they can infer skill requirements from job postings, normalize ambiguous job titles into standardized occupational codes, estimate salary ranges from textual cues, and summarize hiring trends across regions. Yet, off-the-shelf LLMs lack domain alignment for labor economics, are computationally intensive to scale, and introduce new challenges around privacy, data provenance, and interpretability.
While LLMs have seen adoption in domains such as biomedical, legal, and financial text analysis, their integration into labor market analytics remains limited. Few studies have explored how to adapt LLMs to labor-specific terminology, align their outputs with structured taxonomies, or deploy them securely in decentralized settings. Even fewer have investigated full-stack systems that incorporate privacy-preserving computation, continuous learning, and governance for cross-institutional collaboration. These gaps motivate the need for a comprehensive solution that unifies state-of-the-art NLP with rigorous privacy protection and scalable analytics.
In this work, we introduce a secure, scalable, and modular framework for labor market analysis powered by large language models. Our framework combines domain-adapted LLMs with federated learning (FL) and differential privacy (DP) to enable collaborative model training across multiple institutions without exposing raw data. The system is designed to operate seamlessly across organizational boundaries—such as ministries of labor, multinational enterprises, universities, and think tanks—while preserving the confidentiality, ownership, and legal integrity of local data.
The architecture consists of three core components:
  • Data ingestion layer: A layer responsible for securely collecting, filtering, and preprocessing unstructured labor data from sources such as job boards, resumes, and social media feeds.
  • LLM-powered inference engine: A domain-adapted LLM fine-tuned for multi-task learning across occupation classification, skill extraction, and salary prediction, ensuring structured outputs compatible with taxonomies like SOC and ESCO.
  • Federated analysis module: A decentralized training layer implementing secure aggregation and differential privacy mechanisms to enable compliance with privacy regulations while maintaining high model utility.
The system outputs include structured occupation–skill mappings, regional dashboards, temporal trend summaries, and predictive analytics supporting policy design, workforce development, and academic labor research.
Timeliness is particularly critical; official indicators used by central banks and labor ministries are often available only with delays of several weeks or more, whereas online postings and CVs update on a daily or even hourly basis. By turning these high-frequency digital traces into structured signals, our system can provide early warnings about emerging occupations, shifting skill bundles, or sharp demand shocks, which would otherwise appear in official statistics only with substantial delay. We evaluate our framework through extensive experiments on real-world and semi-synthetic datasets, including job postings from LinkedIn and Indeed, annotated resumes, and standardized taxonomies such as O*NET and SOC. Quantitative metrics (F1, precision–recall, regression error) and qualitative assessments by labor economists confirm both analytical accuracy and interpretability. We further simulate multi-client federated deployments to assess training efficiency, DP-induced utility trade-offs, and system scalability.
Our main contributions are summarized as follows:
  • We propose the first end-to-end labor market analysis framework that integrates LLMs with structured taxonomy alignment and multi-task inference.
  • We design a federated learning protocol with differential privacy guarantees, enabling secure and compliant training on decentralized labor datasets.
  • We develop a domain-adaptation pipeline for LLMs using occupation-annotated and skill extraction corpora, enhancing relevance and explainability.
  • We provide comprehensive empirical and expert evaluations demonstrating robustness, scalability, and privacy–utility balance across diverse domains.
To the best of our knowledge, this is the first system that holistically combines large language models, privacy-preserving computation, and labor economics into a unified, deployable platform for next-generation labor market intelligence. While large language models and federated learning offer powerful tools for processing unstructured labor market data at scale, they also come with important trade-offs: training and serving LLMs is computationally expensive; models are subject to domain drift as job content evolves; and federated optimization must contend with non-IID client data, heterogeneous hardware, and intermittent connectivity. We explicitly design our framework and experiments to surface these limitations and to quantify the extent to which differential privacy and federated learning remain compatible with economically useful signals.

2. Related Work

This work draws upon and advances research in four interconnected areas: labor market analytics, natural language processing for job and skill understanding, the application of large language models in economic and occupational domains, and privacy-preserving machine learning. We review the state of the art in each of these domains to highlight the gaps that motivate our proposed framework.

2.1. Labor Market Analytics and Workforce Intelligence

Labor market analysis is a longstanding area of interest in economics and public policy. Early foundational studies focused on analyzing trends in employment, skill demand, and wage evolution using structured datasets collected via national surveys. For instance, Autor et al. [3] introduced the concept of skill-biased technological change, showing how technology disproportionately favors workers with higher skills. Acemoglu and Restrepo [4] studied the impact of automation on employment and found a significant displacement effect on middle-skill occupations.
These classical approaches rely on periodic labor force surveys (e.g., the U.S. CPS or EU-LFS), which provide structured data but are limited by their sampling frequency and granularity. As the labor market becomes increasingly dynamic and digitized, these limitations have led researchers to explore alternative data sources. Hershbein and Kahn [5] leveraged millions of job postings to measure real-time shifts in demand for education credentials, while Marinescu and Wolthoff [6] used job board data to analyze firm preferences and applicant behavior.
Traditional labor market intelligence still relies heavily on official survey-based and administrative systems such as labor force surveys, establishment surveys, and social insurance registers. These sources remain the backbone for policy, but they typically operate at monthly or quarterly frequency, incur publication lags of several weeks or months, and often provide only coarse breakdowns by region, occupation, or demographic group. Our framework is designed to complement, rather than replace, these systems by providing high-frequency, fine-grained signals that can feed into the same policy processes.
However, these approaches often rely on shallow keyword matching or rule-based classification, which may fail to capture nuanced changes in labor demand, such as the emergence of hybrid occupations (e.g., “data-literate marketers”) or soft-skill emphasis. Furthermore, centralized scraping of proprietary job data may raise issues of access, scalability, and data governance.

2.2. NLP for Occupation Classification and Skill Extraction

Recent advances in NLP have enabled the automated classification of unstructured labor-related text. Early systems applied TF-IDF and word embedding techniques to classify job descriptions into standardized taxonomies such as the U.S. SOC or international ISCO codes [7]. While effective for coarse-grained labeling, these approaches struggled with ambiguous, context-sensitive phrases and synonyms common in job titles.
Transformer-based models, notably BERT [8], have shown significant improvements in semantic understanding. Camacho-Collados et al. [9] proposed JobBERT, a model pretrained on job advertisements, achieving state-of-the-art accuracy in job title normalization. Similarly, Liu et al. [10] fine-tuned RoBERTa for occupation classification tasks using LinkedIn job data.
Skill extraction is typically formulated as a sequence labeling problem. Conventional approaches employed conditional random fields (CRFs) or LSTMs [11], while recent work incorporates contextualized embeddings from LLMs. For example, authors in [12] introduced a skill-aware BERT variant that improves entity-level recall in job documents. Despite these advances, most models are trained on limited datasets, do not generalize across domains (e.g., resumes vs. postings), and assume centralized access to text corpora.
Furthermore, little work has addressed joint modeling of skills, salaries, and occupations as interdependent outputs [13,14]. Our framework contributes a unified, multi-task approach that simultaneously performs job classification, skill detection, and salary estimation with privacy guarantees.

2.3. LLMs in Economic and Labor Domain Applications

Large language models (LLMs) such as GPT-3 [1], T5 [15], and LLaMA [2] have demonstrated general-purpose capabilities across summarization, classification, and generation tasks. Trained on massive web-scale corpora, these foundation models can be adapted to specific domains via instruction tuning, continued pretraining, or few-shot prompting.
Beyond generic NLP benchmarks, an emerging literature tailors LLMs to economic and financial applications. In the financial domain, BloombergGPT [16] and FinGPT [17] illustrate how domain-specific pretraining on proprietary or curated financial text can improve downstream tasks such as sentiment analysis, risk assessment, and earnings-call understanding. Surveys of generative AI in finance further document applications to portfolio management, risk modeling, and compliance [18]. At the macro level, Carriero et al. [19] compare time-series LLMs with traditional forecasting models for standard macroeconomic indicators, showing that LLM-based approaches can be competitive with, or complementary to, classical econometric tools.
Economists have also begun to reflect on how LLMs change empirical practice. Kwon et al. [20] provide a practical primer on LLMs for economists and central banks, while Korinek [21] surveys use cases of generative AI in economic research, including text-based measurement, counterfactual reasoning, and agent-based simulations. These contributions emphasize that LLMs are not only black-box predictors but can serve as flexible interfaces for large unstructured corpora and as components inside larger decision-making pipelines.
In the labor domain specifically, most existing work has used LLMs either to measure the potential impact of AI on jobs or to process labor market text. Eloundou et al. [22] use GPT-4-based annotations to quantify occupational exposure to LLM capabilities, while Chen et al. [23] exploit LLMs to recover rich information from categorical variables and construct new measures of labor market match quality using job-platform and survey data. At the task level, Serino [24] adapts LLMs to extract skills from job advertisements, demonstrating that modern transformer models can greatly improve over traditional dictionary-based pipelines for skill tagging.
Compared with this literature, our focus is on operationalising LLMs as a production system for labor market intelligence under realistic constraints on data governance. Rather than assuming a centrally collected corpus, we explicitly integrate LLM-based representations with cross-silo federated learning and client-level differential privacy, aiming to deliver high-frequency labor indicators while respecting institutional and regulatory privacy requirements.

2.4. Privacy-Preserving and Federated Learning

Given the sensitivity of labor data—often containing personal identifiers, salaries, and employment histories—privacy preservation is essential. Federated learning (FL), introduced by McMahan et al. [25], enables decentralized training by allowing client devices to compute local updates without sharing raw data [26].
Applications of FL in healthcare [27] and finance [28] have shown its utility in domains with strict privacy requirements. In labor market analysis, however, FL adoption is limited due to challenges in text modeling, data heterogeneity, and client variability. Differential privacy (DP) [29] offers formal privacy guarantees by injecting noise into the data or model updates. Abadi et al. [30] introduced DP-SGD for deep networks, which has since been extended to NLP settings. Our system integrates FL and DP with LLMs for the first time in the labor market domain. We design a modular pipeline that supports institution-level federated nodes (e.g., ministries, companies, job boards) and applies privacy-preserving aggregation of model updates, enabling collaboration without compromising sensitive employment data.

2.5. Positioning and Novelty

To the best of our knowledge, our framework is the first to combine large language models, occupation–skill–salary modeling, federated privacy mechanisms, and labor market analytics into a unified platform. Compared with prior work, we achieve the following:
  • We scale NLP-based job understanding to millions of real-time job and resume records with high taxonomy fidelity.
  • We ensure privacy through federated training and rigorous DP noise control, suitable for deployment across institutions and jurisdictions.
  • We support longitudinal trend analysis using structured outputs from LLMs, enabling both short-term skill monitoring and long-term workforce planning.
Our contribution is thus not only technical but also architectural and societal, offering a scalable and secure way to monitor the evolving world of work using the latest advances in AI.

3. Framework Overview

Our proposed framework is designed to enable secure, scalable, and automated labor market analysis by integrating large language models with privacy-preserving data processing pipelines. The system comprises three major layers: data ingestion, intelligent processing via LLMs, and federated learning-based analytics. This section provides a detailed overview of each component, their roles, and their interactions.

3.1. System Architecture

The overall architecture of the system is shown in Figure 1. The components are organized to support modular deployment and decentralized collaboration across organizations such as government agencies, academic institutions, and labor market platforms.
  • Data Ingestion Layer: This layer interfaces with external data sources, including job posting websites, resume repositories, company HR systems, and labor-related social media feeds (e.g., LinkedIn and Reddit). Data is collected using secure APIs or streaming protocols. It undergoes initial preprocessing, including de-duplication, anonymization, and content validation.
  • LLM Processing Engine: The core of the system is a domain-adapted large language model that performs various natural language understanding (NLU) tasks, such as job title normalization, occupation–skill mapping, salary range inference, and labor demand summarization. We implement a modular LLM pipeline with pre-tokenization, contextual embedding generation, task-specific prompt engineering, and structured output parsing. The engine supports both batch processing and real-time streaming.
  • Federated Analysis Module: To enable secure, multi-institutional collaboration, this module coordinates a federated learning protocol where local nodes (e.g., universities or job platforms) compute intermediate models or statistics on-site. Only model updates, optionally masked by differential privacy mechanisms, are shared with a central aggregator. This design ensures raw data never leaves the source organization, meeting regulatory compliance requirements (e.g., GDPR).

3.2. Data Processing Workflow

The framework supports a continuous data pipeline consisting of the following stages:
  • Text Collection and Annotation: Raw text data is ingested and pre-labeled using weak supervision and existing ontologies (e.g., O*NET). Named entities such as job titles, skills, and locations are recognized and extracted.
  • Text Embedding and Understanding: Tokenized texts are fed into a fine-tuned transformer model, generating contextual embeddings for classification, clustering, and information extraction tasks. Attention-based mechanisms enable the model to focus on economically significant signals such as skill demand shifts or regional hiring trends.
  • Semantic Enrichment and Structuring: Extracted information is mapped to standardized taxonomies (e.g., ISCO-08, SOC) to ensure interoperability. Ambiguities in job descriptions or inconsistent terminology across regions are resolved using cross-lingual and paraphrase-aware modeling.
  • Secure Aggregation and Forecasting: In a distributed fashion, local participants compute aggregated statistics (e.g., occupation frequency histograms, skill co-occurrence graphs) and upload encrypted summaries to a secure aggregator. The central server then trains global forecasting models to predict emerging trends in occupations, skills, or sectoral labor shortages.
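The local-statistics step of the last stage can be sketched in a few lines. The helper names and toy postings below are illustrative, not from the paper; a real deployment would additionally encrypt and noise each client summary before upload:

```python
from collections import Counter
from itertools import combinations

def local_cooccurrence(postings):
    """Count skill pairs co-occurring within each posting (computed on-site by a client)."""
    counts = Counter()
    for skills in postings:
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return counts

def aggregate(client_summaries):
    """Server-side merge of client summaries; in deployment these arrive privatized."""
    total = Counter()
    for summary in client_summaries:
        total.update(summary)
    return total

client_a = local_cooccurrence([["python", "sql"], ["python", "sql", "excel"]])
client_b = local_cooccurrence([["sql", "excel"]])
global_counts = aggregate([client_a, client_b])
```

Because each client ships only pair counts, the server can build the global skill co-occurrence graph without ever seeing an individual posting.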

3.3. LLM Fine-Tuning and Task Adaptation

We fine-tune a pretrained transformer model (e.g., T5, GPT-J, or LLaMA-2) on a labor-domain corpus composed of historical job advertisements, professional networking site content, and occupation–skill mappings. Several task heads are integrated into the model architecture, each tailored for the following:
  • Occupation Classification: Predicting hierarchical occupation codes from free-text job descriptions using multi-label classification.
  • Skill Extraction: Identifying explicit and implied skills and competencies using sequence labeling with CRF or span-based decoding.
  • Salary Range Estimation: Inferring plausible salary intervals using a hybrid classification–regression head.
  • Labor Trend Summarization: Generating abstract summaries of labor shifts across sectors and geographies using encoder–decoder-style generation.
Prompt-based zero-shot or few-shot learning is used for domains with limited labeled data. In addition, we implement retrieval-augmented generation (RAG) to improve performance in sparse or noisy information settings by grounding LLM outputs on curated economic knowledge bases.
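For the zero-shot setting mentioned above, a minimal prompt-and-parse sketch might look as follows; the template wording and the extraction regex are our own illustration, since the paper does not specify its exact prompts:

```python
import re

# Illustrative template; the paper's actual prompts are not specified.
PROMPT_TEMPLATE = (
    "Classify the following job posting into one SOC major group.\n"
    "Posting: {posting}\n"
    "Answer with the SOC code only, e.g. 15-0000."
)

def build_prompt(posting):
    """Fill the zero-shot classification template with a job posting."""
    return PROMPT_TEMPLATE.format(posting=posting.strip())

def parse_soc_code(response):
    """Extract a SOC-style code such as 15-0000 from a free-text model response."""
    match = re.search(r"\b\d{2}-\d{4}\b", response)
    return match.group(0) if match else None

prompt = build_prompt("We are hiring a machine learning engineer in Lisbon.")
code = parse_soc_code("This posting falls under 15-0000 (Computer and Mathematical).")
```

Parsing the response back into a taxonomy code is what makes the free-form LLM output usable as a structured label downstream.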

4. Methodology

Our methodology integrates data collection, large language model (LLM) adaptation, and privacy-preserving distributed learning to enable secure and scalable labor market analysis. We present the technical workflow across data preparation, model architecture, fine-tuning, and privacy protection mechanisms, along with the analytical procedures used for labor market intelligence. Figure 1 shows the overall pipeline of our proposal.

4.1. Labor Market Data Processing

A robust data processing pipeline is essential for extracting meaningful and privacy-compliant insights from heterogeneous labor text sources. In this study, we design a unified preprocessing and normalization framework that accommodates multi-source, multilingual, and semi-structured inputs such as online job advertisements, resumes, freelancing platform profiles, and labor-related social media content.
We represent the full dataset as $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, where each $x_i$ denotes an individual text sample and $y_i$ is a structured label corresponding to one or more analytical targets. Specifically, $y_i$ may encode an occupation code, a set of skill tags, or a salary range. The overall label space is defined as the union
$$\mathcal{Y} = \mathcal{Y}_{\mathrm{occ}} \cup \mathcal{Y}_{\mathrm{skill}} \cup \mathcal{Y}_{\mathrm{salary}},$$
capturing the multi-task nature of the downstream modeling process. Data sources often differ in format and granularity; for instance, resume segments provide fine-grained skill mentions, whereas job advertisements emphasize employer requirements and wage expectations. To harmonize these heterogeneous sources, all text records are transformed into a canonical JSON structure with standardized metadata fields (e.g., posting date, region, sector, and source platform).
Each raw document $x_i$ undergoes a multi-stage normalization procedure. After character-level normalization and encoding unification (UTF-8), we apply lowercasing, punctuation handling, and stopword filtering using a domain-augmented stopword list that preserves informative tokens such as job titles and certification names. Tokenization is performed with a subword tokenizer—either SentencePiece or Byte-Pair Encoding (BPE)—to ensure vocabulary consistency across institutions and to reduce the out-of-vocabulary rate for emerging terminology. Part-of-speech tagging and dependency parsing are optionally applied to improve contextual alignment during named entity extraction.
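A condensed sketch of this normalization step follows; the stopword and keep lists are tiny illustrative stand-ins for the domain-augmented lists described above, and the function names are ours:

```python
import re
import unicodedata

# Tiny illustrative stand-ins for the domain-augmented lists described in the text.
STOPWORDS = {"the", "a", "an", "and", "for", "of", "to"}
KEEP = {"c++", "phd"}  # informative tokens that must survive filtering

def normalize(text):
    """Character normalization, lowercasing, tokenization, and stopword filtering."""
    text = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"[a-z0-9+#.]+", text)
    return [t for t in tokens if t in KEEP or t not in STOPWORDS]

def to_canonical(raw_text, region, source):
    """Wrap one document in the canonical JSON-style record with standard metadata."""
    return {"text_tokens": normalize(raw_text), "region": region, "source": source}

record = to_canonical("Senior C++ Engineer for the Robotics Team", "EU", "job_board")
```

Note that the token pattern deliberately keeps `+`, `#`, and `.` so that skill mentions such as "c++" or "c#" are not destroyed by punctuation stripping.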

Entity Extraction and Ontology Alignment

To structure unlabeled text, we employ named entity recognition (NER) and span classification models pretrained on general corpora and further fine-tuned on domain-specific annotations. Extracted entities include occupations, skills, organizations, locations, and salary indicators. For each recognized entity $e_i$, we compute a semantic embedding using contextual encoders such as SBERT or domain-adapted BERT variants. These embeddings are then compared to entries in standardized taxonomies, such as ISCO-08, O*NET, or ESCO, using cosine similarity and fuzzy string matching. The final mapping is determined via a hybrid similarity score:
$$\mathrm{sim}(e_i, c_j) = \alpha \cdot \mathrm{cosine}(v_{e_i}, v_{c_j}) + (1 - \alpha) \cdot \mathrm{fuzz}(e_i, c_j),$$
where $v_{e_i}$ and $v_{c_j}$ denote the embedding vectors of the entity and concept, respectively, and $\alpha$ balances semantic and lexical similarity. Mapped entities are stored alongside confidence scores to enable downstream filtering and uncertainty-aware analysis. We select the trade-off parameter $\alpha \in [0, 1]$ on a held-out validation set of manually aligned occupation and skill labels. Specifically, we perform a small grid search over $\alpha \in \{0.0, 0.25, 0.5, 0.75, 1.0\}$ and choose the value that maximizes taxonomy-alignment accuracy. Performance is stable over a moderate range around the selected $\alpha$, indicating that downstream results are not overly sensitive to this hyperparameter.
We treat names, email addresses, phone numbers, postal addresses, national identifiers and other obvious personal identifiers as PII. Before any local training, each client replaces detected PII spans with generic placeholders such as [NAME], [ORG], [LOC], or [CONTACT] using a standard NER pipeline supplemented by pattern-based rules. For the labor market tasks considered here (occupation, skills, salary) these tokens carry limited signal beyond coarse context, so the expected impact on predictive performance is small. This pretraining anonymization provides a first layer of protection and is complemented by client-level differential privacy on model updates, yielding a two-layer defense that aligns with GDPR-style requirements to minimize exposure of identifiable information while still enabling useful aggregate analytics.
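The pattern-based portion of this masking step might look like the following sketch; the regexes and placeholder scheme are illustrative, and a production pipeline would combine them with a trained NER model as described:

```python
import re

# Illustrative patterns; production masking combines these with a trained NER model.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[CONTACT]"),  # email addresses
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[CONTACT]"),    # phone-like digit runs
]

def mask_pii(text, ner_spans=()):
    """Replace pattern-matched and NER-detected PII spans with generic placeholders.

    ner_spans: (surface_string, placeholder) pairs produced by an upstream NER pipeline.
    """
    for pattern, tag in PII_PATTERNS:
        text = pattern.sub(tag, text)
    for surface, tag in ner_spans:
        text = text.replace(surface, tag)
    return text

masked = mask_pii(
    "Contact Jane Doe at jane.doe@example.com or +1 415 555 0100.",
    ner_spans=[("Jane Doe", "[NAME]")],
)
```

Because masking happens before any local training, raw identifiers never enter the model, and the DP mechanism on updates then covers what pattern and NER coverage might miss.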

4.2. Domain-Specific Fine-Tuning

The adaptation of large language models (LLMs) to labor-economic text corpora is framed as a hierarchical optimization problem coupling unsupervised domain pretraining with multi-task supervised learning under constrained communication and privacy budgets. The process seeks to minimize empirical and distributional divergence between the pretrained general-domain model and the labor-specific data manifold, while preserving global stability across federated updates.
Let $f_\theta : \mathcal{X} \to \mathbb{R}^d$ denote an LLM encoder parameterized by $\theta \in \mathbb{R}^{|\Theta|}$ that maps tokenized sequences to contextual embeddings. Given a domain distribution $P_{\mathrm{labor}}$ over text–label pairs $(x, y)$, our objective is to minimize a regularized empirical risk:
$$\min_{\theta \in \Theta} \; \mathbb{E}_{(x, y) \sim P_{\mathrm{labor}}} \left[ \mathcal{L}_{\mathrm{task}}\big(f_\theta(x), y\big) \right] + \beta \, D_{\mathrm{KL}}\big(P_\theta \,\|\, P_{\mathrm{prior}}\big),$$
where $P_\theta$ denotes the model-induced token distribution, $P_{\mathrm{prior}}$ is the pretrained general-domain prior, and $D_{\mathrm{KL}}$ regularizes the divergence between parameter-induced posteriors to prevent catastrophic forgetting. Intuitively, the objective in (1) adapts the generic LLM to the specific distribution of labor market text, so that its representations encode domain-relevant regularities in occupations, skills, and wage mentions before any task-specific supervision is introduced.
Stage I: Domain-Adaptive Pretraining (DAPT). In the unsupervised adaptation stage, we minimize the expected token reconstruction error on an unlabeled corpus $\mathcal{U}$ drawn from $P_{\mathrm{labor}}^{\mathrm{unlabeled}}$. The training objective is a masked language modeling (MLM) criterion augmented with an entropy regularizer to encourage calibration of output distributions:
$$\mathcal{L}_{\mathrm{DAPT}} = \mathbb{E}_{\tilde{x} \sim \mathcal{U}} \left[ - \sum_{i \in M} \log p_\theta\big(x_i \mid \tilde{x}_{\setminus i}\big) + \lambda_{\mathrm{ent}} \, H\big(p_\theta(\cdot \mid \tilde{x}_{\setminus i})\big) \right],$$
where $M$ is the set of masked positions, $H(\cdot)$ denotes Shannon entropy, and $\lambda_{\mathrm{ent}}$ controls entropy smoothing to prevent overconfident token predictions. To account for heterogeneous data sources and varying document lengths, we further incorporate a length-weighted sampling function $\pi(x_i) \propto |x_i|^{\gamma}$, where $\gamma$ is a tunable exponent that biases sampling toward longer, semantically richer job descriptions. The effective optimization thus minimizes $\mathbb{E}_{x_i \sim \pi}\left[\mathcal{L}_{\mathrm{DAPT}}(x_i)\right]$. In practice we choose a moderate exponent $\gamma < 1$ (fixed across experiments), which mildly upweights longer job descriptions without allowing a small number of extremely long documents to dominate the training signal. We verified that the induced sampling distribution preserves the overall length histogram and sector mix of the unlabeled corpus to within a few percentage points, indicating that the length-based reweighting does not introduce substantial dataset bias.
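The length-weighted sampling function reduces to a simple normalization over document lengths; `gamma = 0.5` below is an illustrative value, not the paper's tuned setting:

```python
def sampling_weights(doc_lengths, gamma=0.5):
    """Normalized sampling distribution pi(x_i) proportional to |x_i|**gamma."""
    raw = [length ** gamma for length in doc_lengths]
    total = sum(raw)
    return [w / total for w in raw]

# With gamma = 0.5, a 4x longer document is sampled only 2x as often.
weights = sampling_weights([100, 400], gamma=0.5)
```

Any `gamma < 1` compresses the length ratio in this way, which is exactly why the sampling distribution stays close to the corpus's overall length histogram.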
Stage II: Multi-Task Supervised Fine-Tuning. Upon completion of DAPT, the model undergoes supervised adaptation across occupation, skill, and salary tasks defined over the labeled corpus D ˜ = { ( x ˜ i , y i ) } i = 1 N . Let f θ ( k ) denote the task-specific head for task k { occ , skill , salary } ; the joint optimization is given by
$$\mathcal{L}_{\mathrm{total}}(\theta) = \sum_{k=1}^{3} \lambda_k\, \mathbb{E}_{(\tilde{x},y)\sim\tilde{\mathcal{D}}}\Big[\mathcal{L}_k\big(f_\theta^{(k)}(\tilde{x}),\, y^{(k)}\big)\Big], \tag{3}$$
where $\lambda_k$ are task-balancing coefficients.
Each task-specific loss is defined as follows:
$$\mathcal{L}_{\mathrm{occ}} = -\sum_{c\in\mathcal{Y}_{\mathrm{occ}}} y_c \log \sigma\big(W_{\mathrm{occ}} f_\theta(\tilde{x}) + b_{\mathrm{occ}}\big)_c, \tag{4}$$
$$\mathcal{L}_{\mathrm{skill}} = -\sum_{s\in\mathcal{Y}_{\mathrm{skill}}} \Big[\, y_s \log \sigma\big(W_{\mathrm{skill}} f_\theta(\tilde{x})\big)_s + (1 - y_s)\log\Big(1 - \sigma\big(W_{\mathrm{skill}} f_\theta(\tilde{x})\big)_s\Big) \Big], \tag{5}$$
$$\mathcal{L}_{\mathrm{salary}} = \frac{1}{2}\,\big\| W_{\mathrm{sal}} f_\theta(\tilde{x}) - y_{\mathrm{sal}} \big\|_2^2, \tag{6}$$
where $\sigma(\cdot)$ denotes the sigmoid activation. To stabilize multi-task gradients and avoid dominance of high-variance tasks, we employ uncertainty-based adaptive weighting:
$$\lambda_k = \frac{1}{2\sigma_k^2}, \qquad \mathcal{L}_{\mathrm{total}} = \sum_k \left( \frac{1}{2\sigma_k^2}\,\mathcal{L}_k + \log \sigma_k \right). \tag{7}$$
Taken together, the losses in (3)–(7) implement a multi-task learning scheme in which a shared representation is trained to support three related economic tasks (SOC classification, skill extraction, and salary prediction). This encourages the model to capture common structures across tasks (e.g., co-occurrence of occupations and skills) while still allowing each head to specialize through its own task-specific loss.
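The uncertainty-based weighting above is straightforward to implement; the following sketch (assuming a log-variance parameterization for numerical stability, which is our choice and not stated in the text) computes the combined loss:

```python
import math

def uncertainty_weighted_total(task_losses, log_sigmas):
    """Combine task losses as sum_k [ L_k / (2 sigma_k^2) + log sigma_k ],
    with each sigma_k parameterized as exp(log_sigma_k)."""
    total = 0.0
    for loss_k, log_s in zip(task_losses, log_sigmas):
        sigma_sq = math.exp(2.0 * log_s)  # sigma_k^2
        total += loss_k / (2.0 * sigma_sq) + log_s
    return total
```

With all $\sigma_k = 1$ the task weights reduce to $1/2$; a task with larger learned $\sigma_k$ is automatically downweighted, while the $\log\sigma_k$ term penalizes the trivial solution $\sigma_k \to \infty$.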

Overall Objective

The full fine-tuning objective combining domain adaptation, multi-task learning, and information-theoretic constraints is thus
$$\min_{\theta}\;\; \mathcal{L}_{\mathrm{DAPT}} + \alpha_1 \mathcal{L}_{\mathrm{total}} + \alpha_2 \mathcal{L}_{\mathrm{IB}} + \alpha_3 \mathcal{L}_{\mathrm{prox}}, \tag{8}$$
where $\alpha_1, \alpha_2, \alpha_3$ are balancing hyperparameters tuned via Bayesian optimization under privacy-aware validation protocols. In this combined objective, $\mathcal{L}_{\mathrm{DAPT}}$ encourages the backbone LLM to remain well-adapted to the unlabeled labor market corpus; $\mathcal{L}_{\mathrm{total}}$ aggregates the supervised multi-task losses for SOC, skills, and salary; $\mathcal{L}_{\mathrm{IB}}$ acts as an information-bottleneck regularizer that favors compact, task-relevant representations; and $\mathcal{L}_{\mathrm{prox}}$ is a proximal term that stabilizes local updates in the presence of client heterogeneity. The scalar weights thus trade off domain adaptation, supervised accuracy, representation compression, and cross-client stability.

4.3. Federated Learning with Differential Privacy

We formalize cross-institution training as a privacy-constrained distributed optimization problem. At communication round $t \in \{0, \dots, T-1\}$, a central coordinator broadcasts $\theta^{(t)}$ to a random subset $S_t \subseteq \{1, \dots, M\}$ of clients, with independent participation $\mathbb{P}(m \in S_t) = q$. Each participating client $m$ computes a clipped and privatized update on its local dataset $\mathcal{D}^{(m)}$ and contributes only an encrypted summary to the server via secure aggregation.

4.3.1. Local Objective and Clipped Updates

Client m minimizes a local objective
$$\mathcal{L}^{(m)}(\theta) = \mathbb{E}_{(\tilde{x},y)\sim \mathcal{D}^{(m)}}\Big[\mathcal{L}_{\mathrm{total}}\big(f_\theta(\tilde{x}),\, y\big)\Big] + \frac{\mu}{2}\,\big\|\theta - \theta^{(t)}\big\|_2^2, \tag{9}$$
where the proximal term constrains drift under heterogeneity. After $E$ local SGD epochs with stepsize $\eta_{\mathrm{loc}}$, client $m$ forms the tentative model delta
$$\Delta\theta_m^{(t)} = \theta^{(t)} - \theta_{m,\mathrm{final}}^{(t)} = \eta_{\mathrm{loc}} \sum_{e=1}^{E} \frac{1}{B} \sum_{b=1}^{B} g_m^{(t,e,b)}, \tag{10}$$
with per-minibatch gradients $g_m^{(t,e,b)}$. For privacy and robustness we apply $l_2$-clipping at radius $C$:
$$\overline{\Delta\theta}_m^{(t)} = \mathrm{clip}\big(\Delta\theta_m^{(t)},\, C\big) = \Delta\theta_m^{(t)} \cdot \min\left(1,\; \frac{C}{\big\|\Delta\theta_m^{(t)}\big\|_2}\right). \tag{11}$$
In all experiments we set the default clipping radius to $C = 1.0$ (Table 1), and we evaluate the alternatives $C \in \{0.5, 2.0\}$ in the sensitivity analysis. The choice is guided by the standard DP-SGD heuristic of keeping the empirical $l_2$-norms of client updates within a narrow range: we selected $C$ via a small grid search on the validation split so that (i) fewer than roughly half of the updates are clipped, and (ii) the resulting configuration satisfies a target privacy budget under the RDP ledger. We do not employ a more sophisticated optimization procedure beyond this principled empirical tuning.
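The clipping operator above amounts to rescaling an update whenever its $l_2$-norm exceeds $C$; a minimal sketch:

```python
import math

def clip_update(delta, C=1.0):
    """l2-clip an update vector to radius C: delta * min(1, C / ||delta||_2)."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]
```

An update of norm 5 is shrunk onto the radius-$C$ ball, while updates already inside the ball pass through unchanged.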

4.3.2. Client-Side Gaussian Mechanism and Secure Aggregation

Each participating client adds calibrated Gaussian noise to its clipped update,
$$\widetilde{\Delta\theta}_m^{(t)} = \overline{\Delta\theta}_m^{(t)} + \xi_m^{(t)}, \qquad \xi_m^{(t)} \sim \mathcal{N}\big(0,\, \sigma^2 C^2 I\big), \tag{12}$$
and sends $\mathrm{Enc}\big(\widetilde{\Delta\theta}_m^{(t)}\big)$ under an additively homomorphic (or mask-based) secure aggregation protocol. The server only learns the aggregate
$$S^{(t)} = \sum_{m \in S_t} \widetilde{\Delta\theta}_m^{(t)}, \tag{13}$$
but not any individual $\widetilde{\Delta\theta}_m^{(t)}$.
The global model update is then
$$\theta^{(t+1)} = \theta^{(t)} - \eta_{\mathrm{glob}} \cdot \frac{1}{|S_t|}\, S^{(t)}. \tag{14}$$
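The clip–noise–aggregate–update sequence above can be simulated in a few lines. The sketch below draws client-side noise in the clear purely for illustration, whereas the deployed protocol encrypts each contribution before aggregation:

```python
import math
import random

def l2_clip(delta, C):
    """Clip an update to l2 radius C."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]

def dp_fedavg_round(theta, client_deltas, C=1.0, sigma=1.0, eta_glob=1.0, rng=None):
    """One DP-FedAvg round: clip each client delta, add N(0, sigma^2 C^2)
    noise per coordinate, sum the noisy updates (what the server would see
    after secure aggregation), and take the averaged global step."""
    rng = rng or random.Random(0)
    agg = [0.0] * len(theta)
    for delta in client_deltas:
        noisy = [d + rng.gauss(0.0, sigma * C) for d in l2_clip(delta, C)]
        agg = [a + n for a, n in zip(agg, noisy)]
    m = len(client_deltas)
    return [t - eta_glob * a / m for t, a in zip(theta, agg)]
```

Setting the noise multiplier to zero reduces this to plain FedAvg on clipped deltas, which is a convenient sanity check.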

4.3.3. Subsampled Rényi DP Accounting

Let $\mathcal{M}_t$ be the per-round randomized mechanism comprising Poisson subsampling with rate $q$, clipping (11), and client-side Gaussian noise (12). For Rényi order $\alpha > 1$, the subsampled Gaussian mechanism admits the RDP parameter
$$\varepsilon_\alpha^{(t)} \le \frac{1}{\alpha - 1} \log\left( 1 + q^2\, \frac{\alpha(\alpha-1)}{2\sigma^2} + O\big(q^3/\sigma^3\big) \right). \tag{15}$$
We note that the bound in (15) follows from standard subsampled Gaussian RDP results: we apply privacy amplification by Poisson subsampling to the client-level Gaussian mechanism and then use the analytical moments accountant for Rényi DP.
By composition, the $T$-round RDP satisfies $\varepsilon_\alpha^{(1:T)} \le \sum_{t=1}^{T} \varepsilon_\alpha^{(t)}$. Converting to $(\varepsilon, \delta)$-DP yields
$$\varepsilon^{(1:T)}(\delta) = \min_{\alpha > 1} \left( \varepsilon_\alpha^{(1:T)} + \frac{\log(1/\delta)}{\alpha - 1} \right). \tag{16}$$
Equations (15) and (16) define a tight privacy ledger over $T$ rounds as a function of $(q, \sigma, C)$.
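The ledger defined by (15) and (16) can be evaluated numerically. The sketch below implements only the leading-order term of (15), dropping the $O(q^3/\sigma^3)$ correction, so it is illustrative rather than a production accountant; deployed systems should use a full RDP accountant:

```python
import math

def rdp_per_round(q, sigma, alpha):
    """Leading-order per-round RDP of the subsampled Gaussian mechanism
    (Eq. (15) without the O(q^3/sigma^3) correction)."""
    return math.log(1.0 + q * q * alpha * (alpha - 1.0)
                    / (2.0 * sigma * sigma)) / (alpha - 1.0)

def eps_after_T_rounds(q, sigma, T, delta, alphas=range(2, 64)):
    """Compose over T rounds and convert RDP to (eps, delta)-DP via Eq. (16),
    minimizing over a grid of integer Renyi orders."""
    return min(T * rdp_per_round(q, sigma, a) + math.log(1.0 / delta) / (a - 1.0)
               for a in alphas)
```

Doubling the noise multiplier visibly tightens the composed $\varepsilon$ at fixed $(q, T, \delta)$, reproducing the qualitative behavior of the full ledger.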
Let $\zeta^2$ denote the bounded variance of clipped local gradients and $\Gamma^2$ the heterogeneity measure $\Gamma^2 = \frac{1}{M}\sum_m \big\|\nabla\mathcal{L}^{(m)}(\theta^\star) - \nabla\mathcal{L}(\theta^\star)\big\|_2^2$ at a stationary point $\theta^\star$. With stepsizes $(\eta_{\mathrm{loc}}, \eta_{\mathrm{glob}})$ chosen to satisfy standard stability conditions, the expected squared gradient norm after $T$ rounds obeys
$$\mathbb{E}\,\big\|\nabla\mathcal{L}(\theta^{(T)})\big\|_2^2 \le O\!\left(\frac{1}{\sqrt{TqM}}\right) + O(\Gamma^2) + O\!\left(\frac{C^2\sigma^2}{qM}\right) + O(\kappa), \tag{17}$$
where the third term captures the DP-induced variance and the last term arises from unbiased compression. (This heteroscedastic formulation treats the task-specific uncertainties $\{\sigma_k^2\}$ as independent; covariance terms between tasks are ignored to keep the objective simple and numerically stable. Explicitly modeling cross-task covariance would require estimating a full uncertainty matrix and is left for future work.) In general, Equations (15) and (16) define a simple privacy ledger that tracks, for each round, the contribution of client subsampling and Gaussian noise to the overall Rényi DP parameters, and then converts the accumulated RDP into an $(\varepsilon, \delta)$-DP guarantee for the entire training procedure. This makes the privacy budget explicit and comparable across different hyperparameter settings.

4.3.4. Per-Layer Clipping and Adaptive Noise

To sharpen the privacy–utility trade-off, we employ per-layer clipping with radii { C l } l = 1 L and layerwise noise multipliers { σ l } l = 1 L :
$$\widetilde{\Delta\theta}_{m,l}^{(t)} = \mathrm{clip}\big(\Delta\theta_{m,l}^{(t)},\, C_l\big) + \mathcal{N}\big(0,\, \sigma_l^2 C_l^2 I\big). \tag{18}$$
We allocate noise by sensitivity-aware budgeting, e.g., $\sigma_l \propto \mathrm{sens}_l$, where $\mathrm{sens}_l$ is an empirical Lipschitz proxy obtained from running gradient norms. The total RDP accumulates additively across layers. We note that although $(C_l, \sigma_l)$ may differ across layers, the privacy accounting in Section 5.1 is performed on the concatenated parameter vector, so the final $(\varepsilon, \delta)$ guarantee applies uniformly to the entire model. Per-layer allocation only redistributes the contribution of each layer to the total RDP; it does not induce different formal privacy levels for different components of $\theta$.
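A minimal sketch of the sensitivity-aware allocation (the normalization so that the mean multiplier equals a base value is our illustrative choice, not specified in the text):

```python
def allocate_layer_noise(sensitivities, sigma_base=1.0):
    """Layerwise noise multipliers sigma_l proportional to an empirical
    sensitivity proxy, rescaled so their mean equals sigma_base."""
    mean_sens = sum(sensitivities) / len(sensitivities)
    return [sigma_base * s / mean_sens for s in sensitivities]
```

Layers with larger empirical gradient norms thus receive proportionally more noise, while the average noise level, and hence the aggregate accounting, stays comparable to the uniform scheme.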

4.3.5. Secure Aggregation Threat Model

We assume an honest-but-curious coordinator that observes only $S^{(t)}$ in (13). Against a coalition $\mathcal{A}$ of up to $\rho M$ colluding clients (with $\rho < 1/2$), the protocol reveals at most the noisy sum of the remaining $S_t \setminus \mathcal{A}$ contributions. Since each client already privatizes updates via (12), privacy is retained even if secure aggregation fails open; the latter primarily protects confidentiality against inference by small coalitions and provides robustness to dropouts. (This analysis is orthogonal to the differential privacy guarantees, which hold even if the secure aggregation layer is compromised.)

4.3.6. Participation Randomness and Amplification

Random participation further amplifies privacy. Under Poisson subsampling with rate $q$, the effective per-round sensitivity scales as $qC$, and the RDP curve (15) tightens with smaller $q$. In practice we tune $(q, \sigma)$ to satisfy a target $\varepsilon^{(1:T)}(\delta)$ in (16) while keeping the variance term in (17) below a user-specified utility threshold.
End-to-End Budget Management
Let $\varepsilon_{\mathrm{DAPT}}$ be the budget used for unsupervised adaptation with local unlabeled corpora and $\varepsilon_{\mathrm{FT}}$ the budget for supervised federated fine-tuning. The total ledger enforces
$$\varepsilon_{\mathrm{tot}}(\delta) \le \varepsilon_{\mathrm{DAPT}}(\delta/2) + \varepsilon_{\mathrm{FT}}(\delta/2), \tag{19}$$
with each component computed via (16) using its own $(T, q, \sigma)$. (Here we conservatively model domain-adaptive pretraining and supervised fine-tuning as sequential mechanisms acting on (potentially overlapping) populations, and thus apply standard sequential composition, which yields an additive bound on $\varepsilon_{\mathrm{tot}}$. If the underlying cohorts were disjoint, a more favorable parallel composition could be used instead; our choice covers the general case where some individuals contribute data to both stages.)

5. Theoretical Analysis

This section provides a formal analysis of our training and inference pipeline, encompassing (i) privacy guarantees under client-side Gaussian mechanisms with subsampling and secure aggregation; (ii) optimization convergence under data heterogeneity, compression, and differential privacy (DP) noise; (iii) generalization via algorithmic stability with explicit dependence on the privacy parameters; (iv) uncertainty propagation to downstream trend forecasting. Throughout, we assume losses are Lipschitz and bounded unless otherwise specified.
Let $\mathcal{A}$ denote the (randomized) federated training algorithm that maps a family of local datasets $\{\mathcal{D}^{(m)}\}_{m=1}^{M}$ to a model $\hat{\theta}$ after $T$ communication rounds. The per-round participating set is $S_t$ with Poisson subsampling rate $q$, client clipping radius $C$, and client-side Gaussian noise multiplier $\sigma$. We use $\varepsilon(\delta)$ for $(\varepsilon, \delta)$-DP and $\varepsilon_\alpha$ for Rényi DP (RDP) at order $\alpha > 1$. Gradients are clipped in $l_2$-norm, and stochastic updates may be compressed by an unbiased operator $Q$ with variance parameter $\kappa$.

5.1. Privacy Guarantees Under Subsampled Client-Side Gaussian DP

Lemma 1 
(Per-round RDP of subsampled Gaussian mechanism). Consider a single training round comprised of Poisson subsampling with rate $q$ over clients, $l_2$-clipping at radius $C$, and independent Gaussian noise $\mathcal{N}(0, \sigma^2 C^2 I)$ added to each participating client update before secure aggregation. Then for any order $\alpha > 1$, the mechanism admits the RDP parameter
$$\varepsilon_\alpha^{(1)} \le \frac{1}{\alpha - 1} \log\left( 1 + q^2\, \frac{\alpha(\alpha-1)}{2\sigma^2} + O\big(q^3/\sigma^3\big) \right). \tag{20}$$
Proof Sketch. 
Apply privacy amplification by subsampling to the client-level Gaussian mechanism and use the moments accountant/RDP composition of additive Gaussian noise. The $O(q^3/\sigma^3)$ term follows from higher-order terms in the subsampled RDP expansion. □
Theorem 1 
(Composed privacy over T rounds). Let $\varepsilon_\alpha^{(t)}$ denote the RDP parameter of round $t$. Then $\varepsilon_\alpha^{(1:T)} \le \sum_{t=1}^{T} \varepsilon_\alpha^{(t)}$. For any $\delta \in (0, 1)$, the composed mechanism is $(\varepsilon, \delta)$-DP with
$$\varepsilon(\delta) = \min_{\alpha > 1} \left( \varepsilon_\alpha^{(1:T)} + \frac{\log(1/\delta)}{\alpha - 1} \right). \tag{21}$$
In our instantiation, $l(\theta; z)$ corresponds to (i) cross-entropy losses for occupation and skill prediction with logit clipping and label smoothing, and (ii) a squared loss for salary regression applied to normalized targets. Together with gradient clipping, these choices ensure that $l(\theta; z) \in [0, 1]$ after rescaling and that $l$ is $L$-Lipschitz in $\theta$, so the conditions of Theorem 3 hold for all three tasks.
Corollary 1 
(Layerwise budgeting). If each layer $l$ uses clipping radius $C_l$ and noise multiplier $\sigma_l$, then the per-round RDP decomposes additively across layers and across rounds. Consequently, for any partition $\{I_j\}_j$ of layers, one may allocate privacy budgets $\{\varepsilon_\alpha^{(t,j)}\}$ independently and sum them to obtain the global ledger.
Remark 1 
(Security of aggregation vs. DP). Secure aggregation ensures the server observes only the sum of (already privatized) updates, protecting confidentiality against inference on individuals or small coalitions. However, DP holds even if secure aggregation fails open; hence, DP is the primary guarantee, while secure aggregation strengthens the threat model.

5.2. Optimization Under Heterogeneity, Compression, and DP Noise

We study the convergence of the proximal federated update with client heterogeneity, unbiased compression, and additive Gaussian noise. Let $\mathcal{L}(\theta)$ be the population objective and assume $L$-smoothness.
Assumption 1 
(Bounded stochasticity and heterogeneity). There exist $\zeta^2, \Gamma^2 < \infty$ such that (i) the variance of clipped stochastic gradients is bounded by $\zeta^2$, and (ii) the heterogeneity measure at a stationary point $\theta^\star$ satisfies $\Gamma^2 = \frac{1}{M}\sum_{m=1}^{M} \big\|\nabla\mathcal{L}^{(m)}(\theta^\star) - \nabla\mathcal{L}(\theta^\star)\big\|_2^2$.
Assumption 2 
(Unbiased compression). The compressor $Q$ is unbiased, $\mathbb{E}[Q(v)] = v$, with $\mathbb{E}\big[\|Q(v) - v\|_2^2\big] \le \kappa \|v\|_2^2$ for some $\kappa \in [0, 1)$, and employs error feedback.
Theorem 2 
(Non-asymptotic convergence with DP and compression). Let each round sample clients with rate $q$, perform $E$ local steps with stepsize $\eta_{\mathrm{loc}}$, and update globally with stepsize $\eta_{\mathrm{glob}}$. Under Assumptions 1 and 2 and standard stability conditions on $(\eta_{\mathrm{loc}}, \eta_{\mathrm{glob}})$, after $T$ rounds the expected stationarity measure obeys
$$\mathbb{E}\,\big\|\nabla\mathcal{L}(\theta^{(T)})\big\|_2^2 \le O\!\left(\frac{1}{\sqrt{TqM}}\right) + O(\Gamma^2) + O\!\left(\frac{C^2\sigma^2}{qM}\right) + O(\kappa). \tag{22}$$
Proof Sketch. 
Adapt a variance-reduced analysis for federated proximal SGD, bounding drift by the proximal term and controlling the additional variance from (i) DP noise ($C^2\sigma^2$) and (ii) compression ($\kappa$) via an error-feedback recursion. Subsampling contributes the $1/\sqrt{TqM}$ averaging factor. □
Proposition 1 
(Trade-off surface). For a target stationarity tolerance $\epsilon_{\mathrm{opt}}$, feasibility requires $(C^2\sigma^2)/(qM) + \kappa + \Gamma^2 \lesssim \epsilon_{\mathrm{opt}}$. Given a privacy target $\varepsilon(\delta)$ via Theorem 1, one can invert the RDP ledger to obtain admissible $(q, \sigma, T)$ triples that lie on a Pareto surface balancing convergence and privacy.

5.3. Generalization via Stability and Differential Privacy

Let the loss $l(\theta; z) \in [0, 1]$ be bounded and $L$-Lipschitz in $\theta$ for each example $z$. We leverage the well-known connection that DP implies algorithmic stability.
Definition 1 
(Uniform stability). An algorithm $\mathcal{A}$ is $\gamma$-uniformly stable if, for any neighboring datasets $S, S'$ differing in one example and any $z$, $\big| \mathbb{E}[\,l(\mathcal{A}(S); z)\,] - \mathbb{E}[\,l(\mathcal{A}(S'); z)\,] \big| \le \gamma$.
Lemma 2 
(DP ⇒ stability). If $\mathcal{A}$ is $(\varepsilon, \delta)$-DP, then it is $\gamma$-uniformly stable with $\gamma \le e^{\varepsilon} - 1 + \delta$. For $\varepsilon \le 1$, $\gamma \lesssim \varepsilon + \delta$.
Theorem 3 
(Generalization bound). Let $\hat{\theta} = \mathcal{A}(S)$ be the output of the DP federated algorithm on a sample $S$ of size $n_{\mathrm{eff}}$ (effective sample over clients and rounds). Then with probability at least $1 - \beta$,
$$\left| \mathbb{E}_z\big[\,l(\hat{\theta}; z)\,\big] - \frac{1}{|S|}\sum_{z \in S} l(\hat{\theta}; z) \right| \le \gamma + \sqrt{\frac{\log(2/\beta)}{2\, n_{\mathrm{eff}}}}, \qquad \gamma = e^{\varepsilon} - 1 + \delta. \tag{23}$$
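The bound in Theorem 3 is directly computable from $(\varepsilon, \delta, n_{\mathrm{eff}}, \beta)$; a small helper (the numerical inputs below are illustrative, not values from our experiments):

```python
import math

def generalization_gap_bound(eps, delta, n_eff, beta=0.05):
    """High-probability generalization gap from Theorem 3:
    gamma + sqrt(log(2/beta) / (2 n_eff)), with gamma = e^eps - 1 + delta."""
    gamma = math.exp(eps) - 1.0 + delta
    return gamma + math.sqrt(math.log(2.0 / beta) / (2.0 * n_eff))
```

The stability term $\gamma$ dominates for moderate $\varepsilon$, which is why tightening the privacy budget simultaneously tightens the generalization guarantee (Corollary 2).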
The bound in Theorem 2 separates the contributions of optimization stochasticity, client heterogeneity, differential-privacy noise, and (optional) update compression. Under the smoothness and bounded-variance assumptions stated above, the rate matches standard nonconvex DP-SGD bounds up to constants when heterogeneity and compression are negligible, and degrades gracefully as the heterogeneity parameter and compression variance increase. In highly non-IID regimes the bound is likely conservative, but it captures the empirical trends observed in our experiments: increasing the number of participating clients $M$ and rounds $T$ mitigates the effect of DP noise, whereas strong client drift and aggressive compression slow convergence.
Corollary 2 
(Privacy–generalization coupling). For fixed $n_{\mathrm{eff}}$, tightening $\varepsilon(\delta)$ via larger $\sigma$ or smaller $q$ increases optimization noise (Theorem 2) but improves stability $\gamma$, exposing an explicit privacy–utility–generalization triad. The optimal operating point depends on task curvature, heterogeneity $\Gamma^2$, and forecasting requirements downstream.

5.4. Uncertainty Propagation to Forecasts Under DP Noise

Let $\hat{\theta}$ be the privatized model used to produce time-series intensities $\hat{\lambda}_{o,t}$. Write the perturbation decomposition $\hat{\theta} = \theta + \delta_{\mathrm{stat}} + \delta_{\mathrm{DP}}$, where $\delta_{\mathrm{stat}}$ is the sampling error and $\delta_{\mathrm{DP}}$ arises from injected noise.
Proposition 2 
(Linearized predictive variance inflation). Under a first-order delta approximation of the predictor map $\varphi: \theta \mapsto \log \lambda_t$, the $h$-step predictive covariance obeys the discrete Lyapunov recursion
$$\Sigma_{t+h \mid t} = \sum_{j=0}^{h-1} \Phi^j \big( \Sigma_\varepsilon + J_\varphi \Sigma_{\mathrm{DP}} J_\varphi^\top \big) (\Phi^\top)^j, \qquad \Phi = \sum_{l=1}^{L} A_l, \tag{24}$$
where $J_\varphi$ is the Jacobian of $\varphi$ at $\theta$ and $\Sigma_{\mathrm{DP}}$ is the parameter-space covariance induced by the DP mechanism. Thus, privacy induces an additive, horizon-dependent variance inflation in forecast space.
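In the scalar special case (a single autoregressive coefficient, with the Jacobian absorbed into the DP inflation term), the recursion above reduces to a geometric sum, which the following sketch evaluates (an illustrative simplification of our choosing, not the full matrix recursion):

```python
def scalar_predictive_variance(h, phi, sigma_eps_sq, dp_inflation):
    """Scalar instance of the predictive-variance recursion:
    Sigma_{t+h|t} = sum_{j=0}^{h-1} phi^(2j) * (sigma_eps^2 + dp_inflation)."""
    return sum(phi ** (2 * j) * (sigma_eps_sq + dp_inflation) for j in range(h))
```

The DP inflation enters additively at every horizon step, matching the "additive, horizon-dependent variance inflation" characterized above.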

5.5. Complexity and Communication

Let $P = |\Theta|$ be the number of model parameters. With unbiased compression ratio $r \in (0, 1]$ (i.e., transmitting $rP$ coordinates in expectation) and participation rate $q$, the expected per-round uplink cost is
$$\mathrm{Comm} = qM \cdot rP \cdot b \ \text{bits}, \tag{25}$$
for $b$-bit quantization (e.g., stochastic $b$-bit rounding). Error feedback ensures no first-order bias, contributing only the $O(\kappa)$ term in Theorem 2. Computation scales as $O(E \cdot |\mathcal{D}^{(m)}| \cdot P)$ per participating client per round.
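For intuition about magnitudes, (25) is a one-line computation (the client count, participation rate, and 16-bit width below are illustrative):

```python
def uplink_cost_bits(q, M, r, P, b):
    """Expected per-round uplink cost Comm = q*M * r*P * b bits (Eq. (25))."""
    return q * M * r * P * b

# e.g., 10 clients at 60% participation sending uncompressed (r = 1)
# 220M-parameter deltas at 16-bit precision:
cost_gb = uplink_cost_bits(0.6, 10, 1.0, 220e6, 16) / 8 / 1e9  # bits -> GB
```

This comes to roughly 2.6 GB of uplink per round, underscoring why compression ratios $r < 1$ matter at scale.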
In general, Lemmas 1 and 2 and Theorems 1–3 jointly establish that our federated protocol attains (i) auditable client-level ( ε , δ ) -DP; (ii) provable convergence rates that degrade gracefully with DP noise, heterogeneity, and compression; (iii) distribution-level generalization guarantees through DP-induced stability. Proposition 2 further shows how privacy noise propagates to downstream forecast uncertainty in a controllable, explicitly characterizable manner. Collectively, these results provide an interpretable blueprint for selecting ( q , σ , T , C , r ) to meet prespecified privacy and accuracy targets under realistic cross-institution constraints.
In our implementation we transmit full-precision model deltas (with mixed-precision training on-device) and do not apply additional lossy quantization beyond standard numerical precision, so the analytical communication cost in (25) slightly overestimates the savings that could be obtained under aggressive gradient compression. Integrating quantization-aware DP mechanisms that jointly tune $(r, b, \sigma)$ to reduce bandwidth at fixed privacy and utility is a promising direction for future deployments, and is supported by the general form of Theorem 2 through the $O(\kappa)$ term.

6. Experiments

We conducted comprehensive experiments to evaluate the effectiveness, scalability, and privacy-preserving capability of our LLM-powered labor market analysis framework. We assessed performance across multiple tasks: occupation classification, skill extraction, and salary range estimation. Additionally, we evaluated our federated learning protocol under privacy constraints and analyzed system scalability across distributed nodes.
We organized the experiments around three main questions: (Q1) How does our Fed + DP configuration compare to centralized and non-DP baselines across the three tasks? (Q2) How do privacy and optimization hyperparameters affect the privacy–utility trade-off? (Q3) How does performance vary across head and tail segments of the labor market label space?

6.1. Settings

We designed the experimental setting to stress three axes of performance: (i) predictive utility on occupation classification, skill extraction, and salary estimation; (ii) end-to-end privacy under client-side DP with subsampling and secure aggregation; (iii) scalability under cross-institution heterogeneity. Unless stated otherwise, all experiments were repeated with three random seeds, with results reported as mean ± std.

6.1.1. Datasets

Sources and Coverage
We integrated four complementary corpora with distinct statistical profiles and governance constraints: (1) O*NET/SOC curated descriptions (structured, taxonomy-aligned); (2) Indeed postings (high-volume, noisy, rapidly drifting); (3) LinkedIn postings (moderate volume, richer metadata); (4) OpenSkills span annotations (fine-grained skill entities). Each source was deduplicated, anonymized, and normalized into a canonical schema with timestamps, regions, sectors, and ontology links (ISCO-08/SOC/ESCO). Please see Table 2 and Figure 2 for a dataset summary.
We used a time-based split to respect temporal causality: train up to June 2023, validation July 2023–September 2023, and test October 2023–June 2024. Skill-span evaluation uses stratified sampling by sector to reduce distributional mismatch. All reported metrics are for the test window.
Tokenization uses SentencePiece (32k vocab) for T5 and the native tokenizer for LLaMA-2-7B. Entity spans (skills, salaries) are pre-labeled via distant supervision and manually verified on 5k instances. All PII is replaced in-place with pseudo-tokens prior to any learning step.

6.1.2. Baselines

We compared our model against (i) TF-IDF + LR for classification; (ii) BERT-Base fine-tuned per task; (iii) JobBERT (labor-pretrained); (iv) non-private Fed-BERT (FedAvg without DP). All baselines use identical splits and tokenization consistent with their architectures, as shown in Table 3.
TF-IDF uses word uni- and bigrams (100k features). BERT models use AdamW (lr $= 2 \times 10^{-5}$, batch 32) with a maximum sequence length of 256. Early stopping was based on validation F1 (occupation)/span-F1 (skill)/RMSE (salary). See Figure 3 for training set details.

6.1.3. Configurations

Backbones and Multi-Task Heads
We evaluate T5-base (220 M) and LLaMA-2-7B. Occupation uses a softmax head over 200 SOC codes; skill extraction uses a span head with a CRF layer; salary uses a regression head predicting the midpoint and log-range. Unless otherwise noted, the following parameter settings were used: AdamW, lr $= 3 \times 10^{-5}$ (linear warmup 5%), batch 64, gradient clip 1.0, label smoothing 0.1 (occupation), dropout 0.1. Mixed precision was enabled.
Federated Setup and Privacy
We simulate $M = 10$ clients with heterogeneous sizes and sector mixes (Dirichlet $\alpha = 0.5$ over sector proportions). This stylized configuration mimics a cross-silo deployment with a small number of institutional partners (e.g., ministries, job boards, large firms) exhibiting sectoral imbalance, and yields client label marginals similar to the empirical skew observed in our data. We found that increasing $M$ (with the same Dirichlet prior) preserves the qualitative trends in the privacy–utility trade-off at the cost of higher communication overhead. Each round samples clients with rate $q = 0.6$, local epochs $E = 5$, total rounds $T = 100$. Client-side DP uses clipping $C = 1.0$ and Gaussian noise multiplier $\sigma = 1.0$ unless stated, with the moments accountant at $\delta = 10^{-5}$. Figure 4 presents a visualization of heterogeneity across clients.

6.1.4. Evaluation Metrics

Occupation: Micro/macro-F1 over 200 SOC codes. Skills: span-level precision/recall/F1 with exact boundary match. Salary: MAE and RMSE at the midpoint of posted/estimated ranges, reported in USD after inverting the log transform used during training. We trained the regression head on log-scaled midpoints (and log-range) to temper heavy-tailed effects, so that errors are approximately constant in relative terms across wage brackets while remaining interpretable as absolute differences once mapped back to the original scale. We report $(\varepsilon, \delta)$ with $\delta = 10^{-5}$ via the RDP/moments accountant under Poisson subsampling rate $q$, clipping $C$, noise $\sigma$, and rounds $T$. We measured training time per round and P95 inference latency per request. For fairness, latencies exclude network queuing and use the same GPU class. All metrics used are summarized in Table 4.
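The inversion from log-space salary predictions back to USD errors can be sketched as follows (the function name and toy values are ours, for illustration only):

```python
import math

def salary_mae_usd(pred_log_mid, true_log_mid):
    """MAE in USD: exponentiate log-scale midpoint predictions and targets,
    then average the absolute differences on the original dollar scale."""
    errs = [abs(math.exp(p) - math.exp(t))
            for p, t in zip(pred_log_mid, true_log_mid)]
    return sum(errs) / len(errs)
```

A fixed error in log space corresponds to a roughly constant *relative* USD error, so the same model quality yields larger absolute errors in higher wage brackets.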

6.2. Results

We report aggregate task performance, slice-based robustness, and privacy–utility behavior. Unless noted, results are averaged over three seeds with 95% CIs from bootstrap ( n = 1000 ) on the test window (October 2023–June 2024). Centralized models use the same preprocessed inputs as federated models; federated results are computed on the global model after the final round (Table 5).
Table 6 extends the core comparison with micro/macro-F1 for occupation, span-F1 for skills, and MAE/RMSE for salary. Our centralized framework attains the best overall performance; the federated DP variant is within 1.5–3.5% (relative) of the centralized model across tasks while satisfying a client-level privacy ledger (cf. Table 7). Improvements over JobBERT are larger on macro-F1, indicating better treatment of long-tail SOC classes.
We analyze robustness by (i) SOC head/tail buckets (top-50 vs. the remaining 150), (ii) sector groups (IT, healthcare, logistics), and (iii) region clusters (top-10 MSAs vs. others). The DP model preserves most of its advantages in head classes and exhibits the largest gap in the rare-class tail, consistent with privacy noise acting as additional regularization. We sweep $(\sigma, q, T)$ to realize target $\varepsilon$ (moments accountant, $\delta = 10^{-5}$) and observe a smooth Pareto front. Even for $\varepsilon \approx 1.0$, occupation F1 remains above 0.86; at $\varepsilon \approx 6$, performance nearly matches that of the centralized model; see Figure 5 and Figure 6.
We further assess calibration (ECE) and residual structure (Durbin–Watson on salary errors). DP slightly increases variance but improves overconfidence in head classes. For representative settings we list the realized $(\varepsilon, \delta)$ alongside task metrics. Tighter privacy (smaller $\varepsilon$) slightly reduces tail-class macro-F1 but keeps micro-F1 and salary MAE competitive.

6.3. Comparison with Prior Work

Table 8 and Table 9 position our framework against prior centralized models on all three tasks. As shown in Table 8, traditional TF-IDF + LR and BERT-Base baselines lag behind the labor-pretrained JobBERT model, which attains an occupation micro/macro-F1 of 0.84 / 0.75 , skill F1 of 0.75 , and salary MAE of USD 5010. Our centralized LLM framework further improves these metrics to 0.90 / 0.83 and 0.83 with a reduced MAE of USD 3890, while the Fed+DP variant remains competitive at 0.88 / 0.81 and 0.81 with an MAE of USD 4160. Table 9 quantifies these gains relative to JobBERT, showing absolute increases of + 0.06 and + 0.08 in occupation micro/macro-F1 and + 0.08 in skill F1 for the centralized model (roughly 7.1 % relative improvement in micro-F1) alongside a 22.4 % reduction in salary MAE; the Fed+DP model still delivers 4.8 % higher micro-F1 and a 17.0 % lower MAE. Together, these comparisons indicate that our approach not only surpasses prior centralized state-of-the-art methods but also retains most of the accuracy when trained in a privacy-preserving federated setting.

6.4. Ablation Study

We dissect the contribution of architectural and training choices along three axes: (i) representation learning (DAPT, prompts, CRF), (ii) multi-task coupling (loss weights, gradient orthogonalization), (iii) privacy/federation mechanisms (clipping C, noise σ , client rate q, secure aggregation). All ablations follow the settings in Section 6.1 and were evaluated on the time-held-out test window. Unless stated otherwise, the presented values are means over three seeds with 95% CIs (bootstrap, n = 1000 ).
Table 10 quantifies the effect of removing one component at a time from the full Fed + DP model. Domain-adaptive pretraining (DAPT) and multi-task coupling are the largest contributors across tasks; gradient orthogonalization primarily improves macro-F1 by stabilizing tail classes. The CRF decoder benefits span-level skill extraction but has negligible impact on occupation classification.
We swept $(C, \sigma, q)$ while holding rounds $T = 100$ to study the DP–utility surface. Table 11 reports the realized $(\varepsilon, \delta = 10^{-5})$ and task metrics, and thus provides an empirical approximation of the privacy–utility surface induced by our ledger over a discrete grid of $(C, \sigma, q)$ settings. Deriving closed-form optimality conditions for the choice of $(C, \sigma, q, T)$ would require additional structural assumptions on the loss landscape and client heterogeneity, and is left for future work. Larger noise $\sigma$ or smaller clipping $C$ strengthens privacy but increases error; higher participation $q$ improves utility at the cost of a larger $\varepsilon$.
Figure 7 shows a waterfall-style decomposition of occupation micro-F1 starting from a non-private centralized baseline to the fully private federated system, adding or removing one mechanism at a time. The most significant deltas correspond to DAPT and multi-task coupling; DP noise introduces a smaller, monotone drop. Table 12 depicts an interpolated surface of occupation micro-F1 over $(\sigma, q)$ at fixed $C = 1.0$ and $T = 100$. Utility increases with $q$ and decreases with $\sigma$; contour lines indicate constant $\varepsilon$ levels from the privacy ledger, illustrating feasible operating regimes.

6.5. Discussion

From both the convergence analysis in Theorem 2 and the empirical results, we can infer how the proposed approach scales to a larger number of clients $M$ and more diverse datasets. Increasing $M$ at a fixed client sampling rate $q$ reduces variance in the aggregated updates and improves robustness to DP noise, but it may also amplify heterogeneity effects when new clients differ strongly from the existing population. Similarly, incorporating additional sectors or countries increases the coverage and policy relevance of the resulting indicators, but can make optimization more challenging if label distributions become highly unbalanced. These trade-offs suggest that future large-scale deployments should combine the present framework with client clustering or personalized heads to better accommodate strong cross-client diversity.

6.6. Limitations

While our results suggest that an LLM-based, federated, and DP-protected framework can deliver accurate and scalable labor market intelligence, several limitations remain. First, our empirical evaluation focuses on English-language data from a limited set of platforms and taxonomies (O*NET, SOC, ESCO), so generalization to low-resource languages, informal sectors, or alternative occupational systems is not guaranteed. Second, the federated experiments are conducted in a controlled simulation with ten cross-silo clients and an honest-but-curious coordinator, which does not capture all operational challenges of real deployments, such as highly unbalanced institutions, client churn, adversarial behavior, or stricter legal constraints. Third, our theoretical analysis relies on standard assumptions (bounded variance, Lipschitz losses, unbiased compression) and moderate model sizes; more extreme non-IID regimes, ultra-large models, or aggressive compression schemes may violate these conditions and lead to different privacy–utility trade-offs. Fourth, although we quantify DP-induced utility loss at the task level, we do not systematically study fairness or distributional impacts across demographic groups, nor do we fully characterize how PII removal and anonymization affect downstream outcomes. Finally, the framework currently targets three supervised tasks (occupation, skills, salary) and does not incorporate human-in-the-loop feedback, causal inference, or richer behavioral data; extending the system in these directions, while preserving rigorous privacy guarantees and governance, is an important avenue for future work.

7. Conclusions and Future Work

This work introduced a comprehensive, privacy-preserving, and scalable framework for labor market analysis powered by large language models (LLMs). By integrating domain-adaptive pretraining, federated learning, and differential privacy into a unified analytical pipeline, our system enables the extraction of structured, interpretable, and timely labor insights from heterogeneous textual data sources such as job postings, resumes, and workforce reports. The proposed framework addresses the dual challenge of high analytical precision and stringent privacy protection, offering a practical solution for institutions constrained by data-sharing restrictions or regulatory compliance requirements.
Through extensive experiments across multiple real-world datasets—including O*NET, SOC, and millions of contemporary job postings—we demonstrated that our framework achieves superior performance in occupation classification, skill extraction, and salary estimation tasks, outperforming both traditional and transformer-based baselines. The federated variant, equipped with differential privacy mechanisms, achieves nearly comparable performance to centralized models, establishing that strong privacy guarantees need not come at the cost of analytical utility. Moreover, our scalability analysis confirmed the system’s suitability for distributed deployments across organizations or geographic regions, with minimal latency and near-linear efficiency under realistic federated configurations.
Beyond quantitative improvements, the qualitative evaluations revealed that the LLM-generated labor trend summaries provide meaningful, policy-relevant insights. Expert reviewers—comprising labor economists and HR professionals—rated these summaries highly in factual accuracy, interpretability, and relevance, attesting to the framework’s potential for supporting evidence-based decision making in workforce development, talent management, and macroeconomic forecasting. The system’s interpretability and modular design also enable transparent policy communication and reproducibility in data-driven labor analytics.
From a broader perspective, this research bridges advances in privacy-preserving machine learning and applied labor economics. It highlights how decentralized and secure computation paradigms can empower multi-institutional collaborations without compromising individual or organizational data sovereignty. Future directions include incorporating secure multiparty computation (SMPC) for cryptographic aggregation, developing dynamic ontology alignment between evolving occupational taxonomies, and integrating reinforcement learning from human feedback (RLHF) to refine model interpretability and ethical alignment. We envision this line of work as a foundation for the next generation of socially responsible, AI-driven labor intelligence systems that balance analytic depth, fairness, and privacy preservation at scale.

Author Contributions

Conceptualization, W.J.; Methodology, W.J.; Formal Analysis, W.J.; Resources, Z.Y.; Supervision, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an MOE (Ministry of Education in China) Liberal Arts and Social Sciences Foundation project, titled “Research on the Impact and Response Strategies of Generative Artificial Intelligence on the Chinese Labor Market” (23YJC790049), and Fujian Provincial Social Science Foundation Youth Project: “Research on the Mechanisms and Paths by which Industrial Digital Finance Empowers the Development of Fujian’s Real Economy” (Project No. FJ2024C017).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
  2. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288.
  3. Violante, G.L. Skill-biased technical change. In The New Palgrave Dictionary of Economics; Springer: London, UK, 2018; pp. 12389–12394.
  4. Acemoglu, D.; Restrepo, P. Artificial intelligence, automation, and work. In The Economics of Artificial Intelligence: An Agenda; University of Chicago Press: Chicago, IL, USA, 2018; pp. 197–236.
  5. Hershbein, B.; Macaluso, C.; Yeh, C. Labor market concentration and the demand for skills. In Proceedings of the IDSC of IZA Workshop: Matching Workers and Jobs, Online, 21–22 September 2018; pp. 2–6.
  6. Marinescu, I.; Qiu, Y.; Sojourner, A. Wage Inequality and Labor Rights Violations; Technical Report; National Bureau of Economic Research: Cambridge, MA, USA, 2021.
  7. Zhang, T.; Li, B. Job crafting and turnover intention: The mediating role of work engagement and job satisfaction. Soc. Behav. Personal. Int. J. 2020, 48, 1–9.
  8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
  9. Decorte, J.J.; Van Hautte, J.; Demeester, T.; Develder, C. JobBERT: Understanding job titles through skills. arXiv 2021, arXiv:2109.09605.
  10. Liu, Y.; Liu, H.; Wong, L.P.; Lee, L.K.; Zhang, H.; Hao, T. A hybrid neural network RBERT-C based on pre-trained RoBERTa and CNN for user intent classification. In Proceedings of the Neural Computing for Advanced Applications: First International Conference, NCAA 2020, Shenzhen, China, 3–5 July 2020; Springer: Singapore, 2020; pp. 306–319.
  11. Qiu, Q.; Tian, M.; Xie, Z.; Tan, Y.; Ma, K.; Wang, Q.; Pan, S.; Tao, L. Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach. J. Earth Sci. 2023, 34, 1406–1417.
  12. Peng, W.; Li, W.; Hu, Y. Leader-generator net: Dividing skill and implicitness for conquering FairytaleQA. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 791–801.
  13. Teulings, C.N. The wage distribution in a model of the assignment of skills to jobs. J. Political Econ. 1995, 103, 280–315.
  14. Quan, T.Z.; Raheem, M. Salary prediction in data science field using specialized skills and job benefits—a literature. J. Appl. Technol. Innov. 2022, 6, 70–74.
  15. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
  16. Wu, S.; Irsoy, O.; Lu, S.; Dabravolski, V.; Dredze, M.; Gehrmann, S.; Kambadur, P.; Rosenberg, D.; Mann, G. BloombergGPT: A Large Language Model for Finance. arXiv 2023, arXiv:2303.17564.
  17. Yang, H.; Liu, X.Y.; Wang, C.D. FinGPT: Open-Source Financial Large Language Models. arXiv 2023, arXiv:2306.06031.
  18. Lee, D.K.C.; Guan, C.; Yu, Y.; Ding, Q. A Comprehensive Review of Generative AI in Finance. FinTech 2024, 3, 460–478.
  19. Carriero, A.; Pettenuzzo, D.; Shekhar, S. Macroeconomic Forecasting with Large Language Models. arXiv 2024, arXiv:2407.00890.
  20. Kwon, B.; Park, T.; Perez-Cruz, F.; Rungcharoenkitkul, P. Large Language Models: A Primer for Economists; BIS Quarterly Review; Bank for International Settlements: Basel, Switzerland, 2024.
  21. Korinek, A. Generative AI for Economic Research: Use Cases and Implications for Economists. J. Econ. Lit. 2023, 61, 1281–1317.
  22. Eloundou, T.; Manning, S.; Mishkin, P.; Rock, D. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. arXiv 2023, arXiv:2303.10130.
  23. Chen, Y.; Fang, H.; Zhao, Y.; Zhao, Z. Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch; NBER Working Paper 32327; National Bureau of Economic Research: Cambridge, MA, USA, 2024.
  24. Serino, A. Skills-Hunter: Adapting Large Language Models to the Labour Market for Skills Extraction. In Proceedings of the AIxIA 2023 Doctoral Consortium, Rome, Italy, 6–7 November 2023; CEUR Workshop Proceedings.
  25. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
  26. Pan, Z.; Ying, Z.; Wang, Y.; Zhang, C.; Zhang, W.; Zhou, W.; Zhu, L. Feature-Based Machine Unlearning for Vertical Federated Learning in IoT Networks. IEEE Trans. Mob. Comput. 2025, 24, 5031–5044.
  27. Coelho, K.K.; Nogueira, M.; Vieira, A.B.; Silva, E.F.; Nacif, J.A.M. A survey on federated learning for security and privacy in healthcare applications. Comput. Commun. 2023, 207, 113–127.
  28. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19.
  29. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, 4–7 March 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284.
  30. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318.
Figure 1. General pipeline of our proposal.
Figure 2. Monthly volume by source. Curves: solid circles = Indeed; dashed triangles = LinkedIn; dotted squares = O*NET. The high-frequency, high-volume behavior of Indeed and LinkedIn motivates robust drift handling and federated scaling.
Figure 3. Label skew in SOC. Cumulative mass of the top-k SOC classes on the training split. Long tails motivate macro-F1 reporting and calibrated sampling during fine-tuning.
Figure 4. Heterogeneity across clients. Bars show JS divergence between each client's sector distribution and the global mix. Larger divergence implies stronger non-IID effects, which is relevant for DP–utility trade-offs.

Unless otherwise noted, we select the default hyperparameters in Table 1 via a small grid/random search on the validation split, using occupation micro-F1 as the primary selection criterion under a fixed target privacy budget. We do not employ Bayesian optimization or nested cross-validation for these settings, favouring a simple and easily reproducible search procedure.
Figure 5. Privacy–utility trade-off. Performance on the three tasks as functions of the privacy budget ( ε , δ ) for a centralized non-DP baseline (Centralized), a federated non-DP baseline (Fed, no DP), and our federated DP configuration (Fed + DP). Curves share the same backbone architecture and training budget. The gap between the Fed + DP series and the corresponding non-DP baseline quantifies the cost of differential privacy at the chosen hyperparameter settings.
Figure 6. Reliability diagram (occupation). Curves: thin solid = centralized; thick dashed = Fed + DP; dotted = perfect calibration. DP reduces overconfidence in mid–high bins, narrowing the gap to the diagonal.
Figure 7. Waterfall of contributions. Bars (left→right): centralized baseline; removal of DAPT; removal of multi-task coupling; removal of gradient orthogonalization; removal of CRF; federated without DP; federated with DP. The largest drops arise from removing DAPT and multi-task coupling; DP induces a smaller decrement.
Table 1. Key hyperparameters. Default values and alternatives used in sensitivity analyses (the privacy ledger selects (C, σ, q, T) to meet the target ε at δ = 10^-5). For both the centralized and federated configurations we select hyperparameters using a small grid/random search on a held-out validation split. We vary learning rates, batch sizes, and warmup ratios within ranges recommended for the chosen backbone LLM, and, for Fed + DP, we explore a grid over clipping norms C, noise multipliers σ, client sampling rates q, and numbers of rounds T that satisfy target privacy budgets according to our RDP ledger. We retain the same search space across centralized and federated models to ensure a fair comparison and report the best-performing configuration per setting.

| Hyperparameter | Value | Alt-1 | Alt-2 | Affects | Tuned By |
| --- | --- | --- | --- | --- | --- |
| Learning rate | 3 × 10^-5 | 2 × 10^-5 | 5 × 10^-5 | all tasks | val. |
| Batch size | 64 | 32 | 96 | all tasks | hw cap. |
| Warmup ratio | 0.05 | 0.1 | 0.0 | stability | val. |
| DP clip C | 1.0 | 0.5 | 2.0 | privacy/utility | ledger |
| Noise σ | 1.0 | 0.7 | 1.3 | privacy/utility | ledger |
| Client rate q | 0.6 | 0.4 | 0.8 | speed/privacy | ablation |
| Rounds T | 100 | 60 | 140 | utility/privacy | ablation |
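As a concrete illustration of how the clipping norm C and noise multiplier σ from Table 1 interact, the following minimal sketch (our own, not the paper's implementation; the client updates and shapes are invented) applies per-client L2 clipping and adds Gaussian noise calibrated to σC before averaging, in the style of DP-FedAvg:

```python
import numpy as np

def dp_fedavg_round(client_updates, C=1.0, sigma=1.0, rng=None):
    """One aggregation round with per-client clipping and Gaussian noise.

    client_updates: list of 1-D numpy arrays (model deltas from sampled clients).
    C: clipping norm; the added noise has standard deviation sigma * C.
    Returns the noised average update.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        # Scale down any update whose L2 norm exceeds C.
        clipped.append(u * min(1.0, C / max(norm, 1e-12)))
    stacked = np.stack(clipped)
    # Noise is calibrated to the clipping norm, then the noised sum is averaged.
    noise = rng.normal(0.0, sigma * C, size=stacked.shape[1])
    return (stacked.sum(axis=0) + noise) / len(clipped)

# Toy example with the Table 1 defaults C = 1.0, sigma = 1.0.
updates = [np.array([3.0, 4.0]), np.array([0.1, -0.2]), np.array([0.5, 0.5])]
avg = dp_fedavg_round(updates, C=1.0, sigma=1.0)
```

Because clipping bounds each client's contribution to the sum by C in L2 norm, the noise scale σC is what a ledger such as the moments accountant [30] converts into a realized (ε, δ) guarantee over T rounds.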
Table 2. Dataset summary. Document counts, average length (post-tokenization), temporal coverage, and label availability. O*NET/SOC provides clean taxonomy alignment; Indeed/LinkedIn contribute high-volume, fast-drifting signals; OpenSkills supplies span-level supervision.

| Corpus | #Docs | Avg Tokens | Time Span | #Regions | Label Coverage |
| --- | --- | --- | --- | --- | --- |
| O*NET/SOC | 18,000 | 268 | 2019–2024 | 50 states | occ, skills |
| Indeed | 1,200,000 | 356 | 2021–2024 | 50 states | occ, skills, salary |
| LinkedIn | 420,000 | 331 | 2021–2024 | 50 states | occ, skills, salary |
| OpenSkills | 240,000 | 142 | 2018–2023 | n/a | skill spans |
| Total | 1,878,000 | — | 2018–2024 | — | multi-task |
Table 3. Baseline configurations. Steps reflect the same time-based split; Fed-BERT uses E = 5 local epochs per round.

| Baseline | Params | Max Len | Train Steps | Init |
| --- | --- | --- | --- | --- |
| TF-IDF + LR | — | 512 (trunc.) | 50 epochs | TF-IDF |
| BERT-Base | 110 M | 256 | 80k | bert-base-uncased |
| JobBERT | 110 M | 256 | 80k | labor-pretrained |
| Non-Private Fed-BERT | 110 M | 256 | T = 100 rounds | bert-base-uncased |
Table 4. Task-specific metrics. Primary metrics reported in main tables.

| Metric | Occupation | Skill | Salary |
| --- | --- | --- | --- |
| Primary | micro/macro F1 | span F1 | MAE |
| Secondary | top-1/top-5 acc. | span P/R | RMSE |
| Calibration | ECE (%) | span-length bias | residual std. |
Table 5. Slice analysis. Federated DP is closest to centralized in head classes and dense regions; the largest gaps appear in tail classes and sparse regions. Salary: MAE and RMSE at the midpoint of posted/estimated ranges, reported in USD after inverting the log transform used during training. We train the regression head on log-scaled midpoints (and log-range) to temper heavy-tailed effects, so that errors are approximately constant in relative terms across wage brackets while remaining interpretable as absolute differences once mapped back to the original scale.

| Slice | Occ. F1 (Central) | Occ. F1 (Fed + DP) | Skill F1 (Central) | Skill F1 (Fed + DP) | Salary MAE (Central) | Salary MAE (Fed + DP) |
| --- | --- | --- | --- | --- | --- | --- |
| Head SOC (top-50) | 0.93 | 0.92 | 0.85 | 0.84 | 3540 | 3680 |
| Tail SOC (others) | 0.85 | 0.82 | 0.80 | 0.78 | 4410 | 4760 |
| IT sector | 0.91 | 0.89 | 0.84 | 0.82 | 3720 | 3920 |
| Healthcare | 0.90 | 0.88 | 0.83 | 0.81 | 3880 | 4120 |
| Logistics | 0.88 | 0.86 | 0.81 | 0.79 | 3960 | 4210 |
| Top-10 MSAs | 0.91 | 0.90 | 0.84 | 0.82 | 3610 | 3820 |
| Non-top MSAs | 0.88 | 0.86 | 0.81 | 0.79 | 4170 | 4470 |
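The log-scale salary target described in the Table 5 caption can be sketched as follows. This is a minimal illustration under our own assumptions: the specific log1p/expm1 pair and the salary ranges are ours, since the paper does not pin down the exact variant of the transform.

```python
import numpy as np

def to_target(salary_low, salary_high):
    """Log-scaled midpoint target; the log tempers the heavy right tail."""
    midpoint = (np.asarray(salary_low, float) + np.asarray(salary_high, float)) / 2.0
    return np.log1p(midpoint)

def usd_mae(pred_log, true_log):
    """Invert the log transform, then report MAE in USD on the original scale."""
    return float(np.mean(np.abs(np.expm1(pred_log) - np.expm1(true_log))))

# Invented ranges: midpoints 60k and 90k USD; predictions off by ~5% in log space,
# i.e., roughly constant *relative* error across wage brackets.
true = to_target([50_000, 80_000], [70_000, 100_000])
pred = true + np.array([0.05, -0.05])
mae = usd_mae(pred, true)
```

A fixed error in log space maps to a proportional error in USD, which is why the table's MAE values grow with the wage level of a slice while relative accuracy stays comparable.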
Table 6. Main results across tasks. Ours (Centralized) trains the proposed model on pooled data on a single server (no FL). Fed + DP trains the same model via cross-silo federated learning with a central coordinator and client-level DP on updates. All configurations share the same backbone architecture; differences arise only from the training regime and the presence of DP noise.

| Model | Occ. F1_mic | Occ. F1_mac | Skill F1 | Salary MAE | Salary RMSE | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| TF-IDF + LR | 0.68 ± 0.01 | 0.49 ± 0.02 | 0.44 ± 0.02 | 8920 ± 180 | 12,470 ± 210 | BoW baseline |
| BERT-Base | 0.81 ± 0.01 | 0.71 ± 0.02 | 0.72 ± 0.01 | 5480 ± 120 | 8010 ± 150 | fine-tuned |
| JobBERT | 0.84 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 5010 ± 110 | 7380 ± 140 | labor-pretrained |
| Ours (Centralized) | 0.90 ± 0.00 | 0.83 ± 0.01 | 0.83 ± 0.01 | 3890 ± 90 | 5960 ± 110 | T5/LLaMA hybrid |
| Ours (Fed + DP) | 0.88 ± 0.01 | 0.81 ± 0.01 | 0.81 ± 0.01 | 4160 ± 100 | 6220 ± 120 | q = 0.6, σ = 1.0 |
Table 7. Privacy ledger vs. utility. Lower ε (stronger privacy) correlates with modest utility loss, predominantly in macro-F1.

| q | σ | T | ε (δ = 10^-5) | Occ. F1_mic | Occ. F1_mac | Skill F1 | MAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.6 | 1.3 | 80 | 0.92 | 0.866 | 0.789 | 0.801 | 4340 |
| 0.6 | 1.0 | 100 | 1.47 | 0.881 | 0.808 | 0.814 | 4160 |
| 0.6 | 0.8 | 120 | 2.35 | 0.885 | 0.812 | 0.818 | 4090 |
| 0.8 | 0.8 | 100 | 3.90 | 0.887 | 0.815 | 0.820 | 4060 |
| 0.8 | 0.7 | 120 | 6.05 | 0.889 | 0.818 | 0.822 | 4020 |
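The ledger-driven choice of (σ, T) under a target ε budget (Table 1 and Table 7) amounts to a feasibility search. The sketch below shows only the selection logic; the `toy_eps` accountant is a placeholder with no privacy meaning, standing in for a real RDP/moments accountant such as the one of Abadi et al. [30]:

```python
from itertools import product

def select_config(eps_of, target_eps, sigmas, rounds):
    """Pick the (sigma, T) pair with the most training rounds whose
    accounted epsilon stays within the target budget.

    eps_of: callable (sigma, T) -> epsilon. In a real deployment this is
    the RDP ledger; any accountant with this signature can be plugged in.
    """
    feasible = [(s, t) for s, t in product(sigmas, rounds)
                if eps_of(s, t) <= target_eps]
    # Prefer the configuration that allows the longest training run.
    return max(feasible, key=lambda st: st[1], default=None)

# Toy stand-in accountant (NOT a real privacy bound): epsilon grows
# linearly with rounds and shrinks with the noise multiplier squared.
toy_eps = lambda sigma, T: 0.02 * T / sigma ** 2

cfg = select_config(toy_eps, target_eps=1.5,
                    sigmas=[0.7, 1.0, 1.3], rounds=[60, 100, 140])
```

Under this toy accountant, the search returns the largest noise multiplier paired with the longest feasible schedule, mirroring the qualitative trade-off visible in the Table 7 rows.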
Table 8. Comparison with prior centralized models and our framework on all three tasks. Baseline figures are re-evaluated on our data split for a consistent comparison.

| Model | Occ. F1_mic | Occ. F1_mac | Skill F1 | Salary MAE (USD) |
| --- | --- | --- | --- | --- |
| TF-IDF + LR (BoW baseline) | 0.68 | 0.49 | 0.44 | 8920 |
| BERT-Base [8] | 0.81 | 0.71 | 0.72 | 5480 |
| JobBERT [9] | 0.84 | 0.75 | 0.75 | 5010 |
| Ours (Centralized) | 0.90 | 0.83 | 0.83 | 3890 |
| Ours (Fed + DP) | 0.88 | 0.81 | 0.81 | 4160 |
Table 9. Absolute and relative improvements over the JobBERT baseline on our evaluation split. A positive Δ in F1 indicates higher accuracy; a negative Δ in MAE indicates lower error (better).

| Model vs. JobBERT | Δ Occ. F1_mic | Δ Occ. F1_mac | Δ Skill F1 | Rel. Occ. F1_mic (%) | Rel. Salary MAE (%) |
| --- | --- | --- | --- | --- | --- |
| Ours (Centralized) | +0.06 | +0.08 | +0.08 | +7.1 | −22.4 |
| Ours (Fed + DP) | +0.04 | +0.06 | +0.06 | +4.8 | −17.0 |
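The deltas in Table 9 follow mechanically from the Table 8 figures; as a quick sanity check (values transcribed verbatim from Table 8):

```python
# JobBERT baseline and our two configurations, as reported in Table 8.
jobbert = {"f1_mic": 0.84, "f1_mac": 0.75, "skill_f1": 0.75, "mae": 5010}
ours = {
    "Centralized": {"f1_mic": 0.90, "f1_mac": 0.83, "skill_f1": 0.83, "mae": 3890},
    "Fed + DP":    {"f1_mic": 0.88, "f1_mac": 0.81, "skill_f1": 0.81, "mae": 4160},
}

def deltas(model):
    """Absolute F1 deltas and relative (%) changes vs. the JobBERT baseline."""
    m = ours[model]
    abs_d = {k: round(m[k] - jobbert[k], 2) for k in ("f1_mic", "f1_mac", "skill_f1")}
    rel_f1 = round(100 * (m["f1_mic"] - jobbert["f1_mic"]) / jobbert["f1_mic"], 1)
    rel_mae = round(100 * (m["mae"] - jobbert["mae"]) / jobbert["mae"], 1)
    return abs_d, rel_f1, rel_mae

abs_c, rel_f1_c, rel_mae_c = deltas("Centralized")
# rel_f1_c -> +7.1 %, rel_mae_c -> -22.4 %, matching Table 9.
```

The same computation reproduces the Fed + DP row (+4.8% micro-F1, −17.0% MAE), confirming the two tables are internally consistent.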
Table 10. Component ablations. Removing DAPT or multi-task coupling causes the largest degradations; gradient orthogonalization improves tail robustness (macro-F1); CRF aids span-level skill extraction.

| Variant | Occ. F1_mic | Occ. F1_mac | Skill F1 | Salary MAE |
| --- | --- | --- | --- | --- |
| Full (Fed + DP) | 0.881 ± 0.004 | 0.808 ± 0.006 | 0.814 ± 0.005 | 4160 ± 90 |
| w/o DAPT | 0.846 ± 0.006 | 0.768 ± 0.008 | 0.781 ± 0.008 | 4570 ± 110 |
| w/o Multi-task (single-task heads) | 0.813 ± 0.008 | 0.742 ± 0.009 | 0.754 ± 0.010 | 4920 ± 120 |
| w/o Gradient orthogonalization | 0.873 ± 0.005 | 0.796 ± 0.007 | 0.809 ± 0.006 | 4230 ± 100 |
| w/o CRF (skills only) | 0.881 ± 0.004 | 0.807 ± 0.006 | 0.792 ± 0.007 | 4190 ± 95 |
| w/o Prompt templates | 0.868 ± 0.006 | 0.792 ± 0.007 | 0.804 ± 0.006 | 4300 ± 100 |
Table 11. DP hyperparameter sweep. Realized ε (moments accountant) trades off with utility. Stronger privacy (smaller ε) modestly reduces macro-F1 and increases MAE.

| C | σ | q | ε | Occ. F1_mic | Occ. F1_mac | Skill F1 | MAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.5 | 1.3 | 0.4 | 0.78 | 0.862 | 0.787 | 0.798 | 4410 |
| 1.0 | 1.0 | 0.6 | 1.47 | 0.881 | 0.808 | 0.814 | 4160 |
| 1.5 | 0.8 | 0.6 | 2.10 | 0.885 | 0.812 | 0.817 | 4110 |
| 1.0 | 0.8 | 0.8 | 3.25 | 0.887 | 0.815 | 0.819 | 4070 |
| 1.5 | 0.7 | 0.8 | 4.90 | 0.889 | 0.818 | 0.821 | 4040 |
Table 12. DP utility surface data. Occupation F1_mic as a function of participation rate (q) and noise multiplier (σ) at fixed C = 1.0, T = 100. Higher q improves utility; larger σ reduces it.

| q (Participation Rate) | σ = 0.6 | σ = 0.8 | σ = 1.0 | σ = 1.2 |
| --- | --- | --- | --- | --- |
| 0.3 | 0.872 | 0.866 | 0.858 | 0.851 |
| 0.5 | 0.881 | 0.874 | 0.867 | 0.860 |
| 0.7 | 0.888 | 0.881 | 0.873 | 0.866 |
| 0.8 | 0.889 | 0.883 | 0.875 | 0.868 |
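The monotone structure claimed in the Table 12 caption, utility non-decreasing in q and non-increasing in σ, can be verified directly from the grid (values transcribed from the table):

```python
# Occupation F1_mic grid from Table 12 (rows: q; columns: sigma).
qs = [0.3, 0.5, 0.7, 0.8]
sigmas = [0.6, 0.8, 1.0, 1.2]
f1 = [
    [0.872, 0.866, 0.858, 0.851],
    [0.881, 0.874, 0.867, 0.860],
    [0.888, 0.881, 0.873, 0.866],
    [0.889, 0.883, 0.875, 0.868],
]

# Utility is non-decreasing in q (down each column) ...
inc_in_q = all(f1[i][j] <= f1[i + 1][j]
               for i in range(len(qs) - 1) for j in range(len(sigmas)))
# ... and non-increasing in sigma (across each row).
dec_in_sigma = all(f1[i][j] >= f1[i][j + 1]
                   for i in range(len(qs)) for j in range(len(sigmas) - 1))
```

Both flags hold for every adjacent pair in the grid, so the surface is monotone in each axis rather than merely trending that way on average.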

Share and Cite

MDPI and ACS Style

Ji, W.; Ying, Z. An LLM-Powered Framework for Privacy-Preserving and Scalable Labor Market Analysis. Mathematics 2026, 14, 53. https://doi.org/10.3390/math14010053
