LLM-Supervised Multilingual Skill Extraction and Classification from Job Ads

Jakob Mørup Wang*, Zhiru Sun

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

This paper presents a pipeline for extracting, classifying, and representing skills requested in job advertisements to enable demand-side labor market analysis. Our contributions include: (1) addressing the annotation bottleneck by leveraging scalable, taxonomy-aligned LLM supervision for training a lightweight sentence encoder, (2) expanding skill extraction to include implicit skill requirements as well as the explicit mentions typically targeted in prior work, and (3) representing skills as distributions to robustly support downstream tasks despite the fluid, overlapping nature of skill definitions. Concretely, we compile 3M+ postings from 10k+ sources and sample 500k+ sentences to fine-tune paraphrase-multilingual-mpnet-base-v2 for identifying skill requests and mapping them to the 13,896-skill ESCO taxonomy, supervised by GPT-4o mini. The outcome is normalized per-ad skill distributions, aggregated from sentence-level distributions weighted by request probability.
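
The final aggregation step described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the weighted sum followed by renormalization, the function name `aggregate_ad_distribution`, and the toy three-skill vectors are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def aggregate_ad_distribution(
    sentence_dists: Sequence[Sequence[float]],
    request_probs: Sequence[float],
) -> List[float]:
    """Combine sentence-level skill distributions into one per-ad distribution.

    Hypothetical helper: each sentence distribution is weighted by the
    probability that the sentence is a skill request, the weighted vectors
    are summed, and the result is renormalized to sum to 1.
    """
    if len(sentence_dists) != len(request_probs):
        raise ValueError("need one request probability per sentence")
    if not sentence_dists:
        return []

    n_skills = len(sentence_dists[0])
    totals = [0.0] * n_skills
    for dist, weight in zip(sentence_dists, request_probs):
        if len(dist) != n_skills:
            raise ValueError("all sentence distributions must have the same length")
        for i, p in enumerate(dist):
            totals[i] += weight * p

    mass = sum(totals)
    if mass == 0.0:
        # No sentence looks like a skill request: return the all-zero vector.
        return totals
    return [t / mass for t in totals]


# Toy example with 3 skills (the ESCO taxonomy has 13,896) and 2 sentences.
ad = aggregate_ad_distribution(
    sentence_dists=[[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]],
    request_probs=[0.8, 0.2],
)
print([round(p, 3) for p in ad])
```

Under this assumed scheme, a sentence that is unlikely to be a skill request contributes little mass to the ad-level distribution, and the renormalization makes ads of different lengths comparable.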

Original language: English
Title of host publication: Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part II
Editors: Ryutaro Ichise
Publisher: Springer
Publication date: 2026
Pages: 94-104
ISBN (Print): 978-3-031-97143-3
ISBN (Electronic): 978-3-031-97144-0
Publication status: Published - 2026
Event: 30th International Conference on Natural Language and Information Systems, NLDB 2025 - Kanazawa, Japan
Duration: 4 Jul 2025 to 6 Jul 2025

Conference

Conference: 30th International Conference on Natural Language and Information Systems, NLDB 2025
Country/Territory: Japan
City: Kanazawa
Period: 04/07/2025 to 06/07/2025
Series: Lecture Notes in Computer Science
Volume: 15837 LNCS
ISSN: 0302-9743

Keywords

  • Labor market analysis
  • LLM supervision
  • Skill classification
  • Skill extraction
  • Taxonomy alignment
