[FEATURE] Adding the SkillSkape dataset #22

@jjzha

Problem

Is your proposal tackling an existing problem or limitation?

  • No, it's an addition

Proposal

Add the SkillSkape dataset (https://aclanthology.org/2024.nlp4hr-1.4.pdf), a synthetically generated dataset for skill matching. Each sentence maps to multiple skills, so this is a multi-label classification task. The dataset consists of 6352 train, 1316 dev, and 1272 test samples.

  • Type:
    • New Ontology (data source for multiple tasks)
    • New Task(s)
    • New Model(s)
    • New Metric(s)
    • Other
  • Area(s) of code: paths, modules, or APIs you expect to touch
    workrb/src/workrb/tasks/classification
    workrb/tests

Additional Context

Originally from

"Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, and Antoine Bosselut. 2024. JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching. In Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024), pages 43–58, St. Julian’s, Malta. Association for Computational Linguistics."

A sample from the training set:

(idx),sentence,skills
1,"The ideal candidate will possess a proven track record of fostering a collaborative work environment, inspiring and energizing team members to achieve their full potential.","['encourage teambuilding', 'motivate employees', 'keep up with digital transformation of industrial processes', 'teamwork principles']"
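
For whoever picks this up, here is a minimal sketch of how a row like the one above could be parsed into multi-label targets. It assumes the `skills` column is stored as a Python-style list literal (as in the sample) and that the first column is named `idx`. The helper names `load_rows`, `build_label_index`, and `to_multi_hot` are hypothetical and are not part of the existing workrb code.

```python
import ast
import csv
import io

# The sample row from this issue. The header name "idx" is an assumption
# based on the "(idx)" shown in the sample header.
SAMPLE_CSV = """idx,sentence,skills
1,"The ideal candidate will possess a proven track record of fostering a collaborative work environment, inspiring and energizing team members to achieve their full potential.","['encourage teambuilding', 'motivate employees', 'keep up with digital transformation of industrial processes', 'teamwork principles']"
"""


def load_rows(csv_text):
    """Yield (sentence, skills) pairs; skills is parsed into a list of strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # The skills cell is a string such as "['a', 'b']"; literal_eval
        # parses it safely without executing arbitrary code.
        skills = ast.literal_eval(row["skills"])
        yield row["sentence"], list(skills)


def build_label_index(rows):
    """Map every distinct skill to a stable integer id (alphabetical order)."""
    labels = sorted({skill for _, skills in rows for skill in skills})
    return {label: i for i, label in enumerate(labels)}


def to_multi_hot(skills, label_index):
    """Encode a list of skills as a 0/1 vector over the label index."""
    vector = [0] * len(label_index)
    for skill in skills:
        vector[label_index[skill]] = 1
    return vector


rows = list(load_rows(SAMPLE_CSV))
label_index = build_label_index(rows)
targets = [to_multi_hot(skills, label_index) for _, skills in rows]
```

In a real task the label index would presumably be built from the full skill vocabulary rather than from the rows themselves, so that the train, dev, and test splits share one label space.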

Implementation

  • I plan to implement this in a PR
  • I am proposing the idea and would like someone else to pick it up
