# ESCO Skill Extractor
This is a tool that extracts **ESCO skills** and **ISCO occupations** from texts such as job descriptions or CVs. It embeds the texts with a transformer model and compares those embeddings against the embeddings of ESCO skills and ISCO occupations using cosine similarity.
## Installation
```bash
pip install esco-skill-extractor
```
## Usage
### Via python
```python
from esco_skill_extractor import SkillExtractor
# The first run takes longer because it downloads the model and creates the embeddings.
skill_extractor = SkillExtractor()

ads = [
    "We are looking for a software engineer with experience in Java and Python.",
    "We are looking for a devops engineer. Containerization tools such as Docker is a must. AWS is a plus.",
    # ...
]

# Both methods return one list of URIs per input text.
print(skill_extractor.get_skills(ads))
# [
#     [
#         "http://data.europa.eu/esco/skill/19a8293b-8e95-4de3-983f-77484079c389",
#         "http://data.europa.eu/esco/skill/ccd0a1d9-afda-43d9-b901-96344886e14d",
#     ],
#     [
#         "http://data.europa.eu/esco/skill/11430d93-c835-48ed-8e70-285fa69c9ae6",
#         "http://data.europa.eu/esco/skill/ae4f0cc6-e0b9-47f5-bdca-2fc2e6316dce",
#         "http://data.europa.eu/esco/skill/ce8ae6ca-61d8-4174-b457-641de96cbff4",
#         "http://data.europa.eu/esco/skill/f0de4973-0a70-4644-8fd4-3a97080476f4",
#     ],
# ]

print(skill_extractor.get_occupations(ads))
# [
#     [
#         "http://data.europa.eu/esco/occupation/10469d70-78a3-4650-9e29-d04de13c62c1",
#         "http://data.europa.eu/esco/occupation/1c5a896a-e010-4217-a29a-c44db26e25da",
#         "http://data.europa.eu/esco/occupation/4874fa37-0cd1-4a68-aed8-a838851f242d",
#         "http://data.europa.eu/esco/occupation/579254cf-6d69-4889-9000-9c79dc568644",
#         "http://data.europa.eu/esco/occupation/57af9090-55b4-4911-b2d0-86db01c00b02",
#         "http://data.europa.eu/esco/occupation/f2b15a0e-e65a-438a-affb-29b9d50b77d1",
#         "http://data.europa.eu/esco/isco/C2512",
#         "http://data.europa.eu/esco/isco/C2514",
#     ],
#     [
#         "http://data.europa.eu/esco/occupation/2fb96c6c-8d0b-4ef0-b1ee-3e493305e4eb",
#         "http://data.europa.eu/esco/occupation/349ee6f6-c295-4c38-9b98-48765b55280e",
#         "http://data.europa.eu/esco/occupation/781a6350-e686-45b9-b075-e4c8d5a05ff7",
#         "http://data.europa.eu/esco/occupation/93b11f0f-69af-4ece-b9da-f29aab7d38d3",
#         "http://data.europa.eu/esco/occupation/bb609566-3ab6-44dd-8f48-cf0b15b96827",
#         "http://data.europa.eu/esco/occupation/cc867bee-ab5c-427f-9244-f7a204d9574b",
#     ],
# ]
```
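As the output above shows, `get_occupations` mixes ESCO occupation URIs and ISCO URIs in the same list. If you need them separately, a small post-processing step can split them by URI prefix. This is a sketch: `split_occupations` is a hypothetical helper, and the two prefixes are taken from the example output rather than from the library's documentation.

```python
ESCO_OCCUPATION_PREFIX = "http://data.europa.eu/esco/occupation/"
ISCO_PREFIX = "http://data.europa.eu/esco/isco/"


def split_occupations(uris: list[str]) -> tuple[list[str], list[str]]:
    """Split one text's occupation URIs into ESCO occupation URIs and ISCO codes."""
    esco, isco = [], []
    for uri in uris:
        if uri.startswith(ISCO_PREFIX):
            # Keep only the trailing code, e.g. "C2512".
            isco.append(uri[len(ISCO_PREFIX):])
        elif uri.startswith(ESCO_OCCUPATION_PREFIX):
            esco.append(uri)
        # URIs with any other prefix are ignored.
    return esco, isco


# A few URIs copied from the first ad's output above.
first_ad = [
    "http://data.europa.eu/esco/occupation/10469d70-78a3-4650-9e29-d04de13c62c1",
    "http://data.europa.eu/esco/isco/C2512",
    "http://data.europa.eu/esco/isco/C2514",
]
esco, isco = split_occupations(first_ad)
print(isco)  # ['C2512', 'C2514']
```

With the real extractor, apply it once per text, for example inside `for ad, occupations in zip(ads, skill_extractor.get_occupations(ads)):`.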
### Via GUI
```bash
# Start the server, then open the URL printed in the console.
# Run `python -m esco_skill_extractor --help` for more options.
python -m esco_skill_extractor
```
<img src="docs/gui.gif">
### Via API
```bash
# Start the server; the API is served at the URL printed in the console.
# Run `python -m esco_skill_extractor --help` for more options.
python -m esco_skill_extractor
```
```js
async function getSkillsAndOccupations() {
  const texts = [
    "We are looking for a software engineer with experience in Java and Python.",
    "We are looking for a devops engineer. Containerization tools such as Docker is a must. AWS is a plus.",
    // ...
  ];

  // Default host is localhost, and default port is 8000. Check the CLI options for more.
  const skillsResponse = await fetch("http://localhost:8000/extract-skills", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(texts),
  });
  const skills = await skillsResponse.json();
  console.log(skills);
  // [
  //   [
  //     "http://data.europa.eu/esco/skill/19a8293b-8e95-4de3-983f-77484079c389",
  //     "http://data.europa.eu/esco/skill/ccd0a1d9-afda-43d9-b901-96344886e14d",
  //   ],
  //   [
  //     "http://data.europa.eu/esco/skill/11430d93-c835-48ed-8e70-285fa69c9ae6",
  //     "http://data.europa.eu/esco/skill/ae4f0cc6-e0b9-47f5-bdca-2fc2e6316dce",
  //     "http://data.europa.eu/esco/skill/ce8ae6ca-61d8-4174-b457-641de96cbff4",
  //     "http://data.europa.eu/esco/skill/f0de4973-0a70-4644-8fd4-3a97080476f4",
  //   ],
  // ]

  const occupationsResponse = await fetch("http://localhost:8000/extract-occupations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(texts),
  });
  // Parse the body as JSON, just like the skills response.
  const occupations = await occupationsResponse.json();
  console.log(occupations);
}

getSkillsAndOccupations();
```
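The same endpoints can be called from Python using only the standard library. This is a sketch that assumes the server was started with the default host and port (`localhost:8000`) and that both endpoints accept and return JSON exactly as in the JavaScript example; `build_request` and `extract` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(
    endpoint: str, texts: list[str], host: str = "localhost", port: int = 8000
) -> urllib.request.Request:
    """Build a POST request whose JSON body is the list of texts."""
    return urllib.request.Request(
        f"http://{host}:{port}/{endpoint}",
        data=json.dumps(texts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(
    endpoint: str,
    texts: list[str],
    host: str = "localhost",
    port: int = 8000,
    timeout: float = 60.0,
) -> list[list[str]]:
    """Send the texts to a running server and return its decoded JSON response."""
    request = build_request(endpoint, texts, host, port)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `extract("extract-skills", texts)` and `extract("extract-occupations", texts)` should each return one list of URIs per input text, mirroring the JavaScript example.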
## Possible keyword arguments for `SkillExtractor`
| Keyword Argument | Description | Default |
| -------------------- | ------------------------------------------------------------------------------- | ------------------------------ |
| skill_threshold | Skills surpassing this cosine similarity threshold are considered a match. | 0.45 |
| occupation_threshold | Occupations surpassing this cosine similarity threshold are considered a match. | 0.55 |
| device               | The device on which computations run, as a torch device (e.g. "cpu" or "cuda"). | "cuda" if available else "cpu" |
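The thresholds are passed when constructing the extractor, for example `SkillExtractor(skill_threshold=0.6, device="cpu")`. Raising a threshold keeps fewer, more confident matches; lowering it returns more candidates. The sketch below illustrates that trade-off on hypothetical similarity scores (not real model output). `count_matches_per_threshold` is a made-up helper, and the strict `>` comparison is an assumption based on the word "surpassing".

```python
from typing import Iterable


def count_matches_per_threshold(
    scores: Iterable[float], thresholds: Iterable[float]
) -> dict[float, int]:
    """Count how many entities would match at each candidate threshold."""
    score_list = list(scores)
    # An entity matches when its best similarity strictly exceeds the threshold (assumption).
    return {t: sum(1 for s in score_list if s > t) for t in thresholds}


# Hypothetical best-sentence similarities for four skills (illustrative only).
scores = [0.72, 0.50, 0.40, 0.58]
print(count_matches_per_threshold(scores, [0.45, 0.55, 0.65]))
# {0.45: 3, 0.55: 2, 0.65: 1}
```

A sweep like this over a few labelled texts is one way to pick thresholds for your own data instead of relying on the defaults.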
## How it works
1. It creates embeddings for the ESCO skills and ISCO occupations.
2. It splits each input text into sentences and creates an embedding for each sentence.
3. For each entity (skill or occupation), it computes the cosine similarity between the entity's embedding and every sentence embedding, and keeps the maximum value.
4. The entity matches the text if this maximum exceeds the corresponding threshold (`skill_threshold` or `occupation_threshold`), i.e. if at least one sentence is similar enough.
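The steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the regex sentence splitter, the helper names (`split_sentences`, `cosine`, `match_entities`, `toy_embed`) and the keyword-count "embedding" used in place of a transformer are all assumptions made for the example.

```python
import math
import re
from typing import Callable, Sequence

Vector = Sequence[float]


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; zero vectors are treated as dissimilar to everything.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_entities(
    text: str,
    entities: dict[str, Vector],
    embed: Callable[[str], Vector],
    threshold: float,
) -> list[str]:
    # Step 2: embed every sentence of the input text.
    sentence_vectors = [embed(s) for s in split_sentences(text)]
    matches = []
    for uri, entity_vector in entities.items():
        # Step 3: keep the best similarity over all sentences.
        best = max((cosine(entity_vector, v) for v in sentence_vectors), default=0.0)
        # Step 4: the entity matches if that best score surpasses the threshold.
        if best > threshold:
            matches.append(uri)
    return matches


# Toy stand-in for a transformer (assumption): counts a few keywords per sentence.
VOCAB = ["docker", "aws", "java"]


def toy_embed(sentence: str) -> list[float]:
    words = re.findall(r"[a-z]+", sentence.lower())
    return [float(words.count(w)) for w in VOCAB]


# Step 1: hypothetical entity embeddings in the same toy space.
entities = {"skill:docker": [1.0, 0.0, 0.0], "skill:java": [0.0, 0.0, 1.0]}
text = "Containerization tools such as Docker is a must. AWS is a plus."
print(match_entities(text, entities, toy_embed, threshold=0.45))  # ['skill:docker']
```

Taking the maximum over sentences means a single relevant sentence is enough for a match, so a long text mentioning a skill once is not penalised by its unrelated sentences.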
Raw data
{
"_id": null,
"home_page": "https://github.com/KonstantinosPetrakis/esco-skill-extractor",
"name": "esco-skill-extractor",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Konstantinos Petrakis",
"author_email": "konstpetrakis01@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/65/3c/832e4b6a192bf8d99dde48b85d3aab240728f712324971b4fba461ecf7bd/esco-skill-extractor-0.1.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Extract ESCO skills from texts such as job descriptions or CVs",
"version": "0.1.15",
"project_urls": {
"Homepage": "https://github.com/KonstantinosPetrakis/esco-skill-extractor"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "56462df0a23d14306bd75335f7ed250cbfca2e4e6af47f2bb9043fb0c97df7ef",
"md5": "b5d67a5160112d5b74d1723e0bd705fb",
"sha256": "ef75be754f10e84d14dc001aa829f410cd10dbf250a136089ae85c56ce8f7f0a"
},
"downloads": -1,
"filename": "esco_skill_extractor-0.1.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b5d67a5160112d5b74d1723e0bd705fb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 2558527,
"upload_time": "2024-11-30T08:38:08",
"upload_time_iso_8601": "2024-11-30T08:38:08.803267Z",
"url": "https://files.pythonhosted.org/packages/56/46/2df0a23d14306bd75335f7ed250cbfca2e4e6af47f2bb9043fb0c97df7ef/esco_skill_extractor-0.1.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "653c832e4b6a192bf8d99dde48b85d3aab240728f712324971b4fba461ecf7bd",
"md5": "d0d0c9919659ff3b957afc193bbb35a4",
"sha256": "eb446871bbbb34de9cf9fd3b397389e03ada87bacd347aa168d7a276d9a5ac5e"
},
"downloads": -1,
"filename": "esco-skill-extractor-0.1.15.tar.gz",
"has_sig": false,
"md5_digest": "d0d0c9919659ff3b957afc193bbb35a4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 2552602,
"upload_time": "2024-11-30T08:38:14",
"upload_time_iso_8601": "2024-11-30T08:38:14.307950Z",
"url": "https://files.pythonhosted.org/packages/65/3c/832e4b6a192bf8d99dde48b85d3aab240728f712324971b4fba461ecf7bd/esco-skill-extractor-0.1.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-30 08:38:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "KonstantinosPetrakis",
"github_project": "esco-skill-extractor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "sentence-transformers",
"specs": []
},
{
"name": "Flask",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "waitress",
"specs": []
}
],
"lcname": "esco-skill-extractor"
}