<img src="https://github.com/cullenwatson/JobSpy/assets/78247585/ae185b7e-e444-4712-8bb9-fa97f53e896b" width="400">
**JobSpy** is a simple, yet comprehensive, job scraping library.
*Looking to build a data-focused software product?* **[Book a call](https://bunsly.com/)** *to
work with us.*
## Features
- Scrapes job postings from **LinkedIn**, **Indeed**, **Glassdoor**, **Google**, & **ZipRecruiter** simultaneously
- Aggregates the job postings in a dataframe
- Proxy support to bypass blocking
![jobspy](https://github.com/cullenwatson/JobSpy/assets/78247585/ec7ef355-05f6-4fd3-8161-a817e31c5c57)
### Installation
```
pip install -U python-jobspy
```
_Python version >= [3.10](https://www.python.org/downloads/release/python-3100/) required_
### Usage
```python
import csv
from jobspy import scrape_jobs
jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "glassdoor", "google"],
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed="USA",
    # linkedin_fetch_description=True,  # gets more info such as description, direct job url (slower)
    # proxies=["208.195.175.46:65095", "208.195.175.45:65095", "localhost"],
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
# Write the results to CSV; use jobs.to_excel(...) instead for an Excel file.
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)
```
### Output
```
SITE           TITLE                             COMPANY           CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    200000      150000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    135000      110000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rain’s mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...
```
### Parameters for `scrape_jobs()`
```plaintext
Optional
├── site_name (list|str):
| linkedin, zip_recruiter, indeed, glassdoor, google
| (default is all)
│
├── search_term (str)
|
├── google_search_term (str)
| search term for google jobs. This is the only param for filtering google jobs.
│
├── location (str)
│
├── distance (int):
| in miles, default 50
│
├── job_type (str):
| fulltime, parttime, internship, contract
│
├── proxies (list):
| in format ['user:pass@host:port', 'localhost']
| each job board scraper will round robin through the proxies
|
├── is_remote (bool)
│
├── results_wanted (int):
| number of job results to retrieve for each site specified in 'site_name'
│
├── easy_apply (bool):
| filters for jobs that are hosted on the job board site (LinkedIn easy apply filter no longer works)
│
├── description_format (str):
| markdown, html (Format type of the job descriptions. Default is markdown.)
│
├── offset (int):
| starts the search from an offset (e.g. 25 will start the search from the 25th result)
│
├── hours_old (int):
| filters jobs by the number of hours since the job was posted
| (ZipRecruiter and Glassdoor round up to next day.)
│
├── verbose (int) {0, 1, 2}:
| Controls the verbosity of the runtime printouts
| (0 prints only errors, 1 is errors+warnings, 2 is all logs. Default is 2.)
│
├── linkedin_fetch_description (bool):
| fetches full description and direct job url for LinkedIn (Increases requests by O(n))
│
├── linkedin_company_ids (list[int]):
| searches for linkedin jobs with specific company ids
|
├── country_indeed (str):
| filters the country on Indeed & Glassdoor (see below for correct spelling)
|
├── enforce_annual_salary (bool):
| converts wages to annual salary
|
├── ca_cert (str)
| path to CA Certificate file for proxies
```
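The round-robin behaviour described for `proxies` can be pictured with a small sketch. This is not JobSpy's internal code: `make_proxy_rotation` and the proxy strings are hypothetical, and the snippet only illustrates handing out proxies in a repeating cycle.

```python
from itertools import cycle


def make_proxy_rotation(proxies):
    """Return a function that yields the next proxy on each call (hypothetical helper)."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    pool = cycle(proxies)  # endlessly repeats the list in order
    return lambda: next(pool)


# Placeholder proxy strings in the 'user:pass@host:port' format described above.
next_proxy = make_proxy_rotation(
    ["user:pass@host1:8080", "user:pass@host2:8080", "localhost"]
)
picked = [next_proxy() for _ in range(4)]
print(picked)  # the fourth pick wraps around to the first proxy
```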
```
├── Indeed limitations:
| Only one from this list can be used in a search:
| - hours_old
| - job_type & is_remote
| - easy_apply
│
└── LinkedIn limitations:
| Only one from this list can be used in a search:
| - hours_old
| - easy_apply
```
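Because these filters are mutually exclusive, it can help to check arguments before calling `scrape_jobs()`. The sketch below is a hypothetical pre-flight check, not part of JobSpy; the function name and the assumed defaults (`None`/`False` meaning "not set") are illustrative only.

```python
def check_exclusive_filters(site, hours_old=None, job_type=None,
                            is_remote=False, easy_apply=False):
    """Return the active filter group, raising if more than one is set.

    Hypothetical helper that mirrors the limitation lists above.
    """
    if site == "indeed":
        groups = {
            "hours_old": hours_old is not None,
            "job_type & is_remote": job_type is not None or bool(is_remote),
            "easy_apply": bool(easy_apply),
        }
    elif site == "linkedin":
        groups = {
            "hours_old": hours_old is not None,
            "easy_apply": bool(easy_apply),
        }
    else:
        return []  # no limitation documented for other sites

    active = [name for name, used in groups.items() if used]
    if len(active) > 1:
        raise ValueError(f"{site}: use only one of {active} per search")
    return active


print(check_exclusive_filters("indeed", hours_old=72))
print(check_exclusive_filters("linkedin", hours_old=24, job_type="fulltime"))
```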
## Supported Countries for Job Searching
### **LinkedIn**
LinkedIn searches globally & uses only the `location` parameter.
### **ZipRecruiter**
ZipRecruiter searches for jobs in **US/Canada** & uses only the `location` parameter.
### **Indeed / Glassdoor**
Indeed & Glassdoor support most countries, but the `country_indeed` parameter is required. Additionally, use the `location`
parameter to narrow down the location, e.g. city & state, if necessary.
You can specify the following countries when searching on Indeed (use the exact name, * indicates support for Glassdoor):
| | | | |
|----------------------|--------------|------------|----------------|
| Argentina | Australia* | Austria* | Bahrain |
| Belgium* | Brazil* | Canada* | Chile |
| China | Colombia | Costa Rica | Czech Republic |
| Denmark | Ecuador | Egypt | Finland |
| France* | Germany* | Greece | Hong Kong* |
| Hungary | India* | Indonesia | Ireland* |
| Israel | Italy* | Japan | Kuwait |
| Luxembourg | Malaysia | Mexico* | Morocco |
| Netherlands* | New Zealand* | Nigeria | Norway |
| Oman | Pakistan | Panama | Peru |
| Philippines | Poland | Portugal | Qatar |
| Romania | Saudi Arabia | Singapore* | South Africa |
| South Korea | Spain* | Sweden | Switzerland* |
| Taiwan | Thailand | Turkey | Ukraine |
| United Arab Emirates | UK* | USA* | Uruguay |
| Venezuela | Vietnam* | | |
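
As a rough illustration of how the table can be used, the sketch below picks which of the two sites to query for a given country name. The sets are an abbreviated copy of the table, and `sites_for_country` is a hypothetical helper, not a JobSpy function; the exact-match check is an assumption based on the "use the exact name" note.

```python
# Abbreviated copy of the table above (not the full list).
INDEED_COUNTRIES = {"Argentina", "Australia", "Canada", "Germany", "Japan", "UK", "USA"}
# Entries marked with * in the table also support Glassdoor.
GLASSDOOR_COUNTRIES = {"Australia", "Canada", "Germany", "UK", "USA"}


def sites_for_country(country):
    """Return the sites that can be searched with this `country_indeed` value."""
    if country not in INDEED_COUNTRIES:
        raise ValueError(f"unsupported or misspelled country: {country!r}")
    sites = ["indeed"]
    if country in GLASSDOOR_COUNTRIES:
        sites.append("glassdoor")
    return sites


print(sites_for_country("USA"))    # both sites
print(sites_for_country("Japan"))  # Indeed only
```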
## Notes
* Indeed is currently the best scraper, with no rate limiting.
* All the job board endpoints are capped at around 1000 jobs on a given search.
* LinkedIn is the most restrictive and usually rate limits around the 10th page with one IP. Proxies are basically a must.
## Frequently Asked Questions
---
**Q: Why is Indeed giving unrelated roles?**
**A:** Indeed searches the description as well as the title. To narrow the results:
- prefix a word with `-` to exclude it
- wrap a phrase in `""` for an exact match
Example of a good Indeed query:
```py
search_term='"engineering intern" software summer (java OR python OR c++) 2025 -tax -marketing'
```
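This query searches titles and descriptions for postings that contain `software`, `summer`, and `2025`, at least one of the listed languages, and the exact phrase `engineering intern`, while excluding any posting that mentions `tax` or `marketing`.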
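
For longer queries, assembling the string from its parts can reduce quoting mistakes. The helper below is a hypothetical sketch (`build_indeed_query` is not part of JobSpy) that only combines the operators described above; the terms come out in a different order than in the example, which is assumed not to matter.

```python
def build_indeed_query(exact=(), required=(), any_of=(), excluded=()):
    """Assemble a search string from exact phrases, required words,
    alternatives, and excluded words (hypothetical helper)."""
    parts = [f'"{phrase}"' for phrase in exact]      # "" for exact match
    parts += list(required)                          # plain words must appear
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts += [f"-{word}" for word in excluded]       # - removes words
    return " ".join(parts)


query = build_indeed_query(
    exact=["engineering intern"],
    required=["software", "summer", "2025"],
    any_of=["java", "python", "c++"],
    excluded=["tax", "marketing"],
)
print(query)
# "engineering intern" software summer 2025 (java OR python OR c++) -tax -marketing
```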
---
**Q: Received a response code 429?**
**A:** This indicates that you have been blocked by the job board site for sending too many requests. All of the job board sites are aggressive with blocking. We recommend:
- Wait some time between scrapes (site-dependent).
- Try using the proxies param to change your IP address.
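
One way to apply the "wait between scrapes" advice is a retry loop with growing delays. This is a generic sketch, not JobSpy behaviour: `retry_with_backoff`, the delay values, and the `RuntimeError` used in the demo are assumptions, since the exception raised on a 429 is not documented here. In real use, `fn` would wrap a `scrape_jobs(...)` call and `sleep` would stay at its default.

```python
import time


def retry_with_backoff(fn, retries=3, base_delay=5.0, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake scraper that fails twice, and a fake sleep that records delays.
calls = {"count": 0}
delays = []


def flaky_scrape():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


result = retry_with_backoff(flaky_scrape, sleep=delays.append)
print(result, delays)  # ok [5.0, 10.0]
```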
---
### JobPost Schema
```plaintext
JobPost
├── title
├── company
├── company_url
├── job_url
├── location
│ ├── country
│ ├── city
│ ├── state
├── description
├── job_type: fulltime, parttime, internship, contract
├── job_function
│ ├── interval: yearly, monthly, weekly, daily, hourly
│ ├── min_amount
│ ├── max_amount
│ ├── currency
│ └── salary_source: direct_data, description (parsed from posting)
├── date_posted
├── emails
└── is_remote
LinkedIn specific
└── job_level
LinkedIn & Indeed specific
└── company_industry
Indeed specific
├── company_country
├── company_addresses
├── company_employees_label
├── company_revenue_label
├── company_description
└── company_logo
```
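
Since `interval` can differ from row to row, comparing `min_amount` and `max_amount` across postings usually means converting them to a common basis, which is what the `enforce_annual_salary` parameter is described as doing. The sketch below shows the idea only; the conversion factors (2080 working hours, 260 working days per year) are assumptions and may not match the factors JobSpy uses.

```python
# Assumed number of pay periods per year for each interval value in the schema.
ASSUMED_PERIODS_PER_YEAR = {
    "yearly": 1,
    "monthly": 12,
    "weekly": 52,
    "daily": 260,    # assumption: 5 working days x 52 weeks
    "hourly": 2080,  # assumption: 40 hours x 52 weeks
}


def to_annual(amount, interval):
    """Convert a pay amount to an approximate annual figure (None stays None)."""
    if amount is None:
        return None
    return amount * ASSUMED_PERIODS_PER_YEAR[interval]


# The hourly TEKsystems row from the sample output: 65 to 75 per hour.
print(to_annual(65, "hourly"), to_annual(75, "hourly"))  # 135200 156000
```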
Raw data
{
"_id": null,
"home_page": "https://github.com/Bunsly/JobSpy",
"name": "python-jobspy",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": "jobs-scraper, linkedin, indeed, glassdoor, ziprecruiter",
"author": "Zachary Hampton",
"author_email": "zachary@bunsly.com",
"download_url": "https://files.pythonhosted.org/packages/c5/a0/d5c9f3f5819855509a5cabb745122f7b8ef08bd4399253278d21302d9f86/python_jobspy-1.1.76.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Job scraper for LinkedIn, Indeed, Glassdoor & ZipRecruiter",
"version": "1.1.76",
"project_urls": {
"Homepage": "https://github.com/Bunsly/JobSpy"
},
"split_keywords": [
"jobs-scraper",
" linkedin",
" indeed",
" glassdoor",
" ziprecruiter"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f504fc52f1baa4bd55c8cb77fc218a5e27f459598fe3cc08543616efcd4a323b",
"md5": "f1c31fc02bf31cc7663fe815dffa5e80",
"sha256": "9d27e4cb6ef5cd529ea67dbd74259931b77e83fb37b0ec2e1f562eb9fe1f03f4"
},
"downloads": -1,
"filename": "python_jobspy-1.1.76-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f1c31fc02bf31cc7663fe815dffa5e80",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 38840,
"upload_time": "2024-12-04T22:55:22",
"upload_time_iso_8601": "2024-12-04T22:55:22.942308Z",
"url": "https://files.pythonhosted.org/packages/f5/04/fc52f1baa4bd55c8cb77fc218a5e27f459598fe3cc08543616efcd4a323b/python_jobspy-1.1.76-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c5a0d5c9f3f5819855509a5cabb745122f7b8ef08bd4399253278d21302d9f86",
"md5": "1ed35944cf23b1a6f07366ff020ac486",
"sha256": "0589cbaf41e3931b299b2c475534bbf8d08dad5b9d3e387d5289bc0ab049aa76"
},
"downloads": -1,
"filename": "python_jobspy-1.1.76.tar.gz",
"has_sig": false,
"md5_digest": "1ed35944cf23b1a6f07366ff020ac486",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 34116,
"upload_time": "2024-12-04T22:55:24",
"upload_time_iso_8601": "2024-12-04T22:55:24.135665Z",
"url": "https://files.pythonhosted.org/packages/c5/a0/d5c9f3f5819855509a5cabb745122f7b8ef08bd4399253278d21302d9f86/python_jobspy-1.1.76.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-04 22:55:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Bunsly",
"github_project": "JobSpy",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "python-jobspy"
}