# Job Scraper 🚀
A powerful and easy-to-use Python package for scraping job postings from **Indeed** and **LinkedIn** using Selenium browser automation.
## Features ✨
- 🔍 Scrape job details from Indeed and LinkedIn
- 🤖 Automatic platform detection
- 📊 Extract title, company, location, and full job description
- 🎯 Simple and intuitive API
- 🛡️ Drives a real browser via Selenium, which helps avoid common anti-bot blocks
- ⚙️ Configurable options (headless mode, verbose output)
## Installation 📦
### Install from source
```bash
# Clone or download the repository
git clone https://github.com/adilc0070/job-scraper.git
cd job-scraper
# Install the package
pip install .
```
### Install in development mode
```bash
pip install -e .
```
## Quick Start 🚀
### Basic Usage
```python
from job_scraper import scrape_job
# Scrape any supported job posting (auto-detects platform)
job = scrape_job('https://in.indeed.com/viewjob?jk=...')
if job:
    print(f"Title: {job['title']}")
    print(f"Company: {job['company']}")
    print(f"Location: {job['location']}")
    print(f"Description: {job['description'][:200]}...")
```
### Platform-Specific Scraping
```python
from job_scraper import scrape_indeed_job, scrape_linkedin_job
# Scrape from Indeed
indeed_job = scrape_indeed_job('https://in.indeed.com/viewjob?jk=...')
# Scrape from LinkedIn
linkedin_job = scrape_linkedin_job('https://www.linkedin.com/jobs/view/...')
```
### Advanced Options
```python
from job_scraper import scrape_job
# Run in non-headless mode (show browser window)
job = scrape_job(url, headless=False)
# Disable verbose output (silent mode)
job = scrape_job(url, verbose=False)
# Combine options
job = scrape_job(url, headless=False, verbose=False)
```
## API Reference 📚
### `scrape_job(url, headless=True, verbose=True)`
Automatically detect and scrape jobs from Indeed or LinkedIn.
**Parameters:**
- `url` (str): The job posting URL (Indeed or LinkedIn)
- `headless` (bool): Run browser in headless mode (default: True)
- `verbose` (bool): Print progress messages (default: True)
**Returns:**
- `dict`: Job details or `None` if scraping fails
**Example:**
```python
job = scrape_job('https://in.indeed.com/viewjob?jk=123456')
```
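The platform dispatch presumably keys off the URL's hostname. As a rough illustration (not the package's actual internals), detection can be done with nothing but the standard library:

```python
from typing import Optional
from urllib.parse import urlparse

def detect_platform(url: str) -> Optional[str]:
    """Guess the job board from the URL's hostname (illustrative sketch only)."""
    host = urlparse(url).netloc.lower()
    if 'indeed.com' in host:
        return 'Indeed'
    if 'linkedin.com' in host:
        return 'LinkedIn'
    return None
```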
### `scrape_indeed_job(url, headless=True, verbose=True)`
Scrape a single Indeed job posting.
**Parameters:**
- Same as `scrape_job()`
**Returns:**
- `dict`: Job details with keys: `title`, `company`, `location`, `description`, `source`, `url`
### `scrape_linkedin_job(url, headless=True, verbose=True)`
Scrape a single LinkedIn job posting.
**Parameters:**
- Same as `scrape_job()`
**Returns:**
- `dict`: Job details with keys: `title`, `company`, `location`, `description`, `source`, `url`
## Response Format 📋
All scraping functions return a dictionary with the following structure:
```python
{
    'title': 'Senior Python Developer',
    'company': 'Tech Company Inc.',
    'location': 'San Francisco, CA',
    'description': 'Full job description text...',
    'source': 'Indeed',  # or 'LinkedIn'
    'url': 'https://...'
}
```
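If you feed these dictionaries into downstream code, a small guard that checks for the expected keys can catch partial scrapes early. A sketch (the key set comes from the API reference above; the helper name is ours):

```python
# Keys every scraping function is documented to return.
EXPECTED_KEYS = {'title', 'company', 'location', 'description', 'source', 'url'}

def is_complete(job) -> bool:
    """Return True if a scraped result has every expected field populated."""
    if not isinstance(job, dict):
        return False
    return EXPECTED_KEYS <= job.keys() and all(job[k] for k in EXPECTED_KEYS)
```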
## Examples 💡
### Scrape Multiple Jobs
```python
from job_scraper import scrape_job
urls = [
    'https://in.indeed.com/viewjob?jk=123',
    'https://www.linkedin.com/jobs/view/456',
    'https://in.indeed.com/viewjob?jk=789'
]

jobs = []
for url in urls:
    job = scrape_job(url, verbose=False)
    if job:
        jobs.append(job)
        print(f"✓ Scraped: {job['title']}")

print(f"\nTotal jobs scraped: {len(jobs)}")
```
### Save to JSON
```python
import json
from job_scraper import scrape_job
job = scrape_job('https://in.indeed.com/viewjob?jk=...')
if job:
    with open('job_data.json', 'w', encoding='utf-8') as f:
        json.dump(job, f, indent=2, ensure_ascii=False)
    print("Job saved to job_data.json")
```
### Filter by Keywords
```python
from job_scraper import scrape_job
keywords = ['python', 'django', 'flask']
job = scrape_job('https://in.indeed.com/viewjob?jk=...')
if job and any(keyword.lower() in job['description'].lower() for keyword in keywords):
    print(f"✓ Job matches keywords: {job['title']}")
else:
    print("✗ Job doesn't match keywords")
```
## Requirements 🛠️
- Python 3.7+
- selenium >= 4.0.0
- beautifulsoup4 >= 4.9.0
- webdriver-manager >= 4.0.0
- Google Chrome (ChromeDriver is downloaded automatically by webdriver-manager; the Chrome browser itself must be installed separately)
## Limitations ⚠️
- Some job sites may block automated scraping
- LinkedIn may require authentication for certain jobs
- Rate limiting may apply - avoid scraping too many jobs rapidly
- Always respect the website's Terms of Service and robots.txt
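To honor robots.txt programmatically, the standard library's `urllib.robotparser` can evaluate a site's rules before you scrape. A minimal sketch (helper names are ours; in practice you would fetch the robots.txt from the URL the first helper builds):

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def robots_txt_url(page_url: str) -> str:
    """Locate the robots.txt for the site hosting page_url."""
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

def allowed_by_rules(rules_lines, url: str, user_agent: str = '*') -> bool:
    """Evaluate already-fetched robots.txt lines against a URL."""
    parser = RobotFileParser()
    parser.parse(rules_lines)
    return parser.can_fetch(user_agent, url)
```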
## Troubleshooting 🔧
### Browser not found
The package automatically downloads ChromeDriver. Ensure you have Google Chrome installed.
### 403 Forbidden Error
Some sites may block requests. Try:
- Using `headless=False` to run in visible browser mode
- Adding delays between requests
- Using a VPN or proxy
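One simple way to add delays is a retry wrapper that sleeps between attempts. This sketch wraps any scrape function passed in as `fetch` (a hypothetical stand-in for e.g. `scrape_job`, which returns `None` on failure):

```python
import time

def scrape_with_retry(fetch, url, attempts=3, delay_seconds=5.0):
    """Call fetch(url) up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # back off before the next try
    return None
```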
### Encoding errors
The package handles most encoding issues automatically; scraped text is returned as UTF-8 strings.
## Contributing 🤝
Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Ways to Contribute
1. **Report bugs** - Open an issue on GitHub
2. **Add features** - Submit a pull request
3. **Improve documentation** - Fix typos or add examples
4. **Support more platforms** - Add scrapers for other job sites
## Publishing to PyPI 📤
To publish this package to PyPI so others can install it with `pip install job-scraper-selenium`:
### 1. Create PyPI account
- Sign up at [https://pypi.org/account/register/](https://pypi.org/account/register/)
### 2. Install publishing tools
```bash
pip install build twine
```
### 3. Build the package
```bash
python -m build
```
### 4. Upload to PyPI
```bash
# Test on TestPyPI first (optional)
twine upload --repository testpypi dist/*
# Upload to real PyPI
twine upload dist/*
```
### 5. Install your package
```bash
pip install job-scraper-selenium
```
## License 📄
MIT License - see [LICENSE](LICENSE) file for details
## Author ✍️
**Adil C**
- GitHub: [@adilc0070](https://github.com/adilc0070)
- Email: adilc0070@gmail.com
## Acknowledgments 🙏
- Selenium for browser automation
- BeautifulSoup for HTML parsing
- webdriver-manager for automatic driver management
## Support 💬
If you found this package helpful, please:
- ⭐ Star the repository
- 🐛 Report issues
- 🔀 Submit pull requests
- 📢 Share with others
---
**Disclaimer:** This package is for educational purposes. Always respect website Terms of Service and use responsibly.