job-scraper-selenium

Name: job-scraper-selenium
Version: 1.0.2
Home page: https://github.com/adilc0070/job-scraper
Summary: A Python package for scraping job postings from Indeed and LinkedIn
Upload time: 2025-11-03 12:43:08
Author: Adil C
Requires Python: >=3.7
License: not recorded
Keywords: job, scraper, indeed, linkedin, selenium, web-scraping
Requirements: none recorded
# Job Scraper 🚀

A powerful and easy-to-use Python package for scraping job postings from **Indeed** and **LinkedIn** using Selenium browser automation.

## Features ✨

- 🔍 Scrape job details from Indeed and LinkedIn
- 🤖 Automatic platform detection
- 📊 Extract title, company, location, and full job description
- 🎯 Simple and intuitive API
- 🛡️ Uses a real Chrome browser via Selenium, which helps get past basic anti-bot checks
- ⚙️ Configurable options (headless mode, verbose output)

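The automatic platform detection above can be sketched as a simple hostname check. This is an illustrative sketch, not the package's actual implementation; the function name `detect_platform` is hypothetical:

```python
from typing import Optional
from urllib.parse import urlparse

def detect_platform(url: str) -> Optional[str]:
    """Guess the job board from the URL's hostname (illustrative only)."""
    host = urlparse(url).netloc.lower()
    if "indeed." in host:
        return "Indeed"
    if "linkedin." in host:
        return "LinkedIn"
    return None  # unsupported platform
```

`scrape_job()` presumably dispatches on a check like this before delegating to the platform-specific scraper.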
## Installation 📦

### Install from source

```bash
# Clone or download the repository
git clone https://github.com/adilc0070/job-scraper.git
cd job-scraper

# Install the package
pip install .
```

### Install in development mode

```bash
pip install -e .
```

## Quick Start 🚀

### Basic Usage

```python
from job_scraper import scrape_job

# Scrape any supported job posting (auto-detects platform)
job = scrape_job('https://in.indeed.com/viewjob?jk=...')

if job:
    print(f"Title: {job['title']}")
    print(f"Company: {job['company']}")
    print(f"Location: {job['location']}")
    print(f"Description: {job['description'][:200]}...")
```

### Platform-Specific Scraping

```python
from job_scraper import scrape_indeed_job, scrape_linkedin_job

# Scrape from Indeed
indeed_job = scrape_indeed_job('https://in.indeed.com/viewjob?jk=...')

# Scrape from LinkedIn
linkedin_job = scrape_linkedin_job('https://www.linkedin.com/jobs/view/...')
```

### Advanced Options

```python
from job_scraper import scrape_job

# Run in non-headless mode (show browser window)
job = scrape_job(url, headless=False)

# Disable verbose output (silent mode)
job = scrape_job(url, verbose=False)

# Combine options
job = scrape_job(url, headless=False, verbose=False)
```

## API Reference 📚

### `scrape_job(url, headless=True, verbose=True)`

Automatically detect and scrape jobs from Indeed or LinkedIn.

**Parameters:**
- `url` (str): The job posting URL (Indeed or LinkedIn)
- `headless` (bool): Run browser in headless mode (default: True)
- `verbose` (bool): Print progress messages (default: True)

**Returns:**
- `dict`: Job details or `None` if scraping fails

**Example:**
```python
job = scrape_job('https://in.indeed.com/viewjob?jk=123456')
```

### `scrape_indeed_job(url, headless=True, verbose=True)`

Scrape a single Indeed job posting.

**Parameters:**
- Same as `scrape_job()`

**Returns:**
- `dict`: Job details with keys: `title`, `company`, `location`, `description`, `source`, `url`

### `scrape_linkedin_job(url, headless=True, verbose=True)`

Scrape a single LinkedIn job posting.

**Parameters:**
- Same as `scrape_job()`

**Returns:**
- `dict`: Job details with keys: `title`, `company`, `location`, `description`, `source`, `url`

## Response Format 📋

All scraping functions return a dictionary with the following structure:

```python
{
    'title': 'Senior Python Developer',
    'company': 'Tech Company Inc.',
    'location': 'San Francisco, CA',
    'description': 'Full job description text...',
    'source': 'Indeed',  # or 'LinkedIn'
    'url': 'https://...'
}
```
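If you process results downstream, it can help to validate that a returned value has the expected shape. A minimal helper (hypothetical, not part of the package) that checks for the documented keys:

```python
# Keys documented in the response format above
REQUIRED_KEYS = {'title', 'company', 'location', 'description', 'source', 'url'}

def is_valid_job(job) -> bool:
    """Return True if a scrape result is a dict containing all expected keys."""
    return isinstance(job, dict) and REQUIRED_KEYS.issubset(job)
```

This also handles the `None` that the scraping functions return on failure.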

## Examples 💡

### Scrape Multiple Jobs

```python
from job_scraper import scrape_job

urls = [
    'https://in.indeed.com/viewjob?jk=123',
    'https://www.linkedin.com/jobs/view/456',
    'https://in.indeed.com/viewjob?jk=789'
]

jobs = []
for url in urls:
    job = scrape_job(url, verbose=False)
    if job:
        jobs.append(job)
        print(f"✓ Scraped: {job['title']}")

print(f"\nTotal jobs scraped: {len(jobs)}")
```

### Save to JSON

```python
import json
from job_scraper import scrape_job

job = scrape_job('https://in.indeed.com/viewjob?jk=...')

if job:
    with open('job_data.json', 'w', encoding='utf-8') as f:
        json.dump(job, f, indent=2, ensure_ascii=False)
    print("Job saved to job_data.json")
```

### Filter by Keywords

```python
from job_scraper import scrape_job

keywords = ['python', 'django', 'flask']

job = scrape_job('https://in.indeed.com/viewjob?jk=...')

if job and any(keyword.lower() in job['description'].lower() for keyword in keywords):
    print(f"✓ Job matches keywords: {job['title']}")
else:
    print("✗ Job doesn't match keywords")
```

## Requirements 🛠️

- Python 3.7+
- selenium >= 4.0.0
- beautifulsoup4 >= 4.9.0
- webdriver-manager >= 4.0.0
- Google Chrome (ChromeDriver is downloaded automatically by webdriver-manager, but Chrome itself must be installed)

## Limitations ⚠️

- Some job sites may block automated scraping
- LinkedIn may require authentication for certain jobs
- Rate limiting may apply; avoid scraping many jobs in rapid succession
- Always respect the website's Terms of Service and robots.txt
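To avoid rapid-fire requests, you can pause for a randomized interval between scrapes. A small hypothetical helper (`polite_pause` is not part of the package):

```python
import random
import time

def polite_pause(min_s: float = 2.0, max_s: float = 5.0) -> float:
    """Sleep for a random interval between requests; return the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

Call it inside a loop, e.g. `for url in urls: job = scrape_job(url); polite_pause()`.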

## Troubleshooting 🔧

### Browser not found
The package automatically downloads ChromeDriver. Ensure you have Google Chrome installed.

### 403 Forbidden Error
Some sites may block requests. Try:
- Using `headless=False` to run in visible browser mode
- Adding delays between requests
- Using a VPN or proxy
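Delays can also be combined with simple retries. A generic exponential-backoff wrapper (hypothetical, not part of the package) that you could apply to any scraping call:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(); on exception, wait with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage: `job = with_retries(lambda: scrape_job(url), attempts=3)`.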

### Encoding errors
The package handles most encoding issues automatically; scraped data is returned as UTF-8 text.

## Contributing 🤝

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Ways to Contribute

1. **Report bugs** - Open an issue on GitHub
2. **Add features** - Submit a pull request
3. **Improve documentation** - Fix typos or add examples
4. **Support more platforms** - Add scrapers for other job sites

## Publishing to PyPI 📤

To publish this package to PyPI so others can install it with `pip install job-scraper-selenium`:

### 1. Create PyPI account
- Sign up at [https://pypi.org/account/register/](https://pypi.org/account/register/)

### 2. Install publishing tools
```bash
pip install build twine
```

### 3. Build the package
```bash
python -m build
```

### 4. Upload to PyPI
```bash
# Test on TestPyPI first (optional)
twine upload --repository testpypi dist/*

# Upload to real PyPI
twine upload dist/*
```

### 5. Install your package
```bash
pip install job-scraper-selenium
```

## License 📄

MIT License - see [LICENSE](LICENSE) file for details

## Author ✍️

**Adil C**
- GitHub: [@adilc0070](https://github.com/adilc0070)
- Email: adilc0070@gmail.com

## Acknowledgments 🙏

- Selenium for browser automation
- BeautifulSoup for HTML parsing
- webdriver-manager for automatic driver management

## Support 💬

If you found this package helpful, please:
- ⭐ Star the repository
- 🐛 Report issues
- 🔀 Submit pull requests
- 📢 Share with others

---

**Disclaimer:** This package is for educational purposes. Always respect website Terms of Service and use responsibly.


            
