linkedin-scraper

Name: linkedin-scraper
Version: 2.11.2
Home page: https://github.com/joeyism/linkedin_scraper
Summary: Scrapes user data from Linkedin
Author: Joey Sham
Keywords: linkedin, scraping, scraper
Upload time: 2023-07-04 20:47:23

# Linkedin Scraper

Scrapes Linkedin User Data

[Linkedin Scraper](#linkedin-scraper)
* [Installation](#installation)
* [Setup](#setup)
* [Usage](#usage)
  + [Sample Usage](#sample-usage)
  + [User Scraping](#user-scraping)
  + [Company Scraping](#company-scraping)
  + [Job Scraping](#job-scraping)
  + [Job Search Scraping](#job-search-scraping)
  + [Scraping sites where login is required first](#scraping-sites-where-login-is-required-first)
  + [Scraping sites and login automatically](#scraping-sites-and-login-automatically)
* [API](#api)
  + [Person](#person)
    - [`linkedin_url`](#linkedin_url)
    - [`name`](#name)
    - [`about`](#about)
    - [`experiences`](#experiences)
    - [`educations`](#educations)
    - [`interests`](#interests)
    - [`accomplishment`](#accomplishment)
    - [`company`](#company)
    - [`job_title`](#job_title)
    - [`driver`](#driver)
    - [`scrape`](#scrape)
    - [`scrape(close_on_complete=True)`](#scrapeclose_on_completetrue)
  + [Company](#company)
    - [`linkedin_url`](#linkedin_url-1)
    - [`name`](#name-1)
    - [`about_us`](#about_us)
    - [`website`](#website)
    - [`headquarters`](#headquarters)
    - [`founded`](#founded)
    - [`company_type`](#company_type)
    - [`company_size`](#company_size)
    - [`specialties`](#specialties)
    - [`showcase_pages`](#showcase_pages)
    - [`affiliated_companies`](#affiliated_companies)
    - [`driver`](#driver-1)
    - [`get_employees`](#get_employees)
    - [`scrape(close_on_complete=True)`](#scrapeclose_on_completetrue-1)
* [Contribution](#contribution)

## Installation

```bash
pip3 install --user linkedin_scraper
```

Versions **2.0.0** and earlier were called `linkedin_user_scraper` and can be installed via `pip3 install --user linkedin_user_scraper`

## Setup
First, set the location of your chromedriver binary:

```bash
export CHROMEDRIVER=~/chromedriver
```
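If you construct the driver yourself, you can resolve the same variable in your own script. A minimal sketch (the helper name is hypothetical; it assumes only that `CHROMEDRIVER` holds a path that may begin with `~`):

```python
import os

def resolve_chromedriver() -> str:
    """Read the CHROMEDRIVER env var and expand a leading '~'."""
    path = os.environ.get("CHROMEDRIVER", "")
    if not path:
        raise RuntimeError("CHROMEDRIVER is not set; see the Setup section")
    return os.path.expanduser(path)
```

The resolved path can then be passed to Selenium's `Service` when creating the Chrome driver yourself.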

## Sponsor
[![rds-cost](https://raw.githubusercontent.com/joeyism/linkedin_scraper/master/docs/proxycurl.png)](https://nubela.co/proxycurl/?utm_campaign=influencer%20marketing&utm_source=github&utm_medium=social&utm_term=-&utm_content=joeyism)

Scrape public LinkedIn profile data at scale with [Proxycurl APIs](https://nubela.co/proxycurl/?utm_campaign=influencer%20marketing&utm_source=github&utm_medium=social&utm_term=-&utm_content=joeyism).

• Scraping public profiles was battle-tested in court in the HiQ v. LinkedIn case.<br/>
• GDPR, CCPA, SOC2 compliant<br/>
• High rate limit - 300 requests/minute<br/>
• Fast - APIs respond in ~2s<br/>
• Fresh data - 88% of data is scraped in real time; the remaining 12% is no older than 29 days<br/>
• High accuracy<br/>
• Tons of data points returned per profile

Built for developers, by developers.

## Usage
To use it, just instantiate the relevant class.

### Sample Usage
```python
from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122", driver=driver)
```

**NOTE**: The account used to log in should have its language set to English to make sure everything works as expected.

### User Scraping
```python
from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")
```

### Company Scraping
```python
from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")
```

### Job Scraping
```python
from linkedin_scraper import Job, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
input("Press Enter")
job = Job("https://www.linkedin.com/jobs/collections/recommended/?currentJobId=3456898261", driver=driver, close_on_complete=False)
```

### Job Search Scraping
```python
from linkedin_scraper import JobSearch, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
input("Press Enter")
job_search = JobSearch(driver=driver, close_on_complete=False, scrape=False)
# job_search contains jobs from your logged in front page:
# - job_search.recommended_jobs
# - job_search.still_hiring
# - job_search.more_jobs

job_listings = job_search.search("Machine Learning Engineer") # returns the list of `Job` from the first page
```
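The returned `Job` objects can be post-filtered in plain Python. The sketch below uses a stand-in dataclass rather than the real `Job` class, and the `job_title` attribute name is an assumption; check the library's `Job` class for its actual field names:

```python
from dataclasses import dataclass

@dataclass
class FakeJob:
    """Stand-in for linkedin_scraper's Job; real attribute names may differ."""
    job_title: str
    company: str

def filter_by_keyword(jobs, keyword):
    """Keep jobs whose title contains the keyword (case-insensitive)."""
    return [j for j in jobs if keyword.lower() in j.job_title.lower()]

listings = [FakeJob("Senior Machine Learning Engineer", "Acme"),
            FakeJob("Data Analyst", "Acme")]
print([j.job_title for j in filter_by_keyword(listings, "machine learning")])
```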

### Scraping sites where login is required first
1. Run `ipython` or `python`
2. In `ipython`/`python`, run the following code (you can modify it if you need to specify your driver):
```python
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver, scrape=False)
```
3. Log in to Linkedin
4. [OPTIONAL] Log out of Linkedin
5. In the same `ipython`/`python` session, run
```python
person.scrape()
```

The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. Setting `scrape=False` prevents automatic scraping, but Chrome still opens the Linkedin page. You can log in and log out; the cookie stays in the browser and it won't affect your profile views. When you then run `person.scrape()`, it scrapes and closes the browser. If you want to keep the browser open so you can scrape other profiles, run

```python
person.scrape(close_on_complete=False)
```

so it doesn't close.

**NOTE**: For version >= `2.1.0`, scraping can also occur while logged in. Beware that users will be able to see that you viewed their profile.

### Scraping sites and login automatically
From version **2.4.0** onward, `actions` is part of the library and allows signing into Linkedin first. The email and password can be passed to the function as arguments; if not provided, both will be prompted for in the terminal.

```python
from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
```


## API

### Person
A Person object can be created with the following inputs:

```python
Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)
```
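Several of these parameters default to mutable lists (`[]`). Independently of this library, that is a general Python pitfall worth knowing: a default list is created once and shared across every call that relies on it. A minimal, library-independent illustration:

```python
def append_to(item, bucket=[]):
    # Mutable default: the same list object is reused across calls.
    bucket.append(item)
    return bucket
# append_to(1) -> [1]; a later append_to(2) -> [1, 2] (same list both times)

def append_to_safe(item, bucket=None):
    # Idiomatic alternative: create a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

If you construct many `Person` objects in one process, passing fresh lists explicitly sidesteps any surprises from shared defaults.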
#### `linkedin_url`
The Linkedin URL of the person's profile

#### `name`
The name of the person

#### `about`
The short "about" paragraph on the person's profile

#### `experiences`
The person's past experiences. A list of `linkedin_scraper.scraper.Experience`

#### `educations`
The person's past educations. A list of `linkedin_scraper.scraper.Education`

#### `interests`
The person's interests. A list of `linkedin_scraper.scraper.Interest`

#### `accomplishment`
The person's accomplishments. A list of `linkedin_scraper.scraper.Accomplishment`

#### `company`
The most recent company or institution the person has worked at.

#### `job_title`
The person's most recent job title.

#### `driver`
This is the driver used to scrape the Linkedin profile. A Chrome driver is created by default; if a driver is passed in, that one is used instead.

For example
```python
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)
```

#### `scrape`
When this is **True**, scraping happens automatically on construction. To scrape later instead, call the `scrape()` method on the `Person` object.


#### `scrape(close_on_complete=True)`
This is the meat of the code: calling this function scrapes the profile. If *close_on_complete* is True (the default), the browser closes upon completion. If you want to scrape other profiles afterwards, set it to False so you can keep using the same driver.

 


### Company

```python
Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)
```

#### `linkedin_url`
The Linkedin URL of the company's profile

#### `name`
This is the name of the company

#### `about_us`
The description of the company

#### `website`
The website of the company

#### `headquarters`
The headquarters location of the company

#### `founded`
When the company was founded

#### `company_type`
The type of the company

#### `company_size`
How many people are employed at the company

#### `specialties`
What the company specializes in

#### `showcase_pages`
Pages that the company owns to showcase their products

#### `affiliated_companies`
Other companies that are affiliated with this one

#### `driver`
This is the driver used to scrape the Linkedin profile. A Chrome driver is created by default; if a driver is passed in, that one is used instead.

#### `get_employees`
Whether to fetch all the employees of the company

For example
```python
driver = webdriver.Chrome()
company = Company("https://ca.linkedin.com/company/google", driver=driver)
```


#### `scrape(close_on_complete=True)`
This is the meat of the code: calling this function scrapes the company. If *close_on_complete* is True (the default), the browser closes upon completion. If you want to scrape other companies afterwards, set it to False so you can keep using the same driver.

## Contribution

<a href="https://www.buymeacoffee.com/joeyism" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>



            
