# html2info

- Name: html2info
- Version: 0.2.0
- Summary: A package to parse raw HTML and return structured information.
- Upload time: 2023-05-08 02:27:49
- Author: Vladimir Iglovikov
- Requires Python: >=3.6
- Requirements: none recorded

`html2info` is a Python package that parses LinkedIn and Kaggle profile pages from raw HTML and returns structured information in a JSON-compatible format.

## Features

- Extracts profile information such as name, title, location, profile photo, about, experience, and education.
- Returns a JSON object containing the parsed data.

## Installation

Install `html2info` using pip:

```bash
pip install html2info
```

## Usage

The examples below show how to use `html2info` for each supported site, followed by sample output:

### LinkedIn

```python
from html2info.linkedin import Person

url = "https://www.linkedin.com/in/iglovikov"
raw_data = "..."  # Raw HTML content of the LinkedIn page

person = Person(url, raw_data)
person.parse()
print(person.to_dict())
```

```json
{
    "linkedin_url": "https://www.linkedin.com/in/iglovikov",
    "name": "Vladimir Iglovikov",
    "title": "Kaggle Grandmaster. Co-creator of Albumentations.AI",
    "location": "San Francisco, California, United States",
    "profile_photo_link": "https://media.licdn.com/dms/image/C4D03AQFDvheHDkAQlw/profile-displayphoto-shrink_400_400/0/1654539436934?e=1687392000&v=beta&t=OX7WrIprduo-xWEvrRKNzYdqcqG6bdzDtlm6LWuHbIE",
    "about": "• Advisor and Angel investor.\n• Co-creator, Albumentations.AI: Open-source library with 30k daily downloads, adopted by top Computer Vision companies & Kaggle competition winners\n• Former Staff ML Engineer, Lyft Level5 (Autonomous Vehicles): Led Deep Learning model development & integration for Self-Driving & Ride Sharing\n• Kaggle Grandmaster: Multiple ML competition wins\n• Author: 20+ publications in Deep Learning for Medical, Satellite, Street View, and Natural Images",
    "experience": [
      {
        "title": "Chief Executive Officer",
        "company": "Ternaus Inc · Full-time",
        "image_link": null,
        "company_link": "https://www.linkedin.com/search/results/all/?keywords=Ternaus+Inc",
        "dates": "Aug 2022 - Present · 9 mos",
        "description": null
      },
      {
        "title": "Evangelist",
        "company": "OpenDataScience",
        "image_link": "https://media.licdn.com/dms/image/C510BAQFU1fTt5tE6Ug/company-logo_100_100/0/1554042536921?e=1689811200&v=beta&t=-sIbC_T8hZjxf5TNgO_H0ClRcYb7Y_oow6dAdW8xMHg",
        "company_link": "https://www.linkedin.com/company/11241268/",
        "dates": "Aug 2016 - Mar 2022 · 5 yrs 8 mos",
        "description": "OpenDataScience, or ODS, is a Russian-speaking community of over 50,000 data scientists, researchers, and engineers. ODS freely disseminates knowledge, and promotes professional development and exchange of ideas and opportunities in all areas of Data Science through live events, online classes and discussions, and other resources. Join us at http://ods.ai."
      },
      {
        "title": "Staff ML Engineer",
        "company": "Lyft · Full-time",
        "image_link": "https://media.licdn.com/dms/image/C560BAQFoMDej0VdZVA/company-logo_100_100/0/1545416046198?e=1689811200&v=beta&t=JV79uOIdgcbYcAeg0YAklLLZ6c5VkldGSG-Zu3G42xI",
        "company_link": "https://www.linkedin.com/company/2620735/",
        "dates": "Oct 2017 - Aug 2021 · 3 yrs 11 mos",
        "description": null
      },
      {
        "title": "Advisor",
        "company": "Iterative.ai · Part-time",
        "image_link": "https://media.licdn.com/dms/image/C4E0BAQGnnEVzx81kBg/company-logo_100_100/0/1653056165184?e=1689811200&v=beta&t=dNl2Q2CDgmX2r3KiymYIqjPtXJQXIYeTzgdNduZLLTs",
        "company_link": "https://www.linkedin.com/company/18657719/",
        "dates": "Nov 2018 - Nov 2020 · 2 yrs 1 mo",
        "description": null
      },
      {
        "title": "Senior Data Scientist (Machine Learning)",
        "company": "TrueAccord",
        "image_link": "https://media.licdn.com/dms/image/C560BAQEo_A523IxkGQ/company-logo_100_100/0/1656418732741?e=1689811200&v=beta&t=YRjhRCxnfijmSz40qvRCeKxkfoMHYGU1oiPGIJht-aw",
        "company_link": "https://www.linkedin.com/company/3249455/",
        "dates": "Jun 2016 - Sep 2017 · 1 yr 4 mos",
        "description": "Developed a supervised machine learning algorithm that predicts what personalized emails should be sent to each user to drive him to the target website. ROC AUC score 0.88. Prototyped, implemented, deployed and tested machine learning algorithm that helped to prioritize outbound phone traffic, improving conversion through phone calls by 80%."
      }
    ],
    "education_list": [
      {
        "university_name": "University of California, Davis",
        "degree_and_major": "Doctor of Philosophy (Ph.D.), Physics",
        "dates": "2010 - 2015",
        "university_link": "https://www.linkedin.com/company/2842/",
        "image_link": "https://media.licdn.com/dms/image/C4E0BAQEBG25KNBwuCQ/company-logo_100_100/0/1616103040374?e=1689811200&v=beta&t=sUF5ars4S8ek3vZs01usUvGwSJsU01KYtANnMkkZFdQ"
      },
      {
        "university_name": "Saint Petersburg State University",
        "degree_and_major": "Master's degree, Physics",
        "dates": "2001 - 2010",
        "university_link": "https://www.linkedin.com/company/15099991/",
        "image_link": "https://media.licdn.com/dms/image/C560BAQHWUjwogE235A/company-logo_100_100/0/1519863922741?e=1689811200&v=beta&t=DSpsTKY_AcMrmzWY1592EvCClph4M_TVOLdNSDpOg2I"
      }
    ]
}
```
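
The `company` and `dates` strings in the output above each pack two values separated by ` · ` (for example `"Lyft · Full-time"` and `"Oct 2017 - Aug 2021 · 3 yrs 11 mos"`). Below is a minimal sketch of splitting them after parsing, assuming the separator is used consistently. `split_pair` and `normalize_experience` are hypothetical helpers for illustration and are not part of `html2info`:

```python
from typing import Dict, List, Optional, Tuple

SEPARATOR = " · "  # assumption: the middle dot seen in the sample output


def split_pair(value: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split "left · right" into (left, right); right is None when absent."""
    if value is None:
        return None, None
    left, _, right = value.partition(SEPARATOR)
    return left.strip(), (right.strip() or None)


def normalize_experience(entries: List[Dict]) -> List[Dict]:
    """Return copies of the experience entries with the packed fields split."""
    result = []
    for entry in entries:
        company, employment_type = split_pair(entry.get("company"))
        date_range, duration = split_pair(entry.get("dates"))
        result.append(
            {
                **entry,  # keep the original keys, including the raw "dates"
                "company": company,
                "employment_type": employment_type,
                "date_range": date_range,
                "duration": duration,
            }
        )
    return result


# Two entries copied from the sample output above (other keys omitted).
sample = [
    {
        "title": "Staff ML Engineer",
        "company": "Lyft · Full-time",
        "dates": "Oct 2017 - Aug 2021 · 3 yrs 11 mos",
    },
    {
        "title": "Evangelist",
        "company": "OpenDataScience",
        "dates": "Aug 2016 - Mar 2022 · 5 yrs 8 mos",
    },
]

for item in normalize_experience(sample):
    print(item["company"], item["employment_type"], item["date_range"], item["duration"])
```

Entries without an employment type, such as `"OpenDataScience"`, get `None` for `employment_type` rather than an empty string.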

### Kaggle

```python
from html2info.kaggle import Person

url = "https://www.kaggle.com/iglovikov"
raw_data = "..."  # Raw HTML content of the Kaggle page

person = Person(url, raw_data)
person.parse()
print(person.to_dict())
```

```json
{
    "url": "https://www.kaggle.com/iglovikov",
    "name": "Vladimir Iglovikov",
    "title": "CEO  at ternaus.com",
    "location": "San Francisco, California, United States",
    "profile_photo_link": "https://storage.googleapis.com/kaggle-avatars/images/286455-fb.jpg",
    "social_network_links": [
      "https://github.com/ternaus",
      "https://twitter.com/viglovikov",
      "https://www.linkedin.com/in/iglovikov",
      "https://salesbrain.tech/"
    ],
    "personal_website_link": "https://salesbrain.tech/",
    "num_followers": 1534,
    "competitions_summary": {
      "tier": "grandmaster",
      "tier_image": "/static/images/tiers/grandmaster@48.png",
      "medals": {
        "gold": 5,
        "silver": 9,
        "bronze": 8
      },
      "highest_rank": 19
    },
    "datasets_summary": {
      "tier": "contributor",
      "tier_image": "/static/images/tiers/contributor@48.png",
      "medals": {
        "gold": 0,
        "silver": 0,
        "bronze": 0
      },
      "highest_rank": -1
    },
    "notebooks_summary": {
      "tier": "contributor",
      "tier_image": "/static/images/tiers/contributor@48.png",
      "medals": {
        "gold": 1,
        "silver": 1,
        "bronze": 1
      },
      "highest_rank": -1
    },
    "discussion_summary": {
      "tier": "master",
      "tier_image": "/static/images/tiers/master@48.png",
      "medals": {
        "gold": 52,
        "silver": 26,
        "bronze": 177
      },
      "highest_rank": 6
    },
    "bio": "* CEO at Ternaus Inc\n* Staff Computer Vision Engineer at Level5 Engineering Center, Lyft Inc (2017-2021)\n* Senior Data Scientist at TrueAccord (2016-2017)\n* Data Scientist at Bidgely (2015-2016)\n* PhD in theoretical Condensed Matter Physics at University of California, Davis (2010-2015)\n* MS in theoretical High Energy Physics at Saint Petersburg State University (2001-2010)\n* Спецназ ВДВ . Медаль за воинскую доблесть за вторую Чеченскую. (2002-2004)\n"
}
```
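
The four `*_summary` blocks in the Kaggle output share the same shape (`tier`, `medals`, `highest_rank`), so they can be aggregated after parsing. Below is a minimal sketch, assuming that a `highest_rank` of `-1` means no rank is available (this is inferred from the sample, not documented). `total_medals` and `best_rank` are hypothetical helpers and are not part of `html2info`:

```python
from typing import Dict, Optional

SUMMARY_KEYS = (
    "competitions_summary",
    "datasets_summary",
    "notebooks_summary",
    "discussion_summary",
)


def total_medals(profile: Dict) -> Dict[str, int]:
    """Sum gold, silver, and bronze medals across all summary blocks."""
    totals = {"gold": 0, "silver": 0, "bronze": 0}
    for key in SUMMARY_KEYS:
        medals = profile.get(key, {}).get("medals", {})
        for color in totals:
            totals[color] += medals.get(color, 0)
    return totals


def best_rank(profile: Dict) -> Optional[int]:
    """Return the smallest positive highest_rank, or None if there is none."""
    ranks = [profile.get(key, {}).get("highest_rank", -1) for key in SUMMARY_KEYS]
    valid = [rank for rank in ranks if rank > 0]  # assumption: -1 means "no rank"
    return min(valid) if valid else None


# Medal counts and ranks copied from the sample output above.
profile = {
    "competitions_summary": {
        "medals": {"gold": 5, "silver": 9, "bronze": 8},
        "highest_rank": 19,
    },
    "datasets_summary": {
        "medals": {"gold": 0, "silver": 0, "bronze": 0},
        "highest_rank": -1,
    },
    "notebooks_summary": {
        "medals": {"gold": 1, "silver": 1, "bronze": 1},
        "highest_rank": -1,
    },
    "discussion_summary": {
        "medals": {"gold": 52, "silver": 26, "bronze": 177},
        "highest_rank": 6,
    },
}

print(total_medals(profile))
print(best_rank(profile))
```

Both helpers use `dict.get` with defaults, so a profile that lacks one of the summary blocks is still handled.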

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "html2info",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Vladimir Iglovikov",
    "author_email": "iglovikov@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c4/18/48a94e598852fd1c027913a01919a0f7147828b2fa705d177acb4261dc3d/html2info-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A package to parse raw HTML and return structured information.",
    "version": "0.2.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4a3b3a85f6bb142a04ee9d0f9c7aa86e1e0c401900b5d683cd8b4080447f579a",
                "md5": "e459a26d668173826a67667973d40c61",
                "sha256": "13797a34bafb2d761db130bc086604f3842da7b7ab93ea4aab6d1832a34723dd"
            },
            "downloads": -1,
            "filename": "html2info-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e459a26d668173826a67667973d40c61",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 9106,
            "upload_time": "2023-05-08T02:27:47",
            "upload_time_iso_8601": "2023-05-08T02:27:47.293062Z",
            "url": "https://files.pythonhosted.org/packages/4a/3b/3a85f6bb142a04ee9d0f9c7aa86e1e0c401900b5d683cd8b4080447f579a/html2info-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c41848a94e598852fd1c027913a01919a0f7147828b2fa705d177acb4261dc3d",
                "md5": "a344b6ea913dd80da63f56da07522c0c",
                "sha256": "49c8200eabde604577592ef3528629bcb3f368d48de8bfe0a4dddaf72730e9f4"
            },
            "downloads": -1,
            "filename": "html2info-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a344b6ea913dd80da63f56da07522c0c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 11361,
            "upload_time": "2023-05-08T02:27:49",
            "upload_time_iso_8601": "2023-05-08T02:27:49.699923Z",
            "url": "https://files.pythonhosted.org/packages/c4/18/48a94e598852fd1c027913a01919a0f7147828b2fa705d177acb4261dc3d/html2info-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-08 02:27:49",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "html2info"
}