a-pandas-ex-bs4df


- Name: a-pandas-ex-bs4df
- Version: 0.10
- Home page: https://github.com/hansalemaos/a_pandas_ex_bs4df
- Summary: One line web scraping by combining pandas and BeautifulSoup4
- Upload time: 2022-10-29 21:42:54
- Author: Johannes Fischer
- License: MIT
- Keywords: beautifulsoup4, bs4, pandas, web scraping
- Requirements: none recorded

### One-line web scraping by combining pandas and BeautifulSoup4



##### Check out the video



<div align="left">
  <a href="https://www.youtube.com/watch?v=pvnODvnMyrg">
    <img src="https://img.youtube.com/vi/pvnODvnMyrg/0.jpg" style="width:100%;">
  </a>
</div>



##### Code from the video



```shell
pip install a-pandas-ex-bs4df
```



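```python
import pandas as pd
from a_pandas_ex_bs4df import pd_add_bs4_to_df

# Make pd.Q_bs4_to_df available on the pandas namespace
pd_add_bs4_to_df()

# Optional: colored DataFrame output in the console
from PrettyColorPrinter import add_printer
add_printer(True)

# Download the page (with requests) and turn the bs4 results into a DataFrame
df = pd.Q_bs4_to_df(r"https://github.com/search?l=Python&q=python&type=Repositories")

# Elements that have an href and whose attribute values contain "middle"
middle_links = df.loc[
    df.bb_href.notna()
    & df.aa_attrs_values.str.contains("middle", regex=False, na=False)
]

# The ff_ columns appear to hold bound bs4 methods; calling fetchParents()
# returns the parent tags of each matching element
middle_parents = middle_links.ff_fetchParents.apply(lambda fetch: fetch())

# Elements with a src that does not end in .png
non_png_sources = df.loc[
    df.bb_src.notna() & ~df.bb_src.str.contains(r"\.png$", regex=True, na=False)
]

# Elements with a src that ends in .png
png_sources = df.loc[
    df.bb_src.notna() & df.bb_src.str.contains(r"\.png$", regex=True, na=False)
]
```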
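The filters above select rows by attribute columns such as `bb_href`, `bb_src` and `aa_attrs_values`. To show the same row-per-tag idea without installing anything, here is a minimal stdlib-only sketch. `TagRowCollector`, `collect_rows` and `attr_of` are hypothetical names, the rows are plain dicts rather than a DataFrame, and this is not how a-pandas-ex-bs4df is implemented (it builds on bs4 and pandas).

```python
from html.parser import HTMLParser


class TagRowCollector(HTMLParser):
    """Collect one dict per start tag: {"tag": name, attr: value, ...}.

    Hypothetical stdlib-only illustration of a row-per-tag table; it is not
    how a-pandas-ex-bs4df works internally (that package uses bs4 and pandas).
    """

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        row = {"tag": tag}
        row.update(dict(attrs))
        self.rows.append(row)


def collect_rows(html):
    """Parse an HTML string and return the collected rows."""
    collector = TagRowCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


def attr_of(row, name):
    """Return a string attribute value, or None (like a NaN cell) if absent."""
    value = row.get(name)
    return value if isinstance(value, str) else None


html = """
<a href="/one" class="middle">One</a>
<img src="logo.png">
<img src="photo.jpg">
"""
rows = collect_rows(html)

# Like the first filter: has an href and some attribute value contains "middle"
middle_links = [
    r for r in rows
    if attr_of(r, "href") is not None
    and any("middle" in (v or "") for k, v in r.items() if k != "tag")
]

# Like the bb_src filters: has a src, split by whether it ends in ".png"
png_sources = [r for r in rows if (attr_of(r, "src") or "").endswith(".png")]
non_png_sources = [
    r for r in rows
    if attr_of(r, "src") is not None and not r["src"].endswith(".png")
]

print([r["href"] for r in middle_links])     # ['/one']
print([r["src"] for r in png_sources])       # ['logo.png']
print([r["src"] for r in non_png_sources])   # ['photo.jpg']
```

The explicit `is not None` checks play the role of `notna()`: without them, a row with no `src` would pass the "not .png" test.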



```python
Parameters:
    htmlcode: Union[str, bytes]
        File path, URL, or HTML source code.
        URLs are downloaded with requests.
    dontuse: tuple
        bs4 attributes to exclude from the DataFrame.
        default = (
            "element_classes",
            "builder",
            "is_xml",
            "known_xml",
            "_namespaces",
            "parse_only",
            "markup",
            "contains_replacement_characters",
            "original_encoding",
            "declared_html_encoding",
            "parser_class",
            "namespace",
            "prefix",
            "cdata_list_attributes",
            "preserve_whitespace_tag_stack",
            "open_tag_counter",
            "preserve_whitespace_tags",
            "interesting_string_types",
            "current_data",
            "string_container_stack",
            "_most_recent_element",
            "currentTag",
        )
    parser: str
        Name of the bs4 parser; see the bs4 documentation.
        (default="lxml")
    tags_to_find: Union[bool, str]
        Passed to soup.find_all(); see the bs4 documentation.
        (default=True)  # match every tag
Returns:
    df: pd.DataFrame
```
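`htmlcode` accepts a file path, a URL, or HTML source. A minimal sketch of how such an argument could be dispatched is shown below. `load_html` is a hypothetical helper, not part of the package; it uses `urllib.request` instead of requests so it needs only the standard library, and it assumes that `bytes` input is raw HTML encoded as UTF-8.

```python
import os
import urllib.request
from typing import Union


def load_html(htmlcode: Union[str, bytes], timeout: float = 10.0) -> str:
    """Hypothetical helper: return HTML text from a path, URL, or raw source.

    Follows the documented htmlcode contract (file path, URL, or HTML source),
    but swaps requests for urllib so only the standard library is needed.
    """
    if isinstance(htmlcode, bytes):
        # Assumption: bytes are raw HTML source encoded as UTF-8
        return htmlcode.decode("utf-8", errors="replace")
    if htmlcode.startswith(("http://", "https://")):
        # Download the page and decode it with the charset the server reports
        with urllib.request.urlopen(htmlcode, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    if os.path.isfile(htmlcode):
        # An existing file on disk is read as text
        with open(htmlcode, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    # Anything else is assumed to already be HTML source code
    return htmlcode


print(load_html("<p>hi</p>"))
print(load_html(b"<p>hi</p>"))
```

The URL check runs before the file check, and anything that is neither a URL nor an existing file falls through as HTML source, so a mistyped path would be parsed as (very short) HTML rather than raising an error.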


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hansalemaos/a_pandas_ex_bs4df",
    "name": "a-pandas-ex-bs4df",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "BeautifulSoup4,bs4,pandas,web scraping",
    "author": "Johannes Fischer",
    "author_email": "<aulasparticularesdealemaosp@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/ce/04/72c94c4c717af32875e28964dce7b9ea04824eb56a11518f6e2f24f7ed6c/a_pandas_ex_bs4df-0.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "One line web scraping by combining pandas and BeautifulSoup4",
    "version": "0.10",
    "split_keywords": [
        "beautifulsoup4",
        "bs4",
        "pandas",
        "web scraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0321d85dcef2301023e46cf66aeed325a0fce1492c89d90d84295561840ee67d",
                "md5": "eb457682b329a9b7d96ab8ce71a4e177",
                "sha256": "58383acd844ccdac85b7f22a2e865bc077e944bcfc02f615d23a563168ccdebf"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_bs4df-0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "eb457682b329a9b7d96ab8ce71a4e177",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7742,
            "upload_time": "2022-10-29T21:42:52",
            "upload_time_iso_8601": "2022-10-29T21:42:52.969587Z",
            "url": "https://files.pythonhosted.org/packages/03/21/d85dcef2301023e46cf66aeed325a0fce1492c89d90d84295561840ee67d/a_pandas_ex_bs4df-0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ce0472c94c4c717af32875e28964dce7b9ea04824eb56a11518f6e2f24f7ed6c",
                "md5": "c90939dab6c03bc332d2f8b019acafe0",
                "sha256": "2b22ace100590415338716a259c3adcbb939042ec08998b5abb975ecdf73a845"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_bs4df-0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "c90939dab6c03bc332d2f8b019acafe0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5352,
            "upload_time": "2022-10-29T21:42:54",
            "upload_time_iso_8601": "2022-10-29T21:42:54.881414Z",
            "url": "https://files.pythonhosted.org/packages/ce/04/72c94c4c717af32875e28964dce7b9ea04824eb56a11518f6e2f24f7ed6c/a_pandas_ex_bs4df-0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-10-29 21:42:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "hansalemaos",
    "github_project": "a_pandas_ex_bs4df",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "a-pandas-ex-bs4df"
}
        
Elapsed time: 0.08118s