spider4


Name: spider4
Version: 4.0.1
Home page: None
Summary: Screen-scraping library
Upload time: 2024-06-12 14:27:25
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.6.0
License: MIT License
Keywords: html, xml, parse, soup
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

Spider is a library that makes it easy to scrape information
from web pages. It sits atop an HTML or XML parser, providing Pythonic
idioms for iterating, searching, and modifying the parse tree.
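
To make "sits atop an HTML or XML parser" concrete, here is a minimal sketch of the general idea using only the standard library's `html.parser`. It is not Spider's implementation: the `Node`, `TreeBuilder`, and `parse` names are hypothetical, invented for this illustration. The sketch turns parser events into a small tree and then searches it.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node: a tag name plus ordered children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []  # child Node objects and str text fragments

    def find(self, name):
        """Depth-first search for the first descendant tag called `name`."""
        for child in self.children:
            if isinstance(child, Node):
                if child.name == name:
                    return child
                found = child.find(name)
                if found is not None:
                    return found
        return None

    def text(self):
        """Concatenate all text beneath this node."""
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )


class TreeBuilder(HTMLParser):
    """Hypothetical helper that feeds parser events into a Node tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open tag with this name, if there is one.
        node = self.current
        while node is not None and node.name != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(markup):
    """Build a tree from markup, tolerating unclosed tags."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any buffered trailing text
    return builder.root


tree = parse("<p>Some<b>bad<i>HTML")
print(tree.find("i").text())  # HTML
print(tree.find("p").text())  # SomebadHTML
```

A real library layers far more on top of this (attribute handling, tree modification, pretty-printing, pluggable parsers), but the division of labour is the same: the parser emits events, and the library exposes the resulting tree through friendlier search methods.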

# Quick start

```
>>> from spider import Spider
>>> sp = Spider("<p>Some<b>bad<i>HTML")
>>> print(sp.prettify())
<html>
    <body>
        <p>
            Some
            <b>
                bad
                <i>
                    HTML
                </i>
            </b>
        </p>
    </body>
</html>
>>> sp.find(text="bad")
'bad'
>>> sp.i
<i>HTML</i>
#
>>> sp = Spider("<tag1>Some<tag2/>bad<tag3>XML", "xml")
#
>>> print(sp.prettify())
<?xml version="1.0" encoding="utf-8"?>
<tag1>
    Some
    <tag2/>
    bad
    <tag3>
        XML
    </tag3>
</tag1>
```


# Running the unit tests

Spider's unit tests are discovered and run with pytest:

```
$ pytest
```


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "spider4",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6.0",
    "maintainer_email": null,
    "keywords": "HTML, XML, parse, soup",
    "author": null,
    "author_email": "Shariful Alam <2ashariful@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/70/6b/3e13a72e4cb1e8aa9c478e4878200622720f3d4aa830f8b4d1b7bbcbdafe/spider4-4.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Screen-scraping library",
    "version": "4.0.1",
    "project_urls": {
        "Download": "https://github.com/shari-ful/spider.git",
        "Homepage": "https://github.com/shari-ful/spider.git"
    },
    "split_keywords": [
        "html",
        " xml",
        " parse",
        " soup"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4574d823a2dc1c758c1c6376fd241ec0d2f415879a598b028ad67d616795677f",
                "md5": "f7228789ffeb5901d3f0b0748725f5ff",
                "sha256": "5f48e253db9c1bc5fd69aeca1eb36fa0eee4fb38e519baa38a46dd8cb5cb38aa"
            },
            "downloads": -1,
            "filename": "spider4-4.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f7228789ffeb5901d3f0b0748725f5ff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6.0",
            "size": 145656,
            "upload_time": "2024-06-12T14:27:22",
            "upload_time_iso_8601": "2024-06-12T14:27:22.637714Z",
            "url": "https://files.pythonhosted.org/packages/45/74/d823a2dc1c758c1c6376fd241ec0d2f415879a598b028ad67d616795677f/spider4-4.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "706b3e13a72e4cb1e8aa9c478e4878200622720f3d4aa830f8b4d1b7bbcbdafe",
                "md5": "1194aafa09650196b39cbd1ef4b69796",
                "sha256": "230da9e2aa587da1d007fcef045da5db729f8233247bcb0c183ee43f044ed077"
            },
            "downloads": -1,
            "filename": "spider4-4.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "1194aafa09650196b39cbd1ef4b69796",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6.0",
            "size": 124094,
            "upload_time": "2024-06-12T14:27:25",
            "upload_time_iso_8601": "2024-06-12T14:27:25.709567Z",
            "url": "https://files.pythonhosted.org/packages/70/6b/3e13a72e4cb1e8aa9c478e4878200622720f3d4aa830f8b4d1b7bbcbdafe/spider4-4.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-12 14:27:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "shari-ful",
    "github_project": "spider",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "spider4"
}
        