news-please


Name: news-please
Version: 1.6.12
home_page: https://github.com/fhamborg/news-please
Summary: news-please is an open source easy-to-use news extractor that just works.
upload_time: 2024-07-08 07:53:44
maintainer: None
docs_url: None
author: Felix Hamborg
requires_python: None
license: Apache License 2.0
keywords: news crawler, news scraper, news extractor, crawler, extractor, scraper, information retrieval
VCS: GitHub
bugtrack_url: None
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

news-please is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both the most recent and older, archived articles. You only need to provide the root URL of the news website. Furthermore, its API allows developers to access the extraction functionality within their software. news-please also implements a workflow optimized for the news archive provided by commoncrawl.org, allowing users to efficiently crawl and extract news articles, with various filter options.
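
As a rough illustration of the API access mentioned above, the sketch below wraps the library's `NewsPlease.from_url` call and flattens a few fields of the returned article into a dict. The helper names `fetch_article` and `summarize_article` are hypothetical, and the attribute names (`title`, `date_publish`, `maintext`) are assumptions based on the project's README, which remains the authoritative reference.

```python
from types import SimpleNamespace


def fetch_article(url):
    """Download and extract one article.

    Needs network access and the news-please package installed.
    """
    # Imported lazily so the rest of this sketch runs without the package.
    from newsplease import NewsPlease
    return NewsPlease.from_url(url)


def summarize_article(article, preview_chars=80):
    """Collect a few fields from an article-like object into a plain dict.

    Attribute names are assumptions; missing ones are reported as None.
    """
    text = getattr(article, "maintext", None)
    return {
        "title": getattr(article, "title", None),
        "date_publish": getattr(article, "date_publish", None),
        "preview": text[:preview_chars] if text else None,
    }


if __name__ == "__main__":
    # Stand-in object so the example runs offline; replace it with
    # fetch_article(<article URL>) to extract a real page.
    demo = SimpleNamespace(title="Example headline", maintext="Body text.")
    print(summarize_article(demo))
```

Using `getattr` with a default keeps the helper tolerant of pages where a field could not be extracted.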

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/fhamborg/news-please",
    "name": "news-please",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "news crawler news scraper news extractor crawler extractor scraper information retrieval",
    "author": "Felix Hamborg",
    "author_email": "felix.hamborg@uni-konstanz.de",
    "download_url": "https://files.pythonhosted.org/packages/5a/28/f61eea249207ed9a63be44417b5b3928229f250015800e5ed9a0a635b17e/news_please-1.6.12.tar.gz",
    "platform": null,
    "description": "news-please is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can follow recursively internal hyperlinks and read RSS feeds to fetch both most recent and also old, archived articles. You only need to provide the root URL of the news website. Furthermore, its API allows developers to access the exctraction functionality within their software. news-please also implements a workflow optimized for the news archive provided by commoncrawl.org, allowing users to efficiently crawl and extract news articles including various filter options.\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "news-please is an open source easy-to-use news extractor that just works.",
    "version": "1.6.12",
    "project_urls": {
        "Download": "https://github.com/fhamborg/news-please",
        "Homepage": "https://github.com/fhamborg/news-please"
    },
    "split_keywords": [
        "news",
        "crawler",
        "news",
        "scraper",
        "news",
        "extractor",
        "crawler",
        "extractor",
        "scraper",
        "information",
        "retrieval"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e2d9bc258080d4d1e7e0514ccfed08feb520a29e81dcd49b14b5031319b5955a",
                "md5": "da5921d5f557af1bde313807767c8fd5",
                "sha256": "22ce7077a3339ebf2ba86c8a5ce263e2feb9a25d03e4f8b5708f3e44da875710"
            },
            "downloads": -1,
            "filename": "news_please-1.6.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "da5921d5f557af1bde313807767c8fd5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 95296,
            "upload_time": "2024-07-08T07:53:41",
            "upload_time_iso_8601": "2024-07-08T07:53:41.813028Z",
            "url": "https://files.pythonhosted.org/packages/e2/d9/bc258080d4d1e7e0514ccfed08feb520a29e81dcd49b14b5031319b5955a/news_please-1.6.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a28f61eea249207ed9a63be44417b5b3928229f250015800e5ed9a0a635b17e",
                "md5": "dc42644d1bab055ef554968f24cd8a58",
                "sha256": "5cbabb0a0e69b7662a05922b88bec44a907377007346d1cb5b45edd1b20a07b7"
            },
            "downloads": -1,
            "filename": "news_please-1.6.12.tar.gz",
            "has_sig": false,
            "md5_digest": "dc42644d1bab055ef554968f24cd8a58",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 75151,
            "upload_time": "2024-07-08T07:53:44",
            "upload_time_iso_8601": "2024-07-08T07:53:44.835354Z",
            "url": "https://files.pythonhosted.org/packages/5a/28/f61eea249207ed9a63be44417b5b3928229f250015800e5ed9a0a635b17e/news_please-1.6.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-08 07:53:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fhamborg",
    "github_project": "news-please",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "news-please"
}
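
The `digests` entries in the raw data above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the function names `sha256_hex` and `matches_sha256` are hypothetical, and the local file path in the commented usage is an assumption.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of data with an expected hex digest.

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    return sha256_hex(data) == expected_hex.strip().lower()


# Usage against the sdist listed above (assumes the file was downloaded
# to the current directory):
#
#   with open("news_please-1.6.12.tar.gz", "rb") as fh:
#       ok = matches_sha256(
#           fh.read(),
#           "5cbabb0a0e69b7662a05922b88bec44a907377007346d1cb5b45edd1b20a07b7",
#       )
```

The same check applies to the wheel by substituting its filename and its `sha256` value.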
        