.. image:: https://raw.githubusercontent.com/scrapinghub/scrapyrt/master/artwork/logo.gif
:width: 400px
:align: center
==========================
ScrapyRT (Scrapy realtime)
==========================
.. image:: https://github.com/scrapinghub/scrapyrt/workflows/CI/badge.svg
:target: https://github.com/scrapinghub/scrapyrt/actions
.. image:: https://img.shields.io/pypi/pyversions/scrapyrt.svg
:target: https://pypi.python.org/pypi/scrapyrt
.. image:: https://img.shields.io/pypi/v/scrapyrt.svg
:target: https://pypi.python.org/pypi/scrapyrt
.. image:: https://img.shields.io/pypi/l/scrapyrt.svg
:target: https://pypi.python.org/pypi/scrapyrt
.. image:: https://img.shields.io/pypi/dm/scrapyrt.svg
:target: https://pypistats.org/packages/scrapyrt
:alt: Downloads count
.. image:: https://readthedocs.org/projects/scrapyrt/badge/?version=latest
:target: https://scrapyrt.readthedocs.io/en/latest/api.html
Add an HTTP API to your `Scrapy <https://scrapy.org/>`_ project in minutes.
You send a request to ScrapyRT with a spider name and a URL, and in response you get the items
collected by the spider visiting that URL.
* All Scrapy project components (e.g. middleware, pipelines, extensions) are supported
* You run ScrapyRT in your Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get their output as JSON.
Quickstart
==========
**1. Install**
.. code-block:: shell
> pip install scrapyrt
**2. Switch to a Scrapy project directory (e.g. the quotesbot project)**
.. code-block:: shell
> cd my/project_path/is/quotesbot
**3. Launch ScrapyRT** (it serves on port 9080 by default)
.. code-block:: shell
> scrapyrt
**4. Run your spiders**
.. code-block:: shell
> curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"
**5. Run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider**
.. code-block:: shell
> curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v
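The same query from Python (a sketch; ``some_callback`` is a placeholder and
must name a callback method that actually exists on the spider):
.. code-block:: python

    import requests

    payload = {
        "request": {
            "url": "http://quotes.toscrape.com/page/2/",
            "callback": "some_callback",  # placeholder: a method defined on the spider
        },
        "spider_name": "toscrape-css",
        "crawl_args": {"zipcode": "14000"},  # passed to the spider as an argument
    }

    # POSTing JSON lets you pass full Scrapy request parameters and spider arguments.
    response = requests.post("http://localhost:9080/crawl.json", json=payload)
    print(response.json())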
ScrapyRT looks for a ``scrapy.cfg`` file to determine your project settings
and raises an error if it cannot find one. Note that you need to have all
your project requirements installed.
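For reference, a minimal ``scrapy.cfg`` looks like this (``quotesbot.settings``
matches the example project above; use your own project's settings module):
.. code-block:: ini

    [settings]
    default = quotesbot.settings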
Note
====
* The project is not a replacement for `Scrapyd <https://scrapyd.readthedocs.io/en/stable/>`_, `Scrapy Cloud <https://www.zyte.com/scrapy-cloud/>`_, or other infrastructure for running long crawls
* It is not suitable for long-running spiders; it is a good fit for spiders that fetch one response from a website and return items quickly
Documentation
=============
`Documentation is available on readthedocs <https://scrapyrt.readthedocs.io/en/latest/index.html>`_.
Support
=======
Open source support is provided here on GitHub. Please `create a question
issue`_ (i.e. an issue with the "question" label).
Commercial support is also available from `Zyte`_.
.. _create a question issue: https://github.com/scrapinghub/scrapyrt/issues/new?labels=question
.. _Zyte: http://zyte.com
License
=======
ScrapyRT is offered under `BSD 3-Clause license <https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)>`_.
Development
===========
Development takes place on `GitHub <https://github.com/scrapinghub/scrapyrt>`_.