Name | scrapy-arweave
Version | 0.0.2
home_page | https://github.com/pawanpaudel93/scrapy-arweave
Summary | Scrapy is a popular open-source and collaborative Python framework for extracting the data you need from websites. scrapy-arweave provides Scrapy pipelines and feed exports to store items on Arweave.
upload_time | 2023-06-15 04:27:49
docs_url | None
author | Pawan Paudel
requires_python | >=3.0
license | ISC
requirements | No requirements were recorded.
Travis-CI | No Travis.
coveralls test coverage | No coveralls.
<p align="center"><img src="logo.png" alt="original" width="100%" height="100%"></p>
<h1 align="center">Welcome to Scrapy-Arweave</h1>
<p>
<img alt="Version" src="https://img.shields.io/badge/version-0.0.2-blue.svg?cacheSeconds=2592000" />
</p>
Scrapy is a popular open-source and collaborative Python framework for extracting the data you need from websites. scrapy-arweave provides Scrapy pipelines and feed exports to store items on [Arweave](https://arweave.org/).
### 🏠 [Homepage](https://github.com/pawanpaudel93/scrapy-arweave)
## Install
```shell
pip install scrapy-arweave
```
## Example
[scrapy-arweave-example](https://github.com/pawanpaudel93/scrapy-arweave-example)
## Usage
1. Install scrapy-arweave and some additional requirements.
```shell
pip install scrapy-arweave
```
It also depends on libmagic, which must be installed separately:
### Debian/Ubuntu
```shell
sudo apt-get install libmagic1
```
### Windows
```shell
pip install python-magic-bin
```
### macOS
- With Homebrew: `brew install libmagic`
- With MacPorts: `port install file`
2. Add 'scrapy_arweave.pipelines.ImagesPipeline' and/or 'scrapy_arweave.pipelines.FilesPipeline' to the ITEM_PIPELINES setting in your Scrapy project if you need to store images or other files on Arweave.
For Images Pipeline, use:
```python
ITEM_PIPELINES = {'scrapy_arweave.pipelines.ImagesPipeline': 1}
```
For Files Pipeline, use:
```python
ITEM_PIPELINES = {'scrapy_arweave.pipelines.FilesPipeline': 1}
```
The advantage of using the ImagesPipeline for image files is that you can configure extra features such as thumbnail generation and filtering images by size.
You can also use both the Files and Images Pipelines at the same time:
```python
ITEM_PIPELINES = {
    'scrapy_arweave.pipelines.ImagesPipeline': 0,
    'scrapy_arweave.pipelines.FilesPipeline': 1
}
```
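The thumbnail generation and size filtering mentioned above are configured through Scrapy's standard media-pipeline settings; a minimal sketch (the values are illustrative, not defaults of scrapy-arweave):

```python
# Generate two thumbnail sizes for every downloaded image (standard Scrapy setting).
IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}
# Skip images smaller than these dimensions.
IMAGES_MIN_HEIGHT = 110
IMAGES_MIN_WIDTH = 110
```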
If you use the ImagesPipeline, make sure to install the Pillow package (version 7.1.0 or greater); it is used for thumbnailing and normalizing images to JPEG/RGB format.
```shell
pip install pillow
```
Then set the target storage setting to a valid value that will be used for storing the downloaded files; otherwise the pipeline remains disabled, even if it is included in the ITEM_PIPELINES setting.
For Arweave, set the store path for files or images using the `ar://` scheme:
```python
# For ImagesPipeline
IMAGES_STORE = 'ar://images'
# For FilesPipeline
FILES_STORE = 'ar://files'
```
For more information on the ImagesPipeline and FilesPipeline, [see the Scrapy media pipeline documentation](https://docs.scrapy.org/en/latest/topics/media-pipeline.html).
3. To store the scraped output as json, jsonlines (jsonl/jl), csv, xml, marshal, pickle, etc., set FEED_STORAGES as follows for the desired output format:
```python
from scrapy_arweave.feedexport import get_feed_storages
FEED_STORAGES = get_feed_storages()
```
Then set WALLET_JWK and GATEWAY_URL, and set FEEDS as follows to store the scraped data:
```python
WALLET_JWK = "<WALLET_JWK>"  # Path to a wallet JWK file, or the JWK data itself
GATEWAY_URL = "https://arweave.net"

FEEDS = {
    'ar://house.json': {
        "format": "json"
    },
}
```
See more on FEEDS [here](https://docs.scrapy.org/en/latest/topics/feed-exports.html#feeds)
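Since FEEDS maps storage URIs to export options, several formats can be written in a single crawl; a sketch with hypothetical `ar://` paths:

```python
# Each key is a storage URI, each value the export options for that feed.
FEEDS = {
    'ar://items.json': {'format': 'json'},
    'ar://items.csv': {'format': 'csv'},
    'ar://items.jl': {'format': 'jsonlines'},
}
```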
4. Now perform the scraping as you would normally.
## Author
👤 **Pawan Paudel**
- Github: [@pawanpaudel93](https://github.com/pawanpaudel93)
## 🤝 Contributing
Contributions, issues, and feature requests are welcome!<br />Feel free to check the [issues page](https://github.com/pawanpaudel93/scrapy-arweave/issues).
## Show your support
Give a ⭐️ if this project helped you!
Copyright © 2023 [Pawan Paudel](https://github.com/pawanpaudel93).<br />