# Scrapy Log Export
## Description
A scrapy extension that adds a LOG_URI setting, similar to the FEED_URI setting.
The same FEED_STORAGE classes used by the feedexport extension are used here.
This extension is useful if you're running scrapy in a container and want to store your logs with a cloud service provider.
Please note that this extension still requires a local log file to be written. Once scrapy's engine has stopped, the extension uploads the log file to the cloud and optionally deletes the local file.
## Installation
You can install scrapy-logexport using pip:
```
pip install scrapy-logexport
```
## Configuration
Enable the extension by adding it to your `settings.py`:
```
from environs import Env

env = Env()
env.read_env()

# Enable the extension
EXTENSIONS = {
    "scrapy_logexport.LogExporter": 0,
}

LOG_FILE = 'scrapy.log'           # Must be a local file
LOG_EXPORTER_DELETE_LOCAL = True  # Delete local log file after upload, defaults to False
LOG_URI = "s3://your-bucket/%(name)s %(time)s.log"  # Store on S3

AWS_ACCESS_KEY_ID = env("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = env("AWS_SECRET_ACCESS_KEY")
```
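Since the example above reads credentials with `environs`, they can be supplied through a `.env` file in the project root. A minimal sketch, with placeholder values rather than real keys:
```
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```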
## Setting LOG_URI
The FEED_STORAGE class used for the LOG_URI is determined by the URI scheme. The following schemes are supported, by default:
```
FEED_STORAGES_BASE = {
    "": "scrapy.extensions.feedexport.FileFeedStorage",
    "file": "scrapy.extensions.feedexport.FileFeedStorage",
    "ftp": "scrapy.extensions.feedexport.FTPFeedStorage",
    "gs": "scrapy.extensions.feedexport.GCSFeedStorage",
    "s3": "scrapy.extensions.feedexport.S3FeedStorage",
    "stdout": "scrapy.extensions.feedexport.StdoutFeedStorage",
}
```
If you've already added more storages to FEED_STORAGES, they'll be available for use with LOG_URI.
Additionally, a LOG_STORAGES setting is available to add more storage classes for use with LOG_URI, as in the sketch below.
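For example, a custom backend could be registered for logs only. The `sftp` scheme and `SFTPLogStorage` class here are hypothetical, not part of scrapy or this extension:
```
LOG_STORAGES = {
    "sftp": "myproject.storages.SFTPLogStorage",  # hypothetical custom storage class
}
LOG_URI = "sftp://user@host/logs/%(name)s_%(time)s.log"
```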
Also note that, similar to FEED_URI, the LOG_URI can be a template string. By default
any spider attribute (such as `name`) and `time` are available. You can additionally
add other parameters to the template by declaring the LOG_URI_PARAMS setting.
The LOG_URI_PARAMS setting should be a function, or a string that's a path to a function.
The function needs to take `spider` as an argument and return a dictionary of the parameters.
```
# LOG_URI_PARAMS accepts a callable, or a string that's a path to one

def uri_params_func(spider):
    return {
        'custom_param': 'my_value',
        'another_param': 'another_value',
    }

# Uses the spider's name, the time the spider started, and the two custom params
LOG_URI = "s3://your-bucket/%(name)s_%(time)s_%(custom_param)s_%(another_param)s.log"
LOG_URI_PARAMS = uri_params_func
```
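If the string form is used, it is presumably a dotted import path to the function, as with scrapy's own FEED_URI_PARAMS; the path below is hypothetical:
```
LOG_URI_PARAMS = "myproject.utils.uri_params_func"  # hypothetical import path
```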
## Overriding feedexport settings
Because much of the backend is the same, you can override some feedexport settings if you want them to be different for logexport.
| FeedExport | LogExport |
| ----------------------- | ------------------------------- |
| FEED_STORAGE_S3_ACL | LOG_STORAGE_S3_ACL |
| AWS_ENDPOINT_URL | LOG_STORAGE_AWS_ENDPOINT_URL |
| GCS_PROJECT_ID | LOG_STORAGE_GCS_PROJECT_ID |
| FEED_STORAGE_GCS_ACL | LOG_STORAGE_GCS_ACL |
| FEED_STORAGE_FTP_ACTIVE | LOG_STORAGE_FTP_ACTIVE |
Additionally, if there are shared keys in FEED_STORAGES and LOG_STORAGES, the LOG_STORAGES entry will be used.
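For example, feeds and logs can use different S3 ACLs; the values here are illustrative:
```
FEED_STORAGE_S3_ACL = "public-read"  # applies to feed exports
LOG_STORAGE_S3_ACL = "private"       # applies to the exported log file
```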
## All possible settings
```
LOG_FILE # Required
LOG_URI # Required
LOG_EXPORTER_DELETE_LOCAL
LOG_URI_PARAMS
# Overrides for feedexport settings
LOG_STORAGES
LOG_STORAGE_S3_ACL
LOG_STORAGE_AWS_ENDPOINT_URL
LOG_STORAGE_GCS_PROJECT_ID
LOG_STORAGE_GCS_ACL
LOG_STORAGE_FTP_ACTIVE
# S3FeedStorage settings
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
FEED_STORAGE_S3_ACL # Overridden by LOG_STORAGE_S3_ACL
AWS_ENDPOINT_URL # Overridden by LOG_STORAGE_AWS_ENDPOINT_URL
# GCSFeedStorage settings
GCS_PROJECT_ID # Overridden by LOG_STORAGE_GCS_PROJECT_ID
FEED_STORAGE_GCS_ACL # Overridden by LOG_STORAGE_GCS_ACL
# FTPFeedStorage settings
FEED_STORAGE_FTP_ACTIVE # Overridden by LOG_STORAGE_FTP_ACTIVE
FEED_TEMPDIR # Not used by logexport directly
```