scrapy-logexport

- Version: 0.2.1
- Summary: Upload scrapy logs to cloud storage
- Home page: https://github.com/nicholas-mischke/scrapy-logexport
- Author: Nicholas Mischke
- License: MIT
- Requires Python: >=3.7,<4.0
- Upload time: 2023-06-05 10:46:38
            
# Scrapy Log Export

## Description
A Scrapy extension that adds a LOG_URI setting, analogous to the FEED_URI setting.
The same FEED_STORAGE classes used by the feedexport extension are used here.

This extension is useful if you're running scrapy in a container and want to store your logs with a cloud service provider.

Please note that this extension still requires that a local log file is written. Once scrapy's engine has stopped, the extension will upload the log file to the cloud and optionally delete the local file.
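In other words, the extension's job reduces to "upload the finished local log file, then optionally delete it." A minimal sketch of that flow, assuming a feed-storage-like object with a `store(file)` method; the `export_log` name and the exact internals are illustrative, not the extension's actual code:

```python
import os

def export_log(log_path, storage, delete_local=False):
    """Upload a finished local log file, then optionally remove it."""
    with open(log_path, "rb") as f:
        storage.store(f)  # FeedStorage-style API: persist the open file
    if delete_local:
        os.remove(log_path)  # only after a successful upload
```

Since the upload happens after Scrapy's engine has stopped, the file is complete and closed by the time it is read.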

## Installation
You can install scrapy-logexport using pip:
```
pip install scrapy-logexport
```

## Configuration

Enable the extension by adding it to your `settings.py`:
```python
from environs import Env

env = Env()
env.read_env()

# Enable the extension
EXTENSIONS = {
    "scrapy_logexport.LogExporter": 0,
}

LOG_FILE = 'scrapy.log'  # Must be a local file
LOG_EXPORTER_DELETE_LOCAL = True  # Delete the local log file after upload; defaults to False
LOG_URI = "s3://your-bucket/%(name)s %(time)s.log"  # Store on S3

AWS_ACCESS_KEY_ID = env("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = env("AWS_SECRET_ACCESS_KEY")
```

## Setting LOG_URI

The FEED_STORAGE class used for the LOG_URI is determined by the URI scheme. The following schemes are supported, by default:

```python
FEED_STORAGES_BASE = {
    "": "scrapy.extensions.feedexport.FileFeedStorage",
    "file": "scrapy.extensions.feedexport.FileFeedStorage",
    "ftp": "scrapy.extensions.feedexport.FTPFeedStorage",
    "gs": "scrapy.extensions.feedexport.GCSFeedStorage",
    "s3": "scrapy.extensions.feedexport.S3FeedStorage",
    "stdout": "scrapy.extensions.feedexport.StdoutFeedStorage",
}
```
If you've already added more storages to FEED_STORAGES, they'll also be available for use with LOG_URI.
Additionally, a LOG_STORAGES setting is available to register further storage classes for use with LOG_URI.
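For instance, LOG_STORAGES could register a custom scheme like this. The `myscheme` key and the `myproject.storages.MyStorage` path are hypothetical, and the lookup shown mirrors how feedexport resolves a URI's scheme to a storage class:

```python
from urllib.parse import urlsplit

# Register an extra storage backend for LOG_URI (illustrative paths)
LOG_STORAGES = {
    "myscheme": "myproject.storages.MyStorage",
}

# Scheme lookup works like feedexport: the URI's scheme selects the class
scheme = urlsplit("myscheme://bucket/spider.log").scheme
storage_class_path = LOG_STORAGES[scheme]
```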

Also note that, like FEED_URI, LOG_URI can be a template string. By default
any spider attribute (such as `name`) and `time` are available. You can
add further parameters to the template by declaring the LOG_URI_PARAMS setting.

The LOG_URI_PARAMS setting should be a function, or a string giving the import path of a function.
The function must take `spider` as an argument and return a dictionary of extra parameters.

```python
# Type: Optional[Union[str, Callable[[Spider], dict]]]

def uri_params_func(spider):
    return {
        'custom_param': 'my_value',
        'another_param': 'another_value',
    }

# Fills in the spider's name, the time the spider started, and the
# custom_param and another_param returned above
LOG_URI = "s3://your-bucket/%(name)s_%(time)s_%(custom_param)s_%(another_param)s.log"
LOG_URI_PARAMS = uri_params_func
```
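Under the hood this is ordinary printf-style `%(key)s` substitution, so a template can be checked locally before pointing it at a bucket (the parameter values below are made up):

```python
# Example parameter dict of the kind the template would be filled from
params = {
    "name": "quotes",
    "time": "2023-06-05T10-46-38",
    "custom_param": "my_value",
    "another_param": "another_value",
}

# %-style substitution fills each %(key)s placeholder from the dict
uri = "s3://your-bucket/%(name)s_%(time)s_%(custom_param)s_%(another_param)s.log" % params
# uri -> "s3://your-bucket/quotes_2023-06-05T10-46-38_my_value_another_value.log"
```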

## Overriding feedexport settings

Because much of the backend is shared, you can override certain feedexport settings when you want them to differ for logexport.

| FeedExport              | LogExport                       |
| ----------------------- | ------------------------------- |
| FEED_STORAGE_S3_ACL     | LOG_STORAGE_S3_ACL              |
| AWS_ENDPOINT_URL        | LOG_STORAGE_AWS_ENDPOINT_URL    |
| GCS_PROJECT_ID          | LOG_STORAGE_GCS_PROJECT_ID      |
| FEED_STORAGE_GCS_ACL    | LOG_STORAGE_GCS_ACL             |
| FEED_STORAGE_FTP_ACTIVE | LOG_STORAGE_FTP_ACTIVE          |


Additionally, if there are shared keys in FEED_STORAGES and LOG_STORAGES, the LOG_STORAGES value will be used.
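That precedence is equivalent to an ordinary dict merge with LOG_STORAGES applied last (a sketch of the assumed behaviour; the class paths are made up):

```python
FEED_STORAGES = {"s3": "feed.S3Storage", "ftp": "feed.FTPStorage"}
LOG_STORAGES = {"s3": "log.S3Storage"}

# Later keys win, so LOG_STORAGES overrides any shared scheme
effective = {**FEED_STORAGES, **LOG_STORAGES}
# effective["s3"] -> "log.S3Storage"; "ftp" falls back to the feedexport entry
```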

## All possible settings

```
LOG_FILE # Required
LOG_URI # Required

LOG_EXPORTER_DELETE_LOCAL
LOG_URI_PARAMS

# Overrides for feedexport settings
LOG_STORAGES
LOG_STORAGE_S3_ACL
LOG_STORAGE_AWS_ENDPOINT_URL
LOG_STORAGE_GCS_PROJECT_ID
LOG_STORAGE_GCS_ACL
LOG_STORAGE_FTP_ACTIVE

# S3FeedStorage settings
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
FEED_STORAGE_S3_ACL # Overridden by LOG_STORAGE_S3_ACL
AWS_ENDPOINT_URL # Overridden by LOG_STORAGE_AWS_ENDPOINT_URL

# GCSFeedStorage settings
GCS_PROJECT_ID # Overridden by LOG_STORAGE_GCS_PROJECT_ID
FEED_STORAGE_GCS_ACL # Overridden by LOG_STORAGE_GCS_ACL

# FTPFeedStorage settings
FEED_STORAGE_FTP_ACTIVE # Overridden by LOG_STORAGE_FTP_ACTIVE

FEED_TEMPDIR # Not used by logexport directly
```
            
