fetch-sitemap


Name: fetch-sitemap
Version: 27
Summary: Fetch a given sitemap and retrieve all URLs in it.
Upload time: 2024-10-17 19:57:27
Author: Martin Mahner <martin@mahner.org>
Requires Python: >=3.9, <4.0
License: MIT

# fetch-sitemap

Retrieves all URLs listed in a given sitemap.xml and fetches each page, by default with up to 5 concurrent requests.
Useful for (load) testing an entire site for error responses.
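
To illustrate what "retrieving all URLs of a sitemap" involves, here is a minimal Python sketch (not fetch-sitemap's actual code) that parses one sitemap document with the standard library. It assumes the sitemap uses the standard sitemaps.org 0.9 namespace, and `parse_sitemap` is a hypothetical helper. A `<urlset>` yields page URLs, while a `<sitemapindex>` yields child sitemap URLs that a recursive crawl (compare `--recursive`) would fetch and parse next.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9
# (assumption: the sitemap uses this standard namespace).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (page_urls, child_sitemap_urls)."""
    root = ET.fromstring(xml_text)
    # Every <loc> in the sitemap namespace, whether under <url> or <sitemap>.
    locs = [loc.text.strip() for loc in root.iter(f"{NS}loc") if loc.text]
    if root.tag == f"{NS}sitemapindex":
        # A sitemap index points at further sitemap documents, not pages.
        return [], locs
    return locs, []


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    )
    pages, children = parse_sitemap(sample)
    print(pages)     # ['https://example.com/', 'https://example.com/about']
    print(children)  # []
```

Only `<loc>` elements in the sitemap namespace are collected, so extension tags such as `<image:loc>` (which live in a different namespace) are ignored.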

![Sample Output](https://raw.githubusercontent.com/bartTC/fetch-sitemap/main/example.png)

## Installation

```bash 
$ pip install fetch-sitemap
```

## Usage 

```
$ fetch-sitemap --help

 Usage: fetch-sitemap [OPTIONS] SITEMAP_URL

 Fetch a given sitemap and retrieve all URLs in it.

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --basic-auth                -a  TEXT              Basic auth information. Format: 'username:password'                                                          │
│ --limit                     -l  INT [>=1]         Maximum number of URLs to fetch from the given sitemap.xml.                                                  │
│ --recursive/--no-recursive                        Recursively fetch all sitemap documents from the given sitemap.xml. [default: recursive]                     │
│ --concurrency-limit         -c  INT [>=1]         Max number of concurrent requests. [default: 5; >=1]                                                         │
│ --request-timeout           -t  INT [>=1]         Timeout for fetching a URL in seconds. [default: 30; >=1]                                                    │
│ --random                    -r                    Append a random string like ?12334232343 to each URL to bypass frontend cache.                               │
│ --random-length                 INT [1 to 100]    Length of the --random hash. [default: 15; 1 to 100]                                                         │
│ --report-path               -p  FILE              Store results in a CSV file. Example: ./report.csv                                                           │
│ --output-dir                -o  DIRECTORY         Store all fetched sitemap documents in this folder. Example: /tmp/my.domain.com/                             │
│ --slow-threshold                FLOAT [>=0.0]     Responses slower than this (in seconds) are considered 'slow'. [default: 5.0; >=0.0]                         │
│ --slow-num                      INTEGER OR "ALL"  How many 'slow' responses to show. [default: 10]                                                             │
│ --user-agent                    TEXT              User-Agent string set in the HTTP header. [default: Mozilla/5.0 (compatible; fetch-sitemap/23)]              │
│ --version                                         Show the version and exit.                                                                                   │
│ --help                                            Show this message and exit.                                                                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
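
The help text above describes bounded concurrency (`--concurrency-limit`), per-request timeouts (`--request-timeout`), slow-response reporting (`--slow-threshold`, `--slow-num`) and cache busting (`--random`, `--random-length`). The following is a minimal asyncio sketch of how those ideas fit together, not the tool's implementation: `fetch_all` and `add_cache_buster` are hypothetical names, and `fetch` stands in for any coroutine that requests a URL and returns its HTTP status code (for example a thin wrapper around an async HTTP client). The digit-only cache-buster format is an assumption based on the `?12334232343` example in the help text.

```python
import asyncio
import random
import string
import time
from typing import Awaitable, Callable, List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of a fetcher: takes a URL, returns an HTTP status code.
Fetch = Callable[[str], Awaitable[int]]
# (url, status or None on timeout, elapsed seconds)
Result = Tuple[str, Optional[int], float]


def add_cache_buster(url: str, length: int = 15) -> str:
    """Append a random digit string to the query to bypass frontend caches."""
    token = "".join(random.choices(string.digits, k=length))
    parts = urlsplit(url)
    # Keep any existing query parameters and add the token after them.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


async def fetch_all(
    urls: List[str],
    fetch: Fetch,
    concurrency_limit: int = 5,
    request_timeout: float = 30.0,
    slow_threshold: float = 5.0,
    slow_num: Optional[int] = 10,
) -> Tuple[List[Result], List[Result]]:
    """Fetch all URLs with bounded concurrency; return (all results, slowest)."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def worker(url: str) -> Result:
        # At most `concurrency_limit` requests are in flight at once.
        async with semaphore:
            start = time.perf_counter()
            try:
                status: Optional[int] = await asyncio.wait_for(
                    fetch(url), timeout=request_timeout
                )
            except asyncio.TimeoutError:
                status = None  # a timed-out request has no status code
            return url, status, time.perf_counter() - start

    # gather() keeps results in the same order as `urls`.
    results = list(await asyncio.gather(*(worker(u) for u in urls)))
    slow = sorted(
        (r for r in results if r[2] > slow_threshold),
        key=lambda r: r[2],
        reverse=True,
    )
    # slow_num=None plays the role of "ALL".
    return results, (slow if slow_num is None else slow[:slow_num])


if __name__ == "__main__":
    async def fake_fetch(url: str) -> int:
        # Stand-in for a real HTTP request.
        await asyncio.sleep(0.2 if "slow" in url else 0.01)
        return 500 if "broken" in url else 200

    urls = [add_cache_buster(f"https://example.com/{p}") for p in ("a", "slow", "broken")]
    results, slow = asyncio.run(
        fetch_all(urls, fake_fetch, concurrency_limit=2, slow_threshold=0.1)
    )
    for url, status, elapsed in results:
        print(status, f"{elapsed:.2f}s", url)
    print("slow:", [url for url, _, _ in slow])
```

Timing starts only after a request acquires a semaphore slot, so time spent waiting for a free slot does not make a response look "slow"; whether the real tool measures it the same way is not stated in this README.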

## 🤺 Local Development

```bash
poetry install
poetry run fetch-sitemap -h
poetry run ./tests.sh
```
            
