Parser for multipart/form-data
==============================

:Name: multipart
:Version: 1.1.0
:Author: Marcel Hellkamp <marc@gsites.de>
:Homepage: https://github.com/defnull/multipart
:Upload time: 2024-10-03 17:10:37
:Requires Python: >=3.5

.. image:: https://github.com/defnull/multipart/actions/workflows/test.yaml/badge.svg
    :target: https://github.com/defnull/multipart/actions/workflows/test.yaml
    :alt: Tests Status

.. image:: https://img.shields.io/pypi/v/multipart.svg
    :target: https://pypi.python.org/pypi/multipart/
    :alt: Latest Version

.. image:: https://img.shields.io/pypi/l/multipart.svg
    :target: https://pypi.python.org/pypi/multipart/
    :alt: License

This module provides multiple parsers for RFC 7578 ``multipart/form-data``, both
low-level ones for framework authors and high-level ones for WSGI application developers:

* ``PushMultipartParser``: A low-level incremental `SansIO <https://sans-io.readthedocs.io/>`_
  (non-blocking) parser suitable for asyncio and other time- or memory-constrained
  environments.
* ``MultipartParser``: A streaming parser emitting memory- and disk-buffered
  ``MultipartPart`` instances.
* ``parse_form_data``: A helper function to parse both ``multipart/form-data``
  and ``application/x-www-form-urlencoded`` form submissions from a
  `WSGI <https://peps.python.org/pep-3333/>`_ environment.

Installation
------------

``pip install multipart``

Features
--------

* Pure Python single-file module with no dependencies.
* 100% test coverage. Tested with inputs as seen from actual browsers and HTTP clients.
* Parses multiple GB/s on modern hardware (based on quick tests, not a proper benchmark).
* Quickly rejects malicious or broken inputs and emits useful error messages.
* Enforces configurable memory and disk resource limits to prevent DoS attacks.

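The memory and disk limits mentioned above follow a common pattern: keep small
uploads in memory, spill larger ones to a temporary file, and reject anything
beyond a hard cap. The sketch below shows that pattern using only the standard
library. ``buffer_upload`` and its ``mem_limit``/``disk_limit`` parameters are
hypothetical names chosen for illustration; they are not the parameters
``multipart`` itself uses.

.. code-block:: python

    import io
    import tempfile

    def buffer_upload(chunks, mem_limit=2**16, disk_limit=2**20):
        """Hypothetical helper: buffer chunks in memory, spill to a temporary
        file once mem_limit is exceeded, and refuse more than disk_limit bytes."""
        buf = io.BytesIO()
        size = 0
        for chunk in chunks:
            size += len(chunk)
            if size > disk_limit:
                # Reject before writing, so the oversized chunk never hits disk.
                buf.close()
                raise ValueError("upload exceeds disk limit")
            if isinstance(buf, io.BytesIO) and size > mem_limit:
                # Move what is buffered so far to disk and keep writing there.
                spill = tempfile.TemporaryFile()
                spill.write(buf.getvalue())
                buf.close()
                buf = spill
            buf.write(chunk)
        buf.seek(0)  # rewind so callers can read the data back
        return buf

Checking the running total before each write means an oversized upload is
rejected as soon as it crosses the cap, instead of after it has been fully stored.
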
**Limitations:** This parser implements ``multipart/form-data`` as it is used by
actual modern browsers and HTTP clients, which means:

* Just ``multipart/form-data``, not suitable for email parsing.
* No ``multipart/mixed`` support (deprecated in RFC 7578).
* No ``base64`` or ``quoted-printable`` transfer encoding (deprecated in RFC 7578).
* No ``encoded-word`` or ``name=_charset_`` encoding markers (discouraged in RFC 7578).
* No support for clearly broken input (e.g. invalid line breaks or header names).

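To make the supported subset concrete, here is a sketch of the kind of body this
parser expects: one ``form-data`` part per field, CRLF line breaks, a
``Content-Disposition`` header with ``name`` (plus ``filename`` for uploads) and
no transfer encoding. ``encode_form_data`` is a hypothetical helper written for
this illustration, not part of ``multipart``. It assumes field names and
filenames contain no quotes or line breaks and does not escape them.

.. code-block:: python

    def encode_form_data(fields, files, boundary):
        """Hypothetical helper: encode text fields and uploads given as
        (filename, content_type, data) tuples into a multipart/form-data body."""
        crlf = b"\r\n"  # RFC 7578 bodies use CRLF line breaks throughout
        delimiter = b"--" + boundary.encode("ascii")
        out = bytearray()
        for name, value in fields.items():
            out += delimiter + crlf
            out += f'Content-Disposition: form-data; name="{name}"'.encode() + crlf
            out += crlf + value.encode("utf-8") + crlf  # blank line, then value
        for name, (filename, ctype, data) in files.items():
            out += delimiter + crlf
            out += (f'Content-Disposition: form-data; name="{name}"; '
                    f'filename="{filename}"').encode() + crlf
            out += f"Content-Type: {ctype}".encode() + crlf
            out += crlf + data + crlf
        out += delimiter + b"--" + crlf  # closing delimiter ends the body
        return bytes(out)

The matching request header for such a body would be
``Content-Type: multipart/form-data; boundary=XyZ`` (for ``boundary="XyZ"``).
Real clients choose a random boundary that does not occur in the data.
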
Usage and examples
------------------

For WSGI application developers, we strongly suggest using the ``parse_form_data``
helper function. It accepts a WSGI ``environ`` dictionary and parses both types
of form submission (``multipart/form-data`` and ``application/x-www-form-urlencoded``)
based on the actual content type of the request. You'll get two ``MultiDict``
instances in return, one for text fields and the other for file uploads:

.. code-block:: python

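    from multipart import parse_form_data

    def wsgi(environ, start_response):
        if environ["REQUEST_METHOD"] == "POST":
            forms, files = parse_form_data(environ)

            title = forms["title"]    # str: value of a text field
            upload = files["upload"]  # MultipartPart: an uploaded file
            upload.save_as(...)       # copy the upload to a file path

        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]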

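To try such a handler without a running server, a test can pass it a hand-built
``environ``. The sketch below is a hypothetical helper (not part of
``multipart``) that sets the PEP 3333 keys describing a request body
(``REQUEST_METHOD``, ``CONTENT_TYPE``, ``CONTENT_LENGTH``, ``wsgi.input``) and
lets the standard library's ``wsgiref.util.setup_testing_defaults`` fill in the
remaining required keys.

.. code-block:: python

    import io
    from wsgiref.util import setup_testing_defaults

    def make_environ(body: bytes, content_type: str, method: str = "POST"):
        """Hypothetical test helper: build a minimal WSGI environ with a body."""
        environ = {
            "REQUEST_METHOD": method,
            "CONTENT_TYPE": content_type,
            "CONTENT_LENGTH": str(len(body)),
            "wsgi.input": io.BytesIO(body),
        }
        # Adds SERVER_NAME, wsgi.version, wsgi.errors, ... without
        # overwriting the keys set above.
        setup_testing_defaults(environ)
        return environ

With ``multipart`` installed, ``parse_form_data(make_environ(body,
"multipart/form-data; boundary=XyZ"))`` should then return the two ``MultiDict``
instances described above, assuming ``body`` is a valid ``multipart/form-data``
payload that uses that boundary.
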
The ``parse_form_data`` helper function internally uses ``MultipartParser``, a
streaming parser that reads from a ``multipart/form-data`` encoded binary data
stream and emits ``MultipartPart`` instances as soon as a part is fully parsed.
This is most useful if you want to consume the individual parts as soon as they
arrive, instead of waiting for the entire request to be parsed:

.. code-block:: python

    from multipart import parse_options_header, MultipartParser

    def wsgi(environ, start_response):
        assert environ["REQUEST_METHOD"] == "POST"
        # Boundary and charset are options of the Content-Type header.
        ctype, copts = parse_options_header(environ.get("CONTENT_TYPE", ""))
        boundary = copts.get("boundary")
        charset = copts.get("charset", "utf8")
        assert ctype == "multipart/form-data"

        parser = MultipartParser(environ["wsgi.input"], boundary, charset)
        for part in parser:
            if part.filename:
                print(f"{part.name}: File upload ({part.size} bytes)")
                part.save_as(...)
            elif part.size < 1024:
                print(f"{part.name}: Text field ({part.value!r})")
            else:
                print(f"{part.name}: Text field, but too big to print :/")

        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]

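The example relies on ``parse_options_header`` to split the ``Content-Type``
header into the MIME type and its options (``boundary``, ``charset``). To show
what that split involves, here is a rough stand-in built on the standard
library's ``email.message.Message`` instead. ``split_content_type`` is a
hypothetical name; this is not how ``multipart`` implements it, and edge cases
(RFC 2231 parameters, malformed headers) may be handled differently.

.. code-block:: python

    from email.message import Message

    def split_content_type(value: str):
        """Hypothetical stdlib stand-in: split a Content-Type header value
        into (mime_type, options)."""
        if not value.strip():
            return "", {}  # no header: nothing to parse
        msg = Message()
        msg["Content-Type"] = value
        mime_type = msg.get_content_type()  # lowercased "type/subtype"
        # get_params() lists the MIME type first, then (name, value) pairs
        # with surrounding quotes removed.
        options = {name.lower(): val for name, val in msg.get_params()[1:]}
        return mime_type, options

Note that ``get_content_type()`` falls back to ``text/plain`` for values that are
not of the form ``type/subtype``, so a check like ``ctype == "multipart/form-data"``
still rejects such headers.
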
The ``MultipartParser`` handles IO and file buffering for you, but does so using
blocking APIs. If you need absolute control over the parsing process and want to
avoid blocking IO at all costs, then have a look at ``PushMultipartParser``, the
low-level non-blocking incremental ``multipart/form-data`` parser that powers all
the other parsers in this library:

.. code-block:: python

    import asyncio

    from multipart import PushMultipartParser, MultipartSegment

    async def process_multipart(reader: asyncio.StreamReader, boundary: str):
        with PushMultipartParser(boundary) as parser:
            while not parser.closed:
                chunk = await reader.read(1024 * 64)  # up to 64 KiB per read
                for result in parser.parse(chunk):
                    if isinstance(result, MultipartSegment):
                        print(f"== Start of segment: {result.name}")
                        for header, value in result.headerlist:
                            print(f"{header}: {value}")
                    elif result:  # Result is a non-empty bytearray
                        print(f"[received {len(result)} bytes of data]")
                    else:  # Result is None
                        print("== End of segment")

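The loop works because the parser keeps its own state between ``parse()`` calls,
so chunks can be cut anywhere. As a toy illustration of that sans-IO idea (not
the algorithm ``multipart`` uses), the hypothetical ``DelimiterSplitter`` below
accepts arbitrary chunks and releases data early, holding back only the few
trailing bytes that could be the start of a delimiter split across two chunks.
It mimics the event style shown above: ``bytes`` for data, ``None`` at the end
of a piece.

.. code-block:: python

    class DelimiterSplitter:
        """Toy sans-IO splitter (hypothetical, not part of multipart).

        feed() accepts any chunk of bytes and returns a list of events:
        bytes for data, None when a delimiter ends the current piece.
        """

        def __init__(self, delimiter: bytes):
            self.delimiter = delimiter
            self.buffer = bytearray()

        def feed(self, chunk: bytes) -> list:
            self.buffer += chunk
            events = []
            while True:
                idx = self.buffer.find(self.delimiter)
                if idx >= 0:
                    if idx:
                        events.append(bytes(self.buffer[:idx]))
                    events.append(None)  # delimiter found: end of piece
                    del self.buffer[:idx + len(self.delimiter)]
                    continue
                # No complete delimiter: release everything except a tail that
                # could be the beginning of a delimiter split across chunks.
                keep = len(self.delimiter) - 1
                if len(self.buffer) > keep:
                    cut = len(self.buffer) - keep
                    events.append(bytes(self.buffer[:cut]))
                    del self.buffer[:cut]
                return events

Feeding ``b"ab--"`` and then ``b"Xcd--X"`` with the delimiter ``b"--X"`` yields
``[b"ab"]`` and then ``[None, b"cd", None]``: the ``--`` at the end of the first
chunk is held back until the next chunk shows whether it starts a delimiter. A
real ``multipart/form-data`` parser additionally has to parse segment headers
and recognize the closing ``--boundary--`` delimiter.
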

Changelog
---------

* **1.1**

  * Some of these fixes changed behavior to match the documentation or
    specification; none of them should come as a surprise. Existing apps should
    be able to upgrade without changes.
  * fix: Fail faster on input with invalid line breaks (#55)
  * fix: Allow empty segment names (#56)
  * fix: Avoid ResourceWarning when using parse_form_data (#57)
  * fix: MultipartPart now always has a sensible content type.
  * fix: Actually check parser state on context manager exit.
  * fix: Honor Content-Length header, if present.
  * perf: Reduce overhead for small segments (-21%)
  * perf: Reduce write overhead for large uploads (-2%)

* **1.0**

  * A completely new, fast, non-blocking ``PushMultipartParser`` parser, which
    now serves as the basis for all other parsers.
  * The new parser is stricter and rejects clearly broken input quicker, even in
    non-strict mode (e.g. invalid line breaks or header names). This should not
    affect data sent by actual browsers or HTTP clients.
  * Default charset for ``MultipartParser`` headers and text fields changed to
    ``utf8``, as recommended by W3C HTTP.
  * Default disk and memory limits for ``MultipartParser`` increased, but
    multiple other limits added for finer control. Check if the new defaults
    still fit your needs.
  * Undocumented APIs deprecated or removed, some of which were not strictly
    private. This includes parameters for ``MultipartParser`` and some
    ``MultipartPart`` methods, but those should not be used by anyone but the
    parser itself.

* **0.2.5**

  * Don't test semicolon separators in urlencoded data (#33)
  * Add python-requires directive, indicating Python 3.5 or later is required and preventing older Pythons from attempting to download this version (#32)
  * Add official support for Python 3.10-3.12 (#38, #48)
  * Default value of ``copy_file`` should be ``2 ** 16``, not ``2 * 16`` (#41)
  * Update URL for Bottle (#42)

* **0.2.4**

  * Consistently decode non-utf8 URL-encoded form-data

* **0.2.3**

  * Import MutableMapping from collections.abc (#23)
  * Fix a few more ResourceWarnings in the test suite (#24)
  * Allow stream to contain data before first boundary (#25)

* **0.2.2**

  * Fix #21 ResourceWarnings on Python 3

* **0.2.1**

  * Fix #20 empty payload

* **0.2**

  * Dropped support for Python versions below 3.6. Stay on 0.1 if you need Python 2.5+ support.

* **0.1**

  * First release

