:Name: pytest-spark
:Version: 0.6.0
:Home page: https://github.com/malexer/pytest-spark
:Summary: pytest plugin to run the tests with support of pyspark.
:Upload time: 2020-02-23 13:00:31
:Author: Alex (Oleksii) Markov
:License: MIT
:Keywords: pytest, spark, pyspark, unittest, test

pytest-spark
############

.. image:: https://travis-ci.org/malexer/pytest-spark.svg?branch=master
    :target: https://travis-ci.org/malexer/pytest-spark

A pytest_ plugin for running tests with pyspark (`Apache Spark`_) support.

This plugin lets you specify the SPARK_HOME directory in ``pytest.ini``,
which makes "pyspark" importable in the tests that pytest executes.

You can also define "spark_options" in ``pytest.ini`` to customize pyspark,
including the "spark.jars.packages" option, which lets you load external
libraries (e.g. "com.databricks:spark-xml").

pytest-spark provides the session-scoped fixtures ``spark_context`` and
``spark_session``, which can be used in your tests.

**Note:** there is no need to define SPARK_HOME if you installed pyspark
with pip (e.g. ``pip install pyspark``), because it should already be
importable. In that case, do not define SPARK_HOME at all: neither in pytest
(``pytest.ini`` / ``--spark_home``) nor as an environment variable.

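To see whether ``pyspark`` is already importable, you can ask Python whether
it can locate the module. The sketch below uses only the standard library;
the helper name ``is_importable`` is hypothetical and is not part of
pytest-spark.

.. code-block:: python

    import importlib.util


    def is_importable(module_name):
        """Return True if a top-level module can be located without importing it."""
        return importlib.util.find_spec(module_name) is not None


    if is_importable("pyspark"):
        print("pyspark is importable; SPARK_HOME should not be needed")
    else:
        print("pyspark was not found; a Spark location has to be configured")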

Install
=======

.. code-block:: shell

    $ pip install pytest-spark


Usage
=====

Set Spark location
------------------

To run tests against a specific Spark installation, define its location
using one of the following methods:

1. Specify the command-line option "--spark_home"::

    $ pytest --spark_home=/opt/spark

2. Add a "spark_home" value to ``pytest.ini`` in your project directory::

    [pytest]
    spark_home = /opt/spark

3. Set the "SPARK_HOME" environment variable.

pytest-spark will try to import ``pyspark`` from the provided location.


.. note::
    "spark_home" is read from these sources in the order listed above,
    i.e. a command-line option overrides the value in ``pytest.ini``.

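The lookup order can be pictured as "first non-empty value wins". The
following is a minimal sketch of that idea, not the plugin's actual code:
the function ``resolve_spark_home`` and its parameters are hypothetical, and
placing the environment variable last is an assumption based on the order of
the list above.

.. code-block:: python

    import os


    def resolve_spark_home(cli_value=None, ini_value=None, environ=None):
        """Return the first non-empty value among CLI, ini and environment."""
        environ = os.environ if environ is None else environ
        # Assumed precedence: command line, then pytest.ini, then SPARK_HOME.
        for candidate in (cli_value, ini_value, environ.get("SPARK_HOME")):
            if candidate:
                return candidate
        return None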

Customize spark_options
-----------------------

Define "spark_options" in your ``pytest.ini``, e.g.::

    [pytest]
    spark_home = /opt/spark
    spark_options =
        spark.app.name: my-pytest-spark-tests
        spark.executor.instances: 1
        spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0

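Each line of "spark_options" is a ``key: value`` pair. As an illustration of
how such a multi-line value could be turned into a dictionary, here is a
small sketch; ``parse_spark_options`` is a hypothetical helper, and the
plugin's own parsing may differ.

.. code-block:: python

    def parse_spark_options(raw_value):
        """Parse newline-separated 'key: value' pairs into a dict of strings."""
        options = {}
        for line in raw_value.splitlines():
            line = line.strip()
            if not line:
                continue
            # Split on the first colon only, so values such as Maven
            # coordinates ("group:artifact:version") stay intact.
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
        return options

Splitting on the first colon only matters for "spark.jars.packages", whose
value itself contains colons.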

Using the ``spark_context`` fixture
-----------------------------------

Use the ``spark_context`` fixture in your tests like any other pytest
fixture. A SparkContext instance is created once and reused for the whole
test session.

Example::

    def test_my_case(spark_context):
        test_rdd = spark_context.parallelize([1, 2, 3, 4])
        # ...

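A slightly fuller sketch of such a test is shown below. The test name and
data are made up for illustration; it relies only on the standard RDD
methods ``map`` and ``sum``.

.. code-block:: python

    def test_sum_of_squares(spark_context):
        # Distribute a small list, square each element, and sum the results.
        rdd = spark_context.parallelize([1, 2, 3, 4])
        total = rdd.map(lambda x: x * x).sum()
        assert total == 30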

Using the ``spark_session`` fixture (Spark 2.0 and above)
---------------------------------------------------------

Use the ``spark_session`` fixture in your tests like any other pytest
fixture. A SparkSession instance with Hive support enabled is created once
and reused for the whole test session.

Example::

    def test_spark_session_dataframe(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        # ...
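
As a sketch of what the body of such a test might check, the example below
asserts on the column names and row count of the DataFrame. The test name is
made up for illustration.

.. code-block:: python

    def test_dataframe_shape(spark_session):
        # Build a two-row DataFrame from a DDL-style schema string.
        df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        assert df.columns == ["a", "b"]
        assert df.count() == 2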

Overriding default parameters of the ``spark_session`` fixture
--------------------------------------------------------------
By default, ``spark_session`` is created with the following configuration::

    {
        'spark.app.name': 'pytest-spark',
        'spark.default.parallelism': 1,
        'spark.dynamicAllocation.enabled': 'false',
        'spark.executor.cores': 1,
        'spark.executor.instances': 1,
        'spark.io.compression.codec': 'lz4',
        'spark.rdd.compress': 'false',
        'spark.sql.shuffle.partitions': 1,
        'spark.shuffle.compress': 'false',
        'spark.sql.catalogImplementation': 'hive',
    }

You can override some of these parameters in your ``pytest.ini``. For
example, to disable Hive support for the Spark session::

    [pytest]
    spark_home = /opt/spark
    spark_options =
        spark.sql.catalogImplementation: in-memory
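
Conceptually, an override replaces the matching default and leaves the other
defaults untouched. The sketch below illustrates that behaviour with a plain
dictionary merge. It assumes this is how the options combine and is not
taken from the plugin's source; ``DEFAULTS`` lists only two of the defaults,
and ``effective_options`` is a hypothetical name.

.. code-block:: python

    # Two of the default options, shortened for the example.
    DEFAULTS = {
        "spark.app.name": "pytest-spark",
        "spark.sql.catalogImplementation": "hive",
    }


    def effective_options(overrides):
        """Return a copy of the defaults updated with user overrides."""
        merged = dict(DEFAULTS)
        merged.update(overrides)  # values from pytest.ini win
        return merged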

Development
===========

Tests
-----

Run tests locally::

    $ docker-compose up --build


.. _pytest: http://pytest.org/
.. _Apache Spark: https://spark.apache.org/



            
