ophelia-spark

Name: ophelia-spark
Version: 0.1.3
Home page: https://github.com/LuisFalva/ophelia
Summary: Ophelia is a Spark miner AI engine that builds data mining & ML pipelines with PySpark.
Upload time: 2024-07-17 21:54:29
Author: Luis Vargas
Requires Python: <3.12,>=3.9
License: Free for non-commercial use
Keywords: ophelia-spark
Requirements: click, cloudpickle, colorama, dask-expr, dask, fsspec, importlib-metadata, joblib, locket, numpy, packaging, pandas, partd, py4j, pyarrow, pyhocon, pyparsing, pyspark, python-dateutil, pytz, pyyaml, quadprog, scikit-learn, scipy, six, threadpoolctl, toolz, tzdata, zipp
# Welcome to [Ophelia Spark](https://ophelia.readme.io/)

## 📝 Generalized ML Features

Our project focuses on creating robust and efficient PySpark ML and MLlib pipelines, making them easily replicable and secure for various machine learning tasks. Key features include optimized techniques for handling data skewness, user-friendly interfaces for building custom models, and streamlined data mining pipelines with Ophelia Spark wrappers. Additionally, it emulates NumPy and pandas, offering similar functionality for a seamless user experience. Below are the detailed features:

- **Building PySpark ML & MLlib Pipelines**: Simplified and secure methods to construct machine learning pipelines using PySpark, ensuring replicability and robustness.
- **Optimized Techniques for Data Skewness**: Embedded strategies to address and mitigate data skewness issues, improving model performance and accuracy.
- **Build-Your-Own Models**: User-friendly tools for constructing custom models and data mining pipelines, leveraging the power of PySpark and Ophelia Spark wrappers for enhanced flexibility and efficiency.
- **NumPy and pandas Functionality Emulation**: Emulates the functions and features of NumPy and pandas, making it intuitive and easy for users familiar with these libraries to transition and utilize similar functionalities within PySpark.

These features aim to empower users with the tools they need to handle complex machine learning tasks effectively, ensuring a seamless experience from data processing to model deployment.

# Getting Started

### Requirements 📜

Before starting, you'll need the following installed: `pyspark >= 3.0.x`, `pandas >= 1.1.3`, `numpy >= 1.19.1`, `dask >= 2.30.x`, and `scikit-learn >= 0.23.x`.

Additionally, if you want to use the Ophelia package, you'll also need Python (3.9 up to, but not including, 3.12, per the package's `requires_python` metadata) and pip installed.
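
Since the package is published on PyPI as `ophelia-spark`, installing with pip should also work, assuming your Python version satisfies the constraint above:
```sh
pip install ophelia-spark
```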

### Building from source 🛠️

Just clone the `Ophelia` repo to build from source:
```sh
git clone https://github.com/LuisFalva/ophelia.git
```

After cloning the `Ophelia` repo, go to the project directory:
```sh
cd ophelia_spark
```
### First time installation 📡
> The first time you run and install Ophelia on your local machine, you need to wire it up with Ophelia's main repo. Run the following `make` target to set everything up correctly:
```sh
make install
```

**First Important Note**: You should see a success message like the one below.
```sh
[Ophelia] Successfully installed ophelia_spark:0.1.0. Have fun! =)
```

**Second Important Note**: You can also pull the Ophelia 0.1.0 Docker image (or make sure the version matches the one you need and configure the `OPHELIA_DOCKER_VERSION` env variable) and use it as a base image for new images.
```sh
make docker-pull
```

Also, you can build an image with your new changes for the corresponding version as follows:
```sh
make docker-build
```

### Importing and initializing Ophelia 📦

To initialize `Ophelia` with an embedded Spark session, use:

```python
>>> from ophelia_spark.start import Ophelia
>>> ophelia = Ophelia("Spark App Name")
>>> sc = ophelia.Spark.build_spark_context()

  ____          _            _  _           _____                      _    
 / __ \        | |          | |(_)         / ____|                    | |   
| |  | | _ __  | |__    ___ | | _   __ _  | (___   _ __    __ _  _ __ | | __
| |  | || '_ \ | '_ \  / _ \| || | / _` |  \___ \ | '_ \  / _` || '__|| |/ /
| |__| || |_) || | | ||  __/| || || (_| |  ____) || |_) || (_| || |   |   < 
 \____/ | .__/ |_| |_| \___||_||_| \__,_| |_____/ | .__/  \__,_||_|   |_|\_\
        | |                                       | |                       
        |_|                                       |_|                       

```
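
For reference, an embedded session like this typically wraps a plain PySpark session build; a minimal sketch (not Ophelia's exact internals) would be:

```python
from pyspark.sql import SparkSession

# Build (or reuse) a named Spark session and grab its context.
spark = SparkSession.builder.appName("Spark App Name").getOrCreate()
sc = spark.sparkContext
```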
Main class objects provided by initializing an Ophelia session:

- `read` & `write`

```python
from ophelia_spark.read.spark_read import Read
from ophelia_spark.write.spark_write import Write
```
- `generic` & `functions`

```python
from ophelia_spark.functions import Shape, Rolling, Reshape, CorrMat, CrossTabular, PctChange, Selects, DynamicSampling
from ophelia_spark.generic import (split_date, row_index, lag_min_max_data, regex_expr, remove_duplicate_element,
                                   year_array, dates_index, sorted_date_list, feature_pick, binary_search,
                                   century_from_year, simple_average, delta_series, simple_moving_average, average,
                                   weight_moving_average, single_exp_smooth, double_exp_smooth, initial_seasonal_components,
                                   triple_exp_smooth, row_indexing, string_match)
```
- ML package for `unsupervised`, `sampling` and `feature_miner` objects

```python
from ophelia_spark.ml.sampling.synthetic_sample import SyntheticSample
from ophelia_spark.ml.unsupervised.feature import PCAnalysis, SingularVD
from ophelia_spark.ml.feature_miner import BuildStringIndex, BuildOneHotEncoder, BuildVectorAssembler, BuildStandardScaler, SparkToNumpy, NumpyToVector
```
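
These `feature_miner` builders wrap PySpark ML's own transformers. For orientation, here is a minimal plain-PySpark sketch of the same stages (assuming a hypothetical DataFrame `df` with a string column `product` and a numeric column `revenue`):

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import (OneHotEncoder, StandardScaler,
                                StringIndexer, VectorAssembler)

# Index the string column, one-hot encode it, assemble a feature
# vector, then standardize it: the stages the Build* wrappers name.
stages = [
    StringIndexer(inputCol="product", outputCol="product_idx"),
    OneHotEncoder(inputCols=["product_idx"], outputCols=["product_ohe"]),
    VectorAssembler(inputCols=["product_ohe", "revenue"], outputCol="features"),
    StandardScaler(inputCol="features", outputCol="scaled_features"),
]
# model = Pipeline(stages=stages).fit(df)
```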

Let me show you some application examples:

The `Read` class implements a Spark reading object for multiple formats: `{'csv', 'parquet', 'excel', 'json'}`.

```python
from ophelia_spark.read.spark_read import Read
spark_df = spark.readFile(path, 'csv', header=True, infer_schema=True)
```
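
For comparison, the plain PySpark equivalent of this CSV read (without the Ophelia wrapper) would be:

```python
# Native PySpark CSV read with header parsing and schema inference.
spark_df = spark.read.csv(path, header=True, inferSchema=True)
```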

Also, you may import the `Shape` class from the `functions` factory to see the dimensions of a Spark DataFrame, NumPy style.

```python
from ophelia_spark.functions import Shape
import pandas as pd  # needed for pd.DataFrame below

dic = {
    'Product': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Year': [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012],
    'Revenue': [100, 200, 300, 110, 190, 320, 120, 220, 350]
}
dic_to_df = spark.createDataFrame(pd.DataFrame(data=dic))
dic_to_df.show(10, False)

+---------+------------+-----------+
| Product |    Year    |  Revenue  |
+---------+------------+-----------+
|    A    |    2010    |    100    |
|    B    |    2010    |    200    |
|    C    |    2010    |    300    |
|    A    |    2011    |    110    |
|    B    |    2011    |    190    |
|    C    |    2011    |    320    |
|    A    |    2012    |    120    |
|    B    |    2012    |    220    |
|    C    |    2012    |    350    |
+---------+------------+-----------+

dic_to_df.Shape
(9, 3)
```
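
Spark DataFrames have no native `shape`; the wrapper presumably computes something like the plain-PySpark equivalent below:

```python
# NumPy-style (rows, columns) tuple for a Spark DataFrame.
shape = (dic_to_df.count(), len(dic_to_df.columns))
print(shape)  # (9, 3)
```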

The `pctChange` wrapper is added to the Spark `DataFrame` class to provide pandas' widely used `pct_change` method: the relative percentage change from one observation to the next, sorted by a date-type column and lagged over a numeric-type column.

```python
from ophelia_spark.functions import PctChange
dic_to_df.pctChange().show(10, False)

+---------------------+
| Revenue             |
+---------------------+
| null                |
| 1.0                 |
| 0.5                 |
| -0.6333333333333333 |
| 0.7272727272727273  |
| 0.6842105263157894  |
| -0.625              |
| 0.8333333333333333  |
| 0.5909090909090908  |
+---------------------+
```
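
In plain PySpark, the same lag-1 percentage change can be computed with a window function. Here is a sketch, assuming insertion order stands in for a real ordering column (which is what the default output above suggests):

```python
from pyspark.sql import Window
from pyspark.sql import functions as F

# Tag rows with an increasing id to preserve insertion order, then
# compute x_t / x_{t-1} - 1 with a lag over the ordered window.
df_id = dic_to_df.withColumn("_rid", F.monotonically_increasing_id())
w = Window.orderBy("_rid")
df_id.select(
    (F.col("Revenue") / F.lag("Revenue", 1).over(w) - 1).alias("Revenue")
).show(10, False)
```

With `partition_by` and `order_by` set (as in the examples below), the window becomes `Window.partitionBy("Product").orderBy("Year")`.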

Another option is to configure all of the function's parameters, as follows:
- `periods`: controls the offset of the lag. The default value of 1 always returns a lag-1 DataFrame.
- `partition_by`: fixes the partition column over the DataFrame, e.g. 'bank_segment', 'assurance_product_type'.
- `order_by`: the specific column used to order the sequential observations, e.g. 'balance_date', 'trade_close_date', 'contract_date'.
- `pct_cols`: the specific column (or columns) to lag over, returning the relative change between one element and the previous, i.e. `x_t / x_{t-1} - 1`.

In this case, we will specify only the `periods` parameter to yield a lag of 2 observations over the DataFrame.
```python
dic_to_df.pctChange(periods=2).na.fill(0).show(5, False)

+--------------------+
|Revenue             |
+--------------------+
|0.0                 |
|0.0                 |
|2.0                 |
|-0.44999999999999996|
|-0.3666666666666667 |
+--------------------+
only showing top 5 rows
```

Adding parameters: `partition_by`, `order_by` & `pct_cols`
```python
dic_to_df.pctChange(partition_by="Product", order_by="Year", pct_cols="Revenue").na.fill(0).show(5, False)

+---------------------+
|Revenue              |
+---------------------+
|0.0                  |
|-0.050000000000000044|
|0.1578947368421053   |
|0.0                  |
|0.06666666666666665  |
+---------------------+
only showing top 5 rows
```

You may also lag more than one column at a time by simply adding a list with string column names:
```python
dic_to_df.pctChange(partition_by="Product", order_by="Year", pct_cols=["Year", "Revenue"]).na.fill(0).show(5, False)

+--------------------+---------------------+
|Year                |Revenue              |
+--------------------+---------------------+
|0.0                 |0.0                  |
|4.975124378110429E-4|-0.050000000000000044|
|4.972650422674363E-4|0.1578947368421053   |
|0.0                 |0.0                  |
|4.975124378110429E-4|0.06666666666666665  |
+--------------------+---------------------+
only showing top 5 rows
```
 
### Want to contribute? 🤔

Bring it on! If you have an idea, want to ask anything, or there's a bug you'd like fixed, you may open an [issue ticket](https://github.com/LuisFalva/ophelia/issues); you'll find the guidelines for filing an issue there. You can also get a glimpse of [Open Source Contribution Guide best practices here](https://opensource.guide/).
Cheers 🍻!

### Support or Contact 📠

Having trouble with Ophelia? You can DM me at [falvaluis@gmail.com](https://mail.google.com/mail/u/0/?tab=rm&ogbl#inbox?compose=CllgCJZZQVJHBJKmdjtXgzlrRcRktFLwFQsvWKqcTRtvQTVcHvgTNSxVzjZqjvDFhZlVJlPKqtg) and I'll help you sort it out.

### License 📃

Released under the Apache License, version 2.0.

            
