pandas-nql


Name: pandas-nql
Version: 1.1.0
Home page: https://github.com/jason983/pandas_nql
Summary: Pandas_nql is an open source Python library that enables natural language queries on Pandas Dataframes using the latest advances in generative AI. Inspired by OpenAI's groundbreaking language models, pandas_nql allows users to analyze data in a more intuitive way - by simply asking questions in plain English instead of writing complex code.
Upload time: 2023-12-14 22:55:53
Author: Jason Beechum
License: MIT
Requirements: none recorded
# Pandas Natural Language Query (NQL) Library

Pandas_nql is an open source Python library that enables natural language queries on Pandas Dataframes using the latest advances in generative AI. Inspired by OpenAI's groundbreaking language models, pandas_nql allows users to analyze data in a more intuitive way - by simply asking questions in plain English instead of writing complex code.



This library is perfect for data scientists, analysts, and developers looking to enhance their data analysis workflows. By leveraging the power of GPT and other language models behind the scenes, pandas_nql can understand complex data questions and automatically translate them into SQL statements to extract insights from data.



Whether you're a Python expert looking to save time or someone new to data analysis, pandas_nql makes exploring datasets more accessible. It's as simple as pip installing the library and typing a query like "show me average monthly sales by region." You'll feel like you have a personal AI-powered data analyst at your fingertips!



Some key features:



- Query Dataframes in plain English without writing code

- Understands complex questions and data relationships

- Automatically translates questions to SQL statements

- Open source library for community involvement



Bring natural language queries to your data analysis today with the power of pandas_nql!



### Disclaimers

- Your data is never sent to the language model for query creation. However, the schema of the data is sent and used.

- Typical AI warning - AI can make mistakes. Consider checking important information.
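
To make the first disclaimer concrete, here is a minimal sketch of how a prompt could be assembled from the schema string and the question alone. The function name `build_prompt`, the prompt wording, and the view name `temp_view` are hypothetical and are not taken from pandas_nql's source; the point is only that no row values need to appear in the text sent to the model.

```Python
def build_prompt(schema: str, question: str, dataset_name: str = "temp_view") -> str:
    """Assemble a text-to-SQL prompt from the schema and question only (hypothetical)."""
    return (
        f"Table {dataset_name} has columns: {schema}.\n"
        f"Write a SQL query that answers: {question}"
    )

# Only column names and types are included, never the rows themselves.
schema = "Name varchar(255), Age bigint, City varchar(255)"
prompt = build_prompt(schema, "Find the number of people in each City.")
print(prompt)
```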



### Installation

```
pip install pandas_nql
```



### Prerequisites

- The `OPENAI_API_KEY` environment variable must be set to a valid OpenAI API key

- Python 3.9+
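
A quick way to verify both prerequisites before running a query is sketched below. The helper name `check_prerequisites` is hypothetical and is not part of pandas_nql; it only inspects the environment and the interpreter version.

```Python
import os
import sys

def check_prerequisites(env=os.environ, version=sys.version_info) -> list:
    """Return a list of human-readable problems; an empty list means ready to go."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if tuple(version[:2]) < (3, 9):
        problems.append("Python 3.9 or newer is required")
    return problems

print(check_prerequisites())
```

The check only confirms that the variable is present and non-empty; it does not validate the key against OpenAI.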



### Get started

How to select data from a Pandas dataframe using natural language:



```Python
import pandas as pd
from pandas_nql import PandasNQL

# Load a DataFrame
data = {
    'Name': ['John', 'Jane', 'Bob', 'Jason', 'Mike'],
    'Age': [25, 30, 22, 47, 46],
    'City': ['New York', 'San Francisco', 'Seattle', 'Denver', 'Denver']
}

df = pd.DataFrame(data)

# Instantiate a PandasNQL object, passing in the data to query
pandas_nql = PandasNQL(df)

# Call the query method to select data
results_df = pandas_nql.query("Find the number of people in each City.")

print(results_df)

# ...
```



### SQL Generators

A SQL generator produces the SQL statement used to query the data. The statement is generated from the schema of the data and the given natural language query.

The default SQL statement generator uses OpenAI. You can override the default with another generator, such as the Hugging Face T5 SQL statement generator, or create your own.

While the Hugging Face T5 generator is not as accurate as OpenAI, it is free.
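
As a rough illustration of what happens once a statement has been generated, the sketch below runs the kind of SQL a generator might return for "Find the number of people in each City." It uses Python's built-in `sqlite3` module as a stand-in engine. The table name `temp_view`, the hand-written SQL, and the choice of SQLite are assumptions for illustration, not a description of how pandas_nql executes queries internally.

```Python
import sqlite3

# Same sample data as the Get started example, as plain rows.
rows = [
    ("John", 25, "New York"),
    ("Jane", 30, "San Francisco"),
    ("Bob", 22, "Seattle"),
    ("Jason", 47, "Denver"),
    ("Mike", 46, "Denver"),
]

# SQL a generator might plausibly return (written by hand here).
generated_sql = (
    "SELECT City, COUNT(*) AS people "
    "FROM temp_view GROUP BY City ORDER BY City"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_view (Name varchar(255), Age int, City varchar(255))")
conn.executemany("INSERT INTO temp_view VALUES (?, ?, ?)", rows)
result = conn.execute(generated_sql).fetchall()
conn.close()

print(result)
# [('Denver', 2), ('New York', 1), ('San Francisco', 1), ('Seattle', 1)]
```

With a DataFrame in hand, `df.to_sql("temp_view", conn, index=False)` followed by `pd.read_sql_query(generated_sql, conn)` does the same job and returns the result as a DataFrame.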



#### Use the T5 sql statement generator

```Python
import pandas as pd
from pandas_nql import PandasNQL, T5SqlGenerator

# Load a DataFrame
df = ...

# Instantiate the T5 generator
t5_sql_generator = T5SqlGenerator(...)

# Instantiate PandasNQL with the data and the generator
pandas_nql = PandasNQL(df, generator=t5_sql_generator)

# Call the query method to select data
results_df = pandas_nql.query("Find the number of people in each City.")

print(results_df)
```



#### Write your own sql statement generator



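```Python
# NOTE: the base class is assumed to be named SqlGeneratorBase (matching the
# class definition below); check the installed package for the exact name.
from generators import SqlGeneratorBase

# Define a custom SQL statement generator class
class CustomSqlGenerator(SqlGeneratorBase):

    def __init__(self):
        # Add any constructor arguments your generator needs
        super().__init__()

    # Override the generate_sql method.
    # TEMP_VIEW_NAME is expected to come from the library; its import path
    # is not shown in this README.
    def generate_sql(self, query: str, schema: str, dataset_name: str = TEMP_VIEW_NAME) -> str:
        # Generate and return the SQL statement
        raise NotImplementedError
```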



#### Update `__init__.py`

`from pandas_nql.custom_sql_generator import CustomSqlGenerator`



#### Use the custom sql statement generator



```Python
import pandas as pd
from pandas_nql import PandasNQL, CustomSqlGenerator

# Load a DataFrame
df = ...

# Instantiate the custom generator
custom_generator = CustomSqlGenerator(...)

# Instantiate PandasNQL with the data and the generator
pandas_nql = PandasNQL(df, generator=custom_generator)

# Call the query method to select data
results_df = pandas_nql.query("Find the number of people in each City.")

print(results_df)
```



### Schema String Builders

Schema string builders build strings representing the schema of a provided pandas DataFrame. The default builder is the SQL schema string builder, which returns the schema string in the format `column_name sql_data_type` (e.g. `City varchar(255)`). You can override the default with another builder, such as the pandas schema string builder, or create your own. The pandas builder uses the format `column_name: pandas_data_type` (e.g. `City: object`).
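
The two formats can be sketched in a few lines of plain Python. This is an illustration only, not the library's implementation: the function names, the `", "` separator, and every type mapping other than `object` to `varchar(255)` are assumptions.

```Python
from typing import Mapping

# Assumed pandas-dtype -> SQL-type mapping; only object -> varchar(255)
# is confirmed by the description above.
SQL_TYPES = {
    "object": "varchar(255)",
    "int64": "bigint",
    "float64": "float",
    "bool": "boolean",
    "datetime64[ns]": "timestamp",
}

def sql_schema_string(dtypes: Mapping[str, str]) -> str:
    """Format each column as 'column_name sql_data_type'."""
    return ", ".join(
        f"{col} {SQL_TYPES.get(dtype, 'varchar(255)')}" for col, dtype in dtypes.items()
    )

def pandas_schema_string(dtypes: Mapping[str, str]) -> str:
    """Format each column as 'column_name: pandas_data_type'."""
    return ", ".join(f"{col}: {dtype}" for col, dtype in dtypes.items())

dtypes = {"Name": "object", "Age": "int64", "City": "object"}
print(sql_schema_string(dtypes))     # Name varchar(255), Age bigint, City varchar(255)
print(pandas_schema_string(dtypes))  # Name: object, Age: int64, City: object
```

For a real DataFrame, `df.dtypes.astype(str).to_dict()` produces a mapping of this shape.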



#### Write your own custom schema string builder

```Python
import pandas as pd
from pandas_nql import SchemaStringBuilderBase

# Define a custom schema string builder class
class CustomSchemaStringBuilder(SchemaStringBuilderBase):

    def __init__(self):
        # Add any constructor arguments your builder needs
        super().__init__()

    # Override the build_schema_string method
    def build_schema_string(self, dtypes: pd.Series) -> str:
        # Build and return the schema string
        raise NotImplementedError
```



#### Update `__init__.py`

`from pandas_nql.custom_schema_string_builder import CustomSchemaStringBuilder`



#### Use the custom schema string builder



```Python
import pandas as pd
from pandas_nql import PandasNQL, CustomSqlGenerator, CustomSchemaStringBuilder

# Load a DataFrame
df = ...

# Instantiate the custom generator and schema string builder
custom_generator = CustomSqlGenerator(...)
custom_schema_builder = CustomSchemaStringBuilder(...)

# Instantiate PandasNQL with the data, generator, and schema builder
pandas_nql = PandasNQL(df,
                       generator=custom_generator,
                       schema_builder=custom_schema_builder)

# Call the query method to select data
results_df = pandas_nql.query("Find the number of people in each City.")

print(results_df)
```


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jason983/pandas_nql",
    "name": "pandas-nql",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Jason Beechum",
    "author_email": "jasonbeechum@yahoo.com",
    "download_url": "https://files.pythonhosted.org/packages/f1/d4/fcd74c7578e8eb5296339481448ee826dd3c1f30669d6e18cfc8bea4e6a4/pandas_nql-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Pandas_nql is an open source Python library that enables natural language queries on Pandas Dataframes using the latest advances in generative AI. Inspired by OpenAI's groundbreaking language models, pandas_nql allows users to analyze data in a more intuitive way - by simply asking questions in plain English instead of writing complex code.",
    "version": "1.1.0",
    "project_urls": {
        "Homepage": "https://github.com/jason983/pandas_nql"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e9f4c57ddf5b33e0cab8f0a903e6e6230b79fb3ebfa045f71ddcf59988b7cd1a",
                "md5": "4f95d78254cd2fd91044cf84500c840f",
                "sha256": "05fce82a302925a80125be44952c96c4cb8b51a3ddf9dbd0b51f3a3285cb21e4"
            },
            "downloads": -1,
            "filename": "pandas_nql-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4f95d78254cd2fd91044cf84500c840f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 11498,
            "upload_time": "2023-12-14T22:55:51",
            "upload_time_iso_8601": "2023-12-14T22:55:51.631082Z",
            "url": "https://files.pythonhosted.org/packages/e9/f4/c57ddf5b33e0cab8f0a903e6e6230b79fb3ebfa045f71ddcf59988b7cd1a/pandas_nql-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f1d4fcd74c7578e8eb5296339481448ee826dd3c1f30669d6e18cfc8bea4e6a4",
                "md5": "fa4f9633f65049e5218216002a932dff",
                "sha256": "9c9dd08c8a879646c038b846810e75ab9cf73bd943e416c1f0131513c2a92c28"
            },
            "downloads": -1,
            "filename": "pandas_nql-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "fa4f9633f65049e5218216002a932dff",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 9922,
            "upload_time": "2023-12-14T22:55:53",
            "upload_time_iso_8601": "2023-12-14T22:55:53.170500Z",
            "url": "https://files.pythonhosted.org/packages/f1/d4/fcd74c7578e8eb5296339481448ee826dd3c1f30669d6e18cfc8bea4e6a4/pandas_nql-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-14 22:55:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jason983",
    "github_project": "pandas_nql",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pandas-nql"
}
        