# NORTH GRAVITY PYTHON SDK #
This document describes the North Gravity Python SDK which enables external users to use the North Gravity platform tools within their Python scripts.
The Python SDK can be used within:
- a single Python script that is run by the Python Runner task within a pipeline in the North Gravity application
- a single Jupyter Notebook that is run by the Jupyter Runner task within a pipeline in the North Gravity application
- an ensemble of Python scripts that are part of a container, for a Task created by the user and used in a pipeline in the North Gravity application
**Examples of usage can be found at the bottom of the documentation.**
Note that the SDK does not cover everything from the API documentation, but rather the most commonly used features.
The scope of the SDK:
- **Datalake Handler** - downloading / uploading / reading files from the data lake
- **Status Handler** - sending statuses about the task run
- **Task Handler** - enables communication between tasks within a pipeline and reading/writing parameters
- **Time Series Handler** - retrieves data directly from the time series database
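As a quick orientation, and assuming the environment variables described in the next section are set, the four handlers map to the following classes (each is detailed in its own section below):
```python
import northgravity as ng

dh = ng.DatalakeHandler()   # download / upload / read files on the data lake
sh = ng.StatusHandler()     # send statuses about the task run
th = ng.TaskHandler()       # communicate between tasks within a pipeline
ts = ng.Timeseries()        # query the time series database
```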
# How to install and set the package:
## Install
```text
pip3 install northgravity==0.1.14
```
As the library is available from pip, a specific version can be installed within a Python Task simply by adding to requirements.txt:
```text
northgravity==0.1.14
```
The package relies on the requests library, so the user must also list it in the project's requirements.txt file (or install it directly):
```text
pip3 install requests
```
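For example, a minimal requirements.txt covering both dependencies could look like this:
```text
northgravity==0.1.14
requests
```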
## Environment Variables
The package uses information from the environment variables. They are automatically provided when running a script within a pipeline (as a Task or within the Python/Jupyter Runners).
If running the script locally, users must set these variables in the project themselves.
Mandatory environment variables to set:
- LOGIN → login received from North Gravity
- PASSWORD → password to log in. Credentials are used to generate the token so that each request is authenticated.
- NG_API_ENDPOINT → the URL of the North Gravity platform API (by default, set to https://api.northgravity.com)
These variables enable the authentication process and direct users' requests to the North Gravity environment API.
Alternatively, instead of LOGIN and PASSWORD, you may set the NG_API_KEY environment variable with an API key generated within the North Gravity platform.
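For local development, a minimal sketch of setting these variables from within the script (before instantiating the SDK handlers) could be:
```python
import os

# Credentials for the North Gravity platform (replace with your own values)
os.environ['LOGIN'] = 'my_login'
os.environ['PASSWORD'] = 'my_password'
# Or, alternatively, authenticate with an API key instead of LOGIN/PASSWORD:
# os.environ['NG_API_KEY'] = 'my_api_key'

# Platform API endpoint (defaults to https://api.northgravity.com)
os.environ['NG_API_ENDPOINT'] = 'https://api.northgravity.com'
```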
Other variables may be useful when creating the tasks within the platform:
- NG_STATUS_GROUP_NAME → the group on the data lake where the pipeline is located; used to display the statuses
- JOBID → any value; when the pipeline is executed, this value is set by the North Gravity platform
- PIPELINE_ID → any value; when the pipeline is created, this value is set by the North Gravity platform
---
# Logger
Please note that the default logging level for this package is `WARNING`. For more verbosity, it can be changed to `DEBUG` or `INFO` in the following way:
```python
import logging
import northgravity as ng
logger = logging.getLogger('NG_SDK')
logger.setLevel(logging.DEBUG)
```
---
# Datalake Handler
## How to download or read a file from the data lake by its name?
The DatalakeHandler class can be used as follows within a script to download or upload a file:
```python
import northgravity as ng
import pandas as pd
# Instantiate the Datalake Handler
dh = ng.DatalakeHandler()
# download file from data lake with name and group name
# it will be saved locally with name local_name.csv
dh.download_by_name(file_name='my_file.csv',
                    group_name='My Group',
                    file_type='SOURCE',
                    dest_file_name='folder/local_name.csv',
                    save=True,
                    unzip=False)
# OR read file from data lake with name and group name
# it returns a BytesIO object (kept in the RAM, not saved in the disk)
fileIO = dh.download_by_name(file_name='my_file.csv',
                             group_name='My Group',
                             file_type='SOURCE',
                             dest_file_name=None,
                             save=False,
                             unzip=False)
# read the object as pandas DataFrame
df = pd.read_csv(fileIO)
```
The download method allows the user to either:
- download and save the wanted file locally, if *save=True*
- read the file directly from the datalake and get a BytesIO object (kept in memory only, which can, for example, be read directly by pandas as a dataframe)
Note that by default:
- the file is NOT saved locally, but returned as a BytesIO object (streamed from the datalake);
- *dest_file_name=None*, which, when saving, stores the downloaded file in the working directory under its original name.
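As a minimal sketch relying on those defaults, the call below streams the file straight to memory:
```python
import northgravity as ng
import pandas as pd

dh = ng.DatalakeHandler()

# With the defaults (save=False, dest_file_name=None),
# the file is returned as a BytesIO object streamed from the datalake
fileIO = dh.download_by_name(file_name='my_file.csv',
                             group_name='My Group',
                             file_type='SOURCE')
df = pd.read_csv(fileIO)
```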
## How to download or read a file from the data lake by its ID?
If the file ID is known, the file can be downloaded or read directly as follows:
```python
import northgravity as ng
import pandas as pd
# Instantiate the Datalake Handler
dh = ng.DatalakeHandler()
# download file from data lake by its ID
# it will be saved locally with name local_name.csv
dh.download_by_id(file_id='XXXX-XXXX',
                  dest_file_name='folder/local_name.csv',
                  save=True,
                  unzip=False)
# read file from data lake by its ID
# it returns a BytesIO object
fileIO = dh.download_by_id(file_id='XXXX-XXXX',
                           dest_file_name=None,
                           save=False,
                           unzip=False)
# read the object as pandas DataFrame
df = pd.read_csv(fileIO)
```
The download method allows the user to either:
- download and save the wanted file locally, if *save=True*
- read the file directly from the datalake and get a BytesIO object (kept in memory only, which can, for example, be read directly by pandas as a dataframe)
Note that by default:
- the file is NOT saved locally, but returned as a BytesIO object (streamed from the datalake);
- *dest_file_name=None*, which, when saving, stores the downloaded file in the working directory under its original name.
## How to upload a file to the data lake?
The upload method uploads the file at the specified path to the given group and returns its ID on the lake:
```python
import northgravity as ng
# Instantiate the Datalake Handler
dh = ng.DatalakeHandler()
# upload file to data lake
file_id = dh.upload_file(file='path/local_name.csv',
                         group_name='My Group',
                         file_upload_name='name_in_the_datalake.csv')
```
It is also possible to stream a Python object's content directly to the datalake from memory, without saving the file to disk.
The prerequisite is to pass a BytesIO object to the upload method (not another object such as a pandas DataFrame).
```python
import northgravity as ng
import io
# Instantiate the Datalake Handler
dh = ng.DatalakeHandler()
# Turn the pandas DataFrame (df) to BytesIO for streaming
fileIO = io.BytesIO(df.to_csv().encode())
# upload file to data lake
file_id = dh.upload_file(file=fileIO,
                         group_name='My Group',
                         file_upload_name='name_in_the_datalake.csv')
```
---
# Timeseries Queries
## How to get the list of existing symbols for a given group?
Data saved in the time series database is structured by group, keys and timestamp.
Each set of keys has unique date entries in the database, with corresponding column values.
To explore the available symbols for a given group, the following method can be used:
```python
import northgravity as ng
# Instantiate Timeseries class
ts = ng.Timeseries()
# Get the list of symbols for given group
group = 'My Group'
group_symbols = ts.get_symbols(group_name=group)
```
The default size of the returned list is 1,000 items.
Note that the return object of the get_symbols() method is a JSON (Python dict) where the keys and columns are accessible under its *items* key.
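For example, the symbols can be listed from the returned dict as follows (a sketch; the exact structure of each item is defined by the API response):
```python
import northgravity as ng

ts = ng.Timeseries()
group_symbols = ts.get_symbols(group_name='My Group')

# The keys and columns are accessible under the 'items' key
for symbol in group_symbols['items']:
    print(symbol)
```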
## How to query by metadata or descriptions?
To find symbols by querying metadata, column or symbol names, the search_for parameter may be used.
It looks for the passed string across the whole time series database and returns a JSON with the keys and columns where the searched string appears.
```python
import northgravity as ng
# Instantiate Timeseries class
ts = ng.Timeseries()
# Search the time series database by a description
search = 'Data description'
searched_symbols = ts.get_symbols(search_for=search)
```
Passing both the group_name and search_for parameters to the get_symbols() method narrows the results down to the selected group.
The user must provide either group_name or search_for to the method in order to obtain symbols.
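For example, a short sketch combining both parameters:
```python
import northgravity as ng

ts = ng.Timeseries()

# Search for a description, restricted to one group
searched_symbols = ts.get_symbols(group_name='My Group',
                                  search_for='Data description')
```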
If the list of symbols is requested from a group that contains more than 1,000 items, the results are paginated (by default into chunks of 1,000 items).
To navigate large result sets, the *get_symbols()* method takes as extra arguments the size of the returned list (*_size*) and the page to start from (*_from*):
```python
import northgravity as ng
# Instantiate Timeseries class
ts = ng.Timeseries()
# Get the list of symbols for given group
group = 'My Group'
# Get the results in chunks of 200 items, for page 5
# i.e. items 1000 through 1199
group_symbols = ts.get_symbols(group_name=group, _size=200, _from=5)
```
By default, these parameters are _size=2000 (the maximum allowed list length) and _from=0.
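As a sketch of walking through all pages of a large group (assuming each response exposes its results under the *items* key, as noted above):
```python
import northgravity as ng

ts = ng.Timeseries()

# Collect all symbols of a large group, page by page
all_items, page = [], 0
while True:
    result = ts.get_symbols(group_name='My Group', _size=1000, _from=page)
    items = result['items']
    all_items.extend(items)
    if len(items) < 1000:  # last (possibly partial) page reached
        break
    page += 1
```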
## How to read data from the Timeseries database?
The SDK can query the TimeSeries database directly for data, given a symbol's keys, columns and the datalake group it is stored in.
In the application, this is similar to creating a Dataprep instance, which selects a set of symbols from groups into a basket.
The retrieved data can be:
- streamed directly to memory, retrieved as a BytesIO object, by setting *file_name* as None (default value),
- saved as a csv file locally with the provided path and name as *file_name*.
The symbols are the keys the data was saved with in the database. For a given symbol, all the keys must be passed as a dictionary of key names and values.
The character `*` can be used as a wildcard for a key's value, to get all the values for that key.
The wanted columns are then passed as a list containing one or more items.
If an empty list [ ] is passed to the function, all available columns are returned.
To read all available data for specific symbols and columns with no time frame, no start or end date is passed to the method.
Extra settings are also available to query data (see the sketch after the example below):
- Metadata: whether or not to return it with the query
- Format: either a Normalized CSV (NCSV) or a dataframe format
- Timezone: get the timestamps in the timezone of the user's account, or in a specific timezone
- Corrections: how to handle corrections to the TimeSeries database (corrections set to 'yes', 'no', 'history' or 'only')
- Delta: whether to organise the data by data timestamp (delta=False) or insert time (delta=True)
The following code shows an example of how to query the TimeSeries database:
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2"}
columns = ['Open', 'Close']
# retrieve all available data from the group according to keys & columns
# and save as query.csv in test/
ts.retrieve_data_as_csv(file_name='test/query.csv',
                        symbols=symbols,
                        columns=columns,
                        group_name='My Group'
                        )
# The retrieved data can be read as a pandas dataframe
df = pd.read_csv("test/query.csv")
# retrieve all available data from the group according to keys & columns
# and stream to memory as BytesIO object
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group'
                                 )
# read as pandas dataframe
df = pd.read_csv(fileIO)
```
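The extra settings listed before this example can be passed in the same call; a hedged sketch using the *corrections* and *delta* parameters described above:
```python
import northgravity as ng
import pandas as pd

# Instantiate Timeseries class
ts = ng.Timeseries()

symbols = {'Key1': "Val1", "Key2": "Val2"}
columns = ['Open', 'Close']

# include the correction history and organise the data
# by insert time (delta=True) instead of data timestamp
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group',
                                 corrections='history',
                                 delta=True
                                 )

# read as pandas dataframe
df = pd.read_csv(fileIO)
```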
## How to read data from Timeseries for specific dates?
To retrieve data within a specific time frame, the user can specify the start and end date.
The start and end dates can take two forms:
- only date (e.g., 2021-01-04)
- date and time (e.g., 2021-02-01T12:00:00; ISO format must be followed)
For example, if the user specifies start_date=2021-02-01 and end_date=2021-02-06, data will be retrieved from 2021-02-01 00:00:00 till 2021-02-06 23:59:59.
If date and time are specified, data will be retrieved exactly for the given time frame.
Note that ISO format must be followed: YYYY-MM-DD**T**HH:mm:ss. Pay attention to the "T" letter between date and time.
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2"}
columns = ['Open']
# retrieve data between start_date and end_date
# data will be retrieved between 2021-01-04 00:00:00 and 2021-02-05 23:59:59
# saved as a csv file named test.csv
ts.retrieve_data_as_csv(file_name='test/test.csv',
                        symbols=symbols,
                        columns=columns,
                        group_name='My Group',
                        start_date='2021-01-04',
                        end_date='2021-02-05'
                        )
# retrieve data for specific time frame
# from 2021-01-04 12:30:00
# to 2021-02-05 09:15:00
ts.retrieve_data_as_csv(file_name='test/test.csv',
                        symbols=symbols,
                        columns=columns,
                        group_name='My Group',
                        start_date='2021-01-04T12:30:00',
                        end_date='2021-02-05T09:15:00'
                        )
# For given keys, columns, group and dates range
# Streaming instead of saving
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group',
                                 start_date='2021-01-04',
                                 end_date='2021-02-05'
                                 )
# read as pandas dataframe
df = pd.read_csv(fileIO)
```
## How to use a wildcard for a key's values?
To get all the values for one or several keys in the query, the character `*` can be used as a wildcard.
The argument *allow_wildcard* must be set to True in the retrieval function to enable the use of wildcards.
Please note that by default, the use of wildcards is **DISABLED**.
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2", "Key3": "*"}
columns = ['Open', 'Close']
# retrieve all history for the symbols with keys and columns
# all values for Key3 will be returned
# the data will be streamed to memory
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group',
                                 allow_wildcard=True
                                 )
# read as pandas dataframe
df = pd.read_csv(fileIO)
```
## How to get all the columns for a given set of keys?
To get all the column values for a given set of keys in the database, the query can take an empty list as the queried columns, as follows:
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2", "Key3": "Val3"}
columns = []
# retrieve all history for the symbols with keys and columns
# all columns for the set of keys will be returned
# the data will be streamed to memory
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group'
                                 )
# read as pandas dataframe
df = pd.read_csv(fileIO)
```
Note that this configuration can be combined with key wildcards (with *allow_wildcard=True*) and any other setting.
## How to modify the Time Zone of the data?
By default, the timestamps in the queried time series are in the timezone of the account of the user who created the script or pipeline.
It is indicated in the Date column header between brackets (for example *Date(UTC)*).
To modify the time zone of the retrieved dataset, a timezone can be passed directly to the retrieval function, as follows.
It must respect the Continent/City format.
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2"}
columns = ['Open', 'Close']
# retrieve all available data from group according to keys & columns
# and stream to memory as BytesIO object
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group',
                                 timezone='Europe/London'
                                 )
# read as pandas dataframe
df = pd.read_csv(fileIO)
```
## How to get the metadata along with the data?
Extra columns containing the metadata of the symbols can be returned in the retrieved data, along with the key & column values.
This is done by setting the argument *metadata=True* in the retrieval function.
By default, no metadata is included in the queried data.
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2"}
columns = ['Open', 'Close']
# retrieve all available data from group according to keys & columns
# and stream to memory as BytesIO object
fileIO = ts.retrieve_data_as_csv(file_name=None,
                                 symbols=symbols,
                                 columns=columns,
                                 group_name='My Group',
                                 metadata=True
                                 )
# read as pandas dataframe
df = pd.read_csv(fileIO)
```
## How to modify the format of the received data?
The queried data comes by default in Normalized CSV format (NCSV), with columns in this order:
* the keys columns,
* the date column, with timestamps in either the default timezone or the specified one (*timezone* argument in the function),
* the values columns,
* the metadata columns, if wanted (*metadata=True*)
By setting *NCSV=False* in the retrieval method, the data is returned in Dataframe format (called PANDAS in the API docs), as a JSON.
The JSON (Python dict) has timestamps as keys, each mapping to a dictionary of symbol_column pairs and their values.
```python
import northgravity as ng
import pandas as pd
# Instantiate Timeseries class
ts = ng.Timeseries()
# Symbols to query from database
symbols = {'Key1': "Val1", "Key2": "Val2"}
columns = ['Open', 'Close']
# retrieve all available data from group according to keys & columns
# and stream to memory as BytesIO object
file_json = ts.retrieve_data_as_csv(file_name=None,
                                    symbols=symbols,
                                    columns=columns,
                                    group_name='My Group',
                                    metadata=True,
                                    NCSV=False
                                    )
# read as pandas dataframe
# Transpose to have datetime as rows index
df = pd.DataFrame(file_json).T
```
Note that the dataframe created from the JSON containing the data is transposed so that the timestamps form the row index.
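If a true DatetimeIndex is needed, the transposed index can be converted explicitly (a sketch, assuming the timestamp labels parse with pandas):
```python
import pandas as pd

# Convert the row labels to a proper DatetimeIndex
df.index = pd.to_datetime(df.index)
```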
---
# Task Handler
Users can extend the existing set of tasks on the North Gravity platform by executing scripts or notebooks from the Python Runner Task or the Jupyter Runner Task, respectively.
Such a task can then be used in a pipeline and communicate with other tasks by:
- reading outputs from other tasks, as inputs
- writing outputs that can be used by other tasks as inputs
A task can also receive inputs directly as a file picked from the datalake, either a specific file or the newest available version of a file on the lake, given its name and group.
Within a Python script run in the **Python Runner Task** (or a notebook in the **Jupyter Runner Task**), this is implemented as follows:
## Read a Task Input
The input file passed to the Task can be:
- either downloaded to disk,
- or read on the fly (useful when disk space is limited but memory is not).
```python
import northgravity as ng
import pandas as pd
# Instantiate TaskHandler class
th = ng.TaskHandler()
# the current task reads the file passed by a previous task, that is connected to it.
# The passed file is downloaded and saved on the disk as data.csv
th.download_from_input_parameter(arg_name='Input #1',
                                 dest_file_name='data.csv',
                                 save=True)
# The passed file is downloaded and kept in the memory (streamed, not saved on the disk)
file_content = th.download_from_input_parameter(arg_name='Input #2',
                                                dest_file_name=None,
                                                save=False)
# If a csv file was streamed, it can be read as pandas Dataframe for example
df = pd.read_csv(file_content)
```
If *save=True* and dest_file_name=None, the file is saved on disk with its original name from the datalake.
## How to obtain additional file information
Information related to the input parameters can be retrieved by using the DatalakeHandler (`dh.get_info_from_id()`) and the TaskHandler (`th.read_task_parameter_value()`) together.
```python
import northgravity as ng
import pandas as pd
# Instantiate TaskHandler class
th = ng.TaskHandler()
# Instantiate the Datalake Handler
dh = ng.DatalakeHandler()
dh.get_info_from_id(th.read_task_parameter_value('Input #1'))
```
This can be used to retrieve any metadata associated with the file itself, such as the file name or arrival time (the time it was uploaded to the system).
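For example, the returned info can be stored and inspected (a sketch, assuming the call returns a dict-like object; the exact field names depend on the API response):
```python
import northgravity as ng

th = ng.TaskHandler()
dh = ng.DatalakeHandler()

# Store the metadata of the file behind Input #1
file_info = dh.get_info_from_id(th.read_task_parameter_value('Input #1'))

# Print it to discover the available fields (e.g. file name, arrival time)
print(file_info)
```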
## Set a Task Output
The output of the Task can be set:
- either by uploading a file saved on disk to the datalake,
- or by streaming a Python object's content to the datalake as the destination file.
Once uploaded, the Output is set to point to the file on the datalake (by its ID, name and group name).
```python
import northgravity as ng
import io
# Instantiate TaskHandler class
th = ng.TaskHandler()
# the first task uploads the file dataset.csv to My Final Group and passes the info about this file
# so that the next task can read this file by connecting its input to this output
th.upload_to_output_parameter(output_name='Output #1',
                              file='path/dataset.csv',
                              group_name='My Final Group',
                              file_upload_name=None,
                              file_type='SOURCE')
# Convert the python object to upload as BytesIO object
df_io = io.BytesIO(df.to_csv().encode())
# Stream to the datalake as the destination file
th.upload_to_output_parameter(output_name='Output #1',
                              file=df_io,
                              group_name='My Final Group',
                              file_upload_name='dataset.csv',
                              file_type='SOURCE')
```
If file_upload_name=None, the file saved on disk is uploaded with its original name.
If the file is streamed directly to the datalake, the file_upload_name argument must be set.
---
# Statuses
Sending statuses can be used to show the progress of the task execution in the application. Three levels are available:
- FINISHED (green),
- WARNING (orange),
- ERROR (red).
Sending statuses remains optional, as the North Gravity platform sends general statuses.
It is worth using only when the user needs to pass specific information in the status.
```python
import northgravity as ng
# Instantiate the Status Handler
sh = ng.StatusHandler()
# Generic status sender
sh.send_status(status='INFO', message='Crucial Information')
# there are pre-defined statuses
sh.info(message='Pipeline Finished Successfully')
sh.warn(message='Something suspicious is happening ...')
sh.error(message='Oops, the task failed ...')
```
Note that the info status informs the status service that the task executed successfully and is finished.
There is also a possibility to send a warning status with a custom warning message under some circumstances and immediately stop the execution of the pipeline.
```python
from northgravity.ExceptionHandler import PythonStepWarnException
i = 1
if i > 1:
    raise PythonStepWarnException(message='The value of i is bigger than 1! Stopping pipeline execution.')
```
---
# Example 1 - OOP
To simplify the use of the SDK methods in a script, they can be wrapped in the user's main class.
Below is an example of a class that has 3 methods:
- Download raw data (or take from the previous task)
- Process the data
- Upload the data to datalake and pass it to the next task
```python
import io
import northgravity as ng
import pandas as pd
class Runner:
    def __init__(self):
        # Compose the SDK Task Handler into the user's class
        self.handler = ng.TaskHandler()
        self.df = None

    def download_data(self):
        # the method from TaskHandler can be used directly
        # it downloads the file passed as input Dataset and saves it as data.csv
        self.handler.download_from_input_parameter(arg_name='Dataset', dest_file_name='data.csv', save=True)
        # Read as pandas dataframe
        return pd.read_csv("data.csv")

    def process_data(self, df):
        # any logic here that processes the downloaded dataset
        df_processed = df  # placeholder: replace with actual processing
        return df_processed

    def upload_data(self, df_processed):
        # Encode as BytesIO
        fileIO = io.BytesIO(df_processed.to_csv().encode())
        # pass the processed data csv file as the output of the task called Processed Dataset
        self.handler.upload_to_output_parameter(output_name='Processed Dataset', file=fileIO, group_name='Final Group')

    def run(self):
        df = self.download_data()
        df_processed = self.process_data(df)
        self.upload_data(df_processed)


if __name__ == '__main__':
    status = ng.StatusHandler()
    Runner().run()
    status.info('Test Pipeline Finished')
```
# Example 2 - functional programming
The SDK methods may also be used in simple functional-programming scripts.
Below is an example of a script that:
- Downloads the data from Input #1
- Processes the data
- Uploads the data to datalake and passes it to the next task
```python
####
# This part sets up local running for development and testing purposes.
# It is not needed when running on the North Gravity platform, since those variables
# are available in the platform's environment.
import os
# login and password to the northgravity application for authentication
os.environ['LOGIN'] = ''
os.environ['PASSWORD'] = ''
# filename and name of the DataLake group to download the data from
os.environ['Input #1'] = "{'name':'', 'groupName':''}"
# the DataLake group to save the file to
# on the platform defaults to the group where the pipeline running the script is saved
os.environ['NG_STATUS_GROUP_NAME'] = ''
# specify the api endpoint to northgravity application
os.environ['NG_API_ENDPOINT'] = ''
####
import os, sys
import io
import northgravity as ng
import pandas as pd
# logging
import logging
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
log = logging.getLogger()
# initialize taskhandler from the Python SDK to communicate between pipeline tasks
th = ng.TaskHandler()
# download data from Input #1 of the task (Python Runner)
df_io = th.download_from_input_parameter('Input #1')
df = pd.read_csv(df_io)
log.info('Data from Input #1 downloaded.')
# your code here
df_processed = your_processing_function(df)
# stream the dataframe as the Bytes IO file (does not save the csv to disc)
file_io = io.BytesIO(df_processed.to_csv(index=False).encode())
# upload the file to Output #1 of the task
th.upload_to_output_parameter(output_name='Output #1',
                              file=file_io,
                              # add your file name here
                              file_upload_name='',
                              # the group name defaults to the pipeline group
                              # you can also use it explicitly: group_name='Name of The Group on Datalake'
                              group_name=os.environ.get('NG_STATUS_GROUP_NAME'),
                              # the dataframe is saved as a flat file
                              # use file_type='NCSV' to save NCSV-type datasets into Timeseries database
                              file_type='SOURCE')
```
---
# SSL Verification Bypass
This package includes a feature to bypass SSL certificate verification for HTTP requests, intended for development or testing purposes in trusted environments. Activating it allows you to send requests without verifying the SSL certificate of the server you are connecting to, which can be useful when working with self-signed certificates or servers with certificate issues. To use this feature, pass the argument `verify_ssl=False` to the init function of the component you are using:
```python
import northgravity as ng
import os
os.environ['LOGIN'] = ''
os.environ['PASSWORD'] = ''
dh = ng.DatalakeHandler(verify_ssl=False)
th = ng.TaskHandler(verify_ssl=False)
ts = ng.Timeseries(verify_ssl=False)
```
**Warning:** Disabling SSL certificate verification poses a security risk by exposing your application to man-in-the-middle attacks. Therefore, it is strongly recommended to use this feature only in controlled and secure environments, and always ensure SSL verification is enabled in production.
Further documentation on bypassing SSL verification can be found [here.](https://requests.readthedocs.io/en/latest/user/advanced/)
---
## Who do I talk to? ##
* Admin: NorthGravity info@northgravity.com