# katapult

Version 0.6.22 | [Homepage](https://github.com/benbenz/katapult) | MIT license | Python >=3.9,<4.0
# Description

Katapult is a Python package that allows you to run any script on a cloud service (for now AWS only).

# Features

- Easily run scripts on AWS by writing a simple configuration file
- Handles Python and Julia scripts, or any command
- Handles PyPI, Conda/Mamba, apt-get and Julia environments
- Concurrent instance support
- Handles disconnections from instances, including stopped or terminated instances
- Handles interruption of Katapult, with state recovery
- Runs locally or on a remote instance, with 'watcher' functionality 

| Important Note |
| --- |
| Katapult helps you easily create instances on AWS so that you can focus on your scripts. It is important to realize that it can and **will likely generate extra costs**. If you want to minimize those costs, activate the `eco` mode in the configuration or make sure you monitor the resources created by Katapult. Those include: <ul><li>VPCs</li><li>Subnets</li><li>Security Groups</li><li>Instances</li><li>Device Mappings / Disks</li><li>Policies &amp; Roles</li><li>KeyPairs</li></ul>|
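For instance, here is a minimal boto3 sketch (not part of Katapult) that lists the EC2 instances visible to your account, so you can spot anything left running; adjust the profile and region to your setup:

```python
import boto3

# list the instances visible to your default AWS profile/region
ec2 = boto3.client("ec2")
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["InstanceType"], instance["State"]["Name"])
```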

# Pre-requisites

To use the Python AWS client (Boto3), you need an existing AWS account and a machine set up for AWS access.

## with AWS CLI

1. Go to [the AWS Signup page](https://portal.aws.amazon.com/billing/signup#/start/email) and create an account
2. Download [the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
3. In the AWS web console, [create a user with administrator privilege](https://docs.aws.amazon.com/streams/latest/dev/setting-up.html)
4. In the AWS web console, under the IAM section, open the new user and create an access key under the "Security Credentials" tab. Make sure "Console Password" is enabled as well
5. In the terminal, use the AWS CLI to set up your configuration:
```
aws configure
```
See [the configuration quickstart](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html)
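`aws configure` prompts for your access key pair, default region and output format, e.g.:

```
$ aws configure
AWS Access Key ID [None]: YOUR_ACCESS_KEY_ID
AWS Secret Access Key [None]: YOUR_SECRET_ACCESS_KEY
Default region name [None]: eu-west-3
Default output format [None]: json
```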

6. To run in `remote` mode, you may also need to [add the following permissions to your user](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) (see the sketch after this list):
- iam:PassRole
- iam:CreateRole
- ec2:AssociateIamInstanceProfile
- ec2:ReplaceIamInstanceProfileAssociation
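
As a sketch, these permissions could be granted as an inline policy via boto3; `USERNAME` and the policy name are placeholders, and the broad `"Resource": "*"` is shown only for brevity:

```python
import json
import boto3

# hypothetical sketch: attach the extra 'remote' mode permissions to your user
iam = boto3.client("iam")
iam.put_user_policy(
    UserName="USERNAME",
    PolicyName="katapult-remote-mode",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:CreateRole",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation",
            ],
            "Resource": "*",
        }],
    }),
)
```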

## manually

1. Go to [the AWS Signup page](https://portal.aws.amazon.com/billing/signup#/start/email) and create an account
2. In the AWS web console, [create a user with administrator privilege](https://docs.aws.amazon.com/streams/latest/dev/setting-up.html)
3. In the AWS web console, under the IAM section, click on the new user and make sure you create an access key under the tab "Security Credentials". Make sure "Console Password" is Enabled as well
4. Add your new user credentials manually, [in the credentials file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html)

##### '~/.aws/config' example ('C:\Users\USERNAME\\.aws\config' on Windows)

```
[default]
region = eu-west-3
output = json
```

##### '~/.aws/credentials' example ('C:\Users\USERNAME\\.aws\credentials' on Windows)

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```
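
To check that Boto3 picks up the default profile (it reads the same files), a quick sanity test:

```python
import boto3

# prints your AWS account id if the default credentials work
print(boto3.client("sts").get_caller_identity()["Account"])
```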

## Setting up a separate user with least permissions (manually) 

1. In the AWS web console, in the IAM service, create a group 'katapult-users' with `AmazonEC2FullAccess` and `IAMFullAccess` permissions
2. In the AWS web console, in the IAM service, create a user USERNAME attached to the 'katapult-users' group:
### Step 1
![add user 1](./images/adduser1.jpg)
### Step 2
![add user 2](./images/adduser2.jpg)
### ... Step 5
![add user 3](./images/adduser3.jpg)

  **Copy the access key info!**

3. Add your new user profile manually, [in the credentials file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html)

##### '~/.aws/config' example ('C:\Users\USERNAME\\.aws\config' on Windows)

```
[default]
region = eu-west-3
output = json

[profile katapult]
region = eu-west-3
output = json
```

##### '~/.aws/credentials' example ('C:\Users\USERNAME\\.aws\credentials' on Windows)

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

[katapult]
aws_access_key_id = YOUR_PROFILE_ACCESS_KEY_ID
aws_secret_access_key = YOUR_PROFILE_SECRET_ACCESS_KEY
```
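
The same sanity check works for the named profile; `katapult` below must match the section name in the credentials file:

```python
import boto3

# verify the 'katapult' profile resolves to the new restricted user
session = boto3.Session(profile_name="katapult")
print(session.client("sts").get_caller_identity()["Arn"])
```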

4. Add the `'profile' : 'katapult'` entry (matching the profile name above) to the configuration:

```python
config = {

    ################################################################################
    # GLOBALS
    ################################################################################

    'project'      : 'test' ,                             # this will be concatenated with the instance hashes (if not None) 
    'profile'      : 'katapult' ,
    ...
```


# Installation

## On macOS / Linux

### with pip

```bash
python3 -m venv .venv
source ./.venv/bin/activate
python3 -m ensurepip --default-pip
python3 -m pip install -r requirements.txt
```
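
The steps above install from a source checkout. Since katapult is published on PyPI, installing the released package into the virtualenv should also work:

```bash
python3 -m pip install katapult
```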

### with poetry

```bash
curl -sSL https://install.python-poetry.org | python3 -
poetry install
```

## On Windows (powershell)

### with pip

```powershell
PS C:\> python -m venv .venv
PS C:\> .venv\Scripts\Activate.ps1
PS C:\> python -m pip install -r requirements.txt
```

### with poetry

```powershell
PS C:\> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
PS C:\> poetry install
```

# Usage / Test runs

```bash
# copy the example file
cp examples/config.example.py config.py
#
# EDIT THE FILE
#

# to run with pip
python3 -m katapult.demo config
# to run with pip with reset (maestro and instances)
python3 -m katapult.demo config reset
# to run with poetry
poetry run demo config
# to run with poetry with reset (maestro and the instances)
poetry run demo config reset

# to run script flow test
poetry install -E scriptflow
cd examples/scriptflow/simple
# [!] EDIT THE PROFILE_NAME in sflow.py [!]
scriptflow run sleepit
```

# Configuration example

```python
config = {

    ################################################################################
    # GLOBALS
    ################################################################################

    'project'      : 'test' ,                             # this will be concatenated with the instance hashes (if not None) 
    'profile'      : None ,                               # if you want to use a specific profile (user/region), specify its name here
    'dev'          : False ,                              # When True, this will ensure the same instance and dev environment are being used (while working on building up the project)
    'debug'        : 1 ,                                  # debug level (0...3)
    'maestro'      : 'local' ,                            # where the 'maestro' resides: 'local' | 'remote' (micro instance)
    'auto_stop'    : True ,                               # will automatically stop the instances and the maestro, once the jobs are done
    'provider'     : 'aws' ,                              # the provider name ('aws' | 'azure' | ...)
    'job_assign'   : None ,                               # algorithm used for job assignment / task scheduling ('random' | 'multi_knapsack')
    'recover'      : True ,                               # if True, Katapult will always save the state and try to recover this state on the next execution
    'print_deploy' : False ,                              # if True, this will cause the deploy stage to print more (and lock)
    'mutualize_uploads' : True ,                          # adjusts the directory structure of the uploads ... (False = per job or True = global/mutualized)


    ################################################################################
    # INSTANCES / HARDWARE
    ################################################################################

    'instances' : [
        { 
            'region'       : None ,                       # can be None or must be a valid region; overrides the AWS user region configuration
            'cloud_id'     : None ,                       # can be None, or even wrong/non-existing - then the default one is used
            'img_id'       : 'ami-077fd75cd229c811b' ,    # OS image: has to be valid and available for the profile (user/region)
            'img_username' : 'ubuntu' ,                   # the SSH user for the image
            'type'         : 't2.micro' ,                 # proprietary size spec (has to be valid)
            'cpus'         : None ,                       # number of CPU cores
            'gpu'          : None ,                       # the proprietary type of the GPU 
            'disk_size'    : None ,                       # the disk size of this instance type (in GB)
            'disk_type'    : None ,                       # the proprietary disk type of this instance type: 'standard', 'io1', 'io2', 'st1', etc
            'eco'          : True ,                       # eco == True >> SPOT e.g.
            'eco_life'     : None ,                       # lifecycle of the machine in ECO mode (datetime.timedelta object) (can be None with eco = True)
            'max_bid'      : None ,                       # max bid ($/hour) (can be None with eco = True)
            'number'       : 1 ,                          # multiplicity: the number of instance(s) to create
            'explode'      : True                         # multiplicity: can this instance type be distributed across multiple instances, to split CPUs
        }

    ] ,

    ################################################################################
    # ENVIRONMENTS / SOFTWARE
    ################################################################################

    'environments' : [
        {
            'name'         : None ,                       # name of the environment - should be unique if not 'None'. 'None' only when len(environments)==1

            # env_conda + env_pypi  : mamba is used to setup the env (pip dependencies included)
            # env_conda (only)      : mamba is used to setup the env
            # env_pypi  (only)      : venv + pip is used to setup the env 

            'command'      : 'examples/install_julia.sh' ,      # None, or a string: path to a bash file to execute when deploying
            'env_aptget'   : [ "openssh-client"] ,        # None, or an array of libraries/binaries for apt-get
            'env_conda'    : "examples/environment.yml",   # None, an array of libraries, a path to an environment.yml file, or a path to the root of a conda environment
            'env_conda_channels' : None ,                 # None, an array of channels. If None (or absent), defaults and conda-forge will be used
            'env_pypi'     : "examples/requirements.txt" , # None, an array of libraries, a path to requirements.txt file, or a path to the root of a venv environment 
            'env_julia'    : [ "Wavelets" ] ,             # None, a string or an array of Julia packages to install (requires julia)
        }
    ] ,

    ################################################################################
    # JOBS / SCRIPTS
    ################################################################################

    'jobs' : [
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'examples/run_remote.py 1 10',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'run_command'  : None ,                       # the command to run
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
            'repeat'       : 2 ,                          # the number of times this job is repeated
        } ,
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'examples/run_remote.py 2 12',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'run_command'  : None ,                       # the command to run
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
        }
    ]
}
```
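
As a quick sanity check before launching anything, here is a small sketch (assuming `repeat` is the total number of runs for a job, defaulting to one) that counts the processes a config implies:

```python
# hypothetical helper: count the runs implied by a Katapult config dict
def count_runs(config):
    return sum(job.get('repeat', 1) for job in config['jobs'])

config = __import__('config').config   # loaded as in the demo below
print(count_runs(config))              # 3 for the example above (2 + 1)
```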

# Minimum configuration example

```python
config = {
    'debug'        : 1 ,                                  # debug level (0...3)
    'maestro'      : 'local' ,                            # where the 'maestro' resides: 'local' | 'remote' (nano instance) | 'lambda'
    'provider'     : 'aws' ,                              # the provider name ('aws' | 'azure' | ...)

    'instances' : [
        { 
            'type'         : 't2.micro' ,                 # proprietary size spec (has to be valid)
        }

    ] ,

    'environments' : [
        {
            'name'         : None ,                       # name of the environment - should be unique if not 'None'. 'None' only when len(environments)==1
            'env_conda'    : "examples/environment.yml",   # None, an array of libraries, a path to an environment.yml file, or a path to the root of a conda environment
            'env_julia'    : ["Wavelets"] ,                       # None, a string or an array of Julia packages to install (requires julia)
        }
    ] ,

    'jobs' : [
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'examples/run_remote.py 1 10',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
        } ,
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'examples/run_remote.py 2 12',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
        }
    ]
}
```


# Python API

```python
class KatapultLightProvider(ABC):
class KatapultFatProvider(ABC):

    def debug(self,level,*args,**kwargs):

    # start the provider: creates the instances
    # if reset = True, Katapult forces a process cleanup as well as more re-uploads
    def start(self,reset):

    # deploy all materials (environments, files, scripts etc.)
    def deploy(self):

    # run the jobs
    # returns a KatapultRunSession
    def run(self,wait=False):

    # wait for the processes to reach a state
    def wait(self,job_state,run_session=None):

    # get the states of the processes
    def get_jobs_states(self,run_session=None):

    # print a summary of processes
    def print_jobs_summary(self,run_session=None,instance=None):

    # print the aborted logs, if any
    def print_aborted_logs(self,run_session=None,instance=None):

    # fetch results data
    def fetch_results(self,out_directory=None,run_session=None):

    # wait for the watcher process to be completely done (useful for demo)
    def finalize(self):

    # wakeup = start + assign + deploy + run + watch
    def wakeup(self):

    @abstractmethod
    def get_region(self):

    @abstractmethod
    def get_recommended_cpus(self,inst_cfg):

    @abstractmethod
    def create_instance_objects(self,config,for_maestro):

    @abstractmethod
    def find_instance(self,config):

    @abstractmethod
    def start_instance(self,instance):

    @abstractmethod
    def stop_instance(self,instance):

    @abstractmethod
    def terminate_instance(self,instance):

    @abstractmethod
    def update_instance_info(self,instance):    

# GLOBAL methods 

def get_client(provider='aws',maestro='local'):
```

# Katapult usage

## Python programmatic use

Note: this demo works the same way, whether Katapult runs locally or remotely

```python
import asyncio

from katapult      import provider as katapult
from katapult.core import KatapultProcessState

async def main():

    # load the config module
    config = __import__('config').config

    # create the provider: this loads the config
    provider = katapult.get_client(config)

    # start the provider: this attempts to create the instances
    await provider.start()

    # deploy the necessary material onto the instances
    await provider.deploy()

    # run the jobs and get the run session back
    run_session = await provider.run()

    # wait for the active processes to be done or aborted
    await provider.wait(KatapultProcessState.DONE|KatapultProcessState.ABORTED)

    # you can get the states of all jobs this way:
    await provider.get_jobs_states()
    # or get the states for a specific run session:
    await provider.get_jobs_states(run_session)

    # you can print a summary of the processes with:
    await provider.print_jobs_summary()

    # fetch the result files locally
    await provider.fetch_results('./tmp')

asyncio.run(main())
```

## CLI use

Note: the commands below work the same way, whether Katapult runs locally or remotely

### with Poetry

```bash
# init the client with global params and add instances, envs and jobs (if any)
poetry run cli init config.py
# add more jobs
poetry run cli cfg_add_jobs config_jobs.py
# add more stuff
poetry run cli cfg_add_config config_more.py
# deploy the material onto the instances
poetry run cli deploy
# run the jobs
poetry run cli run
# wait for the jobs to be done
poetry run cli wait
# get the results
poetry run cli fetch_results
# shutdown the daemon
poetry run cli shutdown
```

### with virtualenv

```bash
# init the client with global params and add instances, envs and jobs (if any)
python3 -m katapult.cli init config.py
# add more jobs
python3 -m katapult.cli cfg_add_jobs config_jobs.py
# add more stuff
python3 -m katapult.cli cfg_add_config config_more.py
# deploy the material onto the instances
python3 -m katapult.cli deploy
# run the jobs
python3 -m katapult.cli run
# wait for the jobs to be done
python3 -m katapult.cli wait
# get the results
python3 -m katapult.cli fetch_results
# shutdown the daemon
python3 -m katapult.cli shutdown
```

# Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

# License
[MIT](https://choosealicense.com/licenses/mit/)

            
