# Locusts
Locusts is a Python package for distributing many small jobs on a system (which can be your machine or a remote HPC running SLURM).
## Installation
The Locusts package is currently hosted on the [PyPI](https://test.pypi.org) Test archive.
To install it, run
`python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps locusts`
Note: PyPI Test is not a permanent archive. Expect this installation procedure to change over time.
## How it works
Locusts is designed for users who need to run a **huge number of small, independent jobs** and who run into problems with the most widely used schedulers, which tend to scatter the jobs over too many nodes or queue them indefinitely.
Moreover, this package provides a **safe, clean environment for each job instance**, and keeps and collects notable inputs and outputs.
In short, Locusts creates a minimal filesystem where it prepares one environment for each job it has to execute. The runs are directed by a manager bash script, which schedules them and reports its own status and that of the jobs to the main Locusts routine, which always runs locally. Finally, Locusts checks for a set of compulsory output files and compiles a list of successes and failures.
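The final check described above can be pictured with a minimal sketch. This is not the Locusts implementation: the function name `check_outputs`, the one-folder-per-job layout, and the file names are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def check_outputs(work_dir, job_ids, required):
    """Split job ids into successes and failures.

    A job counts as successful only if every compulsory output file
    exists in its folder (hypothetical layout: <work_dir>/<job_id>/).
    """
    succeeded, failed = [], []
    for job_id in job_ids:
        job_dir = Path(work_dir) / str(job_id)
        if all((job_dir / name).is_file() for name in required):
            succeeded.append(job_id)
        else:
            failed.append(job_id)
    return succeeded, failed


# Small demonstration: job0 produced both files, job1 is missing one.
with tempfile.TemporaryDirectory() as tmp:
    produced = {"job0": ["out.txt", "log.txt"], "job1": ["log.txt"]}
    for job_id, names in produced.items():
        job_dir = Path(tmp) / job_id
        job_dir.mkdir()
        for name in names:
            (job_dir / name).write_text("done\n")
    ok, bad = check_outputs(tmp, ["job0", "job1"], ["out.txt", "log.txt"])

print("succeeded:", ok)
print("failed:", bad)
```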
### Modes
Locusts can help you distribute your jobs in one of these three situations:
* You want to run everything on your local machine (**local mode**)
* You want to submit jobs to an HPC (**remote mode**)
* You want to submit jobs to an HPC which shares a directory with your local machine (**remote-shared mode**)
### Environments
Once you give Locusts the set of inputs to consider and the command to execute, it creates the Generic Environment, a minimal filesystem composed of three folders:
* An **execution folder**, where the main manager scripts are placed and executed, and where execution cache files keep them updated on the progress of the individual jobs
* A **work folder**, where the specific inputs of each job are placed and where its outputs are written
* A **shared folder**, where common inputs must be placed if a group of different jobs needs to use them
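As a rough picture of the three-folder layout listed above, the sketch below builds such a structure in a temporary directory. The folder names `exec`, `work` and `shared` and the helper `make_generic_environment` are hypothetical; the names Locusts actually uses may differ.

```python
import tempfile
from pathlib import Path


def make_generic_environment(root):
    """Create the three folders of a minimal job filesystem under root.

    Folder names are placeholders chosen for this illustration.
    """
    root = Path(root)
    folders = {name: root / name for name in ("exec", "work", "shared")}
    for path in folders.values():
        path.mkdir(parents=True, exist_ok=True)
    return folders


with tempfile.TemporaryDirectory() as tmp:
    env = make_generic_environment(tmp)
    created = sorted(p.name for p in Path(tmp).iterdir())

print(created)
```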
Based on this architecture, Locusts provides two types of environments the user can choose from, depending on her needs:
#### Default Locusts Environment
![Locusts Default](./locusts-img/Locusts.001.jpeg)
If the user only needs to process a (possibly huge) number of files and get another (still huge) number of output files in return, this environment is the optimal choice: it allows for minimal data transfer and disk space usage, while each of the parallel runs takes place in a protected sub-environment. The desired output files and the corresponding logs are then collected and put in a folder designated by the user.
#### Custom Environment
![Locusts Custom](./locusts-img/Locusts.003.jpeg)
The user may nonetheless want to parallelize a program whose effects are more complex than taking in a set of input files and returning some outputs: for example, a program that moves files around a filesystem will not be able to run in the Default Locusts Environment. In these situations, the program needs access to a whole environment rather than to a set of input files.
Starting from this common base, there are two different environments that can be used:
* The Default Locusts Environment consists of one folder for each set of files needed to run one instance of the command
* The Custom Environment lets the user employ any other filesystem
## Tutorial
### Example 1: Running a script requiring input/output management (Default Environment)
You can find this example in the directory `tests/test_manager/`.
In `tests/test_manager/my_input_dir/` you will find 101 pairs of input files: `inputfile_#.txt` and `secondinputfile_#.txt`, where 0 <= # <= 100. Additionally, you will find a single file named `sharedfile.txt`.
The aim here is to execute this small script over the 101 sets of inputs:
```
sleep 1;
ls -lrth <inputfile> <secondinputfile> <sharedfile> > <outputfile>;
cat <inputfile> <secondinputfile> <sharedfile> > <secondoutputfile>
```
For each pair, the script takes in `inputfile_#.txt`, `secondinputfile_#.txt` (both vary from instance to instance) and `sharedfile.txt` (which always remains the same), and returns `ls_output_#.txt` and `cat_output_#.txt`. In order to mimic a longer process, the script is artificially made to last at least one second.
The file `tests/test_manager/test_manager.py` gives you an example (and also a template) of how you can submit a job on Locusts.
The function you want to call is `locusts.swarm.launch`, which takes several arguments.
Before describing them, let's look at the strategy used by Locusts: in essence, you give Locusts a template of the command you want to execute, and then you tell Locusts where to look for the files to execute that template with. In our case, the template is:
```
sleep 1;
ls -lrth inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 > ls_output_<id>.txt;
cat inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 > cat_output_<id>.txt
```
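To make the idea of a command template concrete, here is a minimal sketch of how the `<id>` placeholder could be expanded into one command per job. It only illustrates plain string substitution; the helper `expand_id` is hypothetical and is not how Locusts is known to do it internally. The `<shared>` placeholder is deliberately left untouched.

```python
# The command template from the text, written as a single string.
TEMPLATE = (
    "sleep 1; "
    "ls -lrth inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 "
    "> ls_output_<id>.txt; "
    "cat inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 "
    "> cat_output_<id>.txt"
)


def expand_id(template, job_id):
    """Replace every <id> placeholder with the given job identifier.

    Hypothetical helper for illustration; <shared> is not resolved here.
    """
    return template.replace("<id>", str(job_id))


# One command per input pair, for identifiers 0..100.
commands = [expand_id(TEMPLATE, i) for i in range(101)]

print(len(commands))
print(commands[7])
```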
Notice there are two handles that Locusts will know how to replace: `<id>` and `<shared>`. The `<id>` handle is there to specify the variable part of a filename (in our case, an integer in the [0,100] interval). The `<shared>` tag tells locust
* `indir` takes the location (absolute path, or path relative to where you are calling the script from) of the directory containing all your input files
* `outdir` takes the location (absolute path, or path relative to where you are calling the script from) of the directory where you want to collect your results
* `code` takes a unique codename for the job you want to launch
* `spcins` takes a list containing the template names for the
shdins=shared_inputs,
outs=outputs,
cmd=command_template,
parf=parameter_file
### Example 2: Running a script requiring input/output management (Default Environment)
You will find the material