sfrout

- Name: sfrout
- Version: 0.0.27
- Summary: Scalable, asynchronous SalesForce (SFDC) report downloader for SSO/SAML clients
- Upload time: 2023-04-07 12:51:40
- Requires Python: >=3.11
- Keywords: report, sfdc, salesforce

![SFrout Downloads](https://static.pepy.tech/badge/sfrout)
![License Badge](https://img.shields.io/pypi/l/sfrout.svg)
![Wheel Support Badge](https://img.shields.io/pypi/wheel/sfrout.svg)
![Python Version Support Badge](https://img.shields.io/pypi/pyversions/sfrout.svg)

# SFrout - Sales Force Report Downloader

## What is it?

**SFrout** is a scalable, asynchronous SalesForce report downloader for SAML/SSO clients. It lets you download reports by their ID using your personal SFDC account. It supports asynchronous requests, threaded file processing, and logging to a rotating file and stdout, and it produces a summary report for each session.

## Installation

SFrout requires Python 3.11 or newer.

- navigate to a convenient folder (optional)

```sh
cd c:/path/to/venv
```

- create a Python virtual environment (optional)

```sh
python -m venv c:/path/to/venv
```

- activate the virtual environment

for Windows:
```sh
c:\path\to\venv\Scripts\activate.bat
```

for Unix:
```sh
source /path/to/venv/bin/activate
```

- install sfrout from PyPI

```sh
pip install sfrout
```

## Usage

### Python interface

- create a Python file and paste in the script below. Amend the domain and the path to your reports file.

```python
import sfrout


sfrout.run(domain="https://corp.my.salesforce.com/", 
           reports_path="C:\\path\\to\\reports.csv")
```

- fill in the reports.csv file according to the template:

[Input file template](https://github.com/LukaszHoszowski/sfrout/blob/main/example/reports-default.csv)

- execute the script

```sh
python "c:/path/to/your/file.py"
```

- shortly afterwards you may be prompted to log in to SalesForce in MS Edge; the website will open automatically

- next, a progress bar will appear

![](https://github.com/LukaszHoszowski/sfrout/blob/main/docs/_static/progress_bar.png?raw=True)

- once finished, **SFrout** will print a summary table

![](https://github.com/LukaszHoszowski/sfrout/blob/main/docs/_static/summary.png?raw=True)

### CLI

- open a terminal

```sh
cmd.exe
```

- type in the command

```sh
sfrout "https://corp.my.salesforce.com/" "C:\\path\\to\\reports.csv"
```

- fill in the reports.csv file according to the template:

[Input file template](https://github.com/LukaszHoszowski/sfrout/blob/main/example/reports-default.csv)

The CLI lets you configure the following options:

```sh
Options:
  -s, --summary_filepath PATH  Path to the summary report ->
                               c:/summary_report.csv
  -r, --report TEXT            Run single report ->
                               "name,id,path,optional_report_params"
  -p, --path PATH              Override save location of the reports
  -t, --threads INTEGER        Number of threads to spawn  [default: 0]
  -ls, --stdout_loglevel TEXT  STDOUT logging level -> [DEBUG | INFO | WARN
                               |WARNING | ERROR | CRITICAL]  [default:
                               WARNING]
  -lf, --file_loglevel TEXT    File logging level -> [DEBUG | INFO | WARN|
                               WARNING | ERROR | CRITICAL]  [default: INFO]
  -v, --verbose                Turn off progress bar
  -h, --help                   Show this message and exit.
```

### Windows Task Scheduler

Create `sfrout.bat` with the following content:

```sh
"c:\path\to\new\virtual\environment\Scripts\sfrout.exe" "https://corp.my.salesforce.com/" "C:\\path\\to\\reports.csv"
pause
```

Test it; if it works, create a task in Task Scheduler and set a schedule.

## How the program works

Once you run the program, it goes through these steps:

1) configuration and data parsing
2) creation of the connector, report and shared queue objects
3) initialization of the worker listeners within the file handler
4) the connector checks the connection and executes all steps required to establish it
5) the connector sends asynchronous requests to the given domain
6) once a single request is fulfilled, the retrieved content is put into the queue
7) workers keep listening for items in the queue
8) once the queue is not empty, a worker takes the item and starts processing it
9) once all requests are fulfilled, the queue is closed and signals the workers to shut down after they finish their last job
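The steps above amount to an asynchronous producer feeding thread-based consumers through a shared queue. The following is a minimal, self-contained sketch of that pattern using only the standard library; `fetch`, `produce`, `worker`, `run_pipeline` and the sentinel object are illustrative names, not SFrout's internals, and `fetch` merely sleeps instead of calling SFDC.

```python
import asyncio
import queue
import threading

_DONE = object()  # sentinel meaning "no more items will arrive"


async def fetch(name: str) -> tuple[str, str]:
    """Stand-in for an asynchronous export request (step 5)."""
    await asyncio.sleep(0.01)
    return name, f"content of {name}"


async def produce(names: list[str], q: queue.Queue) -> None:
    # Steps 5-6: start all requests concurrently and enqueue each one as it completes.
    tasks = [asyncio.create_task(fetch(n)) for n in names]
    for next_done in asyncio.as_completed(tasks):
        q.put(await next_done)


def worker(q: queue.Queue, saved: list[str], lock: threading.Lock) -> None:
    # Steps 7-8: block on the queue and process items until the sentinel arrives.
    while True:
        item = q.get()
        if item is _DONE:
            break
        name, _content = item
        with lock:
            saved.append(name)  # a real worker would write _content to disk here


def run_pipeline(names: list[str], n_workers: int = 2) -> list[str]:
    q: queue.Queue = queue.Queue()  # unbounded, as described for SFrout's queue
    saved: list[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, saved, lock)) for _ in range(n_workers)
    ]
    for t in threads:
        t.start()  # step 3: workers listen before any request is sent
    asyncio.run(produce(names, q))
    for _ in threads:
        q.put(_DONE)  # step 9: one shutdown signal per worker
    for t in threads:
        t.join()  # workers finish their last job before exiting
    return saved
```

Because the sentinels are enqueued only after every result, FIFO ordering guarantees that all real items are consumed before any worker exits.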

## SFDC Connector

**Authentication:**

Authentication and authorization in SFDC rely on the `sid` session cookie for SSO, or on security tokens in other setups.

SFrout tries to read the CookieJar of your MS Edge browser and find the `sid` entry for the given domain. If `sid` is not found, the app opens MS Edge at the given SFDC domain and you log in as usual. After 20 seconds the program tries again to find `sid` in your CookieJar. Browsers usually store cookies in an SQLite database, and new cookies are not written to it immediately. Closing the browser forces the write, but that is not the most elegant solution; instead, SFrout checks for `sid` every 2 seconds until it becomes available, repeating the process as many times as needed.
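The `sid` lookup loop can be sketched as below. This is an illustration under assumptions: `lookup_sid` stands in for reading Edge's cookie store and `open_login_page` for launching the browser (for example via `webbrowser.open`, which uses the default browser rather than necessarily Edge); both are hypothetical callables supplied by the caller. Only the 20-second initial wait and the 2-second polling interval come from the text above.

```python
import time
from typing import Callable, Optional


def wait_for_sid(
    lookup_sid: Callable[[], Optional[str]],
    open_login_page: Callable[[], None],
    *,
    initial_wait: float = 20.0,
    poll_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return a `sid`, opening the login page and polling until one appears."""
    sid = lookup_sid()
    if sid:
        return sid  # an existing session cookie was found
    open_login_page()  # the user logs in through SSO as usual
    sleep(initial_wait)  # give the browser time to persist the cookie
    while True:
        sid = lookup_sid()
        if sid:
            return sid
        sleep(poll_interval)  # cookie not flushed to the store yet; try again
```

Injecting `sleep` keeps the loop testable without real waiting.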

**Sending requests:**

SFDC supports export GET requests: you add `?export=csv&enc=UTF-8&isdtp=p1` to the address of your report and supplement the request with headers and the `sid` entry described above. The response is a CSV-like data stream. The time window for the entire operation is fixed at **15 minutes**; if the response is not received within this window, the connection is forcibly shut down and the request is cancelled, regardless of its stage.
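A sketch of how such an export request could be assembled, assuming the report address is the domain followed by the report ID (the usual Salesforce Classic report URL form) and that `sid` is sent in a `Cookie` header. `build_export_request` and the sample report ID are hypothetical; the exact headers SFrout sends are not documented here.

```python
from urllib.parse import urljoin

EXPORT_QUERY = "export=csv&enc=UTF-8&isdtp=p1"  # query string quoted in the README


def build_export_request(domain: str, report_id: str, sid: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a CSV export of one report (hypothetical helper)."""
    base = domain if domain.endswith("/") else domain + "/"
    url = urljoin(base, report_id) + "?" + EXPORT_QUERY  # assumed form: <domain>/<report_id>
    headers = {"Cookie": f"sid={sid}"}  # session cookie obtained during authentication
    return url, headers


url, headers = build_export_request("https://corp.my.salesforce.com/", "00O000000000001", "abc")
```

The URL and headers can then be handed to any asynchronous HTTP client.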

Requests are sent asynchronously to speed things up and keep memory consumption to a bare minimum. If a request fails, whatever the cause, SFrout retries it, up to a limit of **20** attempts. Once a request succeeds, the response is saved in a Report object and put into the queue for further processing.
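The retry policy can be illustrated with `asyncio.wait_for`, which cancels an awaitable that exceeds its timeout. This sketch retries any exception up to 20 times with a 15-minute cap per attempt; `fetch_with_retry` is a hypothetical name, and whether SFrout applies the timeout per attempt or pauses between retries is not stated, so both are assumptions here.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 20          # retry limit quoted in the README
REQUEST_TIMEOUT = 15 * 60  # seconds; the README's 15-minute window (assumed per attempt)


async def fetch_with_retry(
    make_request: Callable[[], Awaitable[T]],
    attempts: int = MAX_ATTEMPTS,
    timeout: float = REQUEST_TIMEOUT,
) -> T:
    """Await a fresh request up to `attempts` times, cancelling any that time out."""
    last_exc: BaseException | None = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(make_request(), timeout)
        except Exception as exc:  # any failure, including TimeoutError, triggers a retry
            last_exc = exc
    raise RuntimeError(f"request failed after {attempts} attempts") from last_exc
```

`make_request` is a factory rather than a single coroutine because an awaited coroutine cannot be awaited again on the next attempt.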

## File handler

A thread-based solution for saving request responses to files.

The file handler spawns workers in separate threads. The number of workers equals half of the available threads on your machine (e.g. if your CPU has 6 cores and 12 threads, SFrout spawns 6 workers). If information about available resources cannot be obtained, it defaults to **2**. This approach avoids dramatically slowing down other applications on your computer while securing the resources SFrout needs. Each worker observes the `Queue`; when something is put into the `Queue`, one of the workers starts processing the report. Bear in mind that each save operation erases the response and the content of the report to limit memory consumption. The `Queue` size is unlimited, so sooner or later the workers handle the entire workload. Workers exit once the `Queue` signals that no new items are coming; workers that are still processing items finish their jobs and exit quietly.

All files are processed by `Pandas`, which offers a wide palette of output formats. Unfortunately, this flexibility comes at a price: in its current shape `Pandas` is not the best at saving `csv` files because of its `numpy` engine, and on top of that it is a relatively heavy library. I plan to switch to another processing engine in the future.

## Limitations

- **Caution!** SFrout deletes the last 5 lines of each response, because SFDC appends a footer to each data stream. This might be organization-specific and require your attention if you plan to use SFrout in your organization.

- by default, the number of workers equals half of the available threads on the machine

- by default, the rotating file log is limited to 3 parts of up to 1_000_000 bytes each

- the progress bar is based on the number of items and may show an incorrect ETA if report sizes vary significantly

- currently only saving to `csv` is available
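Two of the limitations above map onto small pieces of code. The sketch below, with hypothetical helper names, drops a fixed number of trailing lines from a response and configures a size-based rotating log with the standard `logging.handlers.RotatingFileHandler`. Reading "3 parts" as the active file plus two backups (`backupCount=2`) is an assumption.

```python
import logging
from logging.handlers import RotatingFileHandler

FOOTER_LINES = 5  # trailing lines the README says SFrout removes


def drop_footer(csv_text: str, footer_lines: int = FOOTER_LINES) -> str:
    """Remove the last `footer_lines` lines of an export (hypothetical helper)."""
    if footer_lines <= 0:
        return csv_text  # lines[:-0] would be empty, so handle 0 explicitly
    lines = csv_text.splitlines(keepends=True)
    return "".join(lines[:-footer_lines])


def make_rotating_handler(path: str) -> RotatingFileHandler:
    """Log file capped at 1_000_000 bytes per part, 3 parts in total (assumed reading)."""
    handler = RotatingFileHandler(
        path,
        maxBytes=1_000_000,
        backupCount=2,  # active file + 2 backups = 3 parts
        encoding="utf-8",
        delay=True,  # don't create the file until the first record is written
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    return handler
```

Because the footer length is hard-coded, a report whose footer differs in your organization would lose data rows or keep footer lines; checking one raw export by hand is a cheap safeguard.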

## Benchmarks

My testing set consists of 33 reports from various universes of SFDC, ranging in size from 200 KB to 200 MB, 1.4 GB of data in total.

Tests were not bound by network bandwidth. They were run on an i7-8850H with 32 GB of DDR4 RAM under Windows 10 x64.

Processing the testing set takes between 3 and 8 minutes; results strongly correlate with SFDC performance at the given time. Processing time also correlates with report size: bigger means longer.
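For a rough sense of scale, the figures above imply an average end-to-end rate of about 3 to 8 MB/s. The calculation below assumes decimal units (1.4 GB = 1,400 MB) and simply divides total volume by wall-clock time, so it ignores concurrency and per-report variation.

```python
TOTAL_MB = 1400  # 1.4 GB in decimal megabytes


def throughput_mb_per_s(total_mb: float, minutes: float) -> float:
    """Average rate if the whole set is spread evenly over the run time."""
    return total_mb / (minutes * 60)


for minutes in (3, 8):
    print(f"{minutes} min -> {throughput_mb_per_s(TOTAL_MB, minutes):.1f} MB/s")
# 3 min -> 7.8 MB/s
# 8 min -> 2.9 MB/s
```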

## Final remarks

This app was created for the environment of my organization. There is an alternative way of authenticating to SFDC based on security tokens; unfortunately, this option is blocked in my organization and only SSO is available.

## Documentation available on [Read the Docs](https://sfrout.readthedocs.io)

[![](https://github.com/LukaszHoszowski/sfrout/blob/main/docs/_static/rtd.png?raw=True)](https://sfrout.readthedocs.io)

## Release Notes

[Changelog](https://github.com/LukaszHoszowski/sfrout/blob/main/CHANGELOG.md)

## License

[Apache 2.0](https://github.com/LukaszHoszowski/sfrout/blob/main/LICENSE.md)

            

## Package metadata

- Author: Lukasz Hoszowski <lukasz.hoszowski@mail.com>
- Wheel: `sfrout-0.0.27-py3-none-any.whl` (23,912 bytes, uploaded 2023-04-07T12:51:39Z, SHA-256 `93146b8a8d8b7137ad959e18c90b9f01a7edb4a006e13e4990fecec3f5087854`)
- Source distribution: `sfrout-0.0.27.tar.gz` (109,704 bytes, uploaded 2023-04-07T12:51:40Z, SHA-256 `604c80e1f4bdd936e43863aef0a8674f5dfbd9bae81b5986a367abcc99c38b1e`)