# `edgar-analyzer` - Textual Analysis with EDGAR filings
`edgar-analyzer` is a CLI tool to download SEC filings from EDGAR and perform textual analyses.
## Installation
```bash
pip install edgar-analyzer
```
## Workflow
### Setup
**Download index files**, which contain the firm CIK, name, filing date, type, and URL of the filing.
```bash
edgar-analyzer download_index --user_agent "MyCompany name@mycompany.com" --output "./index"
```
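To make the index contents concrete, here is a minimal sketch of parsing one of EDGAR's pipe-delimited quarterly `master.idx` files into the fields listed above. It assumes the `master.idx` layout (`CIK|Company Name|Form Type|Date Filed|Filename`); `edgar-analyzer` may read a different index variant, and `IndexEntry`, `parse_master_index`, and the sample row are hypothetical names and made-up data used only for illustration.

```python
from typing import Iterator, NamedTuple

ARCHIVES = "https://www.sec.gov/Archives/"


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    url: str


def parse_master_index(text: str) -> Iterator[IndexEntry]:
    """Yield one entry per data row of a pipe-delimited master index."""
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Data rows have exactly five fields and start with a numeric CIK.
        # This skips the free-text preamble, the column header, and the rule line.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, form_type, date_filed, filename = parts
        yield IndexEntry(cik, company, form_type, date_filed, ARCHIVES + filename)


# Made-up sample content in the master.idx layout (not a real filing).
sample = """\
Description:           Master Index of EDGAR Dissemination Feed
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1234567|EXAMPLE CORP|8-K|2022-01-03|edgar/data/1234567/0001234567-22-000001.txt
"""

entries = list(parse_master_index(sample))
for entry in entries:
    print(entry.cik, entry.form_type, entry.date_filed, entry.url)
```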
**Build a database** of the previously downloaded index files for more efficient queries.
```bash
edgar-analyzer build_database --inputdir "./index" --database "edgar-idx.sqlite3"
```
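As a rough illustration of why a database helps, the sketch below loads index rows into SQLite and queries them by filing type. The table name `filing_index`, its columns, and the sample rows are assumptions made for this example; the schema that `build_database` actually creates may differ.

```python
import sqlite3

# Made-up index rows: (cik, company, form_type, date_filed, filename).
rows = [
    ("1234567", "EXAMPLE CORP", "8-K", "2022-01-03",
     "edgar/data/1234567/0001234567-22-000001.txt"),
    ("7654321", "SAMPLE INC", "10-K", "2022-02-15",
     "edgar/data/7654321/0007654321-22-000002.txt"),
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS filing_index (
        cik        TEXT NOT NULL,
        company    TEXT NOT NULL,
        form_type  TEXT NOT NULL,
        date_filed TEXT NOT NULL,
        filename   TEXT PRIMARY KEY
    )
    """
)
# INSERT OR IGNORE makes re-running the load on the same index files harmless.
conn.executemany("INSERT OR IGNORE INTO filing_index VALUES (?, ?, ?, ?, ?)", rows)
# An index on the filter columns keeps per-form-type queries fast on large tables.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_form_date ON filing_index (form_type, date_filed)"
)
conn.commit()

eight_ks = conn.execute(
    "SELECT cik, date_filed FROM filing_index WHERE form_type = ? ORDER BY date_filed",
    ("8-K",),
).fetchall()
print(eight_ks)
```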
**Download filings**. Only filings that are in the database but not yet downloaded will be downloaded. Download speed is throttled automatically to comply with the SEC's fair access policy.
```bash
edgar-analyzer download_filings --user_agent "MyCompany name@mycompany.com" --output "./output" --database "edgar-idx.sqlite3" --file_type "8-K" -t 4
```
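The two behaviours described above, skipping filings that already exist locally and throttling requests, can be sketched as follows. This is not the tool's implementation: `Throttle` and `pending_filings` are hypothetical helpers, the sketch is single-threaded, and the default of 10 requests per second reflects the SEC's published request limit rather than whatever rate `edgar-analyzer` uses internally.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(
        self,
        max_per_second: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def pending_filings(filenames: Iterable[str], output_dir: Path) -> List[str]:
    """Return the index filenames that have no local copy in output_dir yet."""
    return [
        name for name in filenames
        if not (output_dir / Path(name).name).exists()
    ]
```

A download loop would call `throttle.wait()` before each request and iterate only over `pending_filings(...)`, so an interrupted run can be resumed without fetching the same filing twice.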
### Run specific jobs
These tasks can be executed once the database of filings is built.
#### Find event date
```bash
❯ edgar-analyzer find_event_date -h
usage: edgar-analyzer [OPTION]... find_event_date [-h] -d data_directory --file_type file_type [-db databsae] [-t threads]

Find event date from filings from header data

options:
  -h, --help            show this help message and exit
  -t threads, --threads threads
                        number of processes to use

required named arguments:
  -d data_directory, --data_dir data_directory
                        directory of filings
  --file_type file_type
                        type of filing
  -db databsae, --database databsae
                        sqlite database to store results
```
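For intuition about what "event date from header data" can mean, the sketch below reads two fields from the SGML header of an EDGAR full-submission text file: `CONFORMED PERIOD OF REPORT` (for an 8-K, the date of the reported event) and `FILED AS OF DATE`. Treating these two fields as the event and filing dates is an assumption of this example, and `filing_lag_days` and the sample header are hypothetical; the job's actual extraction logic may differ.

```python
import re
from datetime import date, datetime
from typing import Optional, Pattern

_PERIOD = re.compile(r"^CONFORMED PERIOD OF REPORT:\s*(\d{8})\s*$", re.MULTILINE)
_FILED = re.compile(r"^FILED AS OF DATE:\s*(\d{8})\s*$", re.MULTILINE)


def _header_date(pattern: Pattern[str], header: str) -> Optional[date]:
    """Return the YYYYMMDD value matched by pattern as a date, or None."""
    match = pattern.search(header)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def filing_lag_days(header: str) -> Optional[int]:
    """Filing date minus event date, in calendar days."""
    event = _header_date(_PERIOD, header)
    filed = _header_date(_FILED, header)
    if event is None or filed is None:
        return None
    return (filed - event).days


# Made-up header fragment in the EDGAR header layout (not a real filing).
sample_header = (
    "ACCESSION NUMBER:\t\t0001234567-22-000001\n"
    "CONFORMED SUBMISSION TYPE:\t8-K\n"
    "CONFORMED PERIOD OF REPORT:\t20211230\n"
    "FILED AS OF DATE:\t\t20220103\n"
)
lag = filing_lag_days(sample_header)
print(lag)
```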
#### Find reported items
```bash
❯ edgar-analyzer find_reported_items -h
usage: edgar-analyzer [OPTION]... find_reported_items [-h] -d data_directory --file_type file_type [-db databsae] [-t threads]

Find reported items from filings from header data

options:
  -h, --help            show this help message and exit
  -t threads, --threads threads
                        number of processes to use

required named arguments:
  -d data_directory, --data_dir data_directory
                        directory of filings
  --file_type file_type
                        type of filing
  -db databsae, --database databsae
                        sqlite database to store results
```
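As an illustration of reading reported items from header data, 8-K headers in EDGAR full-submission text files carry one `ITEM INFORMATION:` line per reported item. The sketch below collects those lines; `reported_items` and the sample header are hypothetical, and whether the job relies on exactly this field is an assumption.

```python
import re
from typing import List

_ITEM = re.compile(r"^ITEM INFORMATION:[ \t]*(.+)$", re.MULTILINE)


def reported_items(header: str) -> List[str]:
    """Return the item descriptions listed in a filing header, in order."""
    # strip() also removes a trailing carriage return from CRLF-terminated files.
    return [item.strip() for item in _ITEM.findall(header)]


# Made-up header fragment in the EDGAR header layout (not a real filing).
sample_header = (
    "CONFORMED SUBMISSION TYPE:\t8-K\n"
    "CONFORMED PERIOD OF REPORT:\t20211230\n"
    "ITEM INFORMATION:\t\tResults of Operations and Financial Condition\n"
    "ITEM INFORMATION:\t\tFinancial Statements and Exhibits\n"
    "FILED AS OF DATE:\t\t20220103\n"
)
items = reported_items(sample_header)
print(items)
```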
#### More to be integrated
## Example
A simple example using the `find_event_date` job: based on 1,491,368 8-K filings (2004-2022), the table below shows the reporting lag (filing date minus event date).
The vast majority of filings (98.57%) are filed on the same day as the reported event, and over 99.99% are filed within 4 calendar days (the SEC requires filing within 4 business days).
| Filing lag (calendar days) | Frequency | Percentage | Cumulative |
| ---------------------------- | --------- | ---------- | ---------- |
| 0 | 1470089 | 98.57% | 98.57% |
| 1 | 20761 | 1.39% | 99.97% |
| 2 | 285 | 0.02% | 99.98% |
| 3 | 89 | 0.01% | 99.99% |
| 4 | 47 | 0.00% | 99.99% |
| 5 | 26 | 0.00% | 100.00% |
| 6 | 14 | 0.00% | 100.00% |
| 7 | 6 | 0.00% | 100.00% |
| 8 | 4 | 0.00% | 100.00% |
| 9 | 3 | 0.00% | 100.00% |
| 10 or more | 44 | 0.00% | 100.00% |
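The percentage and cumulative columns follow directly from the frequency column. The short sketch below recomputes them from the counts in the table; only the counts come from the table, and the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies copied from the table above, keyed by filing lag in calendar days.
counts = {
    "0": 1470089, "1": 20761, "2": 285, "3": 89, "4": 47, "5": 26,
    "6": 14, "7": 6, "8": 4, "9": 3, "10 or more": 44,
}

total = sum(counts.values())
running = list(accumulate(counts.values()))

# Each row: (lag, frequency, percentage of total, cumulative percentage).
rows = [
    (lag, n, 100 * n / total, 100 * cum / total)
    for (lag, n), cum in zip(counts.items(), running)
]

for lag, n, pct, cum_pct in rows:
    print(f"{lag:>10} {n:>9} {pct:6.2f}% {cum_pct:7.2f}%")
```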
## Note
This tool is a work in progress and breaking changes may be expected.
## Contact
If you find any issues, please feel free to contact me at [mingze.gao@sydney.edu.au](mailto:mingze.gao@sydney.edu.au).