secdaily


- Name: secdaily
- Version: 0.2.1
- Summary: A tool that replicates the quarterly Financial Statement Datasets from the SEC (https://www.sec.gov/dera/data/financial-statement-data-sets), but on a daily basis.
- Upload time: 2025-07-13 05:18:29
- Maintainer: Hansjoerg Wingeier
- Author: Hansjoerg
- Requires Python: >=3.10
- License: Apache-2.0
- Keywords: sec.gov, sec edgar, sec filing, edgar, finance, cik, 10-q, 10-k, financial statements, financial statements dataset, financial analysis, data processing, financial data, sec api, xbrl

# SEC Financial Statement Data Set Daily Processing

[![PyPI version](https://badge.fury.io/py/secdaily.svg)](https://badge.fury.io/py/secdaily)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

## Purpose

The `secdaily` package replicates the quarterly [Financial Statement Datasets](https://www.sec.gov/dera/data/financial-statement-data-sets) from the SEC, but on a daily basis. While the SEC only provides these datasets once per quarter, this tool allows you to:

- Add daily updates by processing new 10-K and 10-Q filings as they become available
- Generate daily zip files in the same format as the official quarterly datasets

This enables financial analysts, researchers, and developers to access structured financial statement data without waiting for the quarterly releases.

## Installation

The package requires Python 3.10 or higher. Install using pip:

```bash
pip install secdaily
```

## Usage

The main entry point is the `SecDailyOrchestrator` class. Here's a basic example:

```python
from secdaily.SecDaily import SecDailyOrchestrator, Configuration

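# Working directory for all downloaded and generated data.
# "./secdaily_data/" is only an example path; choose your own.
workdir_default = "./secdaily_data/"

# Create the configuration
configuration = Configuration(workdir=workdir_default)
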
# Initialize the orchestrator
orchestrator = SecDailyOrchestrator(configuration=configuration)

# Run the full process
orchestrator.process(
    start_year=2025,  # Optional: specify starting year (defaults to current year)
    start_qrtr=1      # Optional: specify starting quarter (defaults to current quarter)
)
```

### Configuration Parameters

The `Configuration` class accepts the following parameters:

- `user_agent_def`: User agent string for SEC.gov requests. If not provided, a default string will be generated. Must follow the format specified in [SEC's EDGAR access requirements](https://www.sec.gov/os/accessing-edgar-data): "Company Name contact@company.com"
- `workdir`: Working directory for storing all data. Defaults to current directory.
- `xmldir`: Directory for storing XML files. If not provided, defaults to '_1_xml/' under workdir.
- `csvdir`: Directory for storing CSV files. If not provided, defaults to '_2_csv/' under workdir.
- `formatdir`: Directory for storing SEC-style formatted files. If not provided, defaults to '_3_secstyle/' under workdir.
- `dailyzipdir`: Directory for storing daily zip files. If not provided, defaults to '_4_daily/' under workdir.
- `quarterzipdir`: Directory for storing quarterly zip files. If not provided, defaults to '_5_quarter/' under workdir.
- `clean_intermediate_files`: Flag to clean up intermediate files during housekeeping. Defaults to False.
- `clean_db_entries`: Flag to clean up database entries during housekeeping. Defaults to False.
- `clean_daily_zip_files`: Flag to clean up daily zip files during housekeeping. Defaults to False.
- `clean_quarter_zip_files`: Flag to clean up quarterly zip files during housekeeping. Defaults to False.
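
The directory defaults listed above can be illustrated without the package itself. The following is a minimal sketch, not the package's implementation: `resolve_dirs` and `DEFAULT_SUBDIRS` are hypothetical names that only mirror the documented default sub-directories under `workdir`.

```python
import os

# Default sub-directory names as documented in the parameter list.
DEFAULT_SUBDIRS = {
    "xmldir": "_1_xml/",
    "csvdir": "_2_csv/",
    "formatdir": "_3_secstyle/",
    "dailyzipdir": "_4_daily/",
    "quarterzipdir": "_5_quarter/",
}


def resolve_dirs(workdir: str = "./", **overrides: str) -> dict[str, str]:
    """Hypothetical helper: return the directory used for each parameter.

    Explicitly passed directories win; every other parameter falls back
    to its documented default below workdir.
    """
    resolved = {}
    for name, subdir in DEFAULT_SUBDIRS.items():
        resolved[name] = overrides.get(name) or os.path.join(workdir, subdir)
    return resolved
```

For example, `resolve_dirs("data", csvdir="/tmp/csv/")` keeps the four other directories under `data` and only redirects the CSV files.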


### How to use it
Normally, you will use the original quarterly files from the SEC [Financial Statement Datasets](https://www.sec.gov/dera/data/financial-statement-data-sets) as a starting point. You therefore set the `start_year` and `start_qrtr` parameters to the first quarter that is not yet available from the SEC. For example, if the quarterly files up to 2024Q4 are available on the SEC site, set `start_year` to 2025 and `start_qrtr` to 1 to download and process the daily XML files and transform them into the same format as the SEC quarterly files.

The quarterly zip file from the SEC is usually available two to three weeks after the end of the quarter.

As soon as a new quarterly zip file is available from the SEC, you can move the `start_year` and `start_qrtr` parameters to the next quarter. Depending on the configuration, intermediate files, database entries, and zip files for earlier quarters can then be cleaned up.

Since reports are filed with the SEC every day, run the process daily to stay up to date with the latest available reports.
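
Choosing the start quarter is simple calendar arithmetic. As a small sketch (the helper names `quarter_of` and `next_quarter` are illustrative and not part of `secdaily`), the values for `start_year` and `start_qrtr` can be derived from the last quarter the SEC has already published:

```python
from datetime import date


def quarter_of(day: date) -> tuple[int, int]:
    """Return (year, quarter) for a calendar date."""
    return day.year, (day.month - 1) // 3 + 1


def next_quarter(year: int, qrtr: int) -> tuple[int, int]:
    """Return the quarter that follows (year, qrtr)."""
    return (year + 1, 1) if qrtr == 4 else (year, qrtr + 1)


# Example: the SEC has published everything up to and including 2024Q4,
# so daily processing should start with the following quarter.
start_year, start_qrtr = next_quarter(2024, 4)
print(start_year, start_qrtr)  # 2025 1
```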


## High-level Process Description

1. **Index Processing**: Parse SEC's index.json to identify new filings
2. **XML Processing**: Download and extract necessary XML files
3. **Data Parsing**: Process the XML files into CSV format (creating initial versions of `num.txt`, `pre.txt`, `lab.txt`)
4. **SEC-style Formatting**: Format the data to match the official SEC dataset structure
5. **Daily Zip Creation**: Package the formatted data into daily zip files
6. **Quarterly Zip Creation**: Package the daily zip files into quarterly zip files
7. **Housekeeping**: Clean up intermediate files, database entries, and zip files based on the provided configuration


### Individual Process Steps

You can also run individual parts of the process:

```python
# Only process index data
orchestrator.process_index_data()

# Only process XML data
orchestrator.process_xml_data()

# Only create SEC-style formatted files
orchestrator.create_sec_style()

# Only create daily zip files
orchestrator.create_daily_zip()

# Only create quarter zip files
orchestrator.create_quarter_zip()

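# Only perform housekeeping: cleans up everything before the start quarter.
# Note: QuarterInfo must be imported from the secdaily package first
# (the import is not shown in this example).
orchestrator.housekeeping(start_qrtr_info=QuarterInfo(year=2025, qrtr=1))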
```

## Directory Structure of the Created Data

The tool creates the following directory structure in your specified `workdir`:

```
workdir/
├── sec_processing.db          # SQLite database for tracking processing
├── _1_xml/                    # Downloaded XML files
│   ├── 2024q4/  
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_htm.xml.zip
│   │   │   ├── xyz_pre.xml.zip
│   │   │   ├── xyz_lab.xml.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...                    
├── _2_csv/                    # Parsed CSV files
│   ├── 2024q4/  
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_num.csv.zip
│   │   │   ├── xyz_pre.csv.zip
│   │   │   ├── xyz_lab.csv.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...                    
├── _3_secstyle/               # SEC-style formatted files
│   ├── 2024q4/  
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_num.csv.zip
│   │   │   ├── xyz_pre.csv.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...                    
├── _4_daily/                  # Daily zip files
│   ├── 2024q4/                
│   │   ├── 20241001.zip       
│   │   ├── 20241002.zip
│   │   └── ...
│   └── ...
└── _5_quarter/                # Quarterly zip files
    ├── 2024q4.zip
    ├── 2025q1.zip
    └── ...
```
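
Judging from the tree above, daily zip files appear to be named after the date (`YYYYMMDD.zip`) and grouped in a `<year>q<quarter>` folder. Under that assumption, the expected location of a daily zip can be computed as follows; `daily_zip_path` is a hypothetical helper, not a function of the package:

```python
from datetime import date
from pathlib import PurePosixPath


def daily_zip_path(workdir: str, day: date) -> PurePosixPath:
    """Build the assumed path of the daily zip file for a given date."""
    # Quarter folder name as shown in the tree, e.g. "2024q4".
    quarter = f"{day.year}q{(day.month - 1) // 3 + 1}"
    return PurePosixPath(workdir) / "_4_daily" / quarter / f"{day:%Y%m%d}.zip"


print(daily_zip_path("workdir", date(2024, 10, 1)))
# workdir/_4_daily/2024q4/20241001.zip
```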

Each daily and quarterly zip file contains:
- `sub.txt` - Submission information
- `pre.txt` - Presentation information
- `num.txt` - Numeric data
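
To inspect one of these files, the standard library is sufficient. The sketch below assumes that the files are tab-separated with a header row, as in the official quarterly datasets; `read_table` is an illustrative helper name and the zip path in the usage note is an example.

```python
import csv
import io
import zipfile


def read_table(zip_path: str, member: str) -> list[dict[str, str]]:
    """Read one member (e.g. "num.txt") of a zip file into a list of row dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps quote characters in text fields as they are.
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            return list(reader)
```

A call such as `read_table("workdir/_4_daily/2024q4/20241001.zip", "num.txt")` would then return one dictionary per row, keyed by the column names in the header.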

## Limitations

- `num.txt` doesn't contain content for the segments column
- XBRL data embedded in HTML files (approximately 20% of reports) is not processed yet
- Numbering of columns "report" and "line" in `pre.txt` may not be the same as in the quarterly files, but the order should be the same
- The tool throttles requests to SEC.gov to comply with their limit of 10 requests per second
- `sub.txt` only contains a subset of the information available in the quarterly files from sec.gov
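
The request limit mentioned above can be respected by spacing calls evenly. The following is a minimal sketch of such a throttle, not the mechanism used inside `secdaily`; the class name `Throttle` is an assumption made for illustration.

```python
import time


class Throttle:
    """Allow at most max_per_second calls per second by spacing them evenly."""

    def __init__(self, max_per_second: float = 10.0) -> None:
        self.min_interval = 1.0 / max_per_second
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the next slot."""
        now = time.monotonic()
        if now < self._next_allowed:
            time.sleep(self._next_allowed - now)
            now = time.monotonic()
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
```

Calling `throttle.wait()` before every request keeps a single-threaded downloader at or below the configured rate; a multi-threaded downloader would additionally need a lock around `wait()`.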

## Robustness Features

- Implements retry mechanisms for failed downloads
- Uses a SQLite database to track processing state, allowing for safe restarts
- Throttles requests to comply with SEC.gov's rate limits
- Stores downloaded and created files in a compressed format to conserve disk space
- Uses parallel processing where appropriate for improved performance
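
A retry mechanism for failed downloads can be as simple as the following sketch. It illustrates the general technique (retry with exponential backoff) and is not taken from the package; `with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call action(); on an exception wait and try again, up to attempts times."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

In practice, `action` would wrap a single download, and the caught exception type would be narrowed to network errors instead of `Exception`.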

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## SEC Financial Statement Data Sets Tools (secfsdstools)
Also check out the [SEC Financial Statement Data Sets Tools](https://github.com/HansjoergW/secfsdstools) project.

## Links

- [Documentation](https://hansjoergw.github.io/sec-financial-statement-data-set-daily-processing/)
- [GitHub Repository](https://github.com/HansjoergW/sec-financial-statement-data-set-daily-processing)
- [Issue Tracker](https://github.com/HansjoergW/sec-financial-statement-data-set-daily-processing/issues)
- [Discussions](https://github.com/HansjoergW/sec-financial-statement-data-set-daily-processing/discussions)
- [Changelog](https://github.com/HansjoergW/sec-financial-statement-data-set-daily-processing/blob/main/CHANGELOG.md)

            
