# SEC DATA DOWNLOADER
```bash
pip install llama-index-readers-sec-filings
```
Please check out the SEC Question Answering Agent repo that I am building: [SEC-QA](https://github.com/Athe-kunal/SEC-QA-Agent)
This repository downloads all the text from SEC documents (10-K and 10-Q). Currently, it does not support amended documents, but support will be added in the near future.
Install the required dependencies:
```
pip install -r requirements.txt
```
The SEC Downloader expects 5 attributes:
- tickers: A list of valid ticker symbols.
- amount: The number of documents to download.
- filing_type: The filing type, either 10-K or 10-Q.
- num_workers: The number of workers used for multithreading and multiprocessing. Multithreading is applied at the ticker level, and multiprocessing at the year level for a given ticker.
- include_amends: Whether to include amended filings.
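The five arguments listed above can be checked before a download is started. The following is a minimal sketch and not part of the package: `build_loader_kwargs` is a hypothetical helper, and the default values shown for `filing_type`, `num_workers`, and `include_amends` are assumptions chosen for illustration.

```python
from typing import Any, Dict, List

# Filing types named in this README.
ALLOWED_FILING_TYPES = ("10-K", "10-Q")


def build_loader_kwargs(
    tickers: List[str],
    amount: int,
    filing_type: str = "10-K",     # assumed default
    num_workers: int = 2,          # assumed default
    include_amends: bool = False,  # assumed default
) -> Dict[str, Any]:
    """Hypothetical helper: validate the five arguments and return them as kwargs."""
    if not tickers or not all(isinstance(t, str) and t.strip() for t in tickers):
        raise ValueError("tickers must be a non-empty list of ticker strings")
    if amount < 1:
        raise ValueError("amount must be at least 1")
    if filing_type not in ALLOWED_FILING_TYPES:
        raise ValueError(f"filing_type must be one of {ALLOWED_FILING_TYPES}")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "tickers": [t.strip().upper() for t in tickers],
        "amount": amount,
        "filing_type": filing_type,
        "num_workers": num_workers,
        "include_amends": bool(include_amends),
    }


kwargs = build_loader_kwargs(["tsla", "AAPL"], amount=3, filing_type="10-Q")
print(kwargs)
```

The resulting dictionary could then be passed on as `SECFilingsLoader(**kwargs)`, so that a mistyped filing type fails before any network request is made.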
## Usage
```python
from llama_index.readers.sec_filings import SECFilingsLoader
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()
```
The data is downloaded into the following directories and sub-directories:
```yaml
- AAPL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_12.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_12.json
  - 2023
    - 10-Q_04.json
- GOOGL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
- TSLA
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-KA.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
```
Each ticker has its own folder, with one sub-folder per year. 10-K data is saved in the folder for its year, and 10-Q data is saved in the folder for its year with the month appended to the file name: `10-Q_03.json` holds the 10-Q data for March. Amended documents are also stored in their respective year, for example `10-KA.json` under TSLA/2021.
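The naming scheme described above can be turned back into structured records when the downloaded files are consumed later. The following is a minimal sketch under stated assumptions: `Filing`, `FILENAME_RE`, and `list_filings` are hypothetical names, not part of the package, and the pattern assumes that amended forms carry a trailing `A` as in `10-KA.json`. The demo runs against a throwaway directory rather than real downloads.

```python
import re
import tempfile
from pathlib import Path
from typing import List, NamedTuple, Optional

# Matches names such as "10-K.json", "10-KA.json" and "10-Q_03.json".
FILENAME_RE = re.compile(r"^(?P<form>10-[KQ]A?)(?:_(?P<month>\d{2}))?\.json$")


class Filing(NamedTuple):
    ticker: str
    year: int
    form: str             # "10-K", "10-Q", or an amended form such as "10-KA"
    month: Optional[int]  # set only when the file name carries a month suffix
    path: Path


def list_filings(root: Path) -> List[Filing]:
    """Collect every <ticker>/<year>/<form>[_<month>].json file under root."""
    filings = []
    for path in sorted(root.glob("*/*/*.json")):
        match = FILENAME_RE.match(path.name)
        if match is None or not path.parent.name.isdigit():
            continue  # skip files that do not follow the layout
        month = match.group("month")
        filings.append(
            Filing(
                ticker=path.parent.parent.name,
                year=int(path.parent.name),
                form=match.group("form"),
                month=int(month) if month else None,
                path=path,
            )
        )
    return filings


# Demo on a temporary directory that mimics part of the TSLA tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["TSLA/2021/10-K.json", "TSLA/2021/10-KA.json", "TSLA/2022/10-Q_03.json"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}")
    found = [(f.ticker, f.year, f.form, f.month) for f in list_filings(root)]

print(found)
```

Pointing `list_filings` at the actual download directory would give one record per saved file, which makes it easy to select, say, only the 10-Q files for a given year.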
## EXAMPLES
This loader can be used with both LangChain and LlamaIndex.
### LlamaIndex
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.sec_filings import SECFilingsLoader

# Download three 10-K filings for Tesla.
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Load the downloaded 2022 filings (forward slashes work on every platform).
documents = SimpleDirectoryReader("data/TSLA/2022").load_data()
index = VectorStoreIndex.from_documents(documents)

# Queries go through a query engine built from the index.
query_engine = index.as_query_engine()
response = query_engine.query("What are the risk factors of Tesla for the year 2022?")
print(response)
```
### LangChain
```python
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator
from llama_index.readers.sec_filings import SECFilingsLoader

# Download three 10-K filings for Tesla.
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Load the downloaded 2022 filings (forward slashes work on every platform).
dir_loader = DirectoryLoader("data/TSLA/2022")

# Build a vector index over the files and expose it as a retriever.
index = VectorstoreIndexCreator().from_loaders([dir_loader])
retriever = index.vectorstore.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever
)

query = "What are the risk factors of Tesla for the year 2022?"
qa.run(query)
```
## REFERENCES
1. Unstructured SEC Filings API: [repo link](https://github.com/Unstructured-IO/pipeline-sec-filings/tree/main)
2. SEC Edgar Downloader: [repo link](https://github.com/jadchaar/sec-edgar-downloader)
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-readers-sec-filings",
"maintainer": "Athe-kunal",
"docs_url": null,
"requires_python": "<4.0,>=3.8.1",
"maintainer_email": null,
"keywords": "10-K, 10-Q, SEC Filings, finance",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/63/aa/7f7f911131646cab1dc41dd0bde530b38f962c6898cc617b2090e58652ef/llama_index_readers_sec_filings-0.1.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index readers sec_filings integration",
"version": "0.1.5",
"project_urls": null,
"split_keywords": [
"10-k",
" 10-q",
" sec filings",
" finance"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f54ff9e3e5864f3e84998f3f32c0595bda29bfb7d1363cccb4f3ae68297f1eff",
"md5": "eba232bffd87cc7dc63d5907ae854674",
"sha256": "a3def3ab82bb5c931508f2c6c0b829bc5e79d27185e5ff230bce6bf56db39fda"
},
"downloads": -1,
"filename": "llama_index_readers_sec_filings-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eba232bffd87cc7dc63d5907ae854674",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8.1",
"size": 25235,
"upload_time": "2024-05-20T16:48:43",
"upload_time_iso_8601": "2024-05-20T16:48:43.790447Z",
"url": "https://files.pythonhosted.org/packages/f5/4f/f9e3e5864f3e84998f3f32c0595bda29bfb7d1363cccb4f3ae68297f1eff/llama_index_readers_sec_filings-0.1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "63aa7f7f911131646cab1dc41dd0bde530b38f962c6898cc617b2090e58652ef",
"md5": "dfd183bcd1a9f4779e2cafa2d1c89c60",
"sha256": "93a3e20ba9345c31b47b970dee13d3f789167721cb6d9017edcb28e768857ce4"
},
"downloads": -1,
"filename": "llama_index_readers_sec_filings-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "dfd183bcd1a9f4779e2cafa2d1c89c60",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8.1",
"size": 22232,
"upload_time": "2024-05-20T16:48:45",
"upload_time_iso_8601": "2024-05-20T16:48:45.003022Z",
"url": "https://files.pythonhosted.org/packages/63/aa/7f7f911131646cab1dc41dd0bde530b38f962c6898cc617b2090e58652ef/llama_index_readers_sec_filings-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-20 16:48:45",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-readers-sec-filings"
}