# SEC DATA DOWNLOADER
Please check out the SEC Question Answering Agent repo that I am building: [SEC-QA](https://github.com/Athe-kunal/SEC-QA-Agent)
This repository downloads all the text from SEC documents (10-K and 10-Q). It does not currently support amended documents, but that will be added in the near future.
Install the required dependencies
```
pip install -r requirements.txt
```
The SEC Downloader expects 5 attributes:
- tickers: A list of valid ticker symbols
- amount: The number of documents to download
- filing_type: The filing type, either 10-K or 10-Q
- num_workers: The number of workers used for multithreading and multiprocessing. Multithreading is applied at the ticker level and multiprocessing at the year level for a given ticker
- include_amends: Whether to include amendments
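The five attributes above can be checked before any download starts. The following is a minimal sketch, not part of the loader itself: `validate_loader_args` is a hypothetical helper, and the default values for `num_workers` and `include_amends` are assumptions chosen only for illustration.

```python
from typing import Any, Dict, List

# The two filing types named in the attribute list
VALID_FILING_TYPES = {"10-K", "10-Q"}


def validate_loader_args(
    tickers: List[str],
    amount: int,
    filing_type: str,
    num_workers: int = 2,          # assumed default, for illustration only
    include_amends: bool = False,  # assumed default, for illustration only
) -> Dict[str, Any]:
    """Hypothetical helper: check the five attributes and return them as kwargs."""
    if not tickers or not all(isinstance(t, str) and t.strip() for t in tickers):
        raise ValueError("tickers must be a non-empty list of ticker strings")
    if amount < 1:
        raise ValueError("amount must be at least 1")
    if filing_type not in VALID_FILING_TYPES:
        raise ValueError(f"filing_type must be one of {sorted(VALID_FILING_TYPES)}")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "tickers": [t.strip().upper() for t in tickers],  # normalise ticker case
        "amount": amount,
        "filing_type": filing_type,
        "num_workers": num_workers,
        "include_amends": bool(include_amends),
    }


kwargs = validate_loader_args(["tsla", "AAPL"], amount=3, filing_type="10-K")
print(kwargs)
```

The resulting dictionary could then be passed on as `SECFilingsLoader(**kwargs)`.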
## Usage
```python
from llama_index import download_loader
SECFilingsLoader = download_loader("SECFilingsLoader")
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()
```
It will download the data in the following directories and sub-directories
```yaml
- AAPL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_12.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_12.json
  - 2023
    - 10-Q_04.json
- GOOGL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
- TSLA
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-KA.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
```
Each ticker gets its own folder, with one sub-folder per year. 10-K data is saved as `10-K.json` inside its year, and 10-Q data is saved in its year with the month appended to the file name: `10-Q_03.json` holds the March 10-Q data. Amended documents (for example `10-KA.json`) are also stored in their respective year.
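The naming scheme can be turned back into structured information when browsing the downloaded files. The sketch below is an illustration under stated assumptions rather than part of the loader: `parse_filing_name` and `index_filings` are hypothetical helpers, only the file names shown in the tree above (`10-K.json`, `10-KA.json`, `10-Q_MM.json`) are recognised, and the demo builds a throwaway directory instead of downloading anything.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Matches 10-K.json, 10-KA.json and 10-Q_03.json; the month suffix is optional.
FILENAME_RE = re.compile(r"^(10-KA|10-K|10-Q)(?:_(\d{2}))?\.json$")

Entry = Tuple[int, str, Optional[int]]  # (year, form, month or None)


def parse_filing_name(name: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (form, month) for a recognised file name, else None."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    form, month = match.group(1), match.group(2)
    return (form, int(month) if month else None)


def index_filings(root: Path) -> Dict[str, List[Entry]]:
    """Map each ticker folder to its (year, form, month) entries, in path order."""
    index: Dict[str, List[Entry]] = {}
    for path in sorted(root.glob("*/*/*.json")):
        parsed = parse_filing_name(path.name)
        year = path.parent.name
        if parsed is None or not year.isdigit():
            continue  # skip anything that does not follow the layout
        ticker = path.parent.parent.name
        index.setdefault(ticker, []).append((int(year), parsed[0], parsed[1]))
    return index


# Demo on a temporary directory that mimics part of the TSLA tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["TSLA/2021/10-K.json", "TSLA/2021/10-KA.json", "TSLA/2022/10-Q_03.json"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}")
    filings = index_filings(root)

print(filings)
```

Keeping the parsing in one regular expression means a change in the naming scheme only needs to be handled in one place.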
## EXAMPLES
This loader can be used with both LangChain and LlamaIndex.
### LlamaIndex
```python
from llama_index import VectorStoreIndex, download_loader
from llama_index import SimpleDirectoryReader
SECFilingsLoader = download_loader("SECFilingsLoader")
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()
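# Forward slashes keep the path free of accidental backslash escapes
documents = SimpleDirectoryReader("data/TSLA/2022").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query through a query engine built from the index
query_engine = index.as_query_engine()
response = query_engine.query("What are the risk factors of Tesla for the year 2022?")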
```
### Langchain
```python
from llama_index import download_loader
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator
SECFilingsLoader = download_loader("SECFilingsLoader")
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()
# Forward slashes keep the path free of accidental backslash escapes
dir_loader = DirectoryLoader("data/TSLA/2022")
index = VectorstoreIndexCreator().from_loaders([dir_loader])
retriever = index.vectorstore.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever
)
query = "What are the risk factors of Tesla for the year 2022?"
qa.run(query)
```
## REFERENCES
1. Unstructured SEC Filings API: [repo link](https://github.com/Unstructured-IO/pipeline-sec-filings/tree/main)
2. SEC Edgar Downloader: [repo link](https://github.com/jadchaar/sec-edgar-downloader)
Raw data
{
"_id": null,
"home_page": "",
"name": "llama-index-readers-sec-filings",
"maintainer": "Athe-kunal",
"docs_url": null,
"requires_python": ">=3.8.1,<4.0",
"maintainer_email": "",
"keywords": "10-K,10-Q,SEC Filings,finance",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/a3/e9/540d9ed058a97f9cc3f7bb3b11a050bf65c9c6bbc912541b84f54c6e1f6d/llama_index_readers_sec_filings-0.1.3.tar.gz",
"platform": null,
"description": "# SEC DATA DOWNLOADER\n\nPlease checkout this repo that I am building on SEC Question Answering Agent [SEC-QA](https://github.com/Athe-kunal/SEC-QA-Agent)\n\nThis repository downloads all the texts from SEC documents (10-K and 10-Q). Currently, it is not supporting documents that are amended, but that will be added in the near futures.\n\nInstall the required dependencies\n\n```\npython install -r requirements.txt\n```\n\nThe SEC Downloader expects 5 attributes\n\n- tickers: It is a list of valid tickers\n- amount: Number of documents that you want to download\n- filing_type: 10-K or 10-Q filing type\n- num_workers: It is for multithreading and multiprocessing. We have multi-threading at the ticker level and multi-processing at the year level for a given ticker\n- include_amends: To include amendments or not.\n\n## Usage\n\n```python\nfrom llama_index import download_loader\n\nSECFilingsLoader = download_loader(\"SECFilingsLoader\")\n\nloader = SECFilingsLoader(tickers=[\"TSLA\"], amount=3, filing_type=\"10-K\")\nloader.load_data()\n```\n\nIt will download the data in the following directories and sub-directories\n\n```yaml\n- AAPL\n - 2018\n - 10-K.json\n - 2019\n - 10-K.json\n - 2020\n - 10-K.json\n - 2021\n - 10-K.json\n - 10-Q_12.json\n - 2022\n - 10-K.json\n - 10-Q_03.json\n - 10-Q_06.json\n - 10-Q_12.json\n - 2023\n - 10-Q_04.json\n- GOOGL\n - 2018\n - 10-K.json\n - 2019\n - 10-K.json\n - 2020\n - 10-K.json\n - 2021\n - 10-K.json\n - 10-Q_09.json\n - 2022\n - 10-K.json\n - 10-Q_03.json\n - 10-Q_06.json\n - 10-Q_09.json\n - 2023\n - 10-Q_03.json\n- TSLA\n - 2018\n - 10-K.json\n - 2019\n - 10-K.json\n - 2020\n - 10-K.json\n - 2021\n - 10-K.json\n - 10-KA.json\n - 10-Q_09.json\n - 2022\n - 10-K.json\n - 10-Q_03.json\n - 10-Q_06.json\n - 10-Q_09.json\n - 2023\n - 10-Q_03.json\n```\n\nHere for each ticker we have separate folders with 10-K data inside respective years and 10-Q data is saved in the respective year along with the month. 
`10-Q_03.json` means March data of 10-Q document. Also, the amended documents are stored in their respective year\n\n## EXAMPLES\n\nThis loader is can be used with both Langchain and LlamaIndex.\n\n### LlamaIndex\n\n```python\nfrom llama_index import VectorStoreIndex, download_loader\nfrom llama_index import SimpleDirectoryReader\n\nSECFilingsLoader = download_loader(\"SECFilingsLoader\")\n\nloader = SECFilingsLoader(tickers=[\"TSLA\"], amount=3, filing_type=\"10-K\")\nloader.load_data()\n\ndocuments = SimpleDirectoryReader(\"data\\TSLA\\2022\").load_data()\nindex = VectorStoreIndex.from_documents(documents)\nindex.query(\"What are the risk factors of Tesla for the year 2022?\")\n```\n\n### Langchain\n\n```python\nfrom llama_index import download_loader\nfrom langchain.llms import OpenAI\nfrom langchain.chains import RetrievalQA\nfrom langchain.document_loaders import DirectoryLoader\nfrom langchain.indexes import VectorstoreIndexCreator\n\nSECFilingsLoader = download_loader(\"SECFilingsLoader\")\n\nloader = SECFilingsLoader(tickers=[\"TSLA\"], amount=3, filing_type=\"10-K\")\nloader.load_data()\n\ndir_loader = DirectoryLoader(\"data\\TSLA\\2022\")\n\nindex = VectorstoreIndexCreator().from_loaders([dir_loader])\nretriever = index.vectorstore.as_retriever()\nqa = RetrievalQA.from_chain_type(\n llm=OpenAI(), chain_type=\"stuff\", retriever=retriever\n)\n\nquery = \"What are the risk factors of Tesla for the year 2022?\"\nqa.run(query)\n```\n\n## REFERENCES\n\n1. Unstructured SEC Filings API: [repo link](https://github.com/Unstructured-IO/pipeline-sec-filings/tree/main)\n2. SEC Edgar Downloader: [repo link](https://github.com/jadchaar/sec-edgar-downloader)\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index readers sec_filings integration",
"version": "0.1.3",
"project_urls": null,
"split_keywords": [
"10-k",
"10-q",
"sec filings",
"finance"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dee214d99bb7b6fbf97c262415008a5e792dd169a8e7173778a3174a15d72ba8",
"md5": "4bd7ab24d1e3011cb378f45c160ec4a7",
"sha256": "e7b7b5ebbb652276e1167deadae2388fac8f21fd939bd5dc5112be1af299b22f"
},
"downloads": -1,
"filename": "llama_index_readers_sec_filings-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4bd7ab24d1e3011cb378f45c160ec4a7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.1,<4.0",
"size": 25202,
"upload_time": "2024-02-21T20:48:40",
"upload_time_iso_8601": "2024-02-21T20:48:40.989351Z",
"url": "https://files.pythonhosted.org/packages/de/e2/14d99bb7b6fbf97c262415008a5e792dd169a8e7173778a3174a15d72ba8/llama_index_readers_sec_filings-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a3e9540d9ed058a97f9cc3f7bb3b11a050bf65c9c6bbc912541b84f54c6e1f6d",
"md5": "b5c2b9df6a3931d11506b951541697d7",
"sha256": "f8eb56d30d35261f50859f571d9e99c9b50401e3a2940b79b5cae52d66e8696d"
},
"downloads": -1,
"filename": "llama_index_readers_sec_filings-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "b5c2b9df6a3931d11506b951541697d7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.1,<4.0",
"size": 22346,
"upload_time": "2024-02-21T20:48:44",
"upload_time_iso_8601": "2024-02-21T20:48:44.261815Z",
"url": "https://files.pythonhosted.org/packages/a3/e9/540d9ed058a97f9cc3f7bb3b11a050bf65c9c6bbc912541b84f54c6e1f6d/llama_index_readers_sec_filings-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-21 20:48:44",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-readers-sec-filings"
}