* Name: edgar
* Version: 5.6.3
* Home page: https://github.com/joeyism/py-edgar
* Summary: Scrape data from SEC's EDGAR
* Upload time: 2024-10-13 08:13:39
* Author: Joey Sham
* Keywords: edgar, sec
* Requirements: requests, lxml, tqdm, rapidfuzz

# EDGAR
A small library to access files from SEC's edgar.

## Installation

>   pip install edgar

## Example
To get a company's latest 5 10-Ks, run

``` python
from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)
```
or
```python
from edgar import Company, TXTML

company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)
```

To get all companies and find a specific one, run

``` python
from edgar import Edgar
edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")
```

To avoid pulling all company data from sec.gov when `Edgar` is initialized, pass in a local path to the data

``` python
from edgar import Edgar
edgar = Edgar("/path/to/cik-lookup-data.txt")
possible_companies = edgar.find_company_name("Cisco System")
```
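
For orientation, here is a minimal sketch of how such a local lookup file could be read without the library. It assumes each line has the `COMPANY NAME:CIK:` layout; `parse_cik_lookup` and the sample lines are hypothetical and are not part of `edgar`.

```python
from typing import Dict, Iterable


def parse_cik_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map company name -> CIK from 'NAME:CIK:' lines (assumed layout)."""
    companies: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Company names may themselves contain ':', so split from the right.
        name, cik = line.rstrip(":").rsplit(":", 1)
        companies[name] = cik
    return companies


# Hypothetical sample lines, not real EDGAR records.
sample = [
    "EXAMPLE HOLDINGS INC:0000000001:",
    "EXAMPLE: SECOND CO:0000000002:",
]
lookup = parse_cik_lookup(sample)
print(lookup["EXAMPLE: SECOND CO"])
```

A plain dictionary keeps only the last CIK when a name appears more than once, so a real loader might need a list of pairs instead.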


To get XBRL data, run
```python
from edgar import Company, XBRL, XBRLElement

company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict()  # returns a dictionary of name, value, and schemaRef
```

## API

### Company
```python
Company(name, cik, timeout=10)
```
* name (company name)
* cik (company CIK number)
* timeout (optional) (default: 10)

#### Methods

`get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str`

Returns a URL for fetching filings data
* filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, it'll return all documents
* prior_to: The time prior to which documents are to be retrieved. If not specified, it'll return all documents
* ownership: defaults to include. Options are include, exclude, only.
* no_of_entries: defaults to 100. The number of entries to be returned. Maximum is 100.
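
To make the parameters concrete, the sketch below builds a comparable query string with the standard library. The endpoint and the query parameter names (`action`, `CIK`, `type`, `dateb`, `owner`, `count`) are assumptions about EDGAR's company-browse page and are not confirmed by this README; `build_filings_url` is a hypothetical helper, not the library's implementation.

```python
from urllib.parse import urlencode

# Assumed endpoint; not taken from this README.
BASE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_filings_url(cik, filing_type="", prior_to="",
                      ownership="include", no_of_entries=100):
    """Build a filings URL from the documented parameters (illustrative only)."""
    if ownership not in {"include", "exclude", "only"}:
        raise ValueError("ownership must be include, exclude, or only")
    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": filing_type,
        "dateb": prior_to,
        "owner": ownership,
        # The documented maximum is 100 entries.
        "count": min(no_of_entries, 100),
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_filings_url("0001341439", filing_type="10-K")
print(url)
```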


`get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement`

Returns the HTML in the form of [lxml.html](http://lxml.de/lxmlhtml.html)
* filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, it'll return all documents
* prior_to: The time prior to which documents are to be retrieved. If not specified, it'll return all documents
* ownership: defaults to include. Options are include, exclude, only.
* no_of_entries: defaults to 100. The number of entries to be returned. Maximum is 100.


`get_10Ks(self, no_of_documents=1, as_documents=False) -> List[lxml.html.HtmlElement]`

Returns the HTML, in the form of [lxml.html](http://lxml.de/lxmlhtml.html), of the concatenation of all the documents in the 10-K
* no_of_documents (default: 1): number of documents to be retrieved
* as_documents (default: False): when set to `True`, it returns `List[edgar.document.Documents]`, a list of [Documents](#documents)

`get_10Ks_metadata(self) -> List[dict]`

Returns a list of dictionaries containing the metadata of all the documents in the 10-K


`get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]`

Returns the HTML in the form of [lxml.html](http://lxml.de/lxmlhtml.html) of the document within 10-K
* document_type: The type of document you want, e.g. 10-K, EX-3.2
* no_of_documents (default: 1): number of documents to be retrieved


`get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]`

Returns the HTML in the form of [lxml.html](http://lxml.de/lxmlhtml.html) of the data file within 10-K
* document_type: The type of document you want, e.g. EX-101.INS
* no_of_documents (default: 1): number of documents to be retrieved
* isxml (default: False): by default, parsing is case-insensitive and uses `html` in `lxml`. If this is `True`, the file is parsed with `etree`, which is case-sensitive
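
The case-sensitivity difference can be seen with the standard library alone. This sketch swaps in `html.parser` and `xml.etree.ElementTree` for `lxml` so it runs without extra dependencies; `TagCollector` and the snippet are hypothetical, and `lxml`'s own behaviour may differ in details.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical snippet with mixed-case tag names, as XBRL files tend to have.
SNIPPET = "<Report><NetIncome>42</NetIncome></Report>"


class TagCollector(HTMLParser):
    """Record the start tags seen by the HTML parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()
html_tags = collector.tags  # the HTML parser lowercases tag names

# The XML parser keeps tag names exactly as written.
xml_tags = [element.tag for element in ET.fromstring(SNIPPET).iter()]

print(html_tags)  # ['report', 'netincome']
print(xml_tags)   # ['Report', 'NetIncome']
```

With HTML parsing, a lookup for `NetIncome` has to use the lowercased name, which is one reason to request XML parsing for data files.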

#### Class Method

`get_documents(self, tree: lxml.html.HtmlElement, no_of_documents=1, debug=False, as_documents=False) -> List[lxml.html.HtmlElement]`

Returns a list of strings, each containing the body of one of the specified documents from the input

* tree: the lxml.html form that is returned from `Company.get_all_filings`
* no_of_documents: number of documents returned. If it is 1, the returned result is just one string, instead of a list of strings. Defaults to 1.
* debug (default: **False**): if **True**, displays the URL and form
* as_documents (default: False): when set to `True`, it returns `List[edgar.document.Documents]`, a list of [Documents](#documents)



### Edgar
Gets all companies from EDGAR

`get_cik_by_company_name(company_name: str) -> str`: Returns the CIK if given the exact name of the company

`get_company_name_by_cik(cik: str) -> str`: Returns the company name if given the CIK (with the leading `000`s)
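
Because the CIK is expected with its leading zeros, a numeric or unpadded value needs padding first. A minimal sketch follows, assuming the 10-digit width seen in the examples above; `normalize_cik` is a hypothetical helper, not part of the library.

```python
def normalize_cik(cik) -> str:
    """Left-pad a CIK with zeros to 10 digits (width assumed from the examples)."""
    digits = str(cik).strip()
    if not digits.isdigit():
        raise ValueError(f"CIK must contain only digits: {cik!r}")
    if len(digits) > 10:
        raise ValueError(f"CIK is longer than 10 digits: {cik!r}")
    return digits.zfill(10)


print(normalize_cik(1341439))   # 0001341439
print(normalize_cik("51143"))   # 0000051143
```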

`find_company_name(words: str) -> List[str]`: Returns a list of company names by exact word matching

`find_company_name_cik(words: str) -> List[tuple[str, str]]`: Return a list of company names and their CIK values

`match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]`: Returns a list of dictionaries, with company names, CIK, and their fuzzy match score
* `top` (default: 5): the number of top fuzzy matches to return. If set to `None`, it'll return the whole list (which is a lot)
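
The ranking idea can be sketched as follows. This is not the library's implementation: it swaps in the standard library's `difflib.SequenceMatcher` for `rapidfuzz`, and the dictionary keys (`company_name`, `cik`, `score`), the function name `match_by_name`, and the third sample company are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Optional


def match_by_name(name: str, companies: Dict[str, str],
                  top: Optional[int] = 5) -> List[Dict[str, Any]]:
    """Rank companies by similarity to `name`, best match first."""
    scored = [
        {
            "company_name": company,
            "cik": cik,
            # Similarity on a 0-100 scale, compared case-insensitively.
            "score": round(
                100 * SequenceMatcher(None, name.lower(), company.lower()).ratio(), 1
            ),
        }
        for company, cik in companies.items()
    ]
    scored.sort(key=lambda match: match["score"], reverse=True)
    # top=None returns every candidate, mirroring the documented behaviour.
    return scored if top is None else scored[:top]


companies = {
    "Oracle Corp": "0001341439",
    "INTERNATIONAL BUSINESS MACHINES CORP": "0000051143",
    "Example Orchard Co": "0000000001",  # hypothetical entry
}
matches = match_by_name("Oracle Corporation", companies, top=2)
print(matches[0]["company_name"])
```

Scoring every name is linear in the number of companies, which is why limiting the result with `top` matters less for speed than for readability of the output.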

### XBRL
Parses data from XBRL
#### Properties
`relevant_children`
* get children that are not `context`

`relevant_children_parsed`
* get children that are not `context`, `unit`, `schemaRef`
* cleans tags
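
As a rough illustration of this kind of filtering, the sketch below uses the standard library's `xml.etree.ElementTree` rather than `lxml`. It reads "cleans tags" as stripping the namespace prefix, which is an assumption; `clean_tag`, `filter_fact_elements`, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

EXCLUDED = {"context", "unit", "schemaRef"}


def clean_tag(tag: str) -> str:
    """Strip a leading '{namespace}' from an element tag."""
    return tag.rsplit("}", 1)[-1]


def filter_fact_elements(root: ET.Element) -> list:
    """Keep direct children that are not context, unit, or schemaRef."""
    return [
        {"name": clean_tag(child.tag), "value": (child.text or "").strip()}
        for child in root
        if clean_tag(child.tag) not in EXCLUDED
    ]


# Hypothetical, heavily simplified XBRL instance.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/hypothetical">
  <context id="c1"/>
  <unit id="usd"/>
  <ex:Revenues contextRef="c1" unitRef="usd">1000</ex:Revenues>
  <ex:NetIncome contextRef="c1" unitRef="usd">250</ex:NetIncome>
</xbrl>"""

facts = filter_fact_elements(ET.fromstring(SAMPLE))
print(facts)
```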

### Documents
Filing and Documents Details for the SEC EDGAR Form (such as 10-K)

```python
Documents(url, timeout=10)
```
#### Properties
`url: str`: URL of the document

`content: dict`: Dictionary of metadata of the document

`content['Filing Date']: str`: Document filing date

`content['Accepted']: str`: Document accepted datetime

`content['Period of Report']: str`: The date period that the document is for

`element: lxml.html.HtmlElement`: The HTML element for the Document (from the url) so it can be further parsed
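
Since the metadata values are strings, they usually need parsing before any date arithmetic. The sketch below works on a plain dictionary shaped like `content`; the sample values and the date formats (`YYYY-MM-DD` and `YYYY-MM-DD HH:MM:SS`) are assumptions, not output captured from the library.

```python
from datetime import date, datetime

# Hypothetical metadata in the shape described above.
content = {
    "Filing Date": "2024-06-20",
    "Accepted": "2024-06-20 16:05:12",
    "Period of Report": "2024-05-31",
}

filing_date = date.fromisoformat(content["Filing Date"])
accepted = datetime.strptime(content["Accepted"], "%Y-%m-%d %H:%M:%S")
period_end = date.fromisoformat(content["Period of Report"])

# Days between the end of the reporting period and the filing.
days_to_file = (filing_date - period_end).days
print(days_to_file)  # 20
```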


## Contribution
<a href="https://www.buymeacoffee.com/joeyism" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>

            

        