| Field | Value |
|---|---|
| Name | hodorlive |
| Version | 1.2.17 |
| home_page | None |
| Summary | xpath/css based scraper with pagination |
| upload_time | 2025-11-04 06:42:10 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.11 |
| license | MIT |
| keywords | cssselect, hodor, lxml, scraping |
| requirements | No requirements were recorded. |
# Hodor
A simple HTML scraper that extracts data using XPath or CSS selectors.
## Install
```bash
pip install hodorlive
```
The distribution is named `hodorlive`, but the import name is `hodor`.
## Usage
### As python package
***WARNING: By default this package does not verify SSL certificates. Pass `ssl_verify=True` to enable verification (see [Arguments](#arguments)).***
#### Sample code
```python
from pprint import pprint

from hodor import Hodor
from dateutil.parser import parse  # third-party: pip install python-dateutil


def date_convert(data):
    # Convert the raw date text of a cell into a datetime object.
    return parse(data)


url = 'http://www.nasdaq.com/markets/stocks/symbol-change-history.aspx'

CONFIG = {
    # One rule per output field; each CSS selector picks one table column.
    'old_symbol': {
        'css': '#SymbolChangeList_table tr td:nth-child(1)',
        'many': True
    },
    'new_symbol': {
        'css': '#SymbolChangeList_table tr td:nth-child(2)',
        'many': True
    },
    'effective_date': {
        'css': '#SymbolChangeList_table tr td:nth-child(3)',
        'many': True,
        'transform': date_convert  # applied to the extracted value(s)
    },
    '_groups': {
        'data': '__all__',
        'ticker_changes': ['old_symbol', 'new_symbol']
    },
    '_paginate_by': {
        # XPath to the "next page" link; followed up to pagination_max_limit pages.
        'xpath': '//*[@id="two_column_main_content_lb_NextPage"]/@href',
        'many': False
    }
}

h = Hodor(url=url, config=CONFIG, pagination_max_limit=5)
pprint(h.data)
```
#### Sample output
```python
{'data': [{'effective_date': datetime.datetime(2016, 11, 1, 0, 0),
           'new_symbol': 'ARNC',
           'old_symbol': 'AA'},
          {'effective_date': datetime.datetime(2016, 11, 1, 0, 0),
           'new_symbol': 'ARNC$',
           'old_symbol': 'AA$'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALN8',
           'old_symbol': 'AHUSDN2018'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALN9',
           'old_symbol': 'AHUSDN2019'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALQ6',
           'old_symbol': 'AHUSDQ2016'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALQ7',
           'old_symbol': 'AHUSDQ2017'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALQ8',
           'old_symbol': 'AHUSDQ2018'}]}
```
#### Arguments
Keyword arguments accepted by `Hodor` in addition to `url` and `config`:
- `ua` (User-Agent string sent with requests)
- `proxies` (proxy settings; see requesocks for the format)
- `auth`
- `crawl_delay` (delay in seconds between paginated requests; default: 3)
- `pagination_max_limit` (maximum number of pages to crawl; default: 100)
- `ssl_verify` (whether to verify SSL certificates; default: False)
- `robots` (if True, respects robots.txt; default: True)
- `reppy_capacity` (capacity of the LRU cache for robots.txt rules; default: 100)
- `trim_values` (if True, strips leading and trailing whitespace from extracted values; default: True)
#### Config parameters
- By default, every key in the config is a parsing rule.
  - Each rule uses either an `xpath` or a `css` selector.
  - Each rule extracts many values by default; set `many` to `False` to extract a single value.
  - Each rule can take an optional `transform` function that is applied to the result.
- Extra parameters include grouping (`_groups`) and pagination (`_paginate_by`); `_paginate_by` uses the same format as a rule.
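To make the rule semantics concrete, here is a dependency-free sketch of how a rule with `many`, `transform` and value trimming might be evaluated, and how a `_groups` entry could zip parallel per-rule lists into row dicts like those in the sample output. It assumes the standard library's `xml.etree.ElementTree`, whose XPath support is only a small subset (hodor itself builds on lxml and cssselect), a tiny made-up table, and an `MM/DD/YYYY` date format. `apply_rule`, `group` and `to_date` are hypothetical helpers, not hodor functions; applying `transform` to each value mirrors how `date_convert` receives a single string in the sample, but is an assumption about hodor's behavior.

```python
import xml.etree.ElementTree as ET
from collections.abc import Callable
from datetime import date, datetime
from typing import Any

# A tiny, made-up table shaped like the one in the sample code.
HTML = """<table id="SymbolChangeList_table">
  <tr><td> AA </td><td>ARNC</td><td>11/01/2016</td></tr>
  <tr><td>AA$</td><td>ARNC$</td><td>11/01/2016</td></tr>
</table>"""


def to_date(text: str) -> date:
    # Assumed MM/DD/YYYY input; plays the role of date_convert in the sample.
    return datetime.strptime(text, "%m/%d/%Y").date()


# ElementTree paths (a limited XPath subset) stand in for hodor's xpath/css rules.
CONFIG: dict[str, Any] = {
    "old_symbol": {"xpath": "./tr/td[1]", "many": True},
    "new_symbol": {"xpath": "./tr/td[2]", "many": True},
    "effective_date": {"xpath": "./tr/td[3]", "many": True, "transform": to_date},
    "_groups": {"data": "__all__"},
}


def apply_rule(root: ET.Element, rule: dict[str, Any], trim_values: bool = True) -> Any:
    """Hypothetical: select elements, take their text, trim, transform, then apply `many`."""
    values = ["".join(el.itertext()) for el in root.findall(rule["xpath"])]
    if trim_values:
        values = [v.strip() for v in values]
    transform: Callable[[str], Any] | None = rule.get("transform")
    if transform is not None:
        values = [transform(v) for v in values]  # applied per value (assumption)
    if rule.get("many", True):
        return values
    return values[0] if values else None


def group(data: dict[str, list[Any]], fields: list[str]) -> list[dict[str, Any]]:
    """Hypothetical: zip parallel per-rule lists into one dict per row."""
    return [dict(zip(fields, row)) for row in zip(*(data[f] for f in fields))]


root = ET.fromstring(HTML)
rules = {k: v for k, v in CONFIG.items() if k not in ("_groups", "_paginate_by")}
data = {name: apply_rule(root, rule) for name, rule in rules.items()}

result: dict[str, list[dict[str, Any]]] = {}
for dest, fields in CONFIG["_groups"].items():
    names = list(rules) if fields == "__all__" else fields  # '__all__' read as "every rule"
    result[dest] = group(data, names)
print(result)
```

Grouping by zipping assumes every rule returns the same number of values per page; a missing cell in one column would silently shift or truncate rows.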
## Building & Publishing
### Prerequisites
- Install [uv](https://docs.astral.sh/uv/getting-started/installation/).
- Review the [uvx execution model](https://docs.astral.sh/uv/concepts/tools/#execution-vs-installation) for running tools without global installs.
- Hatch documentation: [https://hatch.pypa.io/latest/](https://hatch.pypa.io/latest/).
### Build workflow
Run the release helper to build and publish wheels and source archives via Hatch:
```bash
./upload.sh
```
The script shells out to `uvx hatch build` followed by `uvx hatch publish` so that Hatch is executed in an ephemeral environment.
### Publishing requirements
Configure credentials in `~/.pypirc` as described in the [PyPI configuration specification](https://packaging.python.org/en/latest/specifications/pypirc/).
Example configuration:
```ini
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = __token__
password = <pypi-token>

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = <testpypi-token>
```
Replace token placeholders with secrets from the team password manager and avoid committing the file to version control.
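Before running `./upload.sh`, a quick sanity check of `~/.pypirc` can catch a missing section or a placeholder that was never replaced. The helper below is a hypothetical sketch built on the standard library's `configparser`; it checks only the keys used in the example configuration above and is not part of the release tooling.

```python
import configparser


def check_pypirc(text: str) -> list[str]:
    """Hypothetical check: return problems found in .pypirc-style text (empty if none)."""
    cfg = configparser.ConfigParser(interpolation=None)  # tokens are taken literally
    cfg.read_string(text)
    problems: list[str] = []
    # index-servers is a multi-line value: one server name per indented line.
    servers = cfg.get("distutils", "index-servers", fallback="").split()
    for server in servers:
        if not cfg.has_section(server):
            problems.append(f"[{server}] is listed in index-servers but missing")
            continue
        for key in ("repository", "username", "password"):
            if not cfg.get(server, key, fallback="").strip():
                problems.append(f"[{server}] has no {key}")
        # Placeholders in the example look like <pypi-token>.
        if cfg.get(server, "password", fallback="").startswith("<"):
            problems.append(f"[{server}] password is still a placeholder")
    return problems
```

For example, `check_pypirc((Path.home() / ".pypirc").read_text())` (with `from pathlib import Path`) returns an empty list when nothing obvious is wrong.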
Raw data
{
"_id": null,
"home_page": null,
"name": "hodorlive",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "cssselect, hodor, lxml, scraping",
"author": null,
"author_email": "Compile Inc <dev@compile.com>",
"download_url": "https://files.pythonhosted.org/packages/55/e4/f21907dc770c3784218b7fdf1e33575c50a68f7f0b379159cf2e65666cba/hodorlive-1.2.17.tar.gz",
"platform": null,
"description": "\n\n# Hodor [](https://pypi.python.org/pypi/hodorlive/)\n\nA simple html scraper with xpath or css.\n\n## Install\n\n```pip install hodorlive```\n\n## Usage\n\n### As python package\n\n***WARNING: This package by default doesn't verify ssl connections. Please check the [arguments](#arguments) to enable them.***\n\n#### Sample code\n```python\nfrom hodor import Hodor\nfrom dateutil.parser import parse\n\n\ndef date_convert(data):\n return parse(data)\n\nurl = 'http://www.nasdaq.com/markets/stocks/symbol-change-history.aspx'\n\nCONFIG = {\n 'old_symbol': {\n 'css': '#SymbolChangeList_table tr td:nth-child(1)',\n 'many': True\n },\n 'new_symbol': {\n 'css': '#SymbolChangeList_table tr td:nth-child(2)',\n 'many': True\n },\n 'effective_date': {\n 'css': '#SymbolChangeList_table tr td:nth-child(3)',\n 'many': True,\n 'transform': date_convert\n },\n '_groups': {\n 'data': '__all__',\n 'ticker_changes': ['old_symbol', 'new_symbol']\n },\n '_paginate_by': {\n 'xpath': '//*[@id=\"two_column_main_content_lb_NextPage\"]/@href',\n 'many': False\n }\n}\n\nh = Hodor(url=url, config=CONFIG, pagination_max_limit=5)\n\nh.data\n```\n#### Sample output\n```python\n{'data': [{'effective_date': datetime.datetime(2016, 11, 1, 0, 0),\n 'new_symbol': 'ARNC',\n 'old_symbol': 'AA'},\n {'effective_date': datetime.datetime(2016, 11, 1, 0, 0),\n 'new_symbol': 'ARNC$',\n 'old_symbol': 'AA$'},\n {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),\n 'new_symbol': 'MALN8',\n 'old_symbol': 'AHUSDN2018'},\n {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),\n 'new_symbol': 'MALN9',\n 'old_symbol': 'AHUSDN2019'},\n {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),\n 'new_symbol': 'MALQ6',\n 'old_symbol': 'AHUSDQ2016'},\n {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),\n 'new_symbol': 'MALQ7',\n 'old_symbol': 'AHUSDQ2017'},\n {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),\n 'new_symbol': 'MALQ8',\n 'old_symbol': 'AHUSDQ2018'}]}\n```\n\n#### 
Arguments\n\n- ```ua``` (User-Agent)\n- ```proxies``` (check requesocks)\n- ```auth```\n- ```crawl_delay``` (crawl delay in seconds across pagination - default: 3 seconds)\n- ```pagination_max_limit``` (max number of pages to crawl - default: 100)\n- ```ssl_verify``` (default: False)\n- ```robots``` (if set respects robots.txt - default: True)\n- ```reppy_capacity``` (robots cache LRU capacity - default: 100)\n- ```trim_values``` (if set trims output for leading and trailing whitespace - default: True)\n\n\n#### Config parameters:\n- By default any key in the config is a rule to parse.\n - Each rule can be either a ```xpath``` or a ```css```\n - Each rule can extract ```many``` values by default unless explicity set to ```False```\n - Each rule can allow to ```transform``` the result with a function if provided\n- Extra parameters include grouping (```_groups```) and pagination (```_paginate_by```) which is also of the rule format.\n\n\n\n## Building & Publishing\n\n### Prerequisites\n\n- Install [uv](https://docs.astral.sh/uv/getting-started/installation/).\n- Review the [uvx execution model](https://docs.astral.sh/uv/concepts/tools/#execution-vs-installation) for running tools without global installs.\n- Hatch documentation: [https://hatch.pypa.io/latest/](https://hatch.pypa.io/latest/).\n\n### Build workflow\n\nRun the release helper to build and publish wheels and source archives via Hatch:\n\n```bash\n./upload.sh\n```\n\nThe script shells out to `uvx hatch build` followed by `uvx hatch publish` so that Hatch is executed in an ephemeral environment.\n\n### Publishing requirements\n\nConfigure credentials in `~/.pypirc` as described in the [PyPI configuration specification](https://packaging.python.org/en/latest/specifications/pypirc/).\n\nExample configuration:\n\n```ini\n[distutils]\nindex-servers =\n pypi\n testpypi\n\n[pypi]\nrepository = https://upload.pypi.org/legacy/\nusername = __token__\npassword = <pypi-token>\n\n[testpypi]\nrepository = 
https://test.pypi.org/legacy/\nusername = __token__\npassword = <testpypi-token>\n```\n\nReplace token placeholders with secrets from the team password manager and avoid committing the file to version control.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "xpath/css based scraper with pagination",
"version": "1.2.17",
"project_urls": {
"Download": "https://github.com/CompileInc/hodor/archive/v1.2.17.tar.gz",
"Homepage": "https://github.com/CompileInc/hodor"
},
"split_keywords": [
"cssselect",
" hodor",
" lxml",
" scraping"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "988489926f95ceebbcfecb0da3834260b1124e82975ddb7dea7ca146652aa812",
"md5": "7ee85475c61e27cb49cb4b9aea9e5295",
"sha256": "da021b8d5f39401df9bc0f5a9d09458ffc7d6ca8ceb30639e62ccb18d7867059"
},
"downloads": -1,
"filename": "hodorlive-1.2.17-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7ee85475c61e27cb49cb4b9aea9e5295",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 5787,
"upload_time": "2025-11-04T06:42:11",
"upload_time_iso_8601": "2025-11-04T06:42:11.669551Z",
"url": "https://files.pythonhosted.org/packages/98/84/89926f95ceebbcfecb0da3834260b1124e82975ddb7dea7ca146652aa812/hodorlive-1.2.17-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "55e4f21907dc770c3784218b7fdf1e33575c50a68f7f0b379159cf2e65666cba",
"md5": "7c8f346ed5e579c328f70b61410b1d06",
"sha256": "54a26e7322b1b64b117038c58625dc34f2810929b11d955b32aaaab1a3651248"
},
"downloads": -1,
"filename": "hodorlive-1.2.17.tar.gz",
"has_sig": false,
"md5_digest": "7c8f346ed5e579c328f70b61410b1d06",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 23655,
"upload_time": "2025-11-04T06:42:10",
"upload_time_iso_8601": "2025-11-04T06:42:10.316297Z",
"url": "https://files.pythonhosted.org/packages/55/e4/f21907dc770c3784218b7fdf1e33575c50a68f7f0b379159cf2e65666cba/hodorlive-1.2.17.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-04 06:42:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "CompileInc",
"github_project": "hodor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "hodorlive"
}