# Keiba Scraper
[![unittest](https://github.com/new-village/keibascraper/actions/workflows/unittest.yaml/badge.svg)](https://github.com/new-village/keibascraper/actions/workflows/unittest.yaml)
[![PyPI version](https://badge.fury.io/py/keibascraper.svg)](https://badge.fury.io/py/keibascraper)
**keibascraper** is a Python library designed to parse data from [netkeiba.com](https://www.netkeiba.com/), a prominent Japanese horse racing website. It allows users to programmatically extract detailed information about races, entries, results, odds, and horses. Please note that depending on your usage, this may impose a significant load on netkeiba.com.
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Usage](#usage)
- [Loading Entry Data (出走データ)](#loading-entry-data)
- [Loading Result Data (結果データ)](#loading-result-data)
- [Loading Odds Data (オッズデータ)](#loading-odds-data)
- [Loading Horse Data (血統データ/出走履歴データ)](#loading-horse-data)
  - [Bulk Data Loading](#bulk-data-loading)
  - [CREATE TABLE query generation for SQLite](#create-table-query-generation-for-sqlite)
- [API Reference](#api-reference)
- [`load` Function](#load-function)
- [`race_list` Function](#race_list-function)
- [Contributing](#contributing)
- [License](#license)
## Features
- **Flexible Data Loading**: Supports loading of various data types such as race entries, results, odds, and horse information.
- **Configurable Parsing**: Utilizes JSON configuration files to define parsing rules, making it easy to adapt to changes in the source website.
- **Error Handling**: Provides robust error handling to manage network issues and data inconsistencies.
- **Caching**: Implements caching mechanisms to improve performance and reduce redundant network requests.
## Installation
keibascraper is available on PyPI and can be installed using pip:
```bash
$ python -m pip install keibascraper
```
**Supported Python Versions**: keibascraper officially supports Python 3.8 and above.
## Dependencies
- [requests](https://pypi.org/project/requests/): For handling HTTP requests.
- [BeautifulSoup4](https://pypi.org/project/beautifulsoup4/): For parsing HTML content.
- [jq](https://pypi.org/project/jq/): For parsing JSON content using jq expressions.
## Usage
To use keibascraper, import the library and use the `load` function to fetch and parse data from netkeiba.com. The `load` function requires two parameters: the data type and the entity ID.
### Loading Entry Data (出走データ)
```python
>>> import keibascraper
>>> race, entry = keibascraper.load("entry", "201206050810")
>>> print(race)
[{'race_id': '201206050810', 'race_number': 10, 'race_name': '有馬記念', ... }]
>>> print(entry)
[{'bracket': 7, 'horse_number': 13, 'horse_name': 'ゴールドシップ', ...}, {...}, ...]
```
### Loading Result Data (結果データ)
```python
>>> import keibascraper
>>> race, entry = keibascraper.load("result", "201206050810")
>>> print(race)
[{'race_id': '201206050810', 'race_number': 10, 'race_name': '有馬記念', ... }]
>>> print(entry)
[{'rank': 1, 'horse_name': 'ゴールドシップ', 'rap_time': 151.9,...}, {...}, ...]
```
### Loading Odds Data (オッズデータ)
```python
>>> import keibascraper
>>> odds = keibascraper.load("odds", "201206050810")
>>> print(odds)
[{'horse_number': 13, 'win': 2.7, 'show_min': 1.3, 'show_max': 1.5, ...}, {...}, ...]
```
### Loading Horse Data (血統データ/出走履歴データ)
```python
>>> import keibascraper
>>> horse, result = keibascraper.load("horse", "2009102739")
>>> print(horse)
[{'horse_id': '2009102739', 'father_name': 'ステイゴールド', ... }]
>>> print(result)
[{'race_date': '20151227', 'race_name': '有馬記念', 'rank': 8, ...}, {...}, ...]
```
### Bulk Data Loading
To load multiple races in bulk, you can use the `race_list` function to retrieve a list of race IDs for a specific year and month.
```python
import keibascraper
# Get list of race IDs for July 2022
race_ids = keibascraper.race_list(2022, 7)
# Loop through race IDs and load entry data
for race_id in race_ids:
    race_info, entry_list = keibascraper.load("entry", race_id)
    # Process the data as needed
```
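Because bulk loading issues one HTTP request per race, you may want to pace your requests so you do not place unnecessary load on netkeiba.com (see the note at the top of this README). A minimal sketch, assuming a fixed delay between requests is acceptable for your use case:

```python
import time

import keibascraper

race_ids = keibascraper.race_list(2022, 7)

entries = []
for race_id in race_ids:
    race_info, entry_list = keibascraper.load("entry", race_id)
    entries.extend(entry_list)
    # Wait between requests to keep the load on netkeiba.com low.
    # The one-second delay is an illustrative choice, not a library requirement.
    time.sleep(1)
```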
### CREATE TABLE query generation for SQLite
The `create_table_sql` function generates an SQL query string for creating a table in an SQLite database. The table structure is derived from the configuration file that corresponds to the given `data_type`, such as `race`, `entry`, or `result`. The generated statement creates the table only if it does not already exist and assigns the primary key to the first column.
```python
>>> import keibascraper
>>> query = keibascraper.create_table_sql("entry")
>>> print(query)
CREATE TABLE IF NOT EXISTS entry (bracket text, ... weight_diff integer);
```
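A common next step is to pass the generated query to Python's built-in `sqlite3` module and insert the rows returned by `load`. The sketch below assumes that the keys of each entry dictionary match the column names produced by `create_table_sql` (both are driven by the same configuration file); the database filename `keiba.db` is only an example.

```python
import sqlite3

import keibascraper

conn = sqlite3.connect("keiba.db")
conn.execute(keibascraper.create_table_sql("entry"))

race, entries = keibascraper.load("entry", "201206050810")
for row in entries:
    # Build the INSERT statement from the dictionary keys so it stays in
    # sync with the columns generated by create_table_sql.
    columns = ", ".join(row.keys())
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT OR REPLACE INTO entry ({columns}) VALUES ({placeholders})",
        list(row.values()),
    )

conn.commit()
conn.close()
```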
## API Reference
### `load` Function
```python
keibascraper.load(data_type, entity_id)
```
- **Description**: Loads data from netkeiba.com based on the specified data type and entity ID.
- **Parameters**:
- `data_type` (str): Type of data to load. Supported types are `'entry'`, `'result'`, `'odds'`, and `'horse'`.
- `entity_id` (str): Identifier for the data entity (e.g., race ID, horse ID).
- **Returns**:
  - For `'entry'` and `'result'`: a tuple of two lists, `[{race}]` and `[{entry1}, {entry2}, ...]`.
  - For `'odds'`: a single list, `[{odds1}, {odds2}, ...]`.
  - For `'horse'`: a tuple of two lists, `[{horse}]` and `[{result1}, {result2}, ...]`.
- **Raises**:
- `ValueError`: If an unsupported data type is provided.
- `RuntimeError`: If data loading or parsing fails.
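Since `load` raises `ValueError` for an unsupported data type and `RuntimeError` when fetching or parsing fails, callers that process many IDs may want to catch these explicitly. A minimal sketch:

```python
import keibascraper

try:
    race, entry = keibascraper.load("entry", "201206050810")
except ValueError as exc:
    # An unsupported data type was passed to load().
    print(f"Invalid request: {exc}")
except RuntimeError as exc:
    # Network failure or a page that could not be parsed.
    print(f"Failed to load data: {exc}")
```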
### `race_list` Function
```python
keibascraper.race_list(year, month)
```
- **Description**: Retrieves a list of race IDs for the specified year and month.
- **Parameters**:
- `year` (int): The target year.
- `month` (int): The target month.
- **Returns**:
- A list of race IDs (list).
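For example, `race_list` can be called once per month to collect the race IDs for an entire year. The helper below is an illustrative sketch, not part of the library's API:

```python
import keibascraper

def race_ids_for_year(year: int) -> list:
    """Collect race IDs for every month of the given year (illustrative helper)."""
    ids = []
    for month in range(1, 13):
        ids.extend(keibascraper.race_list(year, month))
    return ids

print(len(race_ids_for_year(2022)))
```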
## Contributing
Contributions are welcome! If you have suggestions or find bugs, please open an issue or submit a pull request on the [GitHub repository](https://github.com/new-village/keibascraper).
When contributing, please follow these guidelines:
- **Coding Standards**: Follow PEP 8 style guidelines.
- **Testing**: Ensure that your code passes existing tests and add new tests for your changes.
- **Documentation**: Update documentation and docstrings as needed.
## License
This project is licensed under the terms of the Apache-2.0 license. See the [LICENSE](https://github.com/new-village/keibascraper/blob/main/LICENSE) file for details.
**Disclaimer**: This library is intended for personal use and educational purposes. Scraping data from websites may violate their terms of service. Please ensure that you comply with netkeiba.com's terms and conditions when using this library.