pgn2data


| Field           | Value |
|-----------------|-------|
| Name            | pgn2data |
| Version         | 0.0.9 |
| Home page       | https://github.com/zq99/pgn2data |
| Summary         | Converts a chess pgn file into a csv dataset containing game information and move information |
| Upload time     | 2023-07-05 12:20:15 |
| Author          | zq99 |
| Requires Python | >=3.7 |
| License         | GPL-3.0+ |
| Keywords        | chess, pgn, notation, data, forsyth–edwards notation, csv, dataset, database, normalization, tabulation, structured data, sql, table, excel, python-chess |
| Requirements    | No requirements were recorded. |

# pgn2data

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
![GitHub stars](https://img.shields.io/github/stars/zq99/pgn2data?style=social)
![GitHub forks](https://img.shields.io/github/forks/zq99/pgn2data?style=social)


This library converts chess PGN files into tabulated CSV data sets.

A PGN file can contain one or more chess games. The library parses the PGN file and creates two CSV files:

- Games file: contains high-level information (e.g. date, site, event, score, players)

- Moves file: contains the moves for each game (e.g. notation, squares, FEN position, whether there is a check)

The two files can be joined on a GUID (the `game_id` column) that the process inserts into both files.
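
As a rough illustration of that mapping, the sketch below joins a games file to a moves file on `game_id` using only the standard library. The two CSV snippets are made-up miniature samples (real output has many more columns, and real IDs are GUIDs rather than `g1`/`g2`), and `join_moves_to_games` is a hypothetical helper, not part of pgn2data.

```python
import csv
import io

# Made-up miniature versions of the two output files (assumption: only a
# few of the documented columns are shown, and the IDs are placeholders).
GAMES_CSV = """game_id,white,black,result
g1,Tal,Bronstein,1-0
g2,Carlsen,Caruana,1/2-1/2
"""

MOVES_CSV = """game_id,move_no,notation
g1,1,e4
g1,2,e5
g2,1,d4
"""


def join_moves_to_games(games_text, moves_text):
    """Attach the matching game row to every move row via game_id."""
    # Index the games by their ID so each move can look up its game.
    games = {row["game_id"]: row for row in csv.DictReader(io.StringIO(games_text))}
    joined = []
    for move in csv.DictReader(io.StringIO(moves_text)):
        game = games[move["game_id"]]
        joined.append({**game, **move})
    return joined


rows = join_moves_to_games(GAMES_CSV, MOVES_CSV)
for row in rows:
    print(row["white"], "vs", row["black"], "move", row["move_no"], row["notation"])
```

With real output, the same lookup would read the two CSV files from disk instead of from in-memory strings.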


## Installation

The library requires Python 3.7 or later.

To install, run the following command in a terminal:

    pip install pgn2data
    
  
## Implementation

Here is a basic example of how to convert a PGN file:

    from converter.pgn_data import PGNData
    
    pgn_data = PGNData("tal_bronstein_1982.pgn")
    pgn_data.export()

The following example combines multiple PGN files into a single output, using "output" as the output file name:

    from converter.pgn_data import PGNData

    pgn_data = PGNData(["file1.pgn", "file2.pgn"], "output")
    pgn_data.export()
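
When the input files sit in one folder, the list passed to `PGNData` can be built rather than typed out. The sketch below uses only the standard library. `collect_pgn_files` is a hypothetical helper, not part of pgn2data, and the temporary folder with empty files exists only to make the example runnable.

```python
import tempfile
from pathlib import Path


def collect_pgn_files(folder):
    """Return the paths of all .pgn files directly inside folder, sorted."""
    return sorted(str(path) for path in Path(folder).glob("*.pgn"))


# Demo with a throwaway folder holding two empty PGN files and one other file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("file2.pgn", "file1.pgn", "notes.txt"):
        (Path(folder) / name).touch()
    pgn_files = collect_pgn_files(folder)
    names = [Path(p).name for p in pgn_files]
    print(names)
```

A list built this way can be passed as the first argument in place of the hand-written list, as in `PGNData(pgn_files, "output")`.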
    
The export method returns a result object that lets you quickly check the size and location of the created files:

    pgn_data = PGNData("tal_bronstein_1982.pgn")
    result = pgn_data.export()
    result.print_summary()

To check that the files were created before doing further processing, you can do the following:

    pgn_data = PGNData("tal_bronstein_1982.pgn")
    result = pgn_data.export()
    if result.is_complete:
        print("Files created!")
    else:
        print("Files not created!")

The result object also provides methods to load the created files into pandas DataFrames:

    pgn_data = PGNData("tal_bronstein_1982.pgn")
    result = pgn_data.export()
    if result.is_complete:
        
        # read the games file
        games_df = result.get_games_df()
        print(games_df.head())
        
        # read the moves file
        moves_df = result.get_moves_df()
        print(moves_df.head())
        
        # read both files joined together
        combined_df = result.get_combined_df()
        print(combined_df.head())

To output the game information only, you can do the following:
    
    from converter.pgn_data import PGNData
    
    pgn_data = PGNData("tal_bronstein_1982.pgn")
    pgn_data.export(moves_required=False)


## Examples

The `samples` folder in this repository contains examples of the library's output.

A [Kaggle project](https://www.kaggle.com/datasets/zq1200/magnus-carlsen-lichess-games-dataset) also shows all of Magnus Carlsen's online Bullet games converted into CSV format.


## Columns

This is a full list of the columns in each output file:

### Games File

| Field                 | Description                        |
|-----------------------|------------------------------------|
| game_id               | ID of game generated by process    |
| game_order            | Order of game in PGN file          |
| event                 | Event                              |
| site                  | Site                               |
| date_played           | Date played                        |
| round                 | Round                              |
| white                 | White player                       |
| black                 | Black player                       |
| result                | Result                             |
| white_elo             | White player rating                |
| white_rating_diff     | White rating difference from Black |
| black_elo             | Black player rating                |
| black_rating_diff     | Black rating difference from White |
| white_title           | White player title                 |
| black_title           | Black player title                 |
| winner                | Winning player                     |
| winner_elo            | Winning player rating              |
| loser                 | Losing player                      |
| loser_elo             | Losing player rating               |
| winner_loser_elo_diff | Winner/loser rating difference     |
| eco                   | ECO opening code                   |
| termination           | How game ended                     |
| time_control          | Time control                       |
| utc_date              | Date played (UTC)                  |
| utc_time              | Time played (UTC)                  |
| variant               | Game type                          |
| ply_count             | Ply count (number of half-moves)   |
| date_created          | Extract date                       |
| file_name             | PGN source file                    |


### Moves File

| Field                          | Description                                                             |
|--------------------------------|-------------------------------------------------------------------------|
| game_id                        | ID of game that maps to games file                                      |
| move_no                        | Order of moves                                                          |
| move_no_pair                   | Chess move number                                                       |
| player                         | Player name                                                             |
| notation                       | Standard notation of move                                               |
| move                           | Before and after piece location                                         |
| from_square                    | Piece location before                                                   |
| to_square                      | Piece location after                                                    |
| piece                          | Initial of piece name                                                   |
| color                          | Piece color                                                             |
| fen                            | FEN position                                                            |
| is_check                       | Is check on board                                                       |
| is_check_mate                  | Is checkmate on board                                                   |
| is_fifty_moves                 | Is fifty-move rule reached                                              |
| is_fivefold_repetition         | Is fivefold repetition on board                                         |
| is_game_over                   | Is game over                                                            |
| is_insufficient_material       | Is game over from lack of mating material                               |
| white_count                    | Count of white pieces                                                   |
| black_count                    | Count of black pieces                                                   |
| white_{piece}_count            | Count of white specified piece                                          |
| black_{piece}_count            | Count of black specified piece                                          |
| captured_score_for_white       | Total of black pieces captured                                          |
| captured_score_for_black       | Total of white pieces captured                                          |
| fen_row{number}_{colour}_count | Number of pieces for the specified colour on this row of the board      |
| fen_row{number}_{colour}_value | Total value of pieces for the specified colour on this row of the board |
| move_sequence                  | Sequence of moves up to current position                                |


## Contributions

Contributions are welcome. All modifications should come with appropriate tests demonstrating
that an issue has been resolved or that new functionality works as intended. Pull requests without tests
will not be merged.

The library can be tested by doing the following:

    from testing.tests import run_all_tests
    run_all_tests()

New tests should be added to the above method.


## Acknowledgements

This project makes use of the [python-chess](https://github.com/niklasf/python-chess) library.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/zq99/pgn2data",
    "name": "pgn2data",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "CHESS,PGN,NOTATION,DATA,FORSYTH\u2013EDWARDS NOTATION,CSV,DATASET,DATABASE,NORMALIZATION,TABULATION,STRUCTURED DATA,SQL,TABLE,EXCEL,PYTHON-CHESS",
    "author": "zq99",
    "author_email": "zq99@hotmail.com",
    "download_url": "https://files.pythonhosted.org/packages/58/27/12b789f240b60f0ffcbe2e038f8c8cd8c2956219a3247a80f1abf902d4d9/pgn2data-0.0.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0+",
    "summary": "Converts a chess pgn file into a csv dataset containing game information and move information",
    "version": "0.0.9",
    "project_urls": {
        "Homepage": "https://github.com/zq99/pgn2data"
    },
    "split_keywords": [
        "chess",
        "pgn",
        "notation",
        "data",
        "forsyth\u2013edwards notation",
        "csv",
        "dataset",
        "database",
        "normalization",
        "tabulation",
        "structured data",
        "sql",
        "table",
        "excel",
        "python-chess"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b93ec367ece9612bf98cf5c08597340520cb54cb061b6a61c94147ae0be8af96",
                "md5": "4da973f36163f8fa19c5b08acc40586a",
                "sha256": "4458f12bdcd1c3eb5660b6d67338d2cc05a50e1fdd9d3b132f38f129345ec74c"
            },
            "downloads": -1,
            "filename": "pgn2data-0.0.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4da973f36163f8fa19c5b08acc40586a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 31850,
            "upload_time": "2023-07-05T12:20:12",
            "upload_time_iso_8601": "2023-07-05T12:20:12.978488Z",
            "url": "https://files.pythonhosted.org/packages/b9/3e/c367ece9612bf98cf5c08597340520cb54cb061b6a61c94147ae0be8af96/pgn2data-0.0.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "582712b789f240b60f0ffcbe2e038f8c8cd8c2956219a3247a80f1abf902d4d9",
                "md5": "aa8a1fd17bbe46ba5aa1aca825d6b6ca",
                "sha256": "2021229c11d5a8516d57ead504efe4ff551a50d006aa5c205665c7f681621136"
            },
            "downloads": -1,
            "filename": "pgn2data-0.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "aa8a1fd17bbe46ba5aa1aca825d6b6ca",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 31602,
            "upload_time": "2023-07-05T12:20:15",
            "upload_time_iso_8601": "2023-07-05T12:20:15.971580Z",
            "url": "https://files.pythonhosted.org/packages/58/27/12b789f240b60f0ffcbe2e038f8c8cd8c2956219a3247a80f1abf902d4d9/pgn2data-0.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-05 12:20:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "zq99",
    "github_project": "pgn2data",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "pgn2data"
}
        