aniparse

Name	aniparse
Version	1.2.2
home_page	https://github.com/MeGaNeKoS/aniparse
Summary	An anime video filename parser
upload_time	2024-02-23 14:05:36
maintainer
docs_url	None
author	めがねこ
requires_python
license	License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
keywords	anime filename parser
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Aniparse

Aniparse is a Python library for parsing anime video filenames. It's simple to use, and it's based on the C++
library [Anitomy](https://github.com/erengy/anitomy), with many improvements.

## Update

This library has already achieved its goal in a somewhat hacky way, as discussed in [issue #9](https://github.com/MeGaNeKoS/Aniparse/issues/9).
I am aware that the last commit isn't clean code, but I no longer have much time to work on this project.
It's a sacrifice I have to make. I don't expect any improvements here for another year or so unless something breaks.
If you are interested in this project, I suggest looking at the [v2-idea](https://github.com/MeGaNeKoS/Aniparse/tree/v2-idea) branch instead.
I've documented the library's goals, how I plan to achieve them, and other details more comprehensively in that branch.

## Example

The following filenames

```
[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv
Toradora! S01E03-Your Song.mkv
```

can be parsed using the following code:

```python
import aniparse

aniparse.parse('[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv')
{
'anime_title': 'Toradora!',
'anime_year': 2008,
'audio_term': 'FLAC',
'episode_number': 1,
'episode_title': 'Tiger and Dragon',
'file_checksum': '1234ABCD',
'file_extension': 'mkv',
'file_name': '[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv',
'release_group': 'TaigaSubs',
'release_version': 2,
'video_resolution': '1280x720',
'video_term': 'H.264'
}

aniparse.parse("Toradora! S01E03-Your Song.mkv")
{
'anime_season': 1,
'anime_season_prefix': 'S',
'anime_title': 'Toradora!',
'episode_number': 3,
'episode_prefix': 'E',
'episode_title': 'Your Song',
'file_extension': 'mkv',
'file_name': 'Toradora! S01E03-Your Song.mkv'
}
```

The `parse` function receives a string and returns a dictionary containing all the elements it found.
It also accepts parsing `options` and a `keyword_manager`; the options are described in the Options section below.
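Because only the elements that were found appear in the result, code that consumes the dictionary should not assume that any given key is present. The snippet below is a minimal sketch of that idea. `describe` is a hypothetical helper (it is not part of Aniparse), and the dictionary is copied from the second example above rather than produced by a live call.

```python
def describe(result):
    """Build a short label from a parse result, tolerating missing keys."""
    parts = [result.get("anime_title", "Unknown title")]
    season = result.get("anime_season")
    episode = result.get("episode_number")
    if season is not None and episode is not None:
        parts.append(f"S{int(season):02d}E{int(episode):02d}")
    elif episode is not None:
        parts.append(f"E{int(episode):02d}")
    if result.get("episode_title"):
        parts.append(result["episode_title"])
    return " - ".join(parts)


# Result dictionary copied from the example above (not a live call).
parsed = {
    "anime_season": 1,
    "anime_season_prefix": "S",
    "anime_title": "Toradora!",
    "episode_number": 3,
    "episode_prefix": "E",
    "episode_title": "Your Song",
    "file_extension": "mkv",
    "file_name": "Toradora! S01E03-Your Song.mkv",
}

label = describe(parsed)
print(label)
```

Using `dict.get` keeps the helper usable for filenames where, for instance, no season or episode title was detected.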

# How does it work?

Suppose that we're working on the following filename:

```text
"Aim_For_The_Top!_Gunbuster-ep1.BD(H264.FLAC.10bit)[KAA][69ECCDCF].mkv"
```

The filename is first stripped of its extension and split into groups. Groups are determined by the position of
brackets:

```text
"Aim_For_The_Top!_Gunbuster-ep1.BD", "H264.FLAC.10bit", "KAA", "69ECCDCF"
```
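The grouping step can be approximated in a few lines. This is only a sketch of the idea, not Aniparse's actual implementation: `split_groups` is a hypothetical name, only `()` and `[]` are treated as brackets, and nested brackets are not handled.

```python
import re

# Either a bracketed run (captured without its brackets) or a plain run.
GROUP_PATTERN = re.compile(r"[\[(]([^\])]*)[\])]|([^\[\]()]+)")


def split_groups(stem):
    """Split a filename stem into (text, enclosed) pairs at bracket boundaries."""
    groups = []
    for match in GROUP_PATTERN.finditer(stem):
        enclosed_text, plain_text = match.group(1), match.group(2)
        if enclosed_text is not None:
            groups.append((enclosed_text, True))
        else:
            groups.append((plain_text, False))
    return groups


filename = "Aim_For_The_Top!_Gunbuster-ep1.BD(H264.FLAC.10bit)[KAA][69ECCDCF].mkv"
stem = filename.rsplit(".", 1)[0]  # drop the extension
groups = split_groups(stem)
for text, enclosed in groups:
    print(enclosed, text)
```

Each group keeps a flag saying whether it came from inside brackets, which the later steps rely on.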

Each group is then split into tokens. In our current example, the delimiter for the enclosed group is `.`, while the
words in other groups are separated by `_`:

```text
"Aim", "For", "The", "Top!", "Gunbuster-ep1", "BD", "H264", "FLAC", "10bit", "KAA", "69ECCDCF"
```

Note: the brackets and delimiters are also stored as tokens, with the categories `Bracket` and `Delimiter`, and each
token remembers whether it is enclosed or not.
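A rough sketch of this tokenization step follows. It assumes the default delimiter set listed in the Options table, represents tokens as plain dictionaries (an illustrative choice, not the library's data model), and skips bracket tokens for brevity.

```python
import re

# Default delimiter characters, as listed in the Options table.
ALLOWED_DELIMITERS = " _.&+,|"
# The capturing group makes re.split keep the delimiters in its output.
SPLIT_PATTERN = re.compile("([" + re.escape(ALLOWED_DELIMITERS) + "])")


def tokenize(group_text, enclosed):
    """Split one group into tokens, keeping delimiters as their own tokens."""
    tokens = []
    for piece in SPLIT_PATTERN.split(group_text):
        if not piece:
            continue  # re.split yields empty strings between adjacent delimiters
        is_delimiter = len(piece) == 1 and piece in ALLOWED_DELIMITERS
        tokens.append({
            "text": piece,
            "category": "Delimiter" if is_delimiter else "Unknown",
            "enclosed": enclosed,
        })
    return tokens


groups = [
    ("Aim_For_The_Top!_Gunbuster-ep1.BD", False),
    ("H264.FLAC.10bit", True),
    ("KAA", True),
    ("69ECCDCF", True),
]
tokens = [token for text, enclosed in groups for token in tokenize(text, enclosed)]
words = [token["text"] for token in tokens if token["category"] != "Delimiter"]
print(words)
```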

Once the tokenizer is done, the parser comes into effect.
First, all tokens are compared against a set of known keywords. In this case,
the tokens `BD`, `H264`, `FLAC`, `10bit`, and `69ECCDCF` are recognized as keywords,
and are assigned the categories `Source`, `VideoTerm`, `AudioTerm`, `VideoResolution`, and `FileChecksum` respectively.

```text
"Aim", "For", "The", "Top!", "Gunbuster-ep1", "KAA"
```
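The keyword pass can be pictured as a dictionary lookup plus a pattern check for checksums. The table below is a tiny hypothetical stand-in that covers only this example (the real keyword set is much larger), and treating any eight hexadecimal digits as a checksum is an assumption made for the sketch.

```python
import re

# Hypothetical miniature keyword table covering only this example.
KEYWORDS = {
    "BD": "Source",
    "H264": "VideoTerm",
    "FLAC": "AudioTerm",
    "10BIT": "VideoResolution",
}
# Assumption: a CRC32 checksum appears as exactly eight hexadecimal digits.
CHECKSUM_PATTERN = re.compile(r"[0-9A-Fa-f]{8}")


def classify(token):
    """Return the keyword category for a token, or 'Unknown'."""
    category = KEYWORDS.get(token.upper())
    if category is not None:
        return category
    if CHECKSUM_PATTERN.fullmatch(token):
        return "FileChecksum"
    return "Unknown"


tokens = ["Aim", "For", "The", "Top!", "Gunbuster-ep1",
          "BD", "H264", "FLAC", "10bit", "KAA", "69ECCDCF"]
categories = {token: classify(token) for token in tokens}
unknown = [token for token in tokens if categories[token] == "Unknown"]
print(unknown)
```

The tokens left in `unknown` are the ones the later steps still have to place.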

The next step is to look for the episode number. Each token that contains a number is analyzed. Here,
`Gunbuster-ep1` contains a number, but it doesn't match the episode number pattern. In this case,
the token is checked against the buggy dash pattern, so `Gunbuster-ep1` is split into `Gunbuster` and `ep1`.
After that, `ep1` is checked again and recognized as an episode number.
The category `EpisodeNumber` is assigned to it and the change is saved.

```text
"Aim", "For", "The", "Top!", "Gunbuster", "KAA"
```
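The dash fallback described above might look roughly like this. The episode pattern is a deliberately simplified assumption (an optional `episode`, `ep`, or `e` prefix followed by digits), and `find_episode` is a hypothetical name; the library recognizes far more forms.

```python
import re

# Simplified assumption: optional prefix followed by digits, nothing else.
EPISODE_PATTERN = re.compile(r"(?:episode|ep|e)?(\d+)", re.IGNORECASE)


def find_episode(token):
    """Return (remaining_parts, episode_number); the number is None if not found."""
    match = EPISODE_PATTERN.fullmatch(token)
    if match:
        return [], int(match.group(1))
    # Fallback: split on the dash and test each piece on its own.
    remaining = []
    episode = None
    for part in token.split("-"):
        match = EPISODE_PATTERN.fullmatch(part)
        if match and episode is None:
            episode = int(match.group(1))
        else:
            remaining.append(part)
    return remaining, episode


print(find_episode("Gunbuster-ep1"))
```

The whole token is tried first so that the split only happens when the direct match fails, mirroring the order of checks in the text.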

The next step is to look for the anime title. The parser looks for unknown tokens that come before the episode number
and are not inside brackets.
In this case, `Aim`, `For`, `The`, `Top!`, and `Gunbuster` are unknown tokens that are not inside brackets, so they are
assigned to the `AnimeTitle` category.

```text
"KAA"
```
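As an illustration of the title rule, the sketch below walks the tokens that precede the episode number and claims the unknown, non-enclosed ones. The `Token` class and `assign_title` function are hypothetical and only model the fields this step needs.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    enclosed: bool
    category: str = "Unknown"


def assign_title(tokens):
    """Mark unknown, non-enclosed tokens before the episode number as AnimeTitle."""
    episode_index = next(
        (i for i, token in enumerate(tokens) if token.category == "EpisodeNumber"),
        len(tokens),  # no episode number: consider every token
    )
    for token in tokens[:episode_index]:
        if token.category == "Unknown" and not token.enclosed:
            token.category = "AnimeTitle"
    return " ".join(t.text for t in tokens if t.category == "AnimeTitle")


tokens = [
    Token("Aim", False), Token("For", False), Token("The", False),
    Token("Top!", False), Token("Gunbuster", False),
    Token("ep1", False, "EpisodeNumber"),
    Token("BD", False, "Source"),
    Token("H264", True, "VideoTerm"),
    Token("FLAC", True, "AudioTerm"),
    Token("10bit", True, "VideoResolution"),
    Token("KAA", True),
    Token("69ECCDCF", True, "FileChecksum"),
]
title = assign_title(tokens)
print(title)
```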

The next step is to look for the release group. The parser looks for unknown tokens that come after the episode number
and are inside brackets.
In this case, `KAA` is an unknown token inside brackets, so it is assigned to the `ReleaseGroup` category.

```text

```

The next step is to look for the episode title. The parser looks for unknown tokens that come after the episode number
and are not inside brackets.
In this case, no unknown tokens are left, so the episode title is left empty.

```text

```
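The release group and episode title rules differ only in whether the token is enclosed, so they can be sketched together. As before, `Token` and `assign_after_episode` are hypothetical names, and the sketch ignores cases the library has to handle, such as several enclosed unknown groups.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    enclosed: bool
    category: str = "Unknown"


def assign_after_episode(tokens):
    """Classify unknown tokens after the episode number by their enclosed flag."""
    episode_index = next(
        (i for i, token in enumerate(tokens) if token.category == "EpisodeNumber"),
        None,
    )
    if episode_index is None:
        return
    for token in tokens[episode_index + 1:]:
        if token.category == "Unknown":
            token.category = "ReleaseGroup" if token.enclosed else "EpisodeTitle"


def join_category(tokens, category):
    return " ".join(t.text for t in tokens if t.category == category)


tokens = [
    Token("Aim", False, "AnimeTitle"), Token("For", False, "AnimeTitle"),
    Token("The", False, "AnimeTitle"), Token("Top!", False, "AnimeTitle"),
    Token("Gunbuster", False, "AnimeTitle"),
    Token("ep1", False, "EpisodeNumber"),
    Token("BD", False, "Source"),
    Token("H264", True, "VideoTerm"),
    Token("FLAC", True, "AudioTerm"),
    Token("10bit", True, "VideoResolution"),
    Token("KAA", True),
    Token("69ECCDCF", True, "FileChecksum"),
]
assign_after_episode(tokens)
release_group = join_category(tokens, "ReleaseGroup")
episode_title = join_category(tokens, "EpisodeTitle")
print(repr(release_group), repr(episode_title))
```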

Lastly, the parser goes through any remaining unknown tokens and assigns each one to a matching category, or to
`Others` if it is not recognized.

# Why should I use it?

Anime video files are commonly named in a format where the anime title is followed by the episode number,
and all the technical details are enclosed within brackets.
However, fansub groups tend to use their own naming conventions,
and the problem is more complicated than it first appears:

- Element order is not always the same.
- Technical information is not guaranteed to be enclosed.
- Brackets and parentheses may be grouping symbols or a part of the anime/episode title.
- Space and underscore are not the only delimiters in use.
- A single filename may contain multiple delimiters.

There are so many cases to cover that it's simply not possible to parse all filenames solely with
regular expressions. Aniparse tries a different approach, and it succeeds:
It's able to parse tens of thousands of filenames, with great accuracy.

# Are there any exceptions?

Yes, unfortunately. Aniparse fails to identify the anime title and episode number on rare occasions,
mostly due to bad naming conventions. See the examples below.

`Arigatou.Shuffle!.Ep08.[x264.AAC][D6E43829].mkv`

Here, Aniparse would report that this file is the 8th episode of `Arigatou Shuffle!`, where `Arigatou` is actually the
name of the fansub group.

`Spice and Wolf 2`

Is this the 2nd episode of `Spice and Wolf`, or a batch release of `Spice and Wolf 2`? From the filename alone, there's
no way to know, so it's up to you to consider both cases. The current version treats the number as part of the title if it
has no leading zero, and as the episode number if it has a leading zero.
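The leading-zero rule for a trailing number can be written down as a small heuristic. This is a sketch of the rule as stated above, not the library's code; `split_trailing_number` is a hypothetical name, and it only looks at a number at the very end of the name.

```python
import re

# A name ending in whitespace followed by digits, e.g. "Spice and Wolf 02".
TRAILING_NUMBER = re.compile(r"(.*\S)\s+(\d+)")


def split_trailing_number(name):
    """Return (title, episode); episode is None when the number stays in the title."""
    match = TRAILING_NUMBER.fullmatch(name)
    if not match:
        return name, None
    title, number = match.groups()
    if len(number) > 1 and number.startswith("0"):
        return title, int(number)  # leading zero: treat as an episode number
    return name, None  # no leading zero: keep the number in the title


print(split_trailing_number("Spice and Wolf 2"))
print(split_trailing_number("Spice and Wolf 02"))
```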

## Suggestions to fansub groups

Please consider abiding by these simple rules before deciding on your naming convention:

- Don't enclose anime title, episode number and episode title within brackets. Enclose everything else, including the
name of your group.
- Don't use parentheses to enclose release information; use square brackets instead. Parentheses should only be used if
they are a part of the anime/episode title.
- Don't use multiple delimiters in a single filename. If possible, stick with either space or underscore.
- Use a separator (e.g. a dash) between anime title and episode number. There are anime titles that end with a number,
which creates ambiguity.
- Indicate the episode interval in batch releases.

## Installation

To install Aniparse, simply use pip:

```commandline
pip install aniparse
```

Or download the source code and, inside the source folder, run:

```commandline
python setup.py install
```

## Options

The `parse` function can receive the `options` parameter. For example:

```python

import aniparse

aniparse_options = {'allowed_delimiters': ' '}
aniparse.parse('DRAMAtical Murder Episode 1 - Data_01_Login', options=aniparse_options)
{
'anime_title': 'DRAMAtical Murder',
'episode_prefix': 'Episode',
'episode_number': '1',
'episode_title': 'Data_01_Login',
'file_name': 'DRAMAtical Murder Episode 1 - Data_01_Login'
}
```

If the default options had been used, the parser would have treated `_` as a delimiter and replaced it with a space in
the episode title.

The options contain the following attributes:

| **Attribute name** | **Type** | **Description** | **Default value** |
| -------------------- | --------------- | --------------------------------------------------------------- | ----------------- |
| allowed_delimiters | string | The characters to be treated as delimiters. | ' _.&+,&#124;' |
| check_title_enclosed | boolean | Whether to look for the anime title inside enclosed groups if no title is found. | True |
| eps_lower_than_alt | boolean | Whether the lower number is used as the episode number and the higher one as the alternative. | True |
| ignored_dash | boolean | Whether dashes in the anime/episode title should be ignored. | True |
| ignored_strings | list of strings | A list of strings to be removed from the filename during parsing. | [] |
| keep_delimiters | boolean | Whether delimiters should be kept in the anime/episode title. | False |
| max_extension_length | integer | Maximum extension length. | 4 |
| title_before_episode | boolean | Whether the anime title is expected before the episode number. | True |
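
When overriding only a few options, it can help to catch misspelled names early. The sketch below copies the defaults from the table above into a dictionary and validates overrides against it. `DEFAULT_OPTIONS` and `build_options` are hypothetical names, and how Aniparse itself merges a partial `options` dictionary with its defaults is not shown here.

```python
# Defaults copied from the table above.
DEFAULT_OPTIONS = {
    "allowed_delimiters": " _.&+,|",
    "check_title_enclosed": True,
    "eps_lower_than_alt": True,
    "ignored_dash": True,
    "ignored_strings": [],
    "keep_delimiters": False,
    "max_extension_length": 4,
    "title_before_episode": True,
}


def build_options(overrides=None):
    """Return the defaults updated with overrides, rejecting unknown names."""
    overrides = dict(overrides or {})
    unknown = sorted(set(overrides) - set(DEFAULT_OPTIONS))
    if unknown:
        raise KeyError(f"Unknown option(s): {', '.join(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    # Copy the list so callers cannot mutate the shared default.
    options["ignored_strings"] = list(options["ignored_strings"])
    return options


options = build_options({"allowed_delimiters": " "})
print(options["allowed_delimiters"] == " ", options["keep_delimiters"])
```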

## License

*Aniparse* is licensed under [Mozilla Public License 2.0](https://www.mozilla.org/en-US/MPL/2.0/FAQ/).

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/MeGaNeKoS/aniparse",
    "name": "aniparse",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "anime,filename parser",
    "author": "\u3081\u304c\u306d\u3053",
    "author_email": "evictory91+pypackages@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/3e/0a/bfc2cd489b5a8e15b0b612aa427e27efdd3ec9db8238abbdde2654faf70e/aniparse-1.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)",
    "summary": "An anime video filename parser",
    "version": "1.2.2",
    "project_urls": {
        "Homepage": "https://github.com/MeGaNeKoS/aniparse"
    },
    "split_keywords": [
        "anime",
        "filename parser"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "97a3528b4840dc2360e54be939978a2db13d92df206bf5f8613f4f04d7c807ad",
                "md5": "09b3da97924f67f341ef60679afae487",
                "sha256": "60f1a197c88b8f32b1b29cb112cbc37c13663cc9caf84117ef797b2b45a4ed04"
            },
            "downloads": -1,
            "filename": "aniparse-1.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "09b3da97924f67f341ef60679afae487",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 49055,
            "upload_time": "2024-02-23T14:05:35",
            "upload_time_iso_8601": "2024-02-23T14:05:35.040167Z",
            "url": "https://files.pythonhosted.org/packages/97/a3/528b4840dc2360e54be939978a2db13d92df206bf5f8613f4f04d7c807ad/aniparse-1.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3e0abfc2cd489b5a8e15b0b612aa427e27efdd3ec9db8238abbdde2654faf70e",
                "md5": "1a4b0f3165300cf84bf63ce4034f9c08",
                "sha256": "6657be0bdb31c625acf8575798d269d6f722ee29064781f62501387f1b60934c"
            },
            "downloads": -1,
            "filename": "aniparse-1.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "1a4b0f3165300cf84bf63ce4034f9c08",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 48379,
            "upload_time": "2024-02-23T14:05:36",
            "upload_time_iso_8601": "2024-02-23T14:05:36.550509Z",
            "url": "https://files.pythonhosted.org/packages/3e/0a/bfc2cd489b5a8e15b0b612aa427e27efdd3ec9db8238abbdde2654faf70e/aniparse-1.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-23 14:05:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MeGaNeKoS",
    "github_project": "aniparse",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "aniparse"
}
