eurlex-parser

- Name: eurlex-parser
- Version: 0.0.13
- Home page: https://github.com/noworneverev/eurlex-parser
- Summary: Eurlex parser for fetching and parsing Eurlex data.
- Upload time: 2024-08-14 18:56:22
- Author: Yan-Ying Liao
- Requires Python: >=3.6
- Requirements: none recorded

# Eurlex Parser

This Python package fetches and parses data (regulations, directives, and proposals) from EUR-Lex, the official website for European Union law. Given a document's CELEX ID, it extracts the title, preamble, articles, annexes, and other parts of the document, and it can export the data as JSON or as a pandas DataFrame.

## Installation

```bash
pip install eurlex-parser
```

## Usage

### Functions

- `get_data_by_celex_id(celex_id: str, language: str = "en") -> dict`: Fetches and parses the data for the given CELEX ID. Returns a dictionary with the document's title, preamble, articles, final part, and annexes, along with the other fields described under Data Structure.

- `get_json_by_celex_id(celex_id: str) -> str`: Fetches and parses the data for the given CELEX ID and returns it as a JSON string.

- `get_articles_by_celex_id(celex_id: str) -> pd.DataFrame`: Fetches and parses the articles for the given CELEX ID and returns them as a pandas DataFrame.

- `get_summary_by_celex_id(celex_id: str, language: str = "en") -> dict`: Fetches and parses the summary for the given CELEX ID and returns it as a dictionary containing the document's title, chapters, and last-modified date. (Note: Summaries are not available for all documents.)
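
Every function above takes a CELEX ID. The sketch below splits an ID of the common form into its parts, which can catch typos before any request is made. It assumes the basic layout: a one-digit sector, a four-digit year, a one- or two-letter document type (for example `R` for regulations or `L` for directives), and a four-digit number. `parse_celex_id` and `CelexId` are hypothetical helpers, not part of `eurlex-parser`, and IDs with extra suffixes (such as corrigendum markers) are deliberately rejected.

```python
import re
from typing import NamedTuple

# Assumed basic CELEX layout: sector digit, 4-digit year,
# 1-2 letter document type, 4-digit number. Suffixed forms are not handled.
CELEX_PATTERN = re.compile(r"^(\d)(\d{4})([A-Z]{1,2})(\d{4})$")


class CelexId(NamedTuple):
    sector: str
    year: int
    doc_type: str
    number: str


def parse_celex_id(celex_id: str) -> CelexId:
    """Split a basic CELEX ID into its parts; raise ValueError otherwise."""
    match = CELEX_PATTERN.match(celex_id.strip().upper())
    if match is None:
        raise ValueError(f"not a basic CELEX ID: {celex_id!r}")
    sector, year, doc_type, number = match.groups()
    return CelexId(sector, int(year), doc_type, number)


print(parse_celex_id("32013R0575"))
# CelexId(sector='3', year=2013, doc_type='R', number='0575')
```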

### Examples

The following examples show how to fetch and parse data for a CELEX ID. For instance, the CELEX ID `32013R0575` corresponds to this URL: https://eur-lex.europa.eu/legal-content/en/TXT/?uri=celex:32013R0575

1. Fetch and print data for a given CELEX ID:
    ```python
    from eurlex import get_data_by_celex_id

    data = get_data_by_celex_id('32013R0575')
    print(data)
    ```

2. Save data as a JSON file:
    ```python
    from eurlex import get_json_by_celex_id

    json_data = get_json_by_celex_id('32013R0575')
    with open('32013R0575.json', 'w', encoding='utf-8') as f:
        f.write(json_data)
    ```

3. Load articles into a pandas DataFrame:
    ```python
    from eurlex import get_articles_by_celex_id

    df = get_articles_by_celex_id('32013R0575')
    print(df.head())
    ```

4. Fetch and print the summary for a given CELEX ID:
    ```python
    from eurlex import get_summary_by_celex_id

    summary = get_summary_by_celex_id('32013R0575')
    print(summary)
    ```
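
The URL pattern shown for `32013R0575` can be reproduced with a small helper. This is a sketch based only on that one example URL: `celex_url` is a hypothetical function, and the package may build its request URLs differently.

```python
EURLEX_BASE = "https://eur-lex.europa.eu/legal-content"


def celex_url(celex_id: str, language: str = "en") -> str:
    """Build a EUR-Lex text URL for a CELEX ID, following the example URL pattern."""
    # Strip stray whitespace so copy-pasted IDs still produce a clean URL.
    return f"{EURLEX_BASE}/{language}/TXT/?uri=celex:{celex_id.strip()}"


print(celex_url("32013R0575"))
# https://eur-lex.europa.eu/legal-content/en/TXT/?uri=celex:32013R0575
```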


Sample JSON output files are available in the `examples` directory of the project repository.
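
A JSON file saved with example 2 can be read back using only the standard library. The sketch below assumes the file has the shape described under Data Structure (an `articles` list whose entries carry `id` and `title`); `list_articles` is a hypothetical helper, not part of the package.

```python
import json
from typing import List, Tuple


def list_articles(path: str) -> List[Tuple[str, str]]:
    """Return (id, title) pairs for every article in a saved JSON export."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # .get() tolerates documents without an "articles" key or article titles.
    return [(a.get("id", ""), a.get("title", "")) for a in data.get("articles", [])]
```

For example, `list_articles("32013R0575.json")` would return the article IDs and titles from the file written in example 2.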

### Data Structure

The main data structure returned by `get_data_by_celex_id` is a dictionary with the following format:
```json
{
  "title": "Document Title",
  "preamble": {
    "text": "Preamble text",
    "notes": [
      {
        "id": "1",
        "text": "Note text",
        "url": "https://eur-lex.europa.eu/...",
        "reference": null
      }
    ]
  },
  "articles": [
    {
      "id": "Article ID",
      "title": "Article Title",
      "text": "Article text",
      "metadata": {
        "parent_title1": "Parent Title 1",
        "parent_title2": "Parent Title 2"
      },
      "notes": [
        {
          "id": "1",
          "text": "Note text",
          "url": "https://eur-lex.europa.eu/...",
          "reference": null
        }
      ],
      "references": [
        "Directive ..../../..",
        "Regulation (EU) No .../...."
      ]
    }
  ],
  "notes": [
    {
      "id": "1",
      "text": "Note text",
      "url": "https://eur-lex.europa.eu/...",
      "reference": null
    }
  ],  
  "references": [
    "Directive ..../../..",
    "Regulation (EU) No .../...."
  ],
  "final_part": "Final part text",
  "annexes": [
    {
      "id": "Annex ID",
      "title": "Annex Title",
      "text": "Annex text",
      "table": "Markdown table text"
    }
  ],
  "summary": {
    "title": "Document Title",
    "chapters": {
      "Chapter Title 1": "Chapter content 1",
      "Chapter Title 2": "Chapter content 2"
    },
    "last_modified": "Last modified date"
  },
  "related_documents": {
    "modifies": [
      {
        "Relation": "Modifies",
        "Act": {
            "celex": "CELEX Number",
            "url": "https://eur-lex.europa.eu/..."
        },
        "Comment": "Addition",
        "Subdivision concerned": "Article number/paragraph",
        "From": "date",
        "To": "date"
      }
    ],
    "modified_by": [
      {
        "Relation": "Corrected by",
        "Act": {
            "celex": "CELEX Number",
            "url": "https://eur-lex.europa.eu/..."
        },
        "Comment": "",
        "Subdivision concerned": "Article number/paragraph",
        "From": "date",
        "To": "date"
      }
    ]
  }
}
```
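
References appear both at the top level and inside each article. Assuming the key layout shown above, a hypothetical helper such as `collect_references` can merge them into one list while keeping first-seen order. The sample dictionary uses placeholder values, not real package output.

```python
from typing import Any, Dict, List


def collect_references(data: Dict[str, Any]) -> List[str]:
    """Merge document-level and article-level references, dropping duplicates."""
    seen = set()
    merged = []
    # Document-level references first, then each article's references in order.
    sources = [data.get("references", [])]
    sources += [article.get("references", []) for article in data.get("articles", [])]
    for refs in sources:
        for ref in refs:
            if ref not in seen:
                seen.add(ref)
                merged.append(ref)
    return merged


# Placeholder data in the shape shown above; not real package output.
sample = {
    "references": ["Directive X"],
    "articles": [
        {"id": "Article 1", "references": ["Directive X", "Regulation Y"]},
    ],
}
print(collect_references(sample))
# ['Directive X', 'Regulation Y']
```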

### Notes

- Although `get_data_by_celex_id` and `get_summary_by_celex_id` accept a `language` parameter, the package currently supports fetching data in English (`en`) only.

## License

This project is licensed under the MIT License.

            
