inoutlists


- Name: inoutlists
- Version: 1.0.1
- Summary: inoutlists is a python package to parse and normalize different sources of lists (OFAC, EU, UN, etc) to a common dictionary interface.
- Upload time: 2024-06-03 22:18:59
- Requires Python: >=3.8
- License: MIT License. Copyright (c) 2024 Eusebio José de la Torre Niño. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Keywords: eu, ofac, un, decoding, deduplication, encoding, entity-resolution, lists, mapping, normalization, parsing, record-linkage, sanctions
- Requirements: No requirements were recorded.

# inoutlists

inoutlists is a Python package to parse and normalize different sources of lists (OFAC, EU, UN, etc.) into a common dictionary interface.

Once the lists are parsed and normalized, the user can dump the information to other formats such as JSON, CSV or a Pandas data frame for further research or transfer to other systems. 

The package can also be extended: create a specific Loader class to parse any other kind of source, or a specific Dumper class to dump the information to any other format.
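The extension pattern can be pictured with the minimal sketch below. Everything in it is an assumption written for illustration: `Loader` and `Dumper` are simplified stand-ins rather than the package's real classes, their constructor signatures are guesses, and `LoaderPipeText` and `DumperTSV` are hypothetical subclasses. Only the overall shape follows this README: `load` and `dump` instantiate the class they are given and pass the extra arguments to it, a loader produces a dictionary with `meta` and `list_entries`, and a dumper either writes to an `output` file or returns a string, as DumperJSON and DumperCSV are described to do.

```python
import csv
import io
from typing import Any, Dict, List, Optional


class Loader:
    """Simplified stand-in for inoutlists.Loader (assumed, not the real class)."""

    def __init__(self, data: Any, description: str = "") -> None:
        self.data = data
        self.description = description

    def load(self) -> Dict[str, Any]:
        raise NotImplementedError


class Dumper:
    """Simplified stand-in for inoutlists.Dumper (assumed, not the real class)."""

    def __init__(self, data: Dict[str, Any], output: Optional[str] = None) -> None:
        self.data = data
        self.output = output

    def dump(self) -> Any:
        raise NotImplementedError


def load(data: Any, loader: type = Loader, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    # Instantiate the chosen loader with the extra arguments and run it.
    return loader(data, *args, **kwargs).load()


def dump(data: Dict[str, Any], dumper: type = Dumper, *args: Any, **kwargs: Any) -> Any:
    # Instantiate the chosen dumper with the extra arguments and run it.
    return dumper(data, *args, **kwargs).dump()


class LoaderPipeText(Loader):
    """Hypothetical loader for lines shaped like 'id|type|whole name'."""

    def load(self) -> Dict[str, Any]:
        entries: List[Dict[str, Any]] = []
        for line in self.data.strip().splitlines():
            entry_id, entry_type, whole_name = line.split("|")
            entries.append({
                "id": entry_id,
                "type": entry_type,
                "names": [{"whole_name": whole_name, "strong": True,
                           "first_name": "", "last_name": ""}],
            })
        return {"meta": {"description": self.description, "source": "inline text"},
                "list_entries": entries}


class DumperTSV(Dumper):
    """Hypothetical dumper: one tab-separated row per entry name."""

    def dump(self) -> Optional[str]:
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
        writer.writerow(["id", "type", "whole_name"])
        for entry in self.data["list_entries"]:
            for name in entry["names"]:
                writer.writerow([entry["id"], entry["type"], name["whole_name"]])
        text = buffer.getvalue()
        if self.output is None:
            return text  # no output given: return the data as a string
        with open(self.output, "w", encoding="utf-8") as handle:
            handle.write(text)
        return None


parsed = load("36|O|AEROCARIBBEAN AIRLINES\n10001|O|INVERSIONES MACARNIC PATINO Y CIA S.C.S.",
              loader=LoaderPipeText, description="demo list")
print(dump(parsed, dumper=DumperTSV))
```

Keeping parsing and serialization in separate classes means any loader can be combined with any dumper, because the common dictionary is the only contract between them.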

## Basic Usage

The main entry points of inoutlists are the functions load and dump:

### inoutlists.load(data, loader=Loader, *args, **kwargs)

Parameters:

- data: The data to parse. Its type depends on the chosen loader: it could be a URL, a file, a string, etc.
- loader: Loader class. It must inherit from the class Loader. It defines the logic for transforming the data into the common dictionary interface by implementing the methods defined in Loader, especially the load method.
- `*args`, `**kwargs`: Positional and keyword arguments passed to the loader class.

Returns: Dictionary. The list in the common dictionary interface.

### inoutlists.dump(data, dumper=Dumper, *args, **kwargs)

Parameters:

- data: A Python dictionary based on the common interface.
- dumper: Dumper class. It must inherit from the class Dumper. It defines the logic for transforming the common dictionary interface into the target format by implementing the dump method.
- `*args`, `**kwargs`: Positional and keyword arguments passed to the dumper class.

Returns: Any. The return type depends on the Dumper class.

```python
>>> from inoutlists import load, dump, LoaderOFACXML, DumperPandas, DumperJSON
>>> from pprint import pprint 
>>> OFAC_SDN_URL = "https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/SDN.XML"
>>> OFAC_SDN = load(OFAC_SDN_URL, loader=LoaderOFACXML, description="OFAC SDN list")
>>> pprint(OFAC_SDN.keys())
dict_keys(['meta', 'list_entries'])
>>> pprint(OFAC_SDN["meta"])
{'description': 'OFAC SDN list',
 'list_date': '2024-05-24',
 'source': 'https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/SDN.XML'}
>>> pprint(f'# list entries: {len(OFAC_SDN["list_entries"])}')
'# list entries: 14978'
>>> pprint(f'{OFAC_SDN["list_entries"][0]}')
("{'id': '36', 'type': 'O', 'names': [{'whole_name': 'AEROCARIBBEAN AIRLINES', "
 "'strong': True, 'first_name': '', 'last_name': ''}, {'whole_name': "
 "'AERO-CARIBBEAN', 'strong': True, 'first_name': '', 'last_name': ''}], "
 "'addresses': [{'address': 'HAVANA CUBA', 'street': '', 'city': 'HAVANA', "
 "'country_subdivision': '', 'country_ori': 'CUBA', 'country_ISO_code': 'CU', "
 "'country_desc': 'CUBA'}], 'programs': ['CUBA']}")
>>> df = dump(OFAC_SDN, dumper=DumperPandas)
>>> pprint(df[df.type=="O"].iloc[0].T)
id                                                                                 10001
type                                                                                   O
names_whole_name                                INVERSIONES MACARNIC PATINO Y CIA S.C.S.
names_strong                                                                        True
names_first_name                                                                        
names_last_name                                                                         
addresses_address                      CALLE 19 NO. 9-50 OFC. 505 OFC. 505 PEREIRA RI...
addresses_street                                     CALLE 19 NO. 9-50 OFC. 505 OFC. 505
addresses_city                                                                   PEREIRA
addresses_country_subdivision                                                  RISARALDA
addresses_country_ori                                                           COLOMBIA
addresses_country_ISO_code                                                            CO
addresses_country_desc                                                          COLOMBIA
nationalities_country_ori                                                            NaN
nationalities_country_ISO_code                                                       NaN
nationalities_country_desc                                                           NaN
dates_of_birth_date_of_birth                                                         NaN
dates_of_birth_year                                                                  NaN
dates_of_birth_month                                                                 NaN
dates_of_birth_day                                                                   NaN
places_of_birth_place_of_birth                                                       NaN
places_of_birth_street                                                               NaN
places_of_birth_city                                                                 NaN
places_of_birth_country_subdivision                                                  NaN
places_of_birth_country_ori                                                          NaN
places_of_birth_country_ISO_code                                                     NaN
places_of_birth_country_desc                                                         NaN
identifications_type                                                               NIT #
identifications_id                                                           816005011-4
identifications_country_ori                                                     COLOMBIA
identifications_country_ISO_code                                                      CO
identifications_country_desc                                                    COLOMBIA
programs                                                                            SDNT
source                                                                          OFAC SDN
Name: 32, dtype: object
>>> OFAC_SDN_JSON = dump(OFAC_SDN, dumper=DumperJSON)
>>> pprint(OFAC_SDN_JSON[0:200])
('{"meta": {"description": "OFAC SDN list", "source": '
 '"https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/SDN.XML", '
 '"list_date": "2024-05-24"}, "list_entries": [{"id": "36", "typ')
```
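Because load returns plain Python dictionaries, a parsed list can also be inspected directly without any dumper. The sketch below relies only on keys visible in the example output (`meta`, `list_entries`, `names`, `whole_name`, `strong`, `programs`); the sample data is an abbreviated dictionary built from the values printed above, and the helpers `entries_per_program` and `strong_names` are hypothetical. They use `.get` because the first entry printed above has no `nationalities` or `dates_of_birth` keys, which suggests (but does not confirm) that empty relations are omitted from an entry.

```python
from collections import Counter
from typing import Any, Dict, List

# Abbreviated sample in the common dictionary interface, built from values
# printed in the example above (addresses and other fields omitted).
sample: Dict[str, Any] = {
    "meta": {"description": "OFAC SDN list", "list_date": "2024-05-24"},
    "list_entries": [
        {"id": "36", "type": "O", "programs": ["CUBA"],
         "names": [{"whole_name": "AEROCARIBBEAN AIRLINES", "strong": True},
                   {"whole_name": "AERO-CARIBBEAN", "strong": True}]},
        {"id": "10001", "type": "O", "programs": ["SDNT"],
         "names": [{"whole_name": "INVERSIONES MACARNIC PATINO Y CIA S.C.S.", "strong": True}]},
    ],
}


def entries_per_program(parsed: Dict[str, Any]) -> Counter:
    """Count entries per program; an entry with several programs counts once in each."""
    return Counter(program
                   for entry in parsed["list_entries"]
                   for program in entry.get("programs", []))


def strong_names(parsed: Dict[str, Any]) -> List[str]:
    """Collect every name flagged as strong, across all entries."""
    return [name["whole_name"]
            for entry in parsed["list_entries"]
            for name in entry.get("names", [])
            if name.get("strong")]


print(sample["meta"]["list_date"], dict(entries_per_program(sample)))
print(strong_names(sample))
```

With real data, the same helpers could be called on the loaded list, for example `entries_per_program(OFAC_SDN)`.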

## Installing inoutlists

```console
$ python -m pip install inoutlists
```

## Current loaders distributed with inoutlists

- Loader: Generic loader class. All loader classes must inherit from it and implement the methods it defines.

- LoaderXML: Generic class for loading lists based on XML. The data parameter of the load function can be the URL of the XML file, an OS path to the file or a string. Parameters:
    - description: String for informative purposes. Default: ""
    - schema: Path object pointing to the schema used to validate the data. Default: OFAC_xml.xsd, the OFAC schema distributed with the package.

- LoaderOFACXML: Class for parsing the lists distributed by OFAC: the [SDN list](https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/SDN.XML) and the [OFAC Consolidated](https://sanctionslistservice.ofac.treas.gov/api/PublicationPreview/exports/CONSOLIDATED.XML) list. It inherits from LoaderXML. Parameters:
    - description: String for informative purposes. Default: ""
    - schema: Path object pointing to the schema used to validate the data. Default: OFAC_xml.xsd, the OFAC schema distributed with the package.

- LoaderEUXML: Class for parsing the lists distributed by the EU at the [EU sanctions list source](https://webgate.ec.europa.eu/fsd/fsf/public/files/xmlFullSanctionsList_1_1/content?token=dG9rZW4tMjAxNw). It inherits from LoaderXML. Parameters:
    - description: String for informative purposes. Default: ""
    - schema: Path object pointing to the schema used to validate the data. Default: EU_20171012-FULL-schema-1_1(xsd).xsd, the EU schema distributed with the package.

- LoaderUNXML: Class for parsing the lists distributed by the UN at the [UN sanctions list source](https://scsanctions.un.org/resources/xml/en/consolidated.xml). It inherits from LoaderXML. Parameters:
    - description: String for informative purposes. Default: ""
    - schema: Path object pointing to the schema used to validate the data. Default: UN_consolidated.xsd, the UN schema distributed with the package.
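LoaderXML accepts a URL, an OS path or an XML string as its data parameter. The sketch below shows one way those three kinds of input could be told apart using only the standard library. `read_xml_source` is a hypothetical helper, not part of inoutlists, the toy document uses made-up tag names, and unlike the real loaders it performs no XSD schema validation (the standard-library ElementTree module has no schema support).

```python
import os
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_xml_source(data: str) -> ET.Element:
    """Return the root element of XML given as a URL, a file path or a string.

    Hypothetical helper: it mirrors the three input kinds LoaderXML is
    documented to accept, but it is not the package's implementation.
    """
    if urlparse(data).scheme in ("http", "https"):
        # Remote file: download the raw bytes and parse them.
        with urllib.request.urlopen(data) as response:
            return ET.fromstring(response.read())
    if os.path.isfile(data):
        # Local file: let ElementTree read and parse it.
        return ET.parse(data).getroot()
    # Anything else is treated as the XML document itself.
    return ET.fromstring(data)


root = read_xml_source("<list><entry><id>36</id></entry></list>")
print(root.tag, root.find("entry/id").text)
```

Checking for a URL scheme first and an existing file second keeps the string case as the fallback, so a short path that does not exist will surface as an XML parse error rather than being silently misread.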


## Current dumpers distributed with inoutlists

- Dumper: Generic dumper class. All dumper classes must inherit from it and implement the methods it defines.

- DumperJSON: Dumps a parsed list in the common dictionary interface to JSON. The output parameter of the dump method can be a string or path representing a file; if it is not provided, the data is returned as a string. It accepts all the keyword arguments of the dump and dumps functions of the json module.

- DumperPandas: Dumps a parsed list in the common dictionary interface to a Pandas data frame. Because the common dictionary interface contains one-to-many relations (several names, several addresses, etc.), the returned data frame represents the Cartesian product of those relations.

- DumperCSV: Dumps a parsed list in the common dictionary interface to CSV, following the same rules as the DumperPandas class. The output parameter of the dump method can be a string or path representing a file; if it is not provided, the data is returned as a string. It accepts all the keyword arguments of the Pandas DataFrame method to_csv.
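The Cartesian-product flattening used by DumperPandas (and therefore DumperCSV) can be illustrated in plain Python. This sketch is an assumption about the approach, not the package's code: each list-valued field of an entry is expanded with `itertools.product`, and nested fields are prefixed with the relation name, following the column names seen in the DataFrame example (for instance `names_whole_name` and `addresses_city`). `flatten_entry` is a hypothetical helper; the entry is the first SDN entry printed in the Basic Usage example.

```python
from itertools import product
from typing import Any, Dict, List


def flatten_entry(entry: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand one entry into flat rows: the Cartesian product of its list fields."""
    # Scalar fields (id, type, ...) are copied into every row.
    row_base = {key: value for key, value in entry.items() if not isinstance(value, list)}
    relations: List[List[Dict[str, Any]]] = []
    for key, items in entry.items():
        if not isinstance(items, list):
            continue
        options: List[Dict[str, Any]] = []
        for item in items:
            if isinstance(item, dict):
                # Nested records become prefixed columns, e.g. names_whole_name.
                options.append({f"{key}_{field}": value for field, value in item.items()})
            else:
                # Plain values such as programs: ['CUBA'] keep the field name.
                options.append({key: item})
        # An empty relation contributes one empty option so the entry is kept.
        relations.append(options or [{}])
    rows = []
    for combination in product(*relations):
        row = dict(row_base)
        for columns in combination:
            row.update(columns)
        rows.append(row)
    return rows


# First SDN entry as printed in the Basic Usage example.
entry = {
    "id": "36", "type": "O",
    "names": [
        {"whole_name": "AEROCARIBBEAN AIRLINES", "strong": True, "first_name": "", "last_name": ""},
        {"whole_name": "AERO-CARIBBEAN", "strong": True, "first_name": "", "last_name": ""},
    ],
    "addresses": [
        {"address": "HAVANA CUBA", "street": "", "city": "HAVANA", "country_subdivision": "",
         "country_ori": "CUBA", "country_ISO_code": "CU", "country_desc": "CUBA"},
    ],
    "programs": ["CUBA"],
}

rows = flatten_entry(entry)
for row in rows:
    print(row["names_whole_name"], row["addresses_city"], row["programs"])
```

Each of the two names is paired with the single address, so this entry yields two rows. Passing such rows to `pandas.DataFrame` would leave the columns of missing relations as NaN, which matches the NaN values in the example output; note that the row count grows multiplicatively for entries with many names and addresses.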
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "inoutlists",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Eusebio Jos\u00e9 de la Torre Ni\u00f1o <ej.torre.nino@gmail.com>",
    "keywords": "EU, OFAC, UN, decoding, deduplication, encoding, entity-resolution, lists, mapping, normalization, parsing, record-linkage, sanctions",
    "author": null,
    "author_email": "Eusebio Jos\u00e9 de la Torre Ni\u00f1o <ej.torre.nino@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e8/7a/4d0d14378324eae9d92db6ecceed87976af18c37a9a373f5e58854c85a0c/inoutlists-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024 Eusebio Jos\u00e9 de la Torre Ni\u00f1o  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "inoutlists is a python package to parse and normalize different sources of lists (OFAC, EU, UN, etc) to a common dictionary interface.",
    "version": "1.0.1",
    "project_urls": {
        "Documentation": "https://ejtorre.github.io/inoutlists/",
        "Homepage": "https://github.com/ejtorre/inoutlists",
        "Repository": "https://github.com/ejtorre/inoutlists",
        "Source": "https://github.com/ejtorre/inoutlists"
    },
    "split_keywords": [
        "eu",
        " ofac",
        " un",
        " decoding",
        " deduplication",
        " encoding",
        " entity-resolution",
        " lists",
        " mapping",
        " normalization",
        " parsing",
        " record-linkage",
        " sanctions"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0b591ac69bade8cf3cb97f931955c8901d08b1b2e262daf48c360b5d699754e7",
                "md5": "0b4f9a7149d48ee4c6b849dd46b64b6a",
                "sha256": "5144869c16f780e88ec7fe1baf8bd0f07e45e622aea266879ac32cd1d523ac32"
            },
            "downloads": -1,
            "filename": "inoutlists-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0b4f9a7149d48ee4c6b849dd46b64b6a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 24125,
            "upload_time": "2024-06-03T22:18:57",
            "upload_time_iso_8601": "2024-06-03T22:18:57.815219Z",
            "url": "https://files.pythonhosted.org/packages/0b/59/1ac69bade8cf3cb97f931955c8901d08b1b2e262daf48c360b5d699754e7/inoutlists-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e87a4d0d14378324eae9d92db6ecceed87976af18c37a9a373f5e58854c85a0c",
                "md5": "c4c6cf22bea91edcb2a880be63477e77",
                "sha256": "55d758a517209ddd71e6d72736d5ea102a890e5f5a5f6d6d3f5f5aa800d8458a"
            },
            "downloads": -1,
            "filename": "inoutlists-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c4c6cf22bea91edcb2a880be63477e77",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 3419873,
            "upload_time": "2024-06-03T22:18:59",
            "upload_time_iso_8601": "2024-06-03T22:18:59.674357Z",
            "url": "https://files.pythonhosted.org/packages/e8/7a/4d0d14378324eae9d92db6ecceed87976af18c37a9a373f5e58854c85a0c/inoutlists-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-03 22:18:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ejtorre",
    "github_project": "inoutlists",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "inoutlists"
}
        