orgpedia-cabsec

Name: orgpedia-cabsec
Version: 0.0.12
Summary: Data package containing orders of the Cabinet Secretariat from https://cabsec.gov.in/.
Upload time: 2024-10-08 02:35:04
Author: Orgpedia Foundation
Requires Python: <4.0,>3.8.1
License: MIT
Keywords: information extraction, data package, government data
# Data-package: orgpedia_cabsec

Posting data of Ministers of India. The data is obtained by processing posting orders from the [Cabinet Secretariat's website](https://cabsec.gov.in/).

For a quick peek, check out [tenures-sample.csv](flow/buildTenure_/output/tenures-sample.csv); it contains a snapshot of the tenure information of Cabinet Secretariat officers.

The tenure information is built by processing orders found on the Cabinet Secretariat's webpage ([import/documents](import/documents)). The orders are processed to build the higher-level concepts of Tenure and an Org chart. To understand the processing logic, please check out the [Data Processing](#data-processing) section.

## Accessing the data

All the data is available in the [flow/buildTenure_/output](flow/buildTenure_/output) folder, which contains the following files (a short Python loading sketch follows the list):

1. [tenures.json](flow/buildTenure_/output/tenures.json), [tenures.csv](flow/buildTenure_/output/tenures.csv): Tenure information in json and csv formats.

2. [orders.json](flow/buildTenure_/output/orders.json): Order information in json format.

3. [officer_infos.json](flow/buildTenure_/output/officer_infos.json): Officer ID to name mapping and additional information if available.

4. [post_infos.json](flow/buildTenure_/output/post_infos.json): Contains hierarchies of the different components making up a post: `dept`, `role`, `juri`, `loca` and `stat`, which map to Department, Rank, Jurisdiction, Location and Status.

5. [orders/*.order.json](flow/buildTenure_/output/orders/*.order.json): Individual orders in json format.

6. [schema/*.schema.json](flow/buildTenure_/output/schema/*.schema.json): Schema information for all these json files can be found in the [data/schema](flow/buildTenure_/output/schema) directory. Check out the [README.md](flow/buildTenure_/output/schema) there for an introduction.
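If you are working from a checkout of this repository, a minimal Python sketch along the following lines can load the tenure files. It only assumes the file paths listed above; the exact record structure is documented by the schema files, so the sketch simply loads the data and reports its size.

```
import csv
import json
from pathlib import Path

# Output folder of the buildTenure_ task (see the list above).
OUTPUT = Path("flow/buildTenure_/output")

# tenures.json: load as-is, without assuming its record structure.
tenures = json.loads((OUTPUT / "tenures.json").read_text())
print(f"tenures.json: top-level type is {type(tenures).__name__}")

# tenures.csv: the same tenure information in tabular form.
with (OUTPUT / "tenures.csv").open(newline="") as f:
    rows = list(csv.DictReader(f))
print(f"tenures.csv: {len(rows)} rows")
if rows:
    print("columns:", ", ".join(rows[0].keys()))
```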

You can also install the `orgpedia_cabsec` package; it contains all the data created by this repository.

```
python -m pip install orgpedia_cabsec
```

Once you install the package, all the data is available in `data.zip`. Use this command to print the path of the `data.zip` installed on your computer.

```
python -c "import pkg_resources; print(pkg_resources.resource_filename('orgpedia_cabsec', 'data.zip'))"
<path/to/data.zip>
```
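
Once the package is installed, a sketch like the one below can read the packaged data directly from `data.zip`. The internal layout of the archive is an assumption here, so the sketch searches for `tenures.csv` rather than hard-coding a path inside the zip.

```
import csv
import io
import zipfile

import pkg_resources  # same mechanism as the one-liner above

# Locate the data.zip shipped with the installed package.
zip_path = pkg_resources.resource_filename("orgpedia_cabsec", "data.zip")

with zipfile.ZipFile(zip_path) as zf:
    print(f"{len(zf.namelist())} files inside {zip_path}")
    # The archive layout is assumed rather than documented here, so search for the file.
    tenure_csvs = [name for name in zf.namelist() if name.endswith("tenures.csv")]
    if tenure_csvs:
        with zf.open(tenure_csvs[0]) as f:
            rows = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")))
        print(f"{tenure_csvs[0]}: {len(rows)} tenure rows")
```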

## Data Stats

These are high-level statistics; please check the [flow](flow) directory for more information.

 - Number of Documents: 904
 - Documents Processed: 817
 - Number of Pages: 2,145

 - Total Edits: 3,885
 - Edits per Page:  *1.8112* (3,885/2,145)

## Data Processing
This is a data-package repository: it contains the documents, configuration and code for processing the documents and creating the data. In that sense it differs from code repositories, which contain only code and not the artifacts the code generates.

The data processing is broken down into a series of tasks, where each task processes the data created by the upstream task (linked in its `input` folder) and generates new data stored in its `output` folder. The directory layout of this repository follows the ideas in this video: [Principled Data Processing by Patrick Ball](https://www.youtube.com/watch?v=ZSunU9GQdcI). There are three main top-level directories: `import`, `flow` and `export`. A *simple* `makefile` orchestrates the document flow across these folders; run `make help` to find out more about the commands.

You can check out the template repository [template.datapackage](https://github.com/orgpedia/template.datapackage) where each directory and sub-directory is explained. To understand how the data (`/flow/buildTenure_/output`) is generated from documents (`/import/documents/`) explore the [flow](flow) directory.
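
To get a quick overview of what each task produces, a small Python sketch like the following can be run from the repository root. It assumes only the `flow/<task>/output` layout described above.

```
from pathlib import Path

# List each task under flow/ and count the artifacts in its output folder.
for task_dir in sorted(p for p in Path("flow").iterdir() if p.is_dir()):
    output_dir = task_dir / "output"
    if output_dir.is_dir():
        n_items = sum(1 for _ in output_dir.iterdir())
        print(f"{task_dir.name}: {n_items} items in output/")
```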

## Developer Notes

If you want to make changes and regenerate the data, you have two choices:

1. Use GitHub codespaces (WIP).
2. Build locally. For this you will need at least 20 GB of space, as documents, intermediate data and final data are all stored locally. To minimize the space requirement, it is recommended that you work only on the buildOrder/* and downstream tasks.


## Local Development
### Prerequisites
- Git with Git LFS
- Python 3.7+
- Poetry
- make


### Installation

#### Git & Git LFS
To install Git, visit the [Git website](https://git-scm.com/) and follow the installation instructions for your operating system. On Windows, make sure Git LFS stays enabled (the default option in the installer). For other platforms, follow these [instructions](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) on GitHub.

#### Python
To install Python, visit the [Python website](https://www.python.org/downloads/), download a version of Python 3.x for your operating system and follow the installation instructions.

#### Poetry
To install Poetry, visit the [Poetry website](https://python-poetry.org/docs/#installation) and follow the installation instructions for your operating system.

#### Make
On Unix-based systems `make` should come pre-installed; on Windows, use `winget` to install `make`, following the instructions [here](https://winget.run/pkg/GnuWin32/Make).



### Setup
The Orgpedia repository makes heavy use of soft-links, and these soft-links are stored in the GitHub repository. On non-Windows platforms this is not a problem; on Windows you need to do two things: 1) enable soft-links and 2) tell Git about them.

#### Symlinks Setup On Windows
On Windows 11, make sure you have enabled Developer Mode; this automatically enables soft-links on your machine. On Windows 10, soft-link support was added in Build 14972 and only works from an Administrator command prompt. More info at this [link](https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/).

Next, you need to tell Git that it should create soft-links when it sees them in the repository; see this Stack Overflow [answer](https://stackoverflow.com/questions/5917249/git-symbolic-links-in-windows/59761201#59761201) to learn more. Execute the following command:

```
git config --global core.symlinks true
```


To set up the project, clone the repository using Git (this is a large repository, so it will take several minutes):

```
git clone https://github.com/orgpedia/cabsec.git
```

Navigate to the project directory:

```
cd cabsec
```
Use Poetry to install the software dependencies (one time only):
```
make install
```

Import the models and other data packages required for the document flow (one time only); these will be downloaded into the `import` folders, which takes a long time.
```
make import
```
### Generate Data

After this you should have all the files needed to generate the data. Make whatever changes you need and then execute:

```
make flow
```
This will generate the data based on your changes. Currently, `make` does not track dependencies, so the entire document flow is re-executed!

            
