ndi-formatter


* Name: ndi-formatter
* Version: 1.1.0
* Summary: Format data for National Death Index (NDI) requests.
* Upload time: 2023-03-24 22:48:08
* Requires Python: >=3.6
* Keywords: nlp, information extraction
* Requirements: No requirements were recorded.

A simple module that converts a data table (CSV, SAS7BDAT, or JSON) into National Death Index (NDI) format. Released under the MIT license.

## About ##
The formatter converts supported data files into NDI datasets that are acceptable for submission, and the validator checks the result. The validator is not intended to support an arbitrary NDI file, only one that has been generated by the included formatter.

### Disclaimer ###
No guarantee of any kind is made that this code produces the desired output. Please inspect your own data to ensure that it is correct, and consider contributing improvements to the current formatter/validator.

### References ###
* National Center for Health Statistics. National Death Index user’s guide. Hyattsville, MD. 2013.
* The guide cited above is available at http://www.cdc.gov/nchs/data/ndi/NDI_Users_Guide.pdf

## Requirements ##
* Python 3.6+
* Optional packages:
    * dateutil: enables inference of date (for birthdate)
    * sas7bdat: enables parsing of sas7bdat files
    
## Prerequisites ##
1. A supported data file with information that needs to be converted to NDI format.
2. Each subject/record must have at least one of the following combinations:
    *  FIRST and LAST NAME and SOCIAL SECURITY NUMBER
    *  FIRST and LAST NAME and MONTH and YEAR OF BIRTH
    *  SOCIAL SECURITY NUMBER and full DATE OF BIRTH and SEX
3. Install Python 3.6+
4. (Optional) Install optional packages:
    * Install sas7bdat by running `pip install sas7bdat`
    * Install dateutil by running `pip install python-dateutil`
    * For issues with proxy, try the answers to this SO question: http://stackoverflow.com/questions/14149422/
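
The minimum-information rule in prerequisite 2 can be sketched as a small check. This is only an illustration, not the formatter's own code: the field names (`fname`, `lname`, `ssn`, `birth_day`, `birth_month`, `birth_year`, `sex`) and both function names are hypothetical, and treating blank strings as missing is an assumption.

```python
def has_value(record, *fields):
    """Return True if every named field is present and not blank."""
    return all(
        record.get(field) is not None and str(record[field]).strip() != ""
        for field in fields
    )


def is_ndi_eligible(record):
    """Check the three minimum combinations listed in the prerequisites.

    The field names are hypothetical and chosen only for this sketch.
    """
    return (
        has_value(record, "fname", "lname", "ssn")
        or has_value(record, "fname", "lname", "birth_month", "birth_year")
        or has_value(record, "ssn", "birth_day", "birth_month", "birth_year", "sex")
    )


if __name__ == "__main__":
    subject = {"fname": "ANN", "lname": "LEE", "birth_year": 1950}
    # A first and last name with only a birth year matches none of the combinations.
    print(is_ndi_eligible(subject))
```

Running a check like this over your input before formatting may help you spot subjects that will be flagged as invalid.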

## Documentation ##

### Installation ###
Either with pip:

    pip install ndi_formatter
     
Or download the repository:
    
    git clone git@bitbucket.org:dcronkite/ndi_formatter.git
    cd ndi_formatter
    python setup.py install

### Basics ###
The best way to get started is to figure out which options you need to pass.

    # create a sample configuration file
    ndi-formatter --create-sample > sample.config
    
    # see all arguments
    ndi-formatter --help
    
    # run with a config file
    ndi-formatter "@sample.config"
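
The `@` prefix in the last command resembles the `fromfile_prefix_chars` feature of Python's `argparse`, which reads arguments from a file, one per line. Whether ndi-formatter relies on exactly this mechanism is an assumption. The toy parser below only borrows two option names from this README to show how such a config file is read; the file contents and `subjects.csv` are made up for the example.

```python
import argparse
import os
import tempfile

# Toy parser, not the ndi-formatter parser: it only demonstrates "@file" handling.
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("-i", "--input-file")
parser.add_argument("-f", "--input-format", choices=["sas", "csv", "json"])

# By default argparse treats each line of the file as one argument,
# so "--option=value" keeps an option and its value on the same line.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as fh:
    fh.write("--input-file=subjects.csv\n--input-format=csv\n")
    config_path = fh.name

args = parser.parse_args(["@" + config_path])
os.remove(config_path)

print(args.input_file, args.input_format)
```

If the tool does work this way, each line of the generated sample config corresponds to one command-line argument.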
    
### Program Options ###

Once the sample config has been created, you can customize the parameters. 
The listing below documents each parameter in more detail.
Most of these options map a variable/column name in the input dataset (CSV, SAS, etc.)
to the type of data which that variable/column contains. 

```
  -i INPUT_FILE, --input-file INPUT_FILE
                        Input file path.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        NDI-formatted output file.
  -f {sas,csv,json}, --input-format {sas,csv,json}
                        Input file format.
  -L LOG_FILE, --log-file LOG_FILE
                        Logfile name.
  --fname FNAME         Name/index of column with first name
  --lname LNAME         Name/index of column with last name
  --mname MNAME         Name/index of column with middle name/initial
  --sname SNAME         Name/index of column with father name
  --name NAME           Name/index of column with full name
  --ssn SSN             Name/index of column with ssn; accepts multiple
                        columns
  --birth-day BIRTH_DAY
                        Name/index of column with birth day
  --birth-month BIRTH_MONTH
                        Name/index of column with birth month
  --birth-year BIRTH_YEAR
                        Name/index of column with birth year
  --birthdate BIRTHDATE
                        Name/index of column with birthdate
  --sex SEX             Name/index of column with sex; accepts multiple
                        columns
  --death-age DEATH_AGE
                        Name/index of column with age at death (in years)
  --race RACE           Name/index of column with race; accepts multiple
                        columns
  --marital-status MARITAL_STATUS
                        Name/index of column with marital status; accepts
                        multiple columns
  --state-of-residence STATE_OF_RESIDENCE
                        Name/index of column with state of residence; accepts
                        multiple columns
  --state-of-birth STATE_OF_BIRTH
                        Name/index of column with state of birth; accepts
                        multiple columns
  --id ID               Name/index of column with id number
  --race-mapping OA/PI WH BA NA/IN CH JP HI Onon-WH FL
                        Mapping of variable to NDI race in following order:
                        Other Asian/Pacific Islander, White, Black, Native American,
                        Chinese, Japanese, Hawaiian, Other nonwhite, Filipino;
                        everything else will be treated as unknown; use an "X"
                        instead of a value to skip a race
  --marital-status-mapping Single Married Widowed Divorced
                        Mapping of variable to NDI marital status in following
                        order: Never married/single, Married, Widowed,
                        Divorced; everything else will be treated as unknown;
                        use an "X" instead of a value to skip a status
  --same-state-of-residence-for-all SAME_STATE_OF_RESIDENCE_FOR_ALL
                        State abbreviation/number for all subjects
  --same-state-of-birth-for-all SAME_STATE_OF_BIRTH_FOR_ALL
                        State abbreviation/number for all subjects
  --age-at-death-units-for-all {MONTH,WEEK,DAY,HOUR,MINUTE}
                        Specify units for age at death if not years.
  --name-format NAME_FORMAT
                        Format to parse full names. L=Last name, F=first name,
                        M=Middle name, S=father name, X=ignore; algorithm will
                        continue to add any character found to the name until
                        the next non-[LFMSX] character is found
  --date-format DATE_FORMAT
                        Date format for parsing year/month/day from a date;
                        for more documentation, see
                        https://docs.python.org/dev/library/datetime.html#strftime-and-strptime-behavior
  --sex-format SEX_FORMAT
                        Specify the values for male/female if different than
                        NDI using "MALE,FEMALE"; NDI default is "M,F" or "1,2"
                        or "M1,F2"
  --validate-generated-file [VALIDATE_GENERATED_FILE]
                        Validate NDI file and output results to specified
                        file.
  --strip-lname-suffix [STRIP_LNAME_SUFFIX]
                        Look for suffixes in lname column and strip them out;
                        default: JR, SR, II, III, IV; if specifying an
                        argument, use a comma-separated list as a single
                        string
  --strip-lname-suffix-attached [STRIP_LNAME_SUFFIX_ATTACHED]
                        Look for suffixes in last word of lname column and
                        strip them out even if they are attached to the word
                        itself; default: JR, SR, II, III, IV; if specifying an
                        argument, use a comma-separated list as a single
                        string

optional arguments:
  --duplicate-records-on-lname
                        If space or hyphen in last name, duplicate the subject
                        into three records: 1) both together; 2) only the
                        first part; 3) only the second part
  --female-hyphen-lname-to-sname
                        If hyphen in last name of female, duplicate the
                        subject into two records: 1) both together; 2) only
                        the first part with the second part in the father last
                        name field
  --duplicate-records-on-year-only
                        Create 12 duplicate records if only a year and no
                        month
  --ignore-invalid-records
                        Ignore records which are invalid per NDI requirements due
                        to insufficient information
  --include-invalid-records
                        Include records which are invalid per NDI requirements due
                        to insufficient information
  --case-sensitive-columns
                        All columns will be treated as case-sensitive.
```
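
The record-duplication options (`--duplicate-records-on-lname` and `--duplicate-records-on-year-only`) are easier to picture with a small sketch. The functions below are hypothetical and are not taken from the package. Splitting only at the first space or hyphen, and keeping the combined name exactly as given, are assumptions made for the example.

```python
import re


def split_last_name(lname):
    """Return last-name variants: both parts together, first part, second part.

    Assumption: the name is split at the first space or hyphen only.
    """
    name = lname.strip()
    parts = re.split(r"[ -]", name, maxsplit=1)
    if len(parts) < 2 or not all(parts):
        # No separator (or an empty part), so there is nothing to duplicate.
        return [name]
    return [name, parts[0], parts[1]]


def expand_year_only(year):
    """Return twelve (year, month) pairs for a subject with no birth month."""
    return [(year, month) for month in range(1, 13)]


if __name__ == "__main__":
    print(split_last_name("SMITH-JONES"))
    print(len(expand_year_only(1950)))
```

Both options multiply the number of output records, so the generated NDI file can be considerably larger than the input.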

### Advanced ###
#### Multiple Columns ####
You can read from multiple columns for most options (not names or birthdate, due to complexities with how they are handled, and not id, because that wouldn't make any sense) by passing a comma-separated list of column names as the argument.
 
If the columns hold the same value for a subject, only one record will be output. If the columns hold different values, then multiple records will be output.

    # option to look at two columns for state of residence
    # if PRIMARY_STATE == SECONDARY_STATE, only one record will be output
    --state-of-residence=PRIMARY_STATE,SECONDARY_STATE
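
A rough sketch of that behavior for a single option follows. It is an illustration under assumptions, not the formatter's implementation: the function names are hypothetical, blank values are assumed to be skipped, and the output key `state_of_residence` is invented for the example.

```python
def distinct_values(row, columns):
    """Collect the distinct non-blank values found in the given columns, in order."""
    seen = []
    for column in columns:
        value = (row.get(column) or "").strip()
        if value and value not in seen:
            seen.append(value)
    return seen


def expand_record(row, option, columns):
    """Return one copy of the row per distinct value, stored under `option`."""
    values = distinct_values(row, columns) or [""]
    return [dict(row, **{option: value}) for value in values]


if __name__ == "__main__":
    columns = ["PRIMARY_STATE", "SECONDARY_STATE"]
    same = {"ID": "1", "PRIMARY_STATE": "WA", "SECONDARY_STATE": "WA"}
    different = {"ID": "2", "PRIMARY_STATE": "WA", "SECONDARY_STATE": "OR"}
    print(len(expand_record(same, "state_of_residence", columns)))
    print(len(expand_record(different, "state_of_residence", columns)))
```

When several options each list multiple columns, the number of output records per subject may grow quickly, so it is worth checking the size of the generated file.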

    
### Validation ###
Validation is done during formatting to ensure that patients are eligible to be submitted to NDI (unless suppressed by the `--ignore-invalid-records` option).

Additional validation is available, and recommended, through the `--validate-generated-file` option, which optionally takes the name of a file (`VALIDATION_ERROR_FILE`) to receive the results. This will launch the validator on the NDI file generated by the formatter.

Validation comes in two forms:
1. Is the data formatted correctly? (Done by validator)
2. Is the record eligible for NDI review? (Done by both formatter and validator)

## License ##
MIT licensed: https://kpwhri.mit-license.org/

            
