Name | alacorder |
Version | 78.4.2 |
home_page | |
Summary | Alacorder collects and processes case detail PDFs into data tables suitable for research purposes. Alacorder also generates compressed text archives from the source PDFs to speed future data collection from the same set of cases. Google Chrome required for direct access to case PDFs via query template (see /templates on GitHub). |
upload_time | 2023-03-24 00:29:06 |
maintainer | |
docs_url | None |
author | |
requires_python | >=3.9 |
license | |
keywords |
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
```
___ __ __
/ | / /___ _________ _________/ /__ _____
/ /| | / / __ `/ ___/ __ \/ ___/ __ / _ \/ ___/
/ ___ |/ / /_/ / /__/ /_/ / / / /_/ / __/ /
/_/ |_/_/\__,_/\___/\____/_/ \__,_/\___/_/
ALACORDER 78.4
```
# **Getting Started with Alacorder**
### Alacorder collects and processes case detail PDFs into data tables suitable for research purposes. Alacorder also generates compressed text archives from the source PDFs to speed future data collection from the same set of cases.
<sup>[GitHub](https://github.com/sbrobson959/alacorder) | [PyPI](https://pypi.org/project/alacorder/) | [Report an issue](mailto:sbrobson@crimson.ua.edu)
</sup>
```
Usage: python -m alacorder [OPTIONS] COMMAND [ARGS]...

  ALACORDER beta 78.4

  Alacorder retrieves case detail PDFs from Alacourt.com and processes them
  into text archives and data tables suitable for research purposes.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  append   Append one case text archive to another
  archive  Create full text archive from case PDFs
  fetch    Fetch cases from Alacourt.com with input query spreadsheet...
  mark     Mark query template sheet with cases found in archive or PDF...
  table    Export data tables from archive or directory
```
## **Installation**
**Alacorder can run on most devices. If your device can run Python 3.9 or later, it can run Alacorder.**
* To skip installation, download the prebuilt executable for your OS (macOS, Windows, Linux).
* To install on Windows, open Command Prompt and enter `pip install alacorder` or `pip3 install alacorder`.
* On Mac, open the Terminal and enter `pip install alacorder` or `pip3 install alacorder`.
* Install [Anaconda Distribution](https://www.anaconda.com/products/distribution) to install Alacorder if the above methods do not work, or if you would like to open an interactive browser notebook equipped with Alacorder on your desktop.
  * After installation, create a virtual environment, open a terminal, and then repeat these instructions. If your copy of Alacorder is corrupted, use `pip uninstall alacorder` or `pip3 uninstall alacorder` and then reinstall it. There may be a newer version available.
```sh
pip install alacorder
```
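If you are unsure whether an environment meets the requirements above, a short check like the following may help. It is a minimal sketch that uses only the standard library; the helper names `python_ok` and `installed_version` are illustrative and are not part of Alacorder.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # Alacorder declares requires_python >= 3.9


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is new enough for Alacorder."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package="alacorder"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("alacorder version:", installed_version() or "not installed")
```

If `installed_version()` returns `None`, the package is not visible to the interpreter you ran the check with, which often means `pip` installed it into a different environment.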
## **The `alacorder` package includes a desktop interface, command line interface, and python libraries for parsing case PDFs.**
#### **Once you have a Python environment up and running, you can launch the guided interface in two ways:**
1. *Use the `alacorder` desktop app:* Run `python -m alacorder` or `python3 -m alacorder` from the command line.
2. *Use the command line interface:* Add the `--help` flag to see the available commands and options.
#### **Alacorder can be used without writing any code, and exports to common formats like Excel (`.xls`, `.xlsx`), Stata (`.dta`), CSV (`.csv`), and JSON (`.json`).**
* Alacorder compresses case text into `pickle` archives (`.pkl.xz`) to save storage and processing time. If you need to unpack a `pickle` archive without importing `alac`, use a `.xz` compression tool, then read the `pickle` into Python with the `pandas` method [`pd.read_pickle()`](https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html).
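The two-step unpacking described above can also be done in one pass with Python's standard library. The sketch below assumes the archive is an ordinary xz-compressed pickle, as the description suggests; `load_xz_pickle` and `save_xz_pickle` are hypothetical helper names, not Alacorder functions. Because an archive holds `pandas` objects, `pandas` still needs to be installed for the unpickling step to succeed.

```python
import lzma
import pickle


def load_xz_pickle(path):
    """Decompress an .xz file and unpickle its contents in one step."""
    # Only unpickle files you created or trust: pickle can run code on load.
    with lzma.open(path, "rb") as fh:
        return pickle.load(fh)


def save_xz_pickle(obj, path):
    """Write obj as an xz-compressed pickle (used here only for a round trip)."""
    with lzma.open(path, "wb") as fh:
        pickle.dump(obj, fh)
```

`pd.read_pickle()` can usually infer xz compression from the `.xz` extension, so passing the `.pkl.xz` path to it directly is often enough and the manual decompression step becomes optional.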
# **Special Queries with `alac`**
```python
from alacorder import alac
```
### **For more advanced queries, the `alac` module can extract fields and tables from case records with just a few lines of code.**
* Call `alac.setinputs("/pdf/dir/")` and `alac.setoutputs("/to/table.xlsx")` to configure your input and output paths. Then call `alac.set(input_conf, output_conf, **kwargs)` to complete the configuration process. Pass the resulting configuration to any of the export functions listed here (`alac.archive()`, `alac.tables()`, and so on) to start a task.
* Call `alac.archive(config)` to export a full text archive. It's recommended that you create a full text archive (`.pkl.xz`) file before making tables from your data. Full text archives can be scanned faster than PDF directories and require less storage. Full text archives can be imported to Alacorder the same way as PDF directories.
* Call `alac.tables(config)` to export detailed case information tables. If export type is `.xls` or `.xlsx`, the `cases`, `fees`, and `charges` tables will be exported.
* Call `alac.charges(config)` to export the `charges` table only.
* Call `alac.fees(config)` to export the `fees` table only.
* Call `alac.cases(config)` to export the `cases` table, or `all` if the output extension supports `multitable` export.
```python
import warnings
warnings.filterwarnings('ignore')
from alacorder import alac
pdf_directory = "/Users/crimson/Desktop/Tutwiler/"
archive = "/Users/crimson/Desktop/Tutwiler.pkl.xz"
tables = "/Users/crimson/Desktop/Tutwiler.xlsx"
pdfconf = alac.setinputs(pdf_directory)
arcconf = alac.setoutputs(archive)
# write archive to Tutwiler.pkl.xz
c = alac.set(pdfconf, arcconf)
alac.archive(c)
print("Full text archive complete. Now processing case information into tables at " + tables)
# write tables to Tutwiler.xlsx
d = alac.setpaths(archive, tables) # runs setinputs(), setoutputs() and set() at once
alac.tables(d)
```
## **Custom Parsing with `alac.map()`**
### If you need to conduct a custom search of case records, Alacorder has the tools you need to extract custom fields from case PDFs without any fuss. Try out `alac.map()` to search thousands of cases in seconds.
```python
from alacorder import alac
import re
archive = "/Users/crimson/Desktop/Tutwiler.pkl.xz"
tables = "/Users/crimson/Desktop/Tutwiler.xlsx"
def findName(text):
    # Try the "VS." / "V." caption first, then fall back to the text between "DOB" and "Name"
    name = ""
    match = re.search(r'(?a)(VS\.|V\.)(.+)(Case)*', text, re.MULTILINE)
    if match:
        name = match.group(2).replace("Case Number:", "").strip()
    else:
        match = re.search(r'(?:DOB)(.+)(?:Name)', text, re.MULTILINE)
        if match:
            name = match.group(1).replace(":", "").replace("Case Number:", "").strip()
    return name
c = alac.setpaths(archive, tables, count=2000) # set configuration
alac.map(c, findName, alac.getConvictions) # Name, Convictions table
```
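To make the idea concrete, the general pattern of applying several functions to each case text and collecting one column per function can be sketched in plain Python. This is an illustration of the pattern only, not how `alac.map()` is implemented; `map_texts`, the two extractors, and the sample texts are all hypothetical.

```python
def map_texts(texts, *funcs):
    """Apply each function to each case text; return one row (dict) per case."""
    rows = []
    for text in texts:
        # Column names are taken from the function names
        rows.append({func.__name__: func(text) for func in funcs})
    return rows


# Hypothetical extractors, for illustration only
def word_count(text):
    return len(text.split())


def mentions_dob(text):
    return "DOB" in text


# Made-up case texts, not real records
sample_texts = ["DOB: 01/01/1970 Name: DOE JOHN", "No birth date listed"]
rows = map_texts(sample_texts, word_count, mentions_dob)
print(rows)
```

Each custom function takes the full text of one case and returns a single value, which is the same shape as `findName` above.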
| Method | Description |
| ------------- | ------ |
| `getPDFText(path) -> text` | Returns full text of case |
| `getCaseInfo(text) -> [case_number, name, alias, date_of_birth, race, sex, address, phone]` | Returns basic case details |
| `getFeeSheet(text, cnum = '') -> [total_amtdue, total_balance, total_d999, feecodes_w_bal, all_fee_codes, table_string, feesheet: pd.DataFrame]` | Returns fee sheet and summary as `str` and `pd.DataFrame` |
| `getCharges(text, cnum = '') -> [convictions_string, disposition_charges, filing_charges, cerv_eligible_convictions, pardon_to_vote_convictions, permanently_disqualifying_convictions, conviction_count, charge_count, cerv_charge_count, pardontovote_charge_count, permanent_dq_charge_count, cerv_convictions_count, pardontovote_convictions_count, charge_codes, conviction_codes, all_charges_string, charges: pd.DataFrame]` | Returns charges table and summary as `str`, `int`, and `pd.DataFrame` |
| `getCaseNumber(text) -> case_number` | Returns case number |
| `getName(text) -> name` | Returns name |
| `getFeeTotals(text) -> [total_row, tdue, tpaid, tbal, tdue]` | Returns totals without parsing fee sheet |
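Several of the methods above return plain lists, so values have to be picked out by position. One way to make that less error-prone is to pair the list with field names, as in the sketch below. The field order is copied from the `getCaseInfo` row of the table; `label_case_info` is a hypothetical helper, not part of `alac`, and the sample values are placeholders.

```python
# Field order as documented for getCaseInfo() in the table above
CASE_INFO_FIELDS = [
    "case_number", "name", "alias", "date_of_birth",
    "race", "sex", "address", "phone",
]


def label_case_info(values, fields=CASE_INFO_FIELDS):
    """Pair a positional result list with field names and return a dict."""
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


# Placeholder values standing in for alac.getCaseInfo(text)
example = label_case_info(["<case number>", "<name>", "", "", "", "", "", ""])
print(example["name"])
```

The length check is there so that a change in the returned list between versions raises an error rather than silently shifting values into the wrong fields.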
# **Working with case data in Python**
### Out of the box, Alacorder exports to `.xlsx`, `.xls`, `.csv`, `.json`, and `.dta`. But you can use `alac`, `pandas`, and other Python libraries to create your own data collection workflows and design custom exports.
***The snippet below prints the convictions found in each case from a directory of case PDFs as it reads them.***
```python
from alacorder import alac
c = alac.setpaths("/Users/crimson/Desktop/Tutwiler/","/Users/crimson/Desktop/Tutwiler.xls")
for path in c['contents']:
    text = alac.getPDFText(path)
    cnum = alac.getCaseNumber(text)
    charges_outputs = alac.getCharges(text, cnum)
    # index 0 is the convictions string; skip cases where it is empty
    if len(charges_outputs[0]) > 1:
        print(charges_outputs[0])
```
## Extending Alacorder with `pandas` and other tools
Alacorder runs on [`pandas`](https://pandas.pydata.org/docs/getting_started/index.html#getting-started), a Python library you can use to perform calculations, process text data, and make tables and charts. `pandas` can read from and write to many common data storage formats, and it can connect to a wide variety of services for easy export. When Alacorder table data is exported to `.pkl.xz`, it is stored as a `pd.DataFrame` and can be imported into other Python [modules](https://www.anaconda.com/open-source) and scripts with `pd.read_pickle()` like below:
```python
import pandas as pd
contents = pd.read_pickle("/path/to/pkl")
```
If you would like to visualize data without exporting to Excel or another format, create a Jupyter notebook and import a data visualization library like `matplotlib`. [`jupyter`](https://docs.jupyter.org/en/latest/start/index.html) provides interactive notebooks you can use for data analysis and other purposes. It can be installed using `pip install jupyter` or `pip3 install jupyter` and launched using `jupyter notebook`. Your device may already be equipped to view `.ipynb` notebooks. The resources below can help you get started.
## **Resources**
* [`pandas` cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
* [regex cheat sheet](https://www.rexegg.com/regex-quickstart.html)
* [anaconda (tutorials on python data analysis)](https://www.anaconda.com/open-source)
* [The Python Tutorial](https://docs.python.org/3/tutorial/)
* [`jupyter` introduction](https://realpython.com/jupyter-notebook-introduction/)
-------------------------------------
© 2023 Sam Robson
Raw data
{
"_id": null,
"home_page": "",
"name": "alacorder",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "",
"author": "",
"author_email": "Sam Robson <sbrobson@crimson.ua.edu>",
"download_url": "https://files.pythonhosted.org/packages/34/80/deccfa2590f5e64b421a2d96351a8473802e247b91eeb67cde9741986d09/alacorder-78.4.2.tar.gz",
"platform": null,
"description": "```\n ___ __ __ \n / | / /___ _________ _________/ /__ _____\n / /| | / / __ `/ ___/ __ \\/ ___/ __ / _ \\/ ___/\n / ___ |/ / /_/ / /__/ /_/ / / / /_/ / __/ / \n/_/ |_/_/\\__,_/\\___/\\____/_/ \\__,_/\\___/_/ \n\nALACORDER 78.4\n```\n# **Getting Started with Alacorder**\n### Alacorder collects and processes case detail PDFs into data tables suitable for research purposes. Alacorder also generates compressed text archives from the source PDFs to speed future data collection from the same set of cases.\n\n<sup>[GitHub](https://github.com/sbrobson959/alacorder) | [PyPI](https://pypi.org/project/alacorder/) | [Report an issue](mailto:sbrobson@crimson.ua.edu)\n</sup>\n```\nUsage: python -m alacorder [OPTIONS] COMMAND [ARGS]...\n\n ALACORDER beta 78.4\n\n Alacorder retrieves case detail PDFs from Alacourt.com and processes them\n into text archives and data tables suitable for research purposes.\n\nOptions:\n --version Show the version and exit.\n --help Show this message and exit.\n\nCommands:\n append Append one case text archive to another\n archive Create full text archive from case PDFs\n fetch Fetch cases from Alacourt.com with input query spreadsheet...\n mark Mark query template sheet with cases found in archive or PDF...\n table Export data tables from archive or directory\n```\n\n## **Installation**\n\n**Alacorder can run on most devices. If your device can run Python 3.9 or later, it can run Alacorder.**\n* To skip installation, download prebuilt executable for your OS (MacOSX, Windows, Linux)\n* To install on Windows or Mac, open Command Prompt (Terminal) and enter `pip install alacorder` or `pip3 install alacorder`. 
\n* On Mac, open the Terminal and enter `pip install alacorder` or `pip3 install alacorder`.\n* Install [Anaconda Distribution](https://www.anaconda.com/products/distribution) to install Alacorder if the above methods do not work, or if you would like to open an interactive browser notebook equipped with Alacorder on your desktop.\n * After installation, create a virtual environment, open a terminal, and then repeat these instructions. If your copy of Alacorder is corrupted, use `pip uninstall alacorder` or `pip3 uninstall alacorder` and then reinstall it. There may be a newer version available.\n\n```python\npip install alacorder\n```\n\n## **The `alacorder` package includes a desktop interface, command line interface, and python libraries for parsing case PDFs.**\n\n#### **Once you have a Python environment up and running, you can launch the guided interface in two ways:**\n\n1. *Utilize the `alacorder` desktop app:* Use the command line tool `python -m alacorder`, or `python3 -m alacorder`. \n\n2. *Use the command line interface:* Add the flag `--help` to see how to use \n\n#### **Alacorder can be used without writing any code, and exports to common formats like Excel (`.xls`, `.xlsx`), Stata (`.dta`), CSV (`.csv`), and JSON (`.json`).**\n\n* Alacorder compresses case text into `pickle` archives (`.pkl.xz`) to save storage and processing time. If you need to unpack a `pickle` archive without importing `alac`, use a `.xz` compression tool, then read the `pickle` into Python with the `pandas` method [`pd.read_pickle()`](https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html).\n\n\n# **Special Queries with `alac`**\n\n```python\nfrom alacorder import alac\n```\n\n### **For more advanced queries, the `alac` module can extract fields and tables from case records with just a few lines of code.**\n\n* Call `alac.setinputs(\"/pdf/dir/\")` and `alac.setoutputs(\"/to/table.xlsx\")` to configure your input and output paths. 
Then call `alac.set(input_conf, output_conf, **kwargs)` to complete the configuration process. Feed the output to any of the `alac.write...()` functions to start a task.\n\n* Call `alac.archive(config)` to export a full text archive. It's recommended that you create a full text archive (`.pkl.xz`) file before making tables from your data. Full text archives can be scanned faster than PDF directories and require less storage. Full text archives can be imported to Alacorder the same way as PDF directories. \n\n* Call `alac.tables(config)` to export detailed case information tables. If export type is `.xls` or `.xlsx`, the `cases`, `fees`, and `charges` tables will be exported.\n\n* Call `alac.charges(config)` to export `charges` table only.\n\n* Call `alac.fees(config)` to export `fees` table only.\n\n* Call `alac.cases(config)` to export `cases` table or `all` if output extension supports `multitable` export. \n\n\n```python\nimport warnings\nwarnings.filterwarnings('ignore')\n\nfrom alacorder import alac\n\npdf_directory = \"/Users/crimson/Desktop/Tutwiler/\"\narchive = \"/Users/crimson/Desktop/Tutwiler.pkl.xz\"\ntables = \"/Users/crimson/Desktop/Tutwiler.xlsx\"\n\npdfconf = alac.setinputs(pdf_directory)\narcconf = alac.setoutputs(archive)\n\n# write archive to Tutwiler.pkl.xz\nc = alac.set(pdfconf, arcconf)\nalac.archive(c) \n\nprint(\"Full text archive complete. Now processing case information into tables at \" + tables)\n\nd = alac.setpaths(archive, tables) # runs setinputs(), setoutputs() and set() at once\nalac.tables(d)\n\n# write tables to Tutwiler.xlsx\nalac.tables(tabconf)\n```\n\n## **Custom Parsing with `alac.map()`**\n### If you need to conduct a custom search of case records, Alacorder has the tools you need to extract custom fields from case PDFs without any fuss. 
Try out `alac.map()` to search thousands of cases in seconds.\n\n\n```python\nfrom alacorder import alac\nimport re\n\narchive = \"/Users/crimson/Desktop/Tutwiler.pkl.xz\"\ntables = \"/Users/crimson/Desktop/Tutwiler.xlsx\"\n\ndef findName(text):\n name = \"\"\n if bool(re.search(r'(?a)(VS\\.|V\\.)(.+)(Case)*', text, re.MULTILINE)) == True:\n name = re.search(r'(?a)(VS\\.|V\\.)(.+)(Case)*', text, re.MULTILINE).group(2).replace(\"Case Number:\",\"\").strip()\n else:\n if bool(re.search(r'(?:DOB)(.+)(?:Name)', text, re.MULTILINE)) == True:\n name = re.search(r'(?:DOB)(.+)(?:Name)', text, re.MULTILINE).group(1).replace(\":\",\"\").replace(\"Case Number:\",\"\").strip()\n return name\n\nc = alac.setpaths(archive, tables, count=2000) # set configuration\n\nalac.map(c, findName, alac.getConvictions) # Name, Convictions table\n```\n\n\n| Method | Description |\n| ------------- | ------ |\n| `getPDFText(path) -> text` | Returns full text of case |\n| `getCaseInfo(text) -> [case_number, name, alias, date_of_birth, race, sex, address, phone]` | Returns basic case details | \n| `getFeeSheet(text, cnum = '') -> [total_amtdue, total_balance, total_d999, feecodes_w_bal, all_fee_codes, table_string, feesheet: pd.DataFrame]` | Returns fee sheet and summary as `str` and `pd.DataFrame` |\n| `getCharges(text, cnum = '') -> [convictions_string, disposition_charges, filing_charges, cerv_eligible_convictions, pardon_to_vote_convictions, permanently_disqualifying_convictions, conviction_count, charge_count, cerv_charge_count, pardontovote_charge_count, permanent_dq_charge_count, cerv_convictions_count, pardontovote_convictions_count, charge_codes, conviction_codes, all_charges_string, charges: pd.DataFrame]` | Returns charges table and summary as `str`, `int`, and `pd.DataFrame` |\n| `getCaseNumber(text) -> case_number` | Returns case number\n| `getName(text) -> name` | Returns name\n| `getFeeTotals(text) -> [total_row, tdue, tpaid, tbal, tdue]` | Return totals without parsing fee 
sheet\n\n\n\n# **Working with case data in Python**\n\n\n### Out of the box, Alacorder exports to `.xlsx`, `.xls`, `.csv`, `.json`, and `.dta`. But you can use `alac`, `pandas`, and other python libraries to create your own data collection workflows and design custom exports. \n\n***The snippet below prints the fee sheets from a directory of case PDFs as it reads them.***\n\n\n```python\nfrom alacorder import alac\n\nc = alac.setpaths(\"/Users/crimson/Desktop/Tutwiler/\",\"/Users/crimson/Desktop/Tutwiler.xls\")\n\nfor path in c['contents']:\n text = alac.getPDFText(path)\n cnum = alac.getCaseNumber(text)\n charges_outputs = alac.getCharges(text, cnum)\n if len(charges_outputs[0]) > 1:\n print(charges_outputs[0])\n```\n\n## Extending Alacorder with `pandas` and other tools\n\nAlacorder runs on [`pandas`](https://pandas.pydata.org/docs/getting_started/index.html#getting-started), a python library you can use to perform calculations, process text data, and make tables and charts. `pandas` can read from and write to all major data storage formats. It can connect to a wide variety of services to provide for easy export. When Alacorder table data is exported to `.pkl.xz`, it is stored as a `pd.DataFrame` and can be imported into other python [modules](https://www.anaconda.com/open-source) and scripts with `pd.read_pickle()` like below:\n```python\nimport pandas as pd\ncontents = pd.read_pickle(\"/path/to/pkl\")\n```\n\nIf you would like to visualize data without exporting to Excel or another format, create a `jupyter notebook` and import a data visualization library like `matplotlib` to get started. The resources below can help you get started. [`jupyter`](https://docs.jupyter.org/en/latest/start/index.html) is a Python kernel you can use to create interactive notebooks for data analysis and other purposes. It can be installed using `pip install jupyter` or `pip3 install jupyter` and launched using `jupyter notebook`. 
Your device may already be equipped to view `.ipynb` notebooks. \n\n## **Resources**\n\n* [`pandas` cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)\n* [regex cheat sheet](https://www.rexegg.com/regex-quickstart.html)\n* [anaconda (tutorials on python data analysis)](https://www.anaconda.com/open-source)\n* [The Python Tutorial](https://docs.python.org/3/tutorial/)\n* [`jupyter` introduction](https://realpython.com/jupyter-notebook-introduction/)\n\n\n\t\n\n\t\n-------------------------------------\t\t\n\u00a9 2023 Sam Robson\n",
"bugtrack_url": null,
"license": "",
"summary": "Alacorder collects and processes case detail PDFs into data tables suitable for research purposes. Alacorder also generates compressed text archives from the source PDFs to speed future data collection from the same set of cases. Google Chrome required for direct access to case PDFs via query template (see /templates on GitHub).",
"version": "78.4.2",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5b94427310aebb6cc5e1e17965e1fd859913dbea0d001961dd9da34550090914",
"md5": "66ed3e314aa091270406c916c33f19f3",
"sha256": "35a513917288e9be5955a3e51c86708b87499c73e197b46de9663a31f09e35ae"
},
"downloads": -1,
"filename": "alacorder-78.4.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "66ed3e314aa091270406c916c33f19f3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 34532,
"upload_time": "2023-03-24T00:29:02",
"upload_time_iso_8601": "2023-03-24T00:29:02.769576Z",
"url": "https://files.pythonhosted.org/packages/5b/94/427310aebb6cc5e1e17965e1fd859913dbea0d001961dd9da34550090914/alacorder-78.4.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3480deccfa2590f5e64b421a2d96351a8473802e247b91eeb67cde9741986d09",
"md5": "f86de6d71dc57cfe87ef444bc7f0fea7",
"sha256": "1a9530d91fae7d9541006e0c702846586d9d46abe3f13b1cc40abab50b673c70"
},
"downloads": -1,
"filename": "alacorder-78.4.2.tar.gz",
"has_sig": false,
"md5_digest": "f86de6d71dc57cfe87ef444bc7f0fea7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 36189,
"upload_time": "2023-03-24T00:29:06",
"upload_time_iso_8601": "2023-03-24T00:29:06.315631Z",
"url": "https://files.pythonhosted.org/packages/34/80/deccfa2590f5e64b421a2d96351a8473802e247b91eeb67cde9741986d09/alacorder-78.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-24 00:29:06",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "alacorder"
}