juriscraper
===========

:Name: juriscraper
:Version: 2.5.94
:Summary: An API to scrape American court websites for metadata.
:Home page: https://github.com/freelawproject/juriscraper
:Author: Free Law Project
:Maintainer: Free Law Project
:License: BSD
:Keywords: scraping, legal, pacer
:Uploaded: 2024-02-13 19:28:38

+---------------+---------------------+-------------------+
| |Lint Badge|  | |Test Badge|        |  |Version Badge|  |
+---------------+---------------------+-------------------+


.. |Lint Badge| image:: https://github.com/freelawproject/juriscraper/workflows/Lint/badge.svg
.. |Test Badge| image:: https://github.com/freelawproject/juriscraper/workflows/Tests/badge.svg
.. |Version Badge| image:: https://badge.fury.io/py/juriscraper.svg


What is This?
=============

Juriscraper is a scraper library started several years ago that gathers judicial opinions, oral arguments, and PACER data in the American court system. It is currently able to scrape:

-  a variety of pages and reports within the PACER system
-  opinions from all major appellate federal courts
-  opinions from all state courts of last resort except for Georgia (typically their "Supreme Court")
-  oral arguments from all appellate federal courts that offer them

Juriscraper is part of a two-part system. The second part is your code,
which calls Juriscraper. Your code is responsible for calling a scraper,
then downloading and saving its results. A reference implementation of the
caller has been developed and is in use at
`CourtListener.com <https://www.courtlistener.com>`__. The code for that
caller can be `found
here <https://github.com/freelawproject/courtlistener/tree/master/cl/scrapers/management/commands>`__.
There is also a basic sample caller `included in
Juriscraper <https://github.com/freelawproject/juriscraper/blob/main/sample_caller.py>`__
that can be used for testing or as a starting point when developing your
own.

Some of the design goals for this project are:

-  extensibility to support video, oral argument audio, etc.
-  extensibility to support other geographies (US, Cuba, Mexico, California)
-  MIME type identification through magic numbers (see the sketch after this list)
-  generalized architecture with minimal code repetition
-  XPath-based scraping powered by lxml's html parser
-  return all metadata available on court websites (caller can pick
   what it needs)
-  no need for a database
-  clear log levels (DEBUG, INFO, WARN, CRITICAL)
-  as friendly as possible to court websites
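
For instance, magic-number MIME detection comes down to checking a file's
leading bytes against known signatures. A minimal sketch of the idea (this is
not Juriscraper's internal implementation):

::

    # Minimal magic-number sniffing sketch. Real detection is more thorough,
    # but the principle is the same: inspect leading bytes, not extensions.
    MAGIC_SIGNATURES = {
        b"%PDF": "application/pdf",
        b"PK\x03\x04": "application/zip",  # also .docx and other OOXML files
        b"\xd0\xcf\x11\xe0": "application/msword",  # legacy OLE documents
    }

    def sniff_mime(content: bytes, default: str = "text/html") -> str:
        """Guess a MIME type from a payload's first bytes."""
        for magic, mime in MAGIC_SIGNATURES.items():
            if content.startswith(magic):
                return mime
        return default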

Installation & Dependencies
===========================

First step: Install Python 3.8 or newer, then:

Install the dependencies
------------------------

On Ubuntu/Debian Linux::

    sudo apt-get install libxml2-dev libxslt-dev libyaml-dev

On macOS with `Homebrew <https://brew.sh>`__::

    brew install libyaml


Then install the code
---------------------

::

    pip install juriscraper

You can set an environment variable for where you want to stash your logs (this
can be skipped, and ``/var/log/juriscraper/debug.log`` will be used as the
default if it exists on the filesystem)::

    export JURISCRAPER_LOG=/path/to/your/log.txt

Finally, set up your WebDriver
------------------------------
Some websites are too difficult to crawl without some sort of automated
WebDriver. For these, Juriscraper either uses a locally installed copy of
geckodriver or can be configured to connect to a remote webdriver. If you prefer
the local installation, you can download the Firefox geckodriver for Selenium::

    # choose OS compatible package from:
    #   https://github.com/mozilla/geckodriver/releases/tag/v0.26.0
    # un-tar/zip your download
    sudo mv geckodriver /usr/local/bin

If you prefer to use a remote webdriver, like `Selenium's docker image <https://hub.docker.com/r/selenium/standalone-firefox>`__, you can
configure it with the following variables:

``WEBDRIVER_CONN``: Use this to set the connection string to your remote
webdriver. By default, this is ``local``, meaning it will look for a local
installation of geckodriver. Instead, you can set this to something like
``'http://YOUR_DOCKER_IP:4444/wd/hub'``, which will switch it to using a remote
driver and connect it to that location.

``SELENIUM_VISIBLE``: Set this to any value to disable headless mode in your
selenium driver, if it supports it. Otherwise, it defaults to headless.

For example, if you want to watch the browser as it runs, start selenium
with::

    docker run \
        -p 4444:4444 \
        -p 5900:5900 \
        -v /dev/shm:/dev/shm \
        selenium/standalone-firefox-debug

That'll launch it on your local machine with two open ports. 4444 is the
image's default port for accessing the webdriver. 5900 accepts connections
from a VNC viewer, which you can use to watch progress when the
``SELENIUM_VISIBLE`` variable is set.

Once you have selenium running like that, you can do a test like::

    WEBDRIVER_CONN='http://localhost:4444/wd/hub' \
        SELENIUM_VISIBLE=yes \
        python sample_caller.py -c juriscraper.opinions.united_states.state.kan_p

Kansas's precedential scraper uses a webdriver. If you do this and watch
selenium, you should see it in action.
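
If you're driving scrapers from Python rather than the shell, you can apply
the same configuration with environment variables before the scraper runs.
A minimal sketch, reusing the ``kan_p`` module from the example above:

::

    import os

    # Same settings as the shell example above, applied from Python.
    # These must be set before the scraper spins up its webdriver.
    os.environ["WEBDRIVER_CONN"] = "http://localhost:4444/wd/hub"
    os.environ["SELENIUM_VISIBLE"] = "yes"  # any value disables headless mode

    from juriscraper.opinions.united_states.state import kan_p

    site = kan_p.Site()
    site.parse()
    print(site)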


Joining the Project as a Developer
==================================

For scrapers to be merged:

-  Automated testing should pass. The test suite will be run automatically by GitHub Actions. If changes are being made to the PACER code, the PACER tests must also pass when run. These tests are skipped by default. To run them, set the ``PACER_USERNAME`` and ``PACER_PASSWORD`` environment variables.

-  A \*\_example\* file must be included in the ``tests/examples``
   directory (this is needed for the tests to run your code).

-  Your code should be
   `PEP8 <http://www.python.org/dev/peps/pep-0008/>`__ compliant with no
   major Pylint problems or Intellij inspection issues.

-  We use the `black <https://black.readthedocs.io/en/stable/>`__ code formatter to make sure all our Python code has the same formatting. This is an automated tool that you must run on any code you write before you push it to GitHub. When you run it, it will reformat your code. We recommend `integrating it into your editor <https://black.readthedocs.io/en/stable/integrations/editors.html>`__.

- This project is configured to use git pre-commit hooks managed by the
  Python program `pre-commit <https://pre-commit.com/>`__. Pre-commit
  checks let us easily ensure that the code is properly formatted with
  black before it can even be committed. If you install the dev dependencies
  in ``requirements-dev.txt``, you should then be able to run
  ``pre-commit install``, which will set up a git pre-commit hook for you.
  This install step is only necessary once in your repository. When using
  this hook, any code files that do not comply with black will automatically
  be unstaged and reformatted. You will see a message to this effect. It is
  your job to then re-stage and commit the files.

-  Beyond what black will do for you by default, if you need to make whitespace or other formatting changes, do so in their own commit and ideally in their own PR. When whitespace changes are combined with other code changes, the PRs become impossible to read and risky to merge. This is a big reason we use black.

-  Your code should efficiently parse a page, returning no exceptions or
   speed warnings during tests on a modern machine.

When you're ready to develop a scraper, get in touch, and we'll find you
a scraper that makes sense and that nobody else is working on. We have `a wiki
list <https://github.com/freelawproject/juriscraper/wiki/Court-Websites>`__
of courts that you can browse yourself. There are templates for new
scrapers `here (for
opinions) <https://github.com/freelawproject/juriscraper/blob/master/juriscraper/opinions/opinion_template.py>`__
and `here (for oral
arguments) <https://github.com/freelawproject/juriscraper/blob/master/juriscraper/oral_args/oral_argument_template.py>`__.

When you're done with your scraper, fork this repository, push your
changes into your fork, and then send a pull request for your changes.
Be sure to remember to update the ``__init__.py`` file as well, since it
contains a list of completed scrapers.
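
The ``__all__`` list in the relevant package's ``__init__.py`` is what makes a
scraper discoverable (the package-iteration example in the Usage section below
relies on it). A sketch of the pattern, with illustrative module names:

::

    # In e.g. juriscraper/opinions/united_states/state/__init__.py.
    # Module names below are illustrative, not the real list.
    __all__ = [
        "ala",
        "alaska",
        # ... existing scrapers ...
        "my_new_scraper",  # hypothetical: add your new module's name here
    ]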

Before we can accept any changes from any contributor, we need a signed
and completed Contributor License Agreement. You can find this agreement
in the root of the repository. While an annoying bit of paperwork, this
license is for your protection as a Contributor as well as the
protection of Free Law Project and our users; it does not change your
rights to use your own Contributions for any other purpose.


Getting Set Up as a Developer
=============================

To get set up as a developer of Juriscraper, you'll want to install the code
from git. To do that, install the dependencies and geckodriver as described above.
Instead of installing Juriscraper via pip, do the following:

::

    git clone https://github.com/freelawproject/juriscraper.git .
    pip install -r requirements.txt
    python setup.py test

    # run tests against multiple python versions via tox
    tox

    # run network tests (on demand, not run via default command above)
    python setup.py testnetwork

You may also need to install Juriscraper locally with:

::

   pip install .

If you haven't installed Juriscraper, you can run ``sample_caller.py`` as:

::

   PYTHONPATH=`pwd` python sample_caller.py


Usage
=====

The scrapers are written in Python, and can scrape a court as
follows:

::

    from juriscraper.opinions.united_states.federal_appellate import ca1

    # Create a site object
    site = ca1.Site()

    # Populate it with data, downloading the page if necessary
    site.parse()

    # Print out the object
    print(str(site))

    # Print it out as JSON
    print(site.to_json())

    # Iterate over the item
    for opinion in site:
        print(opinion)

That will print out all the current metadata for a site, including
links to the objects you wish to download (typically opinions or oral
arguments). If you download those opinions, we also recommend running the
``_cleanup_content()`` method against the items that you download (PDFs,
HTML, etc.). See ``sample_caller.py`` for an example and
``_cleanup_content()`` for an explanation of what it does.
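
A hedged sketch of that download step, assuming each item exposes a
``download_urls`` entry as in ``sample_caller.py`` and that
``_cleanup_content()`` accepts the raw bytes, per the description above:

::

    import requests

    from juriscraper.opinions.united_states.federal_appellate import ca1

    site = ca1.Site()
    site.parse()

    for item in site:
        # Each item is a dict of metadata; download_urls points at the
        # opinion (or audio) file itself.
        response = requests.get(
            item["download_urls"],
            headers={"User-Agent": "Juriscraper"},
            timeout=30,
        )
        response.raise_for_status()
        # Assumption: _cleanup_content accepts raw bytes, per the text above.
        cleaned = site._cleanup_content(response.content)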

It's also possible to iterate over all courts in a Python package, even
if they're not known before starting the scraper. For example:

::

    # Start with an import path. This will do all federal courts.
    court_id = 'juriscraper.opinions.united_states.federal'
    # Import all the scrapers
    scrapers = __import__(
        court_id,
        globals(),
        locals(),
        ['*']
    ).__all__
    for scraper in scrapers:
        mod = __import__(
            '%s.%s' % (court_id, scraper),
            globals(),
            locals(),
            [scraper]
        )
        # Create a Site instance, then get the contents
        site = mod.Site()
        site.parse()
        print(str(site))

This can be useful if you wish to create a command line scraper that
iterates over all courts of a jurisdiction provided as an argument. See
``lib/importer.py`` for an example that's used in the sample caller.
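
The same loop can be written with the more modern ``importlib`` instead of
raw ``__import__``; a sketch, reusing the import path from the example above:

::

    import importlib

    court_id = 'juriscraper.opinions.united_states.federal'
    package = importlib.import_module(court_id)
    for scraper in package.__all__:
        mod = importlib.import_module('%s.%s' % (court_id, scraper))
        site = mod.Site()
        site.parse()
        print(str(site))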

District Court Parser
=====================
A sample driver that runs the PACER District Court parser on an HTML file is
included. It takes HTML file(s) as arguments and outputs JSON to stdout.

Example usage:

::

   PYTHONPATH=`pwd` python juriscraper/pacerdocket.py tests/examples/pacer/dockets/district/nysd.html
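
If you'd rather call the parser from Python, here is a hedged sketch using
``DocketReport`` from ``juriscraper.pacer`` (the ``_parse_text``/``data``
calls mirror how the test suite feeds saved HTML to PACER reports):

::

    import json

    from juriscraper.pacer import DocketReport

    report = DocketReport('nysd')  # PACER court ID
    with open('tests/examples/pacer/dockets/district/nysd.html') as f:
        report._parse_text(f.read())
    # report.data is a dict of the parsed docket; dates need default=str.
    print(json.dumps(report.data, default=str, indent=2))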


Tests
=====

We got that! You can (and should) run the tests with
``tox``. This will run ``python setup.py test`` for all supported Python runtimes,
iterating over all of the ``*_example*`` files and running the scrapers against them.

Each scraper has one or more ``*_example*`` files.  When creating a new scraper,
or covering a new use case for an existing scraper, you will have to create an
example file yourself.  Please see the files under ``tests/examples/`` to see
for yourself how the naming structure works.  What you want to put in your new
example file is the HTML/json/xml that the scraper in question needs to test
parsing.  Sometimes creating these files can be tricky, but more often than not,
it is as simple as getting the data to display in your browser, viewing and
copying the page source, then pasting that text into your new example file, as
shown in the sketch below.
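
A sketch of that fetch-and-save step (the URL and filename here are
placeholders; mirror the naming of neighboring files in ``tests/examples/``):

::

    import requests

    # Fetch the page your scraper parses and save it as a new example file.
    url = 'https://example-court.gov/opinions'  # placeholder
    html = requests.get(url, headers={'User-Agent': 'Juriscraper'}, timeout=30).text

    with open(
        'tests/examples/opinions/united_states/my_court_example.html',  # placeholder
        'w',
        encoding='utf-8',
    ) as f:
        f.write(html)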

Each ``*_example*`` file has a corresponding ``*_example*.compare.json`` file. This
file contains a json data object that represents the data extracted when parsing
the corresponding ``*_example*`` file.  These are used to ensure that each scraper
parses the exact data we expect from each of its ``*_example*`` files. You do not
need to create these ``*_example*.compare.json`` files yourself.  Simply create
your ``*_example*`` file, then run the test suite.  It will fail the first time,
indicating that a new ``*_example*.compare.json`` file was generated.  You should
review that file, make sure the data is correct, then re-run the test suite.  This
time, the tests should pass (or at least they shouldn't fail because of the newly
generated ``*_example*.compare.json`` file).  Once the tests are passing,
feel free to commit, but **please remember** to include the new ``*_example*``
**and** ``*_example*.compare.json`` files in your commit.

Individual tests can be run with::

   python -m unittest -v tests.local.test_DateTest.DateTest.test_various_date_extractions

Or, to run and drop to the Python debugger if it fails (you must install
``nose`` to have ``nosetests``)::

   nosetests -v --pdb tests/local/test_DateTest.py:DateTest.test_various_date_extractions


Future Goals
============
-  Support for additional PACER pages and utilities
-  Support opinions from all intermediate appellate state courts
-  Support opinions from all courts of U.S. territories (Guam, American Samoa, etc.)
-  Support opinions from all federal district courts with non-PACER opinion listings
-  Implement backscrapers for every court above where one is possible
-  Support video, additional oral argument audio, and transcripts wherever available


Deployment
==========
Deployment to PyPI should happen automatically when a tagged version is pushed
to master in the format v*.*.*. If you do not have push permission on master,
this will also work for merged, tagged pull requests. Simply update ``setup.py``,
tag your commit with the correct tag (v*.*.*), and open a PR with that.

If you wish to create a new version manually, the process is:

1. Update ``CHANGES.md``

2. Update version info in ``setup.py``

3. Install the requirements in ``requirements_dev.txt``

4. Set up a config file at ``~/.pypirc``

5. Generate a distribution

    ::

        python setup.py bdist_wheel

6. Upload the distribution

    ::

        twine upload dist/* -r pypi (or pypitest)



License
=======

Juriscraper is licensed under the permissive BSD license.

|forthebadge made-with-python|

.. |forthebadge made-with-python| image:: http://ForTheBadge.com/images/badges/made-with-python.svg
    :target: https://www.python.org/
