:Name: formasaurus
:Version: 0.9.0
:Home page: https://github.com/scrapinghub/Formasaurus
:Summary: Formasaurus tells you the types of HTML forms and their fields using machine learning
:Upload time: 2024-06-19 14:21:21
:Author: Mikhail Korobov
:License: MIT license

===========
Formasaurus
===========

.. image:: https://img.shields.io/pypi/v/Formasaurus.svg
   :target: https://pypi.python.org/pypi/Formasaurus
   :alt: PyPI Version

.. image:: https://github.com/scrapinghub/Formasaurus/workflows/tox/badge.svg
   :target: https://github.com/scrapinghub/Formasaurus/actions
   :alt: Build Status

.. image:: http://codecov.io/github/scrapinghub/Formasaurus/coverage.svg?branch=master
   :target: http://codecov.io/github/scrapinghub/Formasaurus?branch=master
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/formasaurus/badge/?version=latest
   :target: http://formasaurus.readthedocs.org/en/latest/?badge=latest
   :alt: Documentation

.. description starts

Formasaurus is a Python package that tells you the type of an HTML form
and its fields using machine learning.

It can detect if a form is a login, search, registration, password recovery,
"join mailing list", contact, order form or something else, which field
is a password field and which is a search query, etc.

License is MIT.

.. description ends

See the `docs <http://formasaurus.readthedocs.org/>`_ for more details.
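
As a rough illustration of the description above, the sketch below picks out
the fields carrying a given label from one classification result. It assumes
that ``formasaurus.extract_forms`` accepts an HTML string and returns
``(form element, info)`` pairs, where ``info`` looks like
``{'form': ..., 'fields': {name: label}}``. That shape is an assumption here,
so check it against the project documentation. The helper names and the
sample values are made up for the example.

.. code-block:: python

    from typing import Any, Dict, List

    FormInfo = Dict[str, Any]


    def fields_of_type(form_info: FormInfo, field_type: str) -> List[str]:
        """Return the names of the fields labelled ``field_type``, sorted."""
        fields = form_info.get("fields", {})
        return sorted(name for name, label in fields.items() if label == field_type)


    def password_field_names(html: str) -> List[List[str]]:
        """For each form in ``html``, list the names of its password fields.

        Needs Formasaurus and its dependencies; imported lazily so that
        ``fields_of_type`` stays usable without them.
        """
        import formasaurus  # assumed API: extract_forms(html) -> [(form, info), ...]

        return [fields_of_type(info, "password")
                for _form, info in formasaurus.extract_forms(html)]


    # Illustrative result for a login form (values are invented, not real output).
    sample = {"form": "login", "fields": {"user": "username", "pass": "password"}}
    print(fields_of_type(sample, "password"))  # ['pass']

Keeping the lookup separate from the Formasaurus call means the filtering
logic can be exercised without training or loading a model.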


Changes
=======

0.9.0 (2024-06-19)
------------------

* Dropped official support for Python 3.7 and lower, and added official support
  for Python 3.8 and higher.

* Added support for the latest versions of all dependencies, and upgraded
  minimum supported versions of dependencies as follows:

  * ``docopt``: ``0.4.0``

  * ``requests``: ``1.0.0``

  * ``tldextract``: ``1.2.0``

  * ``with-deps`` extra dependencies:

    * ``joblib``: ``1.2.0``

    * ``lxml``: ``4.4.1``

    * ``lxml-html-clean``: ``0.1.0``

    * ``scikit-learn``: ``0.18.0`` → ``0.24.0``

    * ``scipy``: ``1.5.1``

    * ``sklearn-crfsuite``: ``0.3.1`` → ``0.5.1``

* https://github.com/scrapinghub/formasaurus is the new code repository,
  replacing https://github.com/TeamHG-Memex/Formasaurus.

* Updated the CI configuration and development tooling.
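
The minimum versions listed above can be checked against an existing
environment with a small script. This is only a sketch: the helper names are
hypothetical, the comparison looks at the leading numeric part of each version
and ignores pre-release suffixes, and packages that are not installed are
skipped rather than reported.

.. code-block:: python

    import re
    from importlib.metadata import PackageNotFoundError, version

    # Minimum versions copied from the 0.9.0 notes above.
    MINIMUMS = {
        "docopt": "0.4.0",
        "requests": "1.0.0",
        "tldextract": "1.2.0",
        "joblib": "1.2.0",
        "lxml": "4.4.1",
        "lxml-html-clean": "0.1.0",
        "scikit-learn": "0.24.0",
        "scipy": "1.5.1",
        "sklearn-crfsuite": "0.5.1",
    }


    def release_tuple(text):
        """Parse the leading dotted number of a version string into a tuple of ints."""
        match = re.match(r"\d+(?:\.\d+)*", text)
        return tuple(int(part) for part in match.group().split(".")) if match else ()


    def meets_minimum(installed, minimum):
        """Compare two versions numerically, padding the shorter one with zeros."""
        a, b = release_tuple(installed), release_tuple(minimum)
        width = max(len(a), len(b))
        a += (0,) * (width - len(a))
        b += (0,) * (width - len(b))
        return a >= b


    def outdated(minimums=MINIMUMS, get_version=version):
        """Return {package: installed version} for packages below their minimum."""
        found = {}
        for name, minimum in minimums.items():
            try:
                installed = get_version(name)
            except PackageNotFoundError:
                continue  # not installed, e.g. an optional extra
            if not meets_minimum(installed, minimum):
                found[name] = installed
        return found


    if __name__ == "__main__":
        for name, installed in outdated().items():
            print(f"{name} {installed} is older than {MINIMUMS[name]}")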

0.8.1 (2018-07-02)
------------------

* Support for scikit-learn < 0.18 is dropped;
* Formasaurus is no longer tested with Python 3.3;
* tests are fixed to account for upstream changes; Python 3.6 build is enabled.

0.8 (2016-05-24)
----------------

* more annotated data for captchas;
* ``formasaurus init`` command which trains & caches the model.

0.7.2 (2016-04-18)
------------------

* pip bug with ``pip install formasaurus[with-deps]`` is worked around;
  it should work now as ``pip install formasaurus[with_deps]``.

0.7.1 (2016-03-03)
------------------

* fixed API documentation at readthedocs.org

0.7 (2016-03-03)
----------------

* more annotated data;
* new ``form_classes`` and ``field_classes`` attributes of ``FormFieldClassifier``;
* more robust web page encoding detection in ``formasaurus.utils.download``;
* bug fixes in annotation widgets.

0.6 (2016-01-27)
----------------

* ``fields=False`` argument is supported in ``formasaurus.extract_forms``,
  ``formasaurus.classify``, ``formasaurus.classify_proba`` functions and
  in related ``FormFieldClassifier`` methods. It lets you skip predicting
  form field types when they are not needed.
* ``formasaurus.classifiers.instance()`` is renamed to
  ``formasaurus.classifiers.get_instance()``.
* Bias is no longer regularized for the form type classifier.
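
A minimal sketch of how the ``fields=False`` argument might be used when only
form types are needed. It assumes ``formasaurus.extract_forms`` accepts an
HTML string and still returns ``(form element, info)`` pairs with a ``'form'``
key in ``info`` when ``fields=False``; that return shape is an assumption, not
something these notes state. The ``form_types`` helper and its ``extract``
parameter are hypothetical and exist only so the logic can be tried without
the trained model.

.. code-block:: python

    def form_types(html, extract=None):
        """Return the predicted type of every form in ``html``.

        ``extract`` defaults to ``formasaurus.extract_forms``; pass a stand-in
        with the same call shape to exercise the logic without Formasaurus.
        """
        if extract is None:
            import formasaurus  # imported lazily; needs the package installed

            extract = formasaurus.extract_forms
        # fields=False asks Formasaurus not to predict field types.
        return [info["form"] for _form, info in extract(html, fields=False)]


    def fake_extract(html, fields=True):
        """Stand-in extractor returning one invented result, for illustration."""
        return [(None, {"form": "search"})]


    print(form_types("<form></form>", extract=fake_extract))  # ['search']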

0.5 (2015-12-19)
----------------

This is a major backwards-incompatible release.

* Formasaurus can now detect field types, not only form types;
* API is changed - check the updated documentation;
* there are more form types detected;
* evaluation setup is improved;
* annotation UI is rewritten using IPython widgets;
* more training data is added.

0.2 (2015-08-10)
----------------

* Python 3 support;
* fixed model auto-creation.

0.1 (2015-07-09)
----------------

Initial release.

            
