===========
Formasaurus
===========

.. image:: https://img.shields.io/pypi/v/Formasaurus.svg
   :target: https://pypi.python.org/pypi/Formasaurus
   :alt: PyPI Version

.. image:: https://github.com/scrapinghub/Formasaurus/workflows/tox/badge.svg
   :target: https://github.com/scrapinghub/Formasaurus/actions
   :alt: Build Status

.. image:: http://codecov.io/github/scrapinghub/Formasaurus/coverage.svg?branch=master
   :target: http://codecov.io/github/scrapinghub/Formasaurus?branch=master
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/formasaurus/badge/?version=latest
   :target: http://formasaurus.readthedocs.org/en/latest/?badge=latest
   :alt: Documentation

.. description starts

Formasaurus is a Python package that uses machine learning to tell you
the type of an HTML form and the types of its fields.

It can detect whether a form is a login, search, registration, password
recovery, "join mailing list", contact, or order form, or something else.
It can also tell which field is a password field, which is a search query,
and so on.

License is MIT.

.. description ends
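A minimal sketch of how the detection described above might be used, assuming Formasaurus and its dependencies are installed. ``formasaurus.extract_forms`` is the function named in the changelog; the result shape used here (a list of ``(form_element, info)`` pairs where ``info`` is a dict with ``"form"`` and ``"fields"`` keys) is an assumption to check against the docs. ``count_form_types`` and ``classify_page`` are hypothetical helper names, not part of the package.

```python
from collections import Counter


def count_form_types(results):
    """Count form types in (form_element, info) pairs.

    Assumption: each ``info`` is a dict such as
    {"form": "search", "fields": {"q": "search query"}}.
    """
    counts = Counter()
    for _form, info in results:
        counts[info["form"]] += 1
    return counts


def classify_page(html):
    """Run Formasaurus on an HTML string and summarize the form types found."""
    # Imported lazily so the helper above can be used without the package.
    import formasaurus

    return count_form_types(formasaurus.extract_forms(html))


if __name__ == "__main__":
    # Illustrative data only; real results come from formasaurus.extract_forms.
    sample = [
        (None, {"form": "search", "fields": {"q": "search query"}}),
        (None, {"form": "login", "fields": {"user": "username", "pw": "password"}}),
    ]
    print(count_form_types(sample))
```

Keeping the summary logic separate from the ``formasaurus`` call makes it possible to try the helper on hand-written data before a trained model is available.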

Check `docs <http://formasaurus.readthedocs.org/>`_ for more.

Changes
=======

0.9.0 (2024-06-19)
------------------

* Dropped official support for Python 3.7 and lower, and added official support
  for Python 3.8 and higher.

* Added support for the latest versions of all dependencies, and upgraded
  minimum supported versions of dependencies as follows:

  * ``docopt``: ``0.4.0``

  * ``requests``: ``1.0.0``

  * ``tldextract``: ``1.2.0``

  * ``with-deps`` extra dependencies:

    * ``joblib``: ``1.2.0``

    * ``lxml``: ``4.4.1``

    * ``lxml-html-clean``: ``0.1.0``

    * ``scikit-learn``: ``0.18.0`` → ``0.24.0``

    * ``scipy``: ``1.5.1``

    * ``sklearn-crfsuite``: ``0.3.1`` → ``0.5.1``

* https://github.com/scrapinghub/formasaurus is the new code repository,
  replacing https://github.com/TeamHG-Memex/Formasaurus.

* Updated the CI configuration and development tooling.
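
The minimum versions listed for 0.9.0 can be checked against an existing environment. The following is a rough sketch using only the standard library (``importlib.metadata``, available on Python 3.8 and higher). The helper names are hypothetical, and the version comparison is a simplification that looks only at the leading numeric components; ``packaging.version`` would be the more robust choice if it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the 0.9.0 changelog entry.
MINIMUMS = {
    "docopt": "0.4.0",
    "requests": "1.0.0",
    "tldextract": "1.2.0",
    "joblib": "1.2.0",
    "lxml": "4.4.1",
    "lxml-html-clean": "0.1.0",
    "scikit-learn": "0.24.0",
    "scipy": "1.5.1",
    "sklearn-crfsuite": "0.5.1",
}


def release_tuple(version_string):
    """Turn '1.5.0rc1' into (1, 5, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric release components."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Map each distribution name to True, False, or None (not installed)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'missing' if ok is None else 'ok' if ok else 'too old'}")
```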

0.8.1 (2018-07-02)
------------------

* Support for scikit-learn < 0.18 is dropped;
* Formasaurus is no longer tested with Python 3.3;
* tests are fixed to account for upstream changes; Python 3.6 build is enabled.

0.8 (2016-05-24)
----------------

* more annotated data for captchas;
* ``formasaurus init`` command which trains & caches the model.

0.7.2 (2016-04-18)
------------------

* pip bug with ``pip install formasaurus[with-deps]`` is worked around;
  it should work now as ``pip install formasaurus[with_deps]``.

0.7.1 (2016-03-03)
------------------

* fixed API documentation at readthedocs.org

0.7 (2016-03-03)
----------------

* more annotated data;
* new ``form_classes`` and ``field_classes`` attributes of ``FormFieldClassifier``;
* more robust web page encoding detection in ``formasaurus.utils.download``;
* bug fixes in annotation widgets.

0.6 (2016-01-27)
----------------

* The ``fields=False`` argument is supported in the ``formasaurus.extract_forms``,
  ``formasaurus.classify``, and ``formasaurus.classify_proba`` functions and
  in the related ``FormFieldClassifier`` methods. It lets you skip predicting
  form field types when they are not needed.
* ``formasaurus.classifiers.instance()`` is renamed to
  ``formasaurus.classifiers.get_instance()``.
* Bias is no longer regularized for the form type classifier.
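
The ``fields=False`` argument introduced in 0.6 might be used roughly as follows. This sketch assumes that with ``fields=False`` each result carries only the form type as a plain string rather than a dict with ``"form"`` and ``"fields"`` keys; that shape is an assumption to verify against the docs. ``form_type`` and ``form_types_only`` are hypothetical helpers, not part of the package.

```python
def form_type(info):
    """Return the form type from either result shape.

    Assumption: ``info`` is a dict with a "form" key when field types
    are predicted, and a plain string when ``fields=False`` is used.
    """
    if isinstance(info, dict):
        return info["form"]
    return info


def form_types_only(html):
    """List the form types on a page without predicting field types."""
    # Imported lazily so form_type can be used without the package installed.
    import formasaurus

    results = formasaurus.extract_forms(html, fields=False)
    return [form_type(info) for _form, info in results]
```

Accepting both shapes in ``form_type`` keeps calling code unchanged when field prediction is switched on or off.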

0.5 (2015-12-19)
----------------

This is a major backwards-incompatible release.

* Formasaurus now can detect field types, not only form types;
* API is changed - check the updated documentation;
* there are more form types detected;
* evaluation setup is improved;
* annotation UI is rewritten using IPython widgets;
* more training data is added.

0.2 (2015-08-10)
----------------

* Python 3 support;
* fixed model auto-creation.

0.1 (2015-07-09)
----------------

Initial release.

Raw data
{
"_id": null,
"home_page": "https://github.com/scrapinghub/Formasaurus",
"name": "formasaurus",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Mikhail Korobov",
"author_email": "kmike84@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/29/fc/9c311a6c75ec7d609cd2b580a461594a6344c71d7e89ac000db97319d54b/formasaurus-0.9.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT license",
"summary": "Formasaurus tells you the types of HTML forms and their fields using machine learning",
"version": "0.9.0",
"project_urls": {
"Homepage": "https://github.com/scrapinghub/Formasaurus"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3a247358509d067e31b0d9d183799853e56d102b3e0e0c12904789769e623dc8",
"md5": "6ebd1f62cf1886e6f2441bc7528f48f3",
"sha256": "a6faac9701359ea601ea139c0de51b345a0d4b9fc54d1e85a2d6dc08bd1120ea"
},
"downloads": -1,
"filename": "formasaurus-0.9.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "6ebd1f62cf1886e6f2441bc7528f48f3",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 13670057,
"upload_time": "2024-06-19T14:21:12",
"upload_time_iso_8601": "2024-06-19T14:21:12.083230Z",
"url": "https://files.pythonhosted.org/packages/3a/24/7358509d067e31b0d9d183799853e56d102b3e0e0c12904789769e623dc8/formasaurus-0.9.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "29fc9c311a6c75ec7d609cd2b580a461594a6344c71d7e89ac000db97319d54b",
"md5": "7abfc6fa8eadf4b787736fbc83db2f40",
"sha256": "b54a49a1c4274bdffa4f53b35e46eae2d300517285e9db811dd5124b95ca5b19"
},
"downloads": -1,
"filename": "formasaurus-0.9.0.tar.gz",
"has_sig": false,
"md5_digest": "7abfc6fa8eadf4b787736fbc83db2f40",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12565311,
"upload_time": "2024-06-19T14:21:21",
"upload_time_iso_8601": "2024-06-19T14:21:21.470008Z",
"url": "https://files.pythonhosted.org/packages/29/fc/9c311a6c75ec7d609cd2b580a461594a6344c71d7e89ac000db97319d54b/formasaurus-0.9.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-19 14:21:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "scrapinghub",
"github_project": "Formasaurus",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"tox": true,
"lcname": "formasaurus"
}