sosse

| Field | Value |
| --- | --- |
| Name | sosse |
| Version | 1.12.0 |
| Summary | Selenium Open Source Search Engine |
| upload_time | 2025-01-31 07:20:36 |
| requires_python | >=3.9 |
| license | GNU Affero General Public License v3 |
| keywords | search engine, crawler |
| requirements | No requirements were recorded. |

<p>
  <img src="https://raw.githubusercontent.com/biolds/sosse/main/se/static/se/logo.svg" width="64" align="right">
  <a href="https://gitlab.com/biolds1/sosse/" alt="Gitlab code coverage" style="text-decoration: none">
    <img src="https://img.shields.io/gitlab/pipeline-coverage/biolds1/sosse?branch=main&style=flat-square">
  </a>
  <a href="https://gitlab.com/biolds1/sosse/-/pipelines" alt="Gitlab pipeline status" style="text-decoration: none">
    <img src="https://img.shields.io/gitlab/pipeline-status/biolds1/sosse?branch=main&style=flat-square">
  </a>
  <a href="https://sosse.readthedocs.io/en/stable/" alt="Documentation" style="text-decoration: none">
    <img src="https://img.shields.io/readthedocs/sosse?style=flat-square">
  </a>
  <a href="https://discord.gg/Vt9cMf7BGK" alt="Discord" style="text-decoration: none">
    <img src="https://img.shields.io/discord/1102142186423844944?style=flat-square&color=%235865f2">
  </a>
  <a href="https://gitlab.com/biolds1/sosse/-/blob/main/LICENSE" alt="License" style="text-decoration: none">
    <img src="https://img.shields.io/gitlab/license/biolds1/sosse?style=flat-square">
  </a>
</p>

# SOSSE 🦦

SOSSE (Selenium Open Source Search Engine) is a web archiving tool, crawler, and search engine. It’s hosted on both
[GitLab](https://gitlab.com/biolds1/sosse) and [GitHub](https://github.com/biolds/sosse). Feel free to use either platform to
submit feature requests, bug reports, or merge requests, or to [start a discussion](https://github.com/biolds/sosse/discussions).

## Key Features

- 🌍 **Web Page Search**: Search the content of web pages, including dynamically rendered ones, with advanced queries.
  ([doc](https://sosse.readthedocs.io/en/stable/guides/search.html))

- 🕑 **Recurring Crawling**: Crawl pages at fixed intervals or adapt the rate based on content changes.
  ([doc](https://sosse.readthedocs.io/en/stable/crawl/policies.html))

- 🔖 **Web Page Archiving**: Archive HTML content, adjust links for local use, download required assets, and support
  dynamic content. ([doc](https://sosse.readthedocs.io/en/stable/guides/archive.html))

- 📂 **File Downloads**: Batch download binary files from web pages.
  ([doc](https://sosse.readthedocs.io/en/stable/guides/download.html))

- 🔔 **Atom Feeds**: Generate content feeds for websites that don’t have them, or receive updates when a new page
  containing a keyword is published.
  ([doc](https://sosse.readthedocs.io/en/stable/guides/feed_website_monitor.html))

- 🔒 **Authentication**: The crawler can authenticate to access private pages and retrieve content.
  ([doc](https://sosse.readthedocs.io/en/stable/guides/authentication.html))

- 👥 **Permissions**: Admins can configure crawlers and view statistics, while search can be limited to authenticated users or opened to anonymous ones.
  ([doc](https://sosse.readthedocs.io/en/stable/permissions.html))

- 👤 **Search Features**: Includes private search history ([doc](https://sosse.readthedocs.io/en/stable/user/history.html)),
  and external search engine shortcuts ([doc](https://sosse.readthedocs.io/en/stable/user/shortcuts.html)), etc.

Explore the 📚 [documentation](https://sosse.readthedocs.io/en/stable/index.html) and check out some
📷 [screenshots](https://sosse.readthedocs.io/en/stable/screenshots.html).

SOSSE is written in Python and is distributed under the [GNU AGPLv3 license](https://www.gnu.org/licenses/agpl-3.0.en.html). It uses browser-based crawling with [Mozilla Firefox](https://www.mozilla.org/firefox/) or
[Google Chromium](https://www.chromium.org/Home) alongside [Selenium](https://www.selenium.dev/) to index pages that rely on JavaScript. For faster crawling, [Requests](https://docs.python-requests.org/en/latest/index.html) can also be used. SOSSE is lightweight and uses
[PostgreSQL](https://www.postgresql.org/) for data storage.

## Try It Out

To quickly try the latest version with Docker:

```
docker run -p 8005:80 biolds/sosse:latest
```

Then, open [http://127.0.0.1:8005/](http://127.0.0.1:8005/) and log in with the username `admin` and password `admin`.
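The container may take a little while to start answering requests. As a rough sketch (not part of SOSSE), the helper below polls the URL until the web interface responds. The function name `wait_for_http` and its default timings are assumptions made for this example, and it relies only on the Python standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until any HTTP response arrives or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even with an error status, so it is up.
            return True
        except OSError:
            # Connection refused, reset, or timed out: probably still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_http("http://127.0.0.1:8005/")` returns `True` once the port mapped by the `docker run` command above answers, and `False` if nothing responds within the timeout.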

To persist Docker data or use another installation method, see the [installation guide](https://sosse.readthedocs.io/en/stable/install.html).

## Stay Connected

Join the [Discord server](https://discord.gg/Vt9cMf7BGK) to get help, share ideas, or discuss SOSSE!

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "sosse",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "search engine, crawler",
    "author": null,
    "author_email": "Laurent Defert <laurent_defert@yahoo.fr>",
    "download_url": "https://files.pythonhosted.org/packages/85/da/edba792da0002d3c442b1b1b4c351dbf50e02b6e196a6317822e0cdcf2c3/sosse-1.12.0.tar.gz",
    "platform": null,
    "description": "<p>\n  <img src=\"https://raw.githubusercontent.com/biolds/sosse/main/se/static/se/logo.svg\" width=\"64\" align=\"right\">\n  <a href=\"https://gitlab.com/biolds1/sosse/\" alt=\"Gitlab code coverage\" style=\"text-decoration: none\">\n    <img src=\"https://img.shields.io/gitlab/pipeline-coverage/biolds1/sosse?branch=main&style=flat-square\">\n  </a>\n  <a href=\"https://gitlab.com/biolds1/sosse/-/pipelines\" alt=\"Gitlab pipeline status\" style=\"text-decoration: none\">\n    <img src=\"https://img.shields.io/gitlab/pipeline-status/biolds1/sosse?branch=main&style=flat-square\">\n  </a>\n  <a href=\"https://sosse.readthedocs.io/en/stable/\" alt=\"Documentation\" style=\"text-decoration: none\">\n    <img src=\"https://img.shields.io/readthedocs/sosse?style=flat-square\">\n  </a>\n  <a href=\"https://discord.gg/Vt9cMf7BGK\" alt=\"Discord\" style=\"text-decoration: none\">\n    <img src=\"https://img.shields.io/discord/1102142186423844944?style=flat-square&color=%235865f2\">\n  </a>\n  <a href=\"https://gitlab.com/biolds1/sosse/-/blob/main/LICENSE\" alt=\"License\" style=\"text-decoration: none\">\n    <img src=\"https://img.shields.io/gitlab/license/biolds1/sosse?style=flat-square\">\n  </a>\n</p>\n\n# SOSSE \ud83e\udda6\n\nSOSSE (Selenium Open Source Search Engine) is a web archiving software, crawler, and search engine. It\u2019s hosted on both\n[GitLab](https://gitlab.com/biolds1/sosse) and [GitHub](https://github.com/biolds/sosse). 
Feel free to use either platform to\nsubmit feature requests, bug reports, merge requests, or [start a discussion](https://github.com/biolds/sosse/discussions).\n\n## Key Features\n\n- \ud83c\udf0d **Web Page Search**: Search the content of web pages, including dynamically rendered ones, with advanced queries.\n  ([doc](https://sosse.readthedocs.io/en/stable/guides/search.html))\n\n- \ud83d\udd51 **Recurring Crawling**: Crawl pages at fixed intervals or adapt the rate based on content changes.\n  ([doc](https://sosse.readthedocs.io/en/stable/crawl/policies.html))\n\n- \ud83d\udd16 **Web Page Archiving**: Archive HTML content, adjust links for local use, download required assets, and support\n  dynamic content. ([doc](https://sosse.readthedocs.io/en/stable/guides/archive.html))\n\n- \ud83d\udcc2 **File Downloads**: Batch download binary files from web pages.\n  ([doc](https://sosse.readthedocs.io/en/stable/guides/download.html))\n\n- \ud83d\udd14 **Atom Feeds**: Generate content feeds for websites that don\u2019t have them, or receive updates when a new page\n  containing a keyword is published.\n  ([doc](https://sosse.readthedocs.io/en/stable/guides/feed_website_monitor.html))\n\n- \ud83d\udd12 **Authentication**: The crawler can authenticate to access private pages and retrieve content.\n  ([doc](https://sosse.readthedocs.io/en/stable/guides/authentication.html))\n\n- \ud83d\udc65 **Permissions**: Admins can configure crawlers and view statistics, while authenticated users can search or do so anonymously.\n  ([doc](https://sosse.readthedocs.io/en/stable/permissions.html))\n\n- \ud83d\udc64 **Search Features**: Includes private search history ([doc](https://sosse.readthedocs.io/en/stable/user/history.html)),\n  and external search engine shortcuts ([doc](https://sosse.readthedocs.io/en/stable/user/shortcuts.html)), etc.\n\nExplore the \ud83d\udcda [documentation](https://sosse.readthedocs.io/en/stable/index.html) and check out some\n\ud83d\udcf7 
[screenshots](https://sosse.readthedocs.io/en/stable/screenshots.html).\n\nSOSSE is written in Python and is distributed under the [GNU AGPLv3 license](https://www.gnu.org/licenses/agpl-3.0.en.html). It uses browser-based crawling with [Mozilla Firefox](https://www.mozilla.org/firefox/) or\n[Google Chromium](https://www.chromium.org/Home) alongside [Selenium](https://www.selenium.dev/) to index pages that rely on JavaScript. For faster crawling, [Requests](https://docs.python-requests.org/en/latest/index.html) can also be used. SOSSE is lightweight and uses\n[PostgreSQL](https://www.postgresql.org/) for data storage.\n\n## Try It Out\n\nTo quickly try the latest version with Docker:\n\n```\ndocker run -p 8005:80 biolds/sosse:latest\n```\n\nThen, open [http://127.0.0.1:8005/](http://127.0.0.1:8005/) and log in with the username `admin` and password `admin`.\n\nFor persistence of Docker data or alternative installation methods, please refer to the [installation guide](https://sosse.readthedocs.io/en/stable/install.html).\n\n## Stay Connected\n\nJoin the [Discord server](https://discord.gg/Vt9cMf7BGK) to get help, share ideas, or discuss SOSSE!\n",
    "bugtrack_url": null,
    "license": "GNU Affero General Public License v3",
    "summary": "Selenium Open Source Search Engine",
    "version": "1.12.0",
    "project_urls": null,
    "split_keywords": [
        "search engine",
        " crawler"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4dd0499c067ed31f0eccce1f10ea8899325734b30637252babb843cb5a117b6d",
                "md5": "15c0fbdf90414baa72760800e83f825a",
                "sha256": "e9d0ae18029b2cfc11e7b259a580689d554277753e47331f1d105ec3f4d6199f"
            },
            "downloads": -1,
            "filename": "sosse-1.12.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "15c0fbdf90414baa72760800e83f825a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 3454794,
            "upload_time": "2025-01-31T07:20:32",
            "upload_time_iso_8601": "2025-01-31T07:20:32.683479Z",
            "url": "https://files.pythonhosted.org/packages/4d/d0/499c067ed31f0eccce1f10ea8899325734b30637252babb843cb5a117b6d/sosse-1.12.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "85daedba792da0002d3c442b1b1b4c351dbf50e02b6e196a6317822e0cdcf2c3",
                "md5": "e3a17e4977487255159fad4eb3da399d",
                "sha256": "92091c33ff020c088066bd9891118aff44ba02a904b47a5fc1d4150d0f1e12ce"
            },
            "downloads": -1,
            "filename": "sosse-1.12.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e3a17e4977487255159fad4eb3da399d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 4454354,
            "upload_time": "2025-01-31T07:20:36",
            "upload_time_iso_8601": "2025-01-31T07:20:36.102267Z",
            "url": "https://files.pythonhosted.org/packages/85/da/edba792da0002d3c442b1b1b4c351dbf50e02b6e196a6317822e0cdcf2c3/sosse-1.12.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-31 07:20:36",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "sosse"
}
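
The `sha256` values listed under `digests` can be used to check a downloaded file. A minimal sketch, assuming the sdist was saved locally as `sosse-1.12.0.tar.gz`; the helper names `sha256_of` and `matches_digest` are illustrative and not part of SOSSE or PyPI tooling:

```python
import hashlib
from pathlib import Path

# SHA-256 of sosse-1.12.0.tar.gz, copied from the "digests" entry in the raw data.
EXPECTED_SHA256 = "92091c33ff020c088066bd9891118aff44ba02a904b47a5fc1d4150d0f1e12ce"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 equals `expected_hex` (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `matches_digest(Path("sosse-1.12.0.tar.gz"), EXPECTED_SHA256)` should be `True` for an intact download of the file listed above.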
        