<p>
<img src="https://raw.githubusercontent.com/biolds/sosse/main/se/static/se/logo.svg" width="64" align="right">
<a href="https://gitlab.com/biolds1/sosse/" alt="Gitlab code coverage" style="text-decoration: none">
<img src="https://img.shields.io/gitlab/pipeline-coverage/biolds1/sosse?branch=main&style=flat-square">
</a>
<a href="https://gitlab.com/biolds1/sosse/-/pipelines" alt="Gitlab pipeline status" style="text-decoration: none">
<img src="https://img.shields.io/gitlab/pipeline-status/biolds1/sosse?branch=main&style=flat-square">
</a>
<a href="https://sosse.readthedocs.io/en/stable/" alt="Documentation" style="text-decoration: none">
<img src="https://img.shields.io/readthedocs/sosse?style=flat-square">
</a>
<a href="https://discord.gg/Vt9cMf7BGK" alt="Discord" style="text-decoration: none">
<img src="https://img.shields.io/discord/1102142186423844944?style=flat-square&color=%235865f2">
</a>
<a href="https://gitlab.com/biolds1/sosse/-/blob/main/LICENSE" alt="License" style="text-decoration: none">
<img src="https://img.shields.io/gitlab/license/biolds1/sosse?style=flat-square">
</a>
</p>

# SOSSE 🦦

SOSSE (Selenium Open Source Search Engine) is a web archiving tool, crawler, and search engine. It’s hosted on both
[GitLab](https://gitlab.com/biolds1/sosse) and [GitHub](https://github.com/biolds/sosse). Feel free to use either platform to
submit feature requests, bug reports, merge requests, or [start a discussion](https://github.com/biolds/sosse/discussions).
## Key Features
- 🌍 **Web Page Search**: Search the content of web pages, including dynamically rendered ones, with advanced queries.
([doc](https://sosse.readthedocs.io/en/stable/guides/search.html))
- 🕑 **Recurring Crawling**: Crawl pages at fixed intervals or adapt the rate based on content changes.
([doc](https://sosse.readthedocs.io/en/stable/crawl/policies.html))
- 🔖 **Web Page Archiving**: Archive HTML content, adjust links for local use, download required assets, and support
dynamic content. ([doc](https://sosse.readthedocs.io/en/stable/guides/archive.html))
- 📂 **File Downloads**: Batch download binary files from web pages.
([doc](https://sosse.readthedocs.io/en/stable/guides/download.html))
- 🔔 **Atom Feeds**: Generate content feeds for websites that don’t have them, or receive updates when a new page
containing a keyword is published.
([doc](https://sosse.readthedocs.io/en/stable/guides/feed_website_monitor.html))
- 🔒 **Authentication**: The crawler can authenticate to access private pages and retrieve content.
([doc](https://sosse.readthedocs.io/en/stable/guides/authentication.html))
- 👥 **Permissions**: Admins can configure crawlers and view statistics, while users can search either authenticated or anonymously.
([doc](https://sosse.readthedocs.io/en/stable/permissions.html))
- 👤 **Search Features**: Includes private search history ([doc](https://sosse.readthedocs.io/en/stable/user/history.html)),
and external search engine shortcuts ([doc](https://sosse.readthedocs.io/en/stable/user/shortcuts.html)), etc.

Explore the 📚 [documentation](https://sosse.readthedocs.io/en/stable/index.html) and check out some
📷 [screenshots](https://sosse.readthedocs.io/en/stable/screenshots.html).

SOSSE is written in Python and is distributed under the [GNU AGPLv3 license](https://www.gnu.org/licenses/agpl-3.0.en.html). It uses browser-based crawling with [Mozilla Firefox](https://www.mozilla.org/firefox/) or
[Google Chromium](https://www.chromium.org/Home) alongside [Selenium](https://www.selenium.dev/) to index pages that rely on JavaScript. For faster crawling, [Requests](https://docs.python-requests.org/en/latest/index.html) can also be used. SOSSE is lightweight and uses
[PostgreSQL](https://www.postgresql.org/) for data storage.
## Try It Out
To quickly try the latest version with Docker:
```
docker run -p 8005:80 biolds/sosse:latest
```
Then, open [http://127.0.0.1:8005/](http://127.0.0.1:8005/) and log in with the username `admin` and password `admin`.
To persist Docker data or use another installation method, refer to the [installation guide](https://sosse.readthedocs.io/en/stable/install.html).
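
The container may take a moment to start before the web interface answers. As a minimal sketch (not part of SOSSE itself), the following Python helper polls the URL from the Docker command above until it gets any HTTP response. The function name `wait_until_up` and its `timeout` and `interval` parameters are illustrative assumptions; only the standard library is used.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_up("http://127.0.0.1:8005/")` should return `True` once the container started with the command above is serving requests, and `False` if nothing answers within a minute.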
## Stay Connected
Join the [Discord server](https://discord.gg/Vt9cMf7BGK) to get help, share ideas, or discuss SOSSE!
Raw data
{
"_id": null,
"home_page": null,
"name": "sosse",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "search engine, crawler",
"author": null,
"author_email": "Laurent Defert <laurent_defert@yahoo.fr>",
"download_url": "https://files.pythonhosted.org/packages/85/da/edba792da0002d3c442b1b1b4c351dbf50e02b6e196a6317822e0cdcf2c3/sosse-1.12.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GNU Affero General Public License v3",
"summary": "Selenium Open Source Search Engine",
"version": "1.12.0",
"project_urls": null,
"split_keywords": [
"search engine",
" crawler"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4dd0499c067ed31f0eccce1f10ea8899325734b30637252babb843cb5a117b6d",
"md5": "15c0fbdf90414baa72760800e83f825a",
"sha256": "e9d0ae18029b2cfc11e7b259a580689d554277753e47331f1d105ec3f4d6199f"
},
"downloads": -1,
"filename": "sosse-1.12.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "15c0fbdf90414baa72760800e83f825a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 3454794,
"upload_time": "2025-01-31T07:20:32",
"upload_time_iso_8601": "2025-01-31T07:20:32.683479Z",
"url": "https://files.pythonhosted.org/packages/4d/d0/499c067ed31f0eccce1f10ea8899325734b30637252babb843cb5a117b6d/sosse-1.12.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "85daedba792da0002d3c442b1b1b4c351dbf50e02b6e196a6317822e0cdcf2c3",
"md5": "e3a17e4977487255159fad4eb3da399d",
"sha256": "92091c33ff020c088066bd9891118aff44ba02a904b47a5fc1d4150d0f1e12ce"
},
"downloads": -1,
"filename": "sosse-1.12.0.tar.gz",
"has_sig": false,
"md5_digest": "e3a17e4977487255159fad4eb3da399d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 4454354,
"upload_time": "2025-01-31T07:20:36",
"upload_time_iso_8601": "2025-01-31T07:20:36.102267Z",
"url": "https://files.pythonhosted.org/packages/85/da/edba792da0002d3c442b1b1b4c351dbf50e02b6e196a6317822e0cdcf2c3/sosse-1.12.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-31 07:20:36",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "sosse"
}
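
The `digests` entries above can be used to check a downloaded release file. The following is a minimal sketch, assuming the sdist has been saved locally; the helper names `sha256_of` and `matches` are illustrative, and the expected value is copied from the `sha256` field of the `sosse-1.12.0.tar.gz` entry.

```python
import hashlib
from pathlib import Path

# Copied from the "sha256" field of the sdist entry in the metadata above.
SDIST_SHA256 = "92091c33ff020c088066bd9891118aff44ba02a904b47a5fc1d4150d0f1e12ce"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches(Path("sosse-1.12.0.tar.gz"), SDIST_SHA256)` on an intact download should return `True`; the local file path is an assumption about where the archive was saved.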