<h1 align="center">
<img src="https://raw.githubusercontent.com/DanchukIvan/byteflows/main/docs/icons/logo.png" alt="byteflows" width="200px">
<br>
</h1>
# **Simple data workflows**
Byteflows is a microframework that makes it easier to retrieve information from APIs and regular websites.
Unlike full-featured frameworks such as Scrapy or low-level libraries such as BeautifulSoup, Byteflows is extremely easy to use: it unifies the information extraction process while still offering a fairly wide range of functionality.
## **Why use Byteflows?**
* 🚀 Byteflows is built on top of asyncio and asynchronous libraries, which significantly speeds up I/O-bound code.
* 🔁 With Byteflows, there is no need to continuously customize the data scraping process. From project to project, you will have a single, transparent architecture.
* ![s3](https://raw.githubusercontent.com/DanchukIvan/byteflows/main/docs/img/amazons3.svg) ![kafka](https://raw.githubusercontent.com/DanchukIvan/byteflows/main/docs/img/apachekafka.svg) ![psql](https://raw.githubusercontent.com/DanchukIvan/byteflows/main/docs/img/postgresql.svg) ![clickhouse](https://raw.githubusercontent.com/DanchukIvan/byteflows/main/docs/img/clickhouse.svg) Byteflows allows you to route data to any backend: s3-like storage, database, network file system, broker/message bus, etc.
* ⚙️ Byteflows lets you choose what to do with the data: buffer it in memory until it reaches a threshold or send it to the backend immediately, and pre-process it along the way or leave it as is.
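
The buffer-then-flush behavior described in the last bullet can be sketched in plain asyncio. This is an illustration of the pattern only, not the Byteflows API: the `DataBuffer` class and its methods are hypothetical names invented for this example.

```python
import asyncio


class DataBuffer:
    """Hypothetical sketch: accumulate items in memory, flush at a threshold.

    Not part of the byteflows API; illustrates the buffering pattern only.
    """

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._items: list[bytes] = []
        self.flushed: list[list[bytes]] = []  # stands in for a real backend

    async def add(self, item: bytes) -> None:
        self._items.append(item)
        if len(self._items) >= self.threshold:
            await self.flush()

    async def flush(self) -> None:
        if not self._items:
            return
        # A real implementation would write to S3, Kafka, a database, etc.
        self.flushed.append(self._items)
        self._items = []


async def main() -> list[list[bytes]]:
    buf = DataBuffer(threshold=3)
    for chunk in (b"a", b"b", b"c", b"d"):
        await buf.add(chunk)
    await buf.flush()  # drain whatever is left at shutdown
    return buf.flushed


batches = asyncio.run(main())
print(batches)  # [[b'a', b'b', b'c'], [b'd']]
```

Setting `threshold=1` degenerates into the "send immediately" mode, so a single knob covers both strategies mentioned above.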
## **Installation**
Byteflows requires Python 3.11 or later. Installation is as simple as:

```shell
pip install byteflows
```
## **Dependencies**
>The core Byteflows dependencies are the following libraries:
>
> * aiohttp
> * aioitertools
> * fsspec
> * more-itertools
> * regex
> * uvloop (for Unix platforms)
> * yarl
> * dateparser
## **More information about the project**
You can learn more about Byteflows in the [project documentation](https://danchukivan.github.io/Byteflows/), including the API and Tutorial sections. Changes can be monitored in the Changelog section.
## **Project status**
Byteflows is currently in early alpha, with an unstable API and limited functionality. Using it in production is **strictly not recommended**.