torequests

Name: torequests
Version: 6.0.0
Home page: https://github.com/ClericPy/torequests
Summary: Async wrapper for requests / aiohttp, and some python crawler toolkits. Let synchronous code enjoy the performance of asynchronous programming. Read more: https://github.com/ClericPy/torequests.
Upload time: 2022-07-03 08:20:15
Docs URL: None
Author: ClericPy
Requires Python: >=3.7
License: MIT License
Keywords: requests, async, multi-thread, aiohttp, asyncio, asynchronous
Requirements: No requirements were recorded.
# [torequests](https://github.com/ClericPy/torequests) [![PyPI](https://img.shields.io/pypi/v/torequests?style=plastic)](https://pypi.org/project/torequests/)[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/clericpy/torequests/Python%20package?style=plastic)](https://github.com/ClericPy/torequests/actions?query=workflow%3A%22Python+package%22)![PyPI - Wheel](https://img.shields.io/pypi/wheel/torequests?style=plastic)![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torequests?style=plastic)![PyPI - Downloads](https://img.shields.io/pypi/dm/torequests?style=plastic)![PyPI - License](https://img.shields.io/pypi/l/torequests?style=plastic)

<!-- [![Downloads](https://pepy.tech/badge/torequests)](https://pepy.tech/project/torequests) -->

In short, torequests is a requests & aiohttp wrapper for newcomers to asynchronous programming, aimed at reducing the amount of code you have to write.

# Install:

> pip install torequests -U

### Requirements

Python 3.7+ (matching the versions supported by requests)

    | requests			# python
    | aiohttp >= 3.6.2 	# python3

# Features

Inspired by [tomorrow](https://github.com/madisonmay/Tomorrow), to make async coding brief & smooth; originally compatible with win32 and both Python 2 & 3.

* Convert any function into async mode with concurrent.futures.
* Wrap the requests lib with concurrent.futures to enjoy concurrent performance.
* Simplify aiohttp, making it `requests-like`.
* Add FailureException to inspect the context of a request failure.
* Add frequency control to avoid triggering frequency-based anti-crawler checks.
* Add retry for requests (supports `response_validator`; see the sketch after this list).
* Plenty of common crawler utils.
* Compatible with [uniparser](https://github.com/ClericPy/uniparser)
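
For instance, retry combined with a response validator might look like the sketch below (a minimal, hypothetical example: the `response_validator` keyword is the one named in the list above, the validator is assumed to receive the response and return a bool, and httpbin.org is only a stand-in URL):

```python
from torequests.main import tPool

req = tPool()


def looks_ok(r):
    # assumed contract: return True to accept the response, False to trigger another retry
    return r.status_code == 200


task = req.get('http://httpbin.org/status/200',
               retry=2,
               response_validator=looks_ok)
print(task.x)  # blocks until the (possibly retried) response is ready
```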

# Getting started

## 1. Async, threads - make functions asynchronous

```python
from torequests import threads, Async
import time


@threads(5)
def test1(n):
    time.sleep(n)
    return 'test1 ok'


def test2(n):
    time.sleep(n)
    return 'test2 ok'


start = int(time.time())
# async_test2 now behaves like test1: calls run in a thread pool
async_test2 = Async(test2)
future = test1(1)
# the future runs in a non-blocking thread pool
print(future, ', %s s passed' % (int(time.time() - start)))
# accessing future.x blocks the main thread and returns future.result()
print(future.x, ', %s s passed' % (int(time.time() - start)))
# output:
# <NewFuture at 0x34b1d30 state=running> , 0 s passed
# test1 ok , 1 s passed
```
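
The same `.x` pattern extends to a whole batch of futures; a minimal sketch (the `fetch` function here is purely illustrative):

```python
from torequests import Async
import time


def fetch(n):
    time.sleep(n)
    return n * 2


async_fetch = Async(fetch)
# submit several calls at once; they run concurrently in the thread pool
futures = [async_fetch(1) for _ in range(5)]
# .x blocks on each future; with enough worker threads the whole batch
# finishes in roughly 1 second rather than 5
print([f.x for f in futures])  # [2, 2, 2, 2, 2]
```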
## 2. tPool - thread pool for async-requests

```python
from torequests.main import tPool
from torequests.logs import print_info
from torequests.utils import ttime

req = tPool()
test_url = 'http://p.3.cn'


def callback(task):
    return (len(task.content),
            print_info(
                ttime(task.task_start_time), '-> ', ttime(task.task_end_time),
                ',', round(task.task_cost_time * 1000), 'ms'))


ss = [req.get(test_url, retry=2, callback=callback) for i in range(3)]
# req.x waits for all pending tasks (equivalent to [i.x for i in ss])
req.x
# task.cx returns the callback result
ss = [i.cx for i in ss]
print_info(ss)
# [2020-01-21 16:56:23] temp_code.py(11): 2020-01-21 16:56:23 ->  2020-01-21 16:56:23 , 54 ms
# [2020-01-21 16:56:23] temp_code.py(11): 2020-01-21 16:56:23 ->  2020-01-21 16:56:23 , 55 ms
# [2020-01-21 16:56:23] temp_code.py(11): 2020-01-21 16:56:23 ->  2020-01-21 16:56:23 , 57 ms
# [2020-01-21 16:56:23] temp_code.py(18): [(612, None), (612, None), (612, None)]

```
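
When a request still fails after its retries, `task.x` is expected to hold the FailureException mentioned in the feature list rather than raise; a minimal sketch, assuming FailureException evaluates as falsy:

```python
from torequests.main import tPool

req = tPool()
# a port that should refuse connections, so the request is expected to fail
task = req.get('http://127.0.0.1:1', retry=1, timeout=2)
r = task.x
if not r:
    # r is a FailureException carrying the context of the failure
    print('request failed:', r)
else:
    print('ok:', len(r.content))
```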

### 2.1 Performance.

```verilog
[3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]]: 2000 / 2000, 100.0%, cost 4.2121 seconds, 475.0 qps.
```

```python
import timeit
from torequests import tPool
import sys

req = tPool()
start_time = timeit.default_timer()
oks = 0
total = 2000
# submit all the tasks concurrently
tasks = [req.get('http://127.0.0.1:8080') for num in range(total)]
for task in tasks:
    r = task.x
    if r.text == 'ok':
        oks += 1
end_time = timeit.default_timer()
succ_rate = oks * 100 / total
cost_time = round(end_time - start_time, 4)
version = sys.version
qps = round(total / cost_time, 0)
print(
    '[{version}]: {oks} / {total}, {succ_rate}%, cost {cost_time} seconds, {qps} qps.'
    .format(**vars()))
```

## 3. Requests - aiohttp-wrapper

```python
from torequests.dummy import Requests
from torequests.logs import print_info
from torequests.utils import ttime
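# frequencies: per-host throttling; here 'p.3.cn' appears limited to 2 requests per 1 second
# (note the ~1s and ~2s groups in the timings printed below)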
req = Requests(frequencies={'p.3.cn': (2, 1)})


def callback(task):
    return (len(task.content),
            print_info(
                ttime(task.task_start_time), '->', ttime(task.task_end_time),
                ',', round(task.task_cost_time * 1000), 'ms'))


ss = [
    req.get('http://p.3.cn', retry=1, timeout=5, callback=callback)
    for i in range(4)
]
req.x  # this line can be removed
ss = [i.cx for i in ss]
print_info(ss)
# [2020-01-21 18:15:33] temp_code.py(11): 2020-01-21 18:15:32 -> 2020-01-21 18:15:33 , 1060 ms
# [2020-01-21 18:15:33] temp_code.py(11): 2020-01-21 18:15:32 -> 2020-01-21 18:15:33 , 1061 ms
# [2020-01-21 18:15:34] temp_code.py(11): 2020-01-21 18:15:32 -> 2020-01-21 18:15:34 , 2081 ms
# [2020-01-21 18:15:34] temp_code.py(11): 2020-01-21 18:15:32 -> 2020-01-21 18:15:34 , 2081 ms
# [2020-01-21 18:15:34] temp_code.py(20): [(612, None), (612, None), (612, None), (612, None)]
```

### 3.1 Performance.

> aiohttp is almost 3 times faster than requests + ThreadPoolExecutor, even without uvloop on Windows 10.

```verilog
3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
sync_test: 2000 / 2000, 100.0%, cost 1.2965 seconds, 1543.0 qps.
async_test: 2000 / 2000, 100.0%, cost 1.2834 seconds, 1558.0 qps.

Sync usage costs little performance.
```

```python
import asyncio
import sys
import timeit

from torequests.dummy import Requests


def sync_test():
    with Requests() as req:
        start_time = timeit.default_timer()
        oks = 0
        total = 2000
        # submit all the tasks concurrently
        tasks = [req.get('http://127.0.0.1:8080') for num in range(total)]
        req.x
        for task in tasks:
            if task.text == 'ok':
                oks += 1
        end_time = timeit.default_timer()
        succ_rate = oks * 100 / total
        cost_time = round(end_time - start_time, 4)
        qps = round(total / cost_time, 0)
        print(
            f'sync_test: {oks} / {total}, {succ_rate}%, cost {cost_time} seconds, {qps} qps.'
        )


async def async_test():
    async with Requests() as req:
        start_time = timeit.default_timer()
        oks = 0
        total = 2000
        # submit all the tasks concurrently
        tasks = [req.get('http://127.0.0.1:8080') for num in range(total)]
        for task in tasks:
            r = await task
            if r.text == 'ok':
                oks += 1
        end_time = timeit.default_timer()
        succ_rate = oks * 100 / total
        cost_time = round(end_time - start_time, 4)
        qps = round(total / cost_time, 0)
        print(
            f'async_test: {oks} / {total}, {succ_rate}%, cost {cost_time} seconds, {qps} qps.'
        )


if __name__ == "__main__":
    print(sys.version)
    sync_test()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_test())

```

### 3.2 Using torequests.dummy.Requests in an async environment.

```python
import asyncio

import uvicorn
from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from torequests.dummy import Requests

api = Starlette()
api.req = Requests()


@api.route('/')
async def index(req):
    # await returns a Response, or a FailureException on failure
    r = await api.req.get('http://p.3.cn', timeout=(1, 1))
    return PlainTextResponse(r.content)


if __name__ == "__main__":
    uvicorn.run(api)

```

### 3.3 Using torequests.aiohttp_dummy.Requests for better performance.

> Removes the frequency_controller, sync usage (`task.x`), and requests-compatible args for better performance, but keeps retry / callback / referer_info.

```python
# -*- coding: utf-8 -*-

from torequests.aiohttp_dummy import Requests
from asyncio import get_event_loop


async def main():
    url = 'http://httpbin.org/json'
    # simple usage, catch_exception defaults to True
    async with Requests() as req:
        r = await req.get(url, retry=1, encoding='u8', timeout=3)
        print(r.ok)
        print(r.url)
        print(r.content[:5])
        print(r.status_code)
        print(r.text[:5])
        print(r.json()['slideshow']['title'])
        print(r.is_redirect)
        print(r.encoding)
    # or use `req = Requests()` without context
    req = Requests()
    r = await req.get(
        url,
        referer_info=123,
        callback=lambda r: r.referer_info + 1,
    )
    print(r)  # 124


get_event_loop().run_until_complete(main())

```


## 4. utils: some useful crawler toolkits

    ClipboardWatcher: watch for clipboard changes.
    Counts: a counter that increments each time it is called.
    Null: returns itself when called and always evaluates as False.
    Regex: a regex mapper for string -> regex -> object.
    Saver: a simple object-persistence toolkit using pickle/json.
    Timer: timing tool.
    UA: some common User-Agents for crawlers.
    curlparse: translate a curl command string into a dict of request arguments.
    md5: str(obj) -> md5_string.
    print_mem: show the process memory cost with psutil (for quick-and-lazy debugging only).
    ptime: %Y-%m-%d %H:%M:%S -> timestamp.
    ttime: timestamp -> %Y-%m-%d %H:%M:%S.
    slice_by_size: slice a sequence into chunks of a given size, returned as a generator.
    slice_into_pieces: slice a sequence into n pieces, returned as a generator.
    timeago: render seconds as a human-readable duration.
    unique: deduplicate a sequence.
    find_one: JavaScript-like regex search, indexable like [0], [1].
    find_jsons: find all JSON substrings (list / dict) inside a larger string.
    ...
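
A few of these in action, as a minimal sketch (exact output depends on your local timezone and library version):

```python
from torequests.utils import ttime, ptime, timeago, unique, md5

ts = ptime('2021-01-01 00:00:00')     # '%Y-%m-%d %H:%M:%S' -> timestamp (local time)
print(ttime(ts))                      # timestamp -> '2021-01-01 00:00:00'
print(timeago(3661))                  # seconds -> a human-readable duration string
print(list(unique([1, 2, 2, 3, 1])))  # -> [1, 2, 3]
print(md5('hello'))                   # md5 hex digest of str('hello')
```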


# Benchmarks
> Concurrency benchmarks are neither rigorous nor perfectly accurate; treat these numbers as a rough guide.

## Test Server: golang net/http

Source code: [go_test_server.go](https://github.com/ClericPy/torequests/blob/master/benchmarks/go_test_server.go)

## Client Testing Code

Source code: [py_test_client.py](https://github.com/ClericPy/torequests/blob/master/benchmarks/py_test_client.py)

## Test Result on Windows (without uvloop)

> Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz; 12 logical CPUs.

```verilog
Windows-10-10.0.18362-SP0
3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
['aiohttp(3.6.2)', 'torequests(4.9.14)', 'requests(2.23.0)', 'httpx(0.12.1)']
================================================================================
test_aiohttp             : 2000 / 2000 = 100.0%, cost 1.289s, 1551 qps, 100% standard.
test_dummy               : 2000 / 2000 = 100.0%, cost 1.397s, 1432 qps,  92% standard.
test_aiohttp_dummy       : 2000 / 2000 = 100.0%, cost 1.341s, 1491 qps,  96% standard.
test_httpx               : 2000 / 2000 = 100.0%, cost 4.065s,  492 qps,  32% standard.
test_tPool               : 2000 / 2000 = 100.0%, cost 5.161s,  388 qps,  25% standard.
```

## Test Result on Linux (with uvloop)

> Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz; 1 logical CPU.

```verilog
Linux-4.15.0-13-generic-x86_64-with-Ubuntu-18.04-bionic
3.7.3 (default, Apr  3 2019, 19:16:38)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
['aiohttp(3.6.2)', 'torequests(4.9.14)', 'requests(2.23.0)', 'httpx(0.12.1)']
================================================================================
test_aiohttp             : 2000 / 2000 = 100.0%, cost 0.652s, 3068 qps, 100% standard.
test_dummy               : 2000 / 2000 = 100.0%, cost 0.802s, 2495 qps,  81% standard.
test_aiohttp_dummy       : 2000 / 2000 = 100.0%, cost 0.726s, 2754 qps,  90% standard.
test_httpx               : 2000 / 2000 = 100.0%, cost 2.349s,  852 qps,  28% standard.
test_tPool               : 2000 / 2000 = 100.0%, cost 2.708s,  739 qps,  24% standard.
```

## Conclusion

1. **aiohttp** is the fastest, thanks to its Cython-accelerated internals.
   1. aiohttp reaches 3068 qps on 1-CPU Linux with uvloop, close to golang's ~3300.
2. **torequests.dummy.Requests** is based on **aiohttp**.
   1. about **8%** performance lost **without** uvloop.
   2. about **20%** performance lost with uvloop.
3. **torequests.aiohttp_dummy.Requests** is based on **aiohttp**.
   1. less than **4%** performance lost **without** uvloop.
   2. less than **10%** performance lost with uvloop.
4. **httpx** is faster than **requests + threading**, though the gap is smaller on Linux.

### PS:

Performance of **golang net/http**:

```
`2000 / 2000, 100.00 %, cost 0.26 seconds, 7567.38 qps` on windows (12 cpus)
`2000 / 2000, 100.00 %, cost 0.61 seconds, 3302.48 qps` on linux (1 cpu)
   1. Linux is slower here because golang benefits from more CPUs
   2. the Linux host has 1 CPU, while the Windows host has 12
```

**golang http client testing code:**

[go_test_client.go](https://github.com/ClericPy/torequests/blob/master/benchmarks/go_test_client.go)

# Documentation
> [Document & Usage](https://torequests.readthedocs.io/en/latest/)

# License
> [MIT license](LICENSE)



            
