gimagedownload


| Field | Value |
| --- | --- |
| Name | gimagedownload |
| Version | 0.10 |
| Home page | https://github.com/hansalemaos/gimagedownload |
| Summary | Download Google Image search results with free proxies |
| Upload time | 2023-01-23 23:54:21 |
| Author | Johannes Fischer |
| License | MIT |
| Keywords | download, scrape, site, google, images |
| Requirements | a_pandas_ex_apply_ignore_exceptions, get_consecutive_filename, list_all_files_recursively, pandas, Pillow, requests, site2hdd, tolerant_isinstance, url_analyzer |
            
# Download Google Image search results with free proxies



```shell
pip install gimagedownload
```



## First step

Create a proxy pkl file using [freeproxydownloader](https://pypi.org/project/freeproxydownloader/).

If you installed [site2hdd](https://pypi.org/project/site2hdd/) before 2023-01-24, you need to update it:

`pip install --upgrade site2hdd`

or

`pip install --upgrade gimagedownload`



```python
from site2hdd import get_proxies

xlsxfile, pklfile = get_proxies(
    # The path doesn't have to exist; it will be created. The last part
    # ("proxy") is the file name; .pkl and .xlsx will be appended.
    # Important: two files are written, in this case
    # c:\\newfilepath\\myproxiefile\\proxy.pkl and c:\\newfilepath\\myproxiefile\\proxy.xlsx
    save_path_proxies_all_filtered='c:\\newfilepath\\myproxiefile\\proxy',
    http_check_timeout=4,  # a proxy that can't connect to Wikipedia within 4 seconds is invalid
    threads_httpcheck=50,  # threads used to check whether the HTTP connection works
    threads_ping=100,  # before the HTTP test, a ping test checks whether the server exists
    silent=False,  # show results when a working server has been found
    max_proxies_to_check=20000,  # stop the search at 20000
)
```
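According to the comments in the example above, `.pkl` and `.xlsx` are appended to the base path. The following is a minimal sketch for checking that both files exist before moving on. The helper names `expected_proxy_files` and `missing_proxy_files` are hypothetical and not part of site2hdd, and the sketch assumes the extensions are simply appended to the base path.

```python
import os


def expected_proxy_files(base_path):
    """Return the (pkl, xlsx) paths derived from the base path.

    Assumption: the extensions are appended to the base path unchanged,
    as the comments in the get_proxies example describe.
    """
    return base_path + ".pkl", base_path + ".xlsx"


def missing_proxy_files(base_path):
    """Return the expected proxy files that are not on disk yet."""
    return [
        path
        for path in expected_proxy_files(base_path)
        if not os.path.isfile(path)
    ]


if __name__ == "__main__":
    base = "c:\\newfilepath\\myproxiefile\\proxy"
    for path in missing_proxy_files(base):
        print("missing:", path)
```

An empty list from `missing_proxy_files` means both files are in place and the pkl path can be passed on to the downloader.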



## CLI



You can download the pictures from the command line:

```shell
python "C:\Users\Gamer\anaconda3\envs\dfdir\Lib\site-packages\gimagedownload\__init__.py" -p c:\newfilepath\myproxiefile\proxy.pkl -s house,elephant,lion -d f:\googleimgdownload -v 3 -t 50 -r 7 -q 9
```

Underscores in search terms are converted to spaces:

```shell
python "C:\Users\Gamer\anaconda3\envs\dfdir\googleimgs.py" -p c:\newfilepath\myproxiefile\proxy.pkl -s brazilian_food,ferrari,Los_Angeles -d f:\googleimgdownload -v 3 -t 50 -r 7 -q 9
```
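To illustrate how a comma-separated `-s` value maps to search terms, here is a small sketch. The function `parse_search_terms` is a hypothetical name; how the package performs this conversion internally is an assumption, and only the underscore-to-space behaviour comes from the text above.

```python
def parse_search_terms(raw):
    """Split a comma-separated -s value and turn underscores into spaces."""
    terms = []
    for part in raw.split(","):
        term = part.strip().replace("_", " ")
        if term:  # skip empty entries, e.g. from a trailing comma
            terms.append(term)
    return terms


print(parse_search_terms("brazilian_food,ferrari,Los_Angeles"))
# ['brazilian food', 'ferrari', 'Los Angeles']
```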



## CLI Arguments



```text
-p / --proxy_pickle_file
    The proxy pkl file created with freeproxydownloader
    (pip install freeproxydownloader).

-s / --search
    Search terms separated by commas: house,elephant,lion

-d / --download_folder
    Download folder; it will be created if it doesn't exist.
    default: os.path.join(os.getcwd(), "GOOGLE_IMAGE_DOWNLOADS")

-v / --variations
    Grab links from slightly different search terms, each with a different
    proxy. Only uppercase/lowercase variations are used: dog/DoG/Dog/doG.
    Don't set this too high: it helps to get more results, but it can get
    very slow. 3 or 4 is a good start.
    default: 3

-t / --threads
    Number of request threads.
    default: 50

-r / --requests_timeout
    Timeout in seconds for requests.
    default: 7

-q / --thread_timeout
    Timeout in seconds for a running thread. It should be higher than the
    requests timeout to avoid problems.
    default: 9
```
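The options above can be mirrored with `argparse`, for example to wrap the downloader in your own script. This is a sketch, not the package's actual parser: making `-p` and `-s` required, using integer timeouts, and printing a warning when the thread timeout is not higher than the requests timeout are all assumptions. `build_parser` and `parse_cli` are hypothetical names.

```python
import argparse
import os


def build_parser():
    """Build a parser with the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("-p", "--proxy_pickle_file", required=True)
    parser.add_argument("-s", "--search", required=True)
    parser.add_argument(
        "-d",
        "--download_folder",
        default=os.path.join(os.getcwd(), "GOOGLE_IMAGE_DOWNLOADS"),
    )
    parser.add_argument("-v", "--variations", type=int, default=3)
    parser.add_argument("-t", "--threads", type=int, default=50)
    parser.add_argument("-r", "--requests_timeout", type=int, default=7)
    parser.add_argument("-q", "--thread_timeout", type=int, default=9)
    return parser


def parse_cli(argv):
    """Parse argv and warn if the timeouts contradict the documented advice."""
    args = build_parser().parse_args(argv)
    if args.thread_timeout <= args.requests_timeout:
        print("warning: thread_timeout should be higher than requests_timeout")
    return args


args = parse_cli(["-p", "proxy.pkl", "-s", "house,elephant,lion"])
print(args.variations, args.threads, args.requests_timeout, args.thread_timeout)
# 3 50 7 9
```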



## Import the function



```python
from gimagedownload import start_image_download

start_image_download(
    ProxyPickleFile=r'c:\newfilepath\myproxiefilexxx\proxy.pkl',  # pip install freeproxydownloader
    search_terms=['halloween'],
    download_folder=r'f:\googleimgdownload',
    search_variations=3,
    threadlimit=50,
    RequestsTimeout=7,
    ThreadTimeout=9,
)
```
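The `search_variations` value corresponds to the `-v` option, which is described as using only uppercase/lowercase variations of a term (dog/DoG/Dog/doG). The sketch below shows one way such variations could be generated. It is an illustration under that description, not the package's implementation; `case_variations` and its `seed` parameter are hypothetical, and it is meant for plain words with `count` of at least 1.

```python
import random


def case_variations(term, count, seed=None):
    """Return up to `count` distinct casings of `term`, the original first."""
    rng = random.Random(seed)
    variants = [term]
    seen = {term}
    # Each cased letter doubles the number of possible casings, so cap the
    # target to avoid looping forever on short or letter-free terms.
    cased_letters = sum(1 for ch in term if ch.lower() != ch.upper())
    limit = min(count, 2 ** cased_letters)
    while len(variants) < limit:
        candidate = "".join(
            ch.upper() if rng.random() < 0.5 else ch.lower() for ch in term
        )
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants


print(case_variations("dog", 4))
```

A three-letter word has only eight possible casings, which is one reason small values such as 3 or 4 are a sensible starting point.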



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/dl.png" alt="">



### Results "brazilian_food"



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/brazilian_food.png" alt="">



### Results "elephant"



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/elephant.png" alt="">



### Results "ferrari"



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/ferrari.png" alt="">



### Results "house"



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/house.png" alt="">



### Results "lion"



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/lion.png" alt="">



### Results "los_angeles"



<img title="" src="https://github.com/hansalemaos/screenshots/raw/main/gimages/los_angeles.png" alt="">


            
