profilescout

Name: profilescout
Version: 0.3.2.post1
Summary: Profile Scout is a kit that uses crawling and machine learning to identify profile pages on any website, simplifying the process of extracting user profiles, gathering information, and performing targeted actions.
Upload time: 2023-08-12 16:30:57
Requires Python: >=3.7
Keywords: analysis, data_collection, data_extraction, detection, profile, profile_page, scraping
Project: https://github.com/todorovicsrdjan/profilescout
Requirements: beautifulsoup4==4.11.1, html2text==2020.1.16, numpy==1.23.5, phonenumbers==8.13.18, Pillow>=9.4.0, selenium==4.11.2, tensorflow==2.13.0, tldextract==3.2.0
# Profile Scout

![License](https://img.shields.io/github/license/todorovicsrdjan/profilescout)
![Last commit](https://img.shields.io/github/last-commit/todorovicsrdjan/profilescout/master)
![Repo size](https://img.shields.io/github/repo-size/todorovicsrdjan/profilescout)
[![PyPI - Version](https://img.shields.io/pypi/v/profilescout.svg)](https://pypi.org/project/profilescout)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/profilescout.svg)](https://pypi.org/project/profilescout)

**Table of Contents**
* [About](#about)
* [Capabilities](#capabilities)
* [Common Use Cases](#common-use-cases)
  * [Scraping](#scraping)
  * [Profile related tasks](#profile-related-tasks)
  * [Information extraction](#information-extraction)
* [Installation](#installation)
  * [PyPI](#pypi)
  * [Source](#source)
    * [Host](#host)
    * [Docker container](#docker-container)
* [Possibilities for future improvements](#possibilities-for-future-improvements)
* [Contributing](#contributing)

# About

**Profile Scout** is a versatile Python package that detects and scrapes profile pages on any given website and supports information extraction. Starting from a provided URL, it crawls the site and uses machine learning to identify the URLs of profile pages, making it straightforward to extract user profiles, gather information, and perform targeted actions on those pages. By simplifying the process of locating and accessing profile pages, it is a useful tool for data collection, web scraping, and analysis tasks, and its information extraction support lets users pull specific data from profile pages efficiently.

Profile Scout can be useful to: 
1. Investigators and [OSINT Specialists](https://en.wikipedia.org/wiki/Open-source_intelligence) (information extraction, creating information graphs, ...)
2. [Penetration Testers](https://en.wikipedia.org/wiki/Penetration_test) and Ethical Hackers/[Social Engineers](https://en.wikipedia.org/wiki/Social_engineering_(security)) (information extraction, reconnaissance, profile building)
3. Scientists and researchers (data engineering, data science, social science, research)
4. Companies (talent research, marketing, contact acquisition/harvesting)
5. Organizations (contact acquisition/harvesting, data collecting, database updating)

# Capabilities

Profile Scout is primarily a crawler. For a given URL, it crawls the site and performs the selected action.
If a file with URLs is provided, each URL is processed in a separate thread.

Main features:
1. Flexible and controlled page scraping (HTML, page screenshot, or both)
2. Detecting and scraping profile pages during the crawling process
3. Locating the collective page from which all profile pages originate
4. Information extraction from HTML files

Options:
```
-h, --help            
    show this help message and exit
    
--url URL             
    URL of the website to crawl
    
-f URLS_FILE_PATH, --file URLS_FILE_PATH
    Path to the file with URLs of the websites to crawl
    
-D DIRECTORY, --directory DIRECTORY
    Extract data from HTML files in the directory. To avoid saving output, set '-ep'/'--export-path' to ''

-v, --version
    print current version of the program

-a {scrape_pages,scrape_profiles,find_origin}, --action {scrape_pages,scrape_profiles,find_origin}
    Action to perform at a time of visiting the page (default: scrape_pages)
    
-b, --buffer          
    Buffer errors and outputs until crawling of website is finished and then create logs
    
-br, --bump-relevant  
    Bump relevant links to the top of the visiting queue (based on RELEVANT_WORDS list)
    
-ep EXPORT_PATH, --export-path EXPORT_PATH
    Path to destination directory for exporting
    
-ic {scooby}, --image-classifier {scooby}
    Image classifier to be used for identifying profile pages (default: scooby)
    
-cs CRAWL_SLEEP, --crawl-sleep CRAWL_SLEEP
    Time to sleep between each page visit (default: 2)
    
-d DEPTH, --depth DEPTH
    Maximum crawl depth (default: 2)
    
-if, --include-fragment
    Consider links with URI Fragment (e.g. http://example.com/some#fragment) as separate page
    
-ol OUT_LOG_PATH, --output-log-path OUT_LOG_PATH
    Path to output log file. Ignored if '-f'/'--file' is used
    
-el ERR_LOG_PATH, --error-log-path ERR_LOG_PATH
    Path to error log file. Ignored if '-f'/'--file' is used
    
-so {all,html,screenshot}, --scrape-option {all,html,screenshot}
    Data to be scraped (default: all)
                
-t MAX_THREADS, --threads MAX_THREADS
    Maximum number of threads to use if '-f'/'--file' is provided (default: 4)
    
-mp MAX_PAGES, --max-pages MAX_PAGES
    Maximum number of pages to scrape. A page is considered scraped if the action is performed successfully (default: unlimited)
    
-p, --preserve        
    Preserve whole URI (e.g. 'http://example.com/something/' instead of 'http://example.com/')

-r RESOLUTION, --resolution RESOLUTION
    Resolution of headless browser and output images. Format: WIDTHxHEIGHT (default: 2880x1620)

Full input line format is: '[DEPTH [CRAWL_SLEEP]] URL'

DEPTH and CRAWL_SLEEP are optional; if a single number is present, it is considered to be DEPTH.
For example, "3 https://example.com" means that the URL should be crawled to a depth of 3.

If any of these fields (DEPTH or CRAWL_SLEEP) are present in a line, the corresponding command-line argument is ignored.

Writing too much to the storage drive can reduce its lifespan. To mitigate this, if there are more than
30 links, informational and error messages are buffered and written at the end of
the crawling process.

RELEVANT_WORDS=['profile', 'user', 'users', 'about-us', 'team', 'employees', 'staff', 'professor', 
                'profil', 'o-nama', 'zaposlen', 'nastavnik', 'nastavnici', 'saradnici', 'profesor', 'osoblje', 
                'запослен', 'наставник', 'наставници', 'сарадници', 'професор', 'особље']
```
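
For reference, a file passed via `-f` (e.g. `links.txt`) follows the input line format described above. A minimal example with placeholder URLs might look like this, where the first line crawls to a depth of 3, the second uses a depth of 2 with a crawl sleep of 5, and the third relies on the command-line defaults:
```
3 https://example.com
2 5 https://example.org
https://example.net
```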

# Common Use Cases

Note: Order of arguments/switches doesn't matter

## Scraping

Scrape the URL up to a depth of 2 (`-d`) or a maximum of 300 scraped pages (`-mp`), 
whichever comes first. Store the scraped data in `/data` (`-ep`)
```Bash
profilescout --url https://example.com -d 2 -mp 300 -ep /data
```

Scrape HTML (`-so html`) for every page up to a depth of 2 for each URL in the list (`-f`). 
The number of threads to use is set with `-t`
```Bash
profilescout -ep /data -t `nproc` -f links.txt -d 2 -so html
```

Start scraping screenshots from a specific page (`-p`). It is important to note that
without `-p`, the program would ignore the full path, specifically the `/about-us/meet-the-team/` part
```Bash
profilescout -p --url https://www.wowt.com/about-us/meet-the-team/ -mp 4 -so screenshot
```

Scrape each website in the URLs list and postpone writing to the storage disk (by using a buffer, `-b`)
```Bash
profilescout -b -t `nproc` -f links.txt -d 0 -ep /data
```
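
When crawling a single URL, the documented `-ol`/`--output-log-path` and `-el`/`--error-log-path` options can also be added to any of the examples above to write logs to files (they are ignored when `-f` is used; the paths here are illustrative)
```Bash
profilescout --url https://example.com -d 1 -ep /data -ol out.log -el err.log
```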

## Profile related tasks

Scrape profile pages (`-a scrape_profiles`) and prioritize links that are relevant to a specific domain (`-br`). 
For example, when searching for professors' profile pages, we would give priority to links that 
contain related terms which could lead to a profile page. Note: the list of relevant words can be changed in 
[constants.py](./profilescout/common/constants.py#34)
```Bash
profilescout -br -t `nproc` -f links.txt -a scrape_profiles -mp 30
```

Find and screenshot profiles, store them as 600x400 (`-r`) images, and wait (`-cs`) 30 seconds between page visits
```Bash
profilescout -br -t `nproc` -f links.txt -a scrape_profiles -mp 1000 -d 3 -cs 30 -r 600x400
```

Locate the origin page of the profile pages (`-a find_origin`) with the classifier called `scooby` (`-ic scooby`).
Note that visited pages are logged, so this can also be used for something like scanning the website
```Bash
profilescout -t `nproc` -f links.txt -a find_origin -ic scooby
```

## Information extraction

Extract information (`-D`) from the profile HTML files located in `/data` and store the results in `~/results` (`-ep`)
```Bash
profilescout -D /data -ep ~/results
```
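
As noted in the options above, setting the export path to an empty string skips saving the output, which can serve as a dry run
```Bash
profilescout -D /data -ep ''
```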


# Installation

## PyPI

```Bash
pip3 install profilescout
```
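
Once installed, a quick sanity check is to print the program version with the documented `-v` flag
```Bash
profilescout -v
```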

## Source

### Host

1. Create virtual environment (optional, but recommended)
```Bash
python3 -m venv /path/to/some/dir
```

2. Activate virtual environment (skip if you skipped the first step)
```Bash 
source /path/to/some/dir/bin/activate
```

3. Install requirements
```Bash
pip3 install -r requirements.txt
```

4. Install package locally
```Bash
pip3 install -e .
```

5. Explore
```Bash
profilescout -h
```

### Docker container

1. Build the image and run the container. Execute this in the project's directory
```Bash
mkdir "/path/to/screenshot/dir/"            # if it does not exist
# this line may differ depending on your shell, 
# so check the documentation for the equivalent file to .bashrc
echo 'export SS_EXPORT_PATH="/path/to/screenshot/dir/"' >> ~/.bashrc
docker build -t profilescout .
docker run -it -v "$SS_EXPORT_PATH":/data profilescout
```

Add `--rm` if you want it to be disposable (one-time task)
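
For a one-off run, the same command with `--rm` would look like this
```Bash
docker run --rm -it -v "$SS_EXPORT_PATH":/data profilescout
```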

2. Test the deployment (inside the Docker container)
```Bash
profilescout -mp 4 -t 1 -ep '/data' -p --url https://en.wikipedia.org/wiki/GNU
```

# Possibilities for future improvements

* Classification
  * Profile classification based on existing data (without crawling)
  * Classification using HTML and images, as well as the selection of appropriate classifiers
* Scraping
  * Intelligent downloading of files through links available on the profile page
* Crawling
  * Support for scraping using proxies
* Crawling actions
  * Ability to provide custom actions
  * Actions before and after page loading
  * Multiple actions for each stage of page processing (before, during, and after access)
* Crawling strategy
  * Ability to provide custom heuristics
  * Ability to choose crawling strategy (link filters, etc.)
  * Support for deeper link bump
  * Selection of relevant words using CLI
* Usability
  * Saving progress and the ability to resume
  * Increased automation (if the profile is not found at depth DEPTH, increase the depth and continue)
* Extraction
  * Support for national numbers, e.g. `011/123-4567`
  * Experiment with lightweight LLMs
  * Experiment with Key-Value extraction and Layout techniques like [LayoutLM](https://arxiv.org/abs/1912.13318)

# Contributing

If you discover a bug or have a feature idea, feel free to open an issue or PR.  
Any improvements or suggestions are welcome!
            
