webscrapex


Name: webscrapex
Version: 0.0.8
home_page: https://github.com/ILisa250/WebScrapeX
Summary: A unified collection of web data premises eg Apartment for sale, apartment for rent, house for sale, house for rent
upload_time: 2023-06-23 07:10:51
docs_url: None
author: Lisa Yvette INYANGE
requires_python: >=3,<4
license: MIT
keywords: webscrapex, web-data-extraction, web-scraping, web data collection, web data integration, web data aggregation, automated data extraction, lisa yvette inyange
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

A comprehensive web scraping pipeline for extracting and storing real estate data

### Purpose of the package
+ The primary objective of this package is to provide an efficient solution for web scraping tasks. Its core functionality covers link extraction, data extraction, data cleaning, and storage of the data in a database.

### Features
    - Date                      - Bathrooms  
    - Build Year                - Car Parking
    - Floors                    - Ancillary
    - Sitting Rooms             - LandSize
    - Dining Rooms              - Price
    - Bedrooms                  - District
    - Wardrobes                 - Sector
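To make the data-cleaning step concrete, here is a minimal sketch of how a few of the fields listed above might be normalised. It is not the package's own code: the helper names `clean_price` and `clean_record` and the sample values are hypothetical, and the price cleaner assumes whole-number prices with no decimal part.

```python
import re


def clean_price(raw):
    """Hypothetical cleaner: keep only the digits of a price string.

    Returns an int, or None when the string contains no digits.
    Assumes whole-number prices (no decimal part).
    """
    digits = re.sub(r"[^\d]", "", raw)
    return int(digits) if digits else None


def clean_record(raw):
    """Trim whitespace from every value and normalise the Price field."""
    cleaned = {key: value.strip() for key, value in raw.items()}
    cleaned["Price"] = clean_price(cleaned.get("Price", ""))
    return cleaned


# Made-up sample values, only to show the shape of a record.
raw_listing = {"Bedrooms": " 3 ", "District": "Gasabo ", "Price": "Rwf 250,000"}
print(clean_record(raw_listing))
```

Keeping each cleaner as a small function per field makes it easy to test one field at a time before the record is stored.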


#### Installation
To install the package, run the following command (prefix it with `!` when running inside a Jupyter notebook cell):
``` bash
pip install WebScrapeX
```

#### Contribution
Contributions are welcome. If you encounter any bugs or have suggestions for improvements, please email inyangel@yahoo.com. Thanks!

#### Author
 + This package was developed by Lisa Yvette INYANGE (https://github.com/ILisa250) 

#### License
The package is released under the MIT license. (https://choosealicense.com/licenses/mit/)

#### Dependencies
The package has the following dependencies:

 + Python Decouple (`python-decouple`): used for managing settings and configuration.
 + Python Dotenv (`python-dotenv`): used for loading environment variables from a `.env` file.

#### Scraping URLs

The package supports scraping the following types of real estate listings from the Imali.biz website:

    Apartment for Sale: https://imali.biz/category/1/125/search?pg=
    Apartment for Rent: https://imali.biz/category/0/91/search?pg=
    House for Rent: https://imali.biz/category/0/27/search?pg=
    House for Sale: https://imali.biz/category/0/24/search?pg=
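Each base URL above ends in `pg=`, which suggests that a page number is appended to it. The sketch below builds page URLs under that assumption; the helper `page_urls`, the `BASE_URLS` dictionary, and the idea that pages start at 1 are illustrative guesses rather than documented behaviour of the package or the site.

```python
# Base URLs copied from the list above, keyed by the labels used there.
BASE_URLS = {
    "Apartment for Sale": "https://imali.biz/category/1/125/search?pg=",
    "Apartment for Rent": "https://imali.biz/category/0/91/search?pg=",
    "House for Rent": "https://imali.biz/category/0/27/search?pg=",
    "House for Sale": "https://imali.biz/category/0/24/search?pg=",
}


def page_urls(base_url, first_page=1, last_page=3):
    """Hypothetical helper.

    Assumption: the page number is appended directly after 'pg='.
    """
    return [f"{base_url}{page}" for page in range(first_page, last_page + 1)]


for url in page_urls(BASE_URLS["House for Rent"], last_page=2):
    print(url)
```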

#### Usage example
Here's an example of how to use the WebScrapeX package to scrape, clean, and save real estate data:
``` python
import os

from WebScrapeX import scrape_clean_save_data

# Absolute path to the .env file that holds your settings
env_path = os.path.abspath('.env')

# Specify the link of the real estate type to scrape
url = "https://imali.biz/category/1/125/search?pg="

# Specify the name of the file to save the data (in lowercase);
# it must be one of the names listed in the note below
file_name = "apartment_for_sale"

# Scrape, clean, and save the data
scrape_clean_save_data(url, file_name, env_path)
```

##### Note: The file name must be one of "house_sale", "house_for_rent", "apartment_for_sale", or "apartment_for_rent".
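Since only four file names are accepted, it may help to check the name before starting a scrape. The sketch below is a hypothetical helper, not part of WebScrapeX; the allowed names are copied from the note above, and lowercasing the input is an assumption based on the "in lowercase" comment in the usage example.

```python
# Names taken from the note above; the helper itself is hypothetical.
ALLOWED_FILE_NAMES = (
    "house_sale",
    "house_for_rent",
    "apartment_for_sale",
    "apartment_for_rent",
)


def check_file_name(file_name):
    """Return the normalised file name, or raise ValueError if it is not allowed."""
    normalised = file_name.strip().lower()
    if normalised not in ALLOWED_FILE_NAMES:
        raise ValueError(
            f"file_name must be one of {ALLOWED_FILE_NAMES}, got {file_name!r}"
        )
    return normalised


print(check_file_name("Apartment_For_Sale"))
```

Failing early with a clear message avoids running a full scrape only to have the save step reject the name.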

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ILisa250/WebScrapeX",
    "name": "webscrapex",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3,<4",
    "maintainer_email": "",
    "keywords": "WebScrapeX,web-data-extraction,web-scraping,web data collection,web data integration,web data aggregation,automated data extraction,Lisa Yvette INYANGE",
    "author": "Lisa Yvette INYANGE",
    "author_email": "linyange@andrew.cmu.edu",
    "download_url": "https://files.pythonhosted.org/packages/82/7c/81fe383b753ed8e158a561dc943538315656ff738888c282f150e9773058/webscrapex-0.0.8.tar.gz",
    "platform": null,
    "description": "A comprehensive web scraping pipeline for extracting and storing real estate data\n\n### Purpose of the package\n+ The primary objective of this package is to provide an efficient solution for web scraping tasks. It has essentially functionalities including link extraction, data extraction, data cleaning, and data storage to database.\n\n### Features\n    - Date                      - Bathrooms  \n    - Build Year                - Car Parking\n    - Floors                    - Ancillary\n    - Sitting Rooms             - LandSize\n    - Dining Rooms              - Price\n    - Bedrooms                  - District\n    - Wardrobes                 - Sector\n\n\n#### Installation\nTo install the package, run the following command:\n``` bash\n!pip install WebScrapeX\n```\n\n#### Contribution\nContributions are welcome. If you encounter any bugs or have suggestions for improvements, please let me know at inyangel@yahoo.com. Thanks\n\n#### Author\n + This package was developed by Lisa Yvette INYANGE (https://github.com/ILisa250) \n\n#### License\nThe package is released under the MIT license. 
(https://choosealicense.com/licenses/mit/)\n\n#### Dependencies\nThe package has the following dependencies:\n\n + Python Decouple: Used for managing settings and configuration.\n + Python Dotenv: Used for loading environment variables from a .env file\n\n#### Scraping URLs\n\nThe package supports scraping the following types of real estate listings from the Imali.biz website:\n\n    Apartment for Sale: https://imali.biz/category/1/125/search?pg=\n    Apartment for Rent: https://imali.biz/category/0/91/search?pg=\n    House for Rent: https://imali.biz/category/0/27/search?pg=\n    House for Sale: https://imali.biz/category/0/24/search?pg=\n\n#### Usage example\nHere's an example of how to use the WebScrapeX package to scrape, clean, and save real estate data:\n``` bash\nfrom WebScrapeX import scrape_clean_save_data\nimport os \n\nenv_path = os.path.abspath('.env')\n# Specify the link of the real estate type to scrape\nurl = \"https://imali.biz/category/1/125/search?pg=\"\n\n# Specify the name of the file to save the data (in lowercase)\nfile_name = \"real_estate_data.csv\"\n\n# Scrape, clean, and save the data\nscrape_clean_save_data(url, file_name, env_path)\n```\n\n##### Note: File name should be either \"house_sale\" or \"house_for_rent\" or \"apartment_for_sale\" or \"apartment_for_rent\".\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A unified collection of web data premises eg Apartment for sale, apartment for rent, house for sale, house for rent",
    "version": "0.0.8",
    "project_urls": {
        "Homepage": "https://github.com/ILisa250/WebScrapeX",
        "Repository": "https://github.com/ILisa250/WebScrapeX"
    },
    "split_keywords": [
        "webscrapex",
        "web-data-extraction",
        "web-scraping",
        "web data collection",
        "web data integration",
        "web data aggregation",
        "automated data extraction",
        "lisa yvette inyange"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "16ae6ed4b1b422f9b0bee113345189c214a85f2bad05fa936c1766514d0dfc06",
                "md5": "5c96122464e31ab2aa30f8483cfed2d7",
                "sha256": "86c8db039f34c971d7594dcbc97a272b07be1ea289f4c2ac21c15080b0a84601"
            },
            "downloads": -1,
            "filename": "webscrapex-0.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c96122464e31ab2aa30f8483cfed2d7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3,<4",
            "size": 6884,
            "upload_time": "2023-06-23T07:10:49",
            "upload_time_iso_8601": "2023-06-23T07:10:49.094505Z",
            "url": "https://files.pythonhosted.org/packages/16/ae/6ed4b1b422f9b0bee113345189c214a85f2bad05fa936c1766514d0dfc06/webscrapex-0.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "827c81fe383b753ed8e158a561dc943538315656ff738888c282f150e9773058",
                "md5": "4d95e71073ac7d7ac1068d353ad99b0b",
                "sha256": "8f25dba53c2c64937ce71f41a3a1883a8018ebdd1cb57c6740f718566fb55141"
            },
            "downloads": -1,
            "filename": "webscrapex-0.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "4d95e71073ac7d7ac1068d353ad99b0b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3,<4",
            "size": 6480,
            "upload_time": "2023-06-23T07:10:51",
            "upload_time_iso_8601": "2023-06-23T07:10:51.593403Z",
            "url": "https://files.pythonhosted.org/packages/82/7c/81fe383b753ed8e158a561dc943538315656ff738888c282f150e9773058/webscrapex-0.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-23 07:10:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ILisa250",
    "github_project": "WebScrapeX",
    "github_not_found": true,
    "lcname": "webscrapex"
}
        