# homeharvest

- **Name**: homeharvest
- **Version**: 0.4.12
- **Summary**: Real estate scraping library
- **Author**: Zachary Hampton
- **Requires Python**: >=3.9, <3.13
- **Uploaded**: 2025-07-15 00:09:18
- **Homepage**: https://github.com/Bunsly/HomeHarvest

<img src="https://github.com/ZacharyHampton/HomeHarvest/assets/78247585/d1a2bf8b-09f5-4c57-b33a-0ada8a34f12d" width="400">

**HomeHarvest** is a real estate scraping library that extracts and formats data in the style of MLS listings.

## HomeHarvest Features

- **Source**: Fetches properties directly from **Realtor.com**.
- **Data Format**: Structures data to resemble MLS listings.
- **Export Flexibility**: Options to save as either CSV or Excel.

![homeharvest](https://github.com/ZacharyHampton/HomeHarvest/assets/78247585/b3d5d727-e67b-4a9f-85d8-1e65fd18620a)

## Installation

```bash
pip install -U homeharvest
```
  _Python version >= [3.9](https://www.python.org/downloads/release/python-390/) and < 3.13 required_
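The package metadata declares `requires_python` as `<3.13,>=3.9`, so checking the interpreter before installing can save a confusing pip error. The snippet below is a minimal sketch using only the standard library; `is_supported_python` is a hypothetical helper, not part of HomeHarvest, and the bounds are copied from the 0.4.12 metadata, so other releases may declare a different range.

```python
import sys

# Bounds copied from homeharvest 0.4.12's metadata: requires_python "<3.13,>=3.9".
# Assumption: other releases may declare a different range.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported_python(version=None):
    """Return True if (major, minor) falls within [3.9, 3.13)."""
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    running = f"{sys.version_info[0]}.{sys.version_info[1]}"
    verdict = "within" if is_supported_python() else "outside"
    print(f"Python {running} is {verdict} the declared range >=3.9,<3.13")
```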

## Usage

### Python

```py
from homeharvest import scrape_property
from datetime import datetime

# Generate filename based on current timestamp
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"HomeHarvest_{current_timestamp}.csv"

properties = scrape_property(
  location="San Diego, CA",
  listing_type="sold",  # or (for_sale, for_rent, pending)
  past_days=30,  # 'sold': sold in the last 30 days; 'for_sale'/'for_rent': listed in the last 30 days

  # property_type=['single_family', 'multi_family'],
  # date_from="2023-05-01",  # alternative to past_days
  # date_to="2023-05-28",
  # foreclosure=True,
  # mls_only=True,  # only fetch MLS listings
)
print(f"Number of properties: {len(properties)}")

# Export to csv
properties.to_csv(filename, index=False)
print(properties.head())
```
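The commented-out `date_from`/`date_to` arguments above are an alternative to `past_days`. If you prefer explicit dates, a window ending today can be computed with the standard library. This is a sketch: `past_days_to_range` is a hypothetical helper, and whether it selects exactly the same properties as `past_days` depends on how the library treats the boundary dates, which this README does not specify.

```python
from datetime import date, timedelta


def past_days_to_range(past_days, today=None):
    """Return (date_from, date_to) strings in YYYY-MM-DD format.

    The window runs from `past_days` days before `today` up to `today`.
    Hypothetical helper: matching the library's own `past_days`
    boundaries exactly is an assumption.
    """
    if past_days < 1:
        raise ValueError("past_days must be a positive integer")
    end = today or date.today()
    start = end - timedelta(days=past_days)
    # date.isoformat() always yields zero-padded YYYY-MM-DD
    return start.isoformat(), end.isoformat()


date_from, date_to = past_days_to_range(30, today=date(2023, 5, 31))
print(date_from, date_to)  # 2023-05-01 2023-05-31
```

The two strings could then be passed as `date_from=date_from, date_to=date_to` in place of `past_days=30`.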

## Output
```plaintext
>>> properties.head()
    MLS       MLS # Status          Style  ...     COEDate LotSFApx PrcSqft Stories
0  SDCA   230018348   SOLD         CONDOS  ...  2023-10-03   290110     803       2
1  SDCA   230016614   SOLD      TOWNHOMES  ...  2023-10-03     None     838       3
2  SDCA   230016367   SOLD         CONDOS  ...  2023-10-03    30056     649       1
3  MRCA  NDP2306335   SOLD  SINGLE_FAMILY  ...  2023-10-03     7519     661       2
4  SDCA   230014532   SOLD         CONDOS  ...  2023-10-03     None     752       1
[5 rows x 22 columns]
```

### Parameters for `scrape_property()`
```
Required
├── location (str): The location to search, in any of several formats: a ZIP code, a full address, a city and state, etc.
└── listing_type (option): Choose the type of listing.
     - 'for_rent'
     - 'for_sale'
     - 'sold'
     - 'pending' (for pending/contingent sales)

Optional
├── property_type (list): Choose the type of properties.
│    - 'single_family'
│    - 'multi_family'
│    - 'condos'
│    - 'condo_townhome_rowhome_coop'
│    - 'condo_townhome'
│    - 'townhomes'
│    - 'duplex_triplex'
│    - 'farm'
│    - 'land'
│    - 'mobile'
│
├── return_type (option): Choose the return type.
│    - 'pandas' (default)
│    - 'pydantic'
│    - 'raw' (json)
│
├── radius (decimal): Radius in miles to find comparable properties based on individual addresses.
│    Example: 5.5 (fetches properties within a 5.5-mile radius if location is set to a specific address; otherwise, ignored)
│
├── past_days (integer): Number of past days to filter properties by. Uses 'last_sold_date' for 'sold' listings and 'list_date' for the others (for_rent, for_sale).
│    Example: 30 (fetches properties listed/sold in the last 30 days)
│
├── date_from, date_to (string): Start and end dates for filtering properties listed or sold; both dates are required.
│    Use these to fetch properties in chunks, since each search is capped at 10,000 results.
│    Format for both must be "YYYY-MM-DD".
│    Example: "2023-05-01", "2023-05-15" (fetches properties listed/sold between these dates)
│
├── mls_only (True/False): If set, fetches only MLS listings (mainly applicable to 'sold' listings)
│
├── foreclosure (True/False): If set, fetches only foreclosures
│
├── proxy (string): In format 'http://user:pass@host:port'
│
├── extra_property_data (True/False): If set, fetches additional property data for general searches (e.g. schools, tax appraisals). This increases the number of requests by O(n).
│
├── exclude_pending (True/False): If set, excludes 'pending' properties from the 'for_sale' results unless listing_type is 'pending'
│
└── limit (integer): Maximum number of properties to fetch. The default and the maximum are both 10000.
```
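The `date_from`/`date_to` entry above suggests fetching large result sets in chunks to stay under the 10,000-result limit. A minimal sketch of the splitting step follows. `date_chunks` is a hypothetical helper, not part of HomeHarvest; it assumes `date_to` is inclusive (the README does not say), so each chunk starts the day after the previous one ends. The chunk size that keeps each search under the limit depends on the market, so it is left to the caller.

```python
from datetime import date, timedelta


def date_chunks(start, end, chunk_days):
    """Split the inclusive range [start, end] into (date_from, date_to) string pairs.

    Assumes date_to is inclusive, so consecutive chunks never share a day.
    """
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    if end < start:
        raise ValueError("end must not be earlier than start")
    chunks = []
    current = start
    while current <= end:
        # The last chunk is clipped so it never runs past `end`
        chunk_end = min(current + timedelta(days=chunk_days - 1), end)
        chunks.append((current.isoformat(), chunk_end.isoformat()))
        current = chunk_end + timedelta(days=1)
    return chunks


for date_from, date_to in date_chunks(date(2023, 5, 1), date(2023, 5, 28), 7):
    print(date_from, date_to)
```

Each pair would then go to its own `scrape_property()` call as `date_from`/`date_to`, and the per-chunk DataFrames could be combined with `pandas.concat(frames, ignore_index=True)`.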

### Property Schema
```plaintext
Property
├── Basic Information:
│ ├── property_url
│ ├── property_id
│ ├── listing_id
│ ├── mls
│ ├── mls_id
│ └── status
│
├── Address Details:
│ ├── street
│ ├── unit
│ ├── city
│ ├── state
│ └── zip_code
│
├── Property Description:
│ ├── style
│ ├── beds
│ ├── full_baths
│ ├── half_baths
│ ├── sqft
│ ├── year_built
│ ├── stories
│ ├── garage
│ └── lot_sqft
│
├── Property Listing Details:
│ ├── days_on_mls
│ ├── list_price
│ ├── list_price_min
│ ├── list_price_max
│ ├── list_date
│ ├── pending_date
│ ├── sold_price
│ ├── last_sold_date
│ ├── price_per_sqft
│ ├── new_construction
│ └── hoa_fee
│
├── Tax Information:
│ ├── year
│ ├── tax
│ └── assessment
│     ├── building
│     ├── land
│     └── total
│
├── Location Details:
│ ├── latitude
│ ├── longitude
│ └── nearby_schools
│
├── Agent Info:
│ ├── agent_id
│ ├── agent_name
│ ├── agent_email
│ └── agent_phone
│
├── Broker Info:
│ ├── broker_id
│ └── broker_name
│
├── Builder Info:
│ ├── builder_id
│ └── builder_name
│
└── Office Info:
  ├── office_id
  ├── office_name
  ├── office_phones
  └── office_email
```
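To illustrate how one group of the schema might be consumed, the sketch below mirrors the **Address Details** fields in a dataclass and joins them into a single display line. `Address` and `one_line` are hypothetical, not part of HomeHarvest, and the example address is made up. Every field is optional because values can be missing (the sample output above shows `None` entries).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    """Hypothetical mirror of the 'Address Details' group in the schema above."""
    street: Optional[str] = None
    unit: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None

    def one_line(self) -> str:
        """Join the non-empty parts as 'street unit, city, state zip'."""
        street = self.street or ""
        if self.unit:
            street = f"{street} {self.unit}".strip()
        region = " ".join(p for p in (self.state, self.zip_code) if p)
        parts = [p for p in (street, self.city, region) if p]
        return ", ".join(parts)


example = Address(street="123 Main St", unit="Unit 4", city="San Diego",
                  state="CA", zip_code="92101")
print(example.one_line())  # 123 Main St Unit 4, San Diego, CA 92101
```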

### Exceptions
The following exceptions may be raised when using HomeHarvest:

- `InvalidListingType` - valid options: `for_sale`, `for_rent`, `sold`, `pending`.
- `InvalidDate` - date_from or date_to is not in the format YYYY-MM-DD.
- `AuthenticationError` - Realtor.com token request failed.
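Two of these errors come from input the caller controls, so they can be caught before any request is made. The sketch below is a hypothetical pre-flight check built only from the rules stated in this README (the four listing types, `YYYY-MM-DD` dates, both dates required together). It raises the built-in `ValueError` rather than HomeHarvest's own exception classes, whose import paths are not documented here.

```python
import re
from datetime import datetime

# Options listed in the README's Parameters and Exceptions sections.
VALID_LISTING_TYPES = ("for_sale", "for_rent", "sold", "pending")
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def validate_search(listing_type, date_from=None, date_to=None):
    """Hypothetical pre-flight check; raises ValueError, not HomeHarvest's own exceptions."""
    if listing_type not in VALID_LISTING_TYPES:
        raise ValueError(
            f"listing_type must be one of {VALID_LISTING_TYPES}, got {listing_type!r}"
        )
    # The Parameters section says date_from and date_to are both required.
    if (date_from is None) != (date_to is None):
        raise ValueError("date_from and date_to must be provided together")
    for name, value in (("date_from", date_from), ("date_to", date_to)):
        if value is None:
            continue
        # fullmatch enforces zero-padded YYYY-MM-DD; strptime rejects impossible dates.
        if not _DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{name} must use YYYY-MM-DD, got {value!r}")
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            raise ValueError(f"{name} is not a real calendar date: {value!r}") from None


validate_search("sold", date_from="2023-05-01", date_to="2023-05-15")  # valid: returns None
```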


            
