| Field | Value |
|---|---|
| Name | homeharvest |
| Version | 0.4.12 |
| home_page | None |
| Summary | Real estate scraping library |
| upload_time | 2025-07-15 00:09:18 |
| maintainer | None |
| docs_url | None |
| author | Zachary Hampton |
| requires_python | <3.13,>=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

<img src="https://github.com/ZacharyHampton/HomeHarvest/assets/78247585/d1a2bf8b-09f5-4c57-b33a-0ada8a34f12d" width="400">
**HomeHarvest** is a real estate scraping library that extracts and formats data in the style of MLS listings.
## HomeHarvest Features
- **Source**: Fetches properties directly from **Realtor.com**.
- **Data Format**: Structures data to resemble MLS listings.
- **Export Flexibility**: Options to save as either CSV or Excel.

## Installation
```bash
pip install -U homeharvest
```
_Python version >= [3.9](https://www.python.org/downloads/release/python-3100/) and < 3.13 required_
## Usage
### Python
```py
from homeharvest import scrape_property
from datetime import datetime

# Generate filename based on current timestamp
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"HomeHarvest_{current_timestamp}.csv"

properties = scrape_property(
    location="San Diego, CA",
    listing_type="sold",  # or (for_sale, for_rent, pending)
    past_days=30,  # sold in last 30 days - listed in last 30 days if (for_sale, for_rent)
    # property_type=['single_family','multi_family'],
    # date_from="2023-05-01",  # alternative to past_days
    # date_to="2023-05-28",
    # foreclosure=True,
    # mls_only=True,  # only fetch MLS listings
)
print(f"Number of properties: {len(properties)}")

# Export to csv
properties.to_csv(filename, index=False)
print(properties.head())
```
## Output
```plaintext
>>> properties.head()
    MLS       MLS #  Status          Style  ...     COEDate  LotSFApx  PrcSqft  Stories
0  SDCA   230018348    SOLD         CONDOS  ...  2023-10-03    290110      803        2
1  SDCA   230016614    SOLD      TOWNHOMES  ...  2023-10-03      None      838        3
2  SDCA   230016367    SOLD         CONDOS  ...  2023-10-03     30056      649        1
3  MRCA  NDP2306335    SOLD  SINGLE_FAMILY  ...  2023-10-03      7519      661        2
4  SDCA   230014532    SOLD         CONDOS  ...  2023-10-03      None      752        1

[5 rows x 22 columns]
```
### Parameters for `scrape_property()`
```
Required
├── location (str): The address in various formats - this could be just a zip code, a full address, or city/state, etc.
└── listing_type (option): Choose the type of listing.
    - 'for_rent'
    - 'for_sale'
    - 'sold'
    - 'pending' (for pending/contingent sales)

Optional
├── property_type (list): Choose the type of properties.
│    - 'single_family'
│    - 'multi_family'
│    - 'condos'
│    - 'condo_townhome_rowhome_coop'
│    - 'condo_townhome'
│    - 'townhomes'
│    - 'duplex_triplex'
│    - 'farm'
│    - 'land'
│    - 'mobile'
│
├── return_type (option): Choose the return type.
│    - 'pandas' (default)
│    - 'pydantic'
│    - 'raw' (json)
│
├── radius (decimal): Radius in miles to find comparable properties based on individual addresses.
│    Example: 5.5 (fetches properties within a 5.5-mile radius if location is set to a specific address; otherwise, ignored)
│
├── past_days (integer): Number of past days to filter properties. Utilizes 'last_sold_date' for 'sold' listing types, and 'list_date' for others (for_rent, for_sale).
│    Example: 30 (fetches properties listed/sold in the last 30 days)
│
├── date_from, date_to (string): Start and end dates to filter properties listed or sold, both dates are required.
│    (use this to get properties in chunks as there's a 10k result limit)
│    Format for both must be "YYYY-MM-DD".
│    Example: "2023-05-01", "2023-05-15" (fetches properties listed/sold between these dates)
│
├── mls_only (True/False): If set, fetches only MLS listings (mainly applicable to 'sold' listings)
│
├── foreclosure (True/False): If set, fetches only foreclosures
│
├── proxy (string): In format 'http://user:pass@host:port'
│
├── extra_property_data (True/False): Increases requests by O(n). If set, this fetches additional property data for general searches (e.g. schools, tax appraisals etc.)
│
├── exclude_pending (True/False): If set, excludes 'pending' properties from the 'for_sale' results unless listing_type is 'pending'
│
└── limit (integer): Limit the number of properties to fetch. Max & default is 10000.
```
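
The `date_from`/`date_to` note above suggests fetching in chunks to stay under the 10k result limit. A minimal sketch of that idea follows. The helper names `date_windows` and `scrape_in_chunks` are hypothetical and not part of HomeHarvest, and the window size is an arbitrary choice. The windows are built so that adjacent ones do not share a day, which assumes the library treats both bounds as inclusive.

```python
from datetime import date, timedelta


def date_windows(start: str, end: str, days: int = 7):
    """Yield (date_from, date_to) pairs in YYYY-MM-DD format covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while current <= last:
        # Each window spans `days` calendar days, clipped to the overall end date.
        window_end = min(current + timedelta(days=days - 1), last)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


def scrape_in_chunks(location: str, start: str, end: str, days: int = 7):
    """Hypothetical wrapper: one scrape_property() call per date window."""
    # Imported lazily so date_windows() can be used without the package installed.
    from homeharvest import scrape_property

    results = []
    for date_from, date_to in date_windows(start, end, days):
        results.append(
            scrape_property(
                location=location,
                listing_type="sold",
                date_from=date_from,
                date_to=date_to,
            )
        )
    return results


if __name__ == "__main__":
    for window in date_windows("2023-05-01", "2023-05-28", days=14):
        print(window)
```

If a single window still comes back with 10000 results, it was probably truncated, so a smaller `days` value is worth trying. Should the date bounds turn out to overlap, de-duplicating the combined results on `property_id` would be a reasonable safeguard.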
### Property Schema
```plaintext
Property
├── Basic Information:
│ ├── property_url
│ ├── property_id
│ ├── listing_id
│ ├── mls
│ ├── mls_id
│ └── status
├── Address Details:
│ ├── street
│ ├── unit
│ ├── city
│ ├── state
│ └── zip_code
├── Property Description:
│ ├── style
│ ├── beds
│ ├── full_baths
│ ├── half_baths
│ ├── sqft
│ ├── year_built
│ ├── stories
│ ├── garage
│ └── lot_sqft
├── Property Listing Details:
│ ├── days_on_mls
│ ├── list_price
│ ├── list_price_min
│ ├── list_price_max
│ ├── list_date
│ ├── pending_date
│ ├── sold_price
│ ├── last_sold_date
│ ├── price_per_sqft
│ ├── new_construction
│ └── hoa_fee
├── Tax Information:
│ ├── year
│ ├── tax
│ └── assessment
│   ├── building
│   ├── land
│   └── total
├── Location Details:
│ ├── latitude
│ ├── longitude
│ └── nearby_schools
├── Agent Info:
│ ├── agent_id
│ ├── agent_name
│ ├── agent_email
│ └── agent_phone
├── Broker Info:
│ ├── broker_id
│ └── broker_name
├── Builder Info:
│ ├── builder_id
│ └── builder_name
└── Office Info:
  ├── office_id
  ├── office_name
  ├── office_phones
  └── office_email
```
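
As an illustration of working with these fields, the sketch below reads a CSV export using only the standard library and computes a median sold price per square foot. It assumes the exported file uses the schema names `sold_price` and `sqft` as column headers. The function name `median_price_per_sqft` and the sample rows are made up for the example.

```python
import csv
import io
from statistics import median


def median_price_per_sqft(csv_file):
    """Return the median of sold_price / sqft, or None if no row has both values."""
    ratios = []
    for row in csv.DictReader(csv_file):
        try:
            price = float(row["sold_price"])
            sqft = float(row["sqft"])
        except (KeyError, ValueError, TypeError):
            continue  # column missing, or value blank / non-numeric
        if sqft > 0:
            ratios.append(price / sqft)
    return median(ratios) if ratios else None


if __name__ == "__main__":
    # Illustrative data only; real exports contain many more columns.
    sample = io.StringIO(
        "property_id,sold_price,sqft\n"
        "1,500000,1000\n"
        "2,900000,1500\n"
        "3,,1200\n"
    )
    print(median_price_per_sqft(sample))
```

For a real export, pass an open file handle instead, for example `open(filename, newline="")`.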
### Exceptions
The following exceptions may be raised when using HomeHarvest:
- `InvalidListingType` - valid options: `for_sale`, `for_rent`, `sold`, `pending`.
- `InvalidDate` - `date_from` or `date_to` is not in the format `YYYY-MM-DD`.
- `AuthenticationError` - Realtor.com token request failed.
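
The first two conditions can also be checked locally before any request is sent. The sketch below mirrors the rules documented above. It is not the library's own validation, and `check_inputs` is a hypothetical helper name.

```python
from datetime import datetime

VALID_LISTING_TYPES = {"for_sale", "for_rent", "sold", "pending"}


def check_inputs(listing_type, date_from=None, date_to=None):
    """Return a list of problems found; an empty list means the inputs look valid."""
    problems = []
    if listing_type not in VALID_LISTING_TYPES:
        problems.append(f"unknown listing_type: {listing_type!r}")
    # The parameter notes say both dates are required when filtering by date.
    if (date_from is None) != (date_to is None):
        problems.append("date_from and date_to must be given together")
    for name, value in (("date_from", date_from), ("date_to", date_to)):
        if value is None:
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except (ValueError, TypeError):
            problems.append(f"{name} is not in YYYY-MM-DD format: {value!r}")
    return problems


if __name__ == "__main__":
    print(check_inputs("sold", "2023-05-01", "2023-05-15"))
    print(check_inputs("rented", "05/01/2023", "2023-05-15"))
```

Returning a list of problems rather than raising lets a caller report every issue at once. The library's exceptions can still be caught around the actual `scrape_property()` call.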
Raw data
{
"_id": null,
"home_page": null,
"name": "homeharvest",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Zachary Hampton",
"author_email": "zachary@bunsly.com",
"download_url": "https://files.pythonhosted.org/packages/a0/72/9fdd4af799e87307194928ecf3f6d5bcc49149ddf29078ef6e2b2a89a457/homeharvest-0.4.12.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Real estate scraping library",
"version": "0.4.12",
"project_urls": {
"Homepage": "https://github.com/Bunsly/HomeHarvest"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "1a7699d4a2cc896c205c3db9f11766fceb044c8b74a64be8a77bda2d22d4e0cc",
"md5": "082020ccb8c4163b5511587151fd6196",
"sha256": "76f3bbf77e834b62ad59f5829e453069bdb8af16ef5ef8b455553c1756a73e51"
},
"downloads": -1,
"filename": "homeharvest-0.4.12-py3-none-any.whl",
"has_sig": false,
"md5_digest": "082020ccb8c4163b5511587151fd6196",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.9",
"size": 20055,
"upload_time": "2025-07-15T00:09:17",
"upload_time_iso_8601": "2025-07-15T00:09:17.647607Z",
"url": "https://files.pythonhosted.org/packages/1a/76/99d4a2cc896c205c3db9f11766fceb044c8b74a64be8a77bda2d22d4e0cc/homeharvest-0.4.12-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a0729fdd4af799e87307194928ecf3f6d5bcc49149ddf29078ef6e2b2a89a457",
"md5": "12d441a04d82c96073914332d5bd3d40",
"sha256": "85083ee6bfddc281d05d5f3ecf11581415be2615d5854091e433954311d99ae5"
},
"downloads": -1,
"filename": "homeharvest-0.4.12.tar.gz",
"has_sig": false,
"md5_digest": "12d441a04d82c96073914332d5bd3d40",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.9",
"size": 18260,
"upload_time": "2025-07-15T00:09:18",
"upload_time_iso_8601": "2025-07-15T00:09:18.928238Z",
"url": "https://files.pythonhosted.org/packages/a0/72/9fdd4af799e87307194928ecf3f6d5bcc49149ddf29078ef6e2b2a89a457/homeharvest-0.4.12.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-15 00:09:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Bunsly",
"github_project": "HomeHarvest",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "homeharvest"
}