python-craigslist-headless


* Name: python-craigslist-headless
* Version: 0.0.5
* Home page: https://github.com/f3mshep/python-craigslist-headless
* Summary: Simple Craigslist wrapper.
* Upload time: 2023-03-12 02:39:54
* Author: Julio M Alegria, Alexandra Wright
* License: MIT-Zero

python-craigslist
=================

A simple `Craigslist <http://www.craigslist.org>`__ wrapper.

License: `MIT-Zero <https://romanrm.net/mit-zero>`__.

Disclaimer
----------

* I don't work for or have any affiliation with Craigslist.
* This module was implemented for educational purposes. It should not be used for crawling or downloading data from Craigslist.
* This is a fork of the pip package by Julio M Alegria. This version uses headless Chrome to parse data that is generated dynamically via JavaScript more accurately.

Installation
------------

::

    pip install python-craigslist-headless

Classes
-------

Base class:

* ``CraigslistBase``

Subclasses:

* ``CraigslistCommunity`` (craigslist.org > community)
* ``CraigslistHousing`` (craigslist.org > housing)
* ``CraigslistJobs`` (craigslist.org > jobs)
* ``CraigslistForSale`` (craigslist.org > for sale)
* ``CraigslistEvents`` (craigslist.org > event calendar)
* ``CraigslistServices`` (craigslist.org > services)
* ``CraigslistGigs`` (craigslist.org > gigs)
* ``CraigslistResumes`` (craigslist.org > resumes)

Examples
--------

Looking for a room in San Francisco?

.. code:: python

    from craigslist import CraigslistHousing
    cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                             filters={'max_price': 1200, 'private_room': True})

    # You can get an approximate number of results with the following call:
    print(cl_h.get_results_approx_count())

    992

    for result in cl_h.get_results(sort_by='newest', geotagged=True):
        print(result)

    {
        'id': u'4851150747',
        'name': u'Near SFSU, UCSF and NEWLY FURNISHED - CLEAN, CONVENIENT and CLEAN!',
        'url': u'http://sfbay.craigslist.org/sfc/roo/4851150747.html',
        'datetime': u'2015-01-27 23:44',
        'price': u'$1100',
        'where': u'inner sunset / UCSF',
        'has_image': False,
        'has_map': True,
        'geotag': (37.738473, -122.494721)
    }
    # ...

Maybe a software engineering internship in Silicon Valley?

.. code:: python

    from craigslist import CraigslistJobs
    cl_j = CraigslistJobs(site='sfbay', area='sby', category='sof',
                          filters={'is_internship': True, 'employment_type': ['full-time', 'part-time']})

    for result in cl_j.get_results():
        print(result)

    {
        'id': u'5708651182',
        'name': u'GAME DEVELOPER INTERNSHIP AT TYNKER - AVAILABLE NOW!',
        'url': u'http://sfbay.craigslist.org/pen/eng/5708651182.html',
        'datetime': u'2016-07-30 13:30',
        'price': None,
        'where': u'mountain view',
        'has_image': True,
        'has_map': True,
        'geotag': None
    }
    # ...

Events with free food in New York?

.. code:: python

    from craigslist import CraigslistEvents
    cl_e = CraigslistEvents(site='newyork', filters={'free': True, 'food': True})

    for result in cl_e.get_results(sort_by='newest', limit=5):
        print(result)

    {
        'id': u'4866178242',
        'name': u'Lituation Thursdays @ Le Reve',
        'url': u'http://newyork.craigslist.org/mnh/eve/4866178242.html',
        'datetime': u'1/29',
        'price': None,
        'where': u'Midtown East',
        'has_image': True,
        'has_map': True,
        'geotag': None
    }
    # ...
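
Each result is a plain dictionary, so it can be post-processed with ordinary Python. The following is a minimal sketch, assuming the ``price`` field is either ``None`` or a whole-dollar string such as ``'$1100'``, as in the sample output above. ``parse_price`` and ``within_budget`` are hypothetical helpers for illustration and are not part of this package.

```python
import re


def parse_price(price):
    """Convert a price string such as '$1100' to an int; None stays None.

    Assumes whole-dollar amounts, as in the sample results above.
    """
    if price is None:
        return None
    digits = re.sub(r"[^\d]", "", price)
    return int(digits) if digits else None


def within_budget(results, max_price):
    """Yield results whose price is known and does not exceed max_price."""
    for result in results:
        price = parse_price(result.get("price"))
        if price is not None and price <= max_price:
            yield result


# Trimmed copies of the sample results shown above.
sample_results = [
    {"id": "4851150747", "price": "$1100", "where": "inner sunset / UCSF"},
    {"id": "5708651182", "price": None, "where": "mountain view"},
]
cheap = list(within_budget(sample_results, 1200))
```

Results without a price are skipped rather than treated as free, which seems the safer default for a budget filter.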

Where to get ``filters`` from?
------------------------------

Every subclass has its own set of filters. To get a list of all the filters
supported by a specific subclass, use the ``.show_filters()`` class-method:

.. code:: python

   >>> from craigslist import CraigslistJobs, CraigslistForSale
   >>> CraigslistJobs.show_filters()

   Base filters:
   * query = ...
   * search_titles = True/False
   * has_image = True/False
   * posted_today = True/False
   * bundle_duplicates = True/False
   * search_distance = ...
   * zip_code = ...
   
   CraigslistJobs filters:
   * is_internship = True/False
   * is_nonprofit = True/False
   * is_telecommuting = True/False
   * employment_type = u'full-time', u'part-time', u'contract', u"employee's choice"


   >>> CraigslistForSale.show_filters(category='cta')

   Base filters:
   * query = ...
   * search_titles = True/False
   * has_image = True/False
   * posted_today = True/False
   * bundle_duplicates = True/False
   * search_distance = ...
   * zip_code = ...
   
   CraigslistForSale filters with category 'cta':
   * min_price = ...
   * max_price = ...
   * make = ...
   * model = ...
   * min_year = ...
   * max_year = ...
   * min_miles = ...
   * max_miles = ...
   * min_engine_displacement = ...
   * max_engine_displacement = ...
   * condition = u'new', u'like new', u'excellent', u'good', u'fair', u'salvage'
   * auto_cylinders = u'3 cylinders', u'4 cylinders', u'5 cylinders', u'6 cylinders', u'8 cylinders', u'10 cylinders', u'12 cylinders', u'other'
   * auto_drivetrain = u'fwd', u'rwd', u'4wd'
   * auto_fuel_type = u'gas', u'diesel', u'hybrid', u'electric', u'other'
   * auto_paint = u'black', u'blue', u'brown', u'green', u'grey', u'orange', u'purple', u'red', u'silver', u'white', u'yellow', u'custom'
   * auto_size = u'compact', u'full-size', u'mid-size', u'sub-compact'
   * auto_title_status = u'clean', u'salvage', u'rebuilt', u'parts only', u'lien', u'missing'
   * auto_transmission = u'manual', u'automatic', u'other'
   * auto_bodytype = u'bus', u'convertible', u'coupe', u'hatchback', u'mini-van', u'offroad', u'pickup', u'sedan', u'truck', u'SUV', u'wagon', u'van', u'other'
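
A misspelled filter name is easy to miss. As a rough sketch, the names printed above can be copied into sets and used to check a ``filters`` dictionary before constructing a subclass. ``unknown_filters`` is a hypothetical helper, and the sets below are transcribed from the listing above, so they may not match every installed version.

```python
# Filter names transcribed from the show_filters() output above.
BASE_FILTERS = {
    "query", "search_titles", "has_image", "posted_today",
    "bundle_duplicates", "search_distance", "zip_code",
}
JOBS_FILTERS = {
    "is_internship", "is_nonprofit", "is_telecommuting", "employment_type",
}


def unknown_filters(filters, allowed):
    """Return the sorted filter names in `filters` that are not in `allowed`."""
    return sorted(set(filters) - set(allowed))


jobs_allowed = BASE_FILTERS | JOBS_FILTERS

# The filters from the internship example are all recognized.
ok = unknown_filters(
    {"is_internship": True, "employment_type": ["full-time", "part-time"]},
    jobs_allowed,
)

# 'max_price' is not listed for CraigslistJobs, so it is reported.
bad = unknown_filters({"query": "python", "max_price": 1200}, jobs_allowed)
```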

Where to get ``site`` and ``area`` from?
----------------------------------------

When initializing any of the subclasses, you'll need to provide the ``site`` and, optionally, the ``area`` you want to query.

To get the correct ``site``, follow these steps:

1. Go to `craigslist.org/about/sites <https://www.craigslist.org/about/sites>`__.
2. Find the country or city you're interested in, and click on it.
3. You'll be directed to ``<site>.craigslist.org``. The value of ``<site>`` in the URL is the one you should use.

Not all sites have areas. To check whether your site has areas, look for links next to the title at the top center of the Craigslist page. For example, for New York you'll see:

.. image:: https://user-images.githubusercontent.com/1008637/45307206-bb404d80-b51e-11e8-8e6d-edfbdbd0a6fa.png

Click on the one you're interested in, and you'll be redirected to ``<site>.craigslist.org/<area>``. The value of ``<area>`` in the URL is the one you should use. If there are no areas next to the title, your site has no areas, and you can leave that argument unset.
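
The steps above amount to reading two pieces out of a URL. Here is a minimal sketch using only the standard library, assuming the URL has the ``<site>.craigslist.org/<area>`` shape described above. ``site_and_area_from_url`` is a hypothetical helper, not something this package provides.

```python
from urllib.parse import urlparse

CRAIGSLIST_SUFFIX = ".craigslist.org"


def site_and_area_from_url(url):
    """Split a Craigslist landing-page URL into (site, area).

    Assumes the URL looks like https://<site>.craigslist.org/<area>.
    The area is None when the URL has no path segment.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host.endswith(CRAIGSLIST_SUFFIX):
        raise ValueError("not a <site>.craigslist.org URL: %r" % url)
    site = host[: -len(CRAIGSLIST_SUFFIX)]
    # Only the first path segment is treated as the area.
    segments = [part for part in parsed.path.split("/") if part]
    area = segments[0] if segments else None
    return site, area


new_york = site_and_area_from_url("https://newyork.craigslist.org/mnh")
sf_bay = site_and_area_from_url("https://sfbay.craigslist.org/")
```

This only makes sense for the landing pages reached in the steps above; a search or posting URL has other path segments that are not areas.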

Where to get ``category`` from?
-------------------------------

You can additionally provide a ``category`` when initializing any of the subclasses. To get a list of all the categories
supported by a specific subclass, use the ``.show_categories()`` class-method:

.. code:: python
    
    >>> from craigslist import CraigslistServices
    >>> CraigslistServices.show_categories()

    CraigslistServices categories:  
    * aos = automotive services
    * bts = beauty services
    * cms = cell phone / mobile services
    * cps = computer services
    * crs = creative services
    * cys = cycle services
    * evs = event services
    * fgs = farm & garden services
    * fns = financial services
    * hws = health/wellness services
    * hss = household services
    * lbs = labor / hauling / moving
    * lgs = legal services
    * lss = lessons & tutoring
    * mas = marine services
    * pas = pet services
    * rts = real estate services
    * sks = skilled trade services
    * biz = small biz ads
    * trv = travel/vacation services
    * wet = writing / editing / translation
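
The session above suggests that ``show_categories()`` prints its listing rather than returning it. If you want the codes as data, one option is to parse text in that shape. This is only a sketch, assuming lines of the form ``* code = description``. ``parse_categories`` is a hypothetical helper, and the sample text is a shortened copy of the listing above.

```python
def parse_categories(listing):
    """Turn lines such as '* aos = automotive services' into a dict."""
    categories = {}
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith("* ") or " = " not in line:
            continue  # skip the heading line and blank lines
        code, description = line[2:].split(" = ", 1)
        categories[code.strip()] = description.strip()
    return categories


# Shortened copy of the listing shown above.
sample_listing = """
CraigslistServices categories:
* aos = automotive services
* bts = beauty services
* wet = writing / editing / translation
"""
services = parse_categories(sample_listing)
```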

Is there a limit for the number of results?
--------------------------------------------

Yes, Craigslist caps the results of any search at 3000.
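
When consuming results in your own code, it can help to make that cap explicit. The sketch below assumes only that results arrive as an iterable; ``take_results`` and ``CRAIGSLIST_RESULT_CAP`` are hypothetical names, and the stand-in iterator replaces a real query.

```python
from itertools import islice

# The cap stated above.
CRAIGSLIST_RESULT_CAP = 3000


def take_results(results, wanted):
    """Collect at most `wanted` items, never more than the Craigslist cap."""
    count = max(0, min(wanted, CRAIGSLIST_RESULT_CAP))
    return list(islice(results, count))


# A stand-in iterator; a real call would pass the result generator instead.
fake_results = iter(range(10000))
capped = take_results(fake_results, 5000)
```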

Support
-------

If you find a bug or want to propose a new feature, please use the `issues tracker <https://github.com/juliomalegria/python-craigslist/issues>`__. I'll be happy to help you! :-)

            
