# 🍏🔍 App Store Web Scraper
`app-store-web-scraper` is a Python package for extracting reviews for iOS,
iPadOS, macOS and tvOS apps from the web version of Apple's App Store.
> __Note:__ Whenever possible, prefer using Apple's [App Store Connect
> API][connect], which provides the full set of customer review data for your
> apps and is more reliable.
* [Installation](#installation)
* [Basic Usage](#basic-usage)
  * [App IDs](#app-ids)
  * [App Store Entries](#app-store-entries)
  * [App Reviews](#app-reviews)
* [Advanced Usage](#advanced-usage)
  * [Sessions](#sessions)
* [How It Works](#how-it-works)
* [License](#license)
* [Acknowledgements](#acknowledgements)
[connect]: https://developer.apple.com/app-store-connect/api/
## Installation
This package is available on PyPI and can be installed with `pip`:
```sh
pip install app-store-web-scraper
```
## Basic Usage
The sample code below fetches the 10 most recent app reviews for
[Minecraft][minecraft].
```python
from app_store_web_scraper import AppStoreEntry

# See below for instructions on finding an app's ID.
MINECRAFT_APP_ID = 479516143

# Look up the app in the British version of the App Store.
app = AppStoreEntry(app_id=MINECRAFT_APP_ID, country="gb")

# Iterate over the 10 most recent reviews, which are fetched lazily in batches.
for review in app.reviews(limit=10):
    print("-----")
    print("ID:", review.id)
    print("Rating:", review.rating)
    print("Review:", review.content)
```
[minecraft]: https://apps.apple.com/gb/app/minecraft/id479516143
### App IDs
`app-store-web-scraper` requires you to pass the App Store ID of the app(s) you
are interested in. To find an app's ID, look up its App Store page URL by
searching for the app on [apple.com][apple] or in the App Store app (use
_Share_ → _Copy Link_). The ID is the last part of the URL's path, without the
`id` prefix.
For example, the URL of the Minecraft app (on the British App Store) is
`https://apps.apple.com/gb/app/minecraft/id479516143`,
from which one can extract the app ID `479516143`.
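The extraction rule described above can be sketched as a small helper. This is a minimal illustration that uses only the standard library; `extract_app_id` is a hypothetical name and not part of `app-store-web-scraper`, and the URL in the example is a placeholder rather than a real app.

```python
import re
from urllib.parse import urlparse


def extract_app_id(url: str) -> int:
    """Return the numeric app ID from an App Store page URL."""
    # The ID is the last path segment, e.g. "id123456789".
    # urlparse() keeps query strings such as "?mt=8" out of the path.
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    match = re.fullmatch(r"id(\d+)", last_segment)
    if match is None:
        raise ValueError(f"No app ID found in URL: {url!r}")
    return int(match.group(1))


# Placeholder URL for illustration only.
print(extract_app_id("https://apps.apple.com/gb/app/example-app/id123456789"))
```

The returned integer can then be passed as `app_id` when creating an `AppStoreEntry`.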
[apple]: https://www.apple.com/
### App Store Entries
To start scraping an app in the App Store, create an `AppStoreEntry` instance
with an app ID and the (lowercase) ISO code of the country whose App Store
should be scraped. If the app ID is invalid or the app is not available in the
specified region, an `AppNotFound` error is raised.
The entry's `reviews()` method returns an iterator that can be used to fetch
some or all of the app reviews available through the App Store's public API.
Note that this is usually only a small subset of all the reviews the app has
received, so the number of reviews retrieved will usually not match the review
count displayed on the App Store page.
### App Reviews
Each review is returned as an `AppReview` object with the following attributes:
- `id`
- `date`
- `user_name`
- `rating`
- `title`
- `content`
- `developer_response` (if the developer replied)
  - `id`
  - `body`
  - `modified`
The list of reviews is split into pages by the App Store's servers, so iterating
over all reviews will regularly make a network request to fetch the next page.
To limit the total number of network requests, you can pass a `limit` to
`reviews()` so that the iterator returns at most that many app reviews. By
default, no limit is set.
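To illustrate why a limit reduces network traffic, here is a minimal sketch of lazy pagination. It is not the library's implementation: `iter_reviews` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for a real network request.

```python
from typing import Callable, Iterator, List, Optional


def iter_reviews(
    fetch_page: Callable[[int], List[str]],
    max_pages: int,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield reviews page by page, stopping once `limit` is reached."""
    yielded = 0
    for page in range(1, max_pages + 1):
        if limit is not None and yielded >= limit:
            return  # Limit reached: do not request another page.
        reviews = fetch_page(page)
        if not reviews:
            return  # An empty page means there are no more reviews.
        for review in reviews:
            if limit is not None and yielded >= limit:
                return
            yield review
            yielded += 1


requested_pages: List[int] = []


def fake_fetch_page(page: int) -> List[str]:
    """Stand-in for a network request; returns 3 fake reviews per page."""
    requested_pages.append(page)
    return [f"review {page}-{i}" for i in range(1, 4)]


first_four = list(iter_reviews(fake_fetch_page, max_pages=10, limit=4))
print(first_four)
print(requested_pages)  # Only pages 1 and 2 were requested.
```

Because pages are requested only when the iterator needs them, a limit of 4 with 3 reviews per page triggers two requests instead of ten.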
## Advanced Usage
### Sessions
The `AppStoreSession` class implements the communication with the App Store's
servers. Internally, it uses a [`urllib3.PoolManager`][urllib3-pool] to reuse
HTTP connections between requests, which reduces the load on Apple's servers
and increases performance.
By default, `AppStoreEntry` takes care of creating an `AppStoreSession` itself,
so you don't need to deal with sessions for simple use cases. However,
constructing and passing an `AppStoreSession` manually can be beneficial for two
reasons. First, it allows you to share a session between multiple
`AppStoreEntry` objects for additional efficiency:
```python
from app_store_web_scraper import AppStoreEntry, AppStoreSession
session = AppStoreSession()
pages = AppStoreEntry(app_id=361309726, country="de", session=session)
numbers = AppStoreEntry(app_id=361304891, country="de", session=session)
# ...
```
Second, you can pass several parameters to `AppStoreSession` to control how
requests to the App Store are handled. For instance:
```python
from app_store_web_scraper import AppStoreSession

session = AppStoreSession(
    # Wait between 0.4 and 0.6 seconds before every request after the first,
    # to avoid being rate-limited by Apple's servers
    delay=0.5,
    delay_jitter=0.1,

    # Retry failed requests up to 5 times, with an initial backoff time of
    # 0.5 seconds that doubles after each failed retry (but is capped at 20
    # seconds)
    retries=5,
    retries_backoff_factor=0.5,
    retries_backoff_max=20,
)
```
For a list of all available parameters with descriptions, see the docstring
of the `AppStoreSession` class.
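The timing behaviour described in the comments of the example above can be sketched with plain arithmetic. This is an illustration of that description only, not the library's code: `backoff_schedule` and `jittered_delay` are hypothetical helpers, and the exact timing in practice may depend on the installed `urllib3` version.

```python
import random
from typing import List


def backoff_schedule(retries: int, initial: float, maximum: float) -> List[float]:
    """Backoff before each retry: starts at `initial`, doubles, capped at `maximum`."""
    return [min(initial * 2 ** attempt, maximum) for attempt in range(retries)]


def jittered_delay(delay: float, jitter: float) -> float:
    """Pick a delay uniformly from [delay - jitter, delay + jitter]."""
    return random.uniform(delay - jitter, delay + jitter)


# 5 retries starting at 0.5 seconds stay well below the 20-second cap.
print(backoff_schedule(retries=5, initial=0.5, maximum=20.0))  # [0.5, 1.0, 2.0, 4.0, 8.0]

# A delay of 0.5 with a jitter of 0.1 gives a wait between 0.4 and 0.6 seconds.
print(jittered_delay(delay=0.5, jitter=0.1))
```

With these values the cap would only take effect from the seventh retry onwards, since the sixth backoff would be 16 seconds and the seventh 32.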
[urllib3-pool]: https://urllib3.readthedocs.io/en/stable/reference/urllib3.poolmanager.html
## How It Works
To fetch app reviews, this library uses the undocumented iTunes Customer Reviews
API, which offers the following endpoint to retrieve the pages of an app's
reviews feed in JSON format:
```
https://itunes.apple.com/{country}/rss/customerreviews/page={page}/id={app_id}/sortby=mostrecent/json
```
`app_id` and `country` are the respective values passed to the `AppStoreEntry`.
`page` is a number between 1 and 10 (the highest allowed page number). Each
page contains up to 50 reviews, which results in a maximum of 500 reviews per
app ID and country.
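As a rough sketch of how the endpoint above is assembled, the snippet below fills in the URL template and derives the 500-review ceiling. `build_reviews_url` and the constant names are hypothetical and chosen for illustration; they are not taken from the library's source.

```python
REVIEWS_URL_TEMPLATE = (
    "https://itunes.apple.com/{country}/rss/customerreviews/"
    "page={page}/id={app_id}/sortby=mostrecent/json"
)
MAX_PAGE = 10  # Highest page number the endpoint allows.
MAX_REVIEWS_PER_PAGE = 50  # Upper bound on reviews returned per page.


def build_reviews_url(country: str, app_id: int, page: int) -> str:
    """Build the reviews feed URL for one page of an app's reviews."""
    if not 1 <= page <= MAX_PAGE:
        raise ValueError(f"page must be between 1 and {MAX_PAGE}, got {page}")
    return REVIEWS_URL_TEMPLATE.format(country=country, page=page, app_id=app_id)


print(build_reviews_url(country="gb", app_id=479516143, page=1))
print(MAX_PAGE * MAX_REVIEWS_PER_PAGE)  # 500
```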
## License
`app-store-web-scraper` is licensed under the Apache License, Version 2.0.
See the [LICENSE](./LICENSE) file for more details.
[license]: https://github.com/futurice/app-store-web-scraper/blob/main/LICENCE
## Acknowledgements
This library is a fork and rewrite of [`app-store-scraper`][original] by [Eric
Lim][eric-lim] and takes further inspiration from the [Node.js package of the
same name][npm-package] by [Facundo Olando][facundo-olando]. Without these
authors' efforts, this library would not exist. 💚
[original]: https://pypi.org/project/app-store-scraper/
[npm-package]: https://www.npmjs.com/package/app-store-scraper
[eric-lim]: https://github.com/cowboy-bebug
[facundo-olando]: https://github.com/facundoolano