| Field | Value |
| --- | --- |
| Name | instagraper |
| Version | 0.1.6 |
| upload_time | 2024-01-13 15:49:09 |
| author | Francisco Macedo |
| requires_python | >=3.10,<4.0 |
| docs_url | None |
| requirements | No requirements were recorded. |
# `instagraper`
Scrape Instagram profile posts, including their locations. It can perform cleaning and produce JSON, GeoJSON or Leaflet map outputs.
It can be used as a [Python package](#python-package) or as a [CLI tool](#cli).
# Example
**Installation**
```console
pip install "instagraper[cli]"
```
**Usage**
```console
instagraper anthonybourdain --x-ig-app-id your-ig-app-id --session-id your-session-id --map --images
```
This creates a map (in `anthonybourdain.html`) with each location plotted, along with the corresponding Instagram post details and link.

# Authentication
To use this tool, you need an Instagram app id and session id. Set them as environment variables (`X_IG_APP_ID` and `SESSION_ID`, as shown in [`.env.example`](.env.example)) or pass them as arguments to the Python package or the CLI tool.
These values can be extracted from your browser's developer tools and saved locally on your machine; they remain valid for a long time:
1. In your browser, navigate to [Instagram](https://www.instagram.com/) and log in.
2. Open the developer tools (right-click on some blank space on the page and click "Inspect").
3. To get the `X_IG_APP_ID` value, go to the `Elements` tab, press `CTRL+F` (or `CMD+F`) and search for `X-IG-App-ID`. You should see a large number next to it. Copy it and save it.
4. To get the `SESSION_ID` value, go to the `Application` tab > `Cookies` > `https://www.instagram.com` and search for `sessionid`. Copy its value and save it.
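A minimal sketch of the environment-variable route, assuming only the two variable names given above. The `load_credentials` helper is hypothetical and not part of `instagraper`; it simply fails early with a clear message when a value is missing.

```python
import os


def load_credentials(env=None):
    """Return the two Instagram credentials from the environment.

    Hypothetical helper for illustration; not part of instagraper.
    """
    env = os.environ if env is None else env
    names = ("X_IG_APP_ID", "SESSION_ID")
    # Treat unset and empty values the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"x_ig_app_id": env["X_IG_APP_ID"], "session_id": env["SESSION_ID"]}


if __name__ == "__main__":
    try:
        creds = load_credentials()
    except RuntimeError as exc:
        print(exc)
    else:
        print("credentials found:", ", ".join(creds))
```

The returned dict uses the keyword names `x_ig_app_id` and `session_id`, so it could be unpacked into a call such as `instagraper.scrape("anthonybourdain", **creds)`.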
## Python package
### Installation
```bash
pip install "instagraper"
```
### Usage
As mentioned in [authentication](#authentication), you need two keys to authenticate with Instagram. They can be saved as environment variables or passed as parameters, like this:
```python
import instagraper
posts = instagraper.scrape("anthonybourdain", x_ig_app_id="your-x-ig-app-id", session_id="your-session-id")
```
In the following examples, it's assumed that these variables are saved as environment variables.
#### Get posts
Returns a list of `Post` dataclass instances, defined [here](./instagraper/models.py).
```python
import instagraper
posts = instagraper.scrape("anthonybourdain", compact=True)
print(posts)
"""
[
Post(
taken_at=datetime.datetime(2018, 6, 4, 11, 48, 9),
username='anthonybourdain',
caption='Light lunch. #Alsace',
lng=11.25,
lat=43.7833,
image_url='https://scontent-lhr6-1.cdninstagram.com/v/t51...',
pk='1794233220902862216',
id='1794233220902862216_6113104',
code='BjmZZuwHr2I',
user_id='6113104',
...
),
...
]
"""
```
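Since the result is a plain list of dataclass instances, ordinary Python is enough to filter or sort it. The sketch below uses a stand-in `PostStub` class with a few of the fields shown above, so it runs without credentials; it is not the package's `Post`. It also assumes that posts without a location carry `None` in `lat` and `lng`, which this README does not confirm, and the second sample post is made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PostStub:
    """Stand-in with a few of the fields shown above; not the package's Post."""
    taken_at: datetime
    caption: str
    lat: Optional[float] = None
    lng: Optional[float] = None


def located(posts):
    """Keep posts that have both coordinates, oldest first."""
    with_location = [p for p in posts if p.lat is not None and p.lng is not None]
    return sorted(with_location, key=lambda p: p.taken_at)


posts = [
    PostStub(datetime(2018, 6, 4, 11, 48, 9), "Light lunch. #Alsace", 43.7833, 11.25),
    PostStub(datetime(2018, 5, 1, 9, 0, 0), "No location here"),  # made-up sample
]
for post in located(posts):
    print(post.taken_at.date(), post.caption, (post.lat, post.lng))
```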
#### Dump posts into a json file
```python
import instagraper
# in raw format
instagraper.scrape("anthonybourdain", json_output="anthonybourdain_posts.json")
# creates the "anthonybourdain_posts.json" file
# in compact format
instagraper.scrape("anthonybourdain", compact=True, json_output="anthonybourdain_compact_posts.json")
# creates the "anthonybourdain_compact_posts.json" file
```
#### Dump posts into a geojson file
Creates GeoJSON points for the posts that have a location (lat and lng).
```python
import instagraper
instagraper.scrape("anthonybourdain", geojson_output="anthonybourdain_posts.geojson")
# creates the "anthonybourdain_posts.geojson" file
```
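For readers unfamiliar with the format, here is a sketch of the general shape of such a file: a `FeatureCollection` with one `Point` feature per located post. The `to_feature_collection` function and the sample dicts are hypothetical; the property names that `instagraper` actually writes are not documented here, so this only illustrates the GeoJSON structure.

```python
import json


def to_feature_collection(posts):
    """Build a GeoJSON FeatureCollection from dicts with "lat" and "lng" keys."""
    features = []
    for post in posts:
        lat, lng = post.get("lat"), post.get("lng")
        if lat is None or lng is None:
            continue  # posts without a location produce no point
        properties = {k: v for k, v in post.items() if k not in ("lat", "lng")}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


sample = [
    {"caption": "Light lunch. #Alsace", "lat": 43.7833, "lng": 11.25},
    {"caption": "No location here", "lat": None, "lng": None},  # made-up sample
]
print(json.dumps(to_feature_collection(sample), indent=2))
```

Note the coordinate order: GeoJSON stores longitude first, which is the reverse of the usual "lat, lng" reading order.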
#### Create a map with locations
Creates a Leaflet map that uses the generated geojson file. If `geojson_output` is not provided, the geojson file is still created, with the username as its file name.
```python
import instagraper
# with a geojson output
instagraper.scrape("anthonybourdain", geojson_output="anthonybourdain_posts.geojson", map_output="anthonybourdain.html")
# creates the "anthonybourdain.html" and "anthonybourdain_posts.geojson" files.
# without a geojson output it still creates one, using the username as the default file name
instagraper.scrape("anthonybourdain", map_output="anthonybourdain.html")
# creates the "anthonybourdain.html" and "anthonybourdain.geojson" files.
```
It's also possible to download each post's image, so that its path is saved in the geojson file and shown on the map:
```python
import instagraper
instagraper.scrape("anthonybourdain", map_output="anthonybourdain.html", with_images=True)
# creates an "images" directory with each post cover image
```
## CLI
To use the CLI program, you need to install it first:
```bash
pip install "instagraper[cli]"
```
### Usage:
As mentioned in [authentication](#authentication), you need two keys to authenticate with Instagram. They can be saved as environment variables or passed as CLI options.
```console
$ instagraper USERNAME [OPTIONS]
```
#### Arguments:
```console
* username TEXT The Instagram username to scrape posts from. [default: None] [required]
```
#### Options:
```console
--x-ig-app-id TEXT Instagram app id (x-ig-app-id) header to authenticate the requests. If not provided, the tool will try to read it from the environment variable X_IG_APP_ID [default: None]
--session-id TEXT Instagram session id (sessionid) cookie to authenticate the requests. If not provided, the tool will try to read it from the environment variable SESSION_ID [default: None]
--compact -c Whether to clean up the scraped posts [default: True]
--json -j TEXT The file name to save the scraped posts in JSON format. The file path will be {target}/{json_output}. [default: None]
--geojson -g TEXT The file name to save the scraped posts in GeoJSON format. If map is enabled, it will be used as the input file for the map and will default to {username}.geojson. The file path will be {target}/{geojson_output}. [default: None]
--map -m TEXT The html file name to save the generated map. The file path will be {target}/{map_output}. [default: None]
--target -t TEXT The target path/directory to save the output files. Defaults to a directory named after the Instagram username, e.g. ./{username}/ [default: None]
--images -i Whether to download the posts' images. The images will be saved in the {target}/images directory.
--static-url -s TEXT The static url/path where the target directory will be hosted. Used to serve the images for the geojson output. e.g. if https://example.com/instagraper/ images will be in https://example.com/instagraper/{target}/images/ [default: None]
--limit -l INTEGER The maximum number of posts to scrape. If not provided, all posts will be scraped. [default: None]
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or customize the installation.
--help Show this message and exit.
```
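The path rules described in the option help can be summarised in a short sketch. Both functions below are hypothetical helpers written from the help text alone, not code from `instagraper`: they assume the target defaults to a directory named after the username, that the geojson file defaults to `{username}.geojson` when a map is requested, and that image URLs follow the `--static-url` example.

```python
from pathlib import PurePosixPath


def resolve_outputs(username, target=None, json_output=None,
                    geojson_output=None, map_output=None):
    """Mirror the documented defaults; hypothetical helper, not instagraper code."""
    target_dir = PurePosixPath(target) if target else PurePosixPath(username)
    if map_output and not geojson_output:
        # Documented default when a map is requested without a geojson name.
        geojson_output = f"{username}.geojson"
    names = {"json": json_output, "geojson": geojson_output, "map": map_output}
    return {kind: str(target_dir / name) for kind, name in names.items() if name}


def image_base_url(static_url, target):
    """Join the static URL and target directory as in the --static-url example."""
    return f"{static_url.rstrip('/')}/{target.strip('/')}/images/"


print(resolve_outputs("anthonybourdain", map_output="anthonybourdain.html"))
print(image_base_url("https://example.com/instagraper/", "anthonybourdain"))
```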
### Examples
1. To use the tool with default settings (compact JSON output), you just need to provide the Instagram username:
```bash
instagraper anthonybourdain
```
2. To dump the posts to a json file named `anthonybourdain.json`:
```bash
instagraper anthonybourdain -j
```
3. To dump the posts to a geojson file named `anthonybourdain.geojson`:
```bash
instagraper anthonybourdain -g
```
4. To create a map (`anthonybourdain.html`) that plots each post's location:
```bash
instagraper anthonybourdain -m
```
This will also create the above-mentioned geojson file.
5. To also download images when creating the geojson file or map:
```bash
instagraper anthonybourdain -g -i
```
# TODO
- [ ] When the post location is only a city name (like "London"), the pins overlap each other and only the top one is clickable.
- [ ] Aggregate nearby pins into numbered clusters, as Google Maps does ("10", "20", etc.).
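
To illustrate the first item, the sketch below groups points whose rounded coordinates coincide, which is how overlapping pins could be detected before plotting. It shows one possible approach and is not a planned implementation; `group_by_location` and the sample coordinates are hypothetical.

```python
from collections import defaultdict


def group_by_location(points, precision=4):
    """Group (lat, lng, label) tuples whose rounded coordinates coincide."""
    groups = defaultdict(list)
    for lat, lng, label in points:
        groups[(round(lat, precision), round(lng, precision))].append(label)
    return dict(groups)


# Made-up sample: two posts tagged with the same city-level location.
sample = [
    (51.5074, -0.1278, "post A"),
    (51.5074, -0.1278, "post B"),
    (48.8566, 2.3522, "post C"),
]
for coords, labels in group_by_location(sample).items():
    print(coords, len(labels), labels)
```

Each group with more than one label corresponds to a stack of pins that would otherwise hide one another.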
Raw data
{
"_id": null,
"home_page": "",
"name": "instagraper",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "Francisco Macedo",
"author_email": "franciscovcbm@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/6e/18/b6e92fd976288eefceb58cfae54c15e8649a9159eea6e332b77f1014ebbf/instagraper-0.1.6.tar.gz",
"platform": null,
"description": "(README text, identical to the document above)",
"bugtrack_url": null,
"license": "",
"summary": "",
"version": "0.1.6",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "570f18e4c54f4cfdd651f6437234b4d099a80117795ad25fbb103d4e72ee19af",
"md5": "4fea82569c16a9a75e6f001d07cc3eb7",
"sha256": "4889962b9b9412b2f6575f36e118d7d4387ac765222ea7c7494f919a717d315f"
},
"downloads": -1,
"filename": "instagraper-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4fea82569c16a9a75e6f001d07cc3eb7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10,<4.0",
"size": 13952,
"upload_time": "2024-01-13T15:49:07",
"upload_time_iso_8601": "2024-01-13T15:49:07.925779Z",
"url": "https://files.pythonhosted.org/packages/57/0f/18e4c54f4cfdd651f6437234b4d099a80117795ad25fbb103d4e72ee19af/instagraper-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6e18b6e92fd976288eefceb58cfae54c15e8649a9159eea6e332b77f1014ebbf",
"md5": "140e2f8acf6816b3206bd7b297439b0b",
"sha256": "b1bb67d312bfde4b1607d53c27dc68910de53d05be08b6c3544a9d395511ad4d"
},
"downloads": -1,
"filename": "instagraper-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "140e2f8acf6816b3206bd7b297439b0b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10,<4.0",
"size": 12213,
"upload_time": "2024-01-13T15:49:09",
"upload_time_iso_8601": "2024-01-13T15:49:09.712458Z",
"url": "https://files.pythonhosted.org/packages/6e/18/b6e92fd976288eefceb58cfae54c15e8649a9159eea6e332b77f1014ebbf/instagraper-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-13 15:49:09",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "instagraper"
}