<h1> Twitter scraper selenium </h1>
<p> A Python package to scrape Twitter's front-end easily with Selenium. </p>
[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-360/)
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
<!--TABLE of contents-->
<h2> Table of Contents </h2>
<details open="open">
<summary>Table of Contents</summary>
<ol>
<li>
<a href="#getting-started">Getting Started</a>
<ul>
<li><a href="#Prerequisites">Prerequisites</a></li>
<li><a href="#Installation">Installation</a>
<ul>
<li><a href="#sourceInstallation">Installing from source</a></li>
<li><a href="#pypiInstallation">Installing with PyPI</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#Usage">Usage</a>
<ul><li><a href="#availableFunction">Available Functions in This Package - Summary</a></li></ul>
<ul><li><a href="#profileDetail">Scraping profile's details</a>
<ul>
<li><a href="#profileDetailExample">In JSON Format - Example</a></li>
<li><a href="#profileDetailArgument">Function Argument</a></li>
<li><a href="#profileDetailKeys">Keys of the output</a></li>
</ul>
</li></ul>
<!---->
<ul>
<li><a href="#profile">Scraping profile's tweets</a>
<ul>
<li><a href="#profileJson">In JSON format - Example</a></li>
<li><a href="#profileCSV">In CSV format - Example</a></li>
<li><a href="#profileArgument">Function Arguments</a></li>
<li><a href="#profileOutput">Keys of the output data</a></li>
</ul>
</li>
<li><a href='#to-scrape-user-tweets-with-api'>Scraping user's tweets using API</a>
<ul>
<li><a href='#to-scrape-user-tweets-with-api'>In JSON format - Example</a></li>
<li><a href='#users_api_parameter'>Function Arguments</a></li>
<li><a href='#scrape_user_with_api_args_keys'>Keys of the output</a></li>
</ul>
</li>
<li><a href="#proxy">Using scraper with proxy</a>
<ul>
<li><a href="#unauthenticatedProxy">Unauthenticated Proxy</a></li>
<li><a href="#authenticatedProxy">Authenticated Proxy</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#privacy">Privacy</a></li>
<li><a href="#license">License</a></li>
</ol>
</details>
<!--TABLE of contents //-->
<br>
<hr>
<h2 id="Prerequisites">Prerequisites </h2>
<ul>
<li> Internet Connection </li>
<li> Python 3.8+ </li>
<li> Chrome or Firefox browser installed on your machine </li>
</ul>
<hr>
<h2 id="Installation"> Installation </h2>
<h3 id="sourceInstallation">Installing from the source</h3>
<p>Download the source code or clone it with:</p>
```
git clone https://github.com/shaikhsajid1111/twitter-scraper-selenium
```
<p>Open terminal inside the downloaded folder:</p>
<br>
```
python3 setup.py install
```
<h3 id="pypiInstallation">
Installing with <a href="https://pypi.org">PyPI</a>
</h3>
```
pip3 install twitter-scraper-selenium
```
<hr>
<h2 id="Usage">
Usage</h2>
<h3 id="availableFunction">Available Functions in This Package - Summary</h3>
<div>
<table>
<thead>
<tr>
<td>Function Name</td>
<td>Function Description</td>
<td>Scraping Method</td>
<td>Scraping Speed</td>
</tr>
</thead>
<tr>
<td><code>scrape_profile()</code></td>
<td>Scrapes tweets from a Twitter user's profile</td>
<td>Browser Automation</td>
<td>Slow</td>
</tr>
<tr>
<td><code>get_profile_details()</code></td>
<td>Scrapes a Twitter user's details.</td>
<td>HTTP Request</td>
<td>Fast</td>
</tr>
<tr>
<td><code>scrape_profile_with_api()</code></td>
<td>Scrapes a profile's tweets by Twitter username. It expects the username of the profile.</td>
<td>Browser Automation & HTTP Request</td>
<td>Fast</td>
</tr>
</table>
<p>
Note: the HTTP Request method sends requests directly to Twitter's API to collect data, while Browser Automation visits the page and scrolls through it while collecting data.</p>
</div>
<br>
<hr>
<h3 id="profileDetail">To scrape Twitter profile details:</h3>
<div id="profileDetailExample">
```python
from twitter_scraper_selenium import get_profile_details
twitter_username = "TwitterAPI"
filename = "twitter_api_data"
browser = "firefox"
headless = True
get_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)
```
Output:
```js
{
"id": 6253282,
"id_str": "6253282",
"name": "Twitter API",
"screen_name": "TwitterAPI",
"location": "San Francisco, CA",
"profile_location": null,
"description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
"url": "https:\/\/t.co\/8IkCzCDr19",
"entities": {
"url": {
"urls": [{
"url": "https:\/\/t.co\/8IkCzCDr19",
"expanded_url": "https:\/\/developer.twitter.com",
"display_url": "developer.twitter.com",
"indices": [
0,
23
]
}]
},
"description": {
"urls": []
}
},
"protected": false,
"followers_count": 6133636,
"friends_count": 12,
"listed_count": 12936,
"created_at": "Wed May 23 06:01:13 +0000 2007",
"favourites_count": 31,
"utc_offset": null,
"time_zone": null,
"geo_enabled": null,
"verified": true,
"statuses_count": 3656,
"lang": null,
"contributors_enabled": null,
"is_translator": null,
"is_translation_enabled": null,
"profile_background_color": null,
"profile_background_image_url": null,
"profile_background_image_url_https": null,
"profile_background_tile": null,
"profile_image_url": null,
"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/942858479592554497\/BbazLO9L_normal.jpg",
"profile_banner_url": null,
"profile_link_color": null,
"profile_sidebar_border_color": null,
"profile_sidebar_fill_color": null,
"profile_text_color": null,
"profile_use_background_image": null,
"has_extended_profile": null,
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null,
"translator_type": null
}
```
</div>
<br>
<div id="profileDetailArgument">
<p><code>get_profile_details()</code> arguments:</p>
<table>
<thead>
<tr>
<td>Argument</td>
<td>Argument Type</td>
<td>Description</td>
</tr>
</thead>
<tbody>
<tr>
<td>twitter_username</td>
<td>String</td>
<td>Twitter Username</td>
</tr>
<tr>
<td>output_filename</td>
<td>String</td>
<td>Filename for the stored output.</td>
</tr>
<tr>
<td>output_dir</td>
<td>String</td>
<td>Directory where the output file should be saved.</td>
</tr>
<tr>
<td>proxy</td>
<td>String</td>
<td>Optional. Proxy to use for scraping. For an authenticated proxy, use the format username:password@host:port.</td>
</tr>
</tbody>
</table>
</div>
<hr>
<br>
<div>
<h4 id="profileDetailKeys">Keys of the output:</h4>
<p>Detail of each key can be found <a href="https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user">here</a>.</p>
</div>
<br>
<hr>
<h3 id="profile">To scrape a profile's tweets:</h3>
<p id="profileJson">In JSON format:</p>
```python
from twitter_scraper_selenium import scrape_profile
microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
print(microsoft)
```
Output:
```javascript
{
"1430938749840629773": {
"tweet_id": "1430938749840629773",
"username": "Microsoft",
"name": "Microsoft",
"profile_picture": "https://twitter.com/Microsoft/photo",
"replies": 29,
"retweets": 58,
"likes": 453,
"is_retweet": false,
"retweet_link": "",
"posted_time": "2021-08-26T17:02:38+00:00",
"content": "Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.\n\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW ",
"hashtags": [],
"mentions": [],
"images": [],
"videos": [],
"tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773",
"link": "https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC"
},...
}
```
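The JSON output above can be processed with Python's standard `json` module. A minimal sketch, using a trimmed, hypothetical sample in the same shape as the output shown (not real scraped data):

```python
import json

# Hypothetical trimmed sample in the shape of the scrape_profile() output above.
sample = """{
  "1430938749840629773": {
    "tweet_id": "1430938749840629773",
    "username": "Microsoft",
    "likes": 453,
    "hashtags": []
  }
}"""

tweets = json.loads(sample)  # maps tweet_id -> tweet details
for tweet_id, tweet in tweets.items():
    print(f"{tweet_id}: @{tweet['username']} ({tweet['likes']} likes)")
```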
<hr>
<p id="profileCSV">In CSV format:</p>
```python
from twitter_scraper_selenium import scrape_profile
scrape_profile(twitter_username="microsoft",output_format="csv",browser="firefox",tweets_count=10,filename="microsoft",directory="/home/user/Downloads")
```
Output:
<br>
<table class="table table-bordered table-hover table-condensed" style="line-height: 14px;overflow:hidden;white-space: nowrap">
<thead><tr><th title="Field #1">tweet_id</th>
<th title="Field #2">username</th>
<th title="Field #3">name</th>
<th title="Field #4">profile_picture</th>
<th title="Field #5">replies</th>
<th title="Field #6">retweets</th>
<th title="Field #7">likes</th>
<th title="Field #8">is_retweet</th>
<th title="Field #9">retweet_link</th>
<th title="Field #10">posted_time</th>
<th title="Field #11">content</th>
<th title="Field #12">hashtags</th>
<th title="Field #13">mentions</th>
<th title="Field #14">images</th>
<th title="Field #15">videos</th>
<th title="Field #16">post_url</th>
<th title="Field #17">link</th>
</tr></thead>
<tbody><tr>
<td>1430938749840629773</td>
<td>Microsoft</td>
<td>Microsoft</td>
<td>https://twitter.com/Microsoft/photo</td>
<td align="right">64</td>
<td align="right">75</td>
<td align="right">521</td>
<td>False</td>
<td> </td>
<td>2021-08-26T17:02:38+00:00</td>
<td>Easy to use and efficient for all – Windows 11 is committed to an accessible future.<br/><br/>Here's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW </td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>https://twitter.com/Microsoft/status/1430938749840629773</td>
<td>https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC</td>
</tr>
</tbody>
</table>
<p>...</p>
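The CSV written by <code>scrape_profile()</code> can be read back with the standard <code>csv</code> module. A minimal sketch, using a hypothetical excerpt trimmed to a few of the columns shown above:

```python
import csv
import io

# Hypothetical excerpt of the CSV written by scrape_profile(),
# trimmed to a few of the columns shown above.
sample_csv = (
    "tweet_id,username,replies,retweets,likes\n"
    "1430938749840629773,Microsoft,64,75,521\n"
)

# In practice, replace io.StringIO(...) with open("microsoft.csv", newline="").
rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    print(row["tweet_id"], row["username"], row["likes"])
```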
<br><hr>
<div id="profileArgument">
<p><code>scrape_profile()</code> arguments:</p>
<table>
<thead>
<tr>
<td>Argument</td>
<td>Argument Type</td>
<td>Description</td>
</tr>
</thead>
<tbody>
<tr>
<td>twitter_username</td>
<td>String</td>
<td>Twitter username of the account</td>
</tr>
<tr>
<td>browser</td>
<td>String</td>
<td>Browser to use for scraping. Only Chrome and Firefox are supported. Default is Firefox.</td>
</tr>
<tr>
<td>proxy</td>
<td>String</td>
<td>Optional. Proxy to use for scraping. For an authenticated proxy, use the format username:password@host:port.</td>
</tr>
<tr>
<td>tweets_count</td>
<td>Integer</td>
<td>Number of posts to scrape. Default is 10.</td>
</tr>
<tr>
<td>output_format</td>
<td>String</td>
<td>The output format, whether JSON or CSV. Default is JSON.</td>
</tr>
<tr>
<td>filename</td>
<td>String</td>
<td>If output_format is set to CSV, the filename parameter should be passed. If not passed, the filename defaults to the given username.</td>
</tr>
<tr>
<td>directory</td>
<td>String</td>
<td>If output_format is set to CSV, the directory parameter may be passed. If not passed, the CSV file is saved in the current working directory.</td>
</tr>
<tr>
<td>headless</td>
<td>Boolean</td>
<td>Whether to run the browser in headless mode. Default is <code>True</code>.</td>
</tr>
</tbody>
</table>
</div>
<hr>
<br>
<div id="profileOutput">
<p>Keys of the output</p>
<table>
<thead>
<tr>
<td>Key</td>
<td>Type</td>
<td>Description</td>
</tr>
</thead>
<tbody>
<tr>
<td>tweet_id</td>
<td>String</td>
<td>Post identifier (an integer cast as a string)</td>
</tr>
<tr>
<td>username</td>
<td>String</td>
<td>Username of the profile</td>
</tr>
<tr>
<td>name</td>
<td>String</td>
<td>Name of the profile</td>
</tr>
<tr>
<td>profile_picture</td>
<td>String</td>
<td>Profile Picture link</td>
</tr>
<tr>
<td>replies</td>
<td>Integer</td>
<td>Number of replies to the tweet</td>
</tr>
<tr>
<td>retweets</td>
<td>Integer</td>
<td>Number of retweets of the tweet</td>
</tr>
<tr>
<td>likes</td>
<td>Integer</td>
<td>Number of likes of the tweet</td>
</tr>
<tr>
<td>is_retweet</td>
<td>Boolean</td>
<td>Is the tweet a retweet?</td>
</tr>
<tr>
<td>retweet_link</td>
<td>String</td>
<td>Link to the original tweet if it is a retweet; otherwise an empty string</td>
</tr>
<tr>
<td>posted_time</td>
<td>String</td>
<td>Time the tweet was posted, in ISO 8601 format</td>
</tr>
<tr>
<td>content</td>
<td>String</td>
<td>Content of the tweet as text</td>
</tr>
<tr>
<td>hashtags</td>
<td>Array</td>
<td>Hashtags present in the tweet, if any</td>
</tr>
<tr>
<td>mentions</td>
<td>Array</td>
<td>Mentions present in the tweet, if any</td>
</tr>
<tr>
<td>images</td>
<td>Array</td>
<td>Image links, if present in the tweet</td>
</tr>
<tr>
<td>videos</td>
<td>Array</td>
<td>Video links, if present in the tweet</td>
</tr>
<tr>
<td>tweet_url</td>
<td>String</td>
<td>URL of the tweet</td>
</tr>
<tr>
<td>link</td>
<td>String</td>
<td>External link inside the tweet, if any</td>
</tr>
</tbody>
</table>
</div>
<br>
<hr>
<div id="to-scrape-user-tweets-with-api">
<p>To scrape a profile's tweets with the API:</p>
```python
from twitter_scraper_selenium import scrape_profile_with_api
scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)
```
</div>
<br>
<div id="users_api_parameter">
<p><code>scrape_profile_with_api()</code> arguments:</p>
<table>
<thead>
<tr>
<td>Argument</td>
<td>Argument Type</td>
<td>Description</td>
</tr>
</thead>
<tbody>
<tr>
<td>username</td>
<td>String</td>
<td>Twitter profile username</td>
</tr>
<tr>
<td>tweets_count</td>
<td>Integer</td>
<td>Number of tweets to scrape.</td>
</tr>
<tr>
<td>output_filename</td>
<td>String</td>
<td>Filename for the stored output.</td>
</tr>
<tr>
<td>output_dir</td>
<td>String</td>
<td>Directory where the output file should be saved.</td>
</tr>
<tr>
<td>proxy</td>
<td>String</td>
<td>Optional. Proxy to use for scraping. For an authenticated proxy, use the format username:password@host:port.</td>
</tr>
<tr>
<td>browser</td>
<td>String</td>
<td>Browser to use for extracting the GraphQL key. Default is Firefox.</td>
</tr>
<tr>
<td>headless</td>
<td>Boolean</td>
<td>Whether to run the browser in headless mode</td>
</tr>
</tbody>
</table>
</div>
<br>
<div id="scrape_user_with_api_args_keys"> <p>Output:</p>
```js
{
"1608939190548598784": {
"tweet_url" : "https://twitter.com/elonmusk/status/1608939190548598784",
"tweet_details":{
...
},
"user_details":{
...
}
}, ...
}
```
</div>
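Each entry in the output above nests <code>tweet_details</code> and <code>user_details</code> under the tweet ID. A minimal sketch of walking that structure, over a hypothetical trimmed sample (the inner fields are placeholders, not the real schema):

```python
# Hypothetical trimmed sample in the shape of scrape_profile_with_api()'s output.
output = {
    "1608939190548598784": {
        "tweet_url": "https://twitter.com/elonmusk/status/1608939190548598784",
        "tweet_details": {"full_text": "sample text"},
        "user_details": {"screen_name": "elonmusk"},
    }
}

# Print the poster and URL for each scraped tweet.
for tweet_id, entry in output.items():
    print(entry["user_details"]["screen_name"], entry["tweet_url"])
```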
<br>
<hr>
</div>
<h3 id="proxy"> Using scraper with proxy (HTTP proxy) </h3>
<div id="unauthenticatedProxy">
<p>Just pass the <code>proxy</code> argument to the function.</p>
```python
from twitter_scraper_selenium import scrape_profile
scrape_profile("elonmusk", headless=False, proxy="66.115.38.247:5678", output_format="csv", filename="musk")  # in IP:PORT format
```
</div>
<br>
<div id="authenticatedProxy">
<p> Proxy that requires authentication: </p>
```python
from twitter_scraper_selenium import scrape_profile
microsoft_data = scrape_profile(twitter_username="microsoft", browser="chrome", tweets_count=10, output_format="json",
                                proxy="sajid:pass123@66.115.38.247:5678")  # username:password@IP:PORT
print(microsoft_data)
```
</div>
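Both proxy forms follow one pattern: <code>host:port</code>, optionally prefixed with <code>username:password@</code>. A small hypothetical helper (not part of this package) that builds the documented string:

```python
# Hypothetical helper (not part of twitter-scraper-selenium) that builds
# the proxy string in the format the `proxy` argument expects.
def format_proxy(host: str, port: int, username: str = None, password: str = None) -> str:
    if username and password:
        # Authenticated proxy: username:password@host:port
        return f"{username}:{password}@{host}:{port}"
    # Unauthenticated proxy: host:port
    return f"{host}:{port}"

print(format_proxy("66.115.38.247", 5678))                      # 66.115.38.247:5678
print(format_proxy("66.115.38.247", 5678, "sajid", "pass123"))  # sajid:pass123@66.115.38.247:5678
```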
<br>
<hr>
<div id="privacy">
<h2>Privacy</h2>
<p>
This scraper only scrapes public data available to unauthenticated users and cannot scrape anything private.
</p>
</div>
<br>
<hr>
<div id="license">
<h2>LICENSE</h2>
MIT
</div>
Raw data
{
"_id": null,
"home_page": "https://github.com/shaikhsajid1111/twitter-scraper-selenium",
"name": "twitter-scraper-selenium",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "web-scraping selenium social media twitter keyword twitter-profile twitter-keywords automation json csv twitter-hashtag hashtag",
"author": "Sajid Shaikh",
"author_email": "shaikhsajid3732@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/90/ad/69f7ef85b67c90ca62e4cf951813692139b7cdba3abe5efa6ff412016b57/twitter_scraper_selenium-6.2.2.tar.gz",
"platform": null,
"description": "<h1> Twitter scraper selenium </h1>\r\n<p> Python's package to scrape Twitter's front-end easily with selenium. </p>\r\n\r\n\r\n[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-360/)\r\n[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)\r\n\r\n<!--TABLE of contents-->\r\n<h2> Table of Contents </h2>\r\n<details open=\"open\">\r\n <summary>Table of Contents</summary>\r\n <ol>\r\n <li>\r\n <a href=\"#getting-started\">Getting Started</a>\r\n <ul>\r\n <li><a href=\"#Prerequisites\">Prerequisites</a></li>\r\n <li><a href=\"#Installation\">Installation</a>\r\n <ul>\r\n <li><a href=\"#sourceInstallation\">Installing from source</a></li>\r\n <li><a href=\"#pypiInstallation\">Installing with PyPI</a></li>\r\n </ul>\r\n </li>\r\n </ul>\r\n </li>\r\n <li><a href=\"#Usage\">Usage</a>\r\n <ul><li><a href=\"#availableFunction\">Available Functions in this package- Summary</a></li></ul>\r\n <ul><li><a href=\"#profileDetail\">Scraping profile's details</a>\r\n <ul>\r\n <li><a href=\"#profileDetailExample\">In JSON Format - Example</a></li>\r\n <li><a href=\"#profileDetailArgument\">Function Argument</a></li>\r\n <li><a href=\"#profileDetailKeys\">Keys of the output</a></li>\r\n </ul>\r\n </li></ul>\r\n <!---->\r\n <ul>\r\n <li><a href=\"#profile\">Scraping profile's tweets</a>\r\n <ul>\r\n <li><a href=\"#profileJson\">In JSON format - Example</a></li>\r\n <li><a href=\"#profileCSV\">In CSV format - Example</a></li>\r\n <li><a href=\"#profileArgument\">Function Arguments</a></li>\r\n <li><a href=\"#profileOutput\">Keys of the output data</a></li>\r\n </ul>\r\n <li><a href='#to-scrape-user-tweets-with-api'>Scraping user's tweet using API</a></li>\r\n <ul>\r\n <li><a 
href='#to-scrape-user-tweets-with-api'>In JSON format - Example</a></li>\r\n <li><a href='#users_api_parameter'>Function Arguments</a></li>\r\n <li><a href='#scrape_user_with_api_args_keys'>Keys of the output</a></li>\r\n </ul>\r\n <li><a href=\"#proxy\">Using scraper with proxy</a>\r\n <ul>\r\n <li><a href=\"#unauthenticatedProxy\">Unauthenticated Proxy</a></li>\r\n <li><a href=\"#authenticatedProxy\">Authenticated Proxy</a></li>\r\n </ul>\r\n </li>\r\n </li>\r\n </ul>\r\n </li>\r\n <li><a href=\"#privacy\">Privacy</a></li>\r\n <li><a href=\"#license\">License</a></li>\r\n </ol>\r\n</details>\r\n\r\n<!--TABLE of contents //-->\r\n<br>\r\n<hr>\r\n<h2 id=\"Prerequisites\">Prerequisites </h2>\r\n<li> Internet Connection </li>\r\n<li> Python 3.6+ </li>\r\n<li> Chrome or Firefox browser installed on your machine </li>\r\n<hr>\r\n<h2 id=\"Installation\"> Installation </h2>\r\n<h3 id=\"sourceInstallation\">Installing from the source</h3>\r\n<p>Download the source code or clone it with:<p>\r\n\r\n```\r\ngit clone https://github.com/shaikhsajid1111/twitter-scraper-selenium\r\n```\r\n\r\n<p>Open terminal inside the downloaded folder:</p>\r\n\r\n<br>\r\n\r\n```\r\n python3 setup.py install\r\n```\r\n\r\n<h3 id=\"pypiInstallation\">\r\nInstalling with <a href=\"https://pypi.org\">PyPI</a>\r\n</h3>\r\n\r\n```\r\npip3 install twitter-scraper-selenium\r\n```\r\n\r\n<hr>\r\n<h2 id=\"Usage\">\r\nUsage</h2>\r\n<h3 id=\"availableFunction\">Available Function In this Package - Summary</h3>\r\n<div>\r\n<table>\r\n<thead>\r\n<tr>\r\n<td>Function Name</td>\r\n<td>Function Description</td>\r\n<td>Scraping Method</td>\r\n<td>Scraping Speed</td>\r\n</tr>\r\n</thead>\r\n<tr>\r\n<td><code>scrape_profile()</code></td>\r\n<td>Scrape's Twitter user's profile tweets</td>\r\n<td>Browser Automation</td>\r\n<td>Slow</td>\r\n</tr>\r\n<tr>\r\n<td><code>get_profile_details()</code></td>\r\n<td>Scrape's Twitter user details.</td>\r\n<td>HTTP 
Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_profile_with_api()</code></td>\r\n<td>Scrape's Twitter tweets by twitter profile username. It expects the username of the profile</td>\r\n<td>Browser Automation & HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n</table>\r\n<p>\r\nNote: HTTP Request Method sends the request to Twitter's API directly for scraping data, and Browser Automation visits that page, scroll while collecting the data.</p>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"profileDetail\">To scrape twitter profile details:</h3>\r\n<div id=\"profileDetailExample\">\r\n\r\n```python\r\nfrom twitter_scraper_selenium import get_profile_details\r\n\r\ntwitter_username = \"TwitterAPI\"\r\nfilename = \"twitter_api_data\"\r\nbrowser = \"firefox\"\r\nheadless = True\r\nget_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)\r\n\r\n```\r\nOutput:\r\n```js\r\n{\r\n\t\"id\": 6253282,\r\n\t\"id_str\": \"6253282\",\r\n\t\"name\": \"Twitter API\",\r\n\t\"screen_name\": \"TwitterAPI\",\r\n\t\"location\": \"San Francisco, CA\",\r\n\t\"profile_location\": null,\r\n\t\"description\": \"The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? 
It's on my website.\",\r\n\t\"url\": \"https:\\/\\/t.co\\/8IkCzCDr19\",\r\n\t\"entities\": {\r\n\t\t\"url\": {\r\n\t\t\t\"urls\": [{\r\n\t\t\t\t\"url\": \"https:\\/\\/t.co\\/8IkCzCDr19\",\r\n\t\t\t\t\"expanded_url\": \"https:\\/\\/developer.twitter.com\",\r\n\t\t\t\t\"display_url\": \"developer.twitter.com\",\r\n\t\t\t\t\"indices\": [\r\n\t\t\t\t\t0,\r\n\t\t\t\t\t23\r\n\t\t\t\t]\r\n\t\t\t}]\r\n\t\t},\r\n\t\t\"description\": {\r\n\t\t\t\"urls\": []\r\n\t\t}\r\n\t},\r\n\t\"protected\": false,\r\n\t\"followers_count\": 6133636,\r\n\t\"friends_count\": 12,\r\n\t\"listed_count\": 12936,\r\n\t\"created_at\": \"Wed May 23 06:01:13 +0000 2007\",\r\n\t\"favourites_count\": 31,\r\n\t\"utc_offset\": null,\r\n\t\"time_zone\": null,\r\n\t\"geo_enabled\": null,\r\n\t\"verified\": true,\r\n\t\"statuses_count\": 3656,\r\n\t\"lang\": null,\r\n\t\"contributors_enabled\": null,\r\n\t\"is_translator\": null,\r\n\t\"is_translation_enabled\": null,\r\n\t\"profile_background_color\": null,\r\n\t\"profile_background_image_url\": null,\r\n\t\"profile_background_image_url_https\": null,\r\n\t\"profile_background_tile\": null,\r\n\t\"profile_image_url\": null,\r\n\t\"profile_image_url_https\": \"https:\\/\\/pbs.twimg.com\\/profile_images\\/942858479592554497\\/BbazLO9L_normal.jpg\",\r\n\t\"profile_banner_url\": null,\r\n\t\"profile_link_color\": null,\r\n\t\"profile_sidebar_border_color\": null,\r\n\t\"profile_sidebar_fill_color\": null,\r\n\t\"profile_text_color\": null,\r\n\t\"profile_use_background_image\": null,\r\n\t\"has_extended_profile\": null,\r\n\t\"default_profile\": false,\r\n\t\"default_profile_image\": false,\r\n\t\"following\": null,\r\n\t\"follow_request_sent\": null,\r\n\t\"notifications\": null,\r\n\t\"translator_type\": null\r\n}\r\n```\r\n</div>\r\n<br>\r\n<div id=\"profileDetailArgument\">\r\n<p><code>get_profile_details()</code> arguments:</p>\r\n\r\n<table>\r\n <thead>\r\n <tr>\r\n <td>Argument</td>\r\n <td>Argument Type</td>\r\n <td>Description</td>\r\n </tr>\r\n 
</thead>\r\n <tbody>\r\n <tr>\r\n <td>twitter_username</td>\r\n <td>String</td>\r\n <td>Twitter Username</td>\r\n </tr>\r\n <tr>\r\n <td>output_filename</td>\r\n <td>String</td>\r\n <td>What should be the filename where output is stored?.</td>\r\n </tr>\r\n <tr>\r\n <td>output_dir</td>\r\n <td>String</td>\r\n <td>What directory output file should be saved?</td>\r\n </tr>\r\n <tr>\r\n <td>proxy</td>\r\n <td>String</td>\r\n <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n </tr>\r\n </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div>\r\n<h4 id=\"profileDetailKeys\">Keys of the output:</p>\r\nDetail of each key can be found <a href=\"https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user\">here</a>.</h4>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"profile\">To scrape profile's tweets:</h3>\r\n<p id=\"profileJson\">In JSON format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nmicrosoft = scrape_profile(twitter_username=\"microsoft\",output_format=\"json\",browser=\"firefox\",tweets_count=10)\r\nprint(microsoft)\r\n```\r\nOutput:\r\n```javascript\r\n{\r\n \"1430938749840629773\": {\r\n \"tweet_id\": \"1430938749840629773\",\r\n \"username\": \"Microsoft\",\r\n \"name\": \"Microsoft\",\r\n \"profile_picture\": \"https://twitter.com/Microsoft/photo\",\r\n \"replies\": 29,\r\n \"retweets\": 58,\r\n \"likes\": 453,\r\n \"is_retweet\": false,\r\n \"retweet_link\": \"\",\r\n \"posted_time\": \"2021-08-26T17:02:38+00:00\",\r\n \"content\": \"Easy to use and efficient for all \\u2013 Windows 11 is committed to an accessible future.\\n\\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW \",\r\n \"hashtags\": [],\r\n \"mentions\": [],\r\n \"images\": [],\r\n \"videos\": [],\r\n \"tweet_url\": 
\"https://twitter.com/Microsoft/status/1430938749840629773\",\r\n \"link\": \"https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC\"\r\n },...\r\n}\r\n```\r\n<hr>\r\n<p id=\"profileCSV\">In CSV format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\n\r\nscrape_profile(twitter_username=\"microsoft\",output_format=\"csv\",browser=\"firefox\",tweets_count=10,filename=\"microsoft\",directory=\"/home/user/Downloads\")\r\n\r\n\r\n```\r\n\r\nOutput:\r\n<br>\r\n<table class=\"table table-bordered table-hover table-condensed\" style=\"line-height: 14px;overflow:hidden;white-space: nowrap\">\r\n<thead><tr><th title=\"Field #1\">tweet_id</th>\r\n<th title=\"Field #2\">username</th>\r\n<th title=\"Field #3\">name</th>\r\n<th title=\"Field #4\">profile_picture</th>\r\n<th title=\"Field #5\">replies</th>\r\n<th title=\"Field #6\">retweets</th>\r\n<th title=\"Field #7\">likes</th>\r\n<th title=\"Field #8\">is_retweet</th>\r\n<th title=\"Field #9\">retweet_link</th>\r\n<th title=\"Field #10\">posted_time</th>\r\n<th title=\"Field #11\">content</th>\r\n<th title=\"Field #12\">hashtags</th>\r\n<th title=\"Field #13\">mentions</th>\r\n<th title=\"Field #14\">images</th>\r\n<th title=\"Field #15\">videos</th>\r\n<th title=\"Field #16\">post_url</th>\r\n<th title=\"Field #17\">link</th>\r\n</tr></thead>\r\n<tbody><tr>\r\n<td>1430938749840629773</td>\r\n<td>Microsoft</td>\r\n<td>Microsoft</td>\r\n<td>https://twitter.com/Microsoft/photo</td>\r\n<td align=\"right\">64</td>\r\n<td align=\"right\">75</td>\r\n<td align=\"right\">521</td>\r\n<td>False</td>\r\n<td> </td>\r\n<td>2021-08-26T17:02:38+00:00</td>\r\n<td>Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.<br/><br/>Here's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW 
</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>https://twitter.com/Microsoft/status/1430938749840629773</td>\r\n<td>https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC</td>\r\n</tr>\r\n\r\n</tbody>\r\n</table>\r\n<p>...</p>\r\n\r\n<br><hr>\r\n<div id=\"profileArgument\">\r\n<p><code>scrape_profile()</code> arguments:</p>\r\n\r\n<table>\r\n <thead>\r\n <tr>\r\n <td>Argument</td>\r\n <td>Argument Type</td>\r\n <td>Description</td>\r\n </tr>\r\n </thead>\r\n <tbody>\r\n <tr>\r\n <td>twitter_username</td>\r\n <td>String</td>\r\n <td>Twitter username of the account</td>\r\n </tr>\r\n <tr>\r\n <td>browser</td>\r\n <td>String</td>\r\n <td>Which browser to use for scraping?, Only 2 are supported Chrome and Firefox. Default is set to Firefox</td>\r\n </tr>\r\n <tr>\r\n <td>proxy</td>\r\n <td>String</td>\r\n <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n </tr>\r\n <tr>\r\n <td>tweets_count</td>\r\n <td>Integer</td>\r\n <td>Number of posts to scrape. Default is 10.</td>\r\n </tr>\r\n <tr>\r\n <td>output_format</td>\r\n <td>String</td>\r\n <td>The output format, whether JSON or CSV. Default is JSON.</td>\r\n </tr>\r\n <tr>\r\n <td>filename</td>\r\n <td>String</td>\r\n <td>If output parameter is set to CSV, then it is necessary for filename parameter to passed. If not passed then the filename will be same as username passed.</td>\r\n </tr>\r\n <tr>\r\n <td>directory</td>\r\n <td>String</td>\r\n <td>If output_format parameter is set to CSV, then it is valid for directory parameter to be passed. If not passed then CSV file will be saved in current working directory.</td>\r\n </tr>\r\n <tr>\r\n <td>headless</td>\r\n <td>Boolean</td>\r\n <td>Whether to run crawler headlessly?. 
Default is <code>True</code></td>\r\n </tr>\r\n </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div id=\"profileOutput\">\r\n<p>Keys of the output</p>\r\n\r\n<table>\r\n <thead>\r\n <tr>\r\n <td>Key</td>\r\n <td>Type</td>\r\n <td>Description</td>\r\n </tr>\r\n </thead>\r\n <tbody>\r\n <tr>\r\n <td>tweet_id</td>\r\n <td>String</td>\r\n <td>Post Identifier(integer casted inside string)</td>\r\n </tr>\r\n <tr>\r\n <td>username</td>\r\n <td>String</td>\r\n <td>Username of the profile</td>\r\n </tr>\r\n <tr>\r\n <td>name</td>\r\n <td>String</td>\r\n <td>Name of the profile</td>\r\n </tr>\r\n <tr>\r\n <td>profile_picture</td>\r\n <td>String</td>\r\n <td>Profile Picture link</td>\r\n </tr>\r\n <tr>\r\n <td>replies</td>\r\n <td>Integer</td>\r\n <td>Number of replies of tweet</td>\r\n </tr>\r\n <tr>\r\n <td>retweets</td>\r\n <td>Integer</td>\r\n <td>Number of retweets of tweet</td>\r\n </tr>\r\n <tr>\r\n <td>likes</td>\r\n <td>Integer</td>\r\n <td>Number of likes of tweet</td>\r\n </tr>\r\n <tr>\r\n <td>is_retweet</td>\r\n <td>boolean</td>\r\n <td>Is the tweet a retweet?</td>\r\n </tr>\r\n <tr>\r\n <td>retweet_link</td>\r\n <td>String</td>\r\n <td>If it is retweet, then the retweet link else it'll be empty string</td>\r\n </tr>\r\n <tr>\r\n <td>posted_time</td>\r\n <td>String</td>\r\n <td>Time when tweet was posted in ISO 8601 format</td>\r\n </tr>\r\n <tr>\r\n <td>content</td>\r\n <td>String</td>\r\n <td>content of tweet as text</td>\r\n </tr>\r\n <tr>\r\n <td>hashtags</td>\r\n <td>Array</td>\r\n <td>Hashtags presents in tweet, if they're present in tweet</td>\r\n </tr>\r\n <tr>\r\n <td>mentions</td>\r\n <td>Array</td>\r\n <td>Mentions presents in tweet, if they're present in tweet</td>\r\n </tr>\r\n <tr>\r\n <td>images</td>\r\n <td>Array</td>\r\n <td>Images links, if they're present in tweet</td>\r\n </tr>\r\n <tr>\r\n <td>videos</td>\r\n <td>Array</td>\r\n <td>Videos links, if they're present in tweet</td>\r\n </tr>\r\n <tr>\r\n <td>tweet_url</td>\r\n 
<td>String</td>\r\n <td>URL of the tweet</td>\r\n </tr>\r\n <tr>\r\n <td>link</td>\r\n <td>String</td>\r\n <td>If any link is present inside tweet for some external website. </td>\r\n </tr>\r\n </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"to-scrape-user-tweets-with-api\">\r\n\r\n<p>To Scrap profile's tweets with API:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile_with_api\r\n\r\nscrape_profile_with_api('elonmusk', output_filename='musk', tweets_count= 100)\r\n```\r\n</div>\r\n<br>\r\n<div id=\"users_api_parameter\">\r\n<p><code>scrape_profile_with_api()</code> Arguments:<p>\r\n<table>\r\n <thead>\r\n <tr>\r\n <td>Argument</td>\r\n <td>Argument Type</td>\r\n <td>Description</td>\r\n </tr>\r\n </thead>\r\n <tbody>\r\n <tr>\r\n <td>username</td>\r\n <td>String</td>\r\n <td>Twitter's Profile username</td>\r\n </tr>\r\n <tr>\r\n <td>tweets_count</td>\r\n <td>Integer</td>\r\n <td>Number of tweets to scrape.</td>\r\n </tr>\r\n <tr>\r\n <td>output_filename</td>\r\n <td>String</td>\r\n <td>What should be the filename where output is stored?.</td>\r\n </tr>\r\n <tr>\r\n <td>output_dir</td>\r\n <td>String</td>\r\n <td>What directory output file should be saved?</td>\r\n </tr>\r\n <tr>\r\n <td>proxy</td>\r\n <td>String</td>\r\n <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n </tr>\r\n <tr>\r\n <td>browser</td>\r\n <td>String</td>\r\n <td>Which browser to use for extracting out graphql key. 
Default is firefox.</td>\r\n </tr>\r\n <tr>\r\n <td>headless</td>\r\n <td>Boolean</td>\r\n <td>Whether to run the browser in headless mode.</td>\r\n </tr>\r\n </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<div id=\"scrape_user_with_api_args_keys\"> <p>Output:</p>\r\n\r\n```js\r\n{\r\n \"1608939190548598784\": {\r\n \"tweet_url\" : \"https://twitter.com/elonmusk/status/1608939190548598784\",\r\n \"tweet_details\":{\r\n ...\r\n },\r\n \"user_details\":{\r\n ...\r\n }\r\n }, ...\r\n}\r\n```\r\n\r\n</div>\r\n<br>\r\n<hr>\r\n</div>\r\n\r\n<h3 id=\"proxy\"> Using the scraper with a proxy (HTTP proxy) </h3>\r\n\r\n<div id=\"unauthenticatedProxy\">\r\n<p>Just pass the <code>proxy</code> argument to the function.</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nscrape_profile(\"elonmusk\", headless=False, proxy=\"66.115.38.247:5678\", output_format=\"csv\", filename=\"musk\")  # In IP:PORT format\r\n\r\n```\r\n</div>\r\n\r\n<br>\r\n<div id=\"authenticatedProxy\">\r\n<p> Proxy that requires authentication: </p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nmicrosoft_data = scrape_profile(twitter_username=\"microsoft\", browser=\"chrome\", tweets_count=10, output=\"json\",\r\n proxy=\"sajid:pass123@66.115.38.247:5678\")  # username:password@IP:PORT\r\nprint(microsoft_data)\r\n```\r\n\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"privacy\">\r\n<h2>Privacy</h2>\r\n\r\n<p>\r\nThis scraper only scrapes public data available to an unauthenticated user and does not have the capability to scrape anything private.\r\n</p>\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"license\">\r\n<h2>LICENSE</h2>\r\n\r\nMIT\r\n</div>\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Python package to scrape Twitter's front-end easily with selenium",
"version": "6.2.2",
"project_urls": {
"Homepage": "https://github.com/shaikhsajid1111/twitter-scraper-selenium"
},
"split_keywords": [
"web-scraping",
"selenium",
"social",
"media",
"twitter",
"keyword",
"twitter-profile",
"twitter-keywords",
"automation",
"json",
"csv",
"twitter-hashtag",
"hashtag"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f5f4264ead446b60e811352f93f79cc526e60480125c2d2910458b8f522d1aa9",
"md5": "73f2c2e7262eb774e0c6e8b029429b7c",
"sha256": "b8ae2d4df81ce1260955567af7f1bb6acf99b6988c9e1ff0db493fb3b5a9bd02"
},
"downloads": -1,
"filename": "twitter_scraper_selenium-6.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "73f2c2e7262eb774e0c6e8b029429b7c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 32632,
"upload_time": "2024-09-07T14:56:37",
"upload_time_iso_8601": "2024-09-07T14:56:37.216705Z",
"url": "https://files.pythonhosted.org/packages/f5/f4/264ead446b60e811352f93f79cc526e60480125c2d2910458b8f522d1aa9/twitter_scraper_selenium-6.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "90ad69f7ef85b67c90ca62e4cf951813692139b7cdba3abe5efa6ff412016b57",
"md5": "a136d2eeb0ef58de54a1c17874dc1ef5",
"sha256": "a8f5886dac3055967cf001ddfc6e8202844dc3ee9a9595bdf1477af1344f75f3"
},
"downloads": -1,
"filename": "twitter_scraper_selenium-6.2.2.tar.gz",
"has_sig": false,
"md5_digest": "a136d2eeb0ef58de54a1c17874dc1ef5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 27296,
"upload_time": "2024-09-07T14:56:39",
"upload_time_iso_8601": "2024-09-07T14:56:39.094627Z",
"url": "https://files.pythonhosted.org/packages/90/ad/69f7ef85b67c90ca62e4cf951813692139b7cdba3abe5efa6ff412016b57/twitter_scraper_selenium-6.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-07 14:56:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "shaikhsajid1111",
"github_project": "twitter-scraper-selenium",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "twitter-scraper-selenium"
}