twitter-scraper-selenium

- Name: twitter-scraper-selenium
- Version: 6.2.2 (PyPI)
- Home page: https://github.com/shaikhsajid1111/twitter-scraper-selenium
- Summary: Python package to scrape Twitter's front-end easily with selenium
- Upload time: 2024-09-07 14:56:39
- Author: Sajid Shaikh
- Requires Python: >=3.8
- License: MIT
- Keywords: web-scraping selenium social media twitter keyword twitter-profile twitter-keywords automation json csv twitter-hashtag hashtag
            <h1> Twitter scraper selenium </h1>
<p> Python package to scrape Twitter's front-end easily with Selenium. </p>


[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-360/)
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)

<!--TABLE of contents-->
<h2> Table of Contents </h2>
<details open="open">
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#Prerequisites">Prerequisites</a></li>
        <li><a href="#Installation">Installation</a>
        <ul>
        <li><a href="#sourceInstallation">Installing from source</a></li>
        <li><a href="#pypiInstallation">Installing with PyPI</a></li>
        </ul>
        </li>
      </ul>
    </li>
    <li><a href="#Usage">Usage</a>
    <ul><li><a href="#availableFunction">Available Functions in this Package - Summary</a></li></ul>
    <ul><li><a href="#profileDetail">Scraping profile's details</a>
    <ul>
    <li><a href="#profileDetailExample">In JSON Format - Example</a></li>
    <li><a href="#profileDetailArgument">Function Argument</a></li>
    <li><a href="#profileDetailKeys">Keys of the output</a></li>
    </ul>
    </li></ul>
    <!---->
    <ul>
    <li><a href="#profile">Scraping profile's tweets</a>
    <ul>
    <li><a href="#profileJson">In JSON format - Example</a></li>
    <li><a href="#profileCSV">In CSV format - Example</a></li>
    <li><a href="#profileArgument">Function Arguments</a></li>
    <li><a href="#profileOutput">Keys of the output data</a></li>
    </ul>
    <li><a href='#to-scrape-user-tweets-with-api'>Scraping a user's tweets using the API</a></li>
    <ul>
    <li><a href='#to-scrape-user-tweets-with-api'>In JSON format - Example</a></li>
    <li><a href='#users_api_parameter'>Function Arguments</a></li>
    <li><a href='#scrape_user_with_api_args_keys'>Keys of the output</a></li>
    </ul>
    <li><a href="#proxy">Using scraper with proxy</a>
    <ul>
    <li><a href="#unauthenticatedProxy">Unauthenticated Proxy</a></li>
    <li><a href="#authenticatedProxy">Authenticated Proxy</a></li>
    </ul>
    </li>
    </li>
    </ul>
    </li>
    <li><a href="#privacy">Privacy</a></li>
    <li><a href="#license">License</a></li>
  </ol>
</details>

<!--TABLE of contents //-->
<br>
<hr>
<h2 id="Prerequisites">Prerequisites </h2>
<ul>
<li> Internet connection </li>
<li> Python 3.8+ </li>
<li> Chrome or Firefox browser installed on your machine </li>
</ul>
<hr>
<h2 id="Installation"> Installation </h2>
<h3 id="sourceInstallation">Installing from the source</h3>
<p>Download the source code or clone it with:</p>

```
git clone https://github.com/shaikhsajid1111/twitter-scraper-selenium
```

<p>Open a terminal inside the downloaded folder and run:</p>

<br>

```
python3 setup.py install
```

<h3 id="pypiInstallation">
Installing with <a href="https://pypi.org">PyPI</a>
</h3>

```
pip3 install twitter-scraper-selenium
```

<hr>
<h2 id="Usage">
Usage</h2>
<h3 id="availableFunction">Available Functions in this Package - Summary</h3>
<div>
<table>
<thead>
<tr>
<td>Function Name</td>
<td>Function Description</td>
<td>Scraping Method</td>
<td>Scraping Speed</td>
</tr>
</thead>
<tr>
<td><code>scrape_profile()</code></td>
<td>Scrapes tweets from a Twitter user's profile</td>
<td>Browser Automation</td>
<td>Slow</td>
</tr>
<tr>
<td><code>get_profile_details()</code></td>
<td>Scrapes a Twitter user's profile details</td>
<td>HTTP Request</td>
<td>Fast</td>
</tr>
<tr>
<td><code>scrape_profile_with_api()</code></td>
<td>Scrapes a Twitter profile's tweets by username. It expects the username of the profile</td>
<td>Browser Automation & HTTP Request</td>
<td>Fast</td>
</tr>
</table>
<p>
Note: the HTTP Request method sends requests directly to Twitter's API to fetch data, while Browser Automation visits the page and scrolls through it while collecting data.</p>
</div>
<br>
<hr>
<h3 id="profileDetail">To scrape Twitter profile details:</h3>
<div id="profileDetailExample">

```python
from twitter_scraper_selenium import get_profile_details

twitter_username = "TwitterAPI"
filename = "twitter_api_data"
browser = "firefox"
headless = True
get_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)

```
Output:
```js
{
	"id": 6253282,
	"id_str": "6253282",
	"name": "Twitter API",
	"screen_name": "TwitterAPI",
	"location": "San Francisco, CA",
	"profile_location": null,
	"description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
	"url": "https:\/\/t.co\/8IkCzCDr19",
	"entities": {
		"url": {
			"urls": [{
				"url": "https:\/\/t.co\/8IkCzCDr19",
				"expanded_url": "https:\/\/developer.twitter.com",
				"display_url": "developer.twitter.com",
				"indices": [
					0,
					23
				]
			}]
		},
		"description": {
			"urls": []
		}
	},
	"protected": false,
	"followers_count": 6133636,
	"friends_count": 12,
	"listed_count": 12936,
	"created_at": "Wed May 23 06:01:13 +0000 2007",
	"favourites_count": 31,
	"utc_offset": null,
	"time_zone": null,
	"geo_enabled": null,
	"verified": true,
	"statuses_count": 3656,
	"lang": null,
	"contributors_enabled": null,
	"is_translator": null,
	"is_translation_enabled": null,
	"profile_background_color": null,
	"profile_background_image_url": null,
	"profile_background_image_url_https": null,
	"profile_background_tile": null,
	"profile_image_url": null,
	"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/942858479592554497\/BbazLO9L_normal.jpg",
	"profile_banner_url": null,
	"profile_link_color": null,
	"profile_sidebar_border_color": null,
	"profile_sidebar_fill_color": null,
	"profile_text_color": null,
	"profile_use_background_image": null,
	"has_extended_profile": null,
	"default_profile": false,
	"default_profile_image": false,
	"following": null,
	"follow_request_sent": null,
	"notifications": null,
	"translator_type": null
}
```
</div>
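The saved JSON can be post-processed with nothing but the standard library. A minimal sketch, using a trimmed, hand-copied subset of the output shown above (not a live call):

```python
import json

# Trimmed subset of the get_profile_details() output shown above
profile_json = """
{
    "id_str": "6253282",
    "screen_name": "TwitterAPI",
    "followers_count": 6133636,
    "friends_count": 12,
    "verified": true
}
"""

profile = json.loads(profile_json)

# Pick out a few commonly used fields
summary = {
    "handle": "@" + profile["screen_name"],
    "followers": profile["followers_count"],
    "verified": profile["verified"],
}
print(summary)
```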
<br>
<div id="profileDetailArgument">
<p><code>get_profile_details()</code> arguments:</p>

<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>twitter_username</td>
            <td>String</td>
            <td>Twitter Username</td>
        </tr>
        <tr>
            <td>output_filename</td>
            <td>String</td>
            <td>Filename where the output will be stored.</td>
        </tr>
        <tr>
            <td>output_dir</td>
            <td>String</td>
            <td>Directory where the output file will be saved.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use while scraping. If the proxy requires authentication, use the format username:password@host:port.</td>
        </tr>
    </tbody>
</table>

</div>
<hr>
<br>
<div>
<h4 id="profileDetailKeys">Keys of the output</h4>
<p>Details of each key can be found <a href="https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user">here</a>.</p>
</div>
<br>
<hr>
<h3 id="profile">To scrape profile's tweets:</h3>
<p id="profileJson">In JSON format:</p>

```python
from twitter_scraper_selenium import scrape_profile

microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
print(microsoft)
```
Output:
```javascript
{
  "1430938749840629773": {
    "tweet_id": "1430938749840629773",
    "username": "Microsoft",
    "name": "Microsoft",
    "profile_picture": "https://twitter.com/Microsoft/photo",
    "replies": 29,
    "retweets": 58,
    "likes": 453,
    "is_retweet": false,
    "retweet_link": "",
    "posted_time": "2021-08-26T17:02:38+00:00",
    "content": "Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.\n\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW ",
    "hashtags": [],
    "mentions": [],
    "images": [],
    "videos": [],
    "tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773",
    "link": "https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC"
  },...
}
```
<hr>
<p id="profileCSV">In CSV format:</p>

```python
from twitter_scraper_selenium import scrape_profile


scrape_profile(twitter_username="microsoft",output_format="csv",browser="firefox",tweets_count=10,filename="microsoft",directory="/home/user/Downloads")


```

Output:
<br>
<table class="table table-bordered table-hover table-condensed" style="line-height: 14px;overflow:hidden;white-space: nowrap">
<thead><tr><th title="Field #1">tweet_id</th>
<th title="Field #2">username</th>
<th title="Field #3">name</th>
<th title="Field #4">profile_picture</th>
<th title="Field #5">replies</th>
<th title="Field #6">retweets</th>
<th title="Field #7">likes</th>
<th title="Field #8">is_retweet</th>
<th title="Field #9">retweet_link</th>
<th title="Field #10">posted_time</th>
<th title="Field #11">content</th>
<th title="Field #12">hashtags</th>
<th title="Field #13">mentions</th>
<th title="Field #14">images</th>
<th title="Field #15">videos</th>
<th title="Field #16">post_url</th>
<th title="Field #17">link</th>
</tr></thead>
<tbody><tr>
<td>1430938749840629773</td>
<td>Microsoft</td>
<td>Microsoft</td>
<td>https://twitter.com/Microsoft/photo</td>
<td align="right">64</td>
<td align="right">75</td>
<td align="right">521</td>
<td>False</td>
<td> </td>
<td>2021-08-26T17:02:38+00:00</td>
<td>Easy to use and efficient for all – Windows 11 is committed to an accessible future.<br/><br/>Here&#39;s how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW </td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>https://twitter.com/Microsoft/status/1430938749840629773</td>
<td>https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC</td>
</tr>

</tbody>
</table>
<p>...</p>

<br><hr>
<div id="profileArgument">
<p><code>scrape_profile()</code> arguments:</p>

<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>twitter_username</td>
            <td>String</td>
            <td>Twitter username of the account</td>
        </tr>
        <tr>
            <td>browser</td>
            <td>String</td>
            <td>Which browser to use for scraping. Only Chrome and Firefox are supported. Default is Firefox.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use while scraping. If the proxy requires authentication, use the format username:password@host:port.</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of posts to scrape. Default is 10.</td>
        </tr>
        <tr>
            <td>output_format</td>
            <td>String</td>
            <td>The output format, whether JSON or CSV. Default is JSON.</td>
        </tr>
        <tr>
            <td>filename</td>
            <td>String</td>
            <td>If output_format is set to CSV, the name of the output file. Defaults to the username if not passed.</td>
        </tr>
        <tr>
            <td>directory</td>
            <td>String</td>
            <td>If output_format is set to CSV, the directory where the file will be saved. Defaults to the current working directory if not passed.</td>
        </tr>
        <tr>
            <td>headless</td>
            <td>Boolean</td>
            <td>Whether to run the browser in headless mode. Default is <code>True</code>.</td>
        </tr>
    </tbody>
</table>

</div>
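The arguments above can be collected into a dict and unpacked into the call. A sketch with hypothetical values (the call itself is commented out because it launches a real browser):

```python
# Hypothetical settings for a CSV run, per the argument table above
options = {
    "twitter_username": "microsoft",
    "browser": "firefox",        # "chrome" or "firefox"; default "firefox"
    "tweets_count": 25,          # default is 10
    "output_format": "csv",      # "json" or "csv"; default "json"
    "filename": "microsoft",     # used for the CSV output file
    "directory": ".",            # defaults to the current working directory
    "headless": True,            # default is True
}

# from twitter_scraper_selenium import scrape_profile
# scrape_profile(**options)
print(sorted(options))
```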
<hr>
<br>
<div id="profileOutput">
<p>Keys of the output</p>

<table>
    <thead>
        <tr>
            <td>Key</td>
            <td>Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>tweet_id</td>
            <td>String</td>
            <td>Post identifier (an integer cast to a string)</td>
        </tr>
        <tr>
            <td>username</td>
            <td>String</td>
            <td>Username of the profile</td>
        </tr>
        <tr>
            <td>name</td>
            <td>String</td>
            <td>Name of the profile</td>
        </tr>
        <tr>
            <td>profile_picture</td>
            <td>String</td>
            <td>Profile Picture link</td>
        </tr>
        <tr>
            <td>replies</td>
            <td>Integer</td>
            <td>Number of replies on the tweet</td>
        </tr>
        <tr>
            <td>retweets</td>
            <td>Integer</td>
            <td>Number of retweets of the tweet</td>
        </tr>
        <tr>
            <td>likes</td>
            <td>Integer</td>
            <td>Number of likes on the tweet</td>
        </tr>
        <tr>
            <td>is_retweet</td>
            <td>Boolean</td>
            <td>Is the tweet a retweet?</td>
        </tr>
        <tr>
            <td>retweet_link</td>
            <td>String</td>
            <td>If the tweet is a retweet, the retweet link; otherwise an empty string</td>
        </tr>
        <tr>
            <td>posted_time</td>
            <td>String</td>
            <td>Time when the tweet was posted, in ISO 8601 format</td>
        </tr>
        <tr>
            <td>content</td>
            <td>String</td>
            <td>Content of the tweet as text</td>
        </tr>
        <tr>
            <td>hashtags</td>
            <td>Array</td>
            <td>Hashtags present in the tweet, if any</td>
        </tr>
        <tr>
            <td>mentions</td>
            <td>Array</td>
            <td>Mentions present in the tweet, if any</td>
        </tr>
        <tr>
            <td>images</td>
            <td>Array</td>
            <td>Image links, if present in the tweet</td>
        </tr>
        <tr>
            <td>videos</td>
            <td>Array</td>
            <td>Video links, if present in the tweet</td>
        </tr>
        <tr>
            <td>tweet_url</td>
            <td>String</td>
            <td>URL of the tweet</td>
        </tr>
        <tr>
            <td>link</td>
            <td>String</td>
            <td>External website link included in the tweet, if any</td>
        </tr>
    </tbody>
</table>
</div>
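Using the keys above, the JSON output can be reduced to simple engagement stats. A sketch with a hand-made sample in the same shape as the <code>scrape_profile()</code> output (values are illustrative, not scraped):

```python
import json

# Hand-made sample mimicking the scrape_profile() JSON output shape
tweets_json = """
{
  "1430938749840629773": {
    "tweet_id": "1430938749840629773",
    "username": "Microsoft",
    "replies": 29,
    "retweets": 58,
    "likes": 453,
    "is_retweet": false,
    "hashtags": []
  },
  "1430938749840629774": {
    "tweet_id": "1430938749840629774",
    "username": "Microsoft",
    "replies": 3,
    "retweets": 7,
    "likes": 41,
    "is_retweet": true,
    "hashtags": ["#Windows11"]
  }
}
"""

tweets = json.loads(tweets_json)

# Aggregate likes and keep only original (non-retweet) tweet ids
total_likes = sum(t["likes"] for t in tweets.values())
originals = [t["tweet_id"] for t in tweets.values() if not t["is_retweet"]]
print(total_likes, originals)
```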
<br>
<hr>
<div id="to-scrape-user-tweets-with-api">

<p>To scrape a profile's tweets with the API:</p>

```python
from twitter_scraper_selenium import scrape_profile_with_api

scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count= 100)
```
</div>
<br>
<div id="users_api_parameter">
<p><code>scrape_profile_with_api()</code> arguments:</p>
<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>username</td>
            <td>String</td>
            <td>Twitter profile username</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of tweets to scrape.</td>
        </tr>
        <tr>
            <td>output_filename</td>
            <td>String</td>
            <td>Filename where the output will be stored.</td>
        </tr>
        <tr>
            <td>output_dir</td>
            <td>String</td>
            <td>Directory where the output file will be saved.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use while scraping. If the proxy requires authentication, use the format username:password@host:port.</td>
        </tr>
        <tr>
            <td>browser</td>
            <td>String</td>
            <td>Which browser to use for extracting the GraphQL key. Default is Firefox.</td>
        </tr>
        <tr>
            <td>headless</td>
            <td>Boolean</td>
            <td>Whether to run the browser in headless mode.</td>
        </tr>
    </tbody>
</table>
</div>
<br>
<div id="scrape_user_with_api_args_keys"> <p>Output:</p>

```js
{
  "1608939190548598784": {
    "tweet_url" : "https://twitter.com/elonmusk/status/1608939190548598784",
    "tweet_details":{
      ...
    },
    "user_details":{
      ...
    }
  }, ...
}
```

</div>
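The nested structure above can be walked like any Python dict. A sketch with placeholder data; the inner keys of <code>tweet_details</code> and <code>user_details</code> are elided in the output above, so the field names below are illustrative only, not the library's actual keys:

```python
# Illustrative data in the shape of the scrape_profile_with_api() output;
# the inner keys here are placeholders, not the library's actual field names.
data = {
    "1608939190548598784": {
        "tweet_url": "https://twitter.com/elonmusk/status/1608939190548598784",
        "tweet_details": {"text": "example tweet"},
        "user_details": {"screen_name": "elonmusk"},
    }
}

# Collect every tweet URL from the nested structure
urls = [entry["tweet_url"] for entry in data.values()]
print(urls)
```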
<br>
<hr>
</div>

<h3 id="proxy"> Using scraper with proxy (http proxy) </h3>

<div id="unauthenticatedProxy">
<p>Just pass the <code>proxy</code> argument to the function.</p>

```python
from twitter_scraper_selenium import scrape_profile

scrape_profile("elonmusk", headless=False, proxy="66.115.38.247:5678", output_format="csv",filename="musk") #In IP:PORT format

```
</div>

<br>
<div id="authenticatedProxy">
<p> Proxy that requires authentication: </p>

```python

from twitter_scraper_selenium import scrape_profile

microsoft_data = scrape_profile(twitter_username="microsoft", browser="chrome", tweets_count=10, output_format="json",
                      proxy="sajid:pass123@66.115.38.247:5678")  # username:password@IP:PORT
print(microsoft_data)


```

</div>
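If the proxy password itself contains characters like <code>@</code> or <code>:</code>, percent-encoding the credentials before building the <code>username:password@host:port</code> string may help keep it parseable. A sketch; whether the library accepts percent-encoded credentials is an assumption worth verifying:

```python
from urllib.parse import quote

def build_proxy(username, password, host, port):
    """Build a proxy string in the username:password@host:port format,
    percent-encoding credentials so '@' and ':' don't break parsing."""
    return f"{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"

proxy = build_proxy("sajid", "p@ss:123", "66.115.38.247", 5678)
print(proxy)  # sajid:p%40ss%3A123@66.115.38.247:5678
```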
<br>
<hr>
<div id="privacy">
<h2>Privacy</h2>

<p>
This scraper only collects public data that is available to unauthenticated users; it cannot scrape anything private.
</p>
</div>
<br>
<hr>
<div id="license">
<h2>LICENSE</h2>

MIT
</div>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/shaikhsajid1111/twitter-scraper-selenium",
    "name": "twitter-scraper-selenium",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "web-scraping selenium social media twitter keyword twitter-profile twitter-keywords automation json csv twitter-hashtag hashtag",
    "author": "Sajid Shaikh",
    "author_email": "shaikhsajid3732@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/90/ad/69f7ef85b67c90ca62e4cf951813692139b7cdba3abe5efa6ff412016b57/twitter_scraper_selenium-6.2.2.tar.gz",
    "platform": null,
    "description": "<h1> Twitter scraper selenium </h1>\r\n<p> Python's package to scrape Twitter's front-end easily with selenium.  </p>\r\n\r\n\r\n[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-360/)\r\n[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)\r\n\r\n<!--TABLE of contents-->\r\n<h2> Table of Contents </h2>\r\n<details open=\"open\">\r\n  <summary>Table of Contents</summary>\r\n  <ol>\r\n    <li>\r\n      <a href=\"#getting-started\">Getting Started</a>\r\n      <ul>\r\n        <li><a href=\"#Prerequisites\">Prerequisites</a></li>\r\n        <li><a href=\"#Installation\">Installation</a>\r\n        <ul>\r\n        <li><a href=\"#sourceInstallation\">Installing from source</a></li>\r\n        <li><a href=\"#pypiInstallation\">Installing with PyPI</a></li>\r\n        </ul>\r\n        </li>\r\n      </ul>\r\n    </li>\r\n    <li><a href=\"#Usage\">Usage</a>\r\n    <ul><li><a href=\"#availableFunction\">Available Functions in this package- Summary</a></li></ul>\r\n    <ul><li><a href=\"#profileDetail\">Scraping profile's details</a>\r\n    <ul>\r\n    <li><a href=\"#profileDetailExample\">In JSON Format - Example</a></li>\r\n    <li><a href=\"#profileDetailArgument\">Function Argument</a></li>\r\n    <li><a href=\"#profileDetailKeys\">Keys of the output</a></li>\r\n    </ul>\r\n    </li></ul>\r\n    <!---->\r\n    <ul>\r\n    <li><a href=\"#profile\">Scraping profile's tweets</a>\r\n    <ul>\r\n    <li><a href=\"#profileJson\">In JSON format - Example</a></li>\r\n    <li><a href=\"#profileCSV\">In CSV format - Example</a></li>\r\n    <li><a href=\"#profileArgument\">Function Arguments</a></li>\r\n    <li><a href=\"#profileOutput\">Keys of the output data</a></li>\r\n    </ul>\r\n    
<li><a href='#to-scrape-user-tweets-with-api'>Scraping user's tweet using API</a></li>\r\n    <ul>\r\n    <li><a href='#to-scrape-user-tweets-with-api'>In JSON format - Example</a></li>\r\n    <li><a href='#users_api_parameter'>Function Arguments</a></li>\r\n    <li><a href='#scrape_user_with_api_args_keys'>Keys of the output</a></li>\r\n    </ul>\r\n    <li><a href=\"#proxy\">Using scraper with proxy</a>\r\n    <ul>\r\n    <li><a href=\"#unauthenticatedProxy\">Unauthenticated Proxy</a></li>\r\n    <li><a href=\"#authenticatedProxy\">Authenticated Proxy</a></li>\r\n    </ul>\r\n    </li>\r\n    </li>\r\n    </ul>\r\n    </li>\r\n    <li><a href=\"#privacy\">Privacy</a></li>\r\n    <li><a href=\"#license\">License</a></li>\r\n  </ol>\r\n</details>\r\n\r\n<!--TABLE of contents //-->\r\n<br>\r\n<hr>\r\n<h2 id=\"Prerequisites\">Prerequisites </h2>\r\n<li> Internet Connection </li>\r\n<li> Python 3.6+ </li>\r\n<li> Chrome or Firefox browser installed on your machine </li>\r\n<hr>\r\n<h2 id=\"Installation\"> Installation </h2>\r\n<h3 id=\"sourceInstallation\">Installing from the source</h3>\r\n<p>Download the source code or clone it with:<p>\r\n\r\n```\r\ngit clone https://github.com/shaikhsajid1111/twitter-scraper-selenium\r\n```\r\n\r\n<p>Open terminal inside the downloaded folder:</p>\r\n\r\n<br>\r\n\r\n```\r\n python3 setup.py install\r\n```\r\n\r\n<h3 id=\"pypiInstallation\">\r\nInstalling with <a href=\"https://pypi.org\">PyPI</a>\r\n</h3>\r\n\r\n```\r\npip3 install twitter-scraper-selenium\r\n```\r\n\r\n<hr>\r\n<h2 id=\"Usage\">\r\nUsage</h2>\r\n<h3 id=\"availableFunction\">Available Function In this Package - Summary</h3>\r\n<div>\r\n<table>\r\n<thead>\r\n<tr>\r\n<td>Function Name</td>\r\n<td>Function Description</td>\r\n<td>Scraping Method</td>\r\n<td>Scraping Speed</td>\r\n</tr>\r\n</thead>\r\n<tr>\r\n<td><code>scrape_profile()</code></td>\r\n<td>Scrape's Twitter user's profile tweets</td>\r\n<td>Browser 
Automation</td>\r\n<td>Slow</td>\r\n</tr>\r\n<tr>\r\n<td><code>get_profile_details()</code></td>\r\n<td>Scrape's Twitter user details.</td>\r\n<td>HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_profile_with_api()</code></td>\r\n<td>Scrape's Twitter tweets by twitter profile username. It expects the username of the profile</td>\r\n<td>Browser Automation & HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n</table>\r\n<p>\r\nNote: HTTP Request Method sends the request to Twitter's API directly for scraping data, and Browser Automation visits that page, scroll while collecting the data.</p>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"profileDetail\">To scrape twitter profile details:</h3>\r\n<div id=\"profileDetailExample\">\r\n\r\n```python\r\nfrom twitter_scraper_selenium import get_profile_details\r\n\r\ntwitter_username = \"TwitterAPI\"\r\nfilename = \"twitter_api_data\"\r\nbrowser = \"firefox\"\r\nheadless = True\r\nget_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)\r\n\r\n```\r\nOutput:\r\n```js\r\n{\r\n\t\"id\": 6253282,\r\n\t\"id_str\": \"6253282\",\r\n\t\"name\": \"Twitter API\",\r\n\t\"screen_name\": \"TwitterAPI\",\r\n\t\"location\": \"San Francisco, CA\",\r\n\t\"profile_location\": null,\r\n\t\"description\": \"The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? 
It's on my website.\",\r\n\t\"url\": \"https:\\/\\/t.co\\/8IkCzCDr19\",\r\n\t\"entities\": {\r\n\t\t\"url\": {\r\n\t\t\t\"urls\": [{\r\n\t\t\t\t\"url\": \"https:\\/\\/t.co\\/8IkCzCDr19\",\r\n\t\t\t\t\"expanded_url\": \"https:\\/\\/developer.twitter.com\",\r\n\t\t\t\t\"display_url\": \"developer.twitter.com\",\r\n\t\t\t\t\"indices\": [\r\n\t\t\t\t\t0,\r\n\t\t\t\t\t23\r\n\t\t\t\t]\r\n\t\t\t}]\r\n\t\t},\r\n\t\t\"description\": {\r\n\t\t\t\"urls\": []\r\n\t\t}\r\n\t},\r\n\t\"protected\": false,\r\n\t\"followers_count\": 6133636,\r\n\t\"friends_count\": 12,\r\n\t\"listed_count\": 12936,\r\n\t\"created_at\": \"Wed May 23 06:01:13 +0000 2007\",\r\n\t\"favourites_count\": 31,\r\n\t\"utc_offset\": null,\r\n\t\"time_zone\": null,\r\n\t\"geo_enabled\": null,\r\n\t\"verified\": true,\r\n\t\"statuses_count\": 3656,\r\n\t\"lang\": null,\r\n\t\"contributors_enabled\": null,\r\n\t\"is_translator\": null,\r\n\t\"is_translation_enabled\": null,\r\n\t\"profile_background_color\": null,\r\n\t\"profile_background_image_url\": null,\r\n\t\"profile_background_image_url_https\": null,\r\n\t\"profile_background_tile\": null,\r\n\t\"profile_image_url\": null,\r\n\t\"profile_image_url_https\": \"https:\\/\\/pbs.twimg.com\\/profile_images\\/942858479592554497\\/BbazLO9L_normal.jpg\",\r\n\t\"profile_banner_url\": null,\r\n\t\"profile_link_color\": null,\r\n\t\"profile_sidebar_border_color\": null,\r\n\t\"profile_sidebar_fill_color\": null,\r\n\t\"profile_text_color\": null,\r\n\t\"profile_use_background_image\": null,\r\n\t\"has_extended_profile\": null,\r\n\t\"default_profile\": false,\r\n\t\"default_profile_image\": false,\r\n\t\"following\": null,\r\n\t\"follow_request_sent\": null,\r\n\t\"notifications\": null,\r\n\t\"translator_type\": null\r\n}\r\n```\r\n</div>\r\n<br>\r\n<div id=\"profileDetailArgument\">\r\n<p><code>get_profile_details()</code> arguments:</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n         
   <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>twitter_username</td>\r\n            <td>String</td>\r\n            <td>Twitter Username</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_filename</td>\r\n            <td>String</td>\r\n            <td>What should be the filename where output is stored?.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_dir</td>\r\n            <td>String</td>\r\n            <td>What directory output file should be saved?</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div>\r\n<h4 id=\"profileDetailKeys\">Keys of the output:</p>\r\nDetail of each key can be found <a href=\"https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user\">here</a>.</h4>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"profile\">To scrape profile's tweets:</h3>\r\n<p id=\"profileJson\">In JSON format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nmicrosoft = scrape_profile(twitter_username=\"microsoft\",output_format=\"json\",browser=\"firefox\",tweets_count=10)\r\nprint(microsoft)\r\n```\r\nOutput:\r\n```javascript\r\n{\r\n  \"1430938749840629773\": {\r\n    \"tweet_id\": \"1430938749840629773\",\r\n    \"username\": \"Microsoft\",\r\n    \"name\": \"Microsoft\",\r\n    \"profile_picture\": \"https://twitter.com/Microsoft/photo\",\r\n    \"replies\": 29,\r\n    \"retweets\": 58,\r\n    \"likes\": 453,\r\n    \"is_retweet\": false,\r\n    \"retweet_link\": \"\",\r\n    \"posted_time\": \"2021-08-26T17:02:38+00:00\",\r\n    \"content\": \"Easy to use and efficient for all \\u2013 Windows 11 is committed to an accessible 
future.\\n\\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW \",\r\n    \"hashtags\": [],\r\n    \"mentions\": [],\r\n    \"images\": [],\r\n    \"videos\": [],\r\n    \"tweet_url\": \"https://twitter.com/Microsoft/status/1430938749840629773\",\r\n    \"link\": \"https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC\"\r\n  },...\r\n}\r\n```\r\n<hr>\r\n<p id=\"profileCSV\">In CSV format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\n\r\nscrape_profile(twitter_username=\"microsoft\",output_format=\"csv\",browser=\"firefox\",tweets_count=10,filename=\"microsoft\",directory=\"/home/user/Downloads\")\r\n\r\n\r\n```\r\n\r\nOutput:\r\n<br>\r\n<table class=\"table table-bordered table-hover table-condensed\" style=\"line-height: 14px;overflow:hidden;white-space: nowrap\">\r\n<thead><tr><th title=\"Field #1\">tweet_id</th>\r\n<th title=\"Field #2\">username</th>\r\n<th title=\"Field #3\">name</th>\r\n<th title=\"Field #4\">profile_picture</th>\r\n<th title=\"Field #5\">replies</th>\r\n<th title=\"Field #6\">retweets</th>\r\n<th title=\"Field #7\">likes</th>\r\n<th title=\"Field #8\">is_retweet</th>\r\n<th title=\"Field #9\">retweet_link</th>\r\n<th title=\"Field #10\">posted_time</th>\r\n<th title=\"Field #11\">content</th>\r\n<th title=\"Field #12\">hashtags</th>\r\n<th title=\"Field #13\">mentions</th>\r\n<th title=\"Field #14\">images</th>\r\n<th title=\"Field #15\">videos</th>\r\n<th title=\"Field #16\">post_url</th>\r\n<th title=\"Field #17\">link</th>\r\n</tr></thead>\r\n<tbody><tr>\r\n<td>1430938749840629773</td>\r\n<td>Microsoft</td>\r\n<td>Microsoft</td>\r\n<td>https://twitter.com/Microsoft/photo</td>\r\n<td align=\"right\">64</td>\r\n<td align=\"right\">75</td>\r\n<td align=\"right\">521</td>\r\n<td>False</td>\r\n<td> </td>\r\n<td>2021-08-26T17:02:38+00:00</td>\r\n<td>Easy to use and efficient for all 
\u2013 Windows 11 is committed to an accessible future.<br/><br/>Here&#39;s how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW </td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>https://twitter.com/Microsoft/status/1430938749840629773</td>\r\n<td>https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC</td>\r\n</tr>\r\n\r\n</tbody>\r\n</table>\r\n<p>...</p>\r\n\r\n<br><hr>\r\n<div id=\"profileArgument\">\r\n<p><code>scrape_profile()</code> arguments:</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>twitter_username</td>\r\n            <td>String</td>\r\n            <td>Twitter username of the account</td>\r\n        </tr>\r\n        <tr>\r\n            <td>browser</td>\r\n            <td>String</td>\r\n            <td>Which browser to use for scraping?, Only 2 are supported Chrome and Firefox. Default is set to Firefox</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of posts to scrape. Default is 10.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_format</td>\r\n            <td>String</td>\r\n            <td>The output format, whether JSON or CSV. Default is JSON.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>filename</td>\r\n            <td>String</td>\r\n            <td>If output parameter is set to CSV, then it is necessary for filename parameter to passed. 
If not passed, the filename will be the same as the username.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>directory</td>\r\n            <td>String</td>\r\n            <td>If output_format is set to CSV, the directory parameter may be passed. If not passed, the CSV file will be saved in the current working directory.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>headless</td>\r\n            <td>Boolean</td>\r\n            <td>Whether to run the crawler in headless mode. Default is <code>True</code>.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div id=\"profileOutput\">\r\n<p>Keys of the output</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Key</td>\r\n            <td>Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>tweet_id</td>\r\n            <td>String</td>\r\n            <td>Post identifier (an integer cast to a string)</td>\r\n        </tr>\r\n        <tr>\r\n            <td>username</td>\r\n            <td>String</td>\r\n            <td>Username of the profile</td>\r\n        </tr>\r\n        <tr>\r\n            <td>name</td>\r\n            <td>String</td>\r\n            <td>Name of the profile</td>\r\n        </tr>\r\n        <tr>\r\n            <td>profile_picture</td>\r\n            <td>String</td>\r\n            <td>Profile picture link</td>\r\n        </tr>\r\n        <tr>\r\n            <td>replies</td>\r\n            <td>Integer</td>\r\n            <td>Number of replies to the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>retweets</td>\r\n            <td>Integer</td>\r\n            <td>Number of retweets of the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>likes</td>\r\n            <td>Integer</td>\r\n            <td>Number of likes on the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>is_retweet</td>\r\n            <td>Boolean</td>\r\n     
       <td>Is the tweet a retweet?</td>\r\n        </tr>\r\n        <tr>\r\n            <td>retweet_link</td>\r\n            <td>String</td>\r\n            <td>If the tweet is a retweet, the retweet link; otherwise an empty string</td>\r\n        </tr>\r\n        <tr>\r\n            <td>posted_time</td>\r\n            <td>String</td>\r\n            <td>Time when the tweet was posted, in ISO 8601 format</td>\r\n        </tr>\r\n        <tr>\r\n            <td>content</td>\r\n            <td>String</td>\r\n            <td>Content of the tweet as text</td>\r\n        </tr>\r\n        <tr>\r\n            <td>hashtags</td>\r\n            <td>Array</td>\r\n            <td>Hashtags present in the tweet, if any</td>\r\n        </tr>\r\n        <tr>\r\n            <td>mentions</td>\r\n            <td>Array</td>\r\n            <td>Mentions present in the tweet, if any</td>\r\n        </tr>\r\n        <tr>\r\n            <td>images</td>\r\n            <td>Array</td>\r\n            <td>Image links, if present in the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>videos</td>\r\n            <td>Array</td>\r\n            <td>Video links, if present in the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweet_url</td>\r\n            <td>String</td>\r\n            <td>URL of the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>link</td>\r\n            <td>String</td>\r\n            <td>Link to an external website, if any is present inside the tweet. 
</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"to-scrape-user-tweets-with-api\">\r\n\r\n<p>To scrape a profile's tweets with the API:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile_with_api\r\n\r\nscrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)\r\n```\r\n</div>\r\n<br>\r\n<div id=\"users_api_parameter\">\r\n<p><code>scrape_profile_with_api()</code> Arguments:</p>\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>username</td>\r\n            <td>String</td>\r\n            <td>Twitter profile username</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of tweets to scrape.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_filename</td>\r\n            <td>String</td>\r\n            <td>Filename where the output is stored.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_dir</td>\r\n            <td>String</td>\r\n            <td>Directory where the output file should be saved.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, to use a proxy for scraping. If the proxy requires authentication, the format is username:password@host:port.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>browser</td>\r\n            <td>String</td>\r\n            <td>Which browser to use for extracting the GraphQL key. 
Default is Firefox.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>headless</td>\r\n            <td>Boolean</td>\r\n            <td>Whether to run the browser in headless mode.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<div id=\"scrape_user_with_api_args_keys\"> <p>Output:</p>\r\n\r\n```js\r\n{\r\n  \"1608939190548598784\": {\r\n    \"tweet_url\" : \"https://twitter.com/elonmusk/status/1608939190548598784\",\r\n    \"tweet_details\":{\r\n      ...\r\n    },\r\n    \"user_details\":{\r\n      ...\r\n    }\r\n  }, ...\r\n}\r\n```\r\n\r\n</div>\r\n<br>\r\n<hr>\r\n</div>\r\n\r\n<h3 id=\"proxy\"> Using the scraper with a proxy (HTTP proxy) </h3>\r\n\r\n<div id=\"unauthenticatedProxy\">\r\n<p>Just pass the <code>proxy</code> argument to the function.</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nscrape_profile(\"elonmusk\", headless=False, proxy=\"66.115.38.247:5678\", output_format=\"csv\", filename=\"musk\")  # In IP:PORT format\r\n\r\n```\r\n</div>\r\n\r\n<br>\r\n<div id=\"authenticatedProxy\">\r\n<p> Proxy that requires authentication: </p>\r\n\r\n```python\r\n\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nmicrosoft_data = scrape_profile(twitter_username=\"microsoft\", browser=\"chrome\", tweets_count=10, output_format=\"json\",\r\n                      proxy=\"sajid:pass123@66.115.38.247:5678\")  #  username:password@IP:PORT\r\nprint(microsoft_data)\r\n\r\n\r\n```\r\n\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"privacy\">\r\n<h2>Privacy</h2>\r\n\r\n<p>\r\nThis scraper only scrapes public data available to unauthenticated users and cannot scrape anything private.\r\n</p>\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"license\">\r\n<h2>LICENSE</h2>\r\n\r\nMIT\r\n</div>\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python package to scrap twitter's front-end easily with selenium",
    "version": "6.2.2",
    "project_urls": {
        "Homepage": "https://github.com/shaikhsajid1111/twitter-scraper-selenium"
    },
    "split_keywords": [
        "web-scraping",
        "selenium",
        "social",
        "media",
        "twitter",
        "keyword",
        "twitter-profile",
        "twitter-keywords",
        "automation",
        "json",
        "csv",
        "twitter-hashtag",
        "hashtag"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f5f4264ead446b60e811352f93f79cc526e60480125c2d2910458b8f522d1aa9",
                "md5": "73f2c2e7262eb774e0c6e8b029429b7c",
                "sha256": "b8ae2d4df81ce1260955567af7f1bb6acf99b6988c9e1ff0db493fb3b5a9bd02"
            },
            "downloads": -1,
            "filename": "twitter_scraper_selenium-6.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "73f2c2e7262eb774e0c6e8b029429b7c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 32632,
            "upload_time": "2024-09-07T14:56:37",
            "upload_time_iso_8601": "2024-09-07T14:56:37.216705Z",
            "url": "https://files.pythonhosted.org/packages/f5/f4/264ead446b60e811352f93f79cc526e60480125c2d2910458b8f522d1aa9/twitter_scraper_selenium-6.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "90ad69f7ef85b67c90ca62e4cf951813692139b7cdba3abe5efa6ff412016b57",
                "md5": "a136d2eeb0ef58de54a1c17874dc1ef5",
                "sha256": "a8f5886dac3055967cf001ddfc6e8202844dc3ee9a9595bdf1477af1344f75f3"
            },
            "downloads": -1,
            "filename": "twitter_scraper_selenium-6.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "a136d2eeb0ef58de54a1c17874dc1ef5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 27296,
            "upload_time": "2024-09-07T14:56:39",
            "upload_time_iso_8601": "2024-09-07T14:56:39.094627Z",
            "url": "https://files.pythonhosted.org/packages/90/ad/69f7ef85b67c90ca62e4cf951813692139b7cdba3abe5efa6ff412016b57/twitter_scraper_selenium-6.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-07 14:56:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "shaikhsajid1111",
    "github_project": "twitter-scraper-selenium",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "twitter-scraper-selenium"
}
        