fast-flights


| Field | Value |
|-------|-------|
| Name | fast-flights |
| Version | 2.0 |
| Summary | The fast, robust, strongly-typed Google Flights scraper (API) implemented in Python. |
| Upload time | 2025-01-01 10:58:02 |
| Requires Python | >=3.8 |
| Keywords | flights, google, google-flights, scraper, protobuf, travel, trip, passengers, airport |

> Apparently, it's always a better approach to interact with the Internal Google APIs. I'm working on that, and I'll deliver the results soon if my experimental project works out well.

<br /><br />
<div align="center">

# ✈️ fast-flights

The fast and strongly-typed Google Flights scraper (API) implemented in Python. Based on a Base64-encoded Protobuf string.

[**Documentation**](https://aweirddev.github.io/flights) • [Issues](https://github.com/AWeirdDev/flights/issues) • [Discussions](https://github.com/AWeirdDev/flights/discussions)

```sh
$ pip install fast-flights
```

</div>

## Basics
**TL;DR**: To use `fast-flights`, you first create a filter (the value of the `?tfs=` URL parameter) that describes the request.
In practice, you pass `flight_data`, `trip`, `seat`, and `passengers` to `get_flights` and use the API directly.

```python
from fast_flights import FlightData, Passengers, Result, get_flights

result: Result = get_flights(
    flight_data=[
        FlightData(date="2025-01-01", from_airport="TPE", to_airport="MYJ")
    ],
    trip="one-way",
    seat="economy",
    passengers=Passengers(adults=2, children=1, infants_in_seat=0, infants_on_lap=0),
    fetch_mode="fallback",
)

print(result)

# The price is currently... low/typical/high
print("The price is currently", result.current_price)
```

**Properties & usage for `Result`**:

```python
result.current_price

# Get the first flight
flight = result.flights[0]

flight.is_best
flight.name
flight.departure
flight.arrival
flight.arrival_time_ahead
flight.duration
flight.stops
flight.delay  # may not be present
flight.price
```
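
As a small sketch of how these properties might be combined, the helper below keeps only the flights flagged as `is_best`. The `best_flights` function and the stand-in objects are hypothetical and exist only for illustration: a real `Result` comes from `get_flights(...)`, and the names and prices here are made up.

```python
from types import SimpleNamespace


def best_flights(result):
    """Return only the flights that are flagged with `is_best`."""
    return [flight for flight in result.flights if flight.is_best]


# Stand-in for a real `Result` (assumption: invented values, for illustration only).
fake_result = SimpleNamespace(
    current_price="typical",
    flights=[
        SimpleNamespace(name="Airline A", is_best=True, stops=0, price="$100"),
        SimpleNamespace(name="Airline B", is_best=False, stops=1, price="$80"),
    ],
)

for flight in best_flights(fake_result):
    print(flight.name, flight.stops, flight.price)
```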

**Useless enums**: Additionally, you can use the `Airport` enum to search for airports in code (as you type)! See `_generated_enum.py` in source.

```python
Airport.TAIPEI
              ╭─────────────────────────────────╮
              │ TAIPEI_SONGSHAN_AIRPORT         │
              │ TAPACHULA_INTERNATIONAL_AIRPORT │
              │ TAMPA_INTERNATIONAL_AIRPORT     │
              ╰─────────────────────────────────╯
```

## Cookies & consent
The EU region is a bit tricky to solve for now, but the fallback support should be able to handle it.

## What's new
- `v2.0` – New (much more succinct) API, fallback support for Playwright serverless functions, and [documentation](https://aweirddev.github.io/flights)!

***

## How it's made

The other day, I was making a chat-interface-based trip recommendation app and wanted to add a feature that can search for flights available for booking. My personal choice is definitely [Google Flights](https://flights.google.com) since Google always has the best and most organized data on the web. Therefore, I searched for APIs on Google.

> 🔎 **Search** <br />
> google flights api

The results? Bad. It seems like they discontinued this service and it now lives in the Graveyard of Google.

> <sup><a href="https://duffel.com/blog/google-flights-api" target="_blank">🧏‍♂️ <b>duffel.com</b></a></sup><br />
> <sup><i>Google Flights API: How did it work & what happened to it?</i></sup>
>
> The Google Flights API offered developers access to aggregated airline data, including flight times, availability, and prices. Over a decade ago, Google announced the acquisition of ITA Software Inc. which it used to develop its API. **However, in 2018, Google ended access to the public-facing API and now only offers access through the QPX enterprise product**.

That's awful! I've also looked for free alternatives but their rate limits and pricing are just 😬 (not a good fit/deal for everyone).

<br />

However, Google Flights has their UI – [flights.google.com](https://flights.google.com). So, maybe I could just use Developer Tools to log the requests made and just replicate all of that? Undoubtedly not! Their requests are just full of numbers and unreadable text, so that's not the solution.

Perhaps we could scrape it? I mean, Google has let many companies like [SerpApi](https://google.com/search?q=serpapi) scrape its pages while pretending nothing happened... So let's build our own scraper.

> 🔎 **Search** <br />
> google flights ~~api~~ scraper pypi

Excluding the ones that are not active, I came across [hugoglvs/google-flights-scraper](https://pypi.org/project/google-flights-scraper) on PyPI. I thought to myself: "ain't no way this is the solution!"

I checked hugoglvs's code on [GitHub](https://github.com/hugoglvs/google-flights-scraper), and I immediately spotted "playwright," my worst enemy. One word can describe it well: slow. Two words? Extremely slow. What's more, it doesn't even run on the **🗻 Edge** because of configuration errors, missing libraries, etc. I could just reverse-engineer [try.playwright.tech](https://try.playwright.tech) and use a better environment, but that's too risky if they add Cloudflare as an additional security barrier 😳.

Life tells me to never give up. Let's just take a look at their URL params...

```markdown
https://www.google.com/travel/flights/search?tfs=CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBqBwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ&hl=en
```

| Param | Content | My past understanding |
|-------|---------|-----------------------|
| hl    | en      | Sets the language.    |
| tfs   | CBwQAhoeEgoyMDI0LTA1LTI4agcIARID… | What is this???? 🤮🤮 |
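
For anyone following along in code, here is a minimal sketch that splits the URL above into its query parameters using only the standard library. The variable names are my own.

```python
from urllib.parse import parse_qs, urlsplit

# The search URL from above, split across lines for readability.
URL = (
    "https://www.google.com/travel/flights/search"
    "?tfs=CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBq"
    "BwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ"
    "&hl=en"
)

# parse_qs maps each parameter name to a list of values.
params = parse_qs(urlsplit(URL).query)
hl = params["hl"][0]
tfs = params["tfs"][0]

print("hl :", hl)
print("tfs:", tfs[:32] + "...")
```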

I removed the `?tfs=` parameter and found out that it is what controls our request! And it looks so Base64-y.

If we decode it to raw text, we can still see the dates, but we're not quite there: there's too much unreadable binary data mixed in.
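
A quick sketch of that decoding step, assuming the value uses the URL-safe Base64 alphabet (it contains `_`) with the trailing `=` padding stripped. The `decode_tfs` helper is my own name, not part of `fast-flights`.

```python
import base64

# The `tfs` value from the URL above.
TFS = (
    "CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBq"
    "BwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ"
)


def decode_tfs(tfs: str) -> bytes:
    """Decode a URL-safe Base64 string whose '=' padding was stripped."""
    padding = "=" * (-len(tfs) % 4)
    return base64.urlsafe_b64decode(tfs + padding)


raw = decode_tfs(TFS)
print(raw)  # dates and airport codes are readable, surrounded by binary bytes
```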

Or maybe it's some kind of a **data-storing method** Google uses? What if it's something like JSON? Let's look it up.

> 🔎 **Search** <br />
> google's json alternative

> 🐣 **Result**<br />
> Solution: The Power of **Protocol Buffers**
> 
> LinkedIn turned to Protocol Buffers, often referred to as **protobuf**, a binary serialization format developed by Google. The key advantage of Protocol Buffers is its efficiency, compactness, and speed, making it significantly faster than JSON for serialization and deserialization.

Gotcha, Protobuf! Let's feed it to an online decoder and see how it does:

> 🔎 **Search** <br />
> protobuf decoder

> 🐣 **Result**<br />
> [protobuf-decoder.netlify.app](https://protobuf-decoder.netlify.app)

I then pasted the Base64-encoded string to the decoder and no way! It DID return valid data!

![annotated, Protobuf Decoder screenshot](https://github.com/AWeirdDev/flights/assets/90096971/77dfb097-f961-4494-be88-3640763dbc8c)

I immediately recognized the values — that's my data, that's my query!
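
What the online decoder does can be roughly sketched in plain Python. This is a minimal reader for the Protobuf wire format that only handles varints (wire type 0) and length-delimited fields (wire type 2), which is all this particular string appears to use. The function names are my own, and the meaning of fields 2, 3, 13, and 14 is inferred from the decoded output rather than from any official schema.

```python
import base64

TFS = (
    "CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBq"
    "BwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ"
)


def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at `pos`; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for one message level."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, nested message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        yield field, wire_type, value


raw = base64.urlsafe_b64decode(TFS + "=" * (-len(TFS) % 4))

legs = []
for field, wire_type, value in parse_fields(raw):
    if field == 3 and wire_type == 2:  # assumed: one flight leg per field 3
        leg = {}
        for f, _, v in parse_fields(value):
            if f == 2:
                leg["date"] = v.decode()
            elif f in (13, 14):
                # Field 2 of the nested message holds the airport code.
                code = next(x for n, _, x in parse_fields(v) if n == 2)
                leg["from" if f == 13 else "to"] = code.decode()
        legs.append(leg)

print(legs)
```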

So, I wrote some simple Protobuf code to decode the data.

```protobuf
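syntax = "proto3";

// An airport, identified by its IATA code (e.g. "TPE").
message Airport {
    string name = 2;
}

// One leg of the trip: a date plus departure and arrival airports.
message FlightInfo {
    string date = 2;
    Airport dep_airport = 13;
    Airport arr_airport = 14;
}

message GoogleSucks {
    // Every field needs a name; `flight_info` is a placeholder.
    repeated FlightInfo flight_info = 3;
}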
```

It works! Now, I won't consider myself an "experienced Protobuf developer" but rather a complete beginner.

I have no idea what I wrote but... it worked! And here it is, `fast-flights`.
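
To make the idea concrete, here is a rough sketch that goes the other way and hand-encodes a message following the small schema above, then Base64-encodes it. It is not necessarily how `fast-flights` builds its filter, and all function names are my own. It also only writes field 3, while the captured string carries several other fields that the schema above does not model, so there is no guarantee Google accepts the result as-is.

```python
import base64


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2)."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def encode_airport(code: str) -> bytes:
    # Airport.name = 2
    return encode_bytes_field(2, code.encode())


def encode_flight_info(date: str, dep: str, arr: str) -> bytes:
    # FlightInfo.date = 2, dep_airport = 13, arr_airport = 14
    return (
        encode_bytes_field(2, date.encode())
        + encode_bytes_field(13, encode_airport(dep))
        + encode_bytes_field(14, encode_airport(arr))
    )


def build_tfs(legs) -> str:
    """Build a URL-safe Base64 string from (date, from, to) tuples."""
    message = b"".join(
        encode_bytes_field(3, encode_flight_info(*leg)) for leg in legs
    )
    return base64.urlsafe_b64encode(message).decode().rstrip("=")


tfs = build_tfs([("2025-01-01", "TPE", "MYJ")])
print(tfs)
```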

***

## Contributing

Yes, please: [github.com/AWeirdDev/flights](https://github.com/AWeirdDev/flights)

<br />

<div align="center">

(c) 2024 AWeirdDev

</div>

            
