# stackoverflow-to-sqlite
Downloads all your contributions to StackOverflow into a searchable, sortable SQLite database. This includes your questions, answers, and comments.
## Install
The best way to install the package is by using [pipx](https://pypa.github.io/pipx/):
```bash
pipx install stackoverflow-to-sqlite
```
It's also available via [brew](https://brew.sh/):
```bash
brew install xavdid/projects/stackoverflow-to-sqlite
```
## Usage
```
Usage: stackoverflow-to-sqlite [OPTIONS] USER_ID

  Save all the contributions for a StackOverflow user to a SQLite database.

Options:
  --version  Show the version and exit.
  --db FILE  A path to a SQLite database file. If it doesn't exist, it will be
             created. While it can have any extension, `.db` or `.sqlite` is
             recommended.
  --help     Show this message and exit.
```
The CLI takes a single required argument: a StackOverflow user id. The easiest way to get this is from a user's profile page:
![](https://cdn.zappy.app/3564b18ce469812a367422b8e8eed1ab.png)
The simplest usage is to pass that directly to the CLI and use the default database location:
```shell
% stackoverflow-to-sqlite 1825390
```
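If all you have is a profile link, the id is the numeric path segment after `/users/`. Here's a minimal sketch of pulling it out, assuming profile URLs of the form `https://stackoverflow.com/users/<id>/<display-name>`. The `extract_user_id` helper and the `example-name` slug are hypothetical and not part of this package:

```python
import re

# Assumption: profile URLs look like
# https://stackoverflow.com/users/<id>/<display-name>
PROFILE_URL = re.compile(r"stackoverflow\.com/users/(\d+)")


def extract_user_id(profile_url: str) -> int:
    """Return the numeric user id embedded in a StackOverflow profile URL."""
    match = PROFILE_URL.search(profile_url)
    if match is None:
        raise ValueError(f"no user id found in {profile_url!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # "example-name" is a placeholder display name.
    print(extract_user_id("https://stackoverflow.com/users/1825390/example-name"))
```

The resulting number is what you pass as `USER_ID`.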
## Viewing Data
The resulting SQLite database pairs well with [Datasette](https://datasette.io/), a tool for viewing SQLite databases on the web. Below is my recommended configuration.
First, install `datasette`:
```bash
pipx install datasette
```
Then, add the recommended plugins (for rendering timestamps and markdown):
```bash
pipx inject datasette datasette-render-markdown datasette-render-timestamps
```
Finally, create a `metadata.json` file next to your `stackoverflow.db` with the following:
```json
{
  "databases": {
    "stackoverflow": {
      "tables": {
        "questions": {
          "sort_desc": "creation_date",
          "plugins": {
            "datasette-render-markdown": {
              "columns": ["body_markdown"]
            },
            "datasette-render-timestamps": {
              "columns": ["creation_date", "closed_date", "last_activity_date"]
            }
          }
        },
        "answers": {
          "sort_desc": "creation_date",
          "plugins": {
            "datasette-render-markdown": {
              "columns": ["body_markdown"]
            },
            "datasette-render-timestamps": {
              "columns": ["last_edit_date", "creation_date"]
            }
          }
        },
        "comments": {
          "sort_desc": "creation_date",
          "plugins": {
            "datasette-render-markdown": {
              "columns": ["body_markdown"]
            },
            "datasette-render-timestamps": {
              "columns": ["creation_date"]
            }
          }
        },
        "tags": {
          "sort": "name"
        }
      }
    }
  }
}
```
Now when you run
```bash
datasette serve stackoverflow.db --metadata metadata.json
```
you'll get nicely formatted output!
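If you'd rather poke at the data without Datasette, Python's built-in `sqlite3` module works too. This is a rough sketch, not part of the package: it uses the table and column names from the `metadata.json` above, and it assumes `creation_date` is stored as a Unix timestamp in seconds. The `latest_rows` helper is hypothetical, and the demo runs against an in-memory stand-in table so it works anywhere:

```python
import sqlite3
from datetime import datetime, timezone

# Table and column names are taken from the metadata.json above.
TABLES = {"questions", "answers", "comments"}


def latest_rows(
    conn: sqlite3.Connection, table: str, limit: int = 5
) -> list[tuple[str, str]]:
    """Return (ISO timestamp, body_markdown) for the newest rows in a table."""
    if table not in TABLES:  # guard the interpolated table name
        raise ValueError(f"unknown table: {table}")
    rows = conn.execute(
        f"SELECT creation_date, body_markdown FROM {table} "
        "ORDER BY creation_date DESC LIMIT ?",
        (limit,),
    ).fetchall()
    # Assumption: creation_date is a Unix timestamp in seconds.
    return [
        (datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), body)
        for ts, body in rows
    ]


if __name__ == "__main__":
    # Stand-in data; for a real export use sqlite3.connect("stackoverflow.db").
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE answers (creation_date INTEGER, body_markdown TEXT)")
    conn.execute("INSERT INTO answers VALUES (86400, 'Try `git rebase -i`.')")
    for when, body in latest_rows(conn, "answers"):
        print(when, body[:80])
```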
## Motivation
StackOverflow has [recently announced](https://stackoverflow.blog/2023/07/27/announcing-overflowai/) some pretty major AI-related plans. They also don't allow you to [modify or remove your content in protest](https://m.benui.ca/@ben/112396505994216742). There's no real guarantee around what they will or won't do to content you've produced.
Ultimately, there's no better steward of data you've put time and energy into creating than you. This builds a searchable archive of everything you've ever said on StackOverflow, which is nice to have in case the site changes or gets worse.
## FAQs
### Why are users stored under an "account_id" instead of their user id?
At some point, I'd like to crawl the entire Stack Exchange network. An account id is shared across all sites while a user id is specific to each site. So I'm using the former as the primary key to better represent that.
### Why are my longer contributions truncated in Datasette?
Datasette truncates long text fields by default. You can disable this behavior by setting `truncate_cells_html` to `0` when running `datasette` ([see the docs](https://docs.datasette.io/en/stable/settings.html#truncate-cells-html)):
```shell
datasette stackoverflow.db --setting truncate_cells_html 0
```
### Does this tool refetch old data?
Yes, currently it does a full backup every time the command is run. It technically does upserts on every row, so it'll update existing rows with new data.
I'd like to stop saving items once we've seen an item we've saved already, but doing it that way hasn't been a priority.
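For anyone unfamiliar with upserts, here's a small illustration of the idea using SQLite's `INSERT ... ON CONFLICT DO UPDATE` (available in SQLite 3.24+). This is a sketch of the general technique, not the code this tool actually runs: the simplified `answers` table, its `answer_id` and `score` columns, and the `create_table` and `upsert_answer` helpers are all hypothetical.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO answers (answer_id, score, body_markdown)
VALUES (:answer_id, :score, :body_markdown)
ON CONFLICT (answer_id) DO UPDATE SET
    score = excluded.score,
    body_markdown = excluded.body_markdown
"""


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified table used only for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answers ("
        "answer_id INTEGER PRIMARY KEY, score INTEGER, body_markdown TEXT)"
    )


def upsert_answer(conn: sqlite3.Connection, row: dict) -> None:
    """Insert the row, or overwrite the existing row with the same answer_id."""
    conn.execute(UPSERT_SQL, row)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    # First run: the row is inserted.
    upsert_answer(conn, {"answer_id": 1, "score": 1, "body_markdown": "v1"})
    # Later run: same id, fresher data, so the existing row is updated in place.
    upsert_answer(conn, {"answer_id": 1, "score": 5, "body_markdown": "v2"})
    print(conn.execute("SELECT * FROM answers").fetchall())
```

Running a backup twice this way never duplicates rows; it only refreshes them.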
### Why doesn't this capture questions along with answers?
Because the goal is to capture your own data, not archive all of SO. There are [better avenues for that](https://archive.org/details/stackexchange).
## Development
This section is for people making changes to this package.
When in a virtual environment, run the following:
```bash
just install
```
This installs the package in editable mode and makes its dependencies available. You can now run `stackoverflow-to-sqlite` to invoke the CLI.
### Running Tests
In your virtual environment, a simple `just test` should run the unit test suite. You can also run `just typecheck` for type checking.
### Releasing New Versions
> These notes are mostly for myself (or other contributors).
1. Run `just release` while your venv is active.
2. Paste the stored API key. (If you get an invalid password error, verify that `~/.pypirc` is empty.)