indexdigest

Name	indexdigest
Version	1.6.0
home_page	https://github.com/macbre/index-digest
Summary	Analyses your database queries and schema and suggests indices and schema improvements
upload_time	2023-09-13 11:27:16
author	Maciej Brencz
requires_python	>=3.8
license	MIT

# index-digest

[![PyPI](https://img.shields.io/pypi/v/indexdigest.svg)](https://pypi.python.org/pypi/indexdigest)
[![Docker Hub](https://img.shields.io/docker/pulls/macbre/index-digest.svg)](https://hub.docker.com/r/macbre/index-digest/)
[![Coverage Status](https://coveralls.io/repos/github/macbre/index-digest/badge.svg?branch=master)](https://coveralls.io/github/macbre/index-digest?branch=master)

Analyses your database queries and schema and suggests index improvements. You can use `index-digest` as **your database linter**. The goal is to **provide the user with actionable reports** instead of just a list of statistics and schema details. Inspired by [Percona's `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html).

**NEW** You can also [use `index-digest` as a GitHub Action](https://github.com/marketplace/actions/index-digest).

## What this tool does

`index-digest` does the following:

* it checks the schema of all tables in a given database and suggests improvements (e.g. removal of redundant indices, adding a primary key to ease replication, dropping tables with just a single column or no rows)
* if provided with an SQL query log (via the `--sql-log` option) it:
  * checks whether all tables, columns and indices are used by these queries
  * reports text columns with a character set other than `utf`
  * reports queries that do not use indices
  * reports queries that use filesort, a temporary table or a full table scan
  * reports queries that are not quite kosher (e.g. `LIKE "%foo%"`, `INSERT IGNORE`, `SELECT *`, `HAVING` clause, high `OFFSET` in pagination queries)
* if run with the `--analyze-data` switch it:
  * reports tables with old data (by querying for the `MIN()` value of a time column) where data retention can be reviewed
  * reports tables whose data is not up to date (by querying for the `MAX()` value of a time column)
* if run with the `--check-empty-databases` switch it:
  * reports empty databases on the current MySQL server

This tool **supports MySQL 5.7, 8.0, 8.1, [Percona Server](https://www.percona.com/software/mysql-database/percona-server) 8.0 and MariaDB 10.1, 10.2, 10.5, 10.6** and runs under **Python 3.8+**.

Results can be reported in a human-readable form, as YAML, or sent to syslog to be aggregated and processed later using the ELK stack.

## Requirements & install

### From `pypi`

```
pip install indexdigest
```

### From git

```
git clone git@github.com:macbre/index-digest.git && cd index-digest
sudo apt-get install libmysqlclient-dev python3-dev virtualenv

virtualenv -ppython3 env
source env/bin/activate
make install
```

On macOS, follow [these `mysql_config` installation steps](https://stackoverflow.com/a/25491082).

#### Running tests

**We assume that the test database is running locally on port 53306**. You can use the following to test your changes locally before pushing them (this one uses MySQL 8.0.22):

```
docker run --rm -p 53306:3306 --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3 -e "MYSQL_ALLOW_EMPTY_PASSWORD=yes" -e "MYSQL_DATABASE=index_digest" --name=index_digest_mysql mysql:8.0.22 "--default-authentication-plugin=mysql_native_password"
```

Wait until the server is up and running.

```
mysql --protocol=tcp --port=53306 -u root --password="" -v < setup.sql
./sql/populate.sh
mysql --protocol=tcp --port=53306 -uindex_digest -pqwerty index_digest -v -e '\s; SHOW TABLES; SHOW DATABASES;'

make test
```

### Using Docker

> See https://hub.docker.com/r/macbre/index-digest/

```
$ docker run --network=host -t macbre/index-digest:latest mysql://index_digest:qwerty@debian/index_digest  | head -n 20
------------------------------------------------------------
Found 61 issue(s) to report for "index_digest" database
------------------------------------------------------------
MySQL v5.7.22 at debian
index-digest v1.2.0
------------------------------------------------------------
redundant_indices → table affected: 0004_id_foo

✗ "idx" index can be removed as redundant (covered by "PRIMARY")

  - redundant: UNIQUE KEY idx (item_id, foo)
  - covered_by: PRIMARY KEY (item_id, foo)
  - schema: CREATE TABLE `0004_id_foo` (
      `item_id` int(9) NOT NULL AUTO_INCREMENT,
      `foo` varbinary(16) NOT NULL DEFAULT '',
      PRIMARY KEY (`item_id`,`foo`),
      UNIQUE KEY `idx` (`item_id`,`foo`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  - table_data_size_mb: 0.015625
  - table_index_size_mb: 0.015625
...
```

## How to run it?

```
$ index_digest -h
index_digest

Analyses your database queries and schema and suggests indices improvements.

Usage:
  index_digest DSN [--sql-log=<file>] [--format=<formatter>] [--analyze-data] [--checks=<checks> | --skip-checks=<skip-checks>] [--tables=<tables> | --skip-tables=<skip-tables>]
  index_digest (-h | --help)
  index_digest --version

Options:
  DSN               Data Source Name of database to check
  --sql-log=<file>  Text file with SQL queries to check against the database
  --format=<formatter>  Use a given results formatter (plain, syslog, yaml)
  --analyze-data    Run additional checks that will query table data (can be slow!)
  --checks=<list>   Comma-separated lists of checks to report
  --skip-checks=<list> Comma-separated lists of checks to skip from report
  --tables=<list>   Comma-separated lists of tables to report
  --skip-tables=<list> Comma-separated lists of tables to skip from report
  -h --help         Show this screen.
  --version         Show version.

Examples:
  index_digest mysql://username:password@localhost/dbname
  index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log=sql.log
  index_digest mysql://index_digest:qwerty@localhost/index_digest --skip-checks=non_utf_columns
  index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --checks=data_too_old,data_not_updated_recently
  index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --skip-tables=DATABASECHANGELOG,DATABASECHANGELOGLOCK

Visit <https://github.com/macbre/index-digest>
```
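The `DSN` argument in the examples above is a URL of the form `mysql://user:password@host/dbname`. As a rough illustration of how such a DSN breaks down into connection parameters, here is a minimal Python sketch using the standard library's `urllib.parse`. It is not index-digest's own parsing code; the `parse_dsn` name and the fallback to port 3306 are assumptions of this example.

```python
from urllib.parse import unquote, urlsplit


def parse_dsn(dsn: str) -> dict:
    """Split mysql://user:password@host[:port]/dbname into its parts (illustrative only)."""
    parts = urlsplit(dsn)
    if parts.scheme != "mysql":
        raise ValueError(f"unsupported DSN scheme: {parts.scheme!r}")
    return {
        "user": unquote(parts.username or ""),
        # percent-decoding lets a password contain characters such as "@" or "/"
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        # assumed fallback: 3306 is MySQL's standard port
        "port": parts.port or 3306,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(parse_dsn("mysql://index_digest:qwerty@localhost/index_digest"))
```

Percent-encoding special characters in the password (e.g. `%40` for `@`) keeps the URL unambiguous.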

## SQL query log

It's a text file with a single SQL query on each line (line breaks within a query are not allowed). Lines that start with `--` (SQL comments) are ignored. The file can be [generated using `query-digest` when `--sql-log` output mode is selected](https://github.com/macbre/query-digest#output-modes).

An example:

```sql
-- A comment
select * from 0002_not_used_indices order by id
select * from 0002_not_used_indices where foo = 'foo' and id = 2
select count(*) from 0002_not_used_indices where foo = 'foo'
/* foo bar */ select * from 0002_not_used_indices where bar = 'foo'
INSERT  IGNORE INTO `0070_insert_ignore` VALUES ('123', 9, '2017-01-01');
```
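A minimal sketch of reading such a log in Python, assuming one query per line as described above. `read_sql_log` is a hypothetical helper, not part of index-digest. The `/* foo bar */` line is kept because only lines starting with `--` are treated as comments; stripping surrounding whitespace and skipping blank lines are assumptions of this sketch.

```python
from typing import Iterable, Iterator


def read_sql_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield the queries from a one-query-per-line SQL log."""
    for line in lines:
        query = line.strip()
        # skip blank lines and "--" SQL comments; /* ... */ comments stay part of the query
        if not query or query.startswith("--"):
            continue
        yield query


if __name__ == "__main__":
    sample = [
        "-- A comment",
        "select * from 0002_not_used_indices order by id",
        "",
        "/* foo bar */ select * from 0002_not_used_indices where bar = 'foo'",
    ]
    for q in read_sql_log(sample):
        print(q)
```

With a real file, `with open("sql.log", encoding="utf-8") as f: queries = list(read_sql_log(f))` works the same way.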

### From [MySQL slow query log](https://dev.mysql.com/doc/refman/8.0/en/slow-query-log.html)

MySQL's slow query log needs to be pre-processed first (to remove comments and timestamps):

```
grep -Ev '^(SET timestamp|#|throttle: )' mysql-slow.log > queries.log
```

Then you can run `index_digest --sql-log=queries.log ...`.
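If `grep` is not at hand, the same pre-processing can be sketched in Python. The pattern is copied from the command above; `filter_slow_log` is a hypothetical helper name.

```python
import re

# Same pattern as the shell filter: drop "# ..." comment lines (e.g. "# Time:",
# "# User@Host:"), "SET timestamp=..." lines and "throttle: ..." lines.
NOISE = re.compile(r"^(SET timestamp|#|throttle: )")


def filter_slow_log(lines):
    """Yield only the slow query log lines that the shell filter would keep."""
    for line in lines:
        if not NOISE.match(line):
            yield line
```

For example: `with open("mysql-slow.log") as src, open("queries.log", "w") as dst: dst.writelines(filter_slow_log(src))`.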

## Formatters

`index-digest` can return results in various formats (use `--format` to choose one).

### plain

Emits a human-readable report to the console. You can disable colored and bold text by setting the `ANSI_COLORS_DISABLED=1` environment variable.

### syslog

Pushes JSON-formatted messages via syslog, so they can be aggregated using the ELK stack.
Use the `SYSLOG_IDENT` env variable to customize the syslog `ident` that messages are sent with (defaults to `index-digest`).

```
Dec 28 15:59:58 debian index-digest[17485]: {"meta": {"version": "index-digest v0.1.0", "database_name": "index_digest", "database_host": "debian", "database_version": "MySQL v5.7.20"}, "report": {"type": "redundant_indices", "table": "0004_id_foo", "message": "\"idx\" index can be removed as redundant (covered by \"PRIMARY\")", "context": {"redundant": "UNIQUE KEY idx (id, foo)", "covered_by": "PRIMARY KEY (id, foo)", "schema": "CREATE TABLE `0004_id_foo` (\n  `id` int(9) NOT NULL AUTO_INCREMENT,\n  `foo` varbinary(16) NOT NULL DEFAULT '',\n  PRIMARY KEY (`id`,`foo`),\n  UNIQUE KEY `idx` (`id`,`foo`)\n) ENGINE=InnoDB DEFAULT CHARSET=latin1", "table_data_size_mb": 0.015625, "table_index_size_mb": 0.015625}}}
```
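Each syslog line carries a JSON document with `meta` and `report` keys, as in the example above. A minimal Python sketch for pulling that document back out of a line, e.g. in a custom log shipper; it assumes the payload starts at the first `: {` after the syslog prefix, and `parse_syslog_line` is a hypothetical name.

```python
import json


def parse_syslog_line(line: str) -> dict:
    """Return the JSON document logged by the syslog formatter.

    Assumes the payload begins at the first ": {" (the separator after the
    "ident[pid]" part) and runs to the end of the line.
    """
    _prefix, sep, rest = line.partition(": {")
    if not sep:
        raise ValueError("no JSON payload found in syslog line")
    return json.loads("{" + rest)


if __name__ == "__main__":
    sample = (
        "Dec 28 15:59:58 debian index-digest[17485]: "
        '{"meta": {"database_name": "index_digest"}, '
        '"report": {"type": "redundant_indices", "table": "0004_id_foo"}}'
    )
    entry = parse_syslog_line(sample)
    print(entry["report"]["type"], entry["report"]["table"])
```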

### yaml

Outputs a YAML file with results and metadata.

## Checks

You can select which checks should be reported by the tool by using the `--checks` command line option. Certain checks can also be skipped via the `--skip-checks` option. Refer to `index_digest --help` for examples.
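The `--checks` and `--skip-checks` values are comma-separated lists, and the usage line printed by `index_digest -h` shows the two options as mutually exclusive. A hypothetical Python helper illustrating those semantics (a sketch, not index-digest's implementation):

```python
from typing import Iterable, List, Optional


def select_checks(
    available: Iterable[str],
    checks: Optional[str] = None,
    skip_checks: Optional[str] = None,
) -> List[str]:
    """Return the checks to run for comma-separated --checks / --skip-checks values."""
    if checks and skip_checks:
        raise ValueError("use either --checks or --skip-checks, not both")
    names = list(available)
    if checks:
        wanted = {c.strip() for c in checks.split(",") if c.strip()}
        return [n for n in names if n in wanted]
    if skip_checks:
        skipped = {c.strip() for c in skip_checks.split(",") if c.strip()}
        return [n for n in names if n not in skipped]
    return names  # no filter given: run everything
```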

> **Number of checks**: 24

* `redundant_indices`: reports indices that are redundant because they are covered by other indices
* `non_utf_columns`: reports text columns that have their character encoding set to `latin1` (UTF is the way to go)
* `missing_primary_index`: reports tables with no primary or unique key (see [MySQL bug #76252](https://bugs.mysql.com/bug.php?id=76252) and [Wikia/app#9863](https://github.com/Wikia/app/pull/9863)). [Primary keys can be enforced on MySQL config level](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_sql_require_primary_key) since 8.0.13 (via `sql_require_primary_key` variable).
* `test_tables`: reports tables that seem to be test leftovers (e.g. `some_guy_test_table`)
* `single_column`: reports tables with just a single column
* `empty_tables`: reports tables with no rows
* `generic_primary_key`: reports tables with [a primary key on `id` column](https://github.com/jarulraj/sqlcheck/blob/master/docs/logical/1004.md) (a more meaningful name should be used)
* `use_innodb`: reports tables using storage engines other than `InnoDB` (the default for MySQL 5.5+ and MariaDB 10.2+)
* `low_cardinality_index`: reports [indices with low cardinality](https://github.com/macbre/index-digest/issues/31)
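The `redundant_indices` check can be illustrated with the common left-most prefix rule: an index is redundant when its columns are a prefix of another index's columns, since MySQL can use the longer index for the same lookups. The sketch below is an assumption about how such a check could work, not index-digest's actual algorithm; the `Index` type, the tie-breaking between identical indices and the special handling of `UNIQUE` keys are choices made for this example. It reproduces the two `redundant_indices` findings in the example report further down.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Index:
    name: str
    columns: Tuple[str, ...]
    primary: bool = False
    unique: bool = False

    @property
    def strength(self) -> int:
        # used to decide which of two identical indices to keep:
        # PRIMARY beats UNIQUE, which beats a plain KEY
        return 2 if self.primary else 1 if self.unique else 0


def find_redundant(indices: List[Index]) -> List[Tuple[str, str]]:
    """Return (redundant, covered_by) index name pairs for a single table."""
    found = []
    for pos, idx in enumerate(indices):
        if idx.primary:
            continue  # never suggest dropping the primary key
        n = len(idx.columns)
        for other_pos, other in enumerate(indices):
            if other_pos == pos or other.columns[:n] != idx.columns:
                continue  # idx is not a left-most prefix of this index
            if len(other.columns) == n:
                # identical column lists: keep the stronger index, or the earlier one on a tie
                if (other.strength, -other_pos) < (idx.strength, -pos):
                    continue
            elif idx.unique:
                continue  # a shorter UNIQUE index enforces its own constraint
            found.append((idx.name, other.name))
            break
    return found


if __name__ == "__main__":
    table = [
        Index("PRIMARY", ("id",), primary=True),
        Index("idx_foo", ("foo",)),
        Index("idx_foo_bar", ("foo", "bar")),
        Index("idx_id_foo", ("id", "foo")),
    ]
    print(find_redundant(table))
```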

### Additional checks performed on SQL log

> You need to provide SQL log file via `--sql-log` option

* `not_used_columns`: checks which columns were not used by SELECT queries
* `not_used_indices`: checks which indices are not used by SELECT queries
* `not_used_tables`: checks which tables are not used by SELECT queries
* `queries_not_using_index`: reports SELECT queries that do not use any index
* `queries_using_filesort`: reports SELECT queries that require filesort ([a sort can’t be performed from an index and quicksort is used](https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/))
* `queries_using_temporary`: reports SELECT queries that require a temporary table to hold the result
* `queries_using_full_table_scan`: reports SELECT queries that require a [full table scan](https://dev.mysql.com/doc/refman/5.7/en/table-scan-avoidance.html)
* `selects_with_like`: reports SELECT queries that use `LIKE '%foo'` conditions (they cannot use an index)
* `insert_ignore`: reports [queries using `INSERT IGNORE`](https://medium.com/legacy-systems-diary/things-to-avoid-episode-1-insert-ignore-535b4c24406b)
* `select_star`: reports [queries using `SELECT *`](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3001.md)
* `having_clause`: reports [queries using `HAVING` clause](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3012.md)
* `high_offset_selects`: reports [SELECT queries using high OFFSET](https://www.percona.com/blog/2008/09/24/four-ways-to-optimize-paginated-displays/)
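The `high_offset_selects` entry in the example report further down extracts `limit: 200` and `offset: 927600` from `LIMIT 927600,200`. A regex-based Python sketch of that extraction, handling both `LIMIT offset,count` and `LIMIT count OFFSET offset` at the end of a query. The `OFFSET_THRESHOLD` value of 1000 is a hypothetical cut-off, as the README does not state the one index-digest uses.

```python
import re
from typing import Optional, Tuple

# LIMIT <offset>,<count>  or  LIMIT <count> [OFFSET <offset>], at the end of the query
LIMIT_RE = re.compile(
    r"\bLIMIT\s+(\d+)(?:\s*,\s*(\d+)|\s+OFFSET\s+(\d+))?\s*;?\s*$",
    re.IGNORECASE,
)

OFFSET_THRESHOLD = 1000  # hypothetical cut-off for this sketch


def limit_and_offset(query: str) -> Optional[Tuple[int, int]]:
    """Return (limit, offset) for a query ending with a LIMIT clause, else None."""
    match = LIMIT_RE.search(query)
    if match is None:
        return None
    first, comma_count, offset_kw = match.groups()
    if comma_count is not None:  # MySQL's LIMIT offset,count form
        return int(comma_count), int(first)
    if offset_kw is not None:  # LIMIT count OFFSET offset
        return int(first), int(offset_kw)
    return int(first), 0


def uses_high_offset(query: str, threshold: int = OFFSET_THRESHOLD) -> bool:
    parsed = limit_and_offset(query)
    return parsed is not None and parsed[1] > threshold
```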

### Additional checks performed on tables data

> You need to use `--analyze-data` command line switch. Please note that these checks will query your tables. **These checks can take a while if queried columns are not indexed**.

* `data_too_old`: reports tables that have really old data; it may be worth checking whether such long data retention is actually needed (**defaults to a three-month threshold**, can be customized via the `INDEX_DIGEST_DATA_TOO_OLD_THRESHOLD_DAYS` env variable)
* `data_not_updated_recently`: reports tables that were not updated recently; check whether they should be kept up to date (**defaults to a one-month threshold**, can be customized via the `INDEX_DIGEST_DATA_NOT_UPDATED_RECENTLY_THRESHOLD_DAYS` env variable)
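Both checks boil down to comparing the age of the `MIN()` or `MAX()` value of a time column against a threshold in days. A minimal Python sketch, assuming 90 and 30 days as the defaults for "three months" and "a month" (the environment variable names are the ones documented above). With `now` set to 2018-02-17 12:03:44 it yields the 184 and 40 days shown in the example report.

```python
import os
from datetime import datetime
from typing import Optional

# Env variable names from the README; the 90 / 30 day fallbacks are assumptions.
TOO_OLD_DAYS = int(os.environ.get("INDEX_DIGEST_DATA_TOO_OLD_THRESHOLD_DAYS", 90))
NOT_UPDATED_DAYS = int(
    os.environ.get("INDEX_DIGEST_DATA_NOT_UPDATED_RECENTLY_THRESHOLD_DAYS", 30)
)


def data_too_old(oldest: datetime, now: datetime) -> Optional[int]:
    """oldest is MIN() of the time column; return its age in days if above the threshold."""
    age = (now - oldest).days
    return age if age > TOO_OLD_DAYS else None


def data_not_updated_recently(newest: datetime, now: datetime) -> Optional[int]:
    """newest is MAX() of the time column; return its age in days if above the threshold."""
    age = (now - newest).days
    return age if age > NOT_UPDATED_DAYS else None
```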

### Additional checks performed across database on the current MySQL server

> You need to use `--check-empty-databases` command line switch.

* `empty_database`: reports databases that have no `BASE TABLE` tables (as provided by `information_schema.TABLES`)
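Given the `(TABLE_SCHEMA, TABLE_TYPE)` rows of `information_schema.TABLES` and the list of databases (e.g. from `SHOW DATABASES`), the `empty_database` check reduces to a set difference. A hypothetical Python sketch; skipping MySQL's system schemas is an assumption made here, not documented behavior.

```python
from typing import Iterable, List, Set, Tuple

# Assumption of this sketch: system schemas are never reported.
SYSTEM_SCHEMAS = {"information_schema", "mysql", "performance_schema", "sys"}


def empty_databases(
    schemas: Iterable[str], tables: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return schemas that have no BASE TABLE rows in information_schema.TABLES.

    `tables` holds (TABLE_SCHEMA, TABLE_TYPE) rows; views do not count as tables.
    """
    with_tables: Set[str] = {schema for schema, kind in tables if kind == "BASE TABLE"}
    return [s for s in schemas if s not in SYSTEM_SCHEMAS and s not in with_tables]
```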

## An example report

```
$ index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log sql/0002-not-used-indices-log 
------------------------------------------------------------
Found 85 issue(s) to report for "index_digest" database
------------------------------------------------------------
MySQL v5.7.21 at debian
index-digest v1.0.0
------------------------------------------------------------
redundant_indices → table affected: 0004_id_foo

✗ "idx" index can be removed as redundant (covered by "PRIMARY")

  - redundant: UNIQUE KEY idx (id, foo)
  - covered_by: PRIMARY KEY (id, foo)
  - schema: CREATE TABLE `0004_id_foo` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      `foo` varbinary(16) NOT NULL DEFAULT '',
      PRIMARY KEY (`id`,`foo`),
      UNIQUE KEY `idx` (`id`,`foo`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  - table_data_size_mb: 0.015625
  - table_index_size_mb: 0.015625

------------------------------------------------------------
redundant_indices → table affected: 0004_id_foo_bar

✗ "idx_foo" index can be removed as redundant (covered by "idx_foo_bar")

  - redundant: KEY idx_foo (foo)
  - covered_by: KEY idx_foo_bar (foo, bar)
  - schema: CREATE TABLE `0004_id_foo_bar` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      `foo` varbinary(16) NOT NULL DEFAULT '',
      `bar` varbinary(16) NOT NULL DEFAULT '',
      PRIMARY KEY (`id`),
      KEY `idx_foo` (`foo`),
      KEY `idx_foo_bar` (`foo`,`bar`),
      KEY `idx_id_foo` (`id`,`foo`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  - table_data_size_mb: 0.015625
  - table_index_size_mb: 0.046875

------------------------------------------------------------
missing_primary_index → table affected: 0034_querycache

✗ "0034_querycache" table does not have any primary or unique index

  - schema: CREATE TABLE `0034_querycache` (
      `qc_type` varbinary(32) NOT NULL,
      `qc_value` int(10) unsigned NOT NULL DEFAULT '0',
      `qc_namespace` int(11) NOT NULL DEFAULT '0',
      `qc_title` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '',
      KEY `qc_type` (`qc_type`,`qc_value`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8

------------------------------------------------------------
test_tables → table affected: 0075_some_guy_test_table

✗ "0075_some_guy_test_table" seems to be a test table

  - schema: CREATE TABLE `0075_some_guy_test_table` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      `name` varchar(255) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8

------------------------------------------------------------
single_column → table affected: 0074_bag_of_ints

✗ "0074_bag_of_ints" has just a single column

  - schema: CREATE TABLE `0074_bag_of_ints` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8

------------------------------------------------------------
empty_tables → table affected: 0089_empty_table

✗ "0089_empty_table" table has no rows, is it really needed?

  - schema: CREATE TABLE `0089_empty_table` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

------------------------------------------------------------
generic_primary_key → table affected: 0094_generic_primary_key

✗ "0094_generic_primary_key" has a primary key called id, use a more meaningful name

  - schema: CREATE TABLE `0094_generic_primary_key` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      `foo` varchar(16) NOT NULL DEFAULT '',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

------------------------------------------------------------
use_innodb → table affected: 0036_use_innodb_myisam

✗ "0036_use_innodb_myisam" uses MyISAM storage engine

  - schema: CREATE TABLE `0036_use_innodb_myisam` (
      `item_id` int(9) NOT NULL AUTO_INCREMENT,
      `foo` int(8) DEFAULT NULL,
      PRIMARY KEY (`item_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
  - engine: MyISAM

------------------------------------------------------------
not_used_indices → table affected: 0002_not_used_indices

✗ "test_id_idx" index was not used by provided queries

  - not_used_index: KEY test_id_idx (test, id)

------------------------------------------------------------
not_used_tables → table affected: 0020_big_table

✗ "0020_big_table" table was not used by provided queries

  - schema: CREATE TABLE `0020_big_table` (
      `id` int(9) NOT NULL AUTO_INCREMENT,
      `val` int(9) NOT NULL,
      `text` char(5) NOT NULL,
      PRIMARY KEY (`id`),
      KEY `text_idx` (`text`)
    ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8
  - table_size_mb: 5.03125
  - rows_estimated: 100405

------------------------------------------------------------
insert_ignore → table affected: 0070_insert_ignore

✗ "INSERT IGNORE INTO `0070_insert_ignore` VALUES (9,..." query uses a risky INSERT IGNORE

  - query: INSERT IGNORE INTO `0070_insert_ignore` VALUES (9, '123', '2017-01-01');
  - schema: CREATE TABLE `0070_insert_ignore` (
      `id` int(9) NOT NULL,
      `text` char(5) NOT NULL,
      `time` datetime DEFAULT NULL,
      UNIQUE KEY `id` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8

------------------------------------------------------------
non_utf_columns → table affected: 0032_latin1_table

✗ "name" text column has "latin1" character set defined

  - column: name
  - column_character_set: latin1
  - column_collation: latin1_swedish_ci
  - schema: CREATE TABLE `0032_latin1_table` (
      `item_id` int(9) NOT NULL AUTO_INCREMENT,
      `name` varchar(255) DEFAULT NULL,
      `utf8_column` varchar(255) CHARACTER SET utf8 COLLATE utf8_polish_ci NOT NULL,
      `ucs2_column` varchar(255) CHARACTER SET ucs2 DEFAULT NULL,
      `utf8mb4_column` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
      `utf16_column` varchar(255) CHARACTER SET utf16 DEFAULT NULL,
      `utf32_column` varchar(255) CHARACTER SET utf32 DEFAULT NULL,
      `binary_column` varbinary(255) DEFAULT NULL,
      `latin_blob` blob,
      PRIMARY KEY (`item_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

------------------------------------------------------------

(...)

------------------------------------------------------------
queries_using_filesort → table affected: 0020_big_table

✗ "SELECT val, count(*) FROM 0020_big_table WHERE id ..." query used filesort

  - query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val
  - explain_extra: Using where; Using temporary; Using filesort
  - explain_rows: 11
  - explain_filtered: None
  - explain_key: PRIMARY

------------------------------------------------------------
queries_using_temporary → table affected: 0020_big_table

✗ "SELECT val, count(*) FROM 0020_big_table WHERE id ..." query used temporary

  - query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val
  - explain_extra: Using where; Using temporary; Using filesort
  - explain_rows: 11
  - explain_filtered: None
  - explain_key: PRIMARY

------------------------------------------------------------
queries_using_full_table_scan → table affected: 0020_big_table

✗ "SELECT * FROM 0020_big_table" query triggered full table scan

  - query: SELECT * FROM 0020_big_table
  - explain_rows: 9041

------------------------------------------------------------
selects_with_like → table affected: 0020_big_table

✗ "SELECT * FROM 0020_big_table WHERE text LIKE '%00'" query uses LIKE with left-most wildcard

  - query: SELECT * FROM 0020_big_table WHERE text LIKE '%00'
  - explain_extra: Using where
  - explain_rows: 100623

------------------------------------------------------------
select_star → table affected: bar

✗ "SELECT t.* FROM bar AS t" query uses SELECT *

  - query: SELECT t.* FROM bar AS t;

------------------------------------------------------------
having_clause → table affected: sales

✗ "SELECT s.cust_id,count(s.cust_id) FROM SH.sales s ..." query uses HAVING clause

  - query: SELECT s.cust_id,count(s.cust_id) FROM SH.sales s GROUP BY s.cust_id HAVING s.cust_id != '1660' AND s.cust_id != '2'

(...)

------------------------------------------------------------
low_cardinality_index → table affected: 0020_big_table

✗ "num_idx" index on "num" column has low cardinality, check if it is needed

  - column_name: num
  - index_name: num_idx
  - index_cardinality: 2
  - schema: CREATE TABLE `0020_big_table` (
      `item_id` int(9) NOT NULL AUTO_INCREMENT,
      `val` int(9) NOT NULL,
      `text` char(5) NOT NULL,
      `num` int(3) NOT NULL,
      PRIMARY KEY (`item_id`),
      KEY `text_idx` (`text`),
      KEY `num_idx` (`num`)
    ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8
  - value_usage: 33.24788541334185

(...)

------------------------------------------------------------
data_too_old → table affected: 0028_data_too_old

✗ "0028_data_too_old" has rows added 184 days ago, consider changing retention policy

  - diff_days: 184
  - data_since: 2017-08-17 12:03:44
  - data_until: 2018-02-17 12:03:44
  - date_column_name: timestamp
  - schema: CREATE TABLE `0028_data_too_old` (
      `item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
      `cnt` int(8) unsigned NOT NULL,
      `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`item_id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
  - rows: 4
  - table_size_mb: 0.015625

------------------------------------------------------------
data_not_updated_recently → table affected: 0028_data_not_updated_recently

✗ "0028_data_not_updated_recently" has the latest row added 40 days ago, consider checking if it should be up-to-date

  - diff_days: 40
  - data_since: 2017-12-29 12:03:44
  - data_until: 2018-01-08 12:03:44
  - date_column_name: timestamp
  - schema: CREATE TABLE `0028_data_not_updated_recently` (
      `item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
      `cnt` int(8) unsigned NOT NULL,
      `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`item_id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
  - rows: 3
  - table_size_mb: 0.015625

------------------------------------------------------------
high_offset_selects → table affected: page

✗ "SELECT /* CategoryPaginationViewer::processSection..." query uses too high offset impacting the performance

  - query: SELECT /* CategoryPaginationViewer::processSection */  page_namespace,page_title,page_len,page_is_redirect,cl_sortkey_prefix  FROM `page` INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON ((cl_from = page_id))  WHERE cl_type = 'page' AND cl_to = 'Spotify/Song'  ORDER BY cl_sortkey LIMIT 927600,200
  - limit: 200
  - offset: 927600

------------------------------------------------------------
empty_database → table affected: index_digest_empty

✗ "index_digest_empty" database has no tables

------------------------------------------------------------
Queries performed: 100
```

## Success stories

> Want to add your entry here? Submit a pull request

* By running `index-digest` at [Wikia](http://wikia.com) on shared database clusters (including tables storing ~450 million rows with 300+ GiB of data) we were able to [reclaim around 1.25 TiB of MySQL storage space across all replicas](https://medium.com/legacy-systems-diary/linting-your-database-schema-cd8947835a52).

## Read more

* [Percona Database Performance Blog](https://www.percona.com/blog/)
* [High Performance MySQL, 3rd Edition by Vadim Tkachenko, Peter Zaitsev, Baron Schwartz](https://www.safaribooksonline.com/library/view/high-performance-mysql/9781449332471/ch05.html)
* [Percona | Indexing 101: Optimizing MySQL queries on a single table](https://www.percona.com/blog/2015/04/27/indexing-101-optimizing-mysql-queries-on-a-single-table/)
* [Percona | `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html) / [find unused indexes](https://www.percona.com/blog/2012/06/30/find-unused-indexes/)

### Slides

* [Percona | MySQL Indexing: Best Practices](https://www.percona.com/files/presentations/WEBINAR-MySQL-Indexing-Best-Practices.pdf)

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/macbre/index-digest",
    "name": "indexdigest",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "Maciej Brencz",
    "author_email": "maciej.brencz@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f1/87/3466edcf6ceb82147a98978201b039d94838c524c7dff982fd8bf4fd5b3f/indexdigest-1.6.0.tar.gz",
    "platform": null,
    "description": "# index-digest\n\n[![PyPI](https://img.shields.io/pypi/v/indexdigest.svg)](https://pypi.python.org/pypi/indexdigest)\n[![Docker Hub](https://img.shields.io/docker/pulls/macbre/index-digest.svg)](https://hub.docker.com/r/macbre/index-digest/)\n[![Coverage Status](https://coveralls.io/repos/github/macbre/index-digest/badge.svg?branch=master)](https://coveralls.io/github/macbre/index-digest?branch=master)\n\nAnalyses your database queries and schema and suggests indices improvements. You can use `index-digest` as **your database linter**. The goal is to **provide the user with actionable reports** instead of just a list of statistics and schema details. Inspired by [Percona's `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html).\n\n**NEW** You can also [use `index-digest` as GitHub's Action](https://github.com/marketplace/actions/index-digest).\n\n## What this tool does\n\n`index-digest` does the following:\n\n* it checks the schema of all tables in a given database and suggests improvements (e.g. removal of redundant indices, adding a primary key to ease replication, dropping tables with just a single column or no rows)\n* if provided with SQL queries log (via `--sql-log` option) it:\n  * checks if all tables, columns and indices are used by these queries\n  * reports text columns with character set different than `utf`\n  * reports queries that do not use indices\n  * reports queries that use filesort, temporary file or full table scan\n  * reports queries that are not quite kosher (e.g. 
`LIKE \"%foo%\"`, `INSERT IGNORE`, `SELECT *`, `HAVING` clause, high `OFFSET` in pagination queries)\n* if run with `--analyze-data` switch it:\n  * reports tables with old data (by querying for `MIN()` value of time column) where data retency can be reviewed\n  * reports tables with not up-to-date data (by querying for `MAX()` value of time column)\n* if run with `--check-empty-databases` switch it:\n  * report empty databases on the current MySQL server\n\nThis tool **supports MySQL 5.7, 8.0, 8.1, [Percona Server](https://www.percona.com/software/mysql-database/percona-server) 8.0 and MariaDB 10.1, 10.2, 10.5, 10.6** and runs under **Python 3.8+**.\n\nResults can be reported in a human-readable form, as YAML or sent to syslog and later aggregated & processed using ELK stack.\n\n## Requirements & install\n\n### From `pypi`\n\n```\npip install indexdigest\n```\n\n### From git\n\n```\ngit clone git@github.com:macbre/index-digest.git && cd index-digest\nsudo apt-get install libmysqlclient-dev python3-dev virtualenv\n\nvirtualenv -ppython3 env\nsource env/bin/activate\nmake install\n```\n\nWhen using MacOS, you should follow [this `mysql_config` installation steps](https://stackoverflow.com/a/25491082).\n\n#### Running tests\n\n**We assume that the test database is running locally on port 53306**. 
You can use the following to test your changes locally before pushing them (this one uses MySQL 8.0.20):\n\n```\ndocker run --rm -p 53306:3306 --health-cmd=\"mysqladmin ping\" --health-interval=10s --health-timeout=5s --health-retries=3 -e \"MYSQL_ALLOW_EMPTY_PASSWORD=yes\" -e \"MYSQL_DATABASE=index_digest\" --name=index_digest_mysql mysql:8.0.22 \"--default-authentication-plugin=mysql_native_password\"\n```\n\nWait until the server is up and running.\n\n```\nmysql --protocol=tcp --port=53306 -u root --password=\"\" -v < setup.sql\n./sql/populate.sh\nmysql --protocol=tcp --port=53306 -uindex_digest -pqwerty index_digest -v -e '\\s; SHOW TABLES; SHOW DATABASES;'\n\nmake test\n```\n\n### Using Docker\n\n> See https://hub.docker.com/r/macbre/index-digest/\n\n```\n$ docker run --network=host -t macbre/index-digest:latest mysql://index_digest:qwerty@debian/index_digest  | head -n 20\n------------------------------------------------------------\nFound 61 issue(s) to report for \"index_digest\" database\n------------------------------------------------------------\nMySQL v5.7.22 at debian\nindex-digest v1.2.0\n------------------------------------------------------------\nredundant_indices \u2192 table affected: 0004_id_foo\n\n\u2717 \"idx\" index can be removed as redundant (covered by \"PRIMARY\")\n\n  - redundant: UNIQUE KEY idx (item_id, foo)\n  - covered_by: PRIMARY KEY (item_id, foo)\n  - schema: CREATE TABLE `0004_id_foo` (\n      `item_id` int(9) NOT NULL AUTO_INCREMENT,\n      `foo` varbinary(16) NOT NULL DEFAULT '',\n      PRIMARY KEY (`item_id`,`foo`),\n      UNIQUE KEY `idx` (`item_id`,`foo`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n  - table_data_size_mb: 0.015625\n  - table_index_size_mb: 0.015625\n...\n```\n\n## How to run it?\n\n```\n$ index_digest -h\nindex_digest\n\nAnalyses your database queries and schema and suggests indices improvements.\n\nUsage:\n  index_digest DSN [--sql-log=<file>] [--format=<formatter>] [--analyze-data] [--checks=<checks> | 
--skip-checks=<skip-checks>] [--tables=<tables> | --skip-tables=<skip-tables>]\n  index_digest (-h | --help)\n  index_digest --version\n\nOptions:\n  DSN               Data Source Name of database to check\n  --sql-log=<file>  Text file with SQL queries to check against the database\n  --format=<formatter>  Use a given results formatter (plain, syslog, yaml)\n  --analyze-data    Run additional checks that will query table data (can be slow!)\n  --checks=<list>   Comma-separated lists of checks to report\n  --skip-checks=<list> Comma-separated lists of checks to skip from report\n  --tables=<list>   Comma-separated lists of tables to report\n  --skip-tables=<list> Comma-separated lists of tables to skip from report\n  -h --help         Show this screen.\n  --version         Show version.\n\nExamples:\n  index_digest mysql://username:password@localhost/dbname\n  index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log=sql.log\n  index_digest mysql://index_digest:qwerty@localhost/index_digest --skip-checks=non_utf_columns\n  index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --checks=data_too_old,data_not_updated_recently\n  index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --skip-tables=DATABASECHANGELOG,DATABASECHANGELOGLOCK\n\nVisit <https://github.com/macbre/index-digest>\n```\n\n## SQL query log\n\nIt's a text file with a single SQL query in each line (no line breaks are allowed). Lines that do start with `--` (SQL comment) are ignored. 
The file can be [generated using `query-digest` when `--sql-log` output mode is selected](https://github.com/macbre/query-digest#output-modes).\n\nAn example:\n\n```sql\n-- A comment\nselect * from 0002_not_used_indices order by id\nselect * from 0002_not_used_indices where foo = 'foo' and id = 2\nselect count(*) from 0002_not_used_indices where foo = 'foo'\n/* foo bar */ select * from 0002_not_used_indices where bar = 'foo'\nINSERT  IGNORE INTO `0070_insert_ignore` VALUES ('123', 9, '2017-01-01');\n```\n\n### From [MySQL slow query log](https://dev.mysql.com/doc/refman/8.0/en/slow-query-log.html)\n\nMySQL's slow query log needs to be pre-processed first (to remove comments and timestamps):\n\n```\ncat mysql-slow.log | egrep -v '^(SET timestamp|#|throttle: )' > queries.log\n```\n\nThen you can run `index_digest --sql-log=queries.log ...`.\n\n## Formatters\n\n`index-digest` can return results in various formats (use `--format` to choose one).\n\n### plain\n\nEmits human-readable report to a console. 
You can disable colored and bold text by setting the `ANSI_COLORS_DISABLED=1` environment variable.\n\n### syslog\n\nPushes JSON-formatted messages via syslog so they can be aggregated using the ELK stack.\nUse the `SYSLOG_IDENT` environment variable to customize the syslog `ident` that messages are sent with (defaults to `index-digest`).\n\n```\nDec 28 15:59:58 debian index-digest[17485]: {\"meta\": {\"version\": \"index-digest v0.1.0\", \"database_name\": \"index_digest\", \"database_host\": \"debian\", \"database_version\": \"MySQL v5.7.20\"}, \"report\": {\"type\": \"redundant_indices\", \"table\": \"0004_id_foo\", \"message\": \"\\\"idx\\\" index can be removed as redundant (covered by \\\"PRIMARY\\\")\", \"context\": {\"redundant\": \"UNIQUE KEY idx (id, foo)\", \"covered_by\": \"PRIMARY KEY (id, foo)\", \"schema\": \"CREATE TABLE `0004_id_foo` (\\n  `id` int(9) NOT NULL AUTO_INCREMENT,\\n  `foo` varbinary(16) NOT NULL DEFAULT '',\\n  PRIMARY KEY (`id`,`foo`),\\n  UNIQUE KEY `idx` (`id`,`foo`)\\n) ENGINE=InnoDB DEFAULT CHARSET=latin1\", \"table_data_size_mb\": 0.015625, \"table_index_size_mb\": 0.015625}}}\n```\n\n### yaml\n\nOutputs a YAML file with results and metadata.\n\n## Checks\n\nYou can select which checks the tool reports by using the `--checks` command line option. Certain checks can also be skipped via the `--skip-checks` option. Refer to `index_digest --help` for examples.\n\n> **Number of checks**: 24\n\n* `redundant_indices`: reports indices that are redundant because they are covered by other indices\n* `non_utf_columns`: reports text columns whose character set is `latin1` (UTF is the way to go)\n* `missing_primary_index`: reports tables with no primary or unique key (see [MySQL bug #76252](https://bugs.mysql.com/bug.php?id=76252) and [Wikia/app#9863](https://github.com/Wikia/app/pull/9863)). 
[Primary keys can be enforced at the MySQL configuration level](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_sql_require_primary_key) since 8.0.13 (via the `sql_require_primary_key` variable).\n* `test_tables`: reports tables that seem to be test leftovers (e.g. `some_guy_test_table`)\n* `single_column`: reports tables with just a single column\n* `empty_tables`: reports tables with no rows\n* `generic_primary_key`: reports tables with [a primary key on the `id` column](https://github.com/jarulraj/sqlcheck/blob/master/docs/logical/1004.md) (a more meaningful name should be used)\n* `use_innodb`: reports tables using storage engines other than `InnoDB` (the default for MySQL 5.5+ and MariaDB 10.2+)\n* `low_cardinality_index`: reports [indices with low cardinality](https://github.com/macbre/index-digest/issues/31)\n\n### Additional checks performed on the SQL log\n\n> You need to provide an SQL log file via the `--sql-log` option\n\n* `not_used_columns`: checks which columns were not used by SELECT queries\n* `not_used_indices`: checks which indices were not used by SELECT queries\n* `not_used_tables`: checks which tables were not used by SELECT queries\n* `queries_not_using_index`: reports SELECT queries that do not use any index\n* `queries_using_filesort`: reports SELECT queries that require filesort ([a sort can\u2019t be performed from an index and quicksort is used](https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/))\n* `queries_using_temporary`: reports SELECT queries that require a temporary table to hold the result\n* `queries_using_full_table_scan`: reports SELECT queries that require a [full table scan](https://dev.mysql.com/doc/refman/5.7/en/table-scan-avoidance.html)\n* `selects_with_like`: reports SELECT queries that use `LIKE '%foo'` conditions (such conditions cannot use an index)\n* `insert_ignore`: reports [queries using `INSERT 
IGNORE`](https://medium.com/legacy-systems-diary/things-to-avoid-episode-1-insert-ignore-535b4c24406b)\n* `select_star`: reports [queries using `SELECT *`](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3001.md)\n* `having_clause`: reports [queries using the `HAVING` clause](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3012.md)\n* `high_offset_selects`: reports [SELECT queries using a high OFFSET](https://www.percona.com/blog/2008/09/24/four-ways-to-optimize-paginated-displays/)\n\n### Additional checks performed on table data\n\n> You need to use the `--analyze-data` command line switch. Please note that these checks will query your tables. **These checks can take a while if the queried columns are not indexed**.\n\n* `data_too_old`: reports tables that have really old data; it may be worth checking whether such long data retention is actually needed (**defaults to a three-month threshold**, can be customized via the `INDEX_DIGEST_DATA_TOO_OLD_THRESHOLD_DAYS` env variable)\n* `data_not_updated_recently`: reports tables that were not updated recently; check whether they should be up to date (**defaults to a one-month threshold**, can be customized via the `INDEX_DIGEST_DATA_NOT_UPDATED_RECENTLY_THRESHOLD_DAYS` env variable)\n\n### Additional checks performed across databases on the current MySQL server\n\n> You need to use the `--check-empty-databases` command line switch.\n\n* `empty_database`: reports databases that have no `BASE TABLE` tables (as provided by `information_schema.TABLES`)\n\n## An example report\n\n```\n$ index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log sql/0002-not-used-indices-log\n------------------------------------------------------------\nFound 85 issue(s) to report for \"index_digest\" database\n------------------------------------------------------------\nMySQL v5.7.21 at debian\nindex-digest v1.0.0\n------------------------------------------------------------\nredundant_indices \u2192 table affected: 0004_id_foo\n\n\u2717 
\"idx\" index can be removed as redundant (covered by \"PRIMARY\")\n\n  - redundant: UNIQUE KEY idx (id, foo)\n  - covered_by: PRIMARY KEY (id, foo)\n  - schema: CREATE TABLE `0004_id_foo` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      `foo` varbinary(16) NOT NULL DEFAULT '',\n      PRIMARY KEY (`id`,`foo`),\n      UNIQUE KEY `idx` (`id`,`foo`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n  - table_data_size_mb: 0.015625\n  - table_index_size_mb: 0.015625\n\n------------------------------------------------------------\nredundant_indices \u2192 table affected: 0004_id_foo_bar\n\n\u2717 \"idx_foo\" index can be removed as redundant (covered by \"idx_foo_bar\")\n\n  - redundant: KEY idx_foo (foo)\n  - covered_by: KEY idx_foo_bar (foo, bar)\n  - schema: CREATE TABLE `0004_id_foo_bar` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      `foo` varbinary(16) NOT NULL DEFAULT '',\n      `bar` varbinary(16) NOT NULL DEFAULT '',\n      PRIMARY KEY (`id`),\n      KEY `idx_foo` (`foo`),\n      KEY `idx_foo_bar` (`foo`,`bar`),\n      KEY `idx_id_foo` (`id`,`foo`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n  - table_data_size_mb: 0.015625\n  - table_index_size_mb: 0.046875\n\n------------------------------------------------------------\nmissing_primary_index \u2192 table affected: 0034_querycache\n\n\u2717 \"0034_querycache\" table does not have any primary or unique index\n\n  - schema: CREATE TABLE `0034_querycache` (\n      `qc_type` varbinary(32) NOT NULL,\n      `qc_value` int(10) unsigned NOT NULL DEFAULT '0',\n      `qc_namespace` int(11) NOT NULL DEFAULT '0',\n      `qc_title` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '',\n      KEY `qc_type` (`qc_type`,`qc_value`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\ntest_tables \u2192 table affected: 0075_some_guy_test_table\n\n\u2717 \"0075_some_guy_test_table\" seems to be a test table\n\n  - schema: CREATE TABLE 
`0075_some_guy_test_table` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      `name` varchar(255) NOT NULL,\n      PRIMARY KEY (`id`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\nsingle_column \u2192 table affected: 0074_bag_of_ints\n\n\u2717 \"0074_bag_of_ints\" has just a single column\n\n  - schema: CREATE TABLE `0074_bag_of_ints` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      PRIMARY KEY (`id`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\nempty_tables \u2192 table affected: 0089_empty_table\n\n\u2717 \"0089_empty_table\" table has no rows, is it really needed?\n\n  - schema: CREATE TABLE `0089_empty_table` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      PRIMARY KEY (`id`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n\n------------------------------------------------------------\ngeneric_primary_key \u2192 table affected: 0094_generic_primary_key\n\n\u2717 \"0094_generic_primary_key\" has a primary key called id, use a more meaningful name\n\n  - schema: CREATE TABLE `0094_generic_primary_key` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      `foo` varchar(16) NOT NULL DEFAULT '',\n      PRIMARY KEY (`id`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n\n------------------------------------------------------------\nuse_innodb \u2192 table affected: 0036_use_innodb_myisam\n\n\u2717 \"0036_use_innodb_myisam\" uses MyISAM storage engine\n\n  - schema: CREATE TABLE `0036_use_innodb_myisam` (\n      `item_id` int(9) NOT NULL AUTO_INCREMENT,\n      `foo` int(8) DEFAULT NULL,\n      PRIMARY KEY (`item_id`)\n    ) ENGINE=MyISAM DEFAULT CHARSET=latin1\n  - engine: MyISAM\n\n------------------------------------------------------------\nnot_used_indices \u2192 table affected: 0002_not_used_indices\n\n\u2717 \"test_id_idx\" index was not used by provided queries\n\n  - not_used_index: KEY test_id_idx (test, 
id)\n\n------------------------------------------------------------\nnot_used_tables \u2192 table affected: 0020_big_table\n\n\u2717 \"0020_big_table\" table was not used by provided queries\n\n  - schema: CREATE TABLE `0020_big_table` (\n      `id` int(9) NOT NULL AUTO_INCREMENT,\n      `val` int(9) NOT NULL,\n      `text` char(5) NOT NULL,\n      PRIMARY KEY (`id`),\n      KEY `text_idx` (`text`)\n    ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8\n  - table_size_mb: 5.03125\n  - rows_estimated: 100405\n\n------------------------------------------------------------\ninsert_ignore \u2192 table affected: 0070_insert_ignore\n\n\u2717 \"INSERT IGNORE INTO `0070_insert_ignore` VALUES (9,...\" query uses a risky INSERT IGNORE\n\n  - query: INSERT IGNORE INTO `0070_insert_ignore` VALUES (9, '123', '2017-01-01');\n  - schema: CREATE TABLE `0070_insert_ignore` (\n      `id` int(9) NOT NULL,\n      `text` char(5) NOT NULL,\n      `time` datetime DEFAULT NULL,\n      UNIQUE KEY `id` (`id`)\n    ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\nnon_utf_columns \u2192 table affected: 0032_latin1_table\n\n\u2717 \"name\" text column has \"latin1\" character set defined\n\n  - column: name\n  - column_character_set: latin1\n  - column_collation: latin1_swedish_ci\n  - schema: CREATE TABLE `0032_latin1_table` (\n      `item_id` int(9) NOT NULL AUTO_INCREMENT,\n      `name` varchar(255) DEFAULT NULL,\n      `utf8_column` varchar(255) CHARACTER SET utf8 COLLATE utf8_polish_ci NOT NULL,\n      `ucs2_column` varchar(255) CHARACTER SET ucs2 DEFAULT NULL,\n      `utf8mb4_column` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,\n      `utf16_column` varchar(255) CHARACTER SET utf16 DEFAULT NULL,\n      `utf32_column` varchar(255) CHARACTER SET utf32 DEFAULT NULL,\n      `binary_column` varbinary(255) DEFAULT NULL,\n      `latin_blob` blob,\n      PRIMARY KEY (`item_id`)\n    ) ENGINE=InnoDB DEFAULT 
CHARSET=latin1\n\n------------------------------------------------------------\n\n(...)\n\n------------------------------------------------------------\nqueries_using_filesort \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT val, count(*) FROM 0020_big_table WHERE id ...\" query used filesort\n\n  - query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val\n  - explain_extra: Using where; Using temporary; Using filesort\n  - explain_rows: 11\n  - explain_filtered: None\n  - explain_key: PRIMARY\n\n------------------------------------------------------------\nqueries_using_temporary \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT val, count(*) FROM 0020_big_table WHERE id ...\" query used temporary\n\n  - query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val\n  - explain_extra: Using where; Using temporary; Using filesort\n  - explain_rows: 11\n  - explain_filtered: None\n  - explain_key: PRIMARY\n\n------------------------------------------------------------\nqueries_using_full_table_scan \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT * FROM 0020_big_table\" query triggered full table scan\n\n  - query: SELECT * FROM 0020_big_table\n  - explain_rows: 9041\n\n------------------------------------------------------------\nselects_with_like \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT * FROM 0020_big_table WHERE text LIKE '%00'\" query uses LIKE with left-most wildcard\n\n  - query: SELECT * FROM 0020_big_table WHERE text LIKE '%00'\n  - explain_extra: Using where\n  - explain_rows: 100623\n\n------------------------------------------------------------\nselect_star \u2192 table affected: bar\n\n\u2717 \"SELECT t.* FROM bar AS t\" query uses SELECT *\n\n  - query: SELECT t.* FROM bar AS t;\n\n------------------------------------------------------------\nhaving_clause \u2192 table affected: sales\n\n\u2717 \"SELECT s.cust_id,count(s.cust_id) FROM SH.sales s ...\" query 
uses HAVING clause\n\n  - query: SELECT s.cust_id,count(s.cust_id) FROM SH.sales s GROUP BY s.cust_id HAVING s.cust_id != '1660' AND s.cust_id != '2'\n\n(...)\n\n------------------------------------------------------------\nlow_cardinality_index \u2192 table affected: 0020_big_table\n\n\u2717 \"num_idx\" index on \"num\" column has low cardinality, check if it is needed\n\n  - column_name: num\n  - index_name: num_idx\n  - index_cardinality: 2\n  - schema: CREATE TABLE `0020_big_table` (\n      `item_id` int(9) NOT NULL AUTO_INCREMENT,\n      `val` int(9) NOT NULL,\n      `text` char(5) NOT NULL,\n      `num` int(3) NOT NULL,\n      PRIMARY KEY (`item_id`),\n      KEY `text_idx` (`text`),\n      KEY `num_idx` (`num`)\n    ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8\n  - value_usage: 33.24788541334185\n\n(...)\n\n------------------------------------------------------------\ndata_too_old \u2192 table affected: 0028_data_too_old\n\n\u2717 \"0028_data_too_old\" has rows added 184 days ago, consider changing retention policy\n\n  - diff_days: 184\n  - data_since: 2017-08-17 12:03:44\n  - data_until: 2018-02-17 12:03:44\n  - date_column_name: timestamp\n  - schema: CREATE TABLE `0028_data_too_old` (\n      `item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,\n      `cnt` int(8) unsigned NOT NULL,\n      `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n      PRIMARY KEY (`item_id`)\n    ) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1\n  - rows: 4\n  - table_size_mb: 0.015625\n\n------------------------------------------------------------\ndata_not_updated_recently \u2192 table affected: 0028_data_not_updated_recently\n\n\u2717 \"0028_data_not_updated_recently\" has the latest row added 40 days ago, consider checking if it should be up-to-date\n\n  - diff_days: 40\n  - data_since: 2017-12-29 12:03:44\n  - data_until: 2018-01-08 12:03:44\n  - date_column_name: timestamp\n  - schema: CREATE TABLE 
`0028_data_not_updated_recently` (\n      `item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,\n      `cnt` int(8) unsigned NOT NULL,\n      `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n      PRIMARY KEY (`item_id`)\n    ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1\n  - rows: 3\n  - table_size_mb: 0.015625\n\n------------------------------------------------------------\nhigh_offset_selects \u2192 table affected: page\n\n\u2717 \"SELECT /* CategoryPaginationViewer::processSection...\" query uses too high offset impacting the performance\n\n  - query: SELECT /* CategoryPaginationViewer::processSection */  page_namespace,page_title,page_len,page_is_redirect,cl_sortkey_prefix  FROM `page` INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON ((cl_from = page_id))  WHERE cl_type = 'page' AND cl_to = 'Spotify/Song'  ORDER BY cl_sortkey LIMIT 927600,200\n  - limit: 200\n  - offset: 927600\n\n------------------------------------------------------------\nempty_database \u2192 table affected: index_digest_empty\n\n\u2717 \"index_digest_empty\" database has no tables\n\n------------------------------------------------------------\nQueries performed: 100\n```\n\n## Success stories\n\n> Want to add your entry here? 
Submit a pull request.\n\n* By running `index-digest` at [Wikia](http://wikia.com) on shared database clusters (including tables storing ~450 million rows with 300+ GiB of data), we were able to [reclaim around 1.25 TiB of MySQL storage space across all replicas](https://medium.com/legacy-systems-diary/linting-your-database-schema-cd8947835a52).\n\n## Read more\n\n* [Percona Database Performance Blog](https://www.percona.com/blog/)\n* [High Performance MySQL, 3rd Edition by Vadim Tkachenko, Peter Zaitsev, Baron Schwartz](https://www.safaribooksonline.com/library/view/high-performance-mysql/9781449332471/ch05.html)\n* [Percona | Indexing 101: Optimizing MySQL queries on a single table](https://www.percona.com/blog/2015/04/27/indexing-101-optimizing-mysql-queries-on-a-single-table/)\n* [Percona | `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html) / [find unused indexes](https://www.percona.com/blog/2012/06/30/find-unused-indexes/)\n\n### Slides\n\n* [Percona | MySQL Indexing: Best Practices](https://www.percona.com/files/presentations/WEBINAR-MySQL-Indexing-Best-Practices.pdf)\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Analyses your database queries and schema and suggests indices and schema improvements",
    "version": "1.6.0",
    "project_urls": {
        "Homepage": "https://github.com/macbre/index-digest"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ca5e34f2a729bc714cba78258af872753257bc9a257c7142d254770e6f872f8a",
                "md5": "6db4a0aa24955d31c33e399c05bd5366",
                "sha256": "1a996512ae34ccc6d28f9d3e244982493191392090fb6d864330767e8dabc64d"
            },
            "downloads": -1,
            "filename": "indexdigest-1.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6db4a0aa24955d31c33e399c05bd5366",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 73556,
            "upload_time": "2023-09-13T11:27:15",
            "upload_time_iso_8601": "2023-09-13T11:27:15.630058Z",
            "url": "https://files.pythonhosted.org/packages/ca/5e/34f2a729bc714cba78258af872753257bc9a257c7142d254770e6f872f8a/indexdigest-1.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f1873466edcf6ceb82147a98978201b039d94838c524c7dff982fd8bf4fd5b3f",
                "md5": "f58ece2292f116f474d9b3be867fac4a",
                "sha256": "8817f2a0313d669161fb194cb30950ee46597e9358fe822d1a014ca31c45edfa"
            },
            "downloads": -1,
            "filename": "indexdigest-1.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f58ece2292f116f474d9b3be867fac4a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 36578,
            "upload_time": "2023-09-13T11:27:16",
            "upload_time_iso_8601": "2023-09-13T11:27:16.850799Z",
            "url": "https://files.pythonhosted.org/packages/f1/87/3466edcf6ceb82147a98978201b039d94838c524c7dff982fd8bf4fd5b3f/indexdigest-1.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-13 11:27:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "macbre",
    "github_project": "index-digest",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "indexdigest"
}

Maciej Brencz
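The `--checks`/`--skip-checks` and `--tables`/`--skip-tables` options in the README above all take comma-separated lists (see the `--help` examples). A minimal sketch of applying such lists to finished reports, keyed on the `type` and `table` fields seen in the syslog example; the `parse_list` and `filter_reports` helpers are hypothetical, and index-digest may instead skip tables before running its checks rather than post-filtering results.

```python
from typing import Dict, Iterable, List, Optional, Set


def parse_list(value: Optional[str]) -> Optional[Set[str]]:
    """Turn "a,b, c" into {"a", "b", "c"}; None means the option was not given."""
    if value is None:
        return None
    return {item.strip() for item in value.split(",") if item.strip()}


def filter_reports(reports: Iterable[Dict[str, str]],
                   checks: Optional[str] = None,
                   skip_checks: Optional[str] = None,
                   tables: Optional[str] = None,
                   skip_tables: Optional[str] = None) -> List[Dict[str, str]]:
    """Keep reports allowed by the include lists and not excluded by the skip lists."""
    only_checks, only_tables = parse_list(checks), parse_list(tables)
    skipped_checks = parse_list(skip_checks) or set()
    skipped_tables = parse_list(skip_tables) or set()
    kept = []
    for report in reports:
        # Hypothetical report shape: {"type": <check name>, "table": <table name>}
        if only_checks is not None and report["type"] not in only_checks:
            continue
        if only_tables is not None and report["table"] not in only_tables:
            continue
        if report["type"] in skipped_checks or report["table"] in skipped_tables:
            continue
        kept.append(report)
    return kept
```

For example, `filter_reports(reports, skip_tables="DATABASECHANGELOG,DATABASECHANGELOGLOCK")` mirrors the last `--help` example by dropping reports for the Liquibase bookkeeping tables.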
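The SQL query log format described in the README above (one query per line, lines starting with `--` ignored) is simple enough to read with a few lines of Python. This is a minimal sketch, not index-digest's own parser; the `read_sql_log` name is hypothetical, and stripping surrounding whitespace and skipping blank lines are assumptions.

```python
from typing import Iterable, Iterator


def read_sql_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield one SQL query per line, skipping `--` comment lines.

    Hypothetical helper illustrating the log format; not index-digest's parser.
    """
    for line in lines:
        query = line.strip()  # assumption: surrounding whitespace is not significant
        if not query:
            continue  # assumption: blank lines carry no query
        if query.startswith("--"):
            continue  # SQL comment lines are ignored
        yield query  # inline comments such as /* foo bar */ are kept as-is


if __name__ == "__main__":
    sample = [
        "-- A comment\n",
        "select * from 0002_not_used_indices order by id\n",
        "/* foo bar */ select * from 0002_not_used_indices where bar = 'foo'\n",
    ]
    for query in read_sql_log(sample):
        print(query)
```

Because a file object iterates line by line, passing an open `sql.log` handle to `read_sql_log` streams a large log without loading it into memory.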
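The slow query log filter in the README above drops every line that starts with `SET timestamp`, `#` or `throttle: `. Where `grep` is not available, the same filter can be sketched in Python; `strip_slow_log` is a hypothetical name, and, like the shell command, it does not join queries that MySQL logged across several lines (the one-query-per-line format does not allow those).

```python
import re
from typing import Iterable, Iterator

# Same pattern as the shell filter: timestamps, "#" metadata lines and throttle notices.
_NOISE = re.compile(r"^(SET timestamp|#|throttle: )")


def strip_slow_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield slow log lines that do not match the noise pattern, unchanged."""
    for line in lines:
        if not _NOISE.match(line):
            yield line
```

Usage: `with open("mysql-slow.log") as src, open("queries.log", "w") as dst: dst.writelines(strip_slow_log(src))`.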
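The syslog formatter example shows one JSON document per report, with a `meta` object and a `report` object (`type`, `table`, `message`, `context`), sent under the `index-digest` ident with the process ID in brackets. A rough sketch of producing such messages with Python's standard `syslog` module (Unix only); the function names and the `LOG_INFO` priority are assumptions, and this is not the formatter's actual code.

```python
import json
import os
from typing import Any, Dict, Iterable


def format_syslog_message(meta: Dict[str, Any], report: Dict[str, Any]) -> str:
    """Serialise one report in the {"meta": ..., "report": ...} shape of the example."""
    return json.dumps({"meta": meta, "report": report})


def send_to_syslog(messages: Iterable[str]) -> None:
    """Send each message to the local syslog daemon."""
    import syslog  # standard library, available on Unix only

    # SYSLOG_IDENT overrides the ident, as documented; LOG_PID adds the "[pid]" part.
    ident = os.environ.get("SYSLOG_IDENT", "index-digest")
    syslog.openlog(ident=ident, logoption=syslog.LOG_PID)
    try:
        for message in messages:
            syslog.syslog(syslog.LOG_INFO, message)  # assumption: info priority
    finally:
        syslog.closelog()


if __name__ == "__main__":
    meta = {"version": "index-digest v0.1.0", "database_name": "index_digest"}
    report = {"type": "redundant_indices", "table": "0004_id_foo",
              "message": '"idx" index can be removed as redundant (covered by "PRIMARY")',
              "context": {"redundant": "UNIQUE KEY idx (id, foo)"}}
    print(format_syslog_message(meta, report))
```

Keeping serialisation separate from sending makes the message shape testable without a running syslog daemon.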
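Both `redundant_indices` reports in the example output follow a left-prefix rule: `idx_foo (foo)` is covered by `idx_foo_bar (foo, bar)`, and `UNIQUE KEY idx (id, foo)` duplicates `PRIMARY KEY (id, foo)`. A minimal sketch of that rule, assuming each index is described by its ordered column list; the `Index` class, the handling of `UNIQUE` constraints and the tie-break between identical indices are assumptions, not index-digest's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Index:
    """Hypothetical description of one index: its name and ordered columns."""
    name: str
    columns: Tuple[str, ...]
    unique: bool = False
    primary: bool = False


def _keep_rank(index: Index, position: int) -> Tuple[bool, bool, int]:
    # Lower rank wins when two indices have identical columns:
    # PRIMARY over UNIQUE over a plain KEY, then the one defined first.
    return (not index.primary, not index.unique, position)


def _covers(other: Index, o_pos: int, cand: Index, c_pos: int) -> bool:
    """Return True if `other` makes `cand` redundant."""
    if cand.primary or o_pos == c_pos:
        return False  # never suggest dropping the primary key; skip self-comparison
    if cand.unique and not (other.unique and other.columns == cand.columns):
        return False  # dropping cand would lose its uniqueness constraint
    if other.columns[: len(cand.columns)] != cand.columns:
        return False  # cand's columns are not a left-most prefix of other's
    if other.columns == cand.columns:
        return _keep_rank(other, o_pos) < _keep_rank(cand, c_pos)
    return True


def find_redundant(indices: List[Index]) -> List[Tuple[str, str]]:
    """Return (redundant index, covering index) name pairs."""
    pairs = []
    for c_pos, cand in enumerate(indices):
        for o_pos, other in enumerate(indices):
            if _covers(other, o_pos, cand, c_pos):
                pairs.append((cand.name, other.name))
                break
    return pairs


if __name__ == "__main__":
    # Indices of the 0004_id_foo_bar table from the example report.
    table = [
        Index("PRIMARY", ("id",), unique=True, primary=True),
        Index("idx_foo", ("foo",)),
        Index("idx_foo_bar", ("foo", "bar")),
        Index("idx_id_foo", ("id", "foo")),
    ]
    print(find_redundant(table))  # [('idx_foo', 'idx_foo_bar')]
```

A prefix match only shows that lookups can use the longer index; a query that names the smaller index in a hint such as `FORCE INDEX` (as in the `high_offset_selects` example) would fail once that index is dropped.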
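Several SQL log checks (`selects_with_like`, `insert_ignore`, `select_star`, `having_clause`) flag patterns that are visible in the query text itself. The sketch below approximates them with regular expressions; it is an illustration only, index-digest may parse queries differently, and naive patterns like these can be fooled by matching text inside string literals or comments. The `lint_query` name and the patterns are assumptions.

```python
import re
from typing import List

# Hypothetical text-level approximations of four index-digest checks.
_RULES = {
    # LIKE with a left-most % wildcard cannot use an index
    "selects_with_like": re.compile(r"\bLIKE\s+['\"]%", re.IGNORECASE),
    "insert_ignore": re.compile(r"\bINSERT\s+IGNORE\b", re.IGNORECASE),
    # SELECT * or SELECT alias.* (but not count(*))
    "select_star": re.compile(r"\bSELECT\s+(?:\w+\.)?\*", re.IGNORECASE),
    "having_clause": re.compile(r"\bHAVING\b", re.IGNORECASE),
}


def lint_query(query: str) -> List[str]:
    """Return the names of the rules whose pattern matches the query."""
    return [name for name, pattern in _RULES.items() if pattern.search(query)]


if __name__ == "__main__":
    print(lint_query("SELECT * FROM 0020_big_table WHERE text LIKE '%00'"))
```

Run against the example report's queries, this flags the same check names the report shows for them.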
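For `high_offset_selects`, the example report extracts `limit: 200` and `offset: 927600` from `LIMIT 927600,200`. MySQL accepts both `LIMIT offset, count` and `LIMIT count OFFSET offset`, so a sketch has to handle both forms. The function names and the `threshold` parameter are hypothetical; the README does not say which offset index-digest treats as "high".

```python
import re
from typing import Optional, Tuple

# MySQL accepts both "LIMIT offset, count" and "LIMIT count OFFSET offset".
_LIMIT_COMMA = re.compile(r"\bLIMIT\s+(\d+)\s*,\s*(\d+)", re.IGNORECASE)
_LIMIT_OFFSET = re.compile(r"\bLIMIT\s+(\d+)\s+OFFSET\s+(\d+)", re.IGNORECASE)


def parse_limit(query: str) -> Optional[Tuple[int, int]]:
    """Return (limit, offset) for a LIMIT clause with an offset, else None."""
    match = _LIMIT_COMMA.search(query)
    if match:
        return int(match.group(2)), int(match.group(1))
    match = _LIMIT_OFFSET.search(query)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None


def has_high_offset(query: str, threshold: int) -> bool:
    """True if the query skips at least `threshold` rows (hypothetical cut-off)."""
    parsed = parse_limit(query)
    return parsed is not None and parsed[1] >= threshold


if __name__ == "__main__":
    q = "SELECT page_title FROM `page` ORDER BY cl_sortkey LIMIT 927600,200"
    print(parse_limit(q))  # (200, 927600)
```

A high offset is costly because the server generally still has to produce and discard the skipped rows, which is the problem the linked Percona post on paginated displays addresses.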
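The two `--analyze-data` examples suggest how the day counts are derived, assuming the report ran on 2018-02-17 12:03:44 (consistent with both examples): `data_too_old` reports 184 days between that date and its oldest row (`data_since`), and `data_not_updated_recently` reports 40 days between that date and its newest row (`data_until`). The sketch below reproduces that arithmetic and reads the thresholds from the documented environment variables; the 90- and 30-day fallbacks (for "three months" and "a month"), the strict comparison and the function signatures are assumptions.

```python
import os
from datetime import datetime
from typing import Optional


def _threshold_days(env_var: str, fallback: int) -> int:
    # The env variable names come from the README; the fallback values are assumptions.
    return int(os.environ.get(env_var, fallback))


def data_too_old(oldest_row: datetime, now: datetime,
                 threshold: Optional[int] = None) -> Optional[int]:
    """Return the age in days of the oldest row if it exceeds the threshold."""
    if threshold is None:
        threshold = _threshold_days("INDEX_DIGEST_DATA_TOO_OLD_THRESHOLD_DAYS", 90)
    diff_days = (now - oldest_row).days
    return diff_days if diff_days > threshold else None


def data_not_updated_recently(newest_row: datetime, now: datetime,
                              threshold: Optional[int] = None) -> Optional[int]:
    """Return days since the newest row if that exceeds the threshold."""
    if threshold is None:
        threshold = _threshold_days(
            "INDEX_DIGEST_DATA_NOT_UPDATED_RECENTLY_THRESHOLD_DAYS", 30)
    diff_days = (now - newest_row).days
    return diff_days if diff_days > threshold else None


if __name__ == "__main__":
    now = datetime(2018, 2, 17, 12, 3, 44)
    print(data_too_old(datetime(2017, 8, 17, 12, 3, 44), now, threshold=90))  # 184
    print(data_not_updated_recently(datetime(2018, 1, 8, 12, 3, 44), now, threshold=30))  # 40
```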