# index-digest
[![PyPI](https://img.shields.io/pypi/v/indexdigest.svg)](https://pypi.python.org/pypi/indexdigest)
[![Docker Hub](https://img.shields.io/docker/pulls/macbre/index-digest.svg)](https://hub.docker.com/r/macbre/index-digest/)
[![Coverage Status](https://coveralls.io/repos/github/macbre/index-digest/badge.svg?branch=master)](https://coveralls.io/github/macbre/index-digest?branch=master)
Analyses your database queries and schema and suggests indices improvements. You can use `index-digest` as **your database linter**. The goal is to **provide the user with actionable reports** instead of just a list of statistics and schema details. Inspired by [Percona's `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html).
**NEW** You can also [use `index-digest` as GitHub's Action](https://github.com/marketplace/actions/index-digest).
## What this tool does
`index-digest` does the following:
* it checks the schema of all tables in a given database and suggests improvements (e.g. removal of redundant indices, adding a primary key to ease replication, dropping tables with just a single column or no rows)
* if provided with SQL queries log (via `--sql-log` option) it:
* checks if all tables, columns and indices are used by these queries
* reports text columns with character set different than `utf`
* reports queries that do not use indices
* reports queries that use filesort, temporary file or full table scan
* reports queries that are not quite kosher (e.g. `LIKE "%foo%"`, `INSERT IGNORE`, `SELECT *`, `HAVING` clause, high `OFFSET` in pagination queries)
* if run with `--analyze-data` switch it:
* reports tables with old data (by querying for `MIN()` value of time column) where data retency can be reviewed
* reports tables with not up-to-date data (by querying for `MAX()` value of time column)
* if run with `--check-empty-databases` switch it:
* report empty databases on the current MySQL server
This tool **supports MySQL 5.7, 8.0, 8.1, [Percona Server](https://www.percona.com/software/mysql-database/percona-server) 8.0 and MariaDB 10.1, 10.2, 10.5, 10.6** and runs under **Python 3.8+**.
Results can be reported in a human-readable form, as YAML or sent to syslog and later aggregated & processed using ELK stack.
## Requirements & install
### From `pypi`
```
pip install indexdigest
```
### From git
```
git clone git@github.com:macbre/index-digest.git && cd index-digest
sudo apt-get install libmysqlclient-dev python3-dev virtualenv
virtualenv -ppython3 env
source env/bin/activate
make install
```
When using MacOS, you should follow [this `mysql_config` installation steps](https://stackoverflow.com/a/25491082).
#### Running tests
**We assume that the test database is running locally on port 53306**. You can use the following to test your changes locally before pushing them (this one uses MySQL 8.0.20):
```
docker run --rm -p 53306:3306 --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3 -e "MYSQL_ALLOW_EMPTY_PASSWORD=yes" -e "MYSQL_DATABASE=index_digest" --name=index_digest_mysql mysql:8.0.22 "--default-authentication-plugin=mysql_native_password"
```
Wait until the server is up and running.
```
mysql --protocol=tcp --port=53306 -u root --password="" -v < setup.sql
./sql/populate.sh
mysql --protocol=tcp --port=53306 -uindex_digest -pqwerty index_digest -v -e '\s; SHOW TABLES; SHOW DATABASES;'
make test
```
### Using Docker
> See https://hub.docker.com/r/macbre/index-digest/
```
$ docker run --network=host -t macbre/index-digest:latest mysql://index_digest:qwerty@debian/index_digest | head -n 20
------------------------------------------------------------
Found 61 issue(s) to report for "index_digest" database
------------------------------------------------------------
MySQL v5.7.22 at debian
index-digest v1.2.0
------------------------------------------------------------
redundant_indices → table affected: 0004_id_foo
✗ "idx" index can be removed as redundant (covered by "PRIMARY")
- redundant: UNIQUE KEY idx (item_id, foo)
- covered_by: PRIMARY KEY (item_id, foo)
- schema: CREATE TABLE `0004_id_foo` (
`item_id` int(9) NOT NULL AUTO_INCREMENT,
`foo` varbinary(16) NOT NULL DEFAULT '',
PRIMARY KEY (`item_id`,`foo`),
UNIQUE KEY `idx` (`item_id`,`foo`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
- table_data_size_mb: 0.015625
- table_index_size_mb: 0.015625
...
```
## How to run it?
```
$ index_digest -h
index_digest
Analyses your database queries and schema and suggests indices improvements.
Usage:
index_digest DSN [--sql-log=<file>] [--format=<formatter>] [--analyze-data] [--checks=<checks> | --skip-checks=<skip-checks>] [--tables=<tables> | --skip-tables=<skip-tables>]
index_digest (-h | --help)
index_digest --version
Options:
DSN Data Source Name of database to check
--sql-log=<file> Text file with SQL queries to check against the database
--format=<formatter> Use a given results formatter (plain, syslog, yaml)
--analyze-data Run additional checks that will query table data (can be slow!)
--checks=<list> Comma-separated lists of checks to report
--skip-checks=<list> Comma-separated lists of checks to skip from report
--tables=<list> Comma-separated lists of tables to report
--skip-tables=<list> Comma-separated lists of tables to skip from report
-h --help Show this screen.
--version Show version.
Examples:
index_digest mysql://username:password@localhost/dbname
index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log=sql.log
index_digest mysql://index_digest:qwerty@localhost/index_digest --skip-checks=non_utf_columns
index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --checks=data_too_old,data_not_updated_recently
index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --skip-tables=DATABASECHANGELOG,DATABASECHANGELOGLOCK
Visit <https://github.com/macbre/index-digest>
```
## SQL query log
It's a text file with a single SQL query in each line (no line breaks are allowed). Lines that do start with `--` (SQL comment) are ignored. The file can be [generated using `query-digest` when `--sql-log` output mode is selected](https://github.com/macbre/query-digest#output-modes).
An example:
```sql
-- A comment
select * from 0002_not_used_indices order by id
select * from 0002_not_used_indices where foo = 'foo' and id = 2
select count(*) from 0002_not_used_indices where foo = 'foo'
/* foo bar */ select * from 0002_not_used_indices where bar = 'foo'
INSERT IGNORE INTO `0070_insert_ignore` VALUES ('123', 9, '2017-01-01');
```
### From [MySQL slow query log](https://dev.mysql.com/doc/refman/8.0/en/slow-query-log.html)
MySQL's slow query log needs to be pre-processed first (to remove comments and timestamps):
```
cat mysql-slow.log | egrep -v '^(SET timestamp|#|throttle: )' > queries.log
```
Then you can run `index_digest --sql-log=queries.log ...`.
## Formatters
`index-digest` can return results in various formats (use `--format` to choose one).
### plain
Emits human-readable report to a console. You can disable colored and bold text by setting env variable `ANSI_COLORS_DISABLED=1`.
### syslog
Pushes JSON-formatted messages via syslog, so they can be aggregated using ELK stack.
Use `SYSLOG_IDENT` env variable to customize syslog's `ident` messages are sent with (defaults to `index-digest`).
```
Dec 28 15:59:58 debian index-digest[17485]: {"meta": {"version": "index-digest v0.1.0", "database_name": "index_digest", "database_host": "debian", "database_version": "MySQL v5.7.20"}, "report": {"type": "redundant_indices", "table": "0004_id_foo", "message": "\"idx\" index can be removed as redundant (covered by \"PRIMARY\")", "context": {"redundant": "UNIQUE KEY idx (id, foo)", "covered_by": "PRIMARY KEY (id, foo)", "schema": "CREATE TABLE `0004_id_foo` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n `foo` varbinary(16) NOT NULL DEFAULT '',\n PRIMARY KEY (`id`,`foo`),\n UNIQUE KEY `idx` (`id`,`foo`)\n) ENGINE=InnoDB DEFAULT CHARSET=latin1", "table_data_size_mb": 0.015625, "table_index_size_mb": 0.015625}}}
```
### yaml
Outputs YML file with results and metadata.
## Checks
You can select which checks should be reported by the tool by using `--checks` command line option. Certain checks can also be skipped via `--skip-checks` option. Refer to `index_digest --help` for examples.
> **Number of checks**: 24
* `redundant_indices`: reports indices that are redundant and covered by other
* `non_utf_columns`: reports text columns that have characters encoding set to `latin1` (utf is the way to go)
* `missing_primary_index`: reports tables with no primary or unique key (see [MySQL bug #76252](https://bugs.mysql.com/bug.php?id=76252) and [Wikia/app#9863](https://github.com/Wikia/app/pull/9863)). [Primary keys can be enforced on MySQL config level](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_sql_require_primary_key) since 8.0.13 (via `sql_require_primary_key` variable).
* `test_tables`: reports tables that seem to be test leftovers (e.g. `some_guy_test_table`)
* `single_column`: reports tables with just a single column
* `empty_tables`: reports tables with no rows
* `generic_primary_key`: reports tables with [a primary key on `id` column](https://github.com/jarulraj/sqlcheck/blob/master/docs/logical/1004.md) (a more meaningful name should be used)
* `use_innodb`: reports table using storage engines different than `InnoDB` (a default for MySQL 5.5+ and MariaDB 10.2+)
* `low_cardinality_index`: reports [indices with low cardinality](https://github.com/macbre/index-digest/issues/31)
### Additional checks performed on SQL log
> You need to provide SQL log file via `--sql-log` option
* `not_used_columns`: checks which columns were not used by SELECT queries
* `not_used_indices`: checks which indices are not used by SELECT queries
* `not_used_tables`: checks which tables are not used by SELECT queries
* `queries_not_using_index`: reports SELECT queries that do not use any index
* `queries_using_filesort`: reports SELECT queries that require filesort ([a sort can’t be performed from an index and quicksort is used](https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/))
* `queries_using_temporary`: reports SELECT queries that require a temporary table to hold the result
* `queries_using_full_table_scan`: reports SELECT queries that require a [full table scan](https://dev.mysql.com/doc/refman/5.7/en/table-scan-avoidance.html)
* `selects_with_like`: reports SELECT queries that use `LIKE '%foo'` conditions (they can not use an index)
* `insert_ignore`: reports [queries using `INSERT IGNORE`](https://medium.com/legacy-systems-diary/things-to-avoid-episode-1-insert-ignore-535b4c24406b)
* `select_star`: reports [queries using `SELECT *`](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3001.md)
* `having_clause`: reports [queries using `HAVING` clause](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3012.md)
* `high_offset_selects`: report [SELECT queries using high OFFSET](https://www.percona.com/blog/2008/09/24/four-ways-to-optimize-paginated-displays/)
### Additional checks performed on tables data
> You need to use `--analyze-data` command line switch. Please note that these checks will query your tables. **These checks can take a while if queried columns are not indexed**.
* `data_too_old`: reports tables that have really old data, maybe it's worth checking if such long data retention is actually needed (**defaults to three months threshold**, can be customized via `INDEX_DIGEST_DATA_TOO_OLD_THRESHOLD_DAYS` env variable)
* `data_not_updated_recently`: reports tables that were not updated recently, check if it should be up-to-date (**defaults a month threshold**, can be customized via `INDEX_DIGEST_DATA_NOT_UPDATED_RECENTLY_THRESHOLD_DAYS` env variable)
### Additional checks performed across database on the current MySQL server
> You need to use `--check-empty-databases` command line switch.
* `empty_database`: reports databases that have no `BASE TABLE` tables (as provided by `information_schema.TABLES`)
## An example report
```sql
$ index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log sql/0002-not-used-indices-log
------------------------------------------------------------
Found 85 issue(s) to report for "index_digest" database
------------------------------------------------------------
MySQL v5.7.21 at debian
index-digest v1.0.0
------------------------------------------------------------
redundant_indices → table affected: 0004_id_foo
✗ "idx" index can be removed as redundant (covered by "PRIMARY")
- redundant: UNIQUE KEY idx (id, foo)
- covered_by: PRIMARY KEY (id, foo)
- schema: CREATE TABLE `0004_id_foo` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`foo` varbinary(16) NOT NULL DEFAULT '',
PRIMARY KEY (`id`,`foo`),
UNIQUE KEY `idx` (`id`,`foo`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
- table_data_size_mb: 0.015625
- table_index_size_mb: 0.015625
------------------------------------------------------------
redundant_indices → table affected: 0004_id_foo_bar
✗ "idx_foo" index can be removed as redundant (covered by "idx_foo_bar")
- redundant: KEY idx_foo (foo)
- covered_by: KEY idx_foo_bar (foo, bar)
- schema: CREATE TABLE `0004_id_foo_bar` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`foo` varbinary(16) NOT NULL DEFAULT '',
`bar` varbinary(16) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_foo` (`foo`),
KEY `idx_foo_bar` (`foo`,`bar`),
KEY `idx_id_foo` (`id`,`foo`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
- table_data_size_mb: 0.015625
- table_index_size_mb: 0.046875
------------------------------------------------------------
missing_primary_index → table affected: 0034_querycache
✗ "0034_querycache" table does not have any primary or unique index
- schema: CREATE TABLE `0034_querycache` (
`qc_type` varbinary(32) NOT NULL,
`qc_value` int(10) unsigned NOT NULL DEFAULT '0',
`qc_namespace` int(11) NOT NULL DEFAULT '0',
`qc_title` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '',
KEY `qc_type` (`qc_type`,`qc_value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
------------------------------------------------------------
test_tables → table affected: 0075_some_guy_test_table
✗ "0075_some_guy_test_table" seems to be a test table
- schema: CREATE TABLE `0075_some_guy_test_table` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
------------------------------------------------------------
single_column → table affected: 0074_bag_of_ints
✗ "0074_bag_of_ints" has just a single column
- schema: CREATE TABLE `0074_bag_of_ints` (
`id` int(9) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
------------------------------------------------------------
empty_tables → table affected: 0089_empty_table
✗ "0089_empty_table" table has no rows, is it really needed?
- schema: CREATE TABLE `0089_empty_table` (
`id` int(9) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
------------------------------------------------------------
generic_primary_key → table affected: 0094_generic_primary_key
✗ "0094_generic_primary_key" has a primary key called id, use a more meaningful name
- schema: CREATE TABLE `0094_generic_primary_key` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`foo` varchar(16) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
------------------------------------------------------------
use_innodb → table affected: 0036_use_innodb_myisam
✗ "0036_use_innodb_myisam" uses MyISAM storage engine
- schema: CREATE TABLE `0036_use_innodb_myisam` (
`item_id` int(9) NOT NULL AUTO_INCREMENT,
`foo` int(8) DEFAULT NULL,
PRIMARY KEY (`item_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
- engine: MyISAM
------------------------------------------------------------
not_used_indices → table affected: 0002_not_used_indices
✗ "test_id_idx" index was not used by provided queries
- not_used_index: KEY test_id_idx (test, id)
------------------------------------------------------------
not_used_tables → table affected: 0020_big_table
✗ "0020_big_table" table was not used by provided queries
- schema: CREATE TABLE `0020_big_table` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`val` int(9) NOT NULL,
`text` char(5) NOT NULL,
PRIMARY KEY (`id`),
KEY `text_idx` (`text`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8
- table_size_mb: 5.03125
- rows_estimated: 100405
------------------------------------------------------------
insert_ignore → table affected: 0070_insert_ignore
✗ "INSERT IGNORE INTO `0070_insert_ignore` VALUES (9,..." query uses a risky INSERT IGNORE
- query: INSERT IGNORE INTO `0070_insert_ignore` VALUES (9, '123', '2017-01-01');
- schema: CREATE TABLE `0070_insert_ignore` (
`id` int(9) NOT NULL,
`text` char(5) NOT NULL,
`time` datetime DEFAULT NULL,
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
------------------------------------------------------------
non_utf_columns → table affected: 0032_latin1_table
✗ "name" text column has "latin1" character set defined
- column: name
- column_character_set: latin1
- column_collation: latin1_swedish_ci
- schema: CREATE TABLE `0032_latin1_table` (
`item_id` int(9) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`utf8_column` varchar(255) CHARACTER SET utf8 COLLATE utf8_polish_ci NOT NULL,
`ucs2_column` varchar(255) CHARACTER SET ucs2 DEFAULT NULL,
`utf8mb4_column` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
`utf16_column` varchar(255) CHARACTER SET utf16 DEFAULT NULL,
`utf32_column` varchar(255) CHARACTER SET utf32 DEFAULT NULL,
`binary_column` varbinary(255) DEFAULT NULL,
`latin_blob` blob,
PRIMARY KEY (`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
------------------------------------------------------------
(...)
------------------------------------------------------------
queries_using_filesort → table affected: 0020_big_table
✗ "SELECT val, count(*) FROM 0020_big_table WHERE id ..." query used filesort
- query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val
- explain_extra: Using where; Using temporary; Using filesort
- explain_rows: 11
- explain_filtered: None
- explain_key: PRIMARY
------------------------------------------------------------
queries_using_temporary → table affected: 0020_big_table
✗ "SELECT val, count(*) FROM 0020_big_table WHERE id ..." query used temporary
- query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val
- explain_extra: Using where; Using temporary; Using filesort
- explain_rows: 11
- explain_filtered: None
- explain_key: PRIMARY
------------------------------------------------------------
queries_using_full_table_scan → table affected: 0020_big_table
✗ "SELECT * FROM 0020_big_table" query triggered full table scan
- query: SELECT * FROM 0020_big_table
- explain_rows: 9041
------------------------------------------------------------
selects_with_like → table affected: 0020_big_table
✗ "SELECT * FROM 0020_big_table WHERE text LIKE '%00'" query uses LIKE with left-most wildcard
- query: SELECT * FROM 0020_big_table WHERE text LIKE '%00'
- explain_extra: Using where
- explain_rows: 100623
------------------------------------------------------------
select_star → table affected: bar
✗ "SELECT t.* FROM bar AS t" query uses SELECT *
- query: SELECT t.* FROM bar AS t;
------------------------------------------------------------
having_clause → table affected: sales
✗ "SELECT s.cust_id,count(s.cust_id) FROM SH.sales s ..." query uses HAVING clause
- query: SELECT s.cust_id,count(s.cust_id) FROM SH.sales s GROUP BY s.cust_id HAVING s.cust_id != '1660' AND s.cust_id != '2'
(...)
------------------------------------------------------------
low_cardinality_index → table affected: 0020_big_table
✗ "num_idx" index on "num" column has low cardinality, check if it is needed
- column_name: num
- index_name: num_idx
- index_cardinality: 2
- schema: CREATE TABLE `0020_big_table` (
`item_id` int(9) NOT NULL AUTO_INCREMENT,
`val` int(9) NOT NULL,
`text` char(5) NOT NULL,
`num` int(3) NOT NULL,
PRIMARY KEY (`item_id`),
KEY `text_idx` (`text`),
KEY `num_idx` (`num`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8
- value_usage: 33.24788541334185
(...)
------------------------------------------------------------
data_too_old → table affected: 0028_data_too_old
✗ "0028_data_too_old" has rows added 184 days ago, consider changing retention policy
- diff_days: 184
- data_since: 2017-08-17 12:03:44
- data_until: 2018-02-17 12:03:44
- date_column_name: timestamp
- schema: CREATE TABLE `0028_data_too_old` (
`item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
`cnt` int(8) unsigned NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
- rows: 4
- table_size_mb: 0.015625
------------------------------------------------------------
data_not_updated_recently → table affected: 0028_data_not_updated_recently
✗ "0028_data_not_updated_recently" has the latest row added 40 days ago, consider checking if it should be up-to-date
- diff_days: 40
- data_since: 2017-12-29 12:03:44
- data_until: 2018-01-08 12:03:44
- date_column_name: timestamp
- schema: CREATE TABLE `0028_data_not_updated_recently` (
`item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
`cnt` int(8) unsigned NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
- rows: 3
- table_size_mb: 0.015625
------------------------------------------------------------
high_offset_selects → table affected: page
✗ "SELECT /* CategoryPaginationViewer::processSection..." query uses too high offset impacting the performance
- query: SELECT /* CategoryPaginationViewer::processSection */ page_namespace,page_title,page_len,page_is_redirect,cl_sortkey_prefix FROM `page` INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON ((cl_from = page_id)) WHERE cl_type = 'page' AND cl_to = 'Spotify/Song' ORDER BY cl_sortkey LIMIT 927600,200
- limit: 200
- offset: 927600
------------------------------------------------------------
empty_database → table affected: index_digest_empty
✗ "index_digest_empty" database has no tables
------------------------------------------------------------
Queries performed: 100
```
## Success stories
> Want to add your entry here? Submit a pull request
* By running `index-digest` at [Wikia](http://wikia.com) on shared database clusters (including tables storing ~450 mm of rows with 300+ GiB of data) we were able to [reclaim around 1.25 TiB of MySQL storage space across all replicas](https://medium.com/legacy-systems-diary/linting-your-database-schema-cd8947835a52).
## Read more
* [Percona Database Performance Blog](https://www.percona.com/blog/)
* [High Performance MySQL, 3rd Edition by Vadim Tkachenko, Peter Zaitsev, Baron Schwartz](https://www.safaribooksonline.com/library/view/high-performance-mysql/9781449332471/ch05.html)
* [Percona | Indexing 101: Optimizing MySQL queries on a single table](https://www.percona.com/blog/2015/04/27/indexing-101-optimizing-mysql-queries-on-a-single-table/)
* [Percona | `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html) / [find unused indexes](https://www.percona.com/blog/2012/06/30/find-unused-indexes/)
### Slides
* [Percona | MySQL Indexing: Best Practices](https://www.percona.com/files/presentations/WEBINAR-MySQL-Indexing-Best-Practices.pdf)
Raw data
{
"_id": null,
"home_page": "https://github.com/macbre/index-digest",
"name": "indexdigest",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "",
"author": "Maciej Brencz",
"author_email": "maciej.brencz@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f1/87/3466edcf6ceb82147a98978201b039d94838c524c7dff982fd8bf4fd5b3f/indexdigest-1.6.0.tar.gz",
"platform": null,
"description": "# index-digest\n\n[![PyPI](https://img.shields.io/pypi/v/indexdigest.svg)](https://pypi.python.org/pypi/indexdigest)\n[![Docker Hub](https://img.shields.io/docker/pulls/macbre/index-digest.svg)](https://hub.docker.com/r/macbre/index-digest/)\n[![Coverage Status](https://coveralls.io/repos/github/macbre/index-digest/badge.svg?branch=master)](https://coveralls.io/github/macbre/index-digest?branch=master)\n\nAnalyses your database queries and schema and suggests indices improvements. You can use `index-digest` as **your database linter**. The goal is to **provide the user with actionable reports** instead of just a list of statistics and schema details. Inspired by [Percona's `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html).\n\n**NEW** You can also [use `index-digest` as GitHub's Action](https://github.com/marketplace/actions/index-digest).\n\n## What this tool does\n\n`index-digest` does the following:\n\n* it checks the schema of all tables in a given database and suggests improvements (e.g. removal of redundant indices, adding a primary key to ease replication, dropping tables with just a single column or no rows)\n* if provided with SQL queries log (via `--sql-log` option) it:\n * checks if all tables, columns and indices are used by these queries\n * reports text columns with character set different than `utf`\n * reports queries that do not use indices\n * reports queries that use filesort, temporary file or full table scan\n * reports queries that are not quite kosher (e.g. `LIKE \"%foo%\"`, `INSERT IGNORE`, `SELECT *`, `HAVING` clause, high `OFFSET` in pagination queries)\n* if run with `--analyze-data` switch it:\n * reports tables with old data (by querying for `MIN()` value of time column) where data retency can be reviewed\n * reports tables with not up-to-date data (by querying for `MAX()` value of time column)\n* if run with `--check-empty-databases` switch it:\n * report empty databases on the current MySQL server\n\nThis tool **supports MySQL 5.7, 8.0, 8.1, [Percona Server](https://www.percona.com/software/mysql-database/percona-server) 8.0 and MariaDB 10.1, 10.2, 10.5, 10.6** and runs under **Python 3.8+**.\n\nResults can be reported in a human-readable form, as YAML or sent to syslog and later aggregated & processed using ELK stack.\n\n## Requirements & install\n\n### From `pypi`\n\n```\npip install indexdigest\n```\n\n### From git\n\n```\ngit clone git@github.com:macbre/index-digest.git && cd index-digest\nsudo apt-get install libmysqlclient-dev python3-dev virtualenv\n\nvirtualenv -ppython3 env\nsource env/bin/activate\nmake install\n```\n\nWhen using MacOS, you should follow [this `mysql_config` installation steps](https://stackoverflow.com/a/25491082).\n\n#### Running tests\n\n**We assume that the test database is running locally on port 53306**. You can use the following to test your changes locally before pushing them (this one uses MySQL 8.0.20):\n\n```\ndocker run --rm -p 53306:3306 --health-cmd=\"mysqladmin ping\" --health-interval=10s --health-timeout=5s --health-retries=3 -e \"MYSQL_ALLOW_EMPTY_PASSWORD=yes\" -e \"MYSQL_DATABASE=index_digest\" --name=index_digest_mysql mysql:8.0.22 \"--default-authentication-plugin=mysql_native_password\"\n```\n\nWait until the server is up and running.\n\n```\nmysql --protocol=tcp --port=53306 -u root --password=\"\" -v < setup.sql\n./sql/populate.sh\nmysql --protocol=tcp --port=53306 -uindex_digest -pqwerty index_digest -v -e '\\s; SHOW TABLES; SHOW DATABASES;'\n\nmake test\n```\n\n### Using Docker\n\n> See https://hub.docker.com/r/macbre/index-digest/\n\n```\n$ docker run --network=host -t macbre/index-digest:latest mysql://index_digest:qwerty@debian/index_digest | head -n 20\n------------------------------------------------------------\nFound 61 issue(s) to report for \"index_digest\" database\n------------------------------------------------------------\nMySQL v5.7.22 at debian\nindex-digest v1.2.0\n------------------------------------------------------------\nredundant_indices \u2192 table affected: 0004_id_foo\n\n\u2717 \"idx\" index can be removed as redundant (covered by \"PRIMARY\")\n\n - redundant: UNIQUE KEY idx (item_id, foo)\n - covered_by: PRIMARY KEY (item_id, foo)\n - schema: CREATE TABLE `0004_id_foo` (\n `item_id` int(9) NOT NULL AUTO_INCREMENT,\n `foo` varbinary(16) NOT NULL DEFAULT '',\n PRIMARY KEY (`item_id`,`foo`),\n UNIQUE KEY `idx` (`item_id`,`foo`)\n ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n - table_data_size_mb: 0.015625\n - table_index_size_mb: 0.015625\n...\n```\n\n## How to run it?\n\n```\n$ index_digest -h\nindex_digest\n\nAnalyses your database queries and schema and suggests indices improvements.\n\nUsage:\n index_digest DSN [--sql-log=<file>] [--format=<formatter>] [--analyze-data] [--checks=<checks> | --skip-checks=<skip-checks>] [--tables=<tables> | --skip-tables=<skip-tables>]\n index_digest (-h | --help)\n index_digest --version\n\nOptions:\n DSN Data Source Name of database to check\n --sql-log=<file> Text file with SQL queries to check against the database\n --format=<formatter> Use a given results formatter (plain, syslog, yaml)\n --analyze-data Run additional checks that will query table data (can be slow!)\n --checks=<list> Comma-separated lists of checks to report\n --skip-checks=<list> Comma-separated lists of checks to skip from report\n --tables=<list> Comma-separated lists of tables to report\n --skip-tables=<list> Comma-separated lists of tables to skip from report\n -h --help Show this screen.\n --version Show version.\n\nExamples:\n index_digest mysql://username:password@localhost/dbname\n index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log=sql.log\n index_digest mysql://index_digest:qwerty@localhost/index_digest --skip-checks=non_utf_columns\n index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --checks=data_too_old,data_not_updated_recently\n index_digest mysql://index_digest:qwerty@localhost/index_digest --analyze-data --skip-tables=DATABASECHANGELOG,DATABASECHANGELOGLOCK\n\nVisit <https://github.com/macbre/index-digest>\n```\n\n## SQL query log\n\nIt's a text file with a single SQL query in each line (no line breaks are allowed). Lines that do start with `--` (SQL comment) are ignored. The file can be [generated using `query-digest` when `--sql-log` output mode is selected](https://github.com/macbre/query-digest#output-modes).\n\nAn example:\n\n```sql\n-- A comment\nselect * from 0002_not_used_indices order by id\nselect * from 0002_not_used_indices where foo = 'foo' and id = 2\nselect count(*) from 0002_not_used_indices where foo = 'foo'\n/* foo bar */ select * from 0002_not_used_indices where bar = 'foo'\nINSERT IGNORE INTO `0070_insert_ignore` VALUES ('123', 9, '2017-01-01');\n```\n\n### From [MySQL slow query log](https://dev.mysql.com/doc/refman/8.0/en/slow-query-log.html)\n\nMySQL's slow query log needs to be pre-processed first (to remove comments and timestamps):\n\n```\ncat mysql-slow.log | egrep -v '^(SET timestamp|#|throttle: )' > queries.log\n```\n\nThen you can run `index_digest --sql-log=queries.log ...`.\n\n## Formatters\n\n`index-digest` can return results in various formats (use `--format` to choose one).\n\n### plain\n\nEmits human-readable report to a console. You can disable colored and bold text by setting env variable `ANSI_COLORS_DISABLED=1`.\n\n### syslog\n\nPushes JSON-formatted messages via syslog, so they can be aggregated using ELK stack.\nUse `SYSLOG_IDENT` env variable to customize syslog's `ident` messages are sent with (defaults to `index-digest`).\n\n```\nDec 28 15:59:58 debian index-digest[17485]: {\"meta\": {\"version\": \"index-digest v0.1.0\", \"database_name\": \"index_digest\", \"database_host\": \"debian\", \"database_version\": \"MySQL v5.7.20\"}, \"report\": {\"type\": \"redundant_indices\", \"table\": \"0004_id_foo\", \"message\": \"\\\"idx\\\" index can be removed as redundant (covered by \\\"PRIMARY\\\")\", \"context\": {\"redundant\": \"UNIQUE KEY idx (id, foo)\", \"covered_by\": \"PRIMARY KEY (id, foo)\", \"schema\": \"CREATE TABLE `0004_id_foo` (\\n `id` int(9) NOT NULL AUTO_INCREMENT,\\n `foo` varbinary(16) NOT NULL DEFAULT '',\\n PRIMARY KEY (`id`,`foo`),\\n UNIQUE KEY `idx` (`id`,`foo`)\\n) ENGINE=InnoDB DEFAULT CHARSET=latin1\", \"table_data_size_mb\": 0.015625, \"table_index_size_mb\": 0.015625}}}\n```\n\n### yaml\n\nOutputs YML file with results and metadata.\n\n## Checks\n\nYou can select which checks should be reported by the tool by using `--checks` command line option. Certain checks can also be skipped via `--skip-checks` option. Refer to `index_digest --help` for examples.\n\n> **Number of checks**: 24\n\n* `redundant_indices`: reports indices that are redundant and covered by other\n* `non_utf_columns`: reports text columns that have characters encoding set to `latin1` (utf is the way to go)\n* `missing_primary_index`: reports tables with no primary or unique key (see [MySQL bug #76252](https://bugs.mysql.com/bug.php?id=76252) and [Wikia/app#9863](https://github.com/Wikia/app/pull/9863)). [Primary keys can be enforced on MySQL config level](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_sql_require_primary_key) since 8.0.13 (via `sql_require_primary_key` variable).\n* `test_tables`: reports tables that seem to be test leftovers (e.g. `some_guy_test_table`)\n* `single_column`: reports tables with just a single column\n* `empty_tables`: reports tables with no rows\n* `generic_primary_key`: reports tables with [a primary key on `id` column](https://github.com/jarulraj/sqlcheck/blob/master/docs/logical/1004.md) (a more meaningful name should be used)\n* `use_innodb`: reports table using storage engines different than `InnoDB` (a default for MySQL 5.5+ and MariaDB 10.2+)\n* `low_cardinality_index`: reports [indices with low cardinality](https://github.com/macbre/index-digest/issues/31)\n\n### Additional checks performed on SQL log\n\n> You need to provide SQL log file via `--sql-log` option\n\n* `not_used_columns`: checks which columns were not used by SELECT queries\n* `not_used_indices`: checks which indices are not used by SELECT queries\n* `not_used_tables`: checks which tables are not used by SELECT queries\n* `queries_not_using_index`: reports SELECT queries that do not use any index\n* `queries_using_filesort`: reports SELECT queries that require filesort ([a sort can\u2019t be performed from an index and quicksort is used](https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/))\n* `queries_using_temporary`: reports SELECT queries that require a temporary table to hold the result\n* `queries_using_full_table_scan`: reports SELECT queries that require a [full table scan](https://dev.mysql.com/doc/refman/5.7/en/table-scan-avoidance.html)\n* `selects_with_like`: reports SELECT queries that use `LIKE '%foo'` conditions (they can not use an index)\n* `insert_ignore`: reports [queries using `INSERT IGNORE`](https://medium.com/legacy-systems-diary/things-to-avoid-episode-1-insert-ignore-535b4c24406b)\n* `select_star`: reports [queries using `SELECT *`](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3001.md)\n* `having_clause`: reports [queries using `HAVING` clause](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3012.md)\n* `high_offset_selects`: report [SELECT queries using high OFFSET](https://www.percona.com/blog/2008/09/24/four-ways-to-optimize-paginated-displays/)\n\n### Additional checks performed on tables data\n\n> You need to use `--analyze-data` command line switch. Please note that these checks will query your tables. **These checks can take a while if queried columns are not indexed**.\n\n* `data_too_old`: reports tables that have really old data, maybe it's worth checking if such long data retention is actually needed (**defaults to three months threshold**, can be customized via `INDEX_DIGEST_DATA_TOO_OLD_THRESHOLD_DAYS` env variable)\n* `data_not_updated_recently`: reports tables that were not updated recently, check if it should be up-to-date (**defaults a month threshold**, can be customized via `INDEX_DIGEST_DATA_NOT_UPDATED_RECENTLY_THRESHOLD_DAYS` env variable)\n\n### Additional checks performed across database on the current MySQL server\n\n> You need to use `--check-empty-databases` command line switch.\n\n* `empty_database`: reports databases that have no `BASE TABLE` tables (as provided by `information_schema.TABLES`)\n\n## An example report\n\n```sql\n$ index_digest mysql://index_digest:qwerty@localhost/index_digest --sql-log sql/0002-not-used-indices-log \n------------------------------------------------------------\nFound 85 issue(s) to report for \"index_digest\" database\n------------------------------------------------------------\nMySQL v5.7.21 at debian\nindex-digest v1.0.0\n------------------------------------------------------------\nredundant_indices \u2192 table affected: 0004_id_foo\n\n\u2717 \"idx\" index can be removed as redundant (covered by \"PRIMARY\")\n\n - redundant: UNIQUE KEY idx (id, foo)\n - covered_by: PRIMARY KEY (id, foo)\n - schema: CREATE TABLE `0004_id_foo` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n `foo` varbinary(16) NOT NULL DEFAULT '',\n PRIMARY KEY (`id`,`foo`),\n UNIQUE KEY `idx` (`id`,`foo`)\n ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n - table_data_size_mb: 0.015625\n - table_index_size_mb: 0.015625\n\n------------------------------------------------------------\nredundant_indices \u2192 table affected: 0004_id_foo_bar\n\n\u2717 \"idx_foo\" index can be removed as redundant (covered by \"idx_foo_bar\")\n\n - redundant: KEY idx_foo (foo)\n - covered_by: KEY idx_foo_bar (foo, bar)\n - schema: CREATE TABLE `0004_id_foo_bar` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n `foo` varbinary(16) NOT NULL DEFAULT '',\n `bar` varbinary(16) NOT NULL DEFAULT '',\n PRIMARY KEY (`id`),\n KEY `idx_foo` (`foo`),\n KEY `idx_foo_bar` (`foo`,`bar`),\n KEY `idx_id_foo` (`id`,`foo`)\n ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n - table_data_size_mb: 0.015625\n - table_index_size_mb: 0.046875\n\n------------------------------------------------------------\nmissing_primary_index \u2192 table affected: 0034_querycache\n\n\u2717 \"0034_querycache\" table does not have any primary or unique index\n\n - schema: CREATE TABLE `0034_querycache` (\n `qc_type` varbinary(32) NOT NULL,\n `qc_value` int(10) unsigned NOT NULL DEFAULT '0',\n `qc_namespace` int(11) NOT NULL DEFAULT '0',\n `qc_title` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '',\n KEY `qc_type` (`qc_type`,`qc_value`)\n ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\ntest_tables \u2192 table affected: 0075_some_guy_test_table\n\n\u2717 \"0075_some_guy_test_table\" seems to be a test table\n\n - schema: CREATE TABLE `0075_some_guy_test_table` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n `name` varchar(255) NOT NULL,\n PRIMARY KEY (`id`)\n ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\nsingle_column \u2192 table affected: 0074_bag_of_ints\n\n\u2717 \"0074_bag_of_ints\" has just a single column\n\n - schema: CREATE TABLE `0074_bag_of_ints` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n PRIMARY KEY (`id`)\n ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\nempty_tables \u2192 table affected: 0089_empty_table\n\n\u2717 \"0089_empty_table\" table has no rows, is it really needed?\n\n - schema: CREATE TABLE `0089_empty_table` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n PRIMARY KEY (`id`)\n ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n\n------------------------------------------------------------\ngeneric_primary_key \u2192 table affected: 0094_generic_primary_key\n\n\u2717 \"0094_generic_primary_key\" has a primary key called id, use a more meaningful name\n\n - schema: CREATE TABLE `0094_generic_primary_key` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n `foo` varchar(16) NOT NULL DEFAULT '',\n PRIMARY KEY (`id`)\n ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n\n------------------------------------------------------------\nuse_innodb \u2192 table affected: 0036_use_innodb_myisam\n\n\u2717 \"0036_use_innodb_myisam\" uses MyISAM storage engine\n\n - schema: CREATE TABLE `0036_use_innodb_myisam` (\n `item_id` int(9) NOT NULL AUTO_INCREMENT,\n `foo` int(8) DEFAULT NULL,\n PRIMARY KEY (`item_id`)\n ) ENGINE=MyISAM DEFAULT CHARSET=latin1\n - engine: MyISAM\n\n------------------------------------------------------------\nnot_used_indices \u2192 table affected: 0002_not_used_indices\n\n\u2717 \"test_id_idx\" index was not used by provided queries\n\n - not_used_index: KEY test_id_idx (test, id)\n\n------------------------------------------------------------\nnot_used_tables \u2192 table affected: 0020_big_table\n\n\u2717 \"0020_big_table\" table was not used by provided queries\n\n - schema: CREATE TABLE `0020_big_table` (\n `id` int(9) NOT NULL AUTO_INCREMENT,\n `val` int(9) NOT NULL,\n `text` char(5) NOT NULL,\n PRIMARY KEY (`id`),\n KEY `text_idx` (`text`)\n ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8\n - table_size_mb: 5.03125\n - rows_estimated: 100405\n\n------------------------------------------------------------\ninsert_ignore \u2192 table affected: 0070_insert_ignore\n\n\u2717 \"INSERT IGNORE INTO `0070_insert_ignore` VALUES (9,...\" query uses a risky INSERT IGNORE\n\n - query: INSERT IGNORE INTO `0070_insert_ignore` VALUES (9, '123', '2017-01-01');\n - schema: CREATE TABLE `0070_insert_ignore` (\n `id` int(9) NOT NULL,\n `text` char(5) NOT NULL,\n `time` datetime DEFAULT NULL,\n UNIQUE KEY `id` (`id`)\n ) ENGINE=InnoDB DEFAULT CHARSET=utf8\n\n------------------------------------------------------------\nnon_utf_columns \u2192 table affected: 0032_latin1_table\n\n\u2717 \"name\" text column has \"latin1\" character set defined\n\n - column: name\n - column_character_set: latin1\n - column_collation: latin1_swedish_ci\n - schema: CREATE TABLE `0032_latin1_table` (\n `item_id` int(9) NOT NULL AUTO_INCREMENT,\n `name` varchar(255) DEFAULT NULL,\n `utf8_column` varchar(255) CHARACTER SET utf8 COLLATE utf8_polish_ci NOT NULL,\n `ucs2_column` varchar(255) CHARACTER SET ucs2 DEFAULT NULL,\n `utf8mb4_column` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,\n `utf16_column` varchar(255) CHARACTER SET utf16 DEFAULT NULL,\n `utf32_column` varchar(255) CHARACTER SET utf32 DEFAULT NULL,\n `binary_column` varbinary(255) DEFAULT NULL,\n `latin_blob` blob,\n PRIMARY KEY (`item_id`)\n ) ENGINE=InnoDB DEFAULT CHARSET=latin1\n\n------------------------------------------------------------\n\n(...)\n\n------------------------------------------------------------\nqueries_using_filesort \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT val, count(*) FROM 0020_big_table WHERE id ...\" query used filesort\n\n - query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val\n - explain_extra: Using where; Using temporary; Using filesort\n - explain_rows: 11\n - explain_filtered: None\n - explain_key: PRIMARY\n\n------------------------------------------------------------\nqueries_using_temporary \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT val, count(*) FROM 0020_big_table WHERE id ...\" query used temporary\n\n - query: SELECT val, count(*) FROM 0020_big_table WHERE id BETWEEN 10 AND 20 GROUP BY val\n - explain_extra: Using where; Using temporary; Using filesort\n - explain_rows: 11\n - explain_filtered: None\n - explain_key: PRIMARY\n\n------------------------------------------------------------\nqueries_using_full_table_scan \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT * FROM 0020_big_table\" query triggered full table scan\n\n - query: SELECT * FROM 0020_big_table\n - explain_rows: 9041\n\n------------------------------------------------------------\nselects_with_like \u2192 table affected: 0020_big_table\n\n\u2717 \"SELECT * FROM 0020_big_table WHERE text LIKE '%00'\" query uses LIKE with left-most wildcard\n\n - query: SELECT * FROM 0020_big_table WHERE text LIKE '%00'\n - explain_extra: Using where\n - explain_rows: 100623\n\n------------------------------------------------------------\nselect_star \u2192 table affected: bar\n\n\u2717 \"SELECT t.* FROM bar AS t\" query uses SELECT *\n\n - query: SELECT t.* FROM bar AS t;\n\n------------------------------------------------------------\nhaving_clause \u2192 table affected: sales\n\n\u2717 \"SELECT s.cust_id,count(s.cust_id) FROM SH.sales s ...\" query uses HAVING clause\n\n - query: SELECT s.cust_id,count(s.cust_id) FROM SH.sales s GROUP BY s.cust_id HAVING s.cust_id != '1660' AND s.cust_id != '2'\n\n(...)\n\n------------------------------------------------------------\nlow_cardinality_index \u2192 table affected: 0020_big_table\n\n\u2717 \"num_idx\" index on \"num\" column has low cardinality, check if it is needed\n\n - column_name: num\n - index_name: num_idx\n - index_cardinality: 2\n - schema: CREATE TABLE `0020_big_table` (\n `item_id` int(9) NOT NULL AUTO_INCREMENT,\n `val` int(9) NOT NULL,\n `text` char(5) NOT NULL,\n `num` int(3) NOT NULL,\n PRIMARY KEY (`item_id`),\n KEY `text_idx` (`text`),\n KEY `num_idx` (`num`)\n ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8\n - value_usage: 33.24788541334185\n\n(...)\n\n------------------------------------------------------------\ndata_too_old \u2192 table affected: 0028_data_too_old\n\n\u2717 \"0028_data_too_old\" has rows added 184 days ago, consider changing retention policy\n\n - diff_days: 184\n - data_since: 2017-08-17 12:03:44\n - data_until: 2018-02-17 12:03:44\n - date_column_name: timestamp\n - schema: CREATE TABLE `0028_data_too_old` (\n `item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,\n `cnt` int(8) unsigned NOT NULL,\n `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n PRIMARY KEY (`item_id`)\n ) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1\n - rows: 4\n - table_size_mb: 0.015625\n\n------------------------------------------------------------\ndata_not_updated_recently \u2192 table affected: 0028_data_not_updated_recently\n\n\u2717 \"0028_data_not_updated_recently\" has the latest row added 40 days ago, consider checking if it should be up-to-date\n\n - diff_days: 40\n - data_since: 2017-12-29 12:03:44\n - data_until: 2018-01-08 12:03:44\n - date_column_name: timestamp\n - schema: CREATE TABLE `0028_data_not_updated_recently` (\n `item_id` int(8) unsigned NOT NULL AUTO_INCREMENT,\n `cnt` int(8) unsigned NOT NULL,\n `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n PRIMARY KEY (`item_id`)\n ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1\n - rows: 3\n - table_size_mb: 0.015625\n\n------------------------------------------------------------\nhigh_offset_selects \u2192 table affected: page\n\n\u2717 \"SELECT /* CategoryPaginationViewer::processSection...\" query uses too high offset impacting the performance\n\n - query: SELECT /* CategoryPaginationViewer::processSection */ page_namespace,page_title,page_len,page_is_redirect,cl_sortkey_prefix FROM `page` INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON ((cl_from = page_id)) WHERE cl_type = 'page' AND cl_to = 'Spotify/Song' ORDER BY cl_sortkey LIMIT 927600,200\n - limit: 200\n - offset: 927600\n\n------------------------------------------------------------\nempty_database \u2192 table affected: index_digest_empty\n\n\u2717 \"index_digest_empty\" database has no tables\n\n------------------------------------------------------------\nQueries performed: 100\n```\n\n## Success stories\n\n> Want to add your entry here? Submit a pull request\n\n* By running `index-digest` at [Wikia](http://wikia.com) on shared database clusters (including tables storing ~450 mm of rows with 300+ GiB of data) we were able to [reclaim around 1.25 TiB of MySQL storage space across all replicas](https://medium.com/legacy-systems-diary/linting-your-database-schema-cd8947835a52).\n\n## Read more\n\n* [Percona Database Performance Blog](https://www.percona.com/blog/)\n* [High Performance MySQL, 3rd Edition by Vadim Tkachenko, Peter Zaitsev, Baron Schwartz](https://www.safaribooksonline.com/library/view/high-performance-mysql/9781449332471/ch05.html)\n* [Percona | Indexing 101: Optimizing MySQL queries on a single table](https://www.percona.com/blog/2015/04/27/indexing-101-optimizing-mysql-queries-on-a-single-table/)\n* [Percona | `pt-index-usage`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-index-usage.html) / [find unused indexes](https://www.percona.com/blog/2012/06/30/find-unused-indexes/)\n\n### Slides\n\n* [Percona | MySQL Indexing: Best Practices](https://www.percona.com/files/presentations/WEBINAR-MySQL-Indexing-Best-Practices.pdf)\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Analyses your database queries and schema and suggests indices and schema improvements",
"version": "1.6.0",
"project_urls": {
"Homepage": "https://github.com/macbre/index-digest"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ca5e34f2a729bc714cba78258af872753257bc9a257c7142d254770e6f872f8a",
"md5": "6db4a0aa24955d31c33e399c05bd5366",
"sha256": "1a996512ae34ccc6d28f9d3e244982493191392090fb6d864330767e8dabc64d"
},
"downloads": -1,
"filename": "indexdigest-1.6.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6db4a0aa24955d31c33e399c05bd5366",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 73556,
"upload_time": "2023-09-13T11:27:15",
"upload_time_iso_8601": "2023-09-13T11:27:15.630058Z",
"url": "https://files.pythonhosted.org/packages/ca/5e/34f2a729bc714cba78258af872753257bc9a257c7142d254770e6f872f8a/indexdigest-1.6.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f1873466edcf6ceb82147a98978201b039d94838c524c7dff982fd8bf4fd5b3f",
"md5": "f58ece2292f116f474d9b3be867fac4a",
"sha256": "8817f2a0313d669161fb194cb30950ee46597e9358fe822d1a014ca31c45edfa"
},
"downloads": -1,
"filename": "indexdigest-1.6.0.tar.gz",
"has_sig": false,
"md5_digest": "f58ece2292f116f474d9b3be867fac4a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 36578,
"upload_time": "2023-09-13T11:27:16",
"upload_time_iso_8601": "2023-09-13T11:27:16.850799Z",
"url": "https://files.pythonhosted.org/packages/f1/87/3466edcf6ceb82147a98978201b039d94838c524c7dff982fd8bf4fd5b3f/indexdigest-1.6.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-13 11:27:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "macbre",
"github_project": "index-digest",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "indexdigest"
}