.. contents::

Compute APDEX from Apache-style logs.

Overview
========

Parses Apache-style logs and generates several statistics intended for a
website developer audience:

- APDEX (Application Performance inDEX, see http://www.apdex.org) ratio
  (plotted)

  Because you want to know how satisfied your users are.

- hit count (plotted)

  Because achieving 100% APDEX is easy when there is nobody around.

- HTTP status codes, with optional detailed output of the most frequent URLs
  per error status code, along with their most frequent referers

  Because you forgot to update a link to that conditionally-used browser
  compatibility javascript you renamed.

- Hottest pages (pages which use the most rendering time)

  Because you want to know where to invest time to get the highest user
  experience improvement.

- ERP5 sites: per-module statistics, with module and document views separated

  Because module and document types are not born equal in usage patterns.

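For reference, the APDEX ratio as defined at apdex.org counts hits at or below a threshold T as satisfied, hits up to 4T as tolerating (worth half a satisfied hit) and slower hits as frustrated. The minimal Python 3 sketch below computes that ratio; the function name and the 1 second threshold are illustrative assumptions, not apachedex's own code or defaults.

```python
def apdex(durations, threshold):
    """Return the APDEX ratio of a list of response times (in seconds).

    A hit is "satisfied" when it takes at most `threshold`, "tolerating"
    when it takes at most 4 * `threshold`, and "frustrated" otherwise.
    Satisfied hits count fully, tolerating hits count half.
    """
    if not durations:
        # With no hit at all, the ratio is undefined.
        raise ValueError("APDEX is undefined without any hit")
    satisfied = sum(1 for d in durations if d <= threshold)
    tolerating = sum(1 for d in durations if threshold < d <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(durations)


print(apdex([0.2, 0.8, 1.5, 3.9, 5.0], threshold=1.0))  # 0.6
```

With two satisfied, two tolerating and one frustrated hit, the ratio is (2 + 2/2) / 5 = 0.6.
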
Some parsing performance figures:

On a 2.3 GHz Core i5, apachedex achieves 97000 lines/s
(pypy-c-jit-62994-bd32583a3f11-linux64) and 43000 lines/s (CPython 2.7).
Those were measured on a 3000000-hit logfile, with 3 --skip-base, 1
--erp5-base, 3 --base and --default set. --\*base values were similar in
simplicity to the ones provided in examples below.

What APacheDEX is not
=====================

APacheDEX does not produce website audience statistics like AWStats, Google
Analytics (etc) do.

APacheDEX does not monitor website availability & resource usage like Zabbix,
Cacti, Ganglia, Nagios (etc) do.

Requirements
============

Dependencies
------------

As such, apachedex has no strict dependencies outside of a standard Python 2.7
installation.
However, the generated output needs a few javascript files which come from
other projects:

- jquery.js

- jquery.flot.js

- jquery.flot.time.js (official flot plugin)

- jquery.flot.axislabels.js (third-party flot plugin)

If you installed apachedex (using an egg or with a distribution's package) you
should have them already.
If you are running from the repository, you need to fetch them first::

  python setup.py deps

Input
-----

All default "combined" log format fields are supported (more can easily be
added), plus %D.

Mandatory fields are (in any order) `%t`, `%r` (for the request's URL), `%>s`,
`%{Referer}i`, `%D`. Just pass apachedex the format string from your Apache
log configuration (see `--logformat` argument documentation).

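To illustrate what the mandatory fields carry, here is a minimal Python 3 sketch that parses one line written with a hypothetical "combined plus %D" format. apachedex builds its parser from `--logformat` instead; the regex, the field names and the format string below are assumptions for this example only.

```python
import re
from datetime import datetime

# Hypothetical Apache configuration assumed by this sketch:
#   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<duration>\d+)$'
)


def parse_line(line):
    """Return a dict of the mandatory fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # %r is "METHOD URL PROTOCOL"; only the URL is kept here.
    parts = match.group('request').split()
    url = parts[1] if len(parts) == 3 else None
    return {
        # %t looks like "10/Oct/2000:13:55:36 -0700" once brackets are removed.
        'time': datetime.strptime(match.group('time'), '%d/%b/%Y:%H:%M:%S %z'),
        'url': url,
        'status': int(match.group('status')),  # %>s
        'referer': match.group('referer'),     # %{Referer}i
        # %D is the time taken to serve the request, in microseconds.
        'duration_s': int(match.group('duration')) / 1e6,
    }
```

Since %D is expressed in microseconds, the sketch divides it by one million to obtain seconds.
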
Input files may be provided uncompressed or compressed in:

- bzip2

- gzip

- xz

Input filename "-" is understood as stdin.

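A minimal Python 3 sketch of how such inputs can be opened transparently with the standard library, choosing the decompressor from the file name suffix. The suffix-based detection and the `open_log` name are assumptions of this sketch; apachedex may detect compression differently.

```python
import bz2
import gzip
import lzma
import sys

# File-name suffix -> function opening that compressed format.
OPENERS = {
    '.bz2': bz2.open,
    '.gz': gzip.open,
    '.xz': lzma.open,
}


def open_log(filename):
    """Open a possibly-compressed log file as text; "-" means stdin."""
    if filename == '-':
        return sys.stdin
    for suffix, opener in OPENERS.items():
        if filename.endswith(suffix):
            # 'rt' decompresses and decodes on the fly.
            return opener(filename, 'rt', encoding='utf-8', errors='replace')
    # No known suffix: treat the file as uncompressed.
    return open(filename, encoding='utf-8', errors='replace')
```
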
Output
------

The output is HTML + CSS + JS, so you need a web browser to read it.

Output filename "-" is understood as stdout.

Usage
=====

A few usage examples. See embedded help (`-h`/`--help`) for further options.

Most basic usage::

  apachedex --default website access.log

Generate stand-alone output (suitable for inclusion in a mail, for example)::

  apachedex --default website --js-embed access.log --out attachment.html

A log file with requests for 2 websites for which individual stats are
desired, and hits outside those base URLs are ignored::

  apachedex --base "/site1(/|$|\?)" "/site2(/|$|\?)"

A log file with a site section to ignore. Order does not matter::

  apachedex --skip-base "/ignored(/|$|\?)" --default website

A mix of both above examples. Order matters! ::

  apachedex --skip-base "/site1/ignored(/|$|\?)" \
    --base "/site1(/|$|\?)" "/site2(/|$|\?)"

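One way to read "order matters" is that patterns are tried in command-line order and the first match decides, while `--default` only catches what nothing else matched. The Python 3 sketch below models that assumption, with matching anchored at the start of the URL (`re.match`); the `RULES` list and the `classify` helper are hypothetical, not apachedex internals.

```python
import re

# Hypothetical model of the command line, in the order given:
# "skip" stands for --skip-base, "site" for --base.
RULES = [
    ('skip', re.compile(r'/site1/ignored(/|$|\?)')),
    ('site', re.compile(r'/site1(/|$|\?)')),
    ('site', re.compile(r'/site2(/|$|\?)')),
]


def classify(url, rules, default=None):
    """Return the site pattern claiming `url`, or None if the hit is ignored.

    Rules are tried in order and the first match wins, so a --skip-base
    nested inside a --base has to come first. Hits matching no rule go to
    the default site when one is configured, else they are ignored.
    """
    for action, regex in rules:
        if regex.match(url):
            return None if action == 'skip' else regex.pattern
    return default
```

Swapping the first two rules would let `/site1(/|$|\?)` claim `/site1/ignored/...` before the skip rule is ever tried, which is why the skip pattern is listed first in the example.
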
Matching non-ASCII URLs works by using urlencoded strings::

  apachedex --base "/%E6%96%87%E5%AD%97%E5%8C%96%E3%81%91(/|$|\\?)" access.log

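The encoded string above is the UTF-8, percent-encoded form of "/文字化け". Python's `urllib.parse.quote` produces it directly, which can help when writing such patterns (the variable names below are illustrative):

```python
from urllib.parse import quote

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character
# and, by default, leaves "/" untouched.
path = quote('/文字化け')
print(path)  # /%E6%96%87%E5%AD%97%E5%8C%96%E3%81%91

# The regex suffix used in the examples is then appended as-is.
pattern = path + r'(/|$|\?)'
print(pattern)
```
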
Naming websites so that the report looks less intimidating, by interleaving
"+"-prefixed titles with regexes (a title must come just before its regex)::

  apachedex --default "Public website" --base "+Back office" \
    "/backoffice(/|$|\\?)" "+User access" "/secure(/|$|\\?)" access.log

Saving the result of an analysis for faster reuse::

  apachedex --default foo --format json --out save_state.json --period day \
    access.log

Although not required, it is strongly advised to provide the `--period`
argument, as mixing states saved with different periods (fixed or
auto-detected from data) gives hard-to-read results and can cause problems if
loaded data gets converted to a larger period.

Continuing a saved analysis, updating collected data::

  apachedex --default foo --format json --state-file save_state.json \
    --out save_state.json --period day access.2.log

Generating HTML output from two state files, aggregating their content
without parsing more logs::

  apachedex --default foo --state-file save_state.json save_state.2.json \
    --period day --out index.html

Configuration files
===================

Providing a filename prefixed by "@" puts the content of that file in place of
that argument, recursively. Each file is loaded relative to the directory
containing the referencing file, or to the current working directory for the
command line.

- foo/dev.cfg::

    --error-detail
    @site.cfg
    --stats

- foo/site.cfg::

    --default Front-office
    # This is a comment
    --prefix "+Back office" "/back(/|$|\?)" # This is another comment
    --skip-prefix "/baz/ignored(/|$|\?)" --prefix +Something "/baz(/|$|\?)"

- command line::

    apachedex --skip-base "/ignored(/|$|\?)" @foo/dev.cfg --out index.html \
      access.log

This is equivalent to::

  apachedex --skip-base "/ignored(/|$|\?)" --error-detail \
    --default Front-office --prefix "+Back office" "/back(/|$|\?)" \
    --skip-prefix "/baz/ignored(/|$|\?)" --prefix +Something "/baz(/|$|\?)" \
    --stats --out index.html access.log

Portability note: the use of paths containing directory elements inside
configuration files is discouraged, as it is not portable. This may change
later (e.g. deciding that import paths are URLs and applying their rules).

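The "@" expansion described above can be sketched in Python 3 with `shlex`, which handles shell-style quoting and "#" comments. `expand_args` is a hypothetical helper and its cycle check is an addition of this sketch; apachedex's own implementation may differ in details.

```python
import os
import shlex


def expand_args(args, base_dir=None, _seen=frozenset()):
    """Replace each "@file" argument by the arguments found in that file.

    Files are read recursively. Relative paths are resolved against the
    directory of the file holding the reference, or against the current
    working directory for arguments given on the command line.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    result = []
    for arg in args:
        if not arg.startswith('@'):
            result.append(arg)
            continue
        path = os.path.normpath(os.path.join(base_dir, arg[1:]))
        if path in _seen:
            raise ValueError('recursive inclusion of %s' % path)
        with open(path) as config:
            # comments=True drops "#" comments; quotes group words as in a shell.
            nested = shlex.split(config.read(), comments=True)
        # Nested references are relative to this file's own directory.
        result.extend(expand_args(nested, os.path.dirname(path), _seen | {path}))
    return result
```

Applied to the `foo/dev.cfg` and `foo/site.cfg` files above, it yields the same argument list as the equivalent command line.
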
Periods
=======

When providing the `--period` argument, two related settings are affected:

- the period represented by each point in a graph (most important for the
  hit graph, as it represents the number of hits per such period)

- the period represented by each column in per-period tables (status codes
  per date, hits per day...)

Also, when `--period` is not provided, apachedex uses a threshold to tell
when to switch to the larger period. That threshold was chosen to correspond
to 200 graph points, which represents a varying number of table columns.

.. table:: Details of `--period` argument

  =========== ========== ========== ============== =========================
  --period    graph      table      to next period columns until next period
  =========== ========== ========== ============== =========================
  quarterhour minute     15 minutes 200 minutes    8 (3.3 hours)
  halfday     30 minutes 12 hours   100 hours      9 (4.1 days)
  day         hour       day        200 hours      9 (8.3 days)
  week        6 hours    week       1200 hours     8 (7.1 weeks)
  month       day        month      5000 hours     7 (~6.7 months)
  quarter     7 days     quarter    1400 days      16 (15.3 quarters)
  year        month      year       (n/a)          (infinity)
  =========== ========== ========== ============== =========================

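Assuming the switch depends only on the total analysed time range, picking a period from the "to next period" column could look like this Python 3 sketch. `auto_period` is hypothetical, and whether the boundary itself is inclusive is a guess:

```python
from datetime import timedelta

# (period, time range above which the next period is used), taken from the
# "to next period" column; "year" has no upper bound.
PERIOD_THRESHOLDS = [
    ('quarterhour', timedelta(minutes=200)),
    ('halfday', timedelta(hours=100)),
    ('day', timedelta(hours=200)),
    ('week', timedelta(hours=1200)),
    ('month', timedelta(hours=5000)),
    ('quarter', timedelta(days=1400)),
    ('year', None),
]


def auto_period(span):
    """Return the smallest period whose threshold `span` does not exceed."""
    for period, threshold in PERIOD_THRESHOLDS:
        if threshold is None or span <= threshold:
            return period


print(auto_period(timedelta(days=7)))  # day
```

Each threshold amounts to roughly 200 graph points, e.g. 200 hours of hourly points for `day`, or 1400 days of "7 days" points for `quarter`.
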
"7 days" period used in `--period quarter` are not weeks strictly
speaking: a week starts a monday/sunday, pendending on the locale.
"7 days" start on the first day of the year, for simplicity - and
performance. "week" used for `--period week` are really weeks, although
starting on monday independently from locale.
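A minimal Python 3 sketch contrasting the two bucketing schemes described above, using the standard `datetime` module; the helper names are illustrative and do not come from apachedex:

```python
from datetime import date, timedelta


def seven_day_bucket_start(day):
    """Start of the "7 days" period containing `day` (--period quarter).

    Buckets are counted from January 1st, whatever weekday it falls on.
    """
    jan1 = date(day.year, 1, 1)
    return jan1 + timedelta(days=(day - jan1).days // 7 * 7)


def week_start(day):
    """Start of the Monday-based week containing `day` (--period week)."""
    return day - timedelta(days=day.weekday())  # Monday is weekday 0


print(seven_day_bucket_start(date(2023, 1, 10)))  # 2023-01-08
print(week_start(date(2023, 1, 10)))              # 2023-01-09
```

January 1st, 2023 was a Sunday, so that year's "7 days" periods start on Sundays, while `week_start` always returns a Monday. The last bucket of a non-leap year starts on December 31st and lasts a single day.
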
When there are no hits for more than a graph period, placeholders are
generated at 0 hit value (which is the reality) and 100% apdex (this is
arbitrary). Those placeholders only affect graphs, and affect neither
averages nor table content.

Because not all graph periods are actually equal in length (because of
leap seconds, DST, leap years, years containing a non-integer number of
weeks), some hit graph points are artificially corrected against these
effects. Here too, the correction only affects graphs, not averages or
table content. For example, on non-leap years, the last "7 days" period of
the year lasts a single day. The plotted hit count is then multiplied by 7
(and by 3.5 on leap years).

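The arithmetic behind that example: 365 = 52 * 7 + 1 and 366 = 52 * 7 + 2, so the last "7 days" period of a year lasts 1 or 2 days. A Python 3 sketch of the resulting correction factor (the helper name is illustrative):

```python
import calendar


def last_bucket_correction(year):
    """Factor applied to the hit count of the year's last "7 days" period.

    The short last period is scaled up to what a full 7-day period would
    have received at the same hit rate.
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    last_bucket_days = days_in_year % 7  # 1 day, or 2 on leap years
    return 7 / last_bucket_days


print(last_bucket_correction(2023))  # 7.0
print(last_bucket_correction(2024))  # 3.5
```
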
Performance
===========

For better performance...

- pipe decompressed files to apachedex instead of having apachedex decompress
  files itself::

    bzcat access.log.bz2 | apachedex [...] -

- when letting apachedex decide statistic granularity with multiple log files,
  provide the earliest and latest log files first (in either order) so
  apachedex can adapt its data structure to the analysed time range before
  there is too much data::

    apachedex [...] access.log.1.gz access.log.99.gz access.log.2.gz \
      access.log.3.gz [...] access.log.98.gz

- parse log files in parallel processes, saving analysis output and aggregating
  them in the end::

    for LOG in access*.log; do
      apachedex "$@" --format json --out "$LOG.json" "$LOG" &
    done
    wait
    apachedex "$@" --out access.html --state-file access.*.json

  If you have bash and an xargs implementation supporting `-P`, you may
  want to use `parallel_parse.sh`, available in the source distribution or
  from the repository.

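The second and third tips can also be prepared from Python 3. This sketch assumes logrotate-style names (`access.log.N.gz`, a higher N meaning an older file) and only builds argument lists using the flags shown above; the helper names are hypothetical. Each per-file command could then be started with `subprocess.Popen`, waiting for all of them before running the aggregating one.

```python
import re


def rotation_index(filename):
    """Rotation number of e.g. "access.log.12.gz"; the live "access.log" is 0."""
    match = re.search(r'\.log(?:\.(\d+))?(?:\.gz)?$', filename)
    return int(match.group(1) or 0) if match else 0


def order_for_autodetect(filenames):
    """Put the newest and oldest files first, then the remaining ones."""
    ordered = sorted(filenames, key=rotation_index)
    if len(ordered) <= 2:
        return ordered
    return [ordered[0], ordered[-1]] + ordered[1:-1]


def parallel_commands(filenames, extra_args=()):
    """Per-file JSON runs, then the run aggregating their saved states."""
    per_file = [
        ['apachedex', *extra_args, '--format', 'json', '--out', name + '.json', name]
        for name in filenames
    ]
    aggregate = ['apachedex', *extra_args, '--out', 'access.html',
                 '--state-file', *(name + '.json' for name in filenames)]
    return per_file, aggregate
```
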
Notes
=====

Loading saved states generated with different sets of parameters is not
prevented, but can produce nonsense/unreadable results. Or it can save the day
if you do want to mix different parameters (e.g. you have some logs generated
with %T, others with %D).

It is unclear how the saved state format will evolve. Be prepared to
regenerate saved states when you upgrade APacheDEX.

Raw data
{
"_id": null,
"home_page": "https://lab.nexedi.com/nexedi/apachedex.git",
"name": "APacheDEX",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Vincent Pelletier",
"author_email": "vincent@nexedi.com",
"download_url": "https://files.pythonhosted.org/packages/1b/03/77152347c135893359b1ff28549b7cb0a7e27a384900c77497e749152edb/APacheDEX-2.1.tar.gz",
"platform": "any",
"description": ".. contents::\n\nCompute APDEX from Apache-style logs.\n\nOverview\n========\n\nParses Apache-style logs and generates several statistics intended for a\nwebsite developer audience:\n\n- APDEX (Application Performance inDEX, see http://www.apdex.org) ratio\n (plotted)\n\n Because you want to know how satisfied your users are.\n\n- hit count (plotted)\n\n Because achieving 100% APDEX is easy when there is nobody around.\n\n- HTTP status codes, with optional detailed output of the most frequent URLs\n per error status code, along with their most frequent referers\n\n Because your forgot to update a link to that conditionally-used browser\n compatibility javascript you renamed.\n\n- Hottest pages (pages which use rendering time the most)\n\n Because you want to know where to invest time to get highest user experience\n improvement.\n\n- ERP5 sites: per-module statistics, with module and document views separated\n\n Because module and document types are not born equal in usage patterns.\n\nSome parsing performance figures:\n\nOn a 2.3Ghz Corei5, apachedex achieves 97000 lines/s (\npypy-c-jit-62994-bd32583a3f11-linux64) and 43000 lines/s (CPython 2.7).\nThose were measures on a 3000000-hits logfile, with 3 --skip-base, 1\n--erp5-base, 3 --base and --default set. 
--\\*base values were similar in\nsimplicity to the ones provided in examples below.\n\nWhat APacheDEX is not\n=====================\n\nAPacheDEX does not produce website audience statistics like AWStats, Google\nAnalytics (etc) could do.\n\nAPacheDEX does not monitor website availability & resource usage like Zabbix,\nCacti, Ganglia, Nagios (etc) could do.\n\nRequirements\n============\n\nDependencies\n------------\n\nAs such, apachedex has no strict dependencies outside of standard python 2.7\ninstallation.\nBut generated output needs a few javascript files which come from other\nprojects:\n\n- jquery.js\n\n- jquery.flot.js\n\n- jquery.flot.time.js (official flot plugin)\n\n- jquery.flot.axislabels.js (third-party flot plugin)\n\nIf you installed apachedex (using an egg or with a distribution's package) you\nshould have them already.\nIf you are running from repository, you need to fetch them first::\n\n python setup.py deps\n\nInput\n-----\n\nAll default \"combined\" log format fields are supported (more can easily be\nadded), plus %D.\n\nMandatory fields are (in any order) `%t`, `%r` (for request's URL), `%>s`,\n`%{Referer}i`, `%D`. Just tell apachedex the value from your apache log\nconfiguration (see `--logformat` argument documentation).\n\nInput files may be provided uncompressed or compressed in:\n\n- bzip\n\n- gzip2\n\n- xz\n\nInput filename \"-\" is understood as stdin.\n\nOutput\n------\n\nThe output is HTML + CSS + JS, so you need a web browser to read it.\n\nOutput filename \"-\" is understood as stdout.\n\nUsage\n=====\n\nA few usage examples. 
See embedded help (`-h`/`--help`) for further options.\n\nMost basic usage::\n\n apachedex --default website access.log\n\nGenerate stand-alone output (suitable for inclusion in a mail, for example)::\n\n apachedex --default website --js-embed access.log --out attachment.html\n\nA log file with requests for 2 websites for which individual stats are\ndesired, and hits outside those base urls are ignored::\n\n apachedex --base \"/site1(/|$|\\?)\" \"/site2(/|$|\\?)\"\n\nA log file with a site section to ignore. Order does not matter::\n\n apachedex --skip-base \"/ignored(/|$|\\?)\" --default website\n\nA mix of both above examples. Order matters !::\n\n apachedex --skip-base \"/site1/ignored(/|$|\\?)\" \\\n --base \"/site1(/|$|\\?)\" \"/site2(/|$|\\?)\"\n\nMatching non-ASCII urls works by using urlencoded strings::\n\n apachedex --base \"/%E6%96%87%E5%AD%97%E5%8C%96%E3%81%91(/|$|\\\\?)\" access.log\n\nNaming websites so that report looks less intimidating, by interleaving\n\"+\"-prefixed titles with regexes (title must be just before regex)::\n\n apachedex --default \"Public website\" --base \"+Back office\" \\\n \"/backoffice(/|$|\\\\?)\" \"+User access\" \"/secure(/|$|\\\\?)\" access.log\n\nSaving the result of an analysis for faster reuse::\n\n apachedex --default foo --format json --out save_state.json --period day \\\n access.log\n\nAlthough not required, it is strongly advised to provide `--period` argument,\nas mixing states saved with different periods (fixed or auto-detected from\ndata) give hard-to-read results and can cause problems if loaded data gets\nconverted to a larger period.\n\nContinuing a saved analysis, updating collected data::\n\n apachedex --default foo --format json --state-file save_state.json \\\n --out save_state.json --period day access.2.log\n\nGenerating HTML output from two state files, aggregating their content\nwithout parsing more logs::\n\n apachedex --default foo --state-file save_state.json save_state.2.json \\\n --period day 
--out index.html\n\n\nConfiguration files\n===================\n\nProviding a filename prefixed by \"@\" puts the content of that file in place of\nthat argument, recursively. Each file is loaded relative to the containing\ndirectory of referencing file, or current working directory for command line.\n\n- foo/dev.cfg::\n\n --error-detail\n @site.cfg\n --stats\n\n- foo/site.cfg::\n\n --default Front-office\n # This is a comment\n --prefix \"+Back office\" \"/back(/|$|\\?)\" # This is another comment\n --skip-prefix \"/baz/ignored(/|$|\\?)\" --prefix +Something \"/baz(/|$|\\?)\"\n\n- command line::\n\n apachedex --skip-base \"/ignored(/|$|\\?)\" @foo/dev.cfg --out index.html \\\n access.log\n\nThis is equivalent to::\n\n apachedex --skip-base \"/ignored(/|$|\\?)\" --error-detail \\\n --default Front-office --prefix \"+Back office\" \"/back(/|$|\\?)\" \\\n --skip-prefix \"/baz/ignored(/|$|\\?)\" --prefix +Something \"/baz(/|$|\\?)\" \\\n --stats --out index.html access.log\n\nPortability note: the use of paths containing directory elements inside\nconfiguration files is discouraged, as it's not portable. This may change\nlater (ex: deciding that import paths are URLs and applying their rules).\n\nPeriods\n=======\n\nWhen providing the `--period` argument, two related settings are affected:\n\n- the period represented by each point in a graph (most important for the\n hit graph, as it represents the number of hits per such period)\n\n- the period represented by each column in per-period tables (status codes\n per date, hits per day...)\n\nAlso, when `--period` is not provided, apachedex uses a threshold to tell\nwhen to switch to the larger period. That period was chosen to correspond\nto 200 graph points, which represents a varying number of table columns.\n\n.. 
table :: Details of `--period` argument\n\n =========== ========== ========== ============== =========================\n --period graph table to next period columns until next period\n =========== ========== ========== ============== =========================\n quarterhour minute 15 minutes 200 minutes 8 (3.3 hours)\n halfday 30 minutes 12 hours 100 hours 9 (4.1 days)\n day hour day 200 hours 9 (8.3 days)\n week 6 hours week 1200 hours 8 (7.1 weeks)\n month day month 5000 hours 7 (~6.7 months)\n quarter 7 days quarter 1400 days 16 (15.3 weeks)\n year month year (n/a) (infinity)\n =========== ========== ========== ============== =========================\n\n\"7 days\" period used in `--period quarter` are not weeks strictly\nspeaking: a week starts a monday/sunday, pendending on the locale.\n\"7 days\" start on the first day of the year, for simplicity - and\nperformance. \"week\" used for `--period week` are really weeks, although\nstarting on monday independently from locale.\n\nWhen there are no hits for more than a graph period, placeholders are\ngenerated at 0 hit value (which is the reality) and 100% apdex (this is\narbitrary). Those placeholders only affect graphs, and do not affect\naverages nor table content.\n\nBecause not all graph periods are actually equal in length (because of\nleap seconds, DST, leap years, year containing a non-integer number of\nweeks), some hit graph points are artificially corrected against these\neffects. Here also, the correction only affects graphs, neither averages\nnor table content. For example, on non-leap years, the last year's\n\"7 days\" period lasts a single day. Ploted hit count is then multiplied\nby 7 (and 3.5 on leap years).\n\nPerformance\n===========\n\nFor better performance...\n\n- pipe decompressed files to apachedex instead of having apachedex decompress\n files itself::\n\n bzcat access.log.bz2 | apachedex [...] 
-\n\n- when letting apachedex decide statistic granularity with multiple log files,\n provide earliest and latest log files first (whatever order) so apachedex can\n adapt its data structure to analysed time range before there is too much\n data::\n\n apachedex [...] access.log.1.gz access.log.99.gz access.log.2.gz \\\n access.log.3.gz [...] access.98.gz\n\n- parse log files in parallel processes, saving analysis output and aggregating\n them in the end::\n\n for LOG in access*.log; do\n apachedex \"$@\" --format json --out \"$LOG.json\" \"$LOG\" &\n done\n wait\n apachedex \"$@\" --out access.html --state-file access.*.json\n\n If you have bash and have an xargs implementation supporting `-P`, you may\n want to use `parallel_parse.sh` available in source distribution or from\n repository.\n\nNotes\n=====\n\nLoading saved states generated with different sets of parameters is not\nprevented, but can produce nonsense/unreadable results. Or it can save the day\nif you do want to mix different parameters (ex: you have some logs generated\nwith %T, others with %D).\n\nIt is unclear how saved state format will evolve. Be prepared to have\nto regenerate saved states when you upgrade APacheDEX.\n",
"bugtrack_url": null,
"license": "GPL 2+",
"summary": "Compute APDEX from Apache-style logs.",
"version": "2.1",
"project_urls": {
"Homepage": "https://lab.nexedi.com/nexedi/apachedex.git"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1b0377152347c135893359b1ff28549b7cb0a7e27a384900c77497e749152edb",
"md5": "b30a689d33e02951cb74d6719fad8fb0",
"sha256": "4907dd5502c1c2b68edec27b9692f7ff995b86f904cbda23f08c4e252597dc73"
},
"downloads": -1,
"filename": "APacheDEX-2.1.tar.gz",
"has_sig": false,
"md5_digest": "b30a689d33e02951cb74d6719fad8fb0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 182419,
"upload_time": "2024-05-07T06:22:37",
"upload_time_iso_8601": "2024-05-07T06:22:37.415470Z",
"url": "https://files.pythonhosted.org/packages/1b/03/77152347c135893359b1ff28549b7cb0a7e27a384900c77497e749152edb/APacheDEX-2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-07 06:22:37",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "apachedex"
}