cryptonita

Name	cryptonita JSON
Version	0.6.3 JSON
	download
home_page	None
Summary	Cryptanalysis swiss army knife
upload_time	2024-04-25 14:54:56
maintainer	None
docs_url	None
author	Di Paola Martin
requires_python	>=3.3
license	GNU GPLv3
keywords	crypto cryptography crypto-analysis
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            ``cryptonita`` - Cryptanalysis Swiss Army Knife
===============================================

`cryptonita <https://pypi.org/project/cryptonita/>`__ is a set of
building blocks to create automated crypto-attacks.

You may not find the advanced attack implemented here (yet) but I hope
that this building blocks or primitives can help you in your journey.

Without more, let’s put our hands on and break the famous `Vigenere
cipher <https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher>`__.

Install
-------

::

   pip install cryptonita          # lite version
   pip install cryptonita[full]    # full version

Tutorial - Break a xor Vigenere cipher
--------------------------------------

The `Vigenere
cipher <https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher>`__ was once
the most secure cipher. It was thought that it was unbreakable…

Let’s put under test that statement and learn about
`cryptonita <https://pypi.org/project/cryptonita/>`__ along the journey!

   Note: the following README is also an automated test for the
   ``cryptonita`` lib thanks to
   `byexample <https://byexamples.github.io/byexample>`__.

Implement the cipher - Load the bytes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The building block in ``cryptonita`` is the *byte string*: a finite
immutable sequence of bytes.

In ``cryptonita`` we can create a *byte string* with the ``B`` function
and do any conversion needed:

.. code:: python

   >>> from cryptonita import B            # byexample: +timeout=10
   >>> B(u'from an unicode encoded text', encoding='utf-8')
   'from an unicode encoded text'

   >>> B([0x46, 0x72, 0x6f, 0x6d, 0x20, 0x6e, 0x75, 0x6d, 0x62, 0x65, 0x72, 0x73])
   'From numbers'

   >>> B('RnJvbSBiYXNlNjQ=', encoding=64)
   'From base64'

For our purposes of implementing a Vigenere cipher, let’s load some
plain text from a file:

.. code:: python

   >>> ptext = B(open('./test/ds/plaintext', 'rt').read())

   >>> ptext[:29]
   'Now that the party is jumping'

..

   For the full list of conversions see `cryptonita/conv.py’s
   ``as_bytes`` <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/conv.py>`__

Implement the cipher - Apply a xor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, we load our secret key in base 64 from the file. Notice how the
decoding from base 64 is made by ``B``:

.. code:: python

   >>> secret = B(open('./test/ds/secret', 'rt').read(), encoding=64)

The Vigenere cipher consists in xord the plaintext with the key. If the
plaintext is larger than the key, just repeat the key over and over.

``cryptonita`` can do exactly that:

.. code:: python

   >>> ctext = ptext ^ secret.inf()

   >>> ctext[:29].encode(64)
   b'OA4ZSRgEAAJBGgEJTBEXExoQTAUSVgsbBBwFDxE='

The ``inf()`` method tells that the ``secret`` string must be seen as an
“infinite sequence”, repeating the key over and over.

Then, the ``^`` just does the xor byte by byte.

   For the full list of operation on ``ImmutableByteString`` see
   `cryptonita/bytestrings.py’s
   ``ImmutableByteString`` <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/bytestrings.py>`__
   and the
   `mixins <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/mixins.py>`__

Breaking the cipher - Scoring the length of the key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Vigenere was thought to be unbreakable because a priori is not possible
to know the length of the key.

However this was proved to be false.

In 1863, `Kasiski <https://en.wikipedia.org/wiki/Kasiski_examination>`__
came with a cleaver method to know the length of the key but it is quite
hard to make it right and faster (I’m still `working on
it <https://book-of-gehn.github.io/articles/2020/10/11/Kasiski-Test-Part-I.html>`__)

Modern and better approaches are the `Hamming
distance <https://en.wikipedia.org/wiki/Hamming_distance>`__ and the
`Index of
Coincidence <https://book-of-gehn.github.io/articles/2019/10/04/Index-of-Coincidence.html>`__

The idea is to assume that the key is of length L and then pick every
Lth byte of the ciphertext:

.. code:: python

   >>> L = 8 # totally arbitrary here
   >>> picked = ctext[::L]

..

   Note how the ``ImmutableByteString`` ciphertext supports indexing
   operation like any Python string.

Now we compute the Index of Coincidence (IC) of this picked string.

If the assumed length L is **not** the correct one, every picked byte
will be the xor of the plaintext with a different key byte and the whole
``picked`` string would like **random** and the IC will be very low.

On the other hand, if we guessed correctly the length L, **all** the
picked bytes will be the xord of the plaintext and the **same** key byte
and therefore will not look random. A high IC would be expected!

.. code:: python

   >>> from cryptonita.metrics import icoincidences
   >>> icoincidences(picked)
   0.02<...>

..

   See
   `cryptonita/scoring.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/scoring/score_funcs.py>`__
   and
   `cryptonita/metrics.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/metrics/__init__.py>`__

   I you want to know more about the Index of Coincidence see this `blog
   post <https://book-of-gehn.github.io/articles/2019/10/04/Index-of-Coincidence.html>`__
   about it and this `comparison with other
   methods <https://book-of-gehn.github.io/articles/2018/04/01/A-string-of-coincidences-is-not-a-coincidence.html>`__

Breaking the cipher - Guessing the length of the key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A IC of 0.02 is too low. It seems that 8 is not the length of the key.

We could do a loop to try other lengths but ``cryptonita`` already has
that

.. code:: python

   >>> from cryptonita.scoring import scoring
   >>> from cryptonita.scoring import key_length_by_ic

   >>> gklength = scoring(
   ...                     ctext,
   ...                     space=range(5, 25),
   ...                     score_func=key_length_by_ic,
   ...                     min_score=0.025,
   ... )

Okay, what is that?

-  ``scoring`` does a brute force *attack* computing a *score function*
   testing every possible length from 5 to 25.
-  ``key_length_by_ic`` is a *scores* how good the tested length is. It
   puts a score between 0 (bad) and 1 (good) using the Index of
   Coincidence.

You may think that ``gklength`` is the **the** guessed key but in
cryptoanalysis you mostly never work with a *specific* value. You work
with a **set of possible values**.

.. code:: python

   >>> gklength
   {5: 0.02702702702702703,
    6: 0.027649769585253458,
    7: 0.04682040531097135,
    8: 0.02682701202590194,
    9: 0.025551684088269456,
    10: 0.025604551920341393,
    12: 0.038306451612903226,
    14: 0.03133903133903134,
    16: 0.028985507246376812,
    17: 0.02766798418972332,
    21: 0.032679738562091505,
    24: 0.041666666666666664}

In ``cryptonita`` we call these sets, these *guesses*, ``FuzzySet``.

   For more scoring functions see
   `cryptonita/scoring.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/scoring/score_funcs.py>`__

Breaking the cipher - A guess as a fuzzy set
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A guess or ``FuzzySet`` is a bunch of possible solutions, each with an
associated probability or score.

We can query then the most likely answer. In our case, the most likely
length of the key:

.. code:: python

   >>> gklength.most_likely()
   7

But the most likely may not necessary mean the correct answer. Instead,
you should work always with the fuzzy set to test all of them.

If the sets gets to large (and they will), you can cut them off,
dropping items with a probability lower than some threshold.

Here we say that any length with a lower probability of 0.01 should be
out:

.. code:: python

   >>> gklength.cut_off(0.03)
   >>> gklength
   {7 -> 0.0468, 24 -> 0.0417, 12 -> 0.0383, 21 -> 0.0327, 14 -> 0.0313}

..

   Take a look at the `documentation of
   ``FuzzySet`` <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/fuzzy_set.py>`__
   and optional a wiki about `fuzzy set
   theory <https://en.wikipedia.org/wiki/Fuzzy_set>`__.

Breaking the cipher - Chop the ciphertext into blocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now the we have a set of possible lengths, here is the plan to crack the
cipher:

First, split the ciphertext into *blocks* of guessed length L:

.. code:: python

   >>> L = gklength.most_likely()
   >>> cblocks = ctext.nblocks(L)

::

   ciphertext:  ABCDEFGHIJKLMN
                 |   |    |  |
                 |   |    \  \___
                 |   |     \     \
   cblocks      ABCD  EFGH  IJKL  MN

Each first byte of those blocks are supposedly the result of xor the
plaintext with the same key byte. The same goes for the second byte of
each block and so on.

Second, because it is easier to have all the first bytes in one block,
all the second bytes in another block and so on, we want to *transpose*
the blocks:

.. code:: python

   >>> from cryptonita.conv import transpose
   >>> cblocks = transpose(cblocks, allow_holes=True)

::

    cblocks   --> transposed cblocks
     ABCD           AEIM
     EFGH           BFJN
     IJKL           CGK
     MN             DHL

Now, each block (or row) is a piece of plaintext encrypted with the same
single-byte key.

Let’s break it!

Breaking the cipher - Frequency attack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We could test all the 256 possible byte keys by brute force but that’s
quite slow.

Rather we could do a *frequency attack* because the statistics of the
plaintext are leaked into the ciphertext.

``cryptonita`` already provides us with a very simple *model* of the
frequencies of the English plaintext: the famous *ETAOIN SHRDLU*.

.. code:: python

   >>> from cryptonita.scoring.freq import etaoin_shrdlu

If our ciphertext has the same distribution than the plaintext, at least
one of the most common bytes in the ciphertext should be one of the most
common bytes in the plaintext, encrypted of course.

Under this hypothesis ``freq_attack`` xor the top most common bytes in
the ciphertext with the most common bytes in plaintext according to the
model.

.. code:: python

   >>> most_common_pbytes = etaoin_shrdlu()
   >>> ntop_most_common_cbytes = 1

   >>> from cryptonita.attacks import freq_attack

   >>> freq_attack(cblocks[0], most_common_pbytes, ntop_most_common_cbytes)
   {'"': 0.07387790762504176,
    '$': 0.055504740275805896,
    '%': 0.0561520934139066,
    '2': 0.03178778752478832,
    '3': 0.10384587375686015,
    '5': 0.026296157563462763,
    '7': 0.07060615929878336,
    '8': 0.060837928943597436,
    '9': 0.0634364224946222,
    ':': 0.0342469273170487,
    '>': 0.03964865941609311,
    '?': 0.06072776315086166,
    'v': 0.17269159612928756}

In general, ``freq_attack`` cannot give us **the** byte key but it can
give use a *guess*: a fuzzy set of possible keys. This is a much shorted
list than 256!

But don’t claim victory yet. We broke only the first block
(``cblocks[0]``).

   More frequency models may be found at
   `cryptonita/scoring/freq.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/scoring/freq.py>`__

Breaking the cipher - Guess explosion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We need to call ``freq_attack`` for all the blocks:

.. code:: python

   >>> gbkeys = []
   >>> for c in cblocks:
   ...     gbkeys.append(freq_attack(c, most_common_pbytes, ntop_most_common_cbytes))

   >>> len(gbkeys)
   7

So we have 7 guesses (7 fuzzy sets), one guess set per byte of the key.

But the key is one of the *all possible combination of the guesses*.

How many possible keys do we have?

.. code:: python

   >>> from cryptonita.fuzzy_set import len_join_fuzzy_sets

   >>> len_join_fuzzy_sets(gbkeys)
   62748517

How! that’s a lot! But still **much less than** 256^7 which is greater
than the age of the `observable
universe <https://en.wikipedia.org/wiki/Observable_universe>`__ in
years.

Still, we need to shrink the guesses even further to make it manageable.

Breaking the cipher - Brute force refinement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``freq_attack`` is really powerful but it is not the only tool that we
have.

Not all the possible keys in a guess will produce *“reasonable”*
plaintext.

We can *score* a plaintext and filter out the ones that don’t look
*“good enough”*

``cryptonita`` implements different scoring functions and
``all_ascii_printable`` is the most simplest to understand:

Let’s *assume* that the plaintext is an English message encoded in
ASCII.

If we decipher one block and we got a plaintext with non-printable ASCII
char we can be sure that the key used is incorrect and we can score it
with a ``0``. Otherwise, we score it with ``1``.

.. code:: python

   >>> from cryptonita.scoring import all_ascii_printable

   >>> all_ascii_printable(B("a reasonable plaintext"))
   1

   >>> all_ascii_printable(B("n\0t v\4lid"))
   0

The plan is to try **all** the possible byte keys in **each** of our
guesses, score the results and drop the ones with lower score.

.. code:: python

   >>> from cryptonita.attacks import brute_force

   >>> for i, c in enumerate(cblocks):
   ...     # the fuzzy set of keys (a guess) for this ith byte
   ...     gbkey = gbkeys[i]
   ...
   ...     refined = brute_force(c,
   ...                     score_func=all_ascii_printable,
   ...                     key_space=gbkey,
   ...                     min_score=0.01
   ...                 )
   ...
   ...     # "refined" is another fuzzy set (a guess) for the ith byte
   ...     # but probably a much smaller one
   ...     gbkeys[i] = refined

Like ``guess_key_length``, ``brute_force`` receives a score function, a
key space and a minimum score.

Now we have a much smaller search space to work on:

.. code:: python

   >>> len_join_fuzzy_sets(gbkeys)
   260

   >>> 260 / 62748517
   4.14<...>e-06

While still we have a lot of possible keys, the refinement did an
amazing job and the new set is **6 orders of magnitud smaller** than the
original!

We can compute the set of possible keys doing a join and we can even
further reduce the set keeping only the most likely keys:

.. code:: python

   >>> from cryptonita.fuzzy_set import join_fuzzy_sets
   >>> gkstream = join_fuzzy_sets(gbkeys, cut_off=1024, j=B(''))

``gkstream`` is our guess for the complete key stream for the cipher.

Is this right?

Breaking the cipher - Break the cipher!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

   >>> kstream = gkstream.most_likely()

   >>> print((ctext ^ kstream.inf()).decode('ascii'))
   Now that the party is jumping
   With the bass kicked in and the Vega's are pumpin
   Quick to the point, to the point, no faking
   Cooking MC's like a pound of bacon
   Burning 'em, if you ain't quick and nimble
   I go crazy when I hear a cymbal
   And a high hat with a souped up tempo
   I'm on a roll, it's time to go solo
   ollin' in my five point oh
   ith my rag-top down so my hair can blow


   >>> kstream.encode(64)
   b'dmFuaWxsYQ=='

Final thoughts
~~~~~~~~~~~~~~

Vigenere or a repeating key cipher is a well known poor cipher shown in
every single cryptography course.

But little is explained in how to break it in an *automated* fashion.

`cryptonita <https://pypi.org/project/cryptonita/>`__ is not magical and
a little of brain is required from you, but it is a quite useful *Swiss
army knife for breaking crypto*.

PRs or comments are welcome.

Tested with `byexample <https://byexamples.github.io/byexample>`__.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cryptonita",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.3",
    "maintainer_email": null,
    "keywords": "crypto cryptography crypto-analysis",
    "author": "Di Paola Martin",
    "author_email": "use-github-issues@example.com",
    "download_url": "https://files.pythonhosted.org/packages/09/c9/8c24ebe4d2ab1ec342eff94efca58858d36357de42fb0fd2f0f84edc4622/cryptonita-0.6.3.tar.gz",
    "platform": null,
    "description": "``cryptonita`` - Cryptanalysis Swiss Army Knife\n===============================================\n\n`cryptonita <https://pypi.org/project/cryptonita/>`__ is a set of\nbuilding blocks to create automated crypto-attacks.\n\nYou may not find the advanced attack implemented here (yet) but I hope\nthat this building blocks or primitives can help you in your journey.\n\nWithout more, let\u2019s put our hands on and break the famous `Vigenere\ncipher <https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher>`__.\n\nInstall\n-------\n\n::\n\n   pip install cryptonita          # lite version\n   pip install cryptonita[full]    # full version\n\nTutorial - Break a xor Vigenere cipher\n--------------------------------------\n\nThe `Vigenere\ncipher <https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher>`__ was once\nthe most secure cipher. It was thought that it was unbreakable\u2026\n\nLet\u2019s put under test that statement and learn about\n`cryptonita <https://pypi.org/project/cryptonita/>`__ along the journey!\n\n   Note: the following README is also an automated test for the\n   ``cryptonita`` lib thanks to\n   `byexample <https://byexamples.github.io/byexample>`__.\n\nImplement the cipher - Load the bytes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe building block in ``cryptonita`` is the *byte string*: a finite\nimmutable sequence of bytes.\n\nIn ``cryptonita`` we can create a *byte string* with the ``B`` function\nand do any conversion needed:\n\n.. code:: python\n\n   >>> from cryptonita import B            # byexample: +timeout=10\n   >>> B(u'from an unicode encoded text', encoding='utf-8')\n   'from an unicode encoded text'\n\n   >>> B([0x46, 0x72, 0x6f, 0x6d, 0x20, 0x6e, 0x75, 0x6d, 0x62, 0x65, 0x72, 0x73])\n   'From numbers'\n\n   >>> B('RnJvbSBiYXNlNjQ=', encoding=64)\n   'From base64'\n\nFor our purposes of implementing a Vigenere cipher, let\u2019s load some\nplain text from a file:\n\n.. code:: python\n\n   >>> ptext = B(open('./test/ds/plaintext', 'rt').read())\n\n   >>> ptext[:29]\n   'Now that the party is jumping'\n\n..\n\n   For the full list of conversions see `cryptonita/conv.py\u2019s\n   ``as_bytes`` <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/conv.py>`__\n\nImplement the cipher - Apply a xor\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFirst, we load our secret key in base 64 from the file. Notice how the\ndecoding from base 64 is made by ``B``:\n\n.. code:: python\n\n   >>> secret = B(open('./test/ds/secret', 'rt').read(), encoding=64)\n\nThe Vigenere cipher consists in xord the plaintext with the key. If the\nplaintext is larger than the key, just repeat the key over and over.\n\n``cryptonita`` can do exactly that:\n\n.. code:: python\n\n   >>> ctext = ptext ^ secret.inf()\n\n   >>> ctext[:29].encode(64)\n   b'OA4ZSRgEAAJBGgEJTBEXExoQTAUSVgsbBBwFDxE='\n\nThe ``inf()`` method tells that the ``secret`` string must be seen as an\n\u201cinfinite sequence\u201d, repeating the key over and over.\n\nThen, the ``^`` just does the xor byte by byte.\n\n   For the full list of operation on ``ImmutableByteString`` see\n   `cryptonita/bytestrings.py\u2019s\n   ``ImmutableByteString`` <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/bytestrings.py>`__\n   and the\n   `mixins <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/mixins.py>`__\n\nBreaking the cipher - Scoring the length of the key\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nVigenere was thought to be unbreakable because a priori is not possible\nto know the length of the key.\n\nHowever this was proved to be false.\n\nIn 1863, `Kasiski <https://en.wikipedia.org/wiki/Kasiski_examination>`__\ncame with a cleaver method to know the length of the key but it is quite\nhard to make it right and faster (I\u2019m still `working on\nit <https://book-of-gehn.github.io/articles/2020/10/11/Kasiski-Test-Part-I.html>`__)\n\nModern and better approaches are the `Hamming\ndistance <https://en.wikipedia.org/wiki/Hamming_distance>`__ and the\n`Index of\nCoincidence <https://book-of-gehn.github.io/articles/2019/10/04/Index-of-Coincidence.html>`__\n\nThe idea is to assume that the key is of length L and then pick every\nLth byte of the ciphertext:\n\n.. code:: python\n\n   >>> L = 8 # totally arbitrary here\n   >>> picked = ctext[::L]\n\n..\n\n   Note how the ``ImmutableByteString`` ciphertext supports indexing\n   operation like any Python string.\n\nNow we compute the Index of Coincidence (IC) of this picked string.\n\nIf the assumed length L is **not** the correct one, every picked byte\nwill be the xor of the plaintext with a different key byte and the whole\n``picked`` string would like **random** and the IC will be very low.\n\nOn the other hand, if we guessed correctly the length L, **all** the\npicked bytes will be the xord of the plaintext and the **same** key byte\nand therefore will not look random. A high IC would be expected!\n\n.. code:: python\n\n   >>> from cryptonita.metrics import icoincidences\n   >>> icoincidences(picked)\n   0.02<...>\n\n..\n\n   See\n   `cryptonita/scoring.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/scoring/score_funcs.py>`__\n   and\n   `cryptonita/metrics.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/metrics/__init__.py>`__\n\n   I you want to know more about the Index of Coincidence see this `blog\n   post <https://book-of-gehn.github.io/articles/2019/10/04/Index-of-Coincidence.html>`__\n   about it and this `comparison with other\n   methods <https://book-of-gehn.github.io/articles/2018/04/01/A-string-of-coincidences-is-not-a-coincidence.html>`__\n\nBreaking the cipher - Guessing the length of the key\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA IC of 0.02 is too low. It seems that 8 is not the length of the key.\n\nWe could do a loop to try other lengths but ``cryptonita`` already has\nthat\n\n.. code:: python\n\n   >>> from cryptonita.scoring import scoring\n   >>> from cryptonita.scoring import key_length_by_ic\n\n   >>> gklength = scoring(\n   ...                     ctext,\n   ...                     space=range(5, 25),\n   ...                     score_func=key_length_by_ic,\n   ...                     min_score=0.025,\n   ... )\n\nOkay, what is that?\n\n-  ``scoring`` does a brute force *attack* computing a *score function*\n   testing every possible length from 5 to 25.\n-  ``key_length_by_ic`` is a *scores* how good the tested length is. It\n   puts a score between 0 (bad) and 1 (good) using the Index of\n   Coincidence.\n\nYou may think that ``gklength`` is the **the** guessed key but in\ncryptoanalysis you mostly never work with a *specific* value. You work\nwith a **set of possible values**.\n\n.. code:: python\n\n   >>> gklength\n   {5: 0.02702702702702703,\n    6: 0.027649769585253458,\n    7: 0.04682040531097135,\n    8: 0.02682701202590194,\n    9: 0.025551684088269456,\n    10: 0.025604551920341393,\n    12: 0.038306451612903226,\n    14: 0.03133903133903134,\n    16: 0.028985507246376812,\n    17: 0.02766798418972332,\n    21: 0.032679738562091505,\n    24: 0.041666666666666664}\n\nIn ``cryptonita`` we call these sets, these *guesses*, ``FuzzySet``.\n\n   For more scoring functions see\n   `cryptonita/scoring.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/scoring/score_funcs.py>`__\n\nBreaking the cipher - A guess as a fuzzy set\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA guess or ``FuzzySet`` is a bunch of possible solutions, each with an\nassociated probability or score.\n\nWe can query then the most likely answer. In our case, the most likely\nlength of the key:\n\n.. code:: python\n\n   >>> gklength.most_likely()\n   7\n\nBut the most likely may not necessary mean the correct answer. Instead,\nyou should work always with the fuzzy set to test all of them.\n\nIf the sets gets to large (and they will), you can cut them off,\ndropping items with a probability lower than some threshold.\n\nHere we say that any length with a lower probability of 0.01 should be\nout:\n\n.. code:: python\n\n   >>> gklength.cut_off(0.03)\n   >>> gklength\n   {7 -> 0.0468, 24 -> 0.0417, 12 -> 0.0383, 21 -> 0.0327, 14 -> 0.0313}\n\n..\n\n   Take a look at the `documentation of\n   ``FuzzySet`` <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/fuzzy_set.py>`__\n   and optional a wiki about `fuzzy set\n   theory <https://en.wikipedia.org/wiki/Fuzzy_set>`__.\n\nBreaking the cipher - Chop the ciphertext into blocks\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNow the we have a set of possible lengths, here is the plan to crack the\ncipher:\n\nFirst, split the ciphertext into *blocks* of guessed length L:\n\n.. code:: python\n\n   >>> L = gklength.most_likely()\n   >>> cblocks = ctext.nblocks(L)\n\n::\n\n   ciphertext:  ABCDEFGHIJKLMN\n                 |   |    |  |\n                 |   |    \\  \\___\n                 |   |     \\     \\\n   cblocks      ABCD  EFGH  IJKL  MN\n\nEach first byte of those blocks are supposedly the result of xor the\nplaintext with the same key byte. The same goes for the second byte of\neach block and so on.\n\nSecond, because it is easier to have all the first bytes in one block,\nall the second bytes in another block and so on, we want to *transpose*\nthe blocks:\n\n.. code:: python\n\n   >>> from cryptonita.conv import transpose\n   >>> cblocks = transpose(cblocks, allow_holes=True)\n\n::\n\n    cblocks   --> transposed cblocks\n     ABCD           AEIM\n     EFGH           BFJN\n     IJKL           CGK\n     MN             DHL\n\nNow, each block (or row) is a piece of plaintext encrypted with the same\nsingle-byte key.\n\nLet\u2019s break it!\n\nBreaking the cipher - Frequency attack\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe could test all the 256 possible byte keys by brute force but that\u2019s\nquite slow.\n\nRather we could do a *frequency attack* because the statistics of the\nplaintext are leaked into the ciphertext.\n\n``cryptonita`` already provides us with a very simple *model* of the\nfrequencies of the English plaintext: the famous *ETAOIN SHRDLU*.\n\n.. code:: python\n\n   >>> from cryptonita.scoring.freq import etaoin_shrdlu\n\nIf our ciphertext has the same distribution than the plaintext, at least\none of the most common bytes in the ciphertext should be one of the most\ncommon bytes in the plaintext, encrypted of course.\n\nUnder this hypothesis ``freq_attack`` xor the top most common bytes in\nthe ciphertext with the most common bytes in plaintext according to the\nmodel.\n\n.. code:: python\n\n   >>> most_common_pbytes = etaoin_shrdlu()\n   >>> ntop_most_common_cbytes = 1\n\n   >>> from cryptonita.attacks import freq_attack\n\n   >>> freq_attack(cblocks[0], most_common_pbytes, ntop_most_common_cbytes)\n   {'\"': 0.07387790762504176,\n    '$': 0.055504740275805896,\n    '%': 0.0561520934139066,\n    '2': 0.03178778752478832,\n    '3': 0.10384587375686015,\n    '5': 0.026296157563462763,\n    '7': 0.07060615929878336,\n    '8': 0.060837928943597436,\n    '9': 0.0634364224946222,\n    ':': 0.0342469273170487,\n    '>': 0.03964865941609311,\n    '?': 0.06072776315086166,\n    'v': 0.17269159612928756}\n\nIn general, ``freq_attack`` cannot give us **the** byte key but it can\ngive use a *guess*: a fuzzy set of possible keys. This is a much shorted\nlist than 256!\n\nBut don\u2019t claim victory yet. We broke only the first block\n(``cblocks[0]``).\n\n   More frequency models may be found at\n   `cryptonita/scoring/freq.py <https://github.com/cryptonitas/cryptonita/tree/master/cryptonita/scoring/freq.py>`__\n\nBreaking the cipher - Guess explosion\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe need to call ``freq_attack`` for all the blocks:\n\n.. code:: python\n\n   >>> gbkeys = []\n   >>> for c in cblocks:\n   ...     gbkeys.append(freq_attack(c, most_common_pbytes, ntop_most_common_cbytes))\n\n   >>> len(gbkeys)\n   7\n\nSo we have 7 guesses (7 fuzzy sets), one guess set per byte of the key.\n\nBut the key is one of the *all possible combination of the guesses*.\n\nHow many possible keys do we have?\n\n.. code:: python\n\n   >>> from cryptonita.fuzzy_set import len_join_fuzzy_sets\n\n   >>> len_join_fuzzy_sets(gbkeys)\n   62748517\n\nHow! that\u2019s a lot! But still **much less than** 256^7 which is greater\nthan the age of the `observable\nuniverse <https://en.wikipedia.org/wiki/Observable_universe>`__ in\nyears.\n\nStill, we need to shrink the guesses even further to make it manageable.\n\nBreaking the cipher - Brute force refinement\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``freq_attack`` is really powerful but it is not the only tool that we\nhave.\n\nNot all the possible keys in a guess will produce *\u201creasonable\u201d*\nplaintext.\n\nWe can *score* a plaintext and filter out the ones that don\u2019t look\n*\u201cgood enough\u201d*\n\n``cryptonita`` implements different scoring functions and\n``all_ascii_printable`` is the most simplest to understand:\n\nLet\u2019s *assume* that the plaintext is an English message encoded in\nASCII.\n\nIf we decipher one block and we got a plaintext with non-printable ASCII\nchar we can be sure that the key used is incorrect and we can score it\nwith a ``0``. Otherwise, we score it with ``1``.\n\n.. code:: python\n\n   >>> from cryptonita.scoring import all_ascii_printable\n\n   >>> all_ascii_printable(B(\"a reasonable plaintext\"))\n   1\n\n   >>> all_ascii_printable(B(\"n\\0t v\\4lid\"))\n   0\n\nThe plan is to try **all** the possible byte keys in **each** of our\nguesses, score the results and drop the ones with lower score.\n\n.. code:: python\n\n   >>> from cryptonita.attacks import brute_force\n\n   >>> for i, c in enumerate(cblocks):\n   ...     # the fuzzy set of keys (a guess) for this ith byte\n   ...     gbkey = gbkeys[i]\n   ...\n   ...     refined = brute_force(c,\n   ...                     score_func=all_ascii_printable,\n   ...                     key_space=gbkey,\n   ...                     min_score=0.01\n   ...                 )\n   ...\n   ...     # \"refined\" is another fuzzy set (a guess) for the ith byte\n   ...     # but probably a much smaller one\n   ...     gbkeys[i] = refined\n\nLike ``guess_key_length``, ``brute_force`` receives a score function, a\nkey space and a minimum score.\n\nNow we have a much smaller search space to work on:\n\n.. code:: python\n\n   >>> len_join_fuzzy_sets(gbkeys)\n   260\n\n   >>> 260 / 62748517\n   4.14<...>e-06\n\nWhile still we have a lot of possible keys, the refinement did an\namazing job and the new set is **6 orders of magnitud smaller** than the\noriginal!\n\nWe can compute the set of possible keys doing a join and we can even\nfurther reduce the set keeping only the most likely keys:\n\n.. code:: python\n\n   >>> from cryptonita.fuzzy_set import join_fuzzy_sets\n   >>> gkstream = join_fuzzy_sets(gbkeys, cut_off=1024, j=B(''))\n\n``gkstream`` is our guess for the complete key stream for the cipher.\n\nIs this right?\n\nBreaking the cipher - Break the cipher!\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n   >>> kstream = gkstream.most_likely()\n\n   >>> print((ctext ^ kstream.inf()).decode('ascii'))\n   Now that the party is jumping\n   With the bass kicked in and the Vega's are pumpin\n   Quick to the point, to the point, no faking\n   Cooking MC's like a pound of bacon\n   Burning 'em, if you ain't quick and nimble\n   I go crazy when I hear a cymbal\n   And a high hat with a souped up tempo\n   I'm on a roll, it's time to go solo\n   ollin' in my five point oh\n   ith my rag-top down so my hair can blow\n\n\n   >>> kstream.encode(64)\n   b'dmFuaWxsYQ=='\n\nFinal thoughts\n~~~~~~~~~~~~~~\n\nVigenere or a repeating key cipher is a well known poor cipher shown in\nevery single cryptography course.\n\nBut little is explained in how to break it in an *automated* fashion.\n\n`cryptonita <https://pypi.org/project/cryptonita/>`__ is not magical and\na little of brain is required from you, but it is a quite useful *Swiss\narmy knife for breaking crypto*.\n\nPRs or comments are welcome.\n\nTested with `byexample <https://byexamples.github.io/byexample>`__.\n\n\n",
    "bugtrack_url": null,
    "license": "GNU GPLv3",
    "summary": "Cryptanalysis swiss army knife",
    "version": "0.6.3",
    "project_urls": null,
    "split_keywords": [
        "crypto",
        "cryptography",
        "crypto-analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b4d3255d0555968e4d0ac3a592461bc69596b58d3fd55231be1d3cc4b5aa898d",
                "md5": "f03ded9628ec72540618f3ad81c1accd",
                "sha256": "68e8bba292ec13b9e0f36f0b04c6a5129a0adcc862675d5e7c6da9b67b4d8589"
            },
            "downloads": -1,
            "filename": "cryptonita-0.6.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f03ded9628ec72540618f3ad81c1accd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.3",
            "size": 64827,
            "upload_time": "2024-04-25T14:54:53",
            "upload_time_iso_8601": "2024-04-25T14:54:53.692189Z",
            "url": "https://files.pythonhosted.org/packages/b4/d3/255d0555968e4d0ac3a592461bc69596b58d3fd55231be1d3cc4b5aa898d/cryptonita-0.6.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "09c98c24ebe4d2ab1ec342eff94efca58858d36357de42fb0fd2f0f84edc4622",
                "md5": "f606c4b03ffca9e1c72eee9ef8893c21",
                "sha256": "9a1761d90dc8a9765df98b17ec4a63e4e96f36b24bb572d731dfff3534c0a55c"
            },
            "downloads": -1,
            "filename": "cryptonita-0.6.3.tar.gz",
            "has_sig": false,
            "md5_digest": "f606c4b03ffca9e1c72eee9ef8893c21",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.3",
            "size": 63891,
            "upload_time": "2024-04-25T14:54:56",
            "upload_time_iso_8601": "2024-04-25T14:54:56.081134Z",
            "url": "https://files.pythonhosted.org/packages/09/c9/8c24ebe4d2ab1ec342eff94efca58858d36357de42fb0fd2f0f84edc4622/cryptonita-0.6.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-25 14:54:56",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "cryptonita"
}

Di Paola Martin