===============
Python getdents
===============

Iterate large directories efficiently with Python.

About
=====

``python-getdents`` is a simple wrapper around the Linux system call ``getdents64`` (see ``man getdents`` for details).

The implementation is based on the solution described in the article `You can list a directory containing 8 million files! But not with ls. <http://be-n.com/spw/you-can-list-a-million-files-in-a-directory-but-not-with-ls.html>`_ by Ben Congleton.

Install
=======

.. code-block:: sh

    pip install getdents

For development
---------------

.. code-block:: sh

    python3 -m venv env
    . env/bin/activate
    pip install -e ".[test]"

Building Wheels
~~~~~~~~~~~~~~~

.. code-block:: sh

    pip install cibuildwheel
    cibuildwheel --platform linux --output-dir wheelhouse

Run tests
=========

.. code-block:: sh

    ulimit -v 33554432 && py.test tests/

Usage
=====

.. code-block:: python

    from getdents import getdents

    for inode, type_, name in getdents("/tmp"):
        print(name)
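
Since every entry is an ``(inode, type, name)`` tuple, simple aggregations need no extra system calls. The sketch below counts entries per type. The ``DT_*`` values are copied from Linux ``<dirent.h>`` and defined locally as an assumption (this README only shows ``DT_LNK`` as a package export), and ``count_by_type`` is a hypothetical helper that accepts any iterable of such tuples.

.. code-block:: python

    from collections import Counter

    # Values from Linux <dirent.h>; defined locally for illustration.
    DT_DIR, DT_REG, DT_LNK = 4, 8, 10
    TYPE_NAMES = {DT_DIR: "dir", DT_REG: "file", DT_LNK: "symlink"}


    def count_by_type(entries):
        """Count (inode, type, name) tuples per entry type."""
        counts = Counter()
        for _inode, type_, _name in entries:
            counts[TYPE_NAMES.get(type_, "other")] += 1
        return counts


    # Hypothetical sample data standing in for getdents("/some/dir").
    sample = [(11, DT_REG, "a.txt"), (12, DT_DIR, "sub"), (13, DT_REG, "b.txt")]
    print(count_by_type(sample))

With the package installed, ``count_by_type(getdents("/tmp"))`` should work the same way.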

Advanced
--------

While ``getdents`` provides a convenient wrapper with ls-like filtering, you can use ``getdents_raw`` for more control:

.. code-block:: python

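    import os
    from getdents import DT_LNK, O_GETDENTS, getdents_raw

    fd = os.open("/tmp", O_GETDENTS)

    try:
        for inode, type_, name in getdents_raw(fd, 2**20):
            # Report symlinks, skipping entries whose inode is 0.
            if type_ == DT_LNK and inode != 0:
                target = os.readlink(name, dir_fd=fd)
                print("found symlink:", name, "->", target)
    finally:
        os.close(fd)  # release the descriptor even if iteration fails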

Batching
~~~~~~~~

If you need more control over the syscalls, you can call an instance of ``getdents_raw`` directly instead of iterating over it.
Each call corresponds to a single ``getdents64`` syscall and returns a list of however many entries fit in the buffer.
The call returns ``None`` when there are no more entries to read.

.. code-block:: python

    it = getdents_raw(fd, 2**20)

    for batch in iter(it, None):
        for inode, type_, name in batch:
            ...
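
The loop above relies on the two-argument form of the built-in ``iter``, which keeps calling its first argument until the sentinel (here ``None``) is returned. The sketch below reproduces that protocol with ``FakeReader``, a hypothetical stand-in for a ``getdents_raw`` instance, so it runs without a directory or the package.

.. code-block:: python

    class FakeReader:
        """Hypothetical stand-in: each call returns one batch, then None."""

        def __init__(self, batches):
            self._batches = list(batches)

        def __call__(self):
            return self._batches.pop(0) if self._batches else None


    def drain(reader):
        """Collect names from every batch until the reader returns None."""
        names = []
        for batch in iter(reader, None):  # stops at the first None
            for _inode, _type, name in batch:
                names.append(name)
        return names


    reader = FakeReader([[(1, 8, "a"), (2, 8, "b")], [(3, 4, "c")]])
    print(drain(reader))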

Free-threading
~~~~~~~~~~~~~~

Doing I/O on a single file descriptor from multiple threads is rarely a good idea, but you can do it if you need to.
This package supports free-threaded (no-GIL) Python builds.
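
One way to sidestep concurrent reads on the same descriptor is to keep the syscalls on one thread and fan the resulting batches out to workers. The sketch below illustrates that pattern and is not part of the package: ``process_batches`` and ``read_batch`` are hypothetical names, and ``read_batch`` is any callable that returns a list of entries or ``None``, like the instance call described under Batching.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor


    def process_batches(read_batch, handle, workers=4):
        """Read batches on the calling thread; run ``handle`` in a pool."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # iter() calls read_batch here, on this thread only.
            futures = [pool.submit(handle, b) for b in iter(read_batch, None)]
            return [f.result() for f in futures]


    # Hypothetical batches standing in for real getdents64 results.
    batches = iter([[(1, 8, "a"), (2, 8, "b")], [(3, 4, "c")], None])
    print(process_batches(lambda: next(batches), len))

Results come back in the order the batches were read, since the futures are collected in submission order.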

CLI
---

Usage
~~~~~

::

    python-getdents [-h] [-b N] [-o NAME] PATH

Options
~~~~~~~

+--------------------------+-------------------------------------------------+
| Option                   | Description                                     |
+==========================+=================================================+
| ``-b N``                 | Buffer size (in bytes) to allocate when         |
|                          | iterating over a directory. Default is 32768,   |
|                          | the same value used by glibc; you probably      |
+--------------------------+ want to increase it. Try starting with 16777216 |
| ``--buffer-size N``      | (16 MiB). Best performance is achieved when the |
|                          | buffer size is a multiple of the file system    |
|                          | block size.                                     |
+--------------------------+-------------------------------------------------+
| ``-o NAME``              | Output format:                                  |
|                          |                                                 |
|                          | * ``plain`` (default) Print only names.         |
|                          | * ``csv`` Print as comma-separated values in    |
+--------------------------+   order: inode, type, name.                     |
| ``--output-format NAME`` | * ``csv-headers`` Same as ``csv``, but print    |
|                          |   headers on the first line also.               |
|                          | * ``json`` Output as a JSON array.              |
|                          | * ``json-stream`` Output each directory entry   |
|                          |   as a single JSON object, one per line.        |
+--------------------------+-------------------------------------------------+
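
If a buffer size aligned to the file system block is wanted, the block size can be queried with ``os.statvfs``. The sketch below rounds a requested size up to the next multiple. ``aligned_buffer_size`` and ``round_up`` are hypothetical helpers, and treating ``f_bsize`` as the relevant block size is an assumption.

.. code-block:: python

    import os


    def round_up(size, block):
        """Round ``size`` up to the nearest multiple of ``block``."""
        return -(-size // block) * block


    def aligned_buffer_size(path, wanted=16 * 2**20):
        """Return ``wanted`` rounded up to the block size of ``path``."""
        block = os.statvfs(path).f_bsize  # assumption: f_bsize is the block size
        return round_up(wanted, block)


    print(aligned_buffer_size("/tmp"))

The result can be passed to ``-b``.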

Exit codes
~~~~~~~~~~

* 3 - Requested buffer is too large.
* 4 - ``PATH`` not found.
* 5 - ``PATH`` is not a directory.
* 6 - Insufficient permissions to read the contents of ``PATH``.
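
A calling script can turn these exit codes into messages. The sketch below is an illustration under stated assumptions: ``describe_exit_code`` and ``list_dir`` are hypothetical helpers, ``0`` is assumed to mean success, the plain output is assumed to contain one name per line, and codes not listed above are reported as unknown.

.. code-block:: python

    import subprocess

    EXIT_MESSAGES = {
        3: "Requested buffer is too large",
        4: "PATH not found",
        5: "PATH is not a directory",
        6: "Not enough permissions to read PATH",
    }


    def describe_exit_code(code):
        """Map a python-getdents exit code to a human-readable message."""
        if code == 0:
            return "ok"  # assumption: 0 means success
        return EXIT_MESSAGES.get(code, f"unknown error ({code})")


    def list_dir(path):
        """Run the CLI and return names (assumes one name per line)."""
        proc = subprocess.run(
            ["python-getdents", path], capture_output=True, text=True
        )
        if proc.returncode != 0:
            raise RuntimeError(describe_exit_code(proc.returncode))
        return proc.stdout.splitlines()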

Examples
~~~~~~~~

.. code-block:: sh

    python-getdents /path/to/large/dir
    python -m getdents /path/to/large/dir
    python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "getdents",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "getdents",
    "author": null,
    "author_email": "ZipFile <zipfile.d@protonmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/10/aa/cbdc87f71e8659f579557beb5d719e82459f70cdac6c089f948bce6cd76a/getdents-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python binding to linux syscall getdents64.",
    "version": "1.0.0",
    "project_urls": {
        "Source": "https://github.com/ZipFile/python-getdents"
    },
    "split_keywords": [
        "getdents"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a192a28176f225841e06fd8c27c37951b045df648f89b8f2f04c65be430aef73",
                "md5": "eec3877d3b41e7931ae89dd6fb7fe697",
                "sha256": "f62d1edd1522fd044439589c4e8c200b94a677d81ae3b86320eff8e3cd8ccb10"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp310-abi3-manylinux_2_28_aarch64.whl",
            "has_sig": false,
            "md5_digest": "eec3877d3b41e7931ae89dd6fb7fe697",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.10",
            "size": 16275,
            "upload_time": "2025-09-02T18:42:24",
            "upload_time_iso_8601": "2025-09-02T18:42:24.282174Z",
            "url": "https://files.pythonhosted.org/packages/a1/92/a28176f225841e06fd8c27c37951b045df648f89b8f2f04c65be430aef73/getdents-1.0.0-cp310-abi3-manylinux_2_28_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8a2c4765c59e3349d60856c77215b32f5fe6b6cb2cc1675639d359fd12424a30",
                "md5": "36e823a5462464cd03296d9af38c85e3",
                "sha256": "49cd092b360b52a40802ef6fa08e50346ad36dd67f63f05781501f957ca21ab0"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp310-abi3-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "36e823a5462464cd03296d9af38c85e3",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.10",
            "size": 15238,
            "upload_time": "2025-09-02T18:42:25",
            "upload_time_iso_8601": "2025-09-02T18:42:25.600711Z",
            "url": "https://files.pythonhosted.org/packages/8a/2c/4765c59e3349d60856c77215b32f5fe6b6cb2cc1675639d359fd12424a30/getdents-1.0.0-cp310-abi3-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3a05a9d854115b73022e681cfa916e24cf70d38eaf19bc57a592097fa99b601f",
                "md5": "ef26257f57c1361a350a61c2f53e4913",
                "sha256": "8f992fa25380d76f88cb89cd582b1cb5b64e9bb1142cb26776ccf3b40044f7f4"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp310-abi3-musllinux_1_2_aarch64.whl",
            "has_sig": false,
            "md5_digest": "ef26257f57c1361a350a61c2f53e4913",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.10",
            "size": 16206,
            "upload_time": "2025-09-02T18:42:26",
            "upload_time_iso_8601": "2025-09-02T18:42:26.965935Z",
            "url": "https://files.pythonhosted.org/packages/3a/05/a9d854115b73022e681cfa916e24cf70d38eaf19bc57a592097fa99b601f/getdents-1.0.0-cp310-abi3-musllinux_1_2_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a897c31cb9dafdba8edba3983e14fc063cd885c99c0a5d4d0da3c692d43b06c0",
                "md5": "a41f6318f5d4be27891e378b301d9226",
                "sha256": "cdd21d302592fa4c2b4e983d30b07a0be8b41846ec1413d2ffd2034a287e25ce"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp310-abi3-musllinux_1_2_x86_64.whl",
            "has_sig": false,
            "md5_digest": "a41f6318f5d4be27891e378b301d9226",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.10",
            "size": 15520,
            "upload_time": "2025-09-02T18:42:27",
            "upload_time_iso_8601": "2025-09-02T18:42:27.945715Z",
            "url": "https://files.pythonhosted.org/packages/a8/97/c31cb9dafdba8edba3983e14fc063cd885c99c0a5d4d0da3c692d43b06c0/getdents-1.0.0-cp310-abi3-musllinux_1_2_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6ee4a4c3172a2e1d17621dd52884d48a5120673379ec478ff8e1c312124770fe",
                "md5": "82dda60e563df604bf5c0d3fb0cb98f4",
                "sha256": "7c6e461ec4d14e8ea668faab5e68467940dd0d8000f4c6d2f3f91832bddb0769"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp314-cp314t-manylinux_2_28_aarch64.whl",
            "has_sig": false,
            "md5_digest": "82dda60e563df604bf5c0d3fb0cb98f4",
            "packagetype": "bdist_wheel",
            "python_version": "cp314",
            "requires_python": ">=3.10",
            "size": 17234,
            "upload_time": "2025-09-02T18:42:28",
            "upload_time_iso_8601": "2025-09-02T18:42:28.923701Z",
            "url": "https://files.pythonhosted.org/packages/6e/e4/a4c3172a2e1d17621dd52884d48a5120673379ec478ff8e1c312124770fe/getdents-1.0.0-cp314-cp314t-manylinux_2_28_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5a8a4055f0eeb93b7a9251ff720dc1c4d5352cac7b36a975ffaf072073c105a4",
                "md5": "bc8b947e075fba300d11f32d39c1c1b0",
                "sha256": "2ce612bdc9cc3690dc568b360bafee319afca151b9abb38fea376e7ddd344085"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp314-cp314t-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "bc8b947e075fba300d11f32d39c1c1b0",
            "packagetype": "bdist_wheel",
            "python_version": "cp314",
            "requires_python": ">=3.10",
            "size": 15888,
            "upload_time": "2025-09-02T18:42:29",
            "upload_time_iso_8601": "2025-09-02T18:42:29.917038Z",
            "url": "https://files.pythonhosted.org/packages/5a/8a/4055f0eeb93b7a9251ff720dc1c4d5352cac7b36a975ffaf072073c105a4/getdents-1.0.0-cp314-cp314t-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d361e7304e86899d2b8181ab23035e54b0322f70c019439ece5b83b0cf1888bf",
                "md5": "842e3926720a0448ebdd7f525d1d56b6",
                "sha256": "381f0081be3bdd249f121e51b13b477c615b516be31d08b8bf6b839ea968b48e"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl",
            "has_sig": false,
            "md5_digest": "842e3926720a0448ebdd7f525d1d56b6",
            "packagetype": "bdist_wheel",
            "python_version": "cp314",
            "requires_python": ">=3.10",
            "size": 17037,
            "upload_time": "2025-09-02T18:42:31",
            "upload_time_iso_8601": "2025-09-02T18:42:31.070727Z",
            "url": "https://files.pythonhosted.org/packages/d3/61/e7304e86899d2b8181ab23035e54b0322f70c019439ece5b83b0cf1888bf/getdents-1.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b8c26bf256ec5358ab95608f0f1a9671f7e441bfb9045ce046f8396d5be4d609",
                "md5": "34163cf17a58dd7f062fca53650655a5",
                "sha256": "35238c0e4fa94b266099abd00391ce1716d439cd4b127427ac385bc49fa230cd"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl",
            "has_sig": false,
            "md5_digest": "34163cf17a58dd7f062fca53650655a5",
            "packagetype": "bdist_wheel",
            "python_version": "cp314",
            "requires_python": ">=3.10",
            "size": 16186,
            "upload_time": "2025-09-02T18:42:32",
            "upload_time_iso_8601": "2025-09-02T18:42:32.006247Z",
            "url": "https://files.pythonhosted.org/packages/b8/c2/6bf256ec5358ab95608f0f1a9671f7e441bfb9045ce046f8396d5be4d609/getdents-1.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "10aacbdc87f71e8659f579557beb5d719e82459f70cdac6c089f948bce6cd76a",
                "md5": "c1d657d70c3245cde663d587b5b793ae",
                "sha256": "80ab2825a09e5b1107fe3d166458d01d4a7cedfe255ee9762d12c68c9f890d24"
            },
            "downloads": -1,
            "filename": "getdents-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c1d657d70c3245cde663d587b5b793ae",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 13823,
            "upload_time": "2025-09-02T18:42:33",
            "upload_time_iso_8601": "2025-09-02T18:42:33.027385Z",
            "url": "https://files.pythonhosted.org/packages/10/aa/cbdc87f71e8659f579557beb5d719e82459f70cdac6c089f948bce6cd76a/getdents-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-02 18:42:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ZipFile",
    "github_project": "python-getdents",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "getdents"
}
        