********
xls2xlsx
********
.. image:: https://img.shields.io/pypi/v/xls2xlsx.svg
   :target: https://pypi.python.org/pypi/xls2xlsx

.. image:: https://img.shields.io/travis/snoopyjc/xls2xlsx.svg
   :target: https://travis-ci.com/snoopyjc/xls2xlsx

.. image:: https://readthedocs.org/projects/xls2xlsx/badge/?version=latest
   :target: https://xls2xlsx.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

Convert xls file to xlsx

* Free software: MIT license
* Documentation: https://xls2xlsx.readthedocs.io.
========
Features
========
* Convert ``.xls`` files to ``.xlsx`` using xlrd and openpyxl.
* Convert ``.htm`` and ``.mht`` files containing tables or excel contents to ``.xlsx`` using beautifulsoup4 and openpyxl.
xls2xlsx aims to support everything that the underlying packages support. For example, the following are supported for both input types:
* Multiple worksheets
* Text, Numbers, Dates/Times, Unicode
* Fonts, text color, bold, italic, underline, double underline, strikeout
* Solid and Pattern Fills with color
* Borders: Solid, Hair, Thin, Thick, Double, Dashed, Dotted; with color
* Alignment: Horizontal, Vertical, Rotated, Indent, Shrink To Fit
* Number Formats, including unicode currency symbols
* Hidden Rows and Columns
* Merged Cells
* Hyperlinks (only 1 per cell)
* Comments
These features are additionally supported by the ``.xls`` input format:
* Freeze panes
These features are additionally supported by the ``.htm`` and ``.mht`` input formats:
* Images
Not supported by either format:
* Conditional Formatting (the current stylings are preserved)
* Formulas (the calculated values are preserved)
* Charts (the image of the chart is handled by ``.htm`` and ``.mht`` input formats)
* Drawings (the image of the drawing is handled by ``.htm`` and ``.mht`` input formats)
* Pivot tables (the current data is preserved)
* Text boxes (converted to an image by ``.htm`` and ``.mht`` input formats)
* Shapes and Clip Art (converted to an image by ``.htm`` and ``.mht`` input formats)
* Autofilter (the current filtered out rows are preserved)
* Rich text in cells (openpyxl doesn't support this: only styles applied to the entire cell are preserved)
* Named Ranges
* Macros (VBA)
============
Installation
============
To install xls2xlsx, run this command in your terminal:

.. code-block:: console

    $ pip install xls2xlsx

This is the preferred method to install xls2xlsx, as it will always install the most recent stable release.
=====
Usage
=====
To use xls2xlsx from the command line:

.. code-block:: console

    $ xls2xlsx [-v] file.xls ...

This will create ``file.xlsx`` in the current folder. ``file.xls`` can be any ``.xls``, ``.htm``, or ``.mht`` file, or a URL. The ``-v`` flag prints the input and output filenames.
To use xls2xlsx in a project:

.. code:: python

    from xls2xlsx import XLS2XLSX
    x2x = XLS2XLSX("spreadsheet.xls")
    x2x.to_xlsx("spreadsheet.xlsx")

Alternatively:

.. code:: python

    from xls2xlsx import XLS2XLSX
    x2x = XLS2XLSX("spreadsheet.xls")
    # With no filename, to_xlsx() returns the openpyxl workbook
    wb = x2x.to_xlsx()

When given a filename, the ``XLS2XLSX.to_xlsx`` method returns that filename. If no filename is provided, the method returns the openpyxl workbook.
The input file can be in any of the following formats:
* Excel 97-2003 workbook (``.xls``)
* Web page (``.htm``, ``.html``), optionally including a _Files folder
* Single file web page (``.mht``, ``.mhtml``)
The input specified can also be any of the following:
* A filename / pathname
* A URL
* A file-like object (opened in Binary mode for ``.xls`` and either Binary or Text mode otherwise)
* The contents of a ``.xls`` file as a ``bytes`` object
* The contents of a ``.htm`` or ``.mht`` file as a ``str`` object
Note: The file format is determined by examining the file contents, *not* by looking at the file extension.
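To illustrate what detection by content can look like, here is a minimal sketch. It is not the package's actual detection logic: the function name ``sniff_format`` and the specific checks are assumptions made for illustration. It relies on two well-known facts: Excel 97-2003 workbooks are normally stored as OLE2 compound files that begin with a fixed 8-byte signature, and single-file web pages are MIME ``multipart/related`` documents.

```python
# Hypothetical helper for illustration only; xls2xlsx's real checks may differ.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # signature of OLE2 compound files


def sniff_format(data):
    """Guess the input format from the first bytes of the content."""
    if isinstance(data, str):
        data = data.encode("utf-8", errors="replace")
    if data.startswith(OLE2_MAGIC):
        return "xls"
    head = data[:2048].lower()  # only the start of the content is inspected
    if b"mime-version:" in head and b"multipart/related" in head:
        return "mht"
    if b"<html" in head or b"<table" in head:
        return "htm"
    return "unknown"
```

Because only the content is inspected, a workbook saved under a misleading name such as ``report.dat`` would still be recognized under this heuristic.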
============
Dependencies
============
Python >= 3.7 is required (support for Python 3.6 was dropped in 0.2.0).
These packages are also required: ``xlrd, openpyxl, requests, beautifulsoup4, Pillow, python-dateutil, cssutils, webcolors, currency-symbols, fonttools, PyYAML``.
====================
Implementation Notes
====================
The ``.htm`` and ``.mht`` conversion uses ``ImageFont`` from Pillow to measure the width and height of cell contents. On first use, it searches the standard font locations on your system and builds a mapping from font name to font filename. If no font file on your system matches a font used in the input file, an estimation algorithm is used as a fallback.
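The measure-then-fall-back idea can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the names ``measure_text`` and ``estimate_text_size`` are hypothetical, and the 0.6 and 1.2 scale factors are arbitrary placeholders for whatever estimation the package really uses. The Pillow calls ``ImageFont.truetype`` and ``FreeTypeFont.getbbox`` are real.

```python
def estimate_text_size(text, point_size):
    """Rough fallback: assume glyphs are ~0.6 em wide and a line is ~1.2 em tall."""
    # These factors are illustrative assumptions, not values taken from xls2xlsx.
    return round(len(text) * point_size * 0.6), round(point_size * 1.2)


def measure_text(text, font_file, point_size):
    """Return (width, height) in pixels, measured with Pillow when possible."""
    try:
        from PIL import ImageFont  # imported lazily so the fallback works without Pillow
        font = ImageFont.truetype(font_file, point_size)
    except (ImportError, OSError):
        # Pillow is missing, or the font file could not be found or opened.
        return estimate_text_size(text, point_size)
    left, top, right, bottom = font.getbbox(text)
    return right - left, bottom - top
```

Calling ``measure_text("hello", "no-such-font-file.ttf", 10)`` takes the fallback path, since the font file cannot be opened.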
If passed a ``.mht`` file (or URL), its contents are unpacked for processing into the temporary folder named in the file, and that folder is removed when processing is done.
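The unpack-process-remove cycle can be sketched with the standard library alone. Note the differences from the behavior described above: this sketch unpacks into a fresh ``tempfile.TemporaryDirectory`` rather than the folder named in the file, and ``unpack_mht`` and ``SAMPLE`` are hypothetical names introduced for illustration.

```python
import email
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def unpack_mht(data, dest):
    """Write each MIME part of an .mht document into dest; return the paths written."""
    message = email.message_from_bytes(data)
    written = []
    for index, part in enumerate(message.walk()):
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        location = part.get("Content-Location", "")
        # Keep only the last path component so a part cannot escape dest.
        name = Path(urlparse(location).path).name or f"part{index}"
        target = dest / name
        target.write_bytes(part.get_payload(decode=True) or b"")
        written.append(target)
    return written


# A tiny hand-written two-part document standing in for a real .mht file.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b"Content-Location: file:///C:/tmp/book.htm\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<table><tr><td>1</td></tr></table>\r\n"
    b"--sep\r\n"
    b"Content-Location: file:///C:/tmp/book_files/sheet001.htm\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<table><tr><td>2</td></tr></table>\r\n"
    b"--sep--\r\n"
)

with tempfile.TemporaryDirectory() as folder:
    names = sorted(p.name for p in unpack_mht(SAMPLE, Path(folder)))
    # ... the unpacked files would be processed here ...
# Leaving the with-block removes the folder and everything unpacked into it.
```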
=======
Credits
=======
Development Lead
----------------
* Joe Cool <snoopyjc@gmail.com>
Contributors
------------
None yet. Why not be the first?
================
Acknowledgements
================
A portion of the code is based on the work of John Ricco (johnricco226@gmail.com), Apr 4, 2017:
https://johnricco.github.io/2017/04/04/python-html/
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
=======
History
=======
0.2.0 (2023-01-05)
------------------
* Modernize for more recent Python versions and more recent packages. Drop support for Python 3.6. Fix issues #11, #14, #16. Add feature #12.
0.1.5 (2020-11-03)
------------------
* Fix issues #1, #3, #5
0.1.4 (2020-11-02)
------------------
* Fix issue #4
0.1.3 (2020-10-15)
------------------
* Fix issue #2 - cli not working
0.1.0 (2020-09-13)
------------------
* First release on PyPI.