=======
Martens
=======

.. image:: https://img.shields.io/pypi/v/martens.svg
        :target: https://pypi.python.org/pypi/martens

.. image:: https://readthedocs.org/projects/martens/badge/?version=latest
        :target: https://martens.readthedocs.io/en/latest/?version=latest
        :alt: Documentation Status

Succinct small scale data manipulation

* Free software: MIT license
* Documentation: https://martens.readthedocs.io.

Usage
-----
To use Martens in a project::

    import martens

The package is freely available on PyPI under the MIT licence.

About
-----
Martens is a Python package for data manipulation.
It is designed for data that is too small to justify, for example,
uploading into a cloud data warehouse for ease of processing,
but which is still useful to you: the kind of data that was probably
passed to you in a spreadsheet or CSV file and needs to be transformed
quickly into what you want.

The primary aim of Martens is to enable data manipulation code that is:

* Flexible
* Succinct
* Easily readable and maintainable
* Lightweight

And finally, reasonably performant. That is to say, the intent and philosophy
is not to rely on libraries like NumPy, which may boost performance compared to
base Python. Rather, martens fits neatly around concepts from core Python.
This benefits flexibility and keeps the build profile minimal.

The design is heavily inspired by `dplyr <https://dplyr.tidyverse.org/>`_
from the R universe.

Example code
------------
Importing data is simple::

    source_data = martens.SourceFile(file_path=file_path).dataset

Generally speaking, martens infers the file type from the file extension provided,
but you can specify a file type to override it. Access the underlying dataframe
with the ``dataset`` property.

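The idea of inferring a file type from the extension, with an explicit override, can be sketched in plain Python. This is only an illustration: the ``infer_file_type`` helper and the ``EXTENSION_TYPES`` mapping are hypothetical names, and the list of extensions is an assumption rather than the set martens actually supports.

```python
from pathlib import Path

# Hypothetical mapping for illustration; not martens' actual lookup table.
EXTENSION_TYPES = {".csv": "csv", ".xlsx": "excel", ".xls": "excel", ".json": "json"}


def infer_file_type(file_path, file_type=None):
    """Return the explicit file_type if given, else guess from the extension."""
    if file_type is not None:
        return file_type  # an explicit override always wins
    suffix = Path(file_path).suffix.lower()
    if suffix not in EXTENSION_TYPES:
        raise ValueError(f"Cannot infer a file type from extension {suffix!r}")
    return EXTENSION_TYPES[suffix]
```

With this sketch, a path such as ``people.csv`` resolves to a CSV reader, while passing ``file_type`` explicitly covers files whose extension is missing or misleading.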
Dataframes
##########

A martens dataframe is really just a dict with string keys and equal-length lists as values.
Any column of the dataframe can be accessed as follows, and the result is always
a list::

    source_data['age']

There's no such thing as a dataframe index in martens; indexes are not at all useful
for the type of data that martens is designed to parse. But we can quickly add a
standard integer-based column to use as an id in later steps::

    source_data.with_id('person_id')

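To make the "dict of equal-length lists" picture concrete, here is a minimal sketch using only built-in types. The ``with_id`` function below is a stand-in written for this example, and numbering rows from zero is an assumption; it is not a description of how martens implements the method.

```python
# A plain dict of equal-length lists, standing in for a martens dataframe.
people = {
    "name": ["Ann", "Bob", "Cy"],
    "age": [34, 27, 51],
}


def with_id(data, name):
    """Return a copy of data with an integer id column added (assumed 0-based)."""
    n_rows = len(next(iter(data.values()), []))
    return {**data, name: list(range(n_rows))}


ages = people["age"]  # column access simply returns the underlying list
people_with_id = with_id(people, "person_id")
```

The original dict is left untouched and a new one is returned, which is the property that makes method chaining convenient.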
Filtering with functions
########################

Filtering is best done with functions or lambdas (but doesn't have to be)::

    source_data.filter(lambda gender: gender == 'Male')

The key innovation of martens is that the argument names of a function
refer to columns of the dataframe: the function operates on the data
in the columns whose names match its own arguments.
That is, the argument names of your functions are important and determine
how each function interacts with the dataframe.
This allows for succinct, readable and flexible
code. For example, you can use any function to filter, so long as
its argument names correspond to existing columns
in the dataframe and the function returns something that
can ultimately be resolved to either true or false.

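One way to picture the argument-name mechanism is with ``inspect.signature`` from the standard library. The sketch below is an assumption about how such a dispatch could work, not martens' own code, and ``filter_rows`` is a hypothetical name chosen for the example.

```python
import inspect


def filter_rows(data, func):
    """Keep rows where func, fed the columns named by its arguments, is truthy."""
    # The parameter names of func tell us which columns to read.
    arg_names = list(inspect.signature(func).parameters)
    n_rows = len(next(iter(data.values()), []))
    keep = [
        bool(func(*(data[name][i] for name in arg_names)))
        for i in range(n_rows)
    ]
    # Apply the same keep/drop decision to every column.
    return {
        name: [value for value, kept in zip(column, keep) if kept]
        for name, column in data.items()
    }


people = {
    "name": ["Ann", "Bob", "Cy"],
    "gender": ["Female", "Male", "Male"],
    "age": [34, 27, 51],
}
males = filter_rows(people, lambda gender: gender == "Male")
older_males = filter_rows(people, lambda gender, age: gender == "Male" and age > 30)
```

Because the lookup is by name, the same helper works for a function of one column or of several, without any extra wiring.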
Mutate and apply
################

Similarly, we can quickly create new columns on the fly using data from existing columns::

    source_data.mutate(lambda age: 365*age, 'age_in_days')

Again, there is significant flexibility here. Any arbitrary function with any
arbitrary return value will do, as long as all of its arguments
can be resolved using existing columns of the dataframe.

If you just want the output without adding to the dataframe, use apply::

    source_data.apply(lambda age: 7*age)

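The relationship between the two operations can be sketched in a few lines of plain Python: one computes a list, the other attaches that list as a new column. The functions below are illustrative stand-ins operating on a bare dict, and they assume the function names at least one column.

```python
import inspect


def apply(data, func):
    """Call func once per row, using the columns named by its arguments."""
    arg_names = list(inspect.signature(func).parameters)
    return [func(*row) for row in zip(*(data[name] for name in arg_names))]


def mutate(data, func, name):
    """Return a copy of data with the per-row results stored under name."""
    return {**data, name: apply(data, func)}


ages = {"age": [1, 2, 3]}
in_weeks = apply(ages, lambda age: 52 * age)
with_days = mutate(ages, lambda age: 365 * age, "age_in_days")
```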
Stack, stretch and squish
#########################
Sometimes we don't want to simply create a new column with the required features.
If the output of your function resolves to a list, you can choose
to stack the output vertically. This will produce a new dataframe
with additional rows and the existing columns expanded (repeated)::

    source_data.mutate_stack(lambda age: list(range(age)))

We might instead want to create multiple new columns simultaneously::

    source_data.mutate_stretch(some_function_returning_tuple_of_2, names=['A', 'B'])

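The difference between stacking and stretching is easiest to see on a tiny dataset. The following is a sketch over a plain dict, with helper names borrowed from the methods above purely for readability; the behaviour shown is an assumption drawn from the description in this section, not a copy of the library's implementation.

```python
import inspect


def _call_per_row(data, func):
    """Call func once per row, using the columns named by its arguments."""
    arg_names = list(inspect.signature(func).parameters)
    return [func(*row) for row in zip(*(data[name] for name in arg_names))]


def mutate_stack(data, func, name):
    """One output row per element of each returned list; other columns repeat."""
    stacked = {col: [] for col in data}
    stacked[name] = []
    for i, items in enumerate(_call_per_row(data, func)):
        for item in items:
            for col in data:
                stacked[col].append(data[col][i])
            stacked[name].append(item)
    return stacked


def mutate_stretch(data, func, names):
    """One new column per element of each returned tuple."""
    results = _call_per_row(data, func)
    new_cols = {n: [r[j] for r in results] for j, n in enumerate(names)}
    return {**data, **new_cols}


pets = {"pet": ["cat", "dog"], "age": [2, 1]}
stacked = mutate_stack(pets, lambda age: list(range(age)), "year")
stretched = mutate_stretch(pets, lambda age: (12 * age, 365 * age), names=["months", "days"])
```

Stacking changes the number of rows (here the cat contributes two rows and the dog one), whereas stretching keeps the row count and widens the dataframe.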
More complex code
#################
If you are using martens the way it was intended, your code will tend to have large
blocks of three or more lines, with each new operation just being a method
of the dataframe from the previous line. That is, chaining commands is common::

    import re
    from math import prod

    import martens as mt

    def solve(data_input):
        # data_input is the raw input text; each line becomes one row
        data = mt.Dataset({'line': data_input.split('\n')})
        # Find every whole number on each line, one row per match
        num_match = lambda line: [match for match in re.finditer(r'\b\d+\b', line)]
        num_matches = data.with_id('num_line_no') \
            .mutate_stack(num_match, 'match').with_id('num_id') \
            .mutate(lambda match: int(match.group()), name='num_match') \
            .mutate(lambda match: match.start(), name='num_start') \
            .mutate(lambda match: match.end(), name='num_end')
        # Find the position of every symbol (anything except '.' and digits)
        chr_match = lambda line: [m.start() for m in re.finditer(r'[^.0-9]', line)]
        chr_matches = data.with_id('chr_line_no') \
            .mutate_stack(chr_match, 'chr_match') \
            .with_id('chr_id').select(['chr_line_no', 'chr_match', 'chr_id'])
        # Pair numbers with symbols that touch them on the same or an adjacent line
        all_matches = num_matches.merge(chr_matches) \
            .filter(lambda chr_line_no, num_line_no: abs(chr_line_no - num_line_no) <= 1) \
            .filter(lambda chr_match, num_start, num_end: num_start - 1 <= chr_match <= num_end)
        # Keep symbols touching two or more numbers and multiply those numbers
        gear_match = all_matches.group_by(['chr_id'], other_cols=['num_id', 'num_match']) \
            .mutate(lambda num_id: len(num_id), 'num_count') \
            .filter(lambda num_count: num_count >= 2) \
            .mutate(lambda num_match: prod(num_match), 'gear_ratio')
        return {
            'part one': sum(all_matches.unique_by(['num_id', 'num_match'])['num_match']),
            'part two': sum(gear_match['gear_ratio'])
        }

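The grouping step in the chain above passes the grouped columns to ``len`` and ``prod``, which suggests that each non-key column becomes a list per group. A minimal sketch of that idea on a plain dict follows; the ``group_by`` function here is written for this example and its exact behaviour is an assumption, not the library's implementation.

```python
from math import prod


def group_by(data, keys, other_cols):
    """Group rows by keys; gather each of other_cols into one list per group."""
    groups = {}  # key tuple -> {column: [values]}, in first-seen order
    n_rows = len(next(iter(data.values()), []))
    for i in range(n_rows):
        key = tuple(data[k][i] for k in keys)
        bucket = groups.setdefault(key, {c: [] for c in other_cols})
        for c in other_cols:
            bucket[c].append(data[c][i])
    out = {k: [key[j] for key in groups] for j, k in enumerate(keys)}
    for c in other_cols:
        out[c] = [bucket[c] for bucket in groups.values()]
    return out


matches = {"chr_id": [0, 0, 1], "num_id": [5, 6, 7], "num_match": [467, 35, 617]}
grouped = group_by(matches, ["chr_id"], ["num_id", "num_match"])
# Multiply the numbers of every group that holds at least two of them.
ratios = [
    prod(values)
    for values, ids in zip(grouped["num_match"], grouped["num_id"])
    if len(ids) >= 2
]
```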
Extensibility
-------------
A martens dataframe can often be used in place of a pandas dataframe or similar
in another package. For example, in plotly::

    import plotly.express as px
    px.bar(dataframe, x='column1', y='column2')

What's next
-----------
This is just the beginning of this project; I hope it is useful to someone, somewhere.
There are many, many feature and speed improvements that I would like to implement.
Of course, feedback is welcome: raise an issue or otherwise get in touch and I'll do my best
to respond.

=======
History
=======
0.2.1 (2024-01-11)
------------------
* First release featured on PyPI.
Raw data
{
"_id": null,
"home_page": "https://github.com/arowley-ai/martens",
"name": "martens",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "martens",
"author": "Alex Rowley",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/10/e4/e35698b4f5d699768323edbb05c98a126ae024348df430cf10db453c7ff5/martens-0.3.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT license",
"summary": "Succinct small scale data manipulation",
"version": "0.3.7",
"project_urls": {
"Homepage": "https://github.com/arowley-ai/martens"
},
"split_keywords": [
"martens"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d38720dcbd36a6a2620aff41981157d2abcaef39a5f2f2093bb6129c80991746",
"md5": "331697dfc222a79d0b2c9913f16c5f76",
"sha256": "d98c4f376b5b7a21dd5dc1baa2c9db7909c4423b6b061b08e5150f0cbca3d6f4"
},
"downloads": -1,
"filename": "martens-0.3.7-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "331697dfc222a79d0b2c9913f16c5f76",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 11429,
"upload_time": "2024-12-05T11:35:58",
"upload_time_iso_8601": "2024-12-05T11:35:58.782174Z",
"url": "https://files.pythonhosted.org/packages/d3/87/20dcbd36a6a2620aff41981157d2abcaef39a5f2f2093bb6129c80991746/martens-0.3.7-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "10e4e35698b4f5d699768323edbb05c98a126ae024348df430cf10db453c7ff5",
"md5": "32739ecd33e26807d4ec29ea69a2cf80",
"sha256": "290ef70c98821b630f2b457cdfe85fe46468fc7235f513df652bc6d613a80057"
},
"downloads": -1,
"filename": "martens-0.3.7.tar.gz",
"has_sig": false,
"md5_digest": "32739ecd33e26807d4ec29ea69a2cf80",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 27243,
"upload_time": "2024-12-05T11:35:59",
"upload_time_iso_8601": "2024-12-05T11:35:59.616654Z",
"url": "https://files.pythonhosted.org/packages/10/e4/e35698b4f5d699768323edbb05c98a126ae024348df430cf10db453c7ff5/martens-0.3.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-05 11:35:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "arowley-ai",
"github_project": "martens",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "openpyxl",
"specs": [
[
"~=",
"3.1.2"
]
]
},
{
"name": "xlrd",
"specs": [
[
"~=",
"2.0.1"
]
]
},
{
"name": "martens",
"specs": [
[
"~=",
"0.3.4"
]
]
},
{
"name": "deprecation",
"specs": [
[
"~=",
"2.1.0"
]
]
},
{
"name": "setuptools",
"specs": [
[
">=",
"69.1.0"
]
]
}
],
"lcname": "martens"
}