[![DOI](https://joss.theoj.org/papers/10.21105/joss.06244/status.svg)](https://doi.org/10.21105/joss.06244)
<p align="center">
<img src="docs/logo.jpg" width="400"/>
</p>
# seesus: a social, environmental, and economic sustainability classifier
`seesus` is a **Python** package that evaluates whether a textual expression aligns with the concept of sustainability as defined by the United Nations Sustainable Development Goals (SDGs). It labels a statement with the 17 SDGs as well as 169 specific targets and categorizes the statement into social, environmental, or economic sustainability. For analysis in **R**, please see <a target="_blank" href="https://github.com/Yingjie4Science/SDGdetector">`SDGdetector`</a>.
`seesus` currently has four main functions:
1. Evaluating whether a statement aligns with the concept of sustainability
2. Identifying SDGs and associated targets in a statement
3. Classifying a statement into social, environmental, and economic sustainability
4. Customizing match syntax
## Installation
Install `seesus` from PyPI by running the following command in your terminal:
`pip install seesus`
## Example
### Analyzing an individual sentence
```python
from seesus import SeeSus
text1 = "We aim to contribute to the mitigation of climate change by reducing carbon emissions in the city."
result1 = SeeSus(text1)
# print a summary of the results
print(result1)
# print result on whether a statement aligns with sustainability, True or False
print(result1.sus)
# print the names of identified SDGs
print(result1.sdg)
# print the descriptions of identified SDGs
print(result1.sdg_desc)
# print the names of identified SDG targets
print(result1.target)
# print the descriptions of identified SDG targets
print(result1.target_desc)
# determine which dimension of sustainability (social, environmental, or economic) a statement belongs to
print(result1.see)
```
### Analyzing a paragraph or a longer document
For best results, split a paragraph or a whole document into individual sentences, so that each sentence is the basic unit that `seesus` analyzes. Tools such as `nltk.tokenize` and `re.split` can do this.
```python
import re
from seesus import SeeSus
# source: https://www.nyc.gov/site/planning/about/dcp-priorities/resiliency-sustainability.page
text2 = "By working with communities in the floodplain and facilitating flood-resistant building design, DCP is reducing the city’s risks to sea level rise and coastal flooding. Hurricane Sandy was a stark reminder of these risks. The City, led by the Mayor’s Office of Recovery and Resiliency (ORR), has developed a multifaceted plan for recovering from Sandy and improving the city’s resiliency–the ability of its neighborhoods, buildings and infrastructure to withstand and recover quickly from flooding and climate events. As part of this effort, DCP has initiated a series of projects to identify and implement land use and zoning changes as well as other actions needed to support the short-term recovery and long-term vitality of communities affected by Hurricane Sandy and other areas at risk of coastal flooding."
# split the paragraph into sentences and analyze each one
for sent in re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text2):
    result = SeeSus(sent)
    print('"', sent, '"', sep="")
    print("Is the sentence related to the concept of sustainability?", result.sus)
    print("Which SDGs?", result.sdg)
    print("Which SDG targets specifically?", result.target)
    print("Which dimensions of sustainability?", result.see)
    print("----------------")
```
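The per-sentence loop can also be wrapped in a small helper that collects the results instead of printing them. The following is a minimal sketch, not part of `seesus`: `split_sentences` and `analyze_document` are hypothetical names introduced here, and the classifier is passed in as an argument so that the helper only assumes an object exposing the `.sus`, `.sdg`, `.target`, and `.see` attributes shown above.

```python
import re

# Same sentence-boundary regex as in the loop above.
SENTENCE_BOUNDARY = re.compile(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s')


def split_sentences(text):
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def analyze_document(text, classifier):
    """Classify each sentence and collect the results as a list of dicts.

    `classifier` is any callable that takes a sentence and returns an object
    exposing .sus, .sdg, .target, and .see (for example, SeeSus).
    """
    rows = []
    for sentence in split_sentences(text):
        result = classifier(sentence)
        rows.append({
            "sentence": sentence,
            "sus": result.sus,
            "sdg": result.sdg,
            "target": result.target,
            "see": result.see,
        })
    return rows
```

With `seesus` installed, `analyze_document(text2, SeeSus)` should return one dictionary per sentence, which is convenient to pass on to a table or data frame.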
### Customizing match syntax
```python
# print match syntax
SeeSus.show_syntax("SDG1_general")
# customize match syntax
SeeSus.edit_syntax("SDG1_general", "my match terms")
```
Please run `example.ipynb` for more usage examples.
## Methodology
In an era of large language models, `seesus` deliberately uses predefined regular expression patterns instead of machine learning, because this approach is more transparent, replicable, and controllable. The regular expression syntax was developed for the 17 SDGs and the 169 SDG targets and covers both direct and indirect matching. The accuracy of the matching syntax was manually tested, reviewed, and improved using randomly selected statements from corporate reports, with three rounds of adjustments to finalize the syntax. `seesus` achieves an accuracy rate of 76%, as determined by alignment with manual coding, while human intercoder agreement on the same text stands at 83%. Given the inherent ambiguity and complexity of language, as well as the interconnected nature of the SDGs, an accuracy within seven percentage points of human agreement is rather high. Please see <a target="_blank" href="https://github.com/Yingjie4Science/SDGdetector">`SDGdetector`</a> for detailed information on the accuracy evaluation and manual refinement.
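To illustrate the general idea of pattern-based labeling, here is a minimal sketch. The patterns and the names `TOY_PATTERNS` and `toy_label` are hypothetical and written only for this illustration; they are not the match syntax shipped with `seesus`, which is far more extensive.

```python
import re

# Hypothetical patterns for illustration only; NOT the syntax used by seesus.
TOY_PATTERNS = {
    "SDG13": re.compile(r"\bclimate change\b|\bcarbon emissions?\b", re.IGNORECASE),
    "SDG6": re.compile(r"\b(clean|safe|drinking) water\b", re.IGNORECASE),
}


def toy_label(text):
    """Return the sorted toy labels whose pattern matches the text."""
    return sorted(
        label for label, pattern in TOY_PATTERNS.items() if pattern.search(text)
    )


print(toy_label("We aim to mitigate climate change by reducing carbon emissions."))
```

Because every label traces back to an explicit pattern, a surprising result can be inspected and corrected by reading or editing the pattern itself, which is the kind of transparency and control described above.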
## How to cite
Cai, M., Li, Y., Colbry, D., Frans, V. F., & Zhang, Y. (2024). seesus: a social, environmental, and economic sustainability classifier for Python. Journal of Open Source Software, 9(96), 6244. https://doi.org/10.21105/joss.06244
```
@article{Cai_seesus_a_social_2024,
author = {Cai, Meng and Li, Yingjie and Colbry, Dirk and Frans, Veronica F. and Zhang, Yuqian},
doi = {10.21105/joss.06244},
journal = {Journal of Open Source Software},
month = apr,
number = {96},
pages = {6244},
title = {{seesus: a social, environmental, and economic sustainability classifier for Python}},
url = {https://joss.theoj.org/papers/10.21105/joss.06244},
volume = {9},
year = {2024}
}
```
## Maintenance
Please report any [issues](https://github.com/caimeng2/seesus/issues) if you find that a matching syntax is not accurate or can be improved. We welcome contributions to enhance the classification accuracy of `seesus`.
Raw data
{
"_id": null,
"home_page": null,
"name": "seesus",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "sustainability, SDG, sustainable-development-goals, classification, regular expressions, text mining",
"author": null,
"author_email": "Meng Cai <mengcai24601@gmail.com>, Yingjie Li <yingjieli.edu@gmail.com>, Dirk Colbry <colbrydi@msu.edu>, Veronica Frans <verofrans@gmail.com>, Yuqian Zhang <zhan1364@msu.edu>",
"download_url": "https://files.pythonhosted.org/packages/67/58/aa2476a77cebd1609ee309a1bf106b825e24ac3cde91f6ef497f2ecad766/seesus-1.2.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0",
"summary": "a social, environmental, and economic sustainability classifier based on the UN Sustainable Development Goals",
"version": "1.2.1",
"project_urls": null,
"split_keywords": [
"sustainability",
" sdg",
" sustainable-development-goals",
" classification",
" regular expressions",
" text mining"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e1dc34dc19f74d773911f3ec96bb1a4a78d0e367e18f83f82a13a42b64128caf",
"md5": "d286be3781db1faddc11819fceab23b4",
"sha256": "fa9c1ab421e1a315298092b6c956fc7984533b6d48e758375abcceb7ff2faad0"
},
"downloads": -1,
"filename": "seesus-1.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d286be3781db1faddc11819fceab23b4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 74534,
"upload_time": "2024-04-09T20:49:27",
"upload_time_iso_8601": "2024-04-09T20:49:27.381964Z",
"url": "https://files.pythonhosted.org/packages/e1/dc/34dc19f74d773911f3ec96bb1a4a78d0e367e18f83f82a13a42b64128caf/seesus-1.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6758aa2476a77cebd1609ee309a1bf106b825e24ac3cde91f6ef497f2ecad766",
"md5": "1bb55e3420e6e5abadc35d52c6eb0729",
"sha256": "e272618ffa4e5d491b78de30bd88363f50c0814f840840d62202bf22ba80336f"
},
"downloads": -1,
"filename": "seesus-1.2.1.tar.gz",
"has_sig": false,
"md5_digest": "1bb55e3420e6e5abadc35d52c6eb0729",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 519238,
"upload_time": "2024-04-09T20:49:29",
"upload_time_iso_8601": "2024-04-09T20:49:29.104280Z",
"url": "https://files.pythonhosted.org/packages/67/58/aa2476a77cebd1609ee309a1bf106b825e24ac3cde91f6ef497f2ecad766/seesus-1.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-09 20:49:29",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "seesus"
}