cedarscript-grammar


- Name: cedarscript-grammar
- Version: 0.6.1
- Summary: CEDARScript grammar.js for tree-sitter
- Upload time: 2024-11-30 12:34:58
- Requires Python: >=3.8
- License: Apache-2.0
- Keywords: parser, tree-sitter, ast, cedarscript, code-editing, refactoring, code-analysis, sql-like, ai-assisted-development, python-binding

# CEDARScript

A SQL-like language for efficient code analysis, transformations, and tool use.
Most useful for AI code assistants.

[![PyPI version](https://badge.fury.io/py/cedarscript-grammar.svg)](https://pypi.org/project/cedarscript-grammar/)
[![Python Versions](https://img.shields.io/pypi/pyversions/cedarscript-grammar.svg)](https://pypi.org/project/cedarscript-grammar/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)

## Table of Contents
- [What is CEDARScript?](#what-is-cedarscript)
- [How to use it?](#how-to-use-it)
- [CEDARScript ELI5'ed](#cedarscript-eli5ed)
- [Technical Overview](#technical-overview)
- [Key Features](#key-features)
- [Supported Languages](#supported-languages)
- [How can CEDARScript be used](#how-can-cedarscript-be-used)
   - [Improving LLM <-> codebase interactions](#improving-llm---codebase-interactions)
      - [Examples](#codebase-interaction-examples)
   - [Use as a refactoring language / _diff_ format](#use-as-a-refactoring-language--_diff_-format)
   - [Tool Use](#tool-use)
      - [Run Python scripts to find the correct answer for certain types of problems](#run-python-scripts-to-find-the-correct-answer-for-certain-types-of-problems)
      - [Obtain the current local weather](#obtain-the-current-local-weather)
      - [Get a list of image files in the current working dir](#get-a-list-of-image-files-in-the-current-working-dir)
      - [Take a peek at the user's screen and right-click on the user's clock widget](#take-a-peek-at-the-users-screen-and-right-click-on-the-users-clock-widget)
- [Proposals](#proposals)
- [Related](#related)

## What is CEDARScript?

[CEDARScript](https://bit.ly/cedarscript) is a domain-specific language designed to improve how AI coding assistants interact with codebases and communicate their code modification intentions.

It provides a standardized way to express complex code modification and analysis operations, making it easier for
AI-assisted development tools to understand and execute these tasks.

It also helps with [tool use](#tool-use): it works as a gateway to external tools, so that the LLM can call local shell commands, external HTTP API endpoints, etc.

## How to use it

1. You can easily [install an assistant that supports CEDARScript](https://github.com/CEDARScript/cedarscript-integration-aider/blob/main/README.md#installation).
2. Then ask the AI assistant to make a change in your codebase, such as fixing a bug.

The assistant will write `CEDARScript` commands that will be executed by the CEDARScript runtime editor.

## CEDARScript ELI5'ed
<details>
    <summary>The Magical Librarian analogy</summary>

Imagine a vast _library_ (`your codebase`) with millions of _books_ (`files`) across thousands of _shelves_ (`directories`).
Traditional code editing is like manually searching through each book, line by line, character by character, to find
relevant information or make changes.

**CEDARScript**, on the other hand, is like having a **magical librarian** with superpowers, like:

1. **_TurboKognition_ Boost** (`Code Analysis`):
    - This librarian can act as an _Omniscient Cataloger_ who can instantly tell you where any piece of information is
located across all books.
    - Want to know every place where a specific _protagonist_ (`function`) is mentioned, or where he/she was born?
      Or find all the _chapters_ (`classes`) that discuss a particular _topic_ (`variable usage`)?
      The librarian provides this information immediately, without having to flip through pages (`waste precious tokens`).
2. **The _GanzPunktGenau_ Editing Powers** (`Code Manipulation`):
    - When you want to make changes, instead of specifying exact page and line numbers, you can give high-level instructions.
      For example, _"Add this new paragraph after the first mention of 'dragons' in the fantasy section"_ or
      _"Move the chapter about 'time travel' to come before 'parallel universes' in all science fiction books."_
      The librarian understands these abstract instructions and makes the precise edits across all relevant books, handling
      details like page layout and consistent formatting.

This _magical librarian_ (`CEDARScript`) collaborates with the LLM and allows it to assume the role of an **Architect**
who can work with your vast library of code at a _higher_ level, making both understanding and modifying your codebase
faster and more intuitive. It bridges the gap between the LLM's _**high-level intent**_ and the _nitty-gritty details_
of code structure, allowing the **_architect_** to focus on the '_what_' while it handles the '_how_' of code analysis
and modification.

**Audio overview / Podcasts**

There are a few podcasts discussing CEDARScript you can listen to:
1. [Aider and the CEDARScript Advantage](https://open.spotify.com/episode/44ojEcwqFDujny82kibKK9?si=DTx_vMfxTpaAtjZULdVFMA) (~18 minutes)
2. [AI coding assistants and the Magical Librarian](https://open.spotify.com/show/4JAc8gphNlUspLV0XxjhQB)
3. [CEDARScript's _TurboKognition_ and _GanzPunktGenau_ editing](https://open.spotify.com/episode/79xCOfrvMZJPenLdKJiNZj?si=Mo2ofU_lRYKwxRZoCPJn6Q)
4. [Discussion of an LLM chat held during a benchmark and some command examples](https://podcasters.spotify.com/pod/show/elifarley/episodes/CEDARScript-chat-during-a-benchmark-test--command-examples-e2ptlq4)

</details>

## Technical Overview
`CEDARScript` (_Concise Examination, Development, And Refactoring Script_) is a **SQL**-like language designed to
lower costs and improve the efficiency and accuracy of AI code assistants. It enables offloading low-level code syntax and 
structure concerns, such as indentation and line counting, from the LLMs.
It aims to improve how AI coding assistants interact with codebases and communicate their code modification intentions
by providing a _standardized and concise_ way to express complex code analysis and modification operations, making it easier for
AI-assisted development tools to understand and execute these tasks.

**CEDARScript transforms LLMs from code writers into code _architects_.**

The **Architect** doesn't need to specify every tiny detail - instead of spending expensive tokens writing out
complete code changes, it simply provides high-level blueprints using **CEDARScript** commands like
`UPDATE FILE "main.py" MOVE FUNCTION "execute" INSERT AFTER FUNCTION "plan"`.

This **division of labor** between the architect and CEDARScript is not just _efficient_ - it's _economical_.
The **Architect** (_LLM_) conserves valuable resources (_tokens_) by focusing on strategic decisions rather than
character- or line-level editing tasks.

The CEDARScript runtime then handles all the minute details - precise line numbers, indentation counts, and syntax 
consistency - at zero token cost.
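
To make the division of labor concrete, here is a minimal Python sketch of the kind of bookkeeping a runtime could do for a `MOVE FUNCTION ... INSERT AFTER FUNCTION ...` command. It is not the actual CEDARScript runtime: the helper names `function_span` and `move_function_after` are hypothetical, and it assumes top-level Python functions located with the standard `ast` module.

```python
import ast
from typing import Tuple


def function_span(source: str, name: str) -> Tuple[int, int]:
    """Return the 0-based [start, end) line span of the top-level function `name`."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # Decorators, if any, start before the `def` line.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            return first - 1, node.end_lineno
    raise LookupError(f"function {name!r} not found")


def move_function_after(source: str, name: str, anchor: str) -> str:
    """Move function `name` so that it sits right after function `anchor`."""
    lines = source.splitlines()
    start, end = function_span(source, name)
    block = lines[start:end]
    remaining = lines[:start] + lines[end:]
    # Locate the anchor again, now that line numbers have shifted.
    _, anchor_end = function_span("\n".join(remaining), anchor)
    result = remaining[:anchor_end] + [""] + block + remaining[anchor_end:]
    return "\n".join(result).strip("\n") + "\n"


SOURCE = '''def execute():
    return "run"

def plan():
    return "think"
'''

moved = move_function_after(SOURCE, "execute", "plan")
print(moved)
```

The caller only names the two functions; the line arithmetic stays inside the helper, which is the part the LLM no longer has to spend tokens on.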

Let's get to know the 3 primary functions offered by CEDARScript:

1. **Code Analysis** to quickly get to know a large code base without having to read all contents of all files.
   - The CEDARScript runtime searches through the whole code base and only returns the relevant results,
thus reducing the token traffic between the LLM and the user;
   - This can be used to more quickly understand key aspects of the codebase, search for all or specific _identifiers_ (classes, 
methods, functions or variables) defined across ALL files of the project or in specific ones, etc.
   - Search results can include not only identifier definitions (in whole or only the signature or summary), 
but also call-sites and usages of an identifier;
     - These results can be useful not only when the LLM needs to read them, but also when the LLM wants to show some
parts of the code to the user (_why send a function to the user if the LLM can simply [`SELECT`](grammar.js#L191-L224) it and have the CEDARScript runtime show the contents?_)
2. **Code Manipulation and Refactoring**:
   - The [**CEDARScript runtime**](https://github.com/CEDARScript/cedarscript-editor-python) _bears the brunt of file
editing_ by locating the exact line numbers and characters to change, which indentation levels to apply to each line and
so on, allowing the _CEDARScript commands_ to focus instead on higher levels of abstraction, like 
[identifier](grammar.js#L248-L251) names, [line](grammar.js#L243-L246) markers, relative 
[indentations](grammar.js#L306-L370) and [positions](grammar.js#L241-L300)
(`AFTER`, `BEFORE`, `INTO` a function, its `BODY`, at the `TOP` or `BOTTOM` of it...)
3. **[Tool Use](#tool-use)**: The runtime acts as a gateway through which the LLM can send and receive information.
This opens up many possibilities.
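
As a rough illustration of the Code Analysis idea (returning only signatures instead of whole files), the following Python sketch lists the classes and functions defined in a piece of source code. It is an assumption-laden stand-in, not how the CEDARScript runtime works: the function `list_signatures` and its output format are hypothetical, and it relies on Python's `ast` module rather than Tree-Sitter.

```python
import ast
from typing import List


def list_signatures(source: str, path: str = "<memory>") -> List[str]:
    """Return one short line per class or function definition, in file order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            found.append((node.lineno, f"{path}:{node.lineno}: def {node.name}({args})"))
        elif isinstance(node, ast.ClassDef):
            found.append((node.lineno, f"{path}:{node.lineno}: class {node.name}"))
    # ast.walk is breadth-first, so sort by line number for a stable listing.
    return [text for _, text in sorted(found)]


SAMPLE = '''class BaseConverter:
    def convert(self, number):
        digits = []
        return digits

def helper(x, y):
    return x + y
'''

signatures = list_signatures(SAMPLE, "baseconverter.py")
for line in signatures:
    print(line)
```

Only three short lines come back for the sample, regardless of how long the function bodies are, which is the token-saving effect described above.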

## Key Features:

- **Learning Curve**
  - For _humans_: its **SQL-like syntax** allows for _intuitive_ code querying and manipulation (however, **humans don't
even need to learn it**, as its **primary purpose** is to offer _LLMs_ an easy language with which they can write simple,
concise commands to modify code or analyse it);
  - For _AIs_: some prompt engineering is enough to enable most LLMs (even cheaper ones like **Gemini _Flash_**) to
learn it well. Fine-tuning is also planned, so that even SLMs (Small Language Models) like 
[Microsoft's Phi 3](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/) can
learn CEDARScript. This could allow locally-deployed SLMs to be used as AI code assistants.
- Shows **improved results** in **refactoring benchmarks** when compared to standard diff formats
   - [**Gemini 1.5 _Flash_** _outperformed_ Claude **3.5 Sonnet**](https://github.com/CEDARScript/cedarscript-integration-aider?tab=readme-ov-file#performance-comparison)
     - Pass rate: **76.4%** (beats Sonnet 3.5 at `64.0%`)
     - Well-formed cases: **94.4%** (beats Sonnet 3.5 at `76.4%`)
- **Reduced token usage** via semantic-level code transformations, not character-by-character matching;
    - **Scalable to larger codebases** with minimal token usage;
    - **Project-wide refactorings** can be performed with a single, concise command
    - Avoids wasted time and tokens on failed search/replace operations caused by misplaced spaces, indentations or typos;
- **High-level abstractions** for complex refactoring operations via refactoring languages (currently supports Rope syntax);
- **[Relative indentation](grammar.js#L306-L370)** for easily maintaining proper code structure;
- Allows fetching or modifying targeted parts of code;
- **Locations in code**: Doesn't use line numbers. Instead, offers [more resilient alternatives](grammar.js#L241-L300), like:
    - **[Line](grammar.js#L243-L246)** markers. Ex:
        - `LINE "if name == 'some name':"`
    - **[Identifier](grammar.js#L248-L251)** markers (`VARIABLE`, `FUNCTION`, `CLASS`). Ex:
        - `FUNCTION 'my_function'`
- **Language-agnostic design** for versatile code analysis
- **[Code analysis operations](grammar.js#L192-L219)** return results in XML format for easier parsing and processing by LLM (Large Language Model) systems.
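
The line-marker and relative-indentation features can be pictured with a small Python sketch. It assumes, for illustration only, that a marker must match exactly one line (ignoring surrounding whitespace) and that relative indentation is counted in 4-space levels from the marker line; the real CEDARScript semantics may differ. The function `insert_after_marker` is a hypothetical name.

```python
from typing import List


def insert_after_marker(
    source: str,
    marker: str,
    new_lines: List[str],
    relative_indent: int = 0,
    indent_unit: str = "    ",
) -> str:
    """Insert `new_lines` after the single line matching `marker`.

    The new lines are indented `relative_indent` levels deeper than the marker line.
    """
    if relative_indent < 0:
        raise ValueError("this sketch only supports non-negative relative indentation")
    lines = source.splitlines()
    matches = [i for i, line in enumerate(lines) if line.strip() == marker.strip()]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one line matching the marker, found {len(matches)}")
    index = matches[0]
    marker_line = lines[index]
    base_indent = marker_line[: len(marker_line) - len(marker_line.lstrip())]
    prefix = base_indent + indent_unit * relative_indent
    inserted = [prefix + text if text else text for text in new_lines]
    return "\n".join(lines[: index + 1] + inserted + lines[index + 1 :]) + "\n"


SOURCE = (
    "def greet(name):\n"
    "    if name == 'some name':\n"
    "        return 'hi'\n"
    "    return 'hello'\n"
)

edited = insert_after_marker(
    SOURCE, "if name == 'some name':", ["print('matched')"], relative_indent=1
)
print(edited)
```

Because the location is found by content rather than by line number, the same instruction keeps working after unrelated lines are added above it.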

## Supported Languages

Currently, `CEDARScript` theoretically supports **Python, Kotlin, PHP, Rust, Go, C++, C, Java, JavaScript, Lua, Fortran, Scala and C#**,
but only **Python** has been tested so far.

**COBOL** and **MATLAB**: Initial queries for these languages are ready, but the Tree-Sitter parsers for them still need to be included.

## Projects using the CEDARScript Language

1. [CEDARScript Integration: Aider](https://github.com/CEDARScript/cedarscript-integration-aider) - Provides 
`CEDARScript` [_edit format_](https://aider.chat/docs/llms/editing-format.html) for [Aider](https://aider.chat/)
2. [CEDARScript AST Parser (Python)](https://github.com/CEDARScript/cedarscript-ast-parser-python)
3. [CEDARScript Editor](https://github.com/CEDARScript/cedarscript-editor-python)
4. [CEDARScript Prompt Engineering](https://github.com/CEDARScript/cedarscript-llm-prompt-engineering)
   - Provides prompts that teach `CEDARScript` to LLMs
   - Also includes real conversations held via Aider in which an LLM uses this language to propose code modifications

## How can CEDARScript be used?

### Improving LLM <-> codebase interactions

`CEDARScript` can be used as a way to standardize and improve how AI coding assistants interact with codebases, learn about your code, and communicate their code modification intentions while keeping token usage _low_.
This efficiency allows for more complex operations within token limits.

It provides a concise way to express complex code modification and analysis operations, making it easier for AI-assisted development tools to understand and perform these tasks.

#### Codebase Interaction Examples

Quick example: turn a method into a top-level function, then update its call sites using a `CASE` filter with `REGEX`:

```sql
UPDATE FILE "baseconverter.py"
MOVE FUNCTION "convert"
INSERT BEFORE CLASS "BaseConverter"
  RELATIVE INDENTATION 0;

-- Update the call sites in encode() and decode() methods to use the top-level convert() function
UPDATE CLASS "BaseConverter"
  FROM FILE "baseconverter.py"
REPLACE BODY
WITH CASE -- Filter each line in the class body through this CASE filter
  WHEN REGEX r"self\.convert\((.*?)\)"
  THEN REPLACE r"convert(\1)"
END;
```
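
For readers more familiar with Python than with CEDARScript, the effect of the `CASE` filter above can be approximated as a per-line regex substitution. This is a sketch under assumptions: the function `apply_case_filter` is hypothetical, and it assumes that the first matching `WHEN` rule wins and that non-matching lines pass through unchanged.

```python
import re
from typing import List, Tuple


def apply_case_filter(body: str, rules: List[Tuple[str, str]]) -> str:
    """Pass each line of `body` through (pattern, replacement) rules."""
    filtered = []
    for line in body.splitlines():
        for pattern, replacement in rules:
            if re.search(pattern, line):
                line = re.sub(pattern, replacement, line)
                break  # assumption: the first matching WHEN rule wins
        filtered.append(line)
    return "\n".join(filtered)


BODY = (
    "def encode(self, number):\n"
    "    return self.convert(number, self.decimal_digits)\n"
    "\n"
    "def decode(self, text):\n"
    "    return int(self.convert(text, self.digits))\n"
)

RULES = [(r"self\.convert\((.*?)\)", r"convert(\1)")]

print(apply_case_filter(BODY, RULES))
```

Only the two call sites change; the `def` lines and the blank line are returned as they were.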

Use an ED script to change a function:

```sql
UPDATE FILE "app/main.py" REPLACE FUNCTION "calculate_total" WITH ED '''
-- Add type hints to parameters
1s/calculate_total(base_amount, tax_rate, discount, apply_shipping)/calculate_total(base_amount: float, tax_rate: float, discount: float, apply_shipping: bool) -> float/

-- Add docstring after function definition
1a
    """
    Calculate the total amount including tax, shipping, and discount.

    Args:
        base_amount: Base price of the item
        tax_rate: Tax rate as decimal (e.g., 0.1 for 10%)
        discount: Discount as decimal (e.g., 0.2 for 20%)
        apply_shipping: Whether to add shipping cost

    Returns:
        float: Final calculated amount rounded to 2 decimal places
    """
.

-- Add logging before return
/return/i
    logger.info(f"Calculated total amount: {subtotal:.2f}")
.
''';
```

There are [many more examples](test/corpus) to look at...

### Use as a refactoring language / _diff_ format

One can use `CEDARScript` to _concisely_ and _unambiguously_ represent code modifications at a _higher_ level than a standard `diff` format can.

IDEs can store the local history of files in `CEDARScript` format, and this can also be used for searches.
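
As a rough size comparison, the Python sketch below builds a standard unified diff for a function move with `difflib` and prints its length next to the equivalent one-line command from the Technical Overview. The file contents are made up for illustration, and character counts are only a crude proxy for tokens.

```python
import difflib

BEFORE = '''def execute():
    return "run"

def plan():
    return "think"
'''

AFTER = '''def plan():
    return "think"

def execute():
    return "run"
'''

COMMAND = 'UPDATE FILE "main.py" MOVE FUNCTION "execute" INSERT AFTER FUNCTION "plan"'

# Build a conventional unified diff describing the same change.
diff_text = "\n".join(
    difflib.unified_diff(
        BEFORE.splitlines(),
        AFTER.splitlines(),
        fromfile="a/main.py",
        tofile="b/main.py",
        lineterm="",
    )
)

print(diff_text)
print(f"unified diff: {len(diff_text)} characters; command: {len(COMMAND)} characters")
```

The diff grows with the size of the moved function, while the command stays the same length because it refers to the function by name.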

### Tool Use
If **explicit** configuration is set, the [**CEDARScript runtime**](https://github.com/CEDARScript/cedarscript-editor-python)
can act as a **unified gateway** through which _any_ LLM can call external commands and obtain their output
(a.k.a. **Tool Use** support).

This includes:
1. **Web browsing**
2. **Code Interpreter**
   - Run scripts written in Python, Bash, JavaScript, Lua, etc.
3. **Function Calling**
    - Call local commands (`ls`, `grep`, `find`, `open`)
    - Call external HTTP API services
4. **Computer Use**: See the user's screen and take control of the mouse and keyboard
5. Possibilities are numerous...

The output from the external tool is captured and sent back to the LLM.
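
A minimal Python sketch of such a gateway is shown below. It is not the CEDARScript runtime's implementation: `ALLOWED_LANGUAGES` stands in for the explicit configuration, `call_language` is a hypothetical name, and the script is simply run in a subprocess whose output is captured as text.

```python
import subprocess
import sys
from typing import Dict, Optional

# Hypothetical explicit configuration: only languages listed here may be run.
ALLOWED_LANGUAGES = {"python": [sys.executable, "-c"]}


def call_language(
    language: str,
    content: str,
    env: Optional[Dict[str, str]] = None,
    timeout: float = 10.0,
) -> str:
    """Run `content` with the interpreter configured for `language` and return its output."""
    if language not in ALLOWED_LANGUAGES:
        raise PermissionError(f"language {language!r} is not enabled in the configuration")
    completed = subprocess.run(
        ALLOWED_LANGUAGES[language] + [content],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,
        env=env,  # None means the child inherits the host environment
        timeout=timeout,
    )
    # Both streams are returned so that error messages also reach the LLM.
    return completed.stdout + completed.stderr


output = call_language("python", 'print("Refrigerator".lower().count("r"))')
print(output)
```

Refusing anything that is not explicitly configured keeps the gateway opt-in, in line with the requirement for explicit configuration.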

#### Tool Use Examples

#### Run Python scripts to find the correct answer for certain types of problems

```sql
-- Suppose the LLM has difficulty counting letters...
-- It can delegate the counting to a Python script:
CALL LANGUAGE "python" WITH CONTENT '''
print("Refrigerator".lower().count('r'))
''';
```

```sql
-- Using env var
CALL LANGUAGE "python"
ENV CONTENT '''WORD=Refrigerator'''
WITH CONTENT '''
import os
print(os.environ['WORD'].lower().count('r'))
''';
```

```sql
-- Using env var from the host computer
CALL LANGUAGE "python"
ENV INHERIT ONLY 'WORD'
WITH CONTENT '''
import os
print(os.environ['WORD'].lower().count('r'))
''';
```
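
The two environment options above can be pictured with the following Python sketch. The helper names `parse_env_content` and `inherit_only` are hypothetical, and the parsing rules (one `NAME=value` pair per line, `#` starting a comment) are assumptions made for illustration rather than the documented behavior of the runtime.

```python
import os
from typing import Dict, Iterable, Mapping


def parse_env_content(content: str) -> Dict[str, str]:
    """Turn lines of NAME=value text into a dictionary of environment variables."""
    env = {}
    for raw_line in content.splitlines():
        # Assumption: '#' starts a comment, so values cannot contain '#'.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected NAME=value, got {raw_line!r}")
        env[key.strip()] = value.strip()
    return env


def inherit_only(names: Iterable[str], host_env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy only the named variables from the host environment, skipping missing ones."""
    return {name: host_env[name] for name in names if name in host_env}


explicit = parse_env_content("WORD=Refrigerator")
inherited = inherit_only(["WORD"], {"WORD": "Refrigerator", "SECRET_TOKEN": "hidden"})
print(explicit)
print(inherited)
```

Passing an allow-list of names, rather than the whole host environment, means unrelated variables such as `SECRET_TOKEN` in the example never reach the script.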

#### Obtain the current local weather

```sql
CALL COMMAND
ENV INHERIT ONLY 'LOCATION' -- Get the current location from the host env var
WITH CONTENT r'''
#!/bin/bash
curl -s "wttr.in/$LOCATION?format=%l:+%C+%t,+feels+like+%f,+%h+humidity"
''';
```

#### Get a list of image files in the current working dir

```sql
CALL LANGUAGE "bash"
WITH CONTENT r'''
    find . -type f -name "*.jpg"
''';
```

#### Take a peek at the user's screen and right-click on the user's clock widget

```sql
CALL LANGUAGE "python"
WITH CONTENT r'''
import pyautogui
from datetime import datetime

# Take screenshot and save it
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
screenshot_path = f"screen_{timestamp}.png"
pyautogui.screenshot(screenshot_path)

# Print the path so the LLM can analyze the image
print(f"IMAGE_PATH={screenshot_path}")
''';
```

After the LLM takes a look at the screenshot, it finds the clock and sends a mouse click:

```sql
CALL LANGUAGE "python"
ENV r'''
X=1850  # Coordinates provided by LLM after image analysis
Y=12    # Coordinates provided by LLM after image analysis
'''
WITH CONTENT r'''
import pyautogui
import os

# Get coordinates from environment
x = int(os.environ['X'])
y = int(os.environ['Y'])

# Move and right-click
pyautogui.moveTo(x, y, duration=1.0)
pyautogui.rightClick()
print(f"Right-clicked at ({x}, {y})")
''';
```

### Other Ideas to Explore
- Code review systems for automated, in-depth code assessments
- Automated code documentation and explanation tools
- ...

## Proposals
See [current proposals](proposals/)

## Related

1. [.QL](https://en.wikipedia.org/wiki/.QL) - Object-oriented query language that enables querying Java source code using SQL-like syntax;
2. [JQL (Java Query Language)](https://github.com/fmbenhassine/jql) - Allows querying Java source code with SQL. It's designed for Java code analysis and linting;
3. [Joern](https://github.com/joernio/joern) - While primarily focused on C/C++, Joern is an open-source code analysis platform that uses a custom graph database to store code property graphs. It allows querying code using a Scala-based domain-specific language; 
4. [Codebase Context Suite](https://agentic-insights.github.io/codebase-context-spec/) - A comprehensive tool for managing codebase context, generating prompts, and enhancing development workflows;
5. [CONVENTIONS.md](https://aider.chat/docs/usage/conventions.html)

## See Also
1. [OpenAI Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning/common-use-cases)
2. [llm-context.py](https://github.com/cyberchitta/llm-context.py)
3. [`Gemini 1.5 PRO` improved performance (on par with Sonnet 3.5)](https://github.com/Aider-AI/aider/pull/1897#issue-2563049442)

## Unrelated

1. [Cedar Policy Language](https://www.cedarpolicy.com/) _('CEDARScript' is _not_ a policy language. 'Cedar' and 'CEDARScript' are totally unrelated.)_

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cedarscript-grammar",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "parser, tree-sitter, ast, cedarscript, code-editing, refactoring, code-analysis, sql-like, ai-assisted-development, python-binding",
    "author": null,
    "author_email": "Elifarley <cedarscript@orgecc.com>",
    "download_url": "https://files.pythonhosted.org/packages/7f/19/4044a61b0bccc49f48dee2f9d5f338fe4379b1deb1f89028e027b6093dfe/cedarscript_grammar-0.6.1.tar.gz",
    "platform": null,
    "description": "# CEDARScript\n\nA SQL-like language for efficient code analysis, transformations, and tool use.\nMost useful for AI code assistants.\n\n[![PyPI version](https://badge.fury.io/py/cedarscript-grammar.svg)](https://pypi.org/project/cedarscript-grammar/)\n[![Python Versions](https://img.shields.io/pypi/pyversions/cedarscript-grammar.svg)](https://pypi.org/project/cedarscript-grammar/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n\n## Table of Contents\n- [What is CEDARScript?](#what-is-cedarscript)\n- [How to use it?](#how-to-use-it)\n- [CEDARScript ELI5'ed](#cedarscript-eli5ed)\n- [Technical Overview](#technical-overview)\n- [Key Features](#key-features)\n- [Supported Languages](#supported-languages)\n- [How can CEDARScript be used](#how-can-cedarscript-be-used)\n   - [Improving LLM <-> codebase interactions](#improving-llm---codebase-interactions)\n      - [Examples](#codebase-interaction-examples)\n   - [Use as a refactoring language / _diff_ format](#use-as-a-refactoring-language--_diff_-format)\n   - [Tool Use](#tool-use)\n      - [Run Python scripts to find the correct answer for certain types of problems](#run-python-scripts-to-find-the-correct-answer-for-certain-types-of-problems)\n      - [Obtain the current local weather](#obtaining-the-current-local-weather)\n      - [Get a list of image files in the current working dir](#get-a-list-of-image-files-in-the-current-working-dir)\n      - [Take a peek at the user's screen and right-click on the user's clock widget](#take-a-peek-at-the-users-screen-and-right-click-on-the-users-clock-widget)\n- [Proposals](#proposals)\n- [Related](#related)\n\n## What is CEDARScript?\n\n[CEDARScript](https://bit.ly/cedarscript): a domain-specific language designed to improve how AI coding assistants interact with codebases and 
communicate their code modification intentions.\n\nIt provides a standardized way to express complex code modification and analysis operations, making it easier for\nAI-assisted development tools to understand and execute these tasks.\n\nIt also helps with [tool use](#tool-use): it works as a gateway to external tools, so that the LLM can easily call local shell commands, external HTTP API endpoints, etc\n\n## How to use it\n\n1. You can easily [install an assistant that supports CEDARScript](https://github.com/CEDARScript/cedarscript-integration-aider/blob/main/README.md#installation).\n2. Then, just ask the AI assistant to fix a bug or something in your codebase.\n\nThe assistant will write `CEDARSCript` commands that will be executed by the CEDARScript runtime editor.\n\n## CEDARScript ELI5'ed\n<details>\n    <summary>The Magical Librarian analogy</summary>\n\nImagine a vast _library_ (`your codebase`) with millions of _books_ (`files`) across thousands of _shelves_ (`directories`).\nTraditional code editing is like manually searching through each book, line by line, character by character, to find\nrelevant information or make changes.\n\n**CEDARScript**, on the other hand, is like having a **magical librarian** with superpowers, like:\n\n1. **_TurboKognition_ Boost** (`Code Analysis`):\n    - This librarian can act as an _Omniscient Cataloger_ who can instantly tell you where any piece of information is\nlocated across all books.\n    - Want to know every place where a specific _protagonist_ (`function`) is mentioned Or where he/she was born?\n      Or find all the _chapters_ (`classes`) that discuss a particular _topic_ (`variable usage`)?\n      The librarian provides this information immediately, without having to flip through pages (`waste precious tokens`)\n2. 
**The _GanzPunktGenau_ Editing Powers** (`Code Manipulation`):\n    - When you want to make changes, instead of specifying exact page and line numbers, you can give high-level instructions.\n      For example, _\"Add this new paragraph after the first mention of 'dragons' in the fantasy section\"_ or\n      _\"Move the chapter about 'time travel' to come before 'parallel universes' in all science fiction books.\"_\n      The librarian understands these abstract instructions and makes the precise edits across all relevant books, handling\n      details like page layout and consistent formatting.\n\nThis _magical librarian_ (`CEDARScript`) collaborates with the LLM and allows it to assume the role of an **Architect**\nwho can work with your vast library of code at a _higher_ level, making both understanding and modifying your codebase\nfaster and more intuitive. It bridges the gap between the LLM's _**high-level intent**_ and the _nitty-gritty details_\nof code structure, allowing the **_architect_** to focus on the '_what_' while it handles the '_how_' of code analysis\nand modification.\n\nAudio overview / Podcasts\nThere are a few podcasts discussing CEDARScript you can listen to:\n1. [Aider and the CEDARScript Advantage](https://open.spotify.com/episode/44ojEcwqFDujny82kibKK9?si=DTx_vMfxTpaAtjZULdVFMA) (~18 minutes)\n1. [AI coding assistants and the Magical Librarian](https://open.spotify.com/show/4JAc8gphNlUspLV0XxjhQB)\n2. [CEDARScript's _TurboKognition_ and _GanzPunktGenau_ editing](https://open.spotify.com/episode/79xCOfrvMZJPenLdKJiNZj?si=Mo2ofU_lRYKwxRZoCPJn6Q)\n3. 
[Discussion of an LLM chat held during a benchmark and some command examples](https://podcasters.spotify.com/pod/show/elifarley/episodes/CEDARScript-chat-during-a-benchmark-test--command-examples-e2ptlq4)\n\n</details>\n\n## Technical Overview\n`CEDARScript` (_Concise Examination, Development, And Refactoring Script_) is a **SQL**-like language designed to\nlower costs and improve the efficiency and accuracy of AI code assistants. It enables offloading low-level code syntax and \nstructure concerns, such as indentation and line counting, from the LLMs.\nIt aims to improve how AI coding assistants interact with codebases and communicate their code modification intentions\nby providing a _standardized and concise_ way to express complex code analysis and modification operations, making it easier for\nAI-assisted development tools to understand and execute these tasks.\n\n**CEDARScript transforms LLMs from code writers into code _architects_.**\n\nThe **Architect** doesn't need to specify every tiny detail - instead of spending expensive tokens writing out\ncomplete code changes, it simply provides high-level blueprints using **CEDARScript** commands like\n`UPDATE FILE \"main.py\" MOVE FUNCTION \"execute\" INSERT AFTER FUNCTION \"plan\"`.\n\nThis **division of labor** between the architect and CEDARScript is not just _efficient_ - it's _economical_.\nThe **Architect** (_LLM_) conserves valuable resources (_tokens_) by focusing on strategic decisions rather than\ncharacter- or line-level editing tasks.\n\nThe CEDARScript runtime then handles all the minute details - precise line numbers, indentation counts, and syntax \nconsistency - at zero token cost.\n\nLet's get to know the 3 primary functions offered by CEDARScript:\n\n1. 
**Code Analysis** to quickly get to know a large code base without having to read all contents of all files.\n   - The CEDARScript runtime searches through the whole code base and only returns the relevant results,\nthus reducing the token traffic between the LLM and the user;\n   - This can be used to more quickly understand key aspects of the codebase, search for all or specific _identifiers_ (classes, \nmethods, functions or variables) defined across ALL files of the project or in specific ones, etc.\n   - Search results can include not only identifier definitions (in whole or only the signature or summary), \nbut also call-sites and usages of an identifier;\n     - These results can be useful not only when the LLM needs to read them, but also when the LLM wants to show some\nparts of the code to the user (_why send a function to the user if the LLM can simply [`SELECT`](grammar.js#L191-L224) it and have the CEDARScript runtime show the contents?_)\n2. **Code Manipulation and Refactoring**:\n   - The [**CEDARScript runtime**](https://github.com/CEDARScript/cedarscript-editor-python) _bears the brunt of file\nediting_ by locating the exact line numbers and characters to change, which indentation levels to apply to each line and\nso on, allowing the _CEDARScript commands_ to focus instead on higher levels of abstraction, like \n[identifier](grammar.js#L248-L251) names, [line](grammar.js#L243-L246) markers, relative \n[indentations](grammar.js#L306-L370) and [positions](grammar.js#L241-L300)\n(`AFTER`, `BEFORE`, `INTO` a function, its `BODY`, at the `TOP` or `BOTTOM` of it...)\n3. 
**[Tool Use](#tool-use)**: The runtime acts as a gateway through which the LLM can send and receive information.\nThis opens up many possibilities.\n\n## Key Features:\n\n- **Learning Curve**\n  - For _humans_: its **SQL-like syntax** allows for _intuitive_ code querying and manipulation (however, **humans don't\neven need to learn it**, as its **primary purpose** is to offer _LLMs_ an easy language with which they can write simple,\nconcise commands to modify code or analyse it);\n  - For _AIs_: some prompt engineering is enough to enable most LLMs (even cheaper ones like **Gemini _Flash_**) to\nlearn it well. Other forms of fine-tuning are planned, so that even SLMs (Small Language Models) like \n[Microsoft's Phi 3](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/) could\nbe able to learn CEDARScript. This has the potential to unlock locally-deployed SLMs to be used as AI code assistants.\n- Shows **improved results** in **refactoring benchmarks** when compared to standard diff formats\n   - [**Gemini 1.5 _Flash_** _outperformed_ Claude **3.5 Sonnet**](https://github.com/CEDARScript/cedarscript-integration-aider?tab=readme-ov-file#performance-comparison)\n     - Pass rate: **76.4%** (beats Sonnet 3.5 at `64.0%`)\n     - Well-formed cases: **94.4%** (beats Sonnet 3.5 at `76.4%`)\n- **Reduced token usage** via semantic-level code transformations, not character-by-character matching;\n    - **Scalable to larger codebases** with minimal token usage;\n    - **Project-wide refactorings** can be performed with a single, concise command\n    - Avoids wasted time and tokens on failed search/replace operations caused by misplaced spaces, indentations or typos;\n- **High-level abstractions** for complex refactoring operations via refactoring languages (currently supports Rope syntax);\n- **[Relative indentation](grammar.js#L306-L370)** for easily maintaining proper code structure;\n- Allows fetching or modifying targeted parts of 
code;\n- **Locations in code**: Doesn't use line numbers. Instead, offers [more resilient alternatives](grammar.js#L241-L300), like:\n    - **[Line](grammar.js#L243-L246)** markers. Ex:\n        - `LINE \"if name == 'some name':\"`\n    - **[Identifier](grammar.js#L248-L251)** markers (`VARIABLE`, `FUNCTION`, `CLASS`). Ex:\n        - `FUNCTION 'my_function'`\n- **Language-agnostic design** for versatile code analysis\n- **[Code analysis operations](grammar.js#L192-L219)** return results in XML format for easier parsing and processing by LLM (Large Language Model) systems.\n\n## Supported Languages\n\nCurrently, `CEDARScript` theoretically supports **Python, Kotlin, PHP, Rust, Go, C++, C, Java, Javascript, Lua, FORTRAN, Scala and C#**,\nbut only **Python** has been tested so far.\n\n**Cobol** and **MatLab**: Initial queries for these languages are ready, but the Tree-Sitter parsers for them still need to be included.\n\n## Projects using the CEDARScript Language\n\n1. [CEDARScript Integration: Aider](https://github.com/CEDARScript/cedarscript-integration-aider) - Provides \n`CEDARScript` [_edit format_](https://aider.chat/docs/llms/editing-format.html) for [Aider](https://aider.chat/)\n2. [CEDARScript AST Parser (Python)](https://github.com/CEDARScript/cedarscript-ast-parser-python)\n3. [CEDARScript Editor](https://github.com/CEDARScript/cedarscript-editor-python)\n4. 
[CEDARScript Prompt Engineering](https://github.com/CEDARScript/cedarscript-llm-prompt-engineering)\n   - Provides prompts that teach `CEDARScript` to LLMs\n   - Also includes real conversations held via Aider in which an LLM uses this language to propose code modifications\n\n## How can CEDARScript be used?\n\n### Improving LLM <-> codebase interactions\n\n`CEDARScript` can standardize and improve how AI coding assistants interact with codebases, learn about your code, and communicate their code modification intentions while keeping token usage _low_.\nThis efficiency allows for more complex operations within token limits.\n\nIt provides a concise way to express complex code modification and analysis operations, making it easier for AI-assisted development tools to understand and perform these tasks.\n\n#### Codebase Interaction Examples\n\nQuick example: turn a method into a top-level function, using a `CASE` filter with `REGEX`:\n\n```sql\nUPDATE FILE \"baseconverter.py\"\nMOVE FUNCTION \"convert\"\nINSERT BEFORE CLASS \"BaseConverter\"\n  RELATIVE INDENTATION 0;\n\n-- Update the call sites in encode() and decode() methods to use the top-level convert() function\nUPDATE CLASS \"BaseConverter\"\n  FROM FILE \"baseconverter.py\"\nREPLACE BODY\nWITH CASE -- Filter each line in the class body through this CASE filter\n  WHEN REGEX r\"self\\.convert\\((.*?)\\)\"\n  THEN REPLACE r\"convert(\\1)\"\nEND;\n```\n\nUse an ED script to change a function:\n\n```sql\nUPDATE FILE \"app/main.py\" REPLACE FUNCTION \"calculate_total\" WITH ED '''\n-- Add type hints to parameters\n1s/calculate_total(base_amount, tax_rate, discount, apply_shipping)/calculate_total(base_amount: float, tax_rate: float, discount: float, apply_shipping: bool) -> float/\n\n-- Add docstring after function definition\n1a\n    \"\"\"\n    Calculate the total amount including tax, shipping, and discount.\n\n    Args:\n        base_amount: Base price of the item\n        tax_rate: Tax 
rate as decimal (e.g., 0.1 for 10%)\n        discount: Discount as decimal (e.g., 0.2 for 20%)\n        apply_shipping: Whether to add shipping cost\n\n    Returns:\n        float: Final calculated amount rounded to 2 decimal places\n    \"\"\"\n.\n\n-- Add logging before return\n/return/i\n    logger.info(f\"Calculated total amount: {subtotal:.2f}\")\n.\n''';\n```\n\nThere are [many more examples](test/corpus) in the test corpus.\n\n### Use as a refactoring language / _diff_ format\n\nOne can use `CEDARScript` to _concisely_ and _unambiguously_ represent code modifications at a _higher_ level than a standard `diff` format can.\n\nIDEs can store the local history of files in `CEDARScript` format, which can also be searched.\n\n### Tool Use\nIf **explicit** configuration is set, the [**CEDARScript runtime**](https://github.com/CEDARScript/cedarscript-editor-python)\ncan act as a **unified gateway** through which _any_ LLM can call external commands and obtain their output\n(a.k.a. **Tool Use** support).\n\nThis includes:\n1. **Web browsing**\n2. **Code Interpreter**\n   - Run scripts written in Python, Bash, JavaScript, Lua, etc.\n3. **Function Calling**\n    - Call local commands (`ls`, `grep`, `find`, `open`)\n    - Call external HTTP API services\n4. **Computer Use**: See the user's screen and take control of the mouse and keyboard\n5. 
Possibilities are numerous...\n\nThe output from the external tool is captured and sent back to the LLM.\n\n#### Tool Use Examples\n\n#### Run Python scripts to find the correct answer for certain types of problems\n\n```sql\n-- Suppose the LLM has difficulty counting letters...\n-- It can delegate the counting to a Python script:\nCALL LANGUAGE \"python\" WITH CONTENT '''\nprint(\"Refrigerator\".lower().count('r'))\n''';\n```\n\n```sql\n-- Same count, with the word passed in through an env var\nCALL LANGUAGE \"python\"\nENV CONTENT '''WORD=Refrigerator'''\nWITH CONTENT '''\nimport os\nprint(os.environ['WORD'].lower().count('r'))\n''';\n```\n\n```sql\n-- Same count, with the env var inherited from the host computer\nCALL LANGUAGE \"python\"\nENV INHERIT ONLY 'WORD'\nWITH CONTENT '''\nimport os\nprint(os.environ['WORD'].lower().count('r'))\n''';\n```\n\n#### Obtain the current local weather\n\n```sql\nCALL COMMAND\nENV INHERIT ONLY 'LOCATION' -- Get the current location from the host env var\nWITH CONTENT r'''\n#!/bin/bash\ncurl -s \"wttr.in/$LOCATION?format=%l:+%C+%t,+feels+like+%f,+%h+humidity\"\n''';\n```\n\n#### Get a list of image files in the current working dir\n\n```sql\nCALL LANGUAGE \"bash\"\nWITH CONTENT r'''\n    find . 
-type f -name \"*.jpg\"\n''';\n```\n\n#### Take a peek at the user's screen and right-click on the user's clock widget\n\n```sql\nCALL LANGUAGE \"python\"\nWITH CONTENT r'''\nimport pyautogui\nfrom datetime import datetime\n\n# Take screenshot and save it\ntimestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\nscreenshot_path = f\"screen_{timestamp}.png\"\npyautogui.screenshot(screenshot_path)\n\n# Print the path so the LLM can analyze the image\nprint(f\"IMAGE_PATH={screenshot_path}\")\n''';\n```\n\nAfter the LLM takes a look at the screenshot, it finds the clock and sends a right-click:\n\n```sql\nCALL LANGUAGE \"python\"\nENV r'''\nX=1850  # Coordinates provided by LLM after image analysis\nY=12    # Coordinates provided by LLM after image analysis\n'''\nWITH CONTENT r'''\nimport pyautogui\nimport os\n\n# Get coordinates from environment\nx = int(os.environ['X'])\ny = int(os.environ['Y'])\n\n# Move the pointer, then right-click at its position\npyautogui.moveTo(x, y, duration=1.0)\npyautogui.rightClick()\nprint(f\"Right-clicked at ({x}, {y})\")\n''';\n```\n\n### Other Ideas to Explore\n- Code review systems for automated, in-depth code assessments\n- Automated code documentation and explanation tools\n- ...\n\n# Proposals\nSee [current proposals](proposals/)\n\n# Related\n\n1. [.QL](https://en.wikipedia.org/wiki/.QL) - Object-oriented query language that enables querying Java source code using SQL-like syntax;\n2. [JQL (Java Query Language)](https://github.com/fmbenhassine/jql) - Allows querying Java source code with SQL. It's designed for Java code analysis and linting;\n3. [Joern](https://github.com/joernio/joern) - While primarily focused on C/C++, Joern is an open-source code analysis platform that uses a custom graph database to store code property graphs. It allows querying code using a Scala-based domain-specific language; \n4. 
[Codebase Context Suite](https://agentic-insights.github.io/codebase-context-spec/) - A comprehensive tool for managing codebase context, generating prompts, and enhancing development workflows;\n5. [CONVENTIONS.md](https://aider.chat/docs/usage/conventions.html)\n\n# See Also\n1. [OpenAI Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning/common-use-cases)\n2. [llm-context.py](https://github.com/cyberchitta/llm-context.py)\n3. [`Gemini 1.5 PRO` improved performance (on par with Sonnet 3.5)](https://github.com/Aider-AI/aider/pull/1897#issue-2563049442)\n\n# Unrelated\n\n1. [Cedar Policy Language](https://www.cedarpolicy.com/) _('CEDARScript' is _not_ a policy language. 'Cedar' and 'CEDARScript' are totally unrelated.)_\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "CEDARScript grammar.js for tree-sitter",
    "version": "0.6.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/CEDARScript/cedarscript-grammar/issues",
        "Documentation": "https://github.com/CEDARScript/cedarscript-grammar#readme",
        "Homepage": "https://github.com/CEDARScript/cedarscript-grammar#readme",
        "Repository": "https://github.com/CEDARScript/cedarscript-grammar.git"
    },
    "split_keywords": [
        "parser",
        " tree-sitter",
        " ast",
        " cedarscript",
        " code-editing",
        " refactoring",
        " code-analysis",
        " sql-like",
        " ai-assisted-development",
        " python-binding"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1e03f62fae68a8e5b91bd7e4d4c3c1724037237249de88c9e1d39219c5d9f584",
                "md5": "91a3c3ea0b75a8c08faa1fc21d9007e7",
                "sha256": "906a5c01c1bb670e51cdb93eed226e3101b237b53cfcad01f763787997e4846a"
            },
            "downloads": -1,
            "filename": "cedarscript_grammar-0.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "91a3c3ea0b75a8c08faa1fc21d9007e7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 118834,
            "upload_time": "2024-11-30T12:34:49",
            "upload_time_iso_8601": "2024-11-30T12:34:49.851513Z",
            "url": "https://files.pythonhosted.org/packages/1e/03/f62fae68a8e5b91bd7e4d4c3c1724037237249de88c9e1d39219c5d9f584/cedarscript_grammar-0.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7f194044a61b0bccc49f48dee2f9d5f338fe4379b1deb1f89028e027b6093dfe",
                "md5": "573c15408ac76b78216dcacd9fb58a14",
                "sha256": "87530ad807697cee12d11a7b944c212a44603380745e47a60a04952cbdd4546e"
            },
            "downloads": -1,
            "filename": "cedarscript_grammar-0.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "573c15408ac76b78216dcacd9fb58a14",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 148547,
            "upload_time": "2024-11-30T12:34:58",
            "upload_time_iso_8601": "2024-11-30T12:34:58.037075Z",
            "url": "https://files.pythonhosted.org/packages/7f/19/4044a61b0bccc49f48dee2f9d5f338fe4379b1deb1f89028e027b6093dfe/cedarscript_grammar-0.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-30 12:34:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "CEDARScript",
    "github_project": "cedarscript-grammar",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "cedarscript-grammar"
}
        