alphadev


* Name: alphadev
* Version: 0.0.3
* Home page: https://github.com/kyegomez/AlphaDev
* Summary: AlphaDev - Pytorch
* Upload time: 2023-08-29 00:50:54
* Author: Kye Gomez
* Requires Python: >=3.6,<4.0
* License: MIT
* Keywords: artificial intelligence, attention mechanism, transformers
* Requirements: none recorded
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)


# AlphaDev
AlphaDev is an AI model based on the AlphaZero/MuZero reinforcement learning architecture. It optimizes assembly code by searching over programs built from a fixed set of assembly instructions, guided by a cost function that accounts for both correctness and performance.

## Installation

`pip install alphadev`

## Architecture

AlphaDev consists of:

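1. Representation Network $f_{\text{rep}}$: encodes the state $S_t$ into a latent representation $h_t = f_{\text{rep}}(S_t)$.

2. Prediction Network $f_{\text{pred}}$: from a latent state, predicts the expected return (the value) $\hat{v}_t$ and a policy $\hat{\pi}_t$, i.e. $(\hat{v}_t, \hat{\pi}_t) = f_{\text{pred}}(h_t)$.

3. Dynamics Network $f_{\text{dyn}}$: from a latent state $h_t^k$ and an action $a_t^k$, predicts the next latent state $h_t^{k+1}$ and the reward $\hat{r}_t^{k+1}$ of that transition, i.e. $(h_t^{k+1}, \hat{r}_t^{k+1}) = f_{\text{dyn}}(h_t^k, a_t^k)$, with $h_t^0 = h_t$.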
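The following is a minimal sketch of how the three functions fit together, written in plain Python so it runs without PyTorch. The class `ToyMuZeroNets`, the helper `unroll`, and the sizes `LATENT_DIM` and `NUM_ACTIONS` are hypothetical: the weights are fixed random linear maps rather than trained networks, and none of this is the `alphadev` package's API. What it shows is the data flow: $f_{\text{rep}}$ is applied once to the real state, after which $f_{\text{dyn}}$ and $f_{\text{pred}}$ operate purely in latent space.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sizes, for illustration only.
LATENT_DIM = 4   # size of the latent state h
NUM_ACTIONS = 3  # number of candidate instructions (actions)

Matrix = List[List[float]]


def _rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def _softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMuZeroNets:
    """Untrained stand-ins for f_rep, f_pred and f_dyn (fixed random linear maps)."""

    def __init__(self, state_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_rep = _rand_matrix(rng, LATENT_DIM, state_dim)
        self.w_value = _rand_matrix(rng, 1, LATENT_DIM)
        self.w_policy = _rand_matrix(rng, NUM_ACTIONS, LATENT_DIM)
        self.w_dyn = _rand_matrix(rng, LATENT_DIM, LATENT_DIM + NUM_ACTIONS)
        self.w_reward = _rand_matrix(rng, 1, LATENT_DIM + NUM_ACTIONS)

    def f_rep(self, state: Sequence[float]) -> List[float]:
        """h_t = f_rep(S_t)."""
        return [math.tanh(z) for z in _matvec(self.w_rep, state)]

    def f_pred(self, h: Sequence[float]) -> Tuple[float, List[float]]:
        """(v_hat, pi_hat) = f_pred(h)."""
        value = _matvec(self.w_value, h)[0]
        policy = _softmax(_matvec(self.w_policy, h))
        return value, policy

    def f_dyn(self, h: Sequence[float], action: int) -> Tuple[List[float], float]:
        """(h^{k+1}, r_hat^{k+1}) = f_dyn(h^k, a^k), with the action one-hot encoded."""
        x = list(h) + [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
        next_h = [math.tanh(z) for z in _matvec(self.w_dyn, x)]
        reward = _matvec(self.w_reward, x)[0]
        return next_h, reward


def unroll(nets: ToyMuZeroNets, state: Sequence[float], actions: Sequence[int]):
    """Encode the real state once, then imagine a trajectory entirely in latent space."""
    h = nets.f_rep(state)
    rewards, values = [], []
    for a in actions:
        h, r = nets.f_dyn(h, a)
        v, _ = nets.f_pred(h)
        rewards.append(r)
        values.append(v)
    return rewards, values


state = [0.1, 0.2, 0.0, -0.3, 1.0]  # made-up encoded state S_t
nets = ToyMuZeroNets(state_dim=len(state))
h0 = nets.f_rep(state)
value, policy = nets.f_pred(h0)
rewards, values = unroll(nets, state, actions=[0, 2, 1])
```

Because the dynamics network predicts rewards itself, a search can evaluate many candidate instruction sequences without executing any of them; only the root state comes from the real environment.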

## How AlphaDev Works

On reaching a new state, AlphaDev encodes it into a latent representation with the representation network. The dynamics and prediction networks then simulate several trajectories from that latent state, filling out a Monte Carlo tree search (MCTS) tree by sampling state transitions.

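Actions are selected with a rule that balances exploration (trying actions that have rarely been visited) and exploitation (continuing down the subtree of the current best action). In AlphaZero and MuZero this rule is the pUCT formula, which adds to each action's estimated value a bonus that grows with its prior probability and shrinks as the action is visited more often.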
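A minimal sketch of such a selection rule, assuming the AlphaZero form of pUCT, $Q(s,a) + c_{\text{puct}}\,P(s,a)\,\sqrt{N(s)}/(1+N(s,a))$, where $Q$ is the mean value of an action, $P$ its prior from the policy network, and $N$ the visit counts. MuZero uses a variant whose exploration constant grows logarithmically with the parent's visit count. The value `c_puct = 1.25` and the dictionary layout of `children` are illustrative assumptions, not taken from the alphadev package.

```python
import math
from typing import Dict


def puct_score(q: float, prior: float, parent_visits: int, child_visits: int,
               c_puct: float = 1.25) -> float:
    """AlphaZero-style pUCT: value estimate (exploitation) plus a prior-weighted bonus (exploration)."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration


def select_action(children: Dict[int, Dict[str, float]], c_puct: float = 1.25) -> int:
    """Pick the child action with the highest pUCT score.

    Each child maps to {"q": mean value, "prior": policy prior, "visits": visit count};
    unvisited children carry q = 0 in this sketch.
    """
    parent_visits = sum(int(c["visits"]) for c in children.values())
    return max(
        children,
        key=lambda a: puct_score(children[a]["q"], children[a]["prior"],
                                 parent_visits, int(children[a]["visits"]), c_puct),
    )


children = {
    0: {"q": 0.5, "prior": 0.2, "visits": 10},  # good value, already well explored
    1: {"q": 0.3, "prior": 0.7, "visits": 1},   # high prior, barely explored
    2: {"q": 0.0, "prior": 0.1, "visits": 0},   # never visited
}
print(select_action(children))              # 1: the exploration bonus dominates
print(select_action(children, c_puct=0.0))  # 0: pure exploitation picks the highest q
```

Setting `c_puct` to zero removes the exploration term entirely, which makes the trade-off described above easy to see: the same statistics yield a different choice depending on how strongly the prior and visit counts are weighted.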

Finally, the prediction network's policy is trained to match the visit-count distribution produced by MCTS. This distils the search procedure into the policy network, so that later searches assign low prior probability to unpromising actions and spend less effort on them.
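As a sketch of that training target, assuming the AlphaZero/MuZero convention: the root visit counts $N(a)$ are normalised into a distribution $\pi(a) \propto N(a)^{1/T}$ with temperature $T$, and the network's policy $\hat{\pi}$ is trained with a cross-entropy loss against it. The function names and the temperature default are hypothetical; the alphadev package may compute this differently.

```python
import math
from typing import List, Sequence


def visit_count_policy(visit_counts: Sequence[int], temperature: float = 1.0) -> List[float]:
    """Turn root visit counts N(a) into a target policy proportional to N(a)^(1/T).

    Assumes temperature > 0 and at least one non-zero visit count.
    """
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [p / total for p in powered]


def policy_loss(target: Sequence[float], predicted: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy between the search policy (target) and the network policy."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


target = visit_count_policy([30, 10, 0, 0])
print(target)  # [0.75, 0.25, 0.0, 0.0]

uniform = [0.25] * 4
print(policy_loss(target, target) < policy_loss(target, uniform))  # True
```

Actions the search never visited get zero target probability, so minimising this loss pushes the network's prior for them down, which is how the search's judgement is distilled into the policy.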

## Potential Use Cases

Because its architecture is not specific to assembly code, AlphaDev could in principle be adapted to other optimization problems that can be framed as a sequence of discrete decisions with a reward signal. A few examples:

1. **Route Optimization**: For logistics companies, optimizing the routes of their fleet can result in significant cost savings. AlphaDev could be used to learn the optimal routes based on a variety of factors such as traffic, distance, and number of stops.

2. **Job Scheduling**: In computing, job scheduling is a key issue. AlphaDev could be used to learn the optimal schedule that maximizes the usage of computational resources and minimizes job completion time.

3. **Stock Portfolio Optimization**: AlphaDev could be used to learn the optimal mix of stocks to maximize return and minimize risk, given the current market conditions.

4. **Game Playing**: Like its predecessor AlphaZero, AlphaDev could potentially learn strong strategies for a wide variety of games.

5. **Drug Discovery**: AlphaDev could be used to find the optimal chemical structure for a new drug that maximizes efficacy and minimizes side effects.


## Code Overview

* `AssemblyGame`: the Assembly Game RL environment. Its state contains the current program and the state of memory and registers. Taking a step in this environment appends one assembly instruction to the program (see the `step` method). The reward combines a correctness reward and a latency reward, obtained by executing the assembly program over a distribution of inputs. To keep the overall algorithm simple, the assembly runner is not included; assembly execution can be delegated to an external library (e.g. AsmJit).

* `AlphaDevConfig`: the main hyperparameters of the AlphaDev agent, including the configuration of AlphaZero, MCTS, and the underlying networks.

* `play_game`: the logic for running one AlphaDev game, including the MCTS procedure and the storage of the game.

* `RepresentationNet` and `PredictionNet`: implementations of the networks used in the AlphaZero algorithm. A multi-query Transformer is used to represent assembly instructions.
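To make the game formulation concrete, here is a minimal sketch of an assembly game under strong simplifying assumptions: a hypothetical three-register toy instruction set (`mov`, `cmp`, `cmovg`) interpreted in Python instead of an external runner such as AsmJit, a task of sorting the two values held in `r0` and `r1`, and program length as a stand-in for measured latency. `ToyAssemblyGame`, `run_program`, and the reward weights are illustrative names, not the package's `AssemblyGame` API.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple

Instruction = Tuple[str, int, int]  # (opcode, operand_a, operand_b) in a toy ISA
NUM_REGS = 3  # r0 and r1 hold the inputs, r2 is scratch


def run_program(program: List[Instruction], inputs: Tuple[int, int]) -> List[int]:
    """Interpret the toy program; stands in for an external assembly runner."""
    regs = list(inputs) + [0] * (NUM_REGS - len(inputs))
    gt = False  # flag written by "cmp", read by "cmovg"
    for op, a, b in program:
        if op == "mov":
            regs[a] = regs[b]
        elif op == "cmp":
            gt = regs[a] > regs[b]
        elif op == "cmovg":
            if gt:
                regs[a] = regs[b]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


@dataclass
class ToyAssemblyGame:
    """Hypothetical game: append instructions until (r0, r1) ends up sorted."""
    max_length: int = 6
    latency_weight: float = 0.01  # penalty per instruction (proxy for latency)
    program: List[Instruction] = field(default_factory=list)
    test_inputs: List[Tuple[int, int]] = field(
        default_factory=lambda: list(product(range(3), repeat=2))
    )

    def correctness(self) -> float:
        """Fraction of test inputs for which (r0, r1) is the sorted pair."""
        ok = sum(run_program(self.program, x)[:2] == sorted(x) for x in self.test_inputs)
        return ok / len(self.test_inputs)

    def step(self, instruction: Instruction) -> Tuple[float, bool]:
        """Append one instruction; return (reward, done)."""
        self.program.append(instruction)
        correctness = self.correctness()
        reward = correctness - self.latency_weight * len(self.program)
        done = correctness == 1.0 or len(self.program) >= self.max_length
        return reward, done


game = ToyAssemblyGame()
# A hand-written 4-instruction sort of two values:
for instr in [("mov", 2, 0), ("cmp", 0, 1), ("cmovg", 0, 1), ("cmovg", 1, 2)]:
    reward, done = game.step(instr)
print(round(reward, 2), done)  # 0.96 True
```

In a real setup the agent, not a hand-written list, would choose each instruction, and the latency term would come from measured execution rather than instruction count.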

## Future Work

Future adaptations of AlphaDev could implement different learning algorithms or optimization techniques for specific domains or problem areas.

## Contributing

Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.

## Acknowledgments

We appreciate the efforts of the researchers and developers who contributed to the development of the AlphaZero/MuZero architectures on which AlphaDev is based.

## Roadmap

* Add JAX-based multi-query attention: `MultiQueryAttentionBlock`

* Add `ResBlockV2`

* Add utils, `terminal`, `is_correct`, and `legal_actions`
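Multi-query attention, as named in the roadmap's `MultiQueryAttentionBlock` and in the multi-query Transformer used to represent instructions, lets several query heads share a single key projection and a single value projection instead of one per head, which shrinks the key/value tensors. The sketch below illustrates that idea in plain Python for one unbatched sequence; it is not the planned JAX block, and the function name and weight layout are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the max for numerical stability
        exps = [math.exp(z - mx) for z in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def multi_query_attention(x: Matrix, w_q_heads: List[Matrix], w_k: Matrix, w_v: Matrix) -> Matrix:
    """x: [seq, d_model]; each w_q in w_q_heads, w_k and w_v: [d_model, d_head].

    Every head has its own query projection, but all heads attend over the
    same keys K = x @ w_k and values V = x @ w_v.
    """
    k = matmul(x, w_k)  # computed once, shared by all heads
    v = matmul(x, w_v)
    d_head = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]
    heads = []
    for w_q in w_q_heads:
        q = matmul(x, w_q)
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
        heads.append(matmul(softmax_rows(scores), v))
    # Concatenate head outputs along the feature axis: [seq, num_heads * d_head]
    return [sum((h[i] for h in heads), []) for i in range(len(x))]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                         # seq = 3, d_model = 2
w_q_heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [-0.5, 0.5]]]  # two query heads
w_k = [[1.0, 0.0], [0.0, 1.0]]                                    # one shared key projection
w_v = [[1.0, 2.0], [3.0, 4.0]]                                    # one shared value projection
out = multi_query_attention(x, w_q_heads, w_k, w_v)
print(len(out), len(out[0]))  # 3 4
```

With standard multi-head attention each head would also carry its own `w_k` and `w_v`; here only the query projections are per-head.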

            
