# aiu-fms-testing-utils
## Setup your environment
In this directory, check out the Foundation Model Stack (FMS) and the FMS Model Optimizer:
```shell
git clone https://github.com/foundation-model-stack/foundation-model-stack.git
git clone https://github.com/foundation-model-stack/fms-model-optimizer.git
```
Install FMS, FMS Model Optimizer, and aiu-fms-testing-utils:
```shell
cd foundation-model-stack
pip install -e .
cd ..
cd fms-model-optimizer
pip install -e .
cd ..
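# Install aiu-fms-testing-utils (this repository, from its top-level directory)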
pip install -e .
```
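To double-check that all three editable installs are visible to `pip`, a quick query (a sketch; the grep pattern is an assumption and the package names in your environment may differ) is:
```shell
# Optional sanity check: list the editable installs (pattern is an assumption)
pip list | grep -i -E "fms|aiu"
```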
### Running in OpenShift
Use the `pod.yaml` file to get started with your OpenShift allocation:
* Modify the `ibm.com/aiu_pf_tier0` values to indicate the number of AIUs that you want to use
* Modify the `namespace` to match your namespace/project (i.e., the output of `oc project`; see the command below)
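For example, print the current project name to use for the `namespace` field:
```shell
# Print the current OpenShift project; use this value for `namespace` in pod.yaml
oc project -q
```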
Start the pod:
```shell
oc apply -f pod.yaml
```
Copy this repository into the pod (includes scripts and the FMS stack):
```shell
oc cp ${PWD} my-workspace:/tmp/
```
Exec into the pod:
```shell
oc rsh my-workspace bash -l
```
When you are finished, make sure to delete your pod:
```shell
oc delete -f pod.yaml
```
### Setup the environment in the container
Verify that AIU discovery has happened by looking for output like the following when you exec into the pod:
```console
---- IBM AIU Device Discovery...
---- IBM AIU Environment Setup... (Generate config and environment)
---- IBM AIU Devices Found: 2
------------------------
[1000760000@my-workspace ~]$ echo $AIU_WORLD_SIZE
2
```
Inside the container, set up the envars to use the FMS:
```shell
export HOME=/tmp
cd ${HOME}/aiu-fms-testing-utils/foundation-model-stack/
# Install the FMS stack
pip install -e .
```
Run on the AIU instead of the default senulator (simulator):
```shell
export FLEX_COMPUTE=SENTIENT
export FLEX_DEVICE=VFIO
```
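To switch back to the default senulator, one option is to remove the overrides (an assumption: this relies on the runtime falling back to the senulator when these envars are unset):
```shell
# Assumption: the runtime falls back to the default senulator when these overrides are unset
unset FLEX_COMPUTE FLEX_DEVICE
```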
Optional envars to suppress debugging output:
```shell
export DTLOG_LEVEL=error
export TORCH_SENDNN_LOG=CRITICAL
export DT_DEEPRT_VERBOSE=-1
```
## Example runs
Tensor parallel execution is only supported on the AIU through the [Foundation Model Stack](https://github.com/foundation-model-stack/foundation-model-stack).
The `--nproc-per-node` command line option controls the number of AIUs to use (number of parallel processes).
### Small Toy
The `small-toy.py` script is a slimmed-down version of the Big Toy model. Its purpose is to demonstrate how to run a tensor parallel model with the FMS on AIU hardware.
```bash
cd ${HOME}/aiu-fms-testing-utils/scripts
# 1 AIU (sequential)
# Inductor (CPU) backend (default)
torchrun --nproc-per-node 1 ./small-toy.py
# AIU backend
torchrun --nproc-per-node 1 ./small-toy.py --backend aiu
# 2 AIUs (tensor parallel)
# Inductor (CPU) backend (default)
torchrun --nproc-per-node 2 ./small-toy.py
# AIU backend
torchrun --nproc-per-node 2 ./small-toy.py --backend aiu
```
Example Output
```console
shell$ torchrun --nproc-per-node 4 ./small-toy.py --backend aiu
------------------------------------------------------------
0 / 4 : Python Version : 3.11.7
0 / 4 : PyTorch Version : 2.2.2+cpu
0 / 4 : Dynamo Backend : aiu -> sendnn
0 / 4 : PCI Addr. for Rank 0 : 0000:bd:00.0
0 / 4 : PCI Addr. for Rank 1 : 0000:b6:00.0
0 / 4 : PCI Addr. for Rank 2 : 0000:b9:00.0
0 / 4 : PCI Addr. for Rank 3 : 0000:b5:00.0
------------------------------------------------------------
0 / 4 : Creating the model...
0 / 4 : Compiling the model...
0 / 4 : Running model: First Time...
0 / 4 : Running model: Second Time...
0 / 4 : Done
```
### Roberta
The `roberta.py` script is a simple version of the RoBERTa model. Its purpose is to demonstrate how to run a tensor parallel model with the FMS on AIU hardware.
**Note**: We need to disable the tensor parallel `Embedding` conversion to avoid a `torch.distributed` interface that `gloo` does not support, namely `torch.ops._c10d_functional.all_gather_into_tensor`. The `roberta.py` script sets the following envar to avoid the problematic conversion. This workaround will be removed in a future PyTorch release.
```shell
export DISTRIBUTED_STRATEGY_IGNORE_MODULES=WordEmbedding,Embedding
```
```bash
cd ${HOME}/aiu-fms-testing-utils/scripts
# 1 AIU (sequential)
# Inductor (CPU) backend (default)
torchrun --nproc-per-node 1 ./roberta.py
# AIU backend
torchrun --nproc-per-node 1 ./roberta.py --backend aiu
# 2 AIUs (tensor parallel)
# Inductor (CPU) backend (default)
torchrun --nproc-per-node 2 ./roberta.py
# AIU backend
torchrun --nproc-per-node 2 ./roberta.py --backend aiu
```
Example Output
```console
shell$ torchrun --nproc-per-node 2 ./roberta.py --backend aiu
------------------------------------------------------------
0 / 2 : Python Version : 3.11.7
0 / 2 : PyTorch Version : 2.2.2+cpu
0 / 2 : Dynamo Backend : aiu -> sendnn
0 / 2 : PCI Addr. for Rank 0 : 0000:bd:00.0
0 / 2 : PCI Addr. for Rank 1 : 0000:b6:00.0
------------------------------------------------------------
0 / 2 : Creating the model...
0 / 2 : Compiling the model...
0 / 2 : Running model: First Time...
0 / 2 : Answer: (0.11509) Miss Piggy is a pig.
0 / 2 : Running model: Second Time...
0 / 2 : Answer: (0.11509) Miss Piggy is a pig.
0 / 2 : Done
```
### LLaMA/Granite
```bash
export DT_OPT=varsub=1,lxopt=1,opfusion=1,arithfold=1,dataopt=1,patchinit=1,patchprog=1,autopilot=1,weipreload=0,kvcacheopt=1,progshareopt=1
# run 194m on AIU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama3.194m --tokenizer=/home/senuser/llama3.194m --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --compile --default_dtype=fp16 --compile_dynamic
# run 194m on CPU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama3.194m --tokenizer=/home/senuser/llama3.194m --unfuse_weights --min_pad_length 64 --device_type=cpu --max_new_tokens=5 --default_dtype=fp32
# run 7b on AIU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama2.7b --tokenizer=/home/senuser/llama2.7b --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --compile --default_dtype=fp16 --compile_dynamic
# run 7b on CPU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama2.7b --tokenizer=/home/senuser/llama2.7b --unfuse_weights --min_pad_length 64 --device_type=cpu --max_new_tokens=5 --default_dtype=fp32
# run gpt_bigcode (granite) 3b on AIU
python3 inference.py --architecture=gpt_bigcode --variant=ibm.3b --model_path=/home/senuser/gpt_bigcode.granite.3b/*00002.bin --model_source=hf --tokenizer=/home/senuser/gpt_bigcode.granite.3b --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --prompt_type=code --compile --default_dtype=fp16 --compile_dynamic
# run gpt_bigcode (granite) 3b on CPU
python3 inference.py --architecture=gpt_bigcode --variant=ibm.3b --model_path=/home/senuser/gpt_bigcode.granite.3b/*00002.bin --model_source=hf --tokenizer=/home/senuser/gpt_bigcode.granite.3b --unfuse_weights --min_pad_length 64 --device_type=cpu --max_new_tokens=5 --prompt_type=code --default_dtype=fp32
```
To try mini-batch, use `--batch_input`.
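For example, appended to the 194m AIU run above (a sketch, assuming `--batch_input` is a boolean flag):
```bash
# 194m on AIU with mini-batch input (sketch; --batch_input assumed to be a boolean flag)
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama3.194m --tokenizer=/home/senuser/llama3.194m --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --compile --default_dtype=fp16 --compile_dynamic --batch_input
```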
For the validation script, here are a few examples:
```bash
export DT_OPT=varsub=1,lxopt=1,opfusion=1,arithfold=1,dataopt=1,patchinit=1,patchprog=1,autopilot=1,weipreload=0,kvcacheopt=1,progshareopt=1
# Run a llama 194m model, grab the example inputs in the script, generate validation tokens on cpu, validate token equivalency:
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --compile_dynamic
# Run a llama 194m model, grab the example inputs in a folder, generate validation tokens on cpu, validate token equivalency:
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --prompt_path=/home/devel/aiu-fms-testing-utils/prompts/test/*.txt --compile_dynamic
# Run a llama 194m model, grab the example inputs in a folder, grab validation text from a folder, validate token equivalency (will only validate up to max(max_new_tokens, tokens_in_validation_file)):
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --prompt_path=/home/devel/aiu-fms-testing-utils/prompts/test/*.txt --validation_files_path=/home/devel/aiu-fms-testing-utils/prompts/validation/*.txt --compile_dynamic
# Validate a reduced size version of llama 8b
python3 scripts/validation.py --architecture=hf_configured --model_path=/home/devel/models/llama-8b --tokenizer=/home/devel/models/llama-8b --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --extra_get_model_kwargs nlayers=3 --compile_dynamic
```
To run a logits-based validation, pass `--validation_level=1` to the validation script. This checks that the logits output matches at every step of the model, using cross-entropy loss.
You can control the acceptable threshold with `--logits_loss_threshold`.
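For example, extending the first validation command above (a sketch; the threshold value is an arbitrary placeholder, not a recommendation):
```bash
# Logits-based validation of the llama 194m model; 2.5 is an illustrative threshold only
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --compile_dynamic --validation_level=1 --logits_loss_threshold=2.5
```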
## Common Errors
### Pod connection error
Errors like the following often indicate that the pod has not started or is still in the process of starting.
```console
error: unable to upgrade connection: container not found ("my-pod")
```
Use `oc get pods` to check on the status. `ContainerCreating` indicates that the pod is being created. `Running` indicates that it is ready to use.
If there is an error, use `oc describe pod/my-workspace` to see a full diagnostic view. The `Events` list at the bottom will often tell you what the problem is.
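For example:
```shell
# Check pod status; then inspect the Events list if the pod is not Running
oc get pods
oc describe pod/my-workspace
```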
### torchrun generic error
Below is the generic `torchrun` failure trace. It is not helpful for locating the problem in your program; instead, look for the actual error message a little higher in the output.
```console
[2024-09-16 16:10:15,705] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1479484) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib64/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib64/python3.9/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/usr/local/lib64/python3.9/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/usr/local/lib64/python3.9/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib64/python3.9/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./roberta.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2024-09-16_16:10:15
  host : ibm-aiu-rdma-jjhursey
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 1479484)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
### Additional warnings
You may see the following additional warnings/notices printed to the console. They are normal and expected at this point in time. The team will work on cleaning these up.
```console
CUDA extension not installed.
using tensor parallel
ignoring module=Embedding when distributing module
[WARNING] Keys from checkpoint (adapted to FMS) not copied into model: {'roberta.embeddings.token_type_embeddings.weight', 'lm_head.bias'}
```