Introduction to TAO and TAO Toolkit
Nvidia Train Adapt Optimize (TAO) Toolkit is a Python-based AI toolkit for developing purpose-built AI models and customizing them with users' own data.
The overall architecture consists of two parts: Nvidia TAO (the frontend) and Nvidia TAO Toolkit (the backend).
TAO
TAO is a GUI-based AI-model-adaptation framework that simplifies and accelerates the creation of enterprise AI applications and services. Official Site: Nvidia TAO
TAO Toolkit
TAO Toolkit is a CLI- and Jupyter-notebook-based solution that simplifies transfer learning and fine-tuning of computer vision and conversational AI models by abstracting away the AI/DL framework complexity. Official Site: Nvidia TAO Toolkit
Action Recognition with TAO Toolkit
We will use a Jupyter notebook to run the code shown in this tutorial.
Create a folder named workspace; we will use this as the root of our project.
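For example, assuming the same absolute path used in the environment variables below (adjust it for your machine), the workspace and project directory can be created directly from the notebook:

# Create the project root used throughout this tutorial (adjust the path to your setup).
!mkdir -p /home/user/AJC/workspace/action_recognition_net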
# Absolute path to the directory that stores the training data
%env HOST_DATA_DIR=/home/user/AJC/workspace/action_recognition_net/data
# Absolute path to the directory that stores the training specs
%env HOST_SPECS_DIR=/home/user/AJC/workspace/action_recognition_net/specs
# Absolute path to the directory that stores the training output
%env HOST_RESULTS_DIR=/home/user/AJC/workspace/action_recognition_net/results
# Encryption key of the Nvidia model. No need to change this.
%env KEY=nvidia_tao
Let's create the subdirectories:
!mkdir -p $HOST_DATA_DIR
!mkdir -p $HOST_SPECS_DIR
!mkdir -p $HOST_RESULTS_DIR
Create a ~/.tao_mounts.json file, which mounts the specified directories into the TAO Docker container:
import json
import os

mounts_file = os.path.expanduser("~/.tao_mounts.json")

tlt_configs = {
    "Mounts": [
        {
            "source": os.environ["HOST_DATA_DIR"],
            "destination": "/data"
        },
        {
            "source": os.environ["HOST_SPECS_DIR"],
            "destination": "/specs"
        },
        {
            "source": os.environ["HOST_RESULTS_DIR"],
            "destination": "/results"
        },
        {
            "source": os.path.expanduser("~/.cache"),
            "destination": "/root/.cache"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tlt_configs, mfile, indent=4)
!cat ~/.tao_mounts.json
Installing Prerequisites
The basic requirements for this tutorial are as follows (a quick way to check some of them is shown after the list):
- python >=3.6.9 < 3.8.x
- docker-ce > 19.03.5
- docker-API 1.40
- nvidia-container-toolkit > 1.3.0-1
- nvidia-container-runtime > 3.4.0-1
- nvidia-docker2 > 2.5.0-1
- nvidia-driver > 455+
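A rough sketch for checking some of these versions on the host; the exact package query commands vary by distribution:

!python3 --version
!docker --version      # Docker CE version
!nvidia-smi            # The installed driver version is shown in the header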
Once you have installed the prerequisites, log in to the Docker registry nvcr.io:
docker login nvcr.io
You will be asked to enter a username and password. The username is $oauthtoken and the password is the API key generated from ngc.nvidia.com. Please follow the instructions in the NGC setup guide to generate your own API key.
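If you prefer a non-interactive login (for example from a script), a variant like the following works, assuming you have exported your API key in an NGC_API_KEY environment variable (the variable name is just an example; keep the key itself out of notebooks and version control):

# The username is the literal string $oauthtoken; the API key is read from stdin.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin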
Now install the other required libraries.
!pip3 install nvidia-pyindex             # Nvidia PIP index
!pip3 install nvidia-tao                 # Nvidia TAO Toolkit launcher
!apt update
!apt-get install unrar                   # Required for extracting the downloaded data
!pip3 install xmltodict opencv-python    # Required for data processing
To verify the setup, run
!tao info
The result will be like this
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021
Preparing the Dataset
We will use the HMDB51 dataset to fine-tune the action recognition model.
!wget -P $HOST_DATA_DIR http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/hmdb51_org.rar  # Download the dataset
!mkdir -p $HOST_DATA_DIR/videos && unrar x $HOST_DATA_DIR/hmdb51_org.rar $HOST_DATA_DIR/videos     # Unrar the downloaded data
!mkdir -p $HOST_DATA_DIR/raw_data                                                                  # Directory to copy the data to
Extract the required classes into $HOST_DATA_DIR/raw_data folder.
!unrar x $HOST_DATA_DIR/videos/fall_floor.rar $HOST_DATA_DIR/raw_data
!unrar x $HOST_DATA_DIR/videos/ride_bike.rar $HOST_DATA_DIR/raw_data
Download the dataset preprocessing scripts
!git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes
Run the preprocessing
!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB_RGB.sh $HOST_DATA_DIR/raw_data $HOST_DATA_DIR/processed_data
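To sanity-check the preprocessing step, you can list the generated folders (the exact layout is produced by the preprocess_HMDB_RGB.sh script, which extracts frames for each video into class sub-folders):

# Each class should now contain one folder of extracted frames per video.
!ls $HOST_DATA_DIR/processed_data
!ls $HOST_DATA_DIR/processed_data/ride_bike | head -3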
Download the official train/test splits from the HMDB51 site
!cd $HOST_DATA_DIR && wget http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar
!cd $HOST_DATA_DIR && mkdir splits
!unrar x $HOST_DATA_DIR/test_train_splits.rar $HOST_DATA_DIR/splits
Split the downloaded dataset based on the train-test split
!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && python3 ./split_dataset.py $HOST_DATA_DIR/processed_data $HOST_DATA_DIR/splits/testTrainMulti_7030_splits $HOST_DATA_DIR/train $HOST_DATA_DIR/test
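A quick check that the split produced the expected layout; both splits should contain one sub-folder per class:

!ls $HOST_DATA_DIR/train $HOST_DATA_DIR/test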
Downloading Pre-Trained Model from Nvidia NGC
Install the NGC CLI on the local machine:
import os
%env CLI=ngccli_cat_linux.zip
!mkdir -p $HOST_RESULTS_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $HOST_RESULTS_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $HOST_RESULTS_DIR/ngccli
!unzip -u "$HOST_RESULTS_DIR/ngccli/$CLI" -d $HOST_RESULTS_DIR/ngccli/
!rm $HOST_RESULTS_DIR/ngccli/*.zip
os.environ["PATH"] = "{}/ngccli:{}".format(os.getenv("HOST_RESULTS_DIR", ""), os.getenv("PATH", ""))
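To confirm the CLI is now on the PATH (a quick check, assuming the installation above succeeded), run:

!ngc --version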
Find the Action Recognition Model on Nvidia NGC
!ngc registry model list nvidia/tao/actionrecognitionnet:*
The output will be like this
+-----------------+----------+--------+------------+-----------+------------------+-----------+-----------------+--------------+
| Version         | Accuracy | Epochs | Batch Size | GPU Model | Memory Footprint | File Size | Status          | Created Date |
+-----------------+----------+--------+------------+-----------+------------------+-----------+-----------------+--------------+
| trainable_v1.0  | 88.0     | 120    | 1          | V100      | 426.2            | 426.16 MB | UPLOAD_COMPLETE | Nov 23, 2021 |
| deployable_v1.0 | 90.0     | 120    | 1          | V100      | 170.3            | 170.33 MB | UPLOAD_COMPLETE | Oct 22, 2021 |
+-----------------+----------+--------+------------+-----------+------------------+-----------+-----------------+--------------+
Download the Pre-trained Model
!mkdir -p $HOST_RESULTS_DIR/pretrained

# Pull the pretrained model from NGC
!ngc registry model download-version "nvidia/tao/actionrecognitionnet:trainable_v1.0" --dest $HOST_RESULTS_DIR/pretrained
Add the following configuration file to $HOST_SPECS_DIR with the filename train_rgb_3d_finetune.yaml:
output_dir: /results/rgb_3d_ptm
encryption_key: nvidia_tao
model_config:
  model_type: rgb
  backbone: resnet18
  rgb_seq_length: 3
  input_type: 3d
  sample_strategy: consecutive
  dropout_ratio: 0.0
train_config:
  optim:
    lr: 0.001
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [5, 15, 20]
    lr_decay: 0.1
  epochs: 20
  checkpoint_interval: 1
dataset_config:
  train_dataset_dir: /data/train
  val_dataset_dir: /data/test
  label_map:
    fall_floor: 0
    ride_bike: 1
  output_shape:
  - 224
  - 224
  batch_size: 32
  workers: 8
  clips_per_video: 5
  augmentation_config:
    train_crop_type: no_crop
    horizontal_flip_prob: 0.5
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: False
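Before moving on, you can quickly confirm the file is well-formed YAML (a minimal sketch; it assumes PyYAML is installed in the notebook environment, e.g. via pip3 install pyyaml):

import os
import yaml  # PyYAML

# Parse the spec to catch indentation mistakes before launching training.
spec_path = os.path.join(os.environ["HOST_SPECS_DIR"], "train_rgb_3d_finetune.yaml")
with open(spec_path) as f:
    spec = yaml.safe_load(f)

# Spot-check a few fields used later in this tutorial.
print(spec["model_config"]["backbone"])     # resnet18
print(spec["train_config"]["epochs"])       # 20
print(spec["dataset_config"]["label_map"])  # {'fall_floor': 0, 'ride_bike': 1}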
Training Parameters
- output_dir: Directory to save the results to
- encryption_key: Encryption key of the original model provided by nvidia
- model_config: configures the model settings
- model_type: specifies the type of model: RGB, optical flow, or both. Possible values: [rgb/of/joint]
- backbone: specifies the backbone network of model. Possible values:[resnet18/34/50/101/152]
- rgb_seq_length: specifies the length of RGB input sequence
- input_type: specifies the input type. Possible values:[2d/3d]
- sample_strategy: specifies how frames are sampled from a video (consecutive in this tutorial)
- dropout_ratio: probability to drop the hidden units
- train_config: configure the training hyperparameters
- optim: configures the optimizer parameters such as learning rate, scheduler, learning-rate steps, etc.
- epochs: number of epochs to train for
- checkpoint_interval: interval (in epochs) at which checkpoints are saved
- dataset_config: configure the dataset
- train_dataset_dir: path to train data directory
- val_dataset_dir: path to test data directory
- label_map: maps each class label to an integer id; the keys must match the class folder names under the training directory (see the snippet after this list)
- output_shape: spatial size (height and width) of the frames fed to the network
- batch_size: specifies the batch size
- workers: number of worker processes used for data loading; usually set to the number of CPU cores
- clips_per_video: number of clips to be sampled from single video
- augmentation_config: configure the augmentations to be used like crop, horizontal flip, normalisation etc.
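As noted for label_map above, the keys must match the class folder names produced by the dataset split. A small helper like this (hypothetical, not part of TAO) prints the mapping in the format used by the spec:

import os

# Class sub-folders under the training split become the label_map keys.
train_dir = os.path.join(os.environ["HOST_DATA_DIR"], "train")
classes = sorted(d for d in os.listdir(train_dir)
                 if os.path.isdir(os.path.join(train_dir, d)))

print("label_map:")
for idx, name in enumerate(classes):
    print(f"  {name}: {idx}")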
Training / Fine-tuning the Model
# The paths below are relative to the inside of the TAO Docker container.
# No change is required unless you modified the ~/.tao_mounts.json file.
%env DATA_DIR=/data
%env SPECS_DIR=/specs
%env RESULTS_DIR=/results
!tao action_recognition train \
    -e $SPECS_DIR/train_rgb_3d_finetune.yaml \
    -r $RESULTS_DIR/rgb_3d_ptm \
    -k $KEY \
    model_config.rgb_pretrained_model_path=$RESULTS_DIR/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt \
    model_config.rgb_pretrained_num_classes=5
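Training writes one checkpoint per epoch (checkpoint_interval: 1). Once the run finishes, you can list the checkpoints on the host; the exact filenames, in particular the val_loss value, will differ between runs:

!ls -ltrh $HOST_RESULTS_DIR/rgb_3d_ptm/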
Rename the last saved checkpoint (adjust the filename to match your run):
!mv $HOST_RESULTS_DIR/rgb_3d_ptm/ar_model_epoch=19-val_loss=0.05.tlt $HOST_RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt
!ls -ltrh $HOST_RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt
Evaluation, Inference and Exporting
We need to create a spec file in the $HOST_SPECS_DIR folder for each of the three tasks: evaluation, inference, and export. All three files contain the following content:
model_config:
  model_type: rgb
  backbone: resnet18
  rgb_seq_length: 3
  input_type: 3d
  sample_strategy: consecutive
  dropout_ratio: 0.0
dataset_config:
  label_map:
    fall_floor: 0
    ride_bike: 1
  output_shape:
  - 224
  - 224
  batch_size: 32
  workers: 8
  augmentation_config:
    train_crop_type: no_crop
    horizontal_flip_prob: 0.0
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: False
Save three copies of the file as evaluate_rgb.yaml, infer_rgb.yaml, and export_rgb.yaml respectively.
Evaluating the Fine-tuned Model
Run the following command to start the evaluation.
!tao action_recognition evaluate \
    -e $SPECS_DIR/evaluate_rgb.yaml \
    -k $KEY \
    model=$RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt \
    batch_size=1 \
    test_dataset_dir=$DATA_DIR/test \
    video_eval_mode=center
The output will be like this
100%|███████████████████████████████████████████| 60/60 [00:02<00:00, 27.19it/s]
*******************************
fall_floor     96.67
ride_bike      100.0
*******************************
Total accuracy: 98.333
Average class accuracy: 98.333
Inference with the Fine-tuned Model
For inference, run:
!tao action_recognition inference \
    -e $SPECS_DIR/infer_rgb.yaml \
    -k $KEY \
    model=$RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt \
    inference_dataset_dir=$DATA_DIR/test/ride_bike \
    video_inf_mode=center
The output will be like this
100%|███████████████████████████████████████████| 30/30 [00:01<00:00, 15.20it/s]
/data/test/ride_bike/Yorki_Kassy_beim_Fahrrad_fahren_ride_bike_f_cm_np1_le_med_2 : ['ride_bike']
/data/test/ride_bike/Schuodde_kann_kein_Fahrrad_fahren_ride_bike_f_cm_np1_le_med_1 : ['ride_bike']
/data/test/ride_bike/Yorki_Kassy_beim_Fahrrad_fahren_ride_bike_f_cm_np1_le_med_0 : ['ride_bike']
/data/test/ride_bike/Fahrrad_fahren_mit_Albert_ride_bike_f_cm_np1_ba_med_0 : ['ride_bike']
/data/test/ride_bike/Fahrrad_fahren_mit_Albert_ride_bike_f_cm_np1_le_med_2 : ['ride_bike']
/data/test/ride_bike/Fahrrad_fahren_mit_Albert_ride_bike_f_cm_np1_ba_med_1 : ['ride_bike']
/data/test/ride_bike/Schuodde_kann_kein_Fahrrad_fahren_ride_bike_f_cm_np1_le_med_2 : ['ride_bike']
/data/test/ride_bike/lady_on_bike_ride_bike_f_cm_np1_ri_med_0 : ['ride_bike']
/data/test/ride_bike/Schuodde_kann_kein_Fahrrad_fahren_ride_bike_l_cm_np1_le_med_0 : ['ride_bike']
/data/test/ride_bike/Yorki_Kassy_beim_Fahrrad_fahren_ride_bike_f_cm_np1_fr_med_3 : ['ride_bike']
/data/test/ride_bike/Yorki_Kassy_beim_Fahrrad_fahren_ride_bike_f_cm_np1_ri_med_1 : ['ride_bike']
/data/test/ride_bike/lady_on_bike_ride_bike_f_cm_np1_ba_med_1 : ['ride_bike']
...
Exporting the Model
Create an export directory
!mkdir -p $HOST_RESULTS_DIR/export
Run the following to start exporting
# Export the RGB model to an encrypted ONNX model
!tao action_recognition export \
    -e $SPECS_DIR/export_rgb.yaml \
    -k $KEY \
    model=$RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt \
    output_file=$RESULTS_DIR/export/rgb_resnet18_3.etlt
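To confirm the export succeeded, list the export directory on the host:

!ls -lh $HOST_RESULTS_DIR/export/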