Tuesday, August 23, 2022

Train YOLOv4 CrowdHuman on Colab for NVIDIA Jetson


For training on Google Colab, I use a 416x416 YOLOv4 model as the example. All data processing and training commands are in an IPython Notebook, so training the "yolov4-crowdhuman-416x416" model on Google Colab takes three steps:

(1) open the Notebook on Google Colab,

(2) mount your Google Drive,

(3) run all cells in the Notebook.

CrowdHuman is a benchmark dataset for evaluating detectors in crowd scenarios. The dataset is large, richly annotated, and highly diverse. It contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. The train and validation subsets contain a total of 470K human instances, an average of 23 persons per image, with various kinds of occlusions. Each human instance is annotated with a head bounding box, a visible-region bounding box, and a full-body bounding box. The dataset authors hope it will serve as a solid baseline and help promote future research in human detection tasks.
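
As an illustration of how these box types can become training labels, here is a minimal sketch that converts one annotation record into YOLO-format rows. It assumes the CrowdHuman .odgt layout (one JSON object per line, with a "gtboxes" list whose entries carry "tag", "hbox" and "fbox" as [x, y, w, h] in pixels). The function name and the class mapping (0 = head, 1 = person) are hypothetical and not taken from the notebook.

```python
import json


def record_to_yolo_rows(odgt_line, img_w, img_h):
    """Convert one .odgt line into YOLO rows: 'class cx cy w h' (normalized)."""
    record = json.loads(odgt_line)
    rows = []
    for box in record.get("gtboxes", []):
        if box.get("tag") != "person":
            continue  # assumption: non-person tags mark regions to ignore
        # Hypothetical class mapping: 0 = head (hbox), 1 = person (fbox).
        for class_id, key in ((0, "hbox"), (1, "fbox")):
            if key not in box:
                continue
            x, y, w, h = box[key]
            # Clip to the image, since full-body boxes may extend past its edges.
            x0, y0 = max(0.0, x), max(0.0, y)
            x1, y1 = min(float(img_w), x + w), min(float(img_h), y + h)
            if x1 <= x0 or y1 <= y0:
                continue  # box lies entirely outside the image
            cx, cy = (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h
            bw, bh = (x1 - x0) / img_w, (y1 - y0) / img_h
            rows.append(f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return rows


if __name__ == "__main__":
    sample = json.dumps({"ID": "demo", "gtboxes": [
        {"tag": "person", "hbox": [100, 50, 20, 20], "fbox": [80, 40, 100, 200]}]})
    for row in record_to_yolo_rows(sample, img_w=400, img_h=400):
        print(row)
```

The image width and height are passed in explicitly because, under the assumed layout, the annotation record itself does not store them.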

There are two options for training the model:

  1. Darknet method
  2. Pytorch method


Source code for the Darknet method:

https://github.com/jkjung-avt/yolov4_crowdhuman


Deploying onto Jetson

To deploy the trained "yolov4-crowdhuman-416x416" model onto Jetson Nano, I use the jkjung-avt/tensorrt_demos code to build and deploy it as a TensorRT engine. Here are the detailed steps:


Download the "yolov4-crowdhuman-416x416" model. More specifically, get "yolov4-crowdhuman-416x416.cfg" from the jkjung-avt/yolov4_crowdhuman repository and download the "yolov4-crowdhuman-416x416_best.weights" file from your Google Drive. Rename the .weights file to "yolov4-crowdhuman-416x416.weights" so that it matches the .cfg file.

$ cd ${HOME}/project/tensorrt_demos/yolo
$ wget https://raw.githubusercontent.com/jkjung-avt/yolov4_crowdhuman/master/cfg/yolov4-crowdhuman-416x416.cfg
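
The commands above fetch the .cfg file but do not show the rename step. A minimal Python sketch of it follows, assuming the downloaded weights were already copied into the same directory as the .cfg file. The function name and its parameters are hypothetical.

```python
from pathlib import Path


def match_weights_to_cfg(yolo_dir, cfg_name="yolov4-crowdhuman-416x416.cfg",
                         suffix="_best"):
    """Rename '<model><suffix>.weights' to '<model>.weights' next to the cfg."""
    yolo_dir = Path(yolo_dir)
    model = Path(cfg_name).stem  # e.g. "yolov4-crowdhuman-416x416"
    source = yolo_dir / f"{model}{suffix}.weights"
    target = yolo_dir / f"{model}.weights"
    if not source.exists():
        raise FileNotFoundError(f"expected downloaded weights at {source}")
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    source.rename(target)
    return target
```

For the directory used above, the call would be match_weights_to_cfg(Path.home() / "project/tensorrt_demos/yolo"). The function refuses to overwrite an existing .weights file, so an earlier model is not lost by accident.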


Then build the TensorRT (FP16) engine. Note the "-c 2" in the command-line option is for specifying that the model is for detecting 2 classes of objects.

$ python3 yolo_to_onnx.py -c 2 -m yolov4-crowdhuman-416x416
$ python3 onnx_to_tensorrt.py -c 2 -m yolov4-crowdhuman-416x416
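
To see why the class count matters for the conversion, here is a small worked calculation. It uses the usual Darknet convention that each YOLO output layer has (classes + 5) x anchors-per-scale channels. The three strides (8, 16, 32) are an assumption that fits full YOLOv4; a tiny model would normally have only two output scales. The function name is hypothetical, and how the conversion scripts use "-c" internally is not shown in this post.

```python
def yolo_head_shapes(num_classes, input_size=416, anchors_per_scale=3,
                     strides=(8, 16, 32)):
    """Return (channels, grid sizes, total candidate boxes) for a YOLO head."""
    # Each anchor predicts 4 box coordinates + 1 objectness score + class scores.
    channels = (num_classes + 5) * anchors_per_scale
    grids = [input_size // stride for stride in strides]
    total_boxes = sum(g * g for g in grids) * anchors_per_scale
    return channels, grids, total_boxes


if __name__ == "__main__":
    print(yolo_head_shapes(2))   # the 2-class CrowdHuman model
    print(yolo_head_shapes(80))  # an 80-class model for comparison
```

With 2 classes each output layer has 21 channels, against 255 for an 80-class model, so an engine built with the wrong class count would not match the trained weights.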

Test the TensorRT engine. 

$ python3 trt_yolo.py -c 2 -m yolov4-crowdhuman-416x416 \
--image /home/jetson/Pictures/your_image.jpg

Part 2 Deploy Yolov4 CrowdHuman to NVIDIA Jetson

 


Run YOLOv4 CrowdHuman 416x416 on NVIDIA Jetson Nano


Run YOLOv4 CrowdHuman 416x416 on NVIDIA Jetson Xavier NX


Compare Performance 



Run YOLOv4-tiny CrowdHuman 416x416 on NVIDIA Jetson Nano


Performance 11-13 FPS





Compare YOLOv4 vs. YOLOv4-tiny on NVIDIA Jetson Nano


Reference

CrowdHuman Dataset https://www.crowdhuman.org/

DarkNet YOLO https://github.com/AlexeyAB/darknet

TensorRT YOLO For Custom Trained Models https://jkjung-avt.github.io/trt-yolo-custom-updated/

Tuesday, August 16, 2022

Retrain Machine Learning Object Detection Model with PyTorch on Colab

Retrain SSD MobileNet Object Detection Model with PyTorch on Colab

Transfer Learning with PyTorch

Transfer learning is a technique for re-training a DNN model on a new dataset, which takes less time than training a network from scratch. With transfer learning, the weights of a pre-trained model are fine-tuned to classify a customized dataset. In these examples, we'll be using the ResNet-18 and SSD-Mobilenet networks, although you can experiment with other networks too.

PyTorch is the machine learning framework that we'll be using, and example datasets along with training scripts are provided to use below, in addition to a camera-based tool for collecting and labeling your own training datasets.

Installing PyTorch on NVIDIA Jetson

If you are running the Docker container, or chose to install PyTorch when you built the project, it should already be installed on your Jetson. Otherwise, if you aren't using the container and want to proceed with transfer learning, you can install it now:

$ cd jetson-inference/build
$ ./install-pytorch.sh

Verifying PyTorch

You can test that PyTorch was installed correctly and detects your GPU by executing these commands from an interactive Python shell (run python or python3 from a terminal):

>>> import torch
>>> torch.cuda.is_available()
>>> import torchvision
>>> torchvision.__version__

Re-training SSD-Mobilenet

Next, we'll train our own SSD-Mobilenet object detection model using PyTorch and the Open Images dataset. SSD-Mobilenet is a popular network architecture for realtime object detection on mobile and embedded devices that combines the SSD-300 Single-Shot MultiBox Detector with a Mobilenet backbone.

In the example below, we'll train a custom detection model that locates 8 different varieties of fruit, although you are welcome to pick from any of the 600 classes in the Open Images dataset to train your model on. You can visually browse the dataset on the Open Images website (see Reference).

Downloading the Dataset

The Open Images dataset contains over 600 object classes that you can pick and choose from. There is a script provided called open_images_downloader.py which will automatically download the desired object classes for you.

The classes that we'll be using are "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" (for example, for a fruit-picking robot), although you are welcome to substitute your own choices from the class list. Together the fruit classes have about 6500 images, a happy medium in dataset size.


Retrain model Diagram


  1. Retrain with PyTorch (Colab).
  2. Convert the PyTorch model to an ONNX model (Colab).
  3. Run the ONNX model on NVIDIA Jetson, where it is converted to a TensorRT engine.

Retrain Object Detection Model on Colab


!git clone https://github.com/dusty-nv/pytorch-ssd.git

%cd pytorch-ssd/
!pip install -r requirements.txt


# Download Dataset
By default, the dataset is downloaded to the data/ directory.
6360 images, 8 classes

train images:  5145
train boxes:   23539

validation images:  285
validation boxes:   825

test images:  930
test boxes:   2824

Total available images: 6360
Total available boxes:  27188
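
The split sizes above can be cross-checked with a few lines of Python. This is only an arithmetic sketch over the numbers listed here; the dictionary layout and function name are illustrative.

```python
# Image and box counts copied from the statistics above.
SPLITS = {
    "train": {"images": 5145, "boxes": 23539},
    "validation": {"images": 285, "boxes": 825},
    "test": {"images": 930, "boxes": 2824},
}


def summarize(splits):
    """Return total images, total boxes and each split's share of the images."""
    total_images = sum(s["images"] for s in splits.values())
    total_boxes = sum(s["boxes"] for s in splits.values())
    shares = {name: s["images"] / total_images for name, s in splits.items()}
    return total_images, total_boxes, shares


if __name__ == "__main__":
    images, boxes, shares = summarize(SPLITS)
    print(f"images={images} boxes={boxes}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%} of images")
```

The totals agree with the "Total available" lines, and roughly four out of five images fall into the training split.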

!python open_images_downloader.py \
--class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" \
--data=data/fruit


# Load MobilenetV1 SSD Model

!wget -P models https://storage.googleapis.com/models-hao/mobilenet-v1-ssd-mp-0_675.pth

# Train Model 

!python train_ssd.py --data=data/fruit --model-dir=models/fruit --batch-size=32 --epochs=20
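
For a rough sense of how long this run is, the number of optimizer steps can be estimated from the training-set size and the flags above. The sketch assumes the final partial batch of each epoch is kept (the default behaviour of a PyTorch DataLoader when drop_last is not set); whether train_ssd.py does exactly this is an assumption.

```python
import math


def training_steps(num_images, batch_size, epochs, drop_last=False):
    """Return (steps per epoch, total steps) for a plain mini-batch loop."""
    if drop_last:
        per_epoch = num_images // batch_size
    else:
        per_epoch = math.ceil(num_images / batch_size)
    return per_epoch, per_epoch * epochs


if __name__ == "__main__":
    # 5145 training images, --batch-size=32, --epochs=20
    print(training_steps(5145, 32, 20))
```

With 5145 training images this comes to 161 steps per epoch and 3220 steps over 20 epochs.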


# Converting the Model to ONNX

!python onnx_export.py --model-dir=models/fruit



PyTorch => ONNX => NVIDIA Jetson
Then download the ONNX file to the NVIDIA Jetson and run jetson-inference detectNet on it with TensorRT.




Run Jetson Inference with detectnet

./detectnet --model=networks/fruit/ssd-mobilenet.onnx \
--labels=networks/fruit/labels.txt --input-blob=input_0 \
--output-cvg=scores --output-bbox=boxes \
images/fruit_0.jpg fruit0.jpg




[image] saved 'fruit2.jpg' (1024x678, 3 channels)
[TRT] ------------------------------------------------
[TRT] Timing Report networks/fruit/ssd-mobilenet.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process  CPU    0.10860ms  CUDA    1.60302ms
[TRT] Network      CPU 1384.42908ms  CUDA 1382.58435ms
[TRT] Post-Process CPU    3.67787ms  CUDA    3.94693ms
[TRT] Visualize    CPU   33.99613ms  CUDA   34.45667ms
[TRT] Total        CPU 1422.21167ms  CUDA 1422.59094ms
[TRT] ------------------------------------------------


PART-2   Run Model on NVIDIA Jetson



Reference 

Jetson Inference.  https://github.com/dusty-nv/jetson-inference

Open Images Dataset  https://storage.googleapis.com/openimages/web/index.html





Adun Nantakaew อดุลย์ นันทะแก้ว 081-6452400
LINE : adunnan

Wednesday, August 10, 2022

NVIDIA Jetson Body Pose Estimation (PoseNet)


What is Pose Estimation and PoseNet?

Pose estimation consists of locating various body parts (aka keypoints) that form a skeletal topology (aka links). Pose estimation has a variety of applications including gestures, AR/VR, HMI (human/machine interface), and posture/gait correction. Pre-trained models are provided for human body and hand pose estimation that are capable of detecting multiple people per frame.

The poseNet object accepts an image as input, and outputs a list of object poses. Each object pose contains a list of detected keypoints, along with their locations and links between keypoints. You can query these to find particular features. poseNet can be used from Python and C++.
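
Querying a pose for particular features could look like the sketch below. It assumes the jetson-inference Python pose object exposes FindKeypoint(name), returning an index into Keypoints or a negative value when the keypoint is missing, and that each keypoint has x and y attributes. The helper name keypoint_offset is hypothetical.

```python
def keypoint_offset(pose, from_name, to_name):
    """Return (dx, dy) from one named keypoint to another, or None if missing."""
    from_idx = pose.FindKeypoint(from_name)
    to_idx = pose.FindKeypoint(to_name)
    if from_idx < 0 or to_idx < 0:
        return None  # one of the keypoints was not detected in this pose
    start = pose.Keypoints[from_idx]
    end = pose.Keypoints[to_idx]
    return (end.x - start.x, end.y - start.y)
```

On the Jetson, each pose in the list returned by the network's Process call could be passed in, for example keypoint_offset(pose, "left_shoulder", "left_wrist") to estimate where the left arm is pointing. The keypoint names here are assumed to match the body model's topology.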

Pre-trained Pose Estimation Models

Model                  CLI argument      NetworkType enum  Keypoints
Pose-ResNet18-Body     resnet18-body     RESNET18_BODY     18
Pose-ResNet18-Hand     resnet18-hand     RESNET18_HAND     21
Pose-DenseNet121-Body  densenet121-body  DENSENET121_BODY  18

Run Demo Code

Default Model :  Pose-ResNet18-Body
Keypoints : 18



For Image


[image]  saved 'pose.jpg'  (1920x1080, 3 channels)
[TRT]    ------------------------------------------------
[TRT]    Timing Report networks/Pose-ResNet18-Body/pose_resnet18_body.onnx
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.08505ms  CUDA   0.75511ms
[TRT]    Network       CPU 141.51538ms  CUDA 140.91484ms
[TRT]    Post-Process  CPU  11.97670ms  CUDA  11.82979ms
[TRT]    Visualize     CPU  34.30075ms  CUDA  34.65588ms
[TRT]    Total         CPU 187.87788ms  CUDA 188.15562ms
[TRT]    ------------------------------------------------



For Image  

Model : Pose-DenseNet121-Body

[image]  saved 'pose.jpg'  (1920x1080, 3 channels)
[TRT]    ------------------------------------------------
[TRT]    Timing Report networks/Pose-DenseNet121-Body/pose_densenet121_body.onnx
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.09818ms  CUDA   0.79229ms
[TRT]    Network       CPU  91.03501ms  CUDA  90.19708ms
[TRT]    Post-Process  CPU  14.73071ms  CUDA  14.91401ms
[TRT]    Visualize     CPU  39.99346ms  CUDA  39.82302ms
[TRT]    Total         CPU 145.85736ms  CUDA 145.72641ms
[TRT]    ------------------------------------------------




For Video 

Default Model :  Pose-ResNet18-Body
Keypoints : 18




Video Inference Performance : 14-17 FPS

posenet: detected 4 person(s)

[TRT]    ------------------------------------------------
[TRT]    Timing Report networks/Pose-ResNet18-Body/pose_resnet18_body.onnx
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.11573ms  CUDA   0.76760ms
[TRT]    Network       CPU  67.07027ms  CUDA  65.11172ms
[TRT]    Post-Process  CPU  11.31063ms  CUDA  11.28625ms
[TRT]    Visualize     CPU   2.35010ms  CUDA   1.99943ms
[TRT]    Total         CPU  80.84673ms  CUDA  79.16499ms
[TRT]    ------------------------------------------------
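
Timing reports like the ones above can be turned into per-stage numbers with a short parser. This is a minimal sketch that assumes each stage line has the form "[TRT] <stage> CPU <n>ms CUDA <n>ms" as printed here. The function names are hypothetical.

```python
import re

# Matches lines such as "[TRT]    Network       CPU  67.07027ms  CUDA  65.11172ms".
_STAGE = re.compile(
    r"\[TRT\]\s+([A-Za-z-]+)\s+CPU\s+([\d.]+)ms\s+CUDA\s+([\d.]+)ms")


def parse_timing_report(text):
    """Map each stage name to (cpu_ms, cuda_ms) found in the log text."""
    return {name: (float(cpu), float(cuda))
            for name, cpu, cuda in _STAGE.findall(text)}


def fps_from_total(stages):
    """Estimate frames per second from the slower of the two 'Total' timings."""
    cpu_ms, cuda_ms = stages["Total"]
    return 1000.0 / max(cpu_ms, cuda_ms)


if __name__ == "__main__":
    log = """
    [TRT]    Network       CPU  67.07027ms  CUDA  65.11172ms
    [TRT]    Total         CPU  80.84673ms  CUDA  79.16499ms
    """
    stages = parse_timing_report(log)
    print(stages["Network"], round(fps_from_total(stages), 2))
```

A report covers a single frame, so the rate derived from it need not match a frame rate averaged over a whole video.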


Hardware and OS

  • NVIDIA Jetson Nano 4GB
  • Jetpack SDK 4.6.1

Reference


Jetson Inference  https://github.com/dusty-nv/jetson-inference



