Object Detection Inference

1. Localizing Objects with DetectNet

The previous recognition example outputs class probabilities for the entire input image. Next, we will focus on object detection, finding the locations of various objects in the frame by extracting their bounding boxes. Unlike image classification, object detection networks can detect many different objects per frame.

The detectNet object accepts an image as input and outputs a list of coordinates of the detected bounding boxes along with their class and confidence values. detectNet can be used from both Python and C++. See below for a variety of pre-trained detection models available for download. The default model used is the 91-class SSD-Mobilenet-v2 model trained on the MS COCO dataset, which achieves real-time inference performance on Jetson using TensorRT.

2. Detecting Objects from Images

First, let's try to use the detectnet program to localize objects in a static image. In addition to the input/output paths, there are some additional command-line options:

The default is --overlay=box,labels,conf, which displays the boxes, labels, and confidence values.

The box option draws filled bounding boxes, while lines draws only unfilled outlines.
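A sketch of how the --overlay flag is passed (the image filenames here are placeholders; substitute your own):

```shell
# draw only unfilled outlines and class labels, omitting confidence values
./detectnet --overlay=lines,labels images/peds_0.jpg images/test/output_lines.jpg
```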

If you are using a Docker container, it is recommended to save the output images to the mounted images/test directory. You can then easily view these images from the host device under jetson-inference/data/images/test.

Note: Before running the example, if you built your own environment, you need to download the resnet-18.tar.gz model file into the networks folder. If you are using the image we provide, you can run the program directly.

Make sure your terminal is in the aarch64/bin directory:
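Assuming the default jetson-inference build layout (adjust the path if you built elsewhere):

```shell
# enter the directory containing the detectnet binaries
cd jetson-inference/build/aarch64/bin
```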

Here are some examples of detecting pedestrians in an image using the default SSD-Mobilenet-v2 model:
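For example (peds_0.jpg and peds_1.jpg are test images that ship with the project; substitute your own files as needed):

```shell
# C++ version, saving the rendered result to images/test
./detectnet --network=ssd-mobilenet-v2 images/peds_0.jpg images/test/output_0.jpg

# Python version
./detectnet.py --network=ssd-mobilenet-v2 images/peds_1.jpg images/test/output_1.jpg
```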

Note: The first time you run each model, TensorRT will take several minutes to optimize the network. This optimized network file is then cached to disk, so future runs using that model will load faster.

3. Processing video files

You can store videos in the images folder, located at jetson-inference/data/images.

Run the program:
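For example, assuming a hypothetical video file named pedestrians.mp4 stored in the images folder:

```shell
# run detection on the video and save the rendered result
./detectnet images/pedestrians.mp4 images/test/pedestrians_output.mp4
```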

Effect: the output video is rendered with bounding boxes, labels, and confidence values overlaid on the detected objects.

You can use the --threshold setting to change the detection sensitivity up or down (the default value is 0.5).
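For instance, to accept weaker detections (again assuming a hypothetical pedestrians.mp4 in the images folder):

```shell
# keep detections with confidence >= 0.3 instead of the default 0.5
./detectnet --threshold=0.3 images/pedestrians.mp4 images/test/pedestrians_output.mp4
```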

4. Run the real-time camera recognition demo

The detectnet.cpp/detectnet.py samples we used earlier can also be used with real-time camera streams. Supported camera types include:

MIPI CSI cameras (csi://0)
V4L2 cameras (/dev/video0)
RTP/RTSP network streams (rtsp://...)

Here are some typical scenarios for launching a program on a camera feed.

C++
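A typical launch of the C++ sample (the camera URI depends on your hardware; csi://0 is an assumption):

```shell
# run the C++ sample on MIPI CSI camera 0 with the default model
./detectnet csi://0
```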

Python
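The equivalent Python launches (camera URIs depend on your hardware):

```shell
# run the Python sample on MIPI CSI camera 0
./detectnet.py csi://0

# or on a V4L2 USB camera
./detectnet.py /dev/video0
```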

The OpenGL window displays the live camera stream overlaid with bounding boxes of detected objects. Note that SSD-based models currently have the highest performance. Here is an example of using the model:
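For example, explicitly selecting SSD-Mobilenet-v2 on a camera stream (the CSI camera URI is an assumption):

```shell
# run the default 91-class SSD-Mobilenet-v2 model on the live camera feed
./detectnet --network=ssd-mobilenet-v2 csi://0
```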

If the desired objects are not detected in the video feed, try lowering the detection threshold with the --threshold parameter; if you get spurious detections, try raising it (the default is 0.5).