The previous recognition example outputs class probabilities representing the entire input image. Next, we will focus on object detection: finding where in the frame various objects are located by extracting their bounding boxes. Unlike image classification, object detection networks can detect many different objects per frame.
The detectNet object accepts an image as input and outputs a list of coordinates of the detected bounding boxes along with their class and confidence values. detectNet can be used from both Python and C++. See below for a variety of pre-trained detection models available for download. The default model used is the 91-class SSD-Mobilenet-v2 model trained on the MS COCO dataset, which achieves real-time inference performance on Jetson using TensorRT.
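For example, the same pipeline can be driven directly from Python. Below is a minimal sketch, assuming the jetson.inference and jetson.utils Python bindings from this project are installed (the image path reuses the sample image from the commands later in this section):
import jetson.inference
import jetson.utils

# load the default 91-class SSD-Mobilenet-v2 model
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

# load a test image into GPU memory
img = jetson.utils.loadImage("images/xingren.png")

# run inference; returns a list of detections with class, confidence, and coordinates
detections = net.Detect(img)

for d in detections:
    print(net.GetClassDesc(d.ClassID), d.Confidence, d.Left, d.Top, d.Right, d.Bottom)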
First, let's try using the detectnet program to locate objects in a static image. Besides the input/output paths, there are some additional command-line options:
The --overlay flag accepts a comma-separated combination of box, lines, labels, conf, and none. The default is --overlay=box,labels,conf, which displays the boxes, labels, and confidence values.
The box option draws filled bounding boxes, while lines draws only unfilled outlines.
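For example, to draw only unfilled outlines and labels, you could run (an illustrative variation of the command used below):
$ ./detectnet.py --overlay=lines,labels images/xingren.png images/test/output_xingren.png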
If you are using a Docker container, it is recommended to save the output images to the mounted images/test directory. These images can then be easily viewed from the host device under jetson-inference/data/images/test.
Note: Before running the example, if you built your own environment, you need to download the detection model archive (e.g., SSD-Mobilenet-v2.tar.gz for the default model) to the networks folder. If you are using the image we provide, the models are already included and you can run the program directly.
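If you are installing the model manually, it can be unpacked in place. A sketch, assuming the archive has already been copied into the networks folder (the archive name follows the note above):
$ cd jetson-inference/data/networks
$ tar -xzvf SSD-Mobilenet-v2.tar.gz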
Make sure your terminal is in the aarch64/bin directory:
cd jetson-inference/build/aarch64/bin
Here are some examples of detecting pedestrians in an image using the default SSD-Mobilenet-v2 model:
# C++
$ ./detectnet --network=ssd-mobilenet-v2 images/xingren.png images/test/output_xingren.png
# Python
$ ./detectnet.py --network=ssd-mobilenet-v2 images/xingren.png images/test/output_xingren.png
Note: The first time you run each model, TensorRT will take several minutes to optimize the network. This optimized network file is then cached to disk, so future runs using that model will load faster.
You can also run detection on video files. Place your video (here, pedestrians.mp4) in the bin directory and make sure your terminal is there:
cd jetson-inference/build/aarch64/bin
Run the program:
# C++
$ ./detectnet pedestrians.mp4 images/test/pedestrians_ssd.mp4
# Python
$ ./detectnet.py pedestrians.mp4 images/test/pedestrians_ssd.mp4
You can use the --threshold setting to change the detection sensitivity up or down (the default value is 0.5).
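For example, to pick up detections that the default threshold misses, you could lower it (0.35 is just an illustrative value):
$ ./detectnet.py --threshold=0.35 pedestrians.mp4 images/test/pedestrians_ssd.mp4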
The detectnet.cpp/detectnet.py sample we used earlier can also be used for real-time camera streams. Supported camera types include MIPI CSI cameras (csi://0) and V4L2 cameras (/dev/video0).
Here are some typical scenarios for launching the program on a camera feed.
C++
$ ./detectnet csi://0 # MIPI CSI camera
$ ./detectnet /dev/video0 # V4L2 camera
$ ./detectnet /dev/video0 output.mp4 # save to video file
Python
$ ./detectnet.py csi://0 # MIPI CSI camera
$ ./detectnet.py /dev/video0 # V4L2 camera
$ ./detectnet.py /dev/video0 output.mp4 # save to video file
The OpenGL window displays the live camera stream overlaid with the bounding boxes of detected objects. Note that SSD-based models currently have the highest performance.
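Under the hood, this camera pipeline amounts to a short capture/detect/render loop. Here is a minimal sketch using the jetson.inference and jetson.utils Python bindings (csi://0 and display://0 are example URIs; substitute your own camera and output):
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0")       # or "/dev/video0" for a V4L2 camera
display = jetson.utils.videoOutput("display://0")  # OpenGL window

while display.IsStreaming():
    img = camera.Capture()        # grab the next frame
    detections = net.Detect(img)  # run detection; results are overlaid on the image
    display.Render(img)           # show the frame
    display.SetStatus("detectNet | {:.0f} FPS".format(net.GetNetworkFPS()))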
If the desired objects are not detected in the video feed, or you are getting spurious detections, try lowering or raising the detection threshold using the --threshold parameter (the default is 0.5).