2. Opencv application

2.1. Overview

OpenCV is a cross-platform computer vision and machine learning software library distributed under a BSD license (open source) that runs on Linux, Windows, Android, and MacOS operating systems. [1] It is lightweight and efficient - it consists of a series of C functions and a few C++ classes, while providing interfaces to Python, Ruby, MATLAB and other languages, implementing many common algorithms in image processing and computer vision.

2.2 QR code

2.2.1 Introduction to QR codes

QR code is a two-dimensional bar code, QR from the English "Quick Response" abbreviation, that is, rapid response meaning, from the inventor hopes that the QR code can make its content quickly decoded. QR code not only has large information capacity, high reliability and low cost, but also can represent a variety of text information such as Chinese characters and images, and its security is strong and it is very convenient to use. What's more, the QR code technology is open source.

2.2.2 Structure of QR code

PictureParsing
imgPositioning markings: Indicate the direction of the QR code.
imgAlignment markings: If the QR code is large, these additional elements help with positioning.
imgTiming pattern: From these lines, the scanner can identify how large the matrix is.
imgVersion information: This specifies the version number of the QR code being used, there are currently 40 different versions of QR codes. The version numbers used in the sales industry are usually 1-7.
imgFormat information: The format pattern contains information about fault tolerance and data mask patterns, and makes it easier to scan code.
imgData and error correction keys: These schemas hold the actual data.
imgQuiet zone: This area is very important for the scanner, its role is to separate itself from the surrounding.

2.2.3 Features of QR code

The data values in the QR code contain duplicate information (redundant values). Therefore, even if up to 30% of the structure of the QR code is destroyed, the readability of the QR code is not affected. The QR code has a storage space of up to 7089 bits or 4296 characters, including punctuation and special characters, which can be written into the QR code. In addition to numbers and characters, you can encode words and phrases, such as web addresses. As more data is added to the QR code, the code size increases and the code structure becomes more complex.

2.2.4 QR code creation and recognition

Source path:~/orbbec_ws/src/astra_visual/qrcode

Install

Create a qrcode object

qrcode The logo is added to the QR code

Note: when using the Chinese language, need to add Chinese characters

image-20230612165818247

image-20230612171253823

2.3 Estimation of human posture

Source path:~/orbbec_ws/src/astra_visual/detection

2.3.1 Overview

Human Posture Estimation is to estimate the posture of the human body by correctly associating the key points of the human body that have been detected in the picture. The key points of the human body usually correspond to the joints with a certain degree of freedom on the human body, such as neck, shoulder, elbow, wrist, waist, knee, ankle, etc., as shown below.

基于图卷积的行人意图识别方法与流程

2.3.2 Principles

image-20210901161528551

Input an image, extract features through convolutional network, and get a set of feature Maps. Then divide into two branches, and use CNN network to extract Part Confidence Maps and Part Affinity Fields respectively. After obtaining these two information, we use Bipartite Matching in graph theory to find the Part Association and connect the nodes of the same person. Due to the vectorness of PAF itself, the generated even match is very correct and finally merged into the whole skeleton of a person. Finally, Multi-Person Parsing based on PAFs - > Transform the Multi-person parsing problem into a graphs problem - >Hungarian Algorithm (The Hungarian algorithm is the most common algorithm for partial graph matching. The core of this algorithm is to find an augmentation path. It is an algorithm for finding the maximum matching of bipartite graph with an augmentation path.)

2.3.3. Start

After clicking the image box, use the keyboard [f] key to switch target detection.

Input picture

person

Output picture

result

2.4. Object detection

The main problem in this section is how to use the dnn module in OpenCV to import a trained object detection network. But there are requirements for the opencv version.

At present, there are three main methods for object detection with deep learning:

Faster R-CNNs is the most commonly heard deep learning-based neural network. However, this approach is technically difficult (especially for deep learning novices), difficult to implement, and difficult to train.

In addition, even with the "Faster" approach to implement R-CNNs (where R stands for candidate Region Proposal), the algorithm is still relatively slow, at about 7FPS.

If we're looking for speed, we can turn to YOLO because it's very fast, reaching 40-90 FPS on TianXGPU, with the fastest version possibly reaching 155 FPS. But the problem with YOLO is that its accuracy needs to be improved.

SSDs was originally developed by Google and can be said to be a balance between the two. Compared to Faster R-CNNs, its algorithm is more straightforward. It's more accurate than YOLO.

2.4.1. Model structure

The main work of MobileNet is to replace the previous standard convolutions with depthwise sparable convolutions to solve the problems of computational efficiency and parameter number of convolutional networks. The MobileNets model is based on depthwise sparable convolutions (depth-level separable convolution), which can decompose standard convolution into a deep convolution and a point convolution (1 × 1 convolution kernel). Deep convolution applies each convolution kernel to each channel, while 1 × 1 convolution is used to combine the output of the channel convolution.

There is Batch Normalization (BN) in the basic component of MobileNet, that is, at each SGD (random gradient descent), the processing is normalized so that the result (each dimension of the output signal) has a mean of 0 and a variance of 1. Generally, BN can be tried to solve problems such as slow convergence or gradient explosion during neural network training. In addition, BN can also be added to speed up the training speed and improve the accuracy of the model in general use.

In addition, the model also uses ReLU activation functions, so the basic structure of depthwise separable convolution is shown as follows:

网络解析(二):MobileNets详解

The MobileNets network is composed of a number of depthwise separable convolution as shown in the figure above. Its specific network structure is shown in the figure below:

网络解析(二):MobileNets详解

2.4.2. Code analysis

A list of recognizable objects

Load the category [object_detection_coco.txt], import the model [frozen_inference_graph.pb], specify the deep learning framework [TensorFlow]

The image is imported, the height and width are extracted, the blob of 300x300 pixels is calculated, and the blob is passed into the neural network

2.4.3. Start

After clicking the image box, use the keyboard [f] key to switch the body pose estimation.

image-20210901172132630