2. OpenCV application


2.1. Overview

OpenCV is a cross-platform, open source computer vision and machine learning software library distributed under a BSD license (Apache 2.0 since version 4.5.0) that runs on Linux, Windows, Android, and macOS. [1] It is lightweight and efficient: it consists of a series of C functions and a small number of C++ classes, provides interfaces to Python, Ruby, MATLAB and other languages, and implements many common algorithms in image processing and computer vision.

2.2 QR code

2.2.1 Introduction to QR codes

A QR code is a two-dimensional barcode. "QR" is an abbreviation of "Quick Response", reflecting the inventor's hope that the content of the code could be decoded quickly. QR codes offer large information capacity, high reliability, and low cost. They can represent many kinds of information, such as text (including Chinese characters) and images, and they are secure and convenient to use. In addition, the QR code technology is open for anyone to use.

2.2.2 Structure of QR code

Picture	Description
	Position markers: indicate the orientation of the QR code.
	Alignment markers: if the QR code is large, these additional elements help with positioning.
	Timing pattern: from these lines, the scanner can determine how large the matrix is.
	Version information: specifies the version of the QR code in use. There are currently 40 different versions of QR codes; the versions used in the sales industry are usually 1-7.
	Format information: contains the error tolerance level and the data mask pattern, which makes the code easier to scan.
	Data and error correction codewords: these modules hold the actual data.
	Quiet zone: this area is very important for the scanner; it separates the QR code from its surroundings.
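
The relationship between the version number and the matrix size can be sketched in a few lines. This is a minimal illustration, not part of the tutorial's source code: the helper name `qr_modules_per_side` is hypothetical, and it relies on the standard rule that version 1 is 21 modules per side and each later version adds 4 modules per side.

```python
def qr_modules_per_side(version: int) -> int:
    """Return the number of modules along one side of a QR code.

    Hypothetical helper for illustration only.
    """
    if not 1 <= version <= 40:
        raise ValueError("QR code versions range from 1 to 40")
    # Version 1 is 21 x 21; every later version adds 4 modules per side.
    return 17 + 4 * version


for v in (1, 7, 40):
    side = qr_modules_per_side(v)
    print(f"version {v}: {side} x {side} modules")
```

The quiet zone is not included in these numbers; it is added around the matrix when the image is rendered.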

2.2.3 Features of QR code

The data in a QR code contains redundant information. As a result, at the highest error correction level a QR code remains readable even if up to about 30% of its structure is damaged. A QR code can store up to 7089 numeric digits or 4296 alphanumeric characters, including punctuation and special characters. In addition to numbers and characters, words and phrases such as web addresses can be encoded. As more data is added, the code grows in size and its structure becomes more complex.
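
As a rough illustration of how the redundancy trades off against damage tolerance, the sketch below picks the lowest of the four standard error correction levels (L, M, Q, H) that covers an expected amount of damage. The recovery percentages are the usual approximate figures; the table name `RECOVERY` and the function `pick_level` are hypothetical and exist only for this example.

```python
# Approximate share of the code that each level can restore (lowest to highest).
RECOVERY = {"L": 0.07, "M": 0.15, "Q": 0.25, "H": 0.30}


def pick_level(expected_damage: float) -> str:
    """Return the lowest error correction level that covers the expected damage.

    expected_damage is a fraction between 0 and 1. Hypothetical helper.
    """
    # Dictionaries keep insertion order, so levels are tried from L up to H.
    for level, capacity in RECOVERY.items():
        if expected_damage <= capacity:
            return level
    raise ValueError("no QR error correction level covers that much damage")


print(pick_level(0.05))  # a clean label
print(pick_level(0.20))  # a label that may be partly covered, e.g. by a logo
```

A higher level leaves less room for data at the same version, so choosing the lowest level that is sufficient keeps the code smaller.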

2.2.4 QR code creation and recognition

Source path: ~/orbbec_ws/src/astra_visual/qrcode

Install


python3 -m pip install qrcode pyzbar
sudo apt-get install libzbar-dev

Create

Create a qrcode object


import qrcode

# Parameter meanings:
# version: an integer from 1 to 40 that controls the size of the QR code
#          (the smallest, 1, is a 21×21 matrix). To let the program decide
#          automatically, set it to None and use the fit parameter.
# error_correction: controls the error correction level of the QR code.
#          It accepts one of the following four constants.
#     ERROR_CORRECT_L: about 7% or fewer errors can be corrected.
#     ERROR_CORRECT_M (default): about 15% or fewer errors can be corrected.
#     ERROR_CORRECT_Q: about 25% or fewer errors can be corrected.
#     ERROR_CORRECT_H: about 30% or fewer errors can be corrected.
# box_size: the number of pixels in each small box of the QR code.
# border: the number of boxes in the border (the distance between the QR code
#         and the image edge); the default is 4, the minimum required by the standard.
qr = qrcode.QRCode(
    version=1,
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    box_size=5,
    border=4,
)

Add a logo to the QR code


from pathlib import Path

# Add the logo image if the logo file exists
# (img, logo_path and add_logo are defined elsewhere in the script)
my_file = Path(logo_path)
if my_file.is_file():
    img = add_logo(img, logo_path)

Note: when the content is in Chinese, a font that includes Chinese characters needs to be added.


cd ~/orbbec_ws/src/astra_visual/qrcode
python3 QRcode_Create.py

Identification


import cv2 as cv
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from pyzbar import pyzbar


def decodeDisplay(image, font_path):
    gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
    barcodes = pyzbar.decode(gray)
    encoding = 'UTF-8'
    for barcode in barcodes:
        # Extract the position of the bounding box of the QR code
        (x, y, w, h) = barcode.rect
        # Draw the bounding box of the barcode on the image
        cv.rectangle(image, (x, y), (x + w, y + h), (225, 0, 0), 5)
        # The payload is bytes; decode it to a string before drawing it
        barcodeData = barcode.data.decode(encoding)
        barcodeType = barcode.type
        # Draw the text with PIL so that Chinese characters can be rendered.
        # PIL expects RGB order, so convert from OpenCV's BGR first.
        pilimg = Image.fromarray(cv.cvtColor(image, cv.COLOR_BGR2RGB))
        # Create a drawing context
        draw = ImageDraw.Draw(pilimg)
        # Argument 1: font file path, argument 2: font size
        fontStyle = ImageFont.truetype(font_path, size=12, encoding=encoding)
        # Argument 1: text position, argument 2: text, argument 3: font color, argument 4: font
        draw.text((x, y - 25), barcodeData, fill=(255, 0, 0), font=fontStyle)
        # Convert the PIL image back to an OpenCV (BGR) image
        image = cv.cvtColor(np.array(pilimg), cv.COLOR_RGB2BGR)
        # Print the barcode type and data to the terminal
        print("[INFO] Found {} barcode: {}".format(barcodeType, barcodeData))
    return image

Effect demonstration


cd ~/orbbec_ws/src/astra_visual/qrcode
python3 QRcode_Parsing.py

2.3 Estimation of human posture

Source path: ~/orbbec_ws/src/astra_visual/detection

2.3.1 Overview

Human pose estimation determines the posture of the human body by correctly associating the key points of the body that have been detected in an image. These key points usually correspond to joints that have a certain degree of freedom, such as the neck, shoulders, elbows, wrists, waist, knees, and ankles, as shown below.

[Figure: key points of the human body]

2.3.2 Principles

An input image is passed through a convolutional network to extract a set of feature maps. The network then splits into two branches, each a CNN, which predict the Part Confidence Maps and the Part Affinity Fields (PAFs) respectively. With these two pieces of information, bipartite matching from graph theory is used to find the part associations and connect the joints that belong to the same person. Because a PAF is a vector field, the resulting bipartite matches are highly accurate, and they are finally merged into the complete skeleton of each person. In short: multi-person parsing based on PAFs -> transform the multi-person parsing problem into a graph problem -> Hungarian algorithm. (The Hungarian algorithm is the most common algorithm for bipartite graph matching. Its core is the search for augmenting paths: it finds a maximum matching of a bipartite graph by repeatedly finding augmenting paths.)
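
The augmenting-path idea can be sketched on a toy example. This is a minimal, unweighted illustration and not the code used by the pose estimation model, which also scores each candidate connection with the PAFs. The function name `max_bipartite_matching` and the elbow/wrist example data are assumptions made for this sketch.

```python
def max_bipartite_matching(adjacency, n_right):
    """Find a maximum matching in a bipartite graph using augmenting paths.

    adjacency[u] lists the right-side nodes that left node u may connect to.
    Returns (number of matched pairs, match_right), where match_right[v] is
    the left node matched to right node v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(u, visited):
        for v in adjacency[u]:
            if v in visited:
                continue
            visited.add(v)
            # Take v if it is free, or if its current partner can move elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matched = 0
    for u in range(len(adjacency)):
        if try_augment(u, set()):
            matched += 1
    return matched, match_right


# Toy data: three elbow candidates (left side) and three wrist candidates
# (right side); each elbow lists the wrists it could plausibly connect to.
elbow_to_wrists = [[0, 1], [0], [1, 2]]
count, wrist_owner = max_bipartite_matching(elbow_to_wrists, n_right=3)
print(count, wrist_owner)
```

Here elbow 0 first takes wrist 0, but is rerouted to wrist 1 when elbow 1 turns out to have wrist 0 as its only option; that rerouting is what an augmenting path does.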

2.3.3. Start


cd ~/orbbec_ws/src/astra_visual/detection
python3 target_detection.py

After clicking the image window, press the [f] key to switch to target detection.


# Function switch
if action == ord('f'):
    state = not state

Input picture

[Image: person]

Output picture

[Image: result]

2.4. Object detection

This section focuses on how to use the dnn module in OpenCV to load a trained object detection network. Note that this places requirements on the OpenCV version.

At present, there are three main methods for object detection with deep learning:

Faster R-CNN
You Only Look Once (YOLO)
Single Shot Detector (SSD)

Faster R-CNN is probably the best-known deep learning-based object detection network. However, the approach is technically demanding (especially for deep learning novices), difficult to implement, and difficult to train.

In addition, even with the "Faster" variant of R-CNN (where R stands for Region, as in region proposal), the algorithm is still relatively slow, at about 7 FPS.

If speed is the priority, YOLO is a good choice because it is very fast, reaching 40-90 FPS on a Titan X GPU, with the fastest version possibly reaching 155 FPS. The drawback of YOLO is that its accuracy leaves room for improvement.

SSD was originally developed by Google and can be seen as a balance between the two. Its algorithm is more straightforward than that of Faster R-CNN, and it is more accurate than YOLO.

2.4.1. Model structure

The main contribution of MobileNet is to replace standard convolutions with depthwise separable convolutions, which addresses the computational cost and the number of parameters of convolutional networks. The MobileNets model is built on depthwise separable convolutions, which decompose a standard convolution into a depthwise convolution and a pointwise convolution (a 1 × 1 convolution kernel). The depthwise convolution applies a single convolution kernel to each input channel, while the 1 × 1 convolution combines the outputs of the depthwise convolution.
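
The saving in parameters can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: it counts weights without bias terms, and the function names and the 3 × 3, 32-to-64-channel layer are example values chosen here, not taken from the model used in this section.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(f"standard:  {std} weights")
print(f"separable: {sep} weights ({sep / std:.1%} of standard)")
```

For this layer the standard convolution needs 18432 weights and the separable version 2336. In general the ratio is 1/c_out + 1/kernel², so with 3 × 3 kernels the separable form needs roughly one eighth to one ninth of the weights.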

The basic building block of MobileNet includes Batch Normalization (BN): at each step of SGD (stochastic gradient descent), the activations of the mini-batch are normalized so that each dimension of the output signal has a mean of 0 and a variance of 1. BN is commonly tried when neural network training converges slowly or suffers from exploding gradients. More generally, BN can also be added to speed up training and improve the accuracy of the model.
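
The normalization step itself is small enough to write out. The following is a simplified sketch in plain Python, assuming a mini-batch given as a list of feature rows; it omits the learnable scale and shift parameters and the running statistics that a real BN layer keeps for inference. The function name `batch_normalize` is hypothetical.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature (column) of a mini-batch to roughly zero mean and unit variance."""
    n = len(batch)
    n_features = len(batch[0])
    # Per-feature mean and (biased) variance over the mini-batch.
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]
    # eps keeps the division stable when a feature has (near) zero variance.
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(n_features)]
        for row in batch
    ]


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
for row in batch_normalize(batch):
    print([round(v, 3) for v in row])
```

Both columns end up on the same scale even though the second one started ten times larger, which is what keeps the inputs of the following layer well conditioned.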

In addition, the model also uses ReLU activation functions, so the basic structure of depthwise separable convolution is shown as follows:

[Figure: basic structure of a depthwise separable convolution]

The MobileNets network is composed of a stack of the depthwise separable convolutions shown in the figure above. Its specific network structure is shown in the figure below:

[Figure: MobileNets network structure]

2.4.2. Code analysis

A list of recognizable objects


[person, bicycle, car, motorcycle, airplane, bus, train,
 truck, boat, traffic light, fire hydrant, street sign,
 stop sign, parking meter, bench, bird, cat, dog, horse,
 sheep, cow, elephant, bear, zebra, giraffe, hat, backpack,
 umbrella, shoe, eye glasses, handbag, tie, suitcase,
 frisbee, skis, snowboard, sports ball, kite, baseball bat,
 baseball glove, skateboard, surfboard, tennis racket,
 bottle, plate, wine glass, cup, fork, knife, spoon, bowl,
 banana, apple, sandwich, orange, broccoli, carrot, hot dog,
 pizza, donut, cake, chair, couch, potted plant, bed, mirror,
 dining table, window, desk, toilet, door, tv, laptop, mouse,
 remote, keyboard, cell phone, microwave, oven, toaster,
 sink, refrigerator, blender, book, clock, vase, scissors,
 teddy bear, hair drier, toothbrush]

Load the class names from [object_detection_coco.txt], import the model [frozen_inference_graph.pb], and specify the deep learning framework [TensorFlow].


import cv2 as cv
import numpy as np

# Load the COCO class names
with open('object_detection_coco.txt', 'r') as f:
    class_names = f.read().split('\n')
# Use a different color for each class
COLORS = np.random.uniform(0, 255, size=(len(class_names), 3))
# Load the DNN model
model = cv.dnn.readNet(model='frozen_inference_graph.pb', config='ssd_mobilenet_v2_coco.txt', framework='TensorFlow')

Read the image, extract its height and width, build a 300x300 pixel blob, and pass the blob into the neural network.


def Target_Detection(image):
    image_height, image_width, _ = image.shape
    # Create a blob from the image
    blob = cv.dnn.blobFromImage(image=image, size=(300, 300), mean=(104, 117, 123), swapRB=True)
    model.setInput(blob)
    output = model.forward()
    # Loop over each detection
    for detection in output[0, 0, :, :]:
        # Extract the confidence of the detection
        confidence = detection[2]
        # Draw a bounding box only if the confidence is above the threshold, otherwise skip
        if confidence > .4:
            # Get the class ID
            class_id = detection[1]
            # Map the class ID to the class name
            class_name = class_names[int(class_id) - 1]
            color = COLORS[int(class_id)]
            # Top-left corner of the bounding box, scaled from normalized to pixel coordinates
            box_x = detection[3] * image_width
            box_y = detection[4] * image_height
            # Bottom-right corner of the bounding box (a corner, not a width and height)
            box_right = detection[5] * image_width
            box_bottom = detection[6] * image_height
            # Draw a rectangle around each detected object
            cv.rectangle(image, (int(box_x), int(box_y)), (int(box_right), int(box_bottom)), color, thickness=2)
            # Write the class name above the detected object
            cv.putText(image, class_name, (int(box_x), int(box_y - 5)), cv.FONT_HERSHEY_SIMPLEX, 1, color, 2)
    return image
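
The per-detection arithmetic in the function above can be tried without a camera or a model. The sketch below assumes that each detection row has the layout [image id, class id, confidence, x_min, y_min, x_max, y_max] with coordinates normalized to the range 0-1, which is how the values are used in the listing. The helper name `detection_to_box` and the sample row are hypothetical.

```python
def detection_to_box(detection, image_width, image_height, threshold=0.4):
    """Convert one detection row to (class_id, confidence, pixel box).

    Returns None when the confidence does not exceed the threshold.
    Assumed row layout: [image_id, class_id, confidence, x_min, y_min, x_max, y_max].
    """
    confidence = float(detection[2])
    if confidence <= threshold:
        return None
    # Scale the normalized corner coordinates to pixel coordinates.
    x_min = int(detection[3] * image_width)
    y_min = int(detection[4] * image_height)
    x_max = int(detection[5] * image_width)
    y_max = int(detection[6] * image_height)
    return int(detection[1]), confidence, (x_min, y_min, x_max, y_max)


# A made-up detection on a 640 x 480 frame.
row = [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0]
print(detection_to_box(row, 640, 480))
# The same row with a low confidence is skipped.
print(detection_to_box([0, 1, 0.2, 0.25, 0.5, 0.75, 1.0], 640, 480))
```

Keeping this step in a small pure function makes the threshold and the coordinate scaling easy to check separately from the drawing code.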

2.4.3. Start


cd ~/orbbec_ws/src/astra_visual/detection
python3 target_detection.py

After clicking the image window, press the [f] key to switch to human pose estimation.


# Function switch
if action == ord('f'):
    state = not state