7. 3D object recognition

1、Synopsis

MediaPipe is an open-source framework, developed by Google, for building machine learning applications that process data streams. It is a graph-based data processing pipeline that can be built to consume many kinds of data sources, such as video, audio, sensor data, and any other time series data. MediaPipe is cross-platform: it runs on embedded platforms (such as Raspberry Pi), mobile devices (iOS and Android), workstations, and servers, and it supports mobile GPU acceleration. It provides cross-platform, customizable ML solutions for real-time and streaming media. The core framework is implemented in C++ and provides support for languages such as Java and Objective-C. The main concepts of MediaPipe include Packets, Streams, Calculators, Graphs, and Subgraphs.
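To make the concepts of Packets, Streams, Calculators, and Graphs more concrete, here is a toy model in plain Python. It is only a conceptual sketch and does not use the MediaPipe API: the class names `Packet`, `Calculator`, and `Graph` are hypothetical stand-ins, the graph is a simple linear chain, and Subgraphs are not modelled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Packet:
    """A timestamped unit of data, loosely modelled on a MediaPipe Packet."""
    timestamp: int
    payload: Any


class Calculator:
    """A processing node: turns each input packet into one output packet."""

    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name = name
        self.fn = fn

    def process(self, packet: Packet) -> Packet:
        # The timestamp is carried through so downstream nodes stay in sync.
        return Packet(packet.timestamp, self.fn(packet.payload))


class Graph:
    """A linear chain of calculators; real graphs may branch and merge."""

    def __init__(self, calculators: List[Calculator]):
        self.calculators = calculators

    def run(self, stream: Iterable[Packet]) -> List[Packet]:
        outputs = []
        for packet in stream:  # a stream is an ordered sequence of packets
            for calc in self.calculators:
                packet = calc.process(packet)
            outputs.append(packet)
        return outputs


graph = Graph([
    Calculator("scale", lambda x: x * 2),
    Calculator("offset", lambda x: x + 1),
])
results = graph.run(Packet(t, t) for t in range(3))
print([(p.timestamp, p.payload) for p in results])
```

In a real MediaPipe graph the payloads would be image frames or detection results rather than integers, and the calculators would be the C++ nodes that the framework provides.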

The characteristics of MediaPipe:

2、Three-dimensional object recognition

3D object recognition: the recognizable objects are ['Shoe', 'Chair', 'Cup', 'Camera'], four categories in total. Press the [f key] to switch the object being recognized.

Note: The f key can switch objects only when the Raspberry Pi is connected to the Internet, because loading the pre-trained model requires an Internet connection.
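The category switching described above could be sketched as follows. This is a minimal illustration and not the tutorial's own source code: the helper names `next_index`, `handle_key`, and `make_objectron` are hypothetical, and the detector parameters are assumed values. It relies on the `mp.solutions.objectron.Objectron` class from MediaPipe's legacy Python solutions, which may not be available in every MediaPipe release.

```python
CATEGORIES = ['Shoe', 'Chair', 'Cup', 'Camera']


def next_index(index: int) -> int:
    """Advance to the next category, wrapping around after the last one."""
    return (index + 1) % len(CATEGORIES)


def handle_key(key: int, index: int) -> int:
    """Return the category index to use after a key press."""
    if key == ord('f'):
        return next_index(index)
    return index


def make_objectron(index: int):
    """Build a detector for CATEGORIES[index]."""
    # Imported lazily: requires the mediapipe package. Creating the detector
    # may fetch model files, which is why a network connection is needed.
    import mediapipe as mp
    return mp.solutions.objectron.Objectron(
        static_image_mode=False,        # treat input as a video stream
        max_num_objects=3,              # assumed value
        min_detection_confidence=0.5,   # assumed value
        min_tracking_confidence=0.99,   # assumed value
        model_name=CATEGORIES[index],
    )
```

In a display loop, the key code would typically come from `cv2.waitKey(1) & 0xFF`; when `handle_key` returns a different index, the detector is rebuilt with `make_objectron` for the new category.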

2.1、Startup

  1. First set the proxy IP for the ROS-wifi image transfer module. For the specific steps, see "1. The use of ROS-wifi image transfer module" in the basic use part of the micros car tutorial; they are not repeated here.
  2. Connect the Linux system to the ROS-wifi image transfer module: start docker, then enter the following command to connect to the ROS-wifi image transfer module.

(Screenshot: image-2024030900005)

If the information shown in the screenshot is displayed, the proxy connection is successful.

  3. Open a new terminal and execute the following command.

Recognition result for Shoe: (screenshot: image-20240125180415024)

Recognition result for Cup: (screenshot: image-2024030900006)

  4. Notes

After the servo has returned to the center position, press Ctrl+C to end the program.

  5. If the camera image is upside down, see the tutorial "3. Camera picture correction (must-read)"; it is not explained again here.

2.2、Code parsing

The source code for this function is located at:

The main flow of the program: subscribe to the image published by the ESP32, run the recognition on it with MediaPipe, and then display the processed image with OpenCV.
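The flow described above can be sketched as a small ROS 2 node. This is a hedged illustration under several assumptions, not the tutorial's actual source: the topic name `/espRos/esp32camera`, the `sensor_msgs/CompressedImage` message type, the node name, and the detector parameters are all assumed and should be adjusted to the real setup. It uses MediaPipe's legacy `mp.solutions.objectron` interface.

```python
IMAGE_TOPIC = '/espRos/esp32camera'  # assumed topic name, adjust as needed
WINDOW_NAME = 'objectron'


def main():
    # Heavy dependencies are imported here so the file can be read on a
    # machine without ROS 2, OpenCV or MediaPipe installed.
    import cv2
    import mediapipe as mp
    import numpy as np
    import rclpy
    from sensor_msgs.msg import CompressedImage  # assumed message type

    mp_objectron = mp.solutions.objectron
    mp_drawing = mp.solutions.drawing_utils
    objectron = mp_objectron.Objectron(
        static_image_mode=False, max_num_objects=3,
        min_detection_confidence=0.5, min_tracking_confidence=0.99,
        model_name='Shoe')

    def on_image(msg):
        # 1. Decode the compressed frame into a BGR image.
        data = np.frombuffer(msg.data, dtype=np.uint8)
        frame = cv2.imdecode(data, cv2.IMREAD_COLOR)
        if frame is None:
            return
        # 2. MediaPipe expects RGB input, OpenCV delivers BGR.
        results = objectron.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # 3. Draw the 3D bounding box and axes of each detected object.
        if results.detected_objects:
            for obj in results.detected_objects:
                mp_drawing.draw_landmarks(
                    frame, obj.landmarks_2d, mp_objectron.BOX_CONNECTIONS)
                mp_drawing.draw_axis(frame, obj.rotation, obj.translation)
        # 4. Show the annotated frame.
        cv2.imshow(WINDOW_NAME, frame)
        cv2.waitKey(1)

    rclpy.init()
    node = rclpy.create_node('objectron_sketch')  # hypothetical node name
    node.create_subscription(CompressedImage, IMAGE_TOPIC, on_image, 1)
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        objectron.close()
        cv2.destroyAllWindows()
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
```

Calling `main()` starts the node. A queue depth of 1 is used for the subscription so that stale frames are dropped rather than queued, which keeps the display close to real time when recognition is slower than the camera.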