MediaPipe Development

MediaPipe GitHub: https://github.com/google/mediapipe
MediaPipe website: https://google.github.io/mediapipe/
dlib website: http://dlib.net/
dlib GitHub: https://github.com/davisking/dlib

1.Introduction

MediaPipe is a framework for developing machine learning applications that process data streams, developed and open-sourced by Google. It is a graph-based data processing pipeline for building applications that consume many kinds of data sources, such as video, audio, sensor data, and any time series data. MediaPipe is cross-platform: it runs on embedded platforms (such as Raspberry Pi), mobile devices (iOS and Android), workstations, and servers, and it supports mobile GPU acceleration. MediaPipe provides cross-platform, customizable ML solutions for live and streaming media.

The core framework of MediaPipe is implemented in C++ and provides support for languages such as Java and Objective-C. The main concepts of MediaPipe are Packets, Streams, Calculators, Graphs, and Subgraphs.
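
To make these concepts more concrete, here is a toy sketch in plain Python. It does not use the MediaPipe API. The names `Packet`, `Calculator`, and `run_graph` are hypothetical and only mirror the ideas: a packet is a timestamped payload, a stream is an ordered sequence of packets, a calculator is a node that turns input packets into output packets, and a graph connects calculators.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Packet:
    """A timestamped piece of data travelling through the graph."""
    timestamp: int
    payload: Any


class Calculator:
    """A graph node that maps each input packet to one output packet."""

    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name = name
        self.fn = fn

    def process(self, packet: Packet) -> Packet:
        # The timestamp is preserved so downstream nodes stay in sync.
        return Packet(packet.timestamp, self.fn(packet.payload))


def run_graph(calculators: List[Calculator], stream: Iterable[Packet]) -> Iterator[Packet]:
    """Push every packet of the input stream through the calculators in order."""
    for packet in stream:
        for calc in calculators:
            packet = calc.process(packet)
        yield packet


# A two-node linear graph: scale the value, then offset it.
graph = [Calculator("scale", lambda x: x * 2), Calculator("offset", lambda x: x + 1)]
input_stream = [Packet(t, t * 10) for t in range(3)]
output_stream = list(run_graph(graph, input_stream))
```

In this simplified picture, a subgraph would be a list of calculators reused as if it were a single node. Real MediaPipe graphs are not limited to linear chains and are described in graph configuration files rather than Python lists.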

Characteristics of MediaPipe:

End-to-end acceleration: built-in fast ML inference and processing, accelerated even on common hardware.
Build once, deploy anywhere: a unified solution that works across Android, iOS, desktop/cloud, web, and IoT.
Ready-to-use solutions: cutting-edge ML solutions that demonstrate the full power of the framework.
Free and open source: the framework and solutions are both under Apache 2.0, fully extensible and customizable.

Deep learning solutions in MediaPipe


Solution	Android	iOS	C++	Python	JS	Coral
Face Detection	✅	✅	✅	✅	✅	✅
Face Mesh	✅	✅	✅	✅	✅
Iris	✅	✅	✅
Hands	✅	✅	✅	✅	✅
Pose	✅	✅	✅	✅	✅
Holistic	✅	✅	✅	✅	✅
Selfie Segmentation	✅	✅	✅	✅	✅
Hair Segmentation	✅		✅
Object Detection	✅	✅	✅			✅
Box Tracking	✅	✅	✅
Instant Motion Tracking	✅
Objectron	✅		✅	✅	✅
KNIFT	✅
AutoFlip			✅
MediaSequence			✅
YouTube 8M			✅

2.Usage

Only the Python (.py) examples are demonstrated here.


cd ~/UARTDemo/scripts
python3 06_FaceLandmarks.py                            
python3 07_FaceDetection.py                            
python3 08_Objectron.py                                
python3 09_VirtualPaint.py                            
python3 10_HandCtrl.py                                 
python3 11_GestureRecognition.py

Pay attention to the following points during use:

Hand detection, pose detection, holistic (overall) detection, and face detection all provide a point cloud view; face detection is used as the example here.
Every program exits when the [q] key is pressed.
Holistic (overall) detection: includes hand, face, and body pose detection.
3D object recognition: four object categories can be recognized: ['Shoe', 'Chair', 'Cup', 'Camera']. Press the [f] key to switch the object category being recognized. On the Jetson series the keyboard keys are not used, so switching the object category requires changing the [self.index] parameter in the source code.
Virtual paint (brush): when the index and middle fingers of the right hand are held together, the program is in the selection state and a color selection box pops up; moving the two fingertips onto a color selects it (black is the eraser). When the index and middle fingers are apart, the program is in the drawing state and you can draw freely on the canvas.
Finger control: press the [f] key to switch the recognition effect.
Gesture recognition: gesture recognition is designed for the right hand and is accurate when certain conditions are met. The recognizable gestures are [Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Okay, Rock, Thumb_up (like), Thumb_down (dislike), Heart_single (one-handed heart)], 14 categories in total.
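
The selection and drawing states of the brush can be told apart from the hand landmarks. The following is only a minimal sketch of one way to do this, not the code of 09_VirtualPaint.py: it assumes the landmarks are already available as pixel coordinates and compares the distance between the index and middle fingertips with a threshold. The function name `paint_mode` and the threshold value are hypothetical.

```python
import math

INDEX_TIP = 8    # MediaPipe Hands landmark index of the index fingertip
MIDDLE_TIP = 12  # MediaPipe Hands landmark index of the middle fingertip


def paint_mode(landmarks, merge_threshold=40.0):
    """Return "select" when the two fingertips are close together, else "draw".

    landmarks: sequence of 21 (x, y) pixel coordinates of one hand.
    merge_threshold: assumed pixel distance below which the fingertips
                     count as held together (tune it for your resolution).
    """
    ix, iy = landmarks[INDEX_TIP]
    mx, my = landmarks[MIDDLE_TIP]
    distance = math.hypot(ix - mx, iy - my)
    return "select" if distance < merge_threshold else "draw"
```

A pixel threshold depends on the image resolution and on how far the hand is from the camera, so a real program may prefer to normalize the distance by the hand size.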

Program operation:


python3 06_FaceLandmarks.py

Result:


python3 07_FaceDetection.py

Result:


python3 08_Objectron.py

Result:


python3 09_VirtualPaint.py

Result:


python3 10_HandCtrl.py

Result:


python3 11_GestureRecognition.py

Result:

Note: These examples all use the onboard IMX219 camera. To use a USB camera instead, modify the capture object that the program creates so that it opens the USB device.
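
As a rough sketch of what opening a USB camera can look like, assuming the programs read frames through OpenCV's `cv2.VideoCapture`: the function name `open_usb_camera`, the device index 0, and the 640x480 frame size are assumptions and may differ from the actual scripts.

```python
def open_usb_camera(device_index=0, width=640, height=480):
    """Open a USB (UVC) camera with OpenCV and request a frame size."""
    import cv2  # imported inside the function so the file loads without OpenCV

    # On Linux, index 0 usually corresponds to /dev/video0.
    cap = cv2.VideoCapture(device_index)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open USB camera at index {device_index}")
    return cap
```

The returned object is used like any other capture: `ok, frame = cap.read()` fetches a frame, and `cap.release()` frees the device. If several video devices are connected, the USB camera may appear under a different index.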

3.MediaPipe Hands

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 3D hand landmarks from a single frame.

After palm detection over the whole image, the hand landmark model localizes 21 3D hand-knuckle coordinates precisely inside the detected hand region via regression, that is, direct coordinate prediction. The model learns a consistent internal hand pose representation and is robust even to partially visible hands and self-occlusion.

To obtain ground truth data, approximately 30K real-world images were manually annotated with 21 3D coordinates, as shown below (the Z value is taken from the image depth map, if it exists for the corresponding coordinate). To better cover the possible hand poses and to provide additional supervision on the nature of hand geometry, a high-quality synthetic hand model was also rendered over various backgrounds and mapped to the corresponding 3D coordinates.
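
The two-stage pipeline described above is wrapped by the legacy `mp.solutions.hands` Python API. The sketch below assumes that API and OpenCV are installed. The helper names `landmarks_to_pixels` and `detect_hands` are hypothetical, and the confidence value is only an example.

```python
from typing import List, Sequence, Tuple


def landmarks_to_pixels(landmarks: Sequence, width: int, height: int) -> List[Tuple[int, int]]:
    """Convert normalized landmarks (x, y in [0, 1]) into pixel coordinates."""
    return [(int(lm.x * width), int(lm.y * height)) for lm in landmarks]


def detect_hands(bgr_frame) -> List[List[Tuple[int, int]]]:
    """Run MediaPipe Hands on one BGR frame; return 21 pixel points per detected hand."""
    import cv2               # imported here so the helper above has no dependencies
    import mediapipe as mp

    height, width = bgr_frame.shape[:2]
    # static_image_mode=True treats the frame as an unrelated still image,
    # so palm detection runs on every call instead of tracking between frames.
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=2,
                                  min_detection_confidence=0.5) as hands:
        # MediaPipe expects RGB input, while OpenCV delivers BGR.
        results = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return []
    return [landmarks_to_pixels(hand.landmark, width, height)
            for hand in results.multi_hand_landmarks]
```

For video, creating the `Hands` object once with `static_image_mode=False` and reusing it for every frame is usually preferable, because the landmarks of the previous frame can then be used for tracking.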

4.MediaPipe Pose

MediaPipe Pose is an ML solution for high-fidelity body pose tracking. Building on BlazePose research, which also powers the ML Kit Pose Detection API, it infers 33 3D landmarks and a full-body background segmentation mask from RGB video frames. The landmark model in MediaPipe Pose predicts the positions of the 33 pose landmarks (see figure below).
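
As an illustration of working with the 33 landmarks, the sketch below computes the left elbow angle from three of them. It assumes the landmarks have already been converted to 2D (x, y) points in the MediaPipe Pose order. The function names are hypothetical, and only the three indices used here are listed.

```python
import math

# Indices of three of the 33 MediaPipe Pose landmarks (left arm).
LEFT_SHOULDER, LEFT_ELBOW, LEFT_WRIST = 11, 13, 15


def joint_angle(a, b, c):
    """Angle in degrees at point b between the segments b->a and b->c (2D points)."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    # Fold the result into the range [0, 180].
    return 360.0 - ang if ang > 180.0 else ang


def left_elbow_angle(landmarks):
    """landmarks: sequence of 33 (x, y) points in the MediaPipe Pose order."""
    return joint_angle(landmarks[LEFT_SHOULDER],
                       landmarks[LEFT_ELBOW],
                       landmarks[LEFT_WRIST])
```

A fully stretched arm gives an angle near 180 degrees and a bent arm a smaller one, which is one simple way to build posture checks on top of the landmarks.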

5.dlib

The corresponding example is the facial effects example.

dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real-world problems. It is widely used in industry and academia in fields such as robotics, embedded devices, mobile phones, and large high-performance computing environments.

The dlib library uses 68 points to mark the important parts of the face, for example points 18-22 for the right eyebrow and points 49-68 for the mouth. Faces are detected with dlib's get_frontal_face_detector, and the shape_predictor_68_face_landmarks.dat model data is used to predict the facial landmark positions.
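
A minimal sketch of this detect-then-predict flow with dlib's Python bindings follows. It assumes the model file sits in the working directory and that the input is a grayscale image as a NumPy array. The helper names and the `FACE_REGIONS` table are hypothetical; the index ranges are the commonly cited ones for the 68-point annotation, written 0-based (so the right eyebrow is 17-21 here).

```python
# 0-based index ranges of the 68-point face annotation (assumed layout).
FACE_REGIONS = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def region_points(points, region):
    """Select the landmark points that belong to one face region."""
    return [points[i] for i in FACE_REGIONS[region]]


def detect_face_landmarks(gray_image, model_path="shape_predictor_68_face_landmarks.dat"):
    """Return one list of 68 (x, y) points per face found in a grayscale image."""
    import dlib  # imported here so the helpers above work without dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(model_path)
    faces = []
    for rect in detector(gray_image, 1):  # 1 = upsample the image once
        shape = predictor(gray_image, rect)
        faces.append([(shape.part(i).x, shape.part(i).y)
                      for i in range(shape.num_parts)])
    return faces
```

For example, `region_points(faces[0], "mouth")` would return the 20 mouth points of the first detected face. In a video loop the detector and predictor should be created once and reused, since loading the model file is slow.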
