MediaPipe is a framework for building machine learning applications over streaming data, developed and open-sourced by Google. It is a graph-based data processing pipeline that can consume many kinds of data sources, such as video, audio, sensor data, and any other time-series data. MediaPipe is cross-platform and can run on embedded platforms (such as the Raspberry Pi), mobile devices (iOS and Android), workstations, and servers, and it supports mobile GPU acceleration. MediaPipe provides cross-platform, customizable ML solutions for live and streaming media. The core framework is implemented in C++ and provides support for languages such as Java and Objective-C. The main concepts in MediaPipe include packets, streams, calculators, graphs, and subgraphs.
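From Python, a MediaPipe graph is typically consumed through the prebuilt Solutions API: each frame goes into the graph as a packet, and per-frame results come back out. As a minimal single-image sketch (assuming the mediapipe and opencv-python packages are installed; the file name cup.jpg is hypothetical):

import cv2 as cv
import mediapipe as mp

# Each Solution wraps a MediaPipe graph of calculators; process() feeds
# one packet (an RGB frame) into the graph and returns that frame's results.
with mp.solutions.objectron.Objectron(static_image_mode=True,
                                      max_num_objects=5,
                                      min_detection_confidence=0.5,
                                      model_name='Cup') as objectron:
    image = cv.imread('cup.jpg')  # hypothetical input image
    results = objectron.process(cv.cvtColor(image, cv.COLOR_BGR2RGB))
    print(results.detected_objects)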
Features:
3D object recognition: four categories of objects can be recognized: ['Shoe', 'Chair', 'Cup', 'Camera']. Press the [f] key to switch which object is recognized.
Note: The f key can switch objects only when the Raspberry Pi is connected to the Internet, because the pre-trained model must be downloaded online.
#If using the provided system image, run directly:
sh start_Camera_computer.sh
#If not using the provided system image, start the micro-ROS agent manually:
docker run -it --rm -v /dev:/dev -v /dev/shm:/dev/shm --privileged --net=host microros/micro-ros-agent:humble udp4 --port 9999 -v4
If connection information like the above is displayed, the agent connection is successful.
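You can optionally verify that the ESP32 camera stream is arriving with the standard ros2 CLI (the topic name is taken from the source code below):

ros2 topic list
ros2 topic hz /espRos/esp32camera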
ros2 run yahboom_esp32_mediapipe 08_Objectron
(Example results: recognizing a Shoe and a Cup.)
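Each press of the f key advances to the next model in ['Shoe', 'Chair', 'Cup', 'Camera'] and wraps around (the program starts on 'Camera'). Note that the key is read with cv.waitKey, so the OpenCV image window must have focus for the key press to register.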
ros2 run yahboom_esp32_mediapipe control_servo
After the servos are centered, press Ctrl+C to end the program.
The source code is located at:
~/yahboomcar_ws/src/yahboom_esp32_mediapipe/yahboom_esp32_mediapipe/08_Objectron.py
#!/usr/bin/env python3
import time

import cv2 as cv
import mediapipe as mp
import rclpy
from rclpy.node import Node
from cv_bridge import CvBridge
from sensor_msgs.msg import CompressedImage


class Objectron:
    def __init__(self, staticMode=False, maxObjects=5, minDetectionCon=0.5, minTrackingCon=0.99):
        self.staticMode = staticMode
        self.maxObjects = maxObjects
        self.minDetectionCon = minDetectionCon
        self.minTrackingCon = minTrackingCon
        self.index = 3  # start on 'Camera'; the f key cycles through the list
        self.modelNames = ['Shoe', 'Chair', 'Cup', 'Camera']
        self.mpObjectron = mp.solutions.objectron
        self.mpDraw = mp.solutions.drawing_utils
        self.mpobjectron = self.mpObjectron.Objectron(
            self.staticMode, self.maxObjects, self.minDetectionCon,
            self.minTrackingCon, self.modelNames[self.index])

    def findObjectron(self, frame):
        # Label the frame with the currently selected model name.
        cv.putText(frame, self.modelNames[self.index], (int(frame.shape[1] / 2) - 30, 30),
                   cv.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 3)
        # MediaPipe expects RGB input; OpenCV delivers BGR.
        img_RGB = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
        results = self.mpobjectron.process(img_RGB)
        if results.detected_objects:
            for detection in results.detected_objects:
                # Draw the 3D bounding box and the object's pose axes.
                self.mpDraw.draw_landmarks(frame, detection.landmarks_2d, self.mpObjectron.BOX_CONNECTIONS)
                self.mpDraw.draw_axis(frame, detection.rotation, detection.translation)
        return frame

    def configUP(self):
        # Switch to the next model, wrapping around after the last one,
        # and rebuild the Objectron solution for the new model.
        self.index += 1
        if self.index >= 4:
            self.index = 0
        self.mpobjectron = self.mpObjectron.Objectron(
            self.staticMode, self.maxObjects, self.minDetectionCon,
            self.minTrackingCon, self.modelNames[self.index])


class MY_Picture(Node):
    def __init__(self, name):
        super().__init__(name)
        self.bridge = CvBridge()
        # Subscribe to the compressed image stream published by the ESP32 camera.
        self.sub_img = self.create_subscription(
            CompressedImage, '/espRos/esp32camera', self.handleTopic, 1)
        self.objectron = Objectron()

    def handleTopic(self, msg):
        start = time.time()
        frame = self.bridge.compressed_imgmsg_to_cv2(msg)
        frame = cv.resize(frame, (640, 480))
        # f / F switches the recognized object category.
        action = cv.waitKey(1) & 0xFF
        if action == ord('f') or action == ord('F'):
            self.objectron.configUP()
        frame = self.objectron.findObjectron(frame)
        end = time.time()
        fps = 1 / (end - start)
        text = "FPS : " + str(int(fps))
        cv.putText(frame, text, (20, 30), cv.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 1)
        cv.imshow('frame', frame)


def main():
    print("start it")
    rclpy.init()
    esp_img = MY_Picture("My_Picture")
    try:
        rclpy.spin(esp_img)
    except KeyboardInterrupt:
        pass
    finally:
        esp_img.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
The main flow of the program: subscribe to the image stream from the ESP32, run MediaPipe Objectron recognition on each frame, and then display the processed image with OpenCV.
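To experiment with the Objectron part in isolation, without the ESP32 and the ROS 2 transport, a minimal sketch using a local webcam (assuming device index 0) could look like this:

#!/usr/bin/env python3
import cv2 as cv
import mediapipe as mp

mp_objectron = mp.solutions.objectron
mp_draw = mp.solutions.drawing_utils

# A local webcam stands in for the ESP32 camera stream.
cap = cv.VideoCapture(0)
with mp_objectron.Objectron(static_image_mode=False,
                            max_num_objects=5,
                            min_detection_confidence=0.5,
                            min_tracking_confidence=0.99,
                            model_name='Cup') as objectron:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = objectron.process(cv.cvtColor(frame, cv.COLOR_BGR2RGB))
        if results.detected_objects:
            for detection in results.detected_objects:
                mp_draw.draw_landmarks(frame, detection.landmarks_2d,
                                       mp_objectron.BOX_CONNECTIONS)
                mp_draw.draw_axis(frame, detection.rotation,
                                  detection.translation)
        cv.imshow('objectron', frame)
        if cv.waitKey(1) & 0xFF == ord('q'):  # press q to quit
            break
cap.release()
cv.destroyAllWindows()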