MediaPipe

MediaPipe is an open-source cross-platform framework developed by Google. It provides a set of tools, libraries, and pre-trained models for building various multimedia processing applications. MediaPipe offers capabilities for real-time video and audio processing, including tasks like object detection, tracking, face detection, gesture recognition, and more.

Legacy version

The legacy version of MediaPipe refers to releases of the framework prior to version 0.8. These releases were widely used before the current, more modular architecture was introduced.

Like the current version, the legacy version provided a set of tools, libraries, and pre-trained models for multimedia processing. The notable differences between the two are:

In the legacy version, MediaPipe used a monolithic pipeline architecture where the entire processing graph was defined and executed as a single unit.
The legacy version had limited support for hardware acceleration, so it relied primarily on the CPU. This could limit performance in real-time or computationally intensive applications.
The legacy version had a less modular structure compared to the current version. It was more difficult to reuse individual components or integrate them into other applications.

The legacy versions of MediaPipe are still functional and can be used for certain applications. However, the newer versions offer significant improvements in modularity, flexibility, performance, and ease of use.

Legacy version	Status	Current version
Face Detection	upgraded	Face detection
Face Mesh	upgraded	Face landmark detection
Iris	upgraded	Face landmark detection
Hands	upgraded	Hand landmark detection
Pose	upgraded	Pose landmark detection
Holistic	upgraded	Holistic landmarks detection
Selfie Segmentation	upgraded	Image segmentation
Hair Segmentation	upgraded	Image segmentation
Object Detection	upgraded	Object detection
Box Tracking	support ended
Instant Motion Tracking	support ended
Objectron	support ended
KNIFT	support ended
AutoFlip	support ended
MediaSequence	support ended
YouTube-8M	support ended
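
The migration table can be captured as a small lookup, which is handy when auditing a codebase for legacy solutions. This is only a sketch: the dictionary is transcribed from the table, `None` marks solutions whose support has ended, and the function name `current_equivalent` is hypothetical rather than part of MediaPipe.

```python
# Transcribed from the migration table; None means support has ended.
# "Selfie Segmentation" is MediaPipe's own name for the person segmentation solution.
LEGACY_TO_CURRENT = {
    "Face Detection": "Face detection",
    "Face Mesh": "Face landmark detection",
    "Iris": "Face landmark detection",
    "Hands": "Hand landmark detection",
    "Pose": "Pose landmark detection",
    "Holistic": "Holistic landmarks detection",
    "Selfie Segmentation": "Image segmentation",
    "Hair Segmentation": "Image segmentation",
    "Object Detection": "Object detection",
    "Box Tracking": None,
    "Instant Motion Tracking": None,
    "Objectron": None,
    "KNIFT": None,
    "AutoFlip": None,
    "MediaSequence": None,
    "YouTube-8M": None,
}


def current_equivalent(legacy_name: str) -> str:
    """Return the current solution for a legacy one (hypothetical helper)."""
    if legacy_name not in LEGACY_TO_CURRENT:
        raise KeyError(f"unknown legacy solution: {legacy_name!r}")
    target = LEGACY_TO_CURRENT[legacy_name]
    if target is None:
        raise LookupError(f"support for {legacy_name!r} has ended")
    return target
```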

Face detection

Returns 6 landmarks per detected face
Suited to any live viewfinder experience that requires an accurate facial region of interest
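
To turn a detection into a usable region of interest, the bounding box has to be mapped onto the image. The sketch below assumes the detector reports the box in normalized coordinates (`xmin`, `ymin`, `width`, `height` as fractions of the image size), as the legacy Face Detection solution does. The keypoint order follows that solution's keypoint enumeration; the helper and type names are hypothetical.

```python
from typing import NamedTuple

# Order of the 6 keypoints in the legacy Face Detection solution.
FACE_KEYPOINTS = (
    "right_eye",
    "left_eye",
    "nose_tip",
    "mouth_center",
    "right_ear_tragion",
    "left_ear_tragion",
)


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def relative_box_to_pixels(xmin, ymin, width, height, image_width, image_height):
    """Convert a normalized box to pixel coordinates, clamped to the image."""
    left = max(0, int(xmin * image_width))
    top = max(0, int(ymin * image_height))
    right = min(image_width, int((xmin + width) * image_width))
    bottom = min(image_height, int((ymin + height) * image_height))
    return PixelBox(left, top, right, bottom)
```

Clamping matters because a face near the frame edge can produce a box that extends slightly outside the image.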

Face mesh + Iris = Face landmark detection

Estimates 468 3D face landmarks in real time
Infers the 3D facial surface
Tracks landmarks for the iris, pupil, and eye contours
Does not infer the location at which people are looking

Image of man in profile overlaid with blue mesh demonstrating facial landmark detection.
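
When iris tracking is enabled, the output holds the 468 mesh landmarks followed by extra iris points. The sketch below assumes the common layout of 478 landmarks in total, with two groups of five iris points appended after the mesh. Both the layout and the helper name `split_face_landmarks` are assumptions to verify against the model you actually use.

```python
FACE_MESH_COUNT = 468      # mesh landmarks, as stated in the text
IRIS_POINTS_PER_EYE = 5    # assumption: iris centre plus 4 boundary points


def split_face_landmarks(landmarks):
    """Split a landmark sequence into (mesh, first_iris, second_iris).

    Accepts either a mesh-only sequence (468 items) or a mesh plus
    iris sequence (478 items); anything else is rejected.
    """
    total_with_iris = FACE_MESH_COUNT + 2 * IRIS_POINTS_PER_EYE
    count = len(landmarks)
    if count == FACE_MESH_COUNT:
        return list(landmarks), [], []
    if count != total_with_iris:
        raise ValueError(f"expected 468 or {total_with_iris} landmarks, got {count}")
    split = FACE_MESH_COUNT + IRIS_POINTS_PER_EYE
    mesh = list(landmarks[:FACE_MESH_COUNT])
    first_iris = list(landmarks[FACE_MESH_COUNT:split])
    second_iris = list(landmarks[split:])
    return mesh, first_iris, second_iris
```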

Hands = hand landmark detection

hand and finger tracking solution
infers 21 3D landmarks of a hand from just a single frame

Diagram listing hand landmarks with labels.
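
A minimal sketch of single-image hand landmark detection with the current Tasks API, as one way to put the 21 landmarks to use. It assumes the `mediapipe` package is installed and that a hand landmarker model bundle has been downloaded separately. The default path `hand_landmarker.task` and the function names are assumptions for illustration. The landmark names follow MediaPipe's hand landmark diagram.

```python
# Names of the 21 hand landmarks, in index order.
HAND_LANDMARK_NAMES = (
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
)


def fingertip_indices():
    """Indices of the five fingertip landmarks."""
    return [i for i, name in enumerate(HAND_LANDMARK_NAMES) if name.endswith("_TIP")]


def detect_hands(image_path, model_path="hand_landmarker.task"):
    """Return one {landmark name: (x, y, z)} dict per detected hand.

    The model path is an assumption; download the model bundle first.
    """
    # Imported lazily so the helpers above work without mediapipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.HandLandmarkerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        num_hands=2,
    )
    with vision.HandLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(mp.Image.create_from_file(image_path))
    return [
        {name: (lm.x, lm.y, lm.z) for name, lm in zip(HAND_LANDMARK_NAMES, hand)}
        for hand in result.hand_landmarks
    ]
```

The `x` and `y` values are normalized to the image size, so multiply by the image width and height to get pixel positions.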

Pose = pose landmark detection

Infers 33 3D landmarks and a background segmentation mask for the whole body from RGB video frames

Diagram listing pose landmarks with labels.
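
A common use of the 33 pose landmarks is measuring joint angles, for example in fitness applications. The sketch below computes the angle at the left elbow from three 3D points. The indices 11, 13, and 15 (left shoulder, elbow, wrist) follow MediaPipe's pose landmark diagram; the helper names are hypothetical, and each landmark is assumed to be an `(x, y, z)` tuple.

```python
import math

# Indices from the pose landmark diagram.
LEFT_SHOULDER, LEFT_ELBOW, LEFT_WRIST = 11, 13, 15


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("points must be distinct from the joint")
    cosine = sum(p * q for p, q in zip(ba, bc)) / norm
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def left_elbow_angle(landmarks):
    """Left elbow angle from a sequence of 33 (x, y, z) pose landmarks."""
    if len(landmarks) != 33:
        raise ValueError(f"expected 33 landmarks, got {len(landmarks)}")
    return joint_angle(
        landmarks[LEFT_SHOULDER], landmarks[LEFT_ELBOW], landmarks[LEFT_WRIST]
    )
```

Angles are more reliable when computed from metric world landmarks than from image-normalized ones, because normalized `x` and `y` are scaled by image width and height separately.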

Holistic = Holistic landmarks detection

Simultaneous perception of human pose, face landmarks, and hand tracking in real time

Composite image demonstrating concurrent landmark detection in various scenarios.
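
Because a holistic result combines several detectors, any part can be missing in a given frame, such as a hand that is out of view. The sketch below summarizes which parts were found. It assumes a result object shaped like the legacy Holistic output, where each attribute is either `None` or an object with a `landmark` list; the function name is hypothetical, and `SimpleNamespace` stands in for a real result.

```python
from types import SimpleNamespace

# Attribute names on the legacy Holistic result and their landmark counts.
HOLISTIC_PARTS = (
    ("pose_landmarks", 33),
    ("face_landmarks", 468),
    ("left_hand_landmarks", 21),
    ("right_hand_landmarks", 21),
)


def summarize_holistic(result):
    """Return {part name: number of landmarks found}, with 0 for missing parts."""
    summary = {}
    for attr, _expected in HOLISTIC_PARTS:
        part = getattr(result, attr, None)
        summary[attr] = 0 if part is None else len(part.landmark)
    return summary


if __name__ == "__main__":
    # Stand-in for a frame where the face and right hand were not detected.
    frame = SimpleNamespace(
        pose_landmarks=SimpleNamespace(landmark=[None] * 33),
        face_landmarks=None,
        left_hand_landmarks=SimpleNamespace(landmark=[None] * 21),
        right_hand_landmarks=None,
    )
    print(summarize_holistic(frame))
```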

Selfie Segmentation + Hair Segmentation = Image segmentation

Person and background
Person’s hair only
Person’s hair, face, skin, clothing, and accessories

Side by side image of a person and the same image with person sliced out.
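
Once a person mask is available, replacing the background is a per-pixel choice between two images. The sketch below assumes a confidence mask with values in [0, 1], where higher means "person", and uses nested lists in place of image arrays to stay dependency-free. Real code would more likely use NumPy, and the function name `composite` is hypothetical.

```python
def composite(foreground, background, mask, threshold=0.5):
    """Keep foreground pixels where the mask is confident, else use background.

    All three inputs are row-major nested lists with the same dimensions.
    """
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    output = []
    for fg_row, bg_row, mask_row in zip(foreground, background, mask):
        if not (len(fg_row) == len(bg_row) == len(mask_row)):
            raise ValueError("inputs must have the same number of columns")
        output.append(
            [fg if m >= threshold else bg for fg, bg, m in zip(fg_row, bg_row, mask_row)]
        )
    return output
```

Raising the threshold trims uncertain edge pixels at the cost of cutting into hair and other fine detail.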

References

https://developers.google.com/mediapipe

https://developers.google.com/mediapipe/solutions/guide

https://pypi.org/project/mediapipe/