MediaPipe
MediaPipe is an open-source cross-platform framework developed by Google. It provides a set of tools, libraries, and pre-trained models for building various multimedia processing applications. MediaPipe offers capabilities for real-time video and audio processing, including tasks like object detection, tracking, face detection, gesture recognition, and more.
Legacy version
The legacy version of MediaPipe refers to releases of the framework prior to version 0.8. These releases were widely used before MediaPipe introduced its more modular architecture.
Like the current version, the legacy version provided tools, libraries, and pre-trained models for multimedia processing. The two differ in several notable ways:
- In the legacy version, MediaPipe used a monolithic pipeline architecture where the entire processing graph was defined and executed as a single unit.
- The legacy version had limited support for hardware acceleration, which meant that it relied primarily on the CPU for processing. This could result in suboptimal performance for certain applications that require real-time or computationally intensive tasks.
- The legacy version had a less modular structure compared to the current version. It was more difficult to reuse individual components or integrate them into other applications.
The legacy versions of MediaPipe are still functional and can be used for certain applications. However, the newer versions offer significant improvements in modularity, flexibility, performance, and ease of use. The table below shows how each legacy solution maps to the current version.
| Legacy version | Status | Current version |
|---|---|---|
| Face Detection | upgraded | Face Detection |
| Face Mesh | upgraded | Face landmark detection |
| Iris | upgraded | Face landmark detection |
| Hands | upgraded | Hand landmark detection |
| Pose | upgraded | Pose landmark detection |
| Holistic | upgraded | Holistic landmarks detection |
| Selfie Segmentation | upgraded | Image segmentation |
| Hair Segmentation | upgraded | Image segmentation |
| Object Detection | upgraded | Object Detection |
| Box tracking | support ended | |
| Instant motion tracking | support ended | |
| Objectron | support ended | |
| KNIFT | support ended | |
| AutoFlip | support ended | |
| MediaSequence | support ended | |
| YouTube 8M | support ended | |
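
When migrating older code, the mapping in the table above can be kept as a small lookup. The following is a minimal sketch; `LEGACY_TO_CURRENT` and `migration_target` are hypothetical names chosen for illustration and are not part of MediaPipe.

```python
from typing import Optional

# Mirrors the table above; None marks legacy solutions whose support has ended.
LEGACY_TO_CURRENT = {
    "Face Detection": "Face Detection",
    "Face Mesh": "Face landmark detection",
    "Iris": "Face landmark detection",
    "Hands": "Hand landmark detection",
    "Pose": "Pose landmark detection",
    "Holistic": "Holistic landmarks detection",
    "Selfie Segmentation": "Image segmentation",
    "Hair Segmentation": "Image segmentation",
    "Object Detection": "Object Detection",
    "Box tracking": None,
    "Instant motion tracking": None,
    "Objectron": None,
    "KNIFT": None,
    "AutoFlip": None,
    "MediaSequence": None,
    "YouTube 8M": None,
}


def migration_target(legacy_name: str) -> Optional[str]:
    """Return the current solution that replaces a legacy one, or None if support ended."""
    if legacy_name not in LEGACY_TO_CURRENT:
        raise KeyError(f"Unknown legacy solution: {legacy_name}")
    return LEGACY_TO_CURRENT[legacy_name]
```

For example, `migration_target("Iris")` returns `"Face landmark detection"`, while `migration_target("Objectron")` returns `None`.
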
Face detection
- returns 6 landmarks for each detected face
- suited to any live viewfinder experience that requires an accurate facial region of interest
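
A facial region of interest is typically used to crop the frame before further processing. The sketch below assumes the detector reports the box as `xmin`, `ymin`, `width`, and `height` normalized to the range 0 to 1 relative to the image size; that convention and the helper name `roi_to_pixels` are assumptions made for illustration, so check the output format of the version you use.

```python
from typing import Tuple


def roi_to_pixels(
    xmin: float,
    ymin: float,
    width: float,
    height: float,
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel (left, top, right, bottom), clamped to the image."""
    left = max(0, int(xmin * image_width))
    top = max(0, int(ymin * image_height))
    right = min(image_width, int((xmin + width) * image_width))
    bottom = min(image_height, int((ymin + height) * image_height))
    return left, top, right, bottom


# Assumed example values: a box covering the middle of a 640x480 frame.
left, top, right, bottom = roi_to_pixels(0.25, 0.5, 0.5, 0.25, 640, 480)
```

Clamping matters because a detected face near the frame edge can produce a box that extends slightly outside the image.
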
Face Mesh + Iris = Face landmark detection
- estimates 468 3D face landmarks in real time
- infers the 3D facial surface
- tracks landmarks of the iris, pupil, and eye contours
- does not infer the location at which people are looking

Hands = Hand landmark detection
- hand and finger tracking solution
- infers 21 3D landmarks of a hand from just a single frame
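
The 21 landmarks can be reduced to a hand bounding box in pixel space. This is a minimal sketch that assumes each landmark is an `(x, y, z)` tuple whose `x` and `y` are normalized to the range 0 to 1 by image width and height; the function name `hand_bounding_box` is hypothetical, and `z` is ignored here.

```python
from typing import List, Tuple

NUM_HAND_LANDMARKS = 21  # landmark count per hand, as stated above


def hand_bounding_box(
    landmarks: List[Tuple[float, float, float]],
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels enclosing all hand landmarks."""
    if len(landmarks) != NUM_HAND_LANDMARKS:
        raise ValueError(
            f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(landmarks)}"
        )
    # Scale normalized coordinates to pixel coordinates.
    xs = [x * image_width for x, _, _ in landmarks]
    ys = [y * image_height for _, y, _ in landmarks]
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))
```

Checking the landmark count up front catches cases where a partial or malformed result is passed in.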

Pose = Pose landmark detection
- infers 33 3D landmarks and a background segmentation mask for the whole body from RGB video frames

Holistic = Holistic landmarks detection
- simultaneous, real-time perception of human pose, face landmarks, and hand tracking

Selfie Segmentation + Hair Segmentation = Image segmentation
- Person and background
- Person’s hair only
- Person’s hair, face, skin, clothing, and accessories
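
A common use of the person/background output is replacing the background. The sketch below uses plain Python lists to stay self-contained. It assumes the mask holds a per-pixel person confidence between 0 and 1, and the 0.5 threshold is an arbitrary choice; `replace_background` is a hypothetical helper, not a MediaPipe function.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b)


def replace_background(
    image: List[List[Pixel]],
    mask: List[List[float]],
    background_color: Pixel,
    threshold: float = 0.5,
) -> List[List[Pixel]]:
    """Keep pixels whose person confidence exceeds the threshold; recolor the rest."""
    return [
        [
            pixel if confidence > threshold else background_color
            for pixel, confidence in zip(image_row, mask_row)
        ]
        for image_row, mask_row in zip(image, mask)
    ]


# Assumed toy data: a 1x2 image where only the first pixel belongs to the person.
image = [[(10, 20, 30), (40, 50, 60)]]
mask = [[0.9, 0.1]]
result = replace_background(image, mask, (0, 255, 0))
```

In practice the same per-pixel selection would be done with an array library for speed; the list version only illustrates the logic.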

References
https://developers.google.com/mediapipe