This study addresses third-party model integration and GPU acceleration in the MediaPipe framework. MediaPipe, an open-source mobile AI framework developed by Google, uses a pipeline architecture to deliver low-latency, high-precision real-time processing on mobile devices. Its support for integrating third-party models, however, is limited. To address this, we design a model integration layer and use it to integrate three models: YOLOv11, YOLOv11-Pose, and RTMPose. For GPU acceleration, we examine two aspects, the optimization of model inference parameters and the parsing of inference results, and combine them into a performance optimization scheme. Experiments on the Android platform show that the integration scheme significantly improves model execution efficiency while keeping deployment convenient.
Key words: MediaPipe / YOLOv11 / RTMPose / mobile AI / TFLite