Research on Third-party Model Integration and GPU Acceleration Strategies in Mediapipe Framework

ZHANG Gang; YUAN Ting; XIAO Ning-jie; YANG Hong-kai; YANG Zong-jun

Computer & Telecommunication, 2025, Vol. 1, Issue (6): 37-41.

Abstract

This study focuses on optimizing third-party model integration and GPU acceleration strategies in the Mediapipe framework. Mediapipe, an open-source mobile AI framework developed by Google, uses a pipeline architecture to achieve low-latency, high-precision real-time processing on mobile devices. However, its support for integrating third-party models is limited. To address this, we design a model integration layer and use it to integrate three models: YOLOv11, YOLOv11-Pose, and RTMPose. For GPU acceleration, we examine two aspects, optimizing model inference parameters and parsing inference results, and combine them into an overall performance optimization scheme. Experimental results on the Android platform show that the integration solution substantially improves model execution efficiency while remaining convenient to deploy.
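The inference-result-parsing step can be illustrated with a minimal sketch of YOLO-style output decoding. It assumes a YOLOv11 detection model exported to TFLite whose single output tensor, with the batch dimension removed, is laid out channel-major as [4 + num_classes][num_anchors] (for example 84 × 8400 for 80 classes at a 640 × 640 input), with each box given as (cx, cy, w, h). The `Detection` type, the function names, and the 0.25/0.45 thresholds are hypothetical illustrative choices, not the paper's implementation. An on-device Android version would read the interpreter's output buffer directly rather than Python lists, and would also map boxes back through any letterbox scaling, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    # Box corners in the same units as the model output (pixels or normalized).
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    class_id: int


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decode_yolo_output(
    output: Sequence[Sequence[float]],
    score_threshold: float = 0.25,  # hypothetical default
    iou_threshold: float = 0.45,    # hypothetical default
) -> List[Detection]:
    """Decode an assumed channel-major [4 + num_classes][num_anchors] tensor."""
    num_channels = len(output)
    num_anchors = len(output[0])
    candidates: List[Detection] = []
    for i in range(num_anchors):
        # Pick the highest-scoring class for this anchor (no separate objectness score).
        best_cls, best_score = 0, output[4][i]
        for c in range(1, num_channels - 4):
            s = output[4 + c][i]
            if s > best_score:
                best_cls, best_score = c, s
        if best_score < score_threshold:
            continue
        cx, cy, w, h = (output[k][i] for k in range(4))
        # Convert center/size to corner coordinates.
        candidates.append(
            Detection(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, best_score, best_cls)
        )
    # Greedy, class-aware non-maximum suppression: keep the highest-scoring box and
    # drop any later box of the same class that overlaps a kept box too much.
    candidates.sort(key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        if all(k.class_id != det.class_id or iou(k, det) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    # Toy tensor: 2 classes, 3 anchors (rows: cx, cy, w, h, class 0, class 1).
    toy = [
        [0.5, 0.52, 0.2],
        [0.5, 0.5, 0.8],
        [0.2, 0.2, 0.1],
        [0.2, 0.2, 0.1],
        [0.9, 0.8, 0.1],
        [0.05, 0.1, 0.7],
    ]
    # Two detections: class 0 (score 0.9) and class 1 (score 0.7);
    # anchor 1 overlaps anchor 0 with IoU of about 0.82 and is suppressed.
    for d in decode_yolo_output(toy):
        print(d)
```

Keeping the decoding as a single pass over anchors with an early score-threshold check discards most of the 8400 candidates before the quadratic NMS step, which is the main reason result parsing matters for end-to-end latency on mobile devices.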

Key words

Mediapipe / YOLOv11 / RTMPose / mobile AI / TFLite

Cite this article

ZHANG Gang, YUAN Ting, XIAO Ning-jie, YANG Hong-kai, YANG Zong-jun. Research on Third-party Model Integration and GPU Acceleration Strategies in Mediapipe Framework[J]. Computer & Telecommunication. 2025, 1(6): 37-41
