Abstract
This study addresses efficient third-party model integration and GPU acceleration in the MediaPipe framework. MediaPipe, Google's open-source mobile AI framework, uses a pipeline architecture to deliver low-latency, high-accuracy real-time processing on mobile devices. Its support for integrating third-party models, however, is limited. To address this, we propose a model integration layer and use it to integrate three models: YOLOv11, YOLOv11-Pose, and RTMPose. For GPU acceleration, we examine two aspects, the tuning of model inference parameters and the parsing of inference results, and combine them into a complete performance optimization scheme. Experiments on the Android platform show that the proposed integration scheme significantly improves model execution efficiency while keeping deployment simple.
Key words
MediaPipe /
YOLOv11 /
RTMPose /
mobile AI /
TFLite
Funding
National Natural Science Foundation of China, grant no. 81373883; 2025 College Students' Innovation Training Program, project no. xj2025118450147