Research on Third-party Model Integration and GPU Acceleration Strategies in Mediapipe Framework

ZHANG Gang, YUAN Ting, XIAO Ning-jie, YANG Hong-kai, YANG Zong-jun

Computer & Telecommunication, 2025, Vol. 1, Issue 6: 37-41.

Applied Technology and Research

Abstract

This study addresses efficient third-party model integration and GPU acceleration in the Mediapipe framework. Mediapipe, Google's open-source framework for on-device AI, uses a pipeline architecture to deliver low-latency, high-accuracy real-time processing on mobile devices. Its support for integrating third-party models, however, is limited. To address this, we propose a model integration layer and use it to integrate three models: YOLOv11, YOLOv11-Pose, and RTMPose. For GPU acceleration, we examine two aspects, the tuning of model inference parameters and the parsing of inference results, and propose a performance optimization scheme that covers both. Experiments on the Android platform show that the integration scheme markedly improves model execution efficiency while keeping deployment convenient.
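To make "inference result parsing" concrete, the following is a minimal sketch of how a YOLO-style detection output might be decoded on the CPU side. It is not the authors' implementation, which the abstract does not describe. The tensor layout (a flattened, row-major `[4 + num_classes, num_anchors]` array holding `cx, cy, w, h` followed by per-class scores), the function names, and the default thresholds are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# x1, y1, x2, y2, score, class_id
Detection = Tuple[float, float, float, float, float, int]


def decode_channel_first(flat: Sequence[float], num_classes: int,
                         num_anchors: int,
                         conf_threshold: float = 0.25) -> List[Detection]:
    """Decode an assumed [4 + num_classes, num_anchors] tensor into boxes."""
    rows = 4 + num_classes
    if len(flat) != rows * num_anchors:
        raise ValueError("tensor size does not match the assumed layout")
    detections: List[Detection] = []
    for a in range(num_anchors):
        # Row r of anchor a sits at index r * num_anchors + a (row-major).
        scores = [flat[(4 + c) * num_anchors + a] for c in range(num_classes)]
        class_id = max(range(num_classes), key=scores.__getitem__)
        if scores[class_id] < conf_threshold:
            continue  # drop low-confidence anchors before reading box data
        cx, cy, w, h = (flat[r * num_anchors + a] for r in range(4))
        detections.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2,
                           scores[class_id], class_id))
    return detections


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_threshold: float = 0.45) -> List[Detection]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(det[5] != k[5] or iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Filtering by confidence before reading the box coordinates keeps the per-frame work roughly proportional to the number of confident anchors, which is one plausible reason result parsing matters for end-to-end latency on mobile devices.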


CLC number: TP391.4
