Abstract
To balance access load, distributed storage systems often place files according to their access popularity. This approach has two drawbacks. First, a file's popularity is unknown at the moment the file is first stored, so placement algorithms that depend on global popularity information are impractical. Second, popularity changes over time, so file locations must be adjusted frequently, which incurs high data migration cost. This paper proposes a new fine-grained balanced distributed file placement algorithm that does not depend on popularity information. It instead exploits the correlation between a file's popularity and the time elapsed since its creation: by keeping the amount of data stored on each node similar, at fine granularity, across creation-time intervals, the algorithm achieves good access load balance. Because it relies only on file creation time, which is known when the file is stored and never changes, the algorithm is practical and incurs little migration cost. Experimental results show that, compared with the random placement algorithm used in HDFS, the proposed algorithm distributes access load more evenly and improves access performance.
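The abstract states the placement idea only at a high level, so the following Python sketch is an illustration under assumptions rather than the authors' published algorithm. It assumes creation times are grouped into fixed-length intervals and that each new file goes to the node currently holding the least data from that file's interval. The class name `TimeBucketPlacer`, the `interval_seconds` parameter, and the greedy least-loaded rule are all hypothetical choices made for this example.

```python
from collections import defaultdict


class TimeBucketPlacer:
    """Hypothetical sketch: balance stored bytes per node within each
    creation-time interval, using only creation time and file size."""

    def __init__(self, nodes, interval_seconds=3600):
        if not nodes:
            raise ValueError("at least one storage node is required")
        self.nodes = list(nodes)
        # Assumed fixed interval length; the paper's granularity is not given here.
        self.interval = interval_seconds
        # bytes_stored[bucket][node] -> bytes on that node created in that interval
        self.bytes_stored = defaultdict(lambda: {n: 0 for n in self.nodes})

    def place(self, size_bytes, created_at):
        """Return the node chosen for a file of the given size and creation time."""
        bucket = int(created_at // self.interval)
        load = self.bytes_stored[bucket]
        # Greedy rule (an assumption): pick the node with the least data from
        # this interval; ties go to the node listed first.
        target = min(self.nodes, key=lambda n: load[n])
        load[target] += size_bytes
        return target

    def imbalance(self, bucket):
        """Gap in bytes between the fullest and emptiest node for one interval."""
        load = self.bytes_stored[bucket]
        return max(load.values()) - min(load.values())


placer = TimeBucketPlacer(["n1", "n2", "n3"], interval_seconds=3600)
# (size_bytes, created_at) pairs: three files in the first hour, one in the second.
files = [(100, 10), (100, 20), (100, 30), (50, 3700)]
placements = [placer.place(size, ts) for size, ts in files]
```

Because the decision uses only the creation time and size, both known when the file is written, a placement made this way never has to be revisited as popularity shifts, which is the property the abstract emphasizes.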
Key words
placement algorithm /
distributed file storage system /
file popularity /
load balance /
fine-grained similarity
LIU Shuo (刘硕), XIN Gang (辛刚).
A Fine-grained Load Balanced File Placement Algorithm for Distributed Storage Systems[J]. Computer & Telecommunication (电脑与电信). 2018, 1(1-2): 41-43