A Focused Crawler for the Deep Web


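Abstract

A focused crawler is the program a search engine uses to fetch web pages automatically, and it is a key step in discovering and indexing Deep Web data. This paper presents a focused crawler that uses the PageRank algorithm to assess the importance of web pages and applies site structure graph pruning together with a page judgment algorithm to filter out URLs unrelated to the topic, effectively improving the quality and efficiency of Deep Web data integration.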

Author: Huang Haojing (黄昊晶)

Keywords: focused crawler, Deep Web, PageRank, site structure graph pruning, page judgment

Abstract:

A focused crawler is the program a search engine uses to download web pages automatically, and it is a key step by which search engines find and index Deep Web data. This paper describes a focused crawler that uses the PageRank algorithm to analyze the importance of web pages and uses site structure pruning and a page determination algorithm to filter out URLs that are unrelated to the target topic, effectively improving the quality and efficiency of Deep Web data integration.

Key words: Focused crawler; Deep Web; PageRank; Site structure pruning; Page determination
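
The abstract names three components: PageRank scoring, pruning of the site structure graph, and a page judgment step that filters off-topic URLs. The sketch below shows one way these pieces could fit together. It is not the paper's implementation, which this page does not give. The function names (`pagerank`, `is_on_topic`, `prioritized_frontier`), the damping factor of 0.85, the iteration count, and the keyword match used as the page judgment step are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the teleportation share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


def is_on_topic(url, keywords):
    """Hypothetical stand-in for page judgment: keyword match on the URL."""
    lowered = url.lower()
    return any(k in lowered for k in keywords)


def prioritized_frontier(links, keywords):
    """Prune off-topic nodes from the link graph, then sort by PageRank."""
    kept = {
        page: [t for t in targets if is_on_topic(t, keywords)]
        for page, targets in links.items()
        if is_on_topic(page, keywords)
    }
    rank = pagerank(kept)
    return sorted(rank, key=rank.get, reverse=True)
```

A crawler built along these lines would pop URLs from the front of the returned list, so that pages judged both on-topic and important are fetched first. A real page judgment step would likely inspect page content (for example, whether the page contains a search form) rather than the URL alone.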

Received: 1900-01-01 Issue date: 2011-03-10 Published: 2011-03-10

Cite this article:

黄昊晶. 一种Deep Web聚焦爬虫[J]. , 2011, 1(03): 0-0.
Huang Haojing. A Kind of Deep Web Focused Crawler[J]. , 2011, 1(03): 0-0.

Link to this article:

https://www.computertelecom.com.cn/CN/ or https://www.computertelecom.com.cn/CN/Y2011/V1/I03/0