Abstract: With the rapid growth of Internet content, the pressure to identify network content quickly keeps
increasing. This paper studies content recognition based on a clustering algorithm, which is important for maintaining
the security and health of the network. At present, Internet content recognition relies mainly on keywords, but
it cannot meet the practical demands of network content and server content stored in different forms. To address these practical
problems, this paper studies the recognition of unstructured data stored as graphics, images, and audio. The existing clustering
algorithm is improved according to the patterns by which Internet content develops, so as to filter and discriminate Internet
content as thoroughly as possible and thereby maintain Internet security.