Abstract
This article presents an approach to building a lightweight, intelligent retrieval system for the authority data commonly used in library and information science. The approach aims to put the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) into practice and to offer a workable technical path around the efficiency bottlenecks of conventional data retrieval methods. The system is built on the DuckDB database system and a set of open-source AI tools, is deployed on-premises on existing IT infrastructure, and lets users retrieve authority data through either SQL or natural language. In practice, the system developed with this approach has proved both practical and cost-effective, helping users efficiently find and extract target information from large volumes of authority data.
Key words
authority data /
data retrieval /
DuckDB
ZHANG Wei. Design and Implementation of Authority Data Retrieval System[J]. Computer & Telecommunication, 2025, 1(6): 24-28.