This article introduces an approach to building a lightweight, intelligent retrieval system for authority data, which is widely used in library and information science. The approach aims to implement the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and to offer a viable technical path around the efficiency bottlenecks of conventional data retrieval methods. Built on the DuckDB database system and a suite of open-source AI tools, the authority data retrieval system can be deployed on-premises on existing IT infrastructure and supports retrieval through either SQL queries or natural language. In practice, a system built with this approach balances practicality and cost-effectiveness well, enabling users to find and extract target information efficiently from large volumes of authority data.
Key words
authority data / data retrieval / DuckDB