[Submitted on 20 Dec 2023]
Title: dIR -- Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models
Authors: Pablo M. Rodriguez Bertorello and Jean Rodmond Junior Laguerre (Computer Science Department) and 1 other author
Abstract: Data is stored in both structured and unstructured form. Querying both to power natural language conversations is a challenge. This paper introduces dIR, Discrete Information Retrieval, which provides a unified interface for querying both free text and structured knowledge. Specifically, a Large Language Model (LLM) transforms text into an expressive representation. Once the text has been extracted into columnar form, it can be queried via a text-to-SQL Semantic Parser, with an LLM converting natural language into SQL. Where desired, such a conversation may be carried out by a multi-step reasoning conversational agent. We validate our approach on a proprietary question/answer data set, concluding that dIR makes a whole new class of queries on free text possible when compared to traditionally fine-tuned dense-embedding-model-based Information Retrieval (IR) and SQL-based Knowledge Bases (KB). For sufficiently complex queries, dIR can succeed where no other method stands a chance.
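The two stages named in the abstract (extraction of free text into columnar form, then text-to-SQL querying) can be illustrated with a toy sketch. This is not the authors' implementation: both LLM steps are replaced by hypothetical stand-ins (a regex for extraction, a lookup table for the semantic parser), and the documents, the `orders` schema, and all function names are assumptions made up for illustration.

```python
import re
import sqlite3

# Toy free-text "documents" (invented for this sketch, not from the paper).
DOCS = [
    "Alice ordered 3 widgets.",
    "Bob ordered 5 gears.",
    "Alice ordered 2 gears.",
]


def extract(doc):
    """Stand-in for the LLM extraction step: a regex turns text into a row."""
    m = re.match(r"(\w+) ordered (\d+) (\w+)\.", doc)
    if m is None:
        return None  # text that does not fit the assumed schema is skipped
    return (m.group(1), int(m.group(2)), m.group(3))


def text_to_sql(question):
    """Stand-in for the LLM text-to-SQL parser: a canned question-to-SQL map."""
    canned = {
        "How many items did each customer order?": (
            "SELECT customer, SUM(quantity) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ),
    }
    return canned[question]


def build_table(docs):
    """Load the extracted rows into an in-memory columnar table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, quantity INTEGER, item TEXT)"
    )
    rows = [row for row in map(extract, docs) if row is not None]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def answer(conn, question):
    """Translate the question to SQL and run it against the extracted table."""
    return conn.execute(text_to_sql(question)).fetchall()


conn = build_table(DOCS)
result = answer(conn, "How many items did each customer order?")
print(result)
```

The aggregate in this toy query spans several documents at once, which is the kind of question a single retrieved passage cannot answer on its own; in a real system the two stand-ins would be replaced by model calls.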
Submission history
From: Pablo Rodriguez Bertorello
[v1] Wed, 20 Dec 2023 18:41:44 UTC (7,402 KB)