In the realm of natural language processing, question answering over tabular data, also known as TableQA, presents a significant challenge. SAP consequently faces a challenge in enhancing Large Language Models' (LLMs) proficiency in reasoning over its data, because that data resides in tables, and table perturbations limit LLM robustness.
Even now, no method has fully succeeded in combining Deep Learning with structured data analysis, for several reasons, and, spoiler alert, research is advancing towards aggregating multiple reasoning pathways. I will explain in the last section.
When designing an NLP pipeline where one of the data sources is a table, two of the main problems (though not the only ones) are handling the structural nature of tables and the difficulty of linearizing tables without losing critical structural and relational information. Models also struggle with precise numerical computation (e.g., the SAP representation of a number such as 263) and with the risk of crucial details being overshadowed by a dense query result.
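To make the linearization issue concrete, here is a minimal sketch (plain Python, with a made-up two-row table) of flattening a table into prompt text; note that cell-to-header relationships survive only by textual convention, and merged cells or numeric formats would already be lost:

```python
# Minimal sketch: linearizing a small table into prompt text.
# The table is hypothetical; real pipelines must also handle
# merged cells, units, and numeric formatting, which this ignores.

rows = [
    {"rider": "A. Contador (ESP)", "time": "4h 20' 24\""},
    {"rider": "A. Schleck (LUX)", "time": "+ 0' 31\""},
]

def linearize(rows: list[dict]) -> str:
    """Flatten a table into pipe-separated text for an LLM prompt."""
    header = " | ".join(rows[0].keys())
    body = "\n".join(" | ".join(str(v) for v in r.values()) for r in rows)
    return f"{header}\n{body}"

print(linearize(rows))
# rider | time
# A. Contador (ESP) | 4h 20' 24"
# A. Schleck (LUX) | + 0' 31"
```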
Recent advancements like the StructGPT framework (specialized interfaces for reading structured data) show promise in engaging LLMs with structured data, but fall short in integrating symbolic reasoning, a critical aspect for enhancing LLM capabilities in tabular reasoning. Symbolic AI and learning from examples (neural networks) must be combined so a system can both follow rules and learn from many examples.
Other approaches involve building or fine-tuning a model; the likes of TaBERT, TaPas, or PASTA don't solve the fundamental problem of table perturbations, since they require continuous pre-training that becomes more complicated as time and model size grow.
Recent LLM advancements have shown potential for tabular reasoning, with techniques like Chain-of-Thought proving effective even though Chain-of-Thought was not designed for tabular data.
Google Research – Chain-of-Table
Released in the first week of 2024, Chain-of-Table explicitly incorporates tabular data within a reasoning chain (agent). This method guides LLMs to iteratively perform operations and update tables, forming a chain that represents the reasoning process for table-related problems.
The Chain-of-Table process involves defining common table operations and prompting LLMs for step-by-step reasoning. Each operation enriches or condenses the table, aiding in reaching accurate predictions. This iterative process continues until a solution is reached.
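A minimal sketch of that loop, assuming a hypothetical llm() callable that wraps a model API; the operation names echo those used in the paper, but the prompts and control flow here are simplified guesses, not the authors' implementation:

```python
# Sketch of the Chain-of-Table iterative loop (a simplified assumption,
# not the paper's actual implementation). `llm` is any callable that
# takes a prompt string and returns the model's text reply.

OPERATIONS = ["f_select_row", "f_select_column", "f_add_column",
              "f_group_by", "f_sort_by", "[END]"]

def apply_operation(table: str, op: str, args: str) -> str:
    """Stub: a real version would parse `table`, execute `op`, and
    return the enriched or condensed intermediate table."""
    return table  # placeholder so the sketch is self-contained

def chain_of_table(table: str, question: str, llm) -> str:
    chain = []  # the growing chain of (operation, arguments) steps
    while True:
        # 1. Ask the LLM to pick the next table operation.
        op = llm(f"Table:\n{table}\nQuestion: {question}\n"
                 f"Chain so far: {chain}\nNext operation from {OPERATIONS}:")
        if op.strip() == "[END]":
            break
        # 2. Ask for the operation's arguments, apply them, and record
        #    the step; each step enriches or condenses the table.
        args = llm(f"Arguments for {op} on this table:\n{table}")
        table = apply_operation(table, op, args)
        chain.append((op, args))
    # 3. Final answer over the fully transformed table.
    return llm(f"Table:\n{table}\nAnswer the question: {question}")
```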
Google depicts a comparison of three different reasoning methods applied to a complex table: (a) generic reasoning, (b) program-aided reasoning, and (c) the proposed Chain-of-Table approach.
In the scenario, a table combines a cyclist's nationality and name in a single cell. Method (a) struggles to provide the correct answer due to the complexity of multi-step reasoning. Method (b) employs program execution, such as SQL queries, but it too is unable to accurately parse the name and nationality from the table. Method (c), Chain-of-Table, instead uses a series of operations to iteratively transform the complex table into a format better suited to the query, enabling the Large Language Model (LLM) to arrive at the correct answer.
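To illustrate the kind of transformation step (c) relies on, here is a hedged pandas sketch that splits a combined "name (nationality)" cell into two columns; the data and column names are my own illustration, not the paper's exact example:

```python
import pandas as pd

# Hypothetical table where name and nationality share one cell,
# mirroring the cyclist example (values are illustrative).
df = pd.DataFrame({"rider": ["A. Contador (ESP)", "A. Schleck (LUX)"],
                   "rank": [1, 2]})

# One Chain-of-Table-style step: extract nationality into its own
# column so later operations (filter, group by) can use it directly.
df[["name", "country"]] = df["rider"].str.extract(r"^(.*)\s\((\w+)\)$")
print(df[["name", "country", "rank"]])
```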
Google Research presents an interesting approach for integrating LLMs with tables, since fine-tuning an LM for table understanding is complex.
Google's method also brings challenges of its own, not mentioned in the paper:
Using Google's approach, intermediate results are stored in the transformed tables. These intermediate tables with aggregated data live in the agent's memory and could require extensive memory management techniques for the agent, a topic that still lacks much investigation.
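The paper does not prescribe how that memory should be managed; purely as an assumption, a naive baseline could be a bounded store that evicts the oldest intermediate table once a size budget is exceeded:

```python
from collections import OrderedDict

class TableMemory:
    """Naive bounded store for intermediate tables (illustrative only,
    my assumption rather than anything from the paper).

    Keeps the most recent tables and evicts the oldest once an
    approximate character budget is exceeded.
    """
    def __init__(self, budget_chars: int = 50_000):
        self.budget = budget_chars
        self.tables: OrderedDict[str, str] = OrderedDict()

    def put(self, step_id: str, table_text: str) -> None:
        self.tables[step_id] = table_text
        # Evict the oldest intermediate tables until within budget.
        while sum(len(t) for t in self.tables.values()) > self.budget:
            self.tables.popitem(last=False)

    def latest(self) -> str:
        """Return the most recently stored intermediate table."""
        return next(reversed(self.tables.values()))
```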
Microsoft analyzed how good GPT-4 is at tabular data
In November 2023, Microsoft, in a paper titled “GPT4Table: Can Large Language Models Understand Structured Table Data?”, investigated the capabilities of Large Language Models in understanding and processing structured table data.
A traditional LLM-to-DB query, or TableQA, is a two-step process:
- Question Decomposition
- Data Retrieval
This must be orchestrated, using a powerful LLM like OpenAI's and an agent that chains the SQL question to another agent, which interprets the result of that SQL query to form the answer.
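A hedged sketch of that orchestration, where llm() and run_sql() are stand-ins for a model call and a database client (both my assumptions, not a specific framework's API):

```python
def table_qa(question: str, schema: str, llm, run_sql) -> str:
    """Two-step TableQA: decompose the question into SQL, then
    interpret the retrieved result. `llm` and `run_sql` are assumed
    callables for a model API and a database client."""
    # Step 1: question decomposition -- translate the user question
    # into a SQL query against the known schema.
    sql = llm(f"Schema:\n{schema}\nWrite a SQL query answering: {question}")

    # Step 2: data retrieval -- execute the query, then let a second
    # agent turn the raw result into a natural-language answer.
    result = run_sql(sql)
    return llm(f"Question: {question}\nSQL result: {result}\n"
               "Answer the question using only this result.")
```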
(Ye et al., 2023)
Microsoft Research designed a benchmark, called Structural Understanding Capabilities, comprising tasks to evaluate LLMs. These tasks include cell lookup, row retrieval, and size detection, each presenting unique challenges described in the paper.
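To give a flavor of the three task types, here are illustrative prompt templates; the wording is my paraphrase, not the benchmark's exact prompts:

```python
# Illustrative prompt templates for the three task types above
# (paraphrased; not the paper's exact wording or table data).

table_text = "rider | country\nA. Contador | ESP\nA. Schleck | LUX"

prompts = {
    "cell_lookup": f"{table_text}\n\nWhat value is in row 2, column 'country'?",
    "row_retrieval": f"{table_text}\n\nReturn the full row where rider = 'A. Contador'.",
    "size_detection": f"{table_text}\n\nHow many rows and columns does this table have?",
}
```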
Microsoft's research is really extensive in showing GPT-4's capability to understand table structure, but it is still far from perfect; even in simple tasks like table size detection, there are failures.
Not mentioned in the paper: cost and latency. Longer context windows resulting from a query might reduce quality, and processing large amounts of data in a prompt to GPT-4, where a single gigabyte of raw text costs hundreds of dollars in API calls, means it is essential to be strategic and selective in data extraction from extensive corpora.
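A quick back-of-the-envelope check of that cost claim (the chars-per-token ratio is a common rule of thumb and the price is an illustrative assumption, since actual API pricing varies by model and date):

```python
# Back-of-the-envelope cost of pushing 1 GB of raw text through an
# LLM API. The ~4 chars/token ratio is a common rule of thumb for
# English; the price is an illustrative assumption, not a quote.
gigabyte_chars = 1_000_000_000
chars_per_token = 4
price_per_million_tokens = 1.00  # USD, assumed for illustration

tokens = gigabyte_chars / chars_per_token           # ~250M tokens
cost = tokens / 1_000_000 * price_per_million_tokens
print(f"~{tokens/1e6:.0f}M tokens -> ~${cost:,.0f}")  # ~250M tokens -> ~$250
```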
Conclusion
In this blog I discussed the challenges and recent advancements in the field of NLP, focusing on question answering over tabular data (TableQA). I highlighted the difficulties faced by Large Language Models in reasoning over data presented in tables, particularly due to table perturbations that limit their robustness. Chain-of-Table, recently released by Google Research, incorporates tabular data within a reasoning chain, improving LLM capability to reason over table-based questions.