
Industry-First Observability Features Elevate Trust in Generative AI Applications




deepset Cloud became the first large language model (LLM) platform to provide insight into the precision and fidelity of generative AI responses through a first-of-its-kind “Groundedness Observability Dashboard.” With this January 2024 release, the deepset Cloud platform is redefining how users approach trust, safety, and efficiency in their generative AI applications.


Understanding Hallucinations for Unparalleled Trust and Safety

The Groundedness Observability Dashboard displays trend data for how well generative AI responses are grounded in the source documents. For the first time, this feature provides a quantifiable score for the factuality of an LLM’s output. The results guide developers in modifying their RAG setup, fine-tuning models, and refining prompts to improve the accuracy and reliability of generated responses. Simplified insights into what works enable users to track how reliably the model uses the provided data to answer queries. Tracked over time, these scores also allow comparisons with other widely available LLM platforms.
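To make the idea of a quantifiable groundedness score concrete, here is a minimal sketch using a simple token-overlap heuristic. deepset Cloud's actual metric is not described in this announcement, so the function names, the sentence-splitting approach, and the 0.5 support threshold below are all illustrative assumptions:

```python
def sentence_support(sentence: str, documents: list[str]) -> float:
    """Best-case fraction of a sentence's tokens found in any one document.
    A crude stand-in for the semantic matching a real metric would use."""
    tokens = set(sentence.lower().split())
    if not tokens:
        return 0.0
    return max(
        len(tokens & set(doc.lower().split())) / len(tokens)
        for doc in documents
    )


def groundedness_score(answer: str, documents: list[str],
                       threshold: float = 0.5) -> float:
    """Fraction of answer sentences with enough support in some document."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    grounded = sum(
        1 for s in sentences if sentence_support(s, documents) >= threshold
    )
    return grounded / len(sentences)


docs = ["The Eiffel Tower is in Paris and is 330 metres tall."]
answer = "The Eiffel Tower is in Paris. It was painted blue in 1999."
print(groundedness_score(answer, docs))  # → 0.5 (second sentence ungrounded)
```

Tracked over an evaluation set, a score like this is what lets a dashboard surface trends: a drop signals that answers are drifting away from the source documents.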


Greater Confidence in Response Quality

deepset Cloud’s Source Reference Prediction, a form of generative response annotation, is also now generally available. The feature adds academic-style citations to each LLM-generated answer, referencing the document on which a statement is based. Users can then review the source material to fact-check generated answers or to better understand the source data in its original context.
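The citation idea can be sketched as follows. This is a hypothetical illustration in the spirit of Source Reference Prediction, not deepset's API: the `cite` helper and its token-overlap matching are assumptions for demonstration only.

```python
def cite(answer: str, documents: list[str]) -> str:
    """Append a [n] citation to each sentence of the answer, pointing at
    the source document with the highest token overlap."""
    cited = []
    for sentence in (s.strip() for s in answer.split(".") if s.strip()):
        tokens = set(sentence.lower().split())
        best = max(
            range(len(documents)),
            key=lambda i: len(tokens & set(documents[i].lower().split())),
        )
        cited.append(f"{sentence} [{best + 1}].")
    return " ".join(cited)


docs = ["Haystack is an open source framework.",
        "deepset was founded in 2018."]
print(cite("Haystack is open source. deepset was founded in 2018", docs))
```

Each bracketed number then maps back to a retrievable document, so a reviewer can check every statement against its claimed source.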

The combination of deepset Cloud’s Groundedness Dashboard and Source Reference Prediction gives organizations greater confidence in the quality of the responses in their LLM applications, and provides visibility when an application’s accuracy does not meet requirements.

Using Groundedness to Optimize Retrieval

Groundedness isn’t just a useful metric for measuring the faithfulness of LLM-generated answers to a knowledge base. It can also serve as a proxy for identifying the ideal hyperparameters for the retrieval step. Optimizing the number of documents embedded in the query can cut LLM costs significantly. See our accompanying blog post for an example of how this metric was used to reduce LLM costs by 40% through hyperparameter optimization alone.
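A sweep over the retriever's `top_k` illustrates the approach: pick the smallest document count whose groundedness is close to the best observed, since fewer retrieved documents means fewer prompt tokens and lower cost. The `evaluate_groundedness` stand-in and its numbers are invented for this sketch and are not deepset's results:

```python
def evaluate_groundedness(top_k: int) -> float:
    """Placeholder: in practice, run the RAG pipeline over an evaluation
    set at this top_k and compute the mean groundedness of the answers.
    These scores are illustrative only."""
    observed = {1: 0.62, 2: 0.78, 4: 0.85, 8: 0.86, 16: 0.86}
    return observed[top_k]


def best_top_k(candidates: list[int], tolerance: float = 0.02) -> int:
    """Smallest top_k whose groundedness is within `tolerance` of the
    best candidate's score."""
    scores = {k: evaluate_groundedness(k) for k in candidates}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tolerance)


print(best_top_k([1, 2, 4, 8, 16]))  # → 4: larger top_k adds cost, not quality
```

With the illustrative numbers above, retrieving 8 or 16 documents roughly doubles or quadruples the retrieval-context cost for no measurable gain in groundedness, which is the kind of saving the blog post describes.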


A Trust Layer for Generative AI Applications

These new features emphasize deepset’s commitment to building a robust trust layer within generative AI applications. The new features effectively detect hallucinations and provide benchmarking tools, allowing users to make informed decisions about the reliability of their AI models.



