Custom Classifier for Hallucination Detection in Question-Answering Tasks
Large language models (LLMs) are now widely used, but they are not always production-ready. They can generate toxic, harmful, or factually incorrect responses, so it is crucial to test them before exposing them to customers. Our customers require systems that can answer questions from a large document corpus, and detecting whether a model’s output is grounded in the provided context is an important and challenging problem in these industry-specific scenarios. Additionally, no open-source grounding datasets specific to our industry scenarios are available. In this work, we use red-teaming techniques to find test cases that trigger ungrounded responses from the target language model (the model to be deployed). We then use the generated dataset to fine-tune a custom classifier for grounding detection in industry-specific scenarios.

Jaya Susan Mathew
Data Scientist
Jaya Mathew is a Senior Applied Scientist at Microsoft, where she is part of the Cloud for Industry team. Her work focuses on deploying machine learning solutions to solve real business problems for customers in multiple domains. Prior to joining Microsoft, she worked with Nokia and Hewlett-Packard on various analytics and machine learning use cases. She holds an undergraduate degree in Mathematics and a graduate degree in Statistics from the University of Texas at Austin.

Prerna Singh
Applied Scientist 2 @ Microsoft
Prerna is an Applied Scientist 2 in the Industry AI group at Microsoft, where she develops machine learning solutions for industry verticals including finance and sustainability. Prior to Microsoft, she was a graduate student at Carnegie Mellon University (CMU) in Pittsburgh.
At CMU, she developed machine learning models to understand the response time of cancer drugs in individuals and explored the potential of ECG as a biometric signal for human identification. Before CMU, she worked as a researcher in reinforcement learning and robotics at TCS Research and Innovation (TCS R&I) labs, where she designed RL algorithms for training robots to perform target-reaching tasks. Prerna is passionate about machine learning, NLP, and reinforcement learning, and has published multiple papers in these domains.