Enhancing Document Q&A with Chart Understanding
This talk will discuss enhancing document question answering with plot/graph understanding. Currently, the document question answering task involves three steps: (i) text extraction, where the document is parsed with a tool such as pdf2text; (ii) prompt creation, where the extracted text is combined with the question; and (iii) querying GPT with the prompt. Current text extraction tools cannot parse the data inside the chart images present in a document, yet these charts and graphs often contain important information that can improve the quality of the generated answers. In this work, we discuss how visual language models can extract the data inside chart images; this extracted numeric information is added to the prompt to improve the quality of the question answering task. A minimal illustrative sketch of this pipeline appears after the speaker bio below.

Prerna Singh
Applied Scientist 2 @ Microsoft

Prerna is currently working as an Applied Scientist 2 in the Industry AI group at Microsoft, where she develops machine learning solutions for industrial verticals including finance and sustainability. Prior to Microsoft, she was a graduate student at Carnegie Mellon University (CMU) in Pittsburgh. At CMU, she worked on developing machine learning models to understand the response time of cancer drugs in individuals and also explored the potential of ECG as a biometric signal for human identification. Before CMU, she worked as a researcher in reinforcement learning and robotics at TCS Research and Innovation (TCS R&I) labs, where she designed RL algorithms for training robots to perform target-reaching tasks. Prerna is passionate about machine learning, NLP, and reinforcement learning, and has published multiple papers in these domains.
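The sketch below illustrates the three-step pipeline from the abstract, with chart data folded into the prompt. The helpers extract_pdf_text() and extract_chart_data() are hypothetical placeholders standing in for a pdf-to-text tool and a visual language model respectively, and the model name and prompt wording are assumptions for illustration, not the speaker's exact setup.

```python
# Sketch of document Q&A augmented with chart understanding.
# extract_pdf_text() and extract_chart_data() are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_pdf_text(pdf_path: str) -> str:
    """Placeholder for a pdf-to-text tool such as pdf2text."""
    raise NotImplementedError


def extract_chart_data(pdf_path: str) -> list[str]:
    """Placeholder for a visual language model that turns each chart
    image in the document into a textual/numeric summary."""
    raise NotImplementedError


def answer_question(pdf_path: str, question: str) -> str:
    # Step (i): extract the document text.
    doc_text = extract_pdf_text(pdf_path)

    # Chart understanding: recover the numeric data inside chart images,
    # which plain text extraction misses.
    chart_summaries = "\n".join(extract_chart_data(pdf_path))

    # Step (ii): prompt creation - combine extracted text, chart data, and the question.
    prompt = (
        f"Document text:\n{doc_text}\n\n"
        f"Data extracted from charts:\n{chart_summaries}\n\n"
        f"Question: {question}"
    )

    # Step (iii): query GPT with the assembled prompt.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```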