Using Retrieval Augmented Generation to Enhance Spatial Queries in FORMATION Asset Tracking
We recently had the pleasure of hosting another CIEE intern, Ratul Pradhan. During February and March we tasked him with an open question: can we use AI to unlock the data in FORMATION’s software platform so that our customers can ask questions about it using a simple chatbot? Ratul did not disappoint.
The Challenge
FORMATION is an asset tracking solution that allows workers to keep track of things in the workplace. By tracking assets and annotating them with photos, comments, keywords, and more, workers can document and share a lot of information about important objects and features of the workplace. We use a traditional search engine to unlock this information for workers on a map. However, we had a hunch that we could do more with this wealth of information.
OpenAI’s ChatGPT and similar chatbots from others, such as Google’s Gemini and Twitter/X’s Grok, enable users to ask questions via a chat interface. These products are powered by a family of AI models called Large Language Models (LLMs), which can interpret written text, images, or even speech and come up with a good answer.
However, while such models know a lot about the combined content of the public internet, they know nothing about the data we have in our app. Our customers use FORMATION to track assets in the workplace, and they also create tasks and points on the FORMATION map. This provides us with rich data on what is happening in the workplace.
Technical Implementation
To answer questions about this data, an LLM would have to know about it somehow. To enable this, we decided to use Retrieval Augmented Generation (RAG), a technique that makes it possible to ask questions about things these models don’t know about and haven’t been trained on. RAG works by first running a search against a database containing your data and then adding the retrieved results to the context provided to the chatbot, augmenting the information it has to work with. The chatbot can then look at the search results and construct an answer.
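To make that loop concrete, here is a minimal, framework-free sketch in Python. The `retrieve` and `generate` functions are stand-ins we made up for illustration: a real system would compare embedding vectors in a vector database and call an actual LLM.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Toy relevance score: rank documents by word overlap with the query.
    # A real retriever would compare embedding vectors instead.
    return sorted(documents, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stand-in for an LLM call; a real system would send the prompt to a model.
    return f"(LLM answer based on: {prompt!r})"

documents = [
    "The canteen is on the second floor of the Ahoy building.",
    "The 3D printer is on the third floor of the Ahoy building.",
]

question = "Which floor is the canteen on?"
context = "\n".join(retrieve(question, documents))

# Augmentation step: the retrieved results are prepended to the question.
print(generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```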
The primary goal of this project was to find out whether we could use LLMs and RAG to query our data and get reasonably good answers. An additional goal was to do this using self-hosted, open-source models and services, so that we could avoid sending our customer data to third parties. This is of course important to our customers, as what happens in their workplaces is highly confidential.
After some experimentation, we settled on Llama2 and Gemma. These are two popular open-source LLMs that developers can download and run on their laptops. While not as powerful as the closed-source alternatives available from OpenAI, Google, and others, these models are still good enough to be useful in combination with RAG.
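A common way to run such models locally is through Ollama, which Langchain can talk to directly. This is an illustrative setup, not necessarily the exact one we used; it assumes a local Ollama installation with the model pulled beforehand:

```python
from langchain_community.llms import Ollama

# Assumes `ollama pull llama2` has been run (or `ollama pull gemma` for Gemma).
llm = Ollama(model="llama2")
print(llm.invoke("In one sentence, what is an asset tracking system?"))
```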
To create a demo, Ratul used Langchain, a popular tool for experimenting with LLMs and RAG. A considerable challenge was transforming the structured data we have in FORMATION into a form we could insert into a vector database. Vector databases are commonly used to implement semantic search and are therefore a popular tool to use with RAG. They work by using an AI model to calculate embeddings (vectors) that capture the meaning of text or images; these vectors are then stored in the database. When looking up things in the database, a semantic vector is calculated for the query, which is then used to find semantically similar entries. Langchain makes it really easy to experiment with different models for calculating these vectors and to tune how the retrieval part of RAG works.
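A sketch of what this transformation could look like, using made-up FORMATION-style records (the field names are illustrative, not our actual schema) and, as assumptions, a small sentence-transformers embedding model with Chroma as the vector store:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# Hypothetical records standing in for structured FORMATION data.
assets = [
    {"name": "Canteen", "floor": 2, "building": "Ahoy"},
    {"name": "3D printer", "floor": 3, "building": "Ahoy"},
]

# Flatten each structured record into a sentence an embedding model can read.
docs = [
    Document(
        page_content=f"{a['name']} is on floor {a['floor']} of the {a['building']} building.",
        metadata=a,  # keep the original fields alongside the text
    )
    for a in assets
]

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(docs, embeddings)

# Semantic lookup: the query is embedded and matched against the stored vectors.
hits = vectorstore.similarity_search("where can I get lunch?", k=1)
print(hits[0].page_content)
```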
There are of course plenty of challenges with this. Many of these models are designed to interpret human text rather than structured data. Additionally, some of them have only a limited vocabulary and ignore semantically relevant terms outside of it, which limits their understanding of the data. The usual trade-off here is model and vector size: bigger, computationally more expensive models and vectors may yield better results.
After some tinkering, Ratul managed to line everything up and we were able to get answers from our data to questions like: “Given that I’m on the third floor of the Ahoy building, where would I go if I’m hungry?” (Ahoy is the co-working space we are based in.) And we got the satisfying answer that there is a canteen on the second floor.
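Wiring the pieces together for such a question could look roughly like this, reusing the `llm` and `vectorstore` objects from the sketches above. Again, this is an illustrative assumption, not the exact chain Ratul built:

```python
from langchain.chains import RetrievalQA

# Retrieval + generation in one chain: the retriever fetches matching
# documents and stuffs them into the prompt before the LLM answers.
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
result = qa.invoke({
    "query": "Given that I'm on the third floor of the Ahoy building, "
             "where would I go if I'm hungry?"
})
print(result["result"])
```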
Conclusions
Like all good research, this project raises a lot of questions and potential follow-ups. It is also by no means ready to be delivered to our customers. For example, these open-source LLMs still have a lot of issues with hallucinating answers. Additionally, the model we used for calculating embeddings has significant limitations: it has a limited vocabulary, and making it understand spatial information, for example, proved challenging. This limits what we can ask and makes it tricky for an LLM to provide good answers. To address this, we’d have to work on improving the retrieval part of RAG.
However, we do see a credible path to doing more with these technologies and are very excited about the possibilities they unlock. We plan to continue experimenting with this and hope to eventually productize it so we can provide valuable insights to our customers and make our product even more useful to them.
Also, we are looking forward to our next batch of interns who will be joining us over the summer for some more experimentation. If you are interested in joining our mission, reach out to us and keep an eye on our open positions page. We regularly host interns via a variety of international exchange programs.