"How to 10x chatbot UX?" - Add Image Responses to GPT knowledge retrieval apps

Written by AI Jason - December 30, 2023

Welcome to our blog post on how to enhance chatbot user experience by adding image responses to GPT knowledge retrieval apps!

The Importance of Rich Media in Knowledge Retrieval

When it comes to building chatbot applications that retrieve knowledge, one common issue arises with the use of large language models: the lack of engaging media content. While text-based responses may be useful, they often fall short in capturing the attention and interest of users. This is where the integration of rich media, such as images, gifs, and videos, becomes crucial.

Traditional knowledge retrieval systems, whether they let you chat with PDFs or with websites, typically deliver plain text responses. The information is valuable, but it lacks the visual appeal and engagement that rich media can offer. So why is it challenging to incorporate images into these systems?

The Challenge of Incorporating Images in Chatbot Responses

The primary reason existing Q&A systems can't produce image responses is that image data is never extracted and passed to the large language model in the first place. Take a website as an example. A scraper can return a page as either plain text or raw HTML, and most pipelines keep only the plain text because it is cleaner. Unfortunately, this means that important elements like image URLs and reference links are dropped.

The same issue arises with PDF files. PDF data loaders typically extract only text and ignore images altogether. As a result, no image references reach the large language model, which leaves it unable to retrieve images or include them in its responses.

A Case Study: Building a Knowledge Retrieval Chatbot with Image Responses

In this case study, we will address the challenge mentioned above and demonstrate how to build a large language model Q&A chatbot that can respond with image references. Our solution involves converting raw HTML files into clean markdown format, which preserves both text and image URLs in a structured manner.

Step 1: Scraping the Raw HTML from the Website

First, let's open our project folder in Visual Studio Code and create an .env file to store the API key. We will be using a popular service called Browserless for website scraping, so make sure to sign up for an account and retrieve your API key. Once you have it, add it to the .env file.

Step 2: Importing Libraries and Loading Environment Variables

In the app.py file, we need to import the necessary libraries and load the environment variables stored in the .env file. If you haven't installed the required libraries, you can do so by running "pip install html2text langchain llama-index openai python-dotenv beautifulsoup4" in your terminal.

Step 3: Scraping the Raw HTML

Next, we'll create a function called "scrape_site" that takes a URL as input and retrieves the raw HTML from the website using the Browserless service. We'll define the headers and body for the request, serialize the body as JSON, send the request, and then read the raw HTML from the response.
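As a rough sketch of what "scrape_site" might look like, the snippet below uses only Python's standard library (urllib) instead of a third-party HTTP client. The endpoint URL, the BROWSERLESS_API_KEY variable name, and the build_scrape_request helper are assumptions made for illustration; check the Browserless documentation for the endpoint your account should use.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint: a Browserless "content" route that returns rendered HTML.
# Verify the host and path against your own Browserless account.
BROWSERLESS_CONTENT_URL = "https://chrome.browserless.io/content"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request asking Browserless to render `url` (hypothetical helper)."""
    query = urllib.parse.urlencode({"token": api_key})
    body = json.dumps({"url": url}).encode("utf-8")  # request body serialized as JSON
    headers = {
        "Cache-Control": "no-cache",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{BROWSERLESS_CONTENT_URL}?{query}",
        data=body,
        headers=headers,
        method="POST",
    )


def scrape_site(url: str, api_key: Optional[str] = None) -> str:
    """Return the raw HTML of `url`, fetched through Browserless."""
    # BROWSERLESS_API_KEY is an assumed name for the key stored in .env.
    api_key = api_key or os.environ["BROWSERLESS_API_KEY"]
    request = build_scrape_request(url, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Splitting the request construction from the network call keeps the part that is easy to get wrong (URL, headers, JSON body) inspectable without spending any Browserless credits.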

Step 4: Converting HTML to Markdown

Now that we have the raw HTML, we need to convert it to clean markdown format. To achieve this, we'll create a function called "convert_html_to_markdown" and use the html2text library, which converts HTML to markdown. We'll set its "ignore_links" option to False so that links are preserved, and then run the converter to obtain the cleaned markdown content.
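To make concrete what this conversion has to preserve, here is a toy converter built on the standard library's html.parser. It is not html2text and handles only headings, paragraphs, links, and images. With the real library, the equivalent is roughly creating an html2text.HTML2Text() instance, setting ignore_links = False, and calling handle(html). The _MarkdownSketch class is a hypothetical name used only for this illustration.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = {"p", "div", "li", "br"}


class _MarkdownSketch(HTMLParser):
    """Toy HTML-to-markdown converter: text, headings, links, images."""

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links
        self.parts = []
        self.hrefs = []      # hrefs of the <a> tags that are currently open
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img":
            # Images become ![alt](src), so the URL survives as text.
            self.parts.append(f"![{attrs.get('alt') or ''}]({attrs.get('src') or ''})")
        elif tag == "a" and not self.ignore_links:
            self.hrefs.append(attrs.get("href") or "")
            self.parts.append("[")
        elif tag in HEADINGS:
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in BLOCKS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self.hrefs:
            self.parts.append(f"]({self.hrefs.pop()})")
        elif tag in HEADINGS or tag in BLOCKS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(re.sub(r"\s+", " ", data))


def convert_html_to_markdown(html, ignore_links=False):
    """Convert HTML to a small markdown subset, keeping links and images."""
    parser = _MarkdownSketch(ignore_links=ignore_links)
    parser.feed(html)
    parser.close()
    blocks = [block.strip() for block in re.split(r"\n{2,}", "".join(parser.parts))]
    return "\n\n".join(block for block in blocks if block)
```

The point to notice is that image and link URLs end up inside the text itself, which is what later lets the language model quote them back in an answer.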

Step 5: Handling Image URLs

In some cases, the converted markdown may not have absolute URLs for the image assets. To address this, we need to create a function called "get_base_url" that extracts the domain from the URL and converts the relative image URLs in the HTML to absolute URLs. We'll use the Beautiful Soup library to filter and modify the HTML, ensuring the image URLs are in the correct format.
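A minimal sketch of this step follows. It swaps Beautiful Soup for a regular expression plus urllib.parse from the standard library, which is enough for well-formed img tags but less robust than a real HTML parser. The make_image_urls_absolute name is hypothetical, and resolving against the full page URL with urljoin (rather than only the domain) is a choice made here so that path-relative URLs also resolve correctly.

```python
import re
from urllib.parse import urljoin, urlparse


def get_base_url(url):
    """Return the scheme and domain of `url`, e.g. https://example.com."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}"


# Matches the quoted src attribute of an <img> tag (not data-src and similar).
_IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def make_image_urls_absolute(html, page_url):
    """Rewrite relative <img src> values in `html` as absolute URLs."""

    def fix(match):
        prefix, quote, src = match.groups()
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        return f"{prefix}{quote}{urljoin(page_url, src)}{quote}"

    return _IMG_SRC.sub(fix, html)
```

For example, on a page at https://example.com/docs/intro, a src of "/img/a.png" resolves to https://example.com/img/a.png, while "c.png" resolves to https://example.com/docs/c.png.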

Step 6: Bringing It All Together

Finally, we'll create a function called "get_markdown_from_url" that combines all the previous steps. This function takes a URL as input, scrapes the raw HTML, converts it to markdown, and handles any image URL conversions. The result is a clean markdown format with both text and image reference data.
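One way to wire the steps together is sketched below. To keep the snippet self-contained, the three step functions are passed in as callables instead of being imported, and the usage example substitutes simple stand-ins for them. The parameter names are hypothetical, and fixing image URLs before the markdown conversion is an assumption based on Step 5 operating on the HTML.

```python
from typing import Callable


def get_markdown_from_url(
    url: str,
    scrape: Callable[[str], str],
    absolutize: Callable[[str, str], str],
    convert: Callable[[str], str],
) -> str:
    """Scrape a page, fix its image URLs, and return markdown."""
    raw_html = scrape(url)            # fetch the raw HTML
    html = absolutize(raw_html, url)  # relative image URLs -> absolute
    return convert(html)              # HTML -> markdown


# Usage with stand-in callables; swap in the real step functions.
markdown = get_markdown_from_url(
    "https://example.com/docs",
    scrape=lambda url: '<img src="/a.png">',
    absolutize=lambda html, url: html.replace('src="/', 'src="https://example.com/'),
    convert=lambda html: html.replace('<img src="', "![](").replace('">', ")"),
)
```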

Using the Vector Index for Similarity Search

Now that we have our markdown data, we can create a vector index using the LlamaIndex library. This index allows us to perform similarity searches and retrieve relevant information based on user queries. We'll create a function called "generate_answer" that takes the user query and the vector index as inputs. This function uses the vector index to retrieve a list of relevant nodes (document chunks) and then passes them to the GPT-3.5 model to generate an answer. The answer is formatted in markdown, preserving any image references.

Conclusion

In this blog post, we have explored how to enhance chatbot user experience by incorporating image responses into GPT knowledge retrieval apps. By converting raw HTML into clean markdown format and utilizing the Llama Index library for vector indexing, we can create chatbot applications that go beyond plain text and provide engaging visual content. Whether it's answering questions on a website or extracting information from PDF files, the possibilities for creating interactive and informative chatbot experiences are endless.

FAQs

1. Can this approach be used for chatbots other than knowledge retrieval apps?

Yes, the approach outlined in this blog post can be applied to any chatbot application that can benefit from image responses. Whether it's customer support, e-commerce, or entertainment, incorporating rich media can greatly enhance the user experience.

2. Are there any limitations to using image responses in chatbots?

While image responses offer a more engaging user experience, it's important to consider the file size and loading time implications. Large images can slow down chatbot response times and may not be suitable for all users, especially those with limited bandwidth or mobile data plans.

3. What other types of media can be integrated into chatbot responses?

In addition to images, chatbot responses can also include videos, gifs, audio clips, and even interactive elements such as surveys or quizzes. The choice of media will depend on the specific use case and the preferences of the target audience.

4. Can this approach handle multiple image references in a single response?

Yes, the approach described in this blog post can handle multiple image references in a single response. The markdown format allows for the inclusion of multiple images, gifs, or other media elements using specific syntax.

5. Are there any privacy concerns with using image responses in chatbots?

When using image responses in chatbots, it's essential to consider privacy and data protection regulations. Ensure that any images or media used are properly licensed or sourced from a reliable and legal repository to avoid copyright infringement or privacy violations.

If you enjoyed this blog post and found it helpful, consider subscribing to our newsletter to stay updated on the latest advancements in chatbot technology. We look forward to seeing the exciting applications you create with image responses in your chatbots!
