Unlock AI Potential: Create Powerful Copilots with GPT-4 Turbo and Azure AI
Unlock AI Potential: Create Powerful Copilots with GPT-4 Turbo and Azure AI
What happens when you combine the new GPT-4 Turbo with Vision large language model with Vision and Search in Microsoft's Azure AI platform? Well, the combination, as I'll demonstrate today, can enable direct lookups from image inputs over your organizational data to ground generative AI responses. This marks a significant improvement in the accuracy of natural language processing and image recognition tasks to enable new generative AI scenarios. Video inputs are also uniquely supported when you combine GPT-4 Turbo with Vision AND Azure AI Vision. And best of all, with the new Azure AI Studio, it's easy to build and orchestrate powerful copilot style apps that now leverage the power of both.
The Capabilities of GPT-4 Turbo with Vision
Azure AI Studio provides a single destination to directly leverage GPT-4 Turbo with Vision, which I'll refer to as GPT-4V for short, from the Azure OpenAI service and experiment with it in the playground. The model brings with it extensive open world visual understanding, which means images can now be used as an input to generate text-based responses. To give you a flavor of what it can do, let's dive into the new Azure AI Studio, where I've uploaded an image of a right-angled triangle. I've visually pointed to areas in the image with hand-written questions on the math problem to solve. And you can see GPT-4V is describing the image. it's also acknowledging the math problem and then generating a response with a detailed, step-by-step breakdown of its reasoning.
Here's another example, this time of temporal anticipation. I've uploaded three images and I'm prompting GPT-4V to predict what will happen next based on the images. And it's able to predict that the player will kick the ball towards the goal, attempting to score, with the goalkeeper attempting to block the shot. Vision and language capabilities like this open up brand-new scenarios when building copilot-style apps.
Practical Use Case: Vacation Rental Assistant
Let's explore a practical use case for GPT-4V with Vision. In the system message, I've given the model context of its purpose. In this case, I want it to function as a vacation rental assistant. I've written a prompt to provide a description, along with tips for enhancing the property listing based on the images I've uploaded. And GPT-4V knows what vacation listings look like and how they are worded. So, first, it generates a short title and description, followed by a bulleted list of features of the property. And finally, it generates sample text for an enhanced listing with tips on how to customize things further, all derived from the details in the images. In fact, in all of these examples, you can see GPT-4V's visual reasoning capabilities and how easy it is to experiment with the model using Azure AI Studio.
GPT-4 Turbo with Azure AI Vision
Now let's look at what happens when you combine the baseline capabilities of GPT-4V with Azure AI Vision. Here, for example, I'm building a chat experience that is part of an outdoor adventure site. I've enabled the Azure AI Vision service. So now we can use video as an input for GPT-4 Turbo with Vision through the native integration of Azure AI Vision Video Retrieval.
In my prompt, I'll ask where this destination is with a recommendation on the type of equipment required for camping in the month of January. And you can see it knows the location, and based on the conditions for that time of year, it makes a recommendation for footwear and suggests additional equipment recommendations. It even recommends that I check the weather forecast to make any adjustments before my trip. Behind the scenes, the video is broken down into still image frames using Azure AI Vision's Video Retrieval model, which is automatically deployed on the backend. The most relevant frames are presented to the GPT-4V model, which is then able to reason over images. And combined with the additional context provided in the prompt, it presents a list of recommended equipment based on its open-world understanding. That's pretty powerful in itself.
GPT-4V with Azure AI Vision for Image Analysis
That said, we can do even more when we combine GPT-4V and Azure AI Vision for image analysis tasks. Let's compare the generated response from GPT-4V on its own with GPT-4V with Azure AI Vision enabled in the Azure AI Studio playground. By using Azure AI Vision, bounding boxes appear over the image and specific items are called out, resulting in a more detailed description compared to GPT-4V on its own.
In fact, with Azure AI Vision, we can do direct lookups of image data, especially when grounded with enterprise image data and combined with Azure AI Search for Retrieval Augmented Generation. In our outdoor company enterprise chat app, I'll upload the same image but prompt it to find me a tent like this one under $200. And again, you can see it's able to reason over the image, pick out the tent, and generate a response and recommendation based on the closest item in our catalog. In this case, it recommends the TrailMaster X4 tent with a direct link to purchase. With retrieval augmented search enabled, this level of specificity is made possible because, behind the scenes, Azure AI Search uses vector search with image embeddings along with our state-of-the-art semantic re-ranker for information retrieval. It brings in the metadata that Azure AI Vision has derived from the image, as well as context from the user prompt and reasons over the images and metadata in the catalog to find the top results which are then presented to the GPT-4V large language model to generate an informed response.
How to Build Your Own Experience with Azure AI Studio
Now that you've seen the capabilities and power of GPT-4 Turbo with Vision and Azure AI, you might be wondering how you can build your own AI app. Azure AI Studio makes it easy. By grounding the GPT-4V model with your enterprise data and catalog images, you can create your own generative AI apps.
In Azure AI Studio, you can add your data by selecting or adding a data source from Azure Blob Storage or an existing Azure AI Search instance. You can also upload files and associated metadata manually. Once you've provided your grounding data, Azure AI Search works alongside GPT-4V to generate informed responses based on the images and metadata.
It's easy to translate everything you do in Azure AI Studio into working code. You can view the code behind your app and deploy it as a new web app or update an existing one directly from Azure AI Studio. The end-to-end experience of exploring, building, testing, orchestrating, and deploying your generative AI apps is made simple with Azure AI Studio.
Conclusion
The combination of GPT-4 Turbo with Vision and Microsoft's Azure AI platform with Vision and Search opens up new possibilities for generative AI apps. With improved natural language processing and image recognition tasks, you can create powerful copilot-style apps that leverage the capabilities of both GPT-4V and Azure AI Vision. Whether you're building a vacation rental assistant or an outdoor adventure chat experience, the integration of these technologies provides accurate and detailed responses based on image inputs and contextual prompts. Start unlocking the AI potential today with Azure AI Studio.
FAQs
-
Can I use GPT-4 Turbo with Vision for language-based responses?
Yes, GPT-4 Turbo with Vision can generate text-based responses based on image inputs. Its open world visual understanding allows it to understand and describe images in detail.
-
What scenarios can be enabled with GPT-4 Turbo and Azure AI Vision?
The combination of GPT-4 Turbo and Azure AI Vision opens up new scenarios for building copilot-style apps. From vacation rental assistants to outdoor adventure chat experiences, you can leverage the power of image inputs and contextual prompts to provide accurate and customized responses.
-
How does Azure AI Vision enhance image analysis tasks?
Azure AI Vision enhances image analysis tasks by providing bounding boxes and specific item callouts in response to image inputs. This allows for more detailed descriptions and direct lookups of image data.
-
Can I build my own generative AI app with Azure AI Studio?
Yes, Azure AI Studio provides a comprehensive platform where you can build, test, and deploy your own generative AI apps. By grounding the GPT-4 Turbo with Vision model with your enterprise data and catalog images, you can create customized and powerful apps.
-
How can I translate my AI Studio project into working code?
Azure AI Studio makes it easy to translate your project into working code. You can use the View Code button to see the code behind your app, and from there, deploy it as a new web app or update an existing one.




