Vision

Learn how to use vision capabilities to analyze images

Some models have vision capabilities. This allows you to ask questions about images in natural language and receive natural language or structured data as a response.

Quickstart

Images can be made available to the model either by providing a URL to an image or by providing a base64 encoded image. Images can be passed in the user messages.

Choosing a model

Not all models support vision capabilities. See our models page for more information. We generally recommend using GPT and Claude models for vision tasks.

Next steps

Structured output

Learn how to generate structured data with your text models

Full API reference

View the full list of endpoints and parameters