mistral-ocr-latest

This documentation is valid for the following list of our models:

  • mistral/mistral-ocr-latest

Model Overview

This Optical Character Recognition API from Mistral sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents—media, text, tables, equations—with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts content as ordered, interleaved text and images.

Maximum file size: 50 MB. Maximum number of pages: 1000.

Set Up Your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

How to Make a Call

Step-by-Step Instructions
  • Copy the code from one of the examples below, depending on whether you want to process an image or a PDF.

  • Replace <YOUR_AIMLAPI_KEY> with your AIML API key from your personal account.

  • Replace the URL of the document or image with the one you need.

  • If you need to use different parameters, refer to the API schema below for valid values and operational logic.

  • Save the modified code as a Python file and run it in an IDE or via the console.

API Schema

Extract text from images using OCR.

POST /v1/ocr

Performs optical character recognition (OCR) to extract text from images, enabling text-based analysis, data extraction, and automation workflows from visual data.

Authorizations

  • Authorization (string, required): Bearer key.

Body

  • model (enum, optional). Possible values: mistral/mistral-ocr-latest.

  • document (one of, required): Document to run OCR on.

  • pages (string | integer[] | null, optional): Specific pages you want to process. Example: "3", "0-2", or [0, 3, 4].

  • include_image_base64 (boolean | null, optional): Include base64 images in the response.

  • image_limit (integer | null, optional): Maximum number of images to extract.

  • image_min_size (integer | null, optional): Minimum height and width of images to extract.

Responses

  • 201: Successfully processed document with OCR.
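To illustrate the schema fields, a request body that uses every optional parameter might look like the following sketch. The values are placeholders, and the shape of the `document` object (a `document_url` type wrapping a URL) is an assumption based on Mistral's OCR conventions rather than something stated on this page:

```python
# Hypothetical /v1/ocr request body exercising every optional field.
body = {
    "model": "mistral/mistral-ocr-latest",
    # Assumed document shape: a document_url object (Mistral OCR convention).
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/doc.pdf",
    },
    "pages": "0-2",                # also accepts "3" or a list like [0, 3, 4]
    "include_image_base64": True,  # return extracted images as base64 strings
    "image_limit": 10,             # extract at most 10 images
    "image_min_size": 50,          # skip images smaller than 50 px in height or width
}
```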

Example #1: Text Recognition From an Image

We’ve found a photo of a short handwritten text for OCR testing and will be passing it to the model via URL:

Thanks, Reddit!
Response
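A request like the one above can be sketched with just the Python standard library. Note that the full endpoint URL (`https://api.aimlapi.com/v1/ocr`) and the `image_url` document shape are assumptions inferred from the `/v1/ocr` path in the schema and from Mistral's OCR API, not values copied from this page:

```python
import json
import urllib.request

API_KEY = "<YOUR_AIMLAPI_KEY>"
# Assumed endpoint host; check your AIML API account for the exact base URL.
OCR_URL = "https://api.aimlapi.com/v1/ocr"

def image_payload(image_url: str) -> dict:
    """Request body for OCR on a single image URL."""
    return {
        "model": "mistral/mistral-ocr-latest",
        # Assumed document shape for images (Mistral OCR convention).
        "document": {"type": "image_url", "image_url": image_url},
    }

def post_ocr(payload: dict) -> dict:
    """POST the payload with Bearer auth and return the parsed JSON response."""
    request = urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())

if __name__ == "__main__":
    result = post_ocr(image_payload("https://example.com/handwritten-note.jpg"))
    for page in result.get("pages", []):
        print(page["markdown"])  # recognized text, returned as Markdown
```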

Example #2: Process a PDF File

Let's process a PDF file from the internet using the described model:

Response
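A PDF request follows the same pattern as the image example, switching the document type and optionally limiting which pages are processed. As before, the endpoint URL and the `document_url` shape are assumptions based on the schema and Mistral's OCR conventions:

```python
import json
import urllib.request

API_KEY = "<YOUR_AIMLAPI_KEY>"
OCR_URL = "https://api.aimlapi.com/v1/ocr"  # assumed host, as in Example #1

def pdf_payload(document_url, pages=None, include_image_base64=False):
    """Request body for OCR on a PDF, optionally limited to specific pages."""
    payload = {
        "model": "mistral/mistral-ocr-latest",
        "document": {"type": "document_url", "document_url": document_url},
    }
    if pages is not None:
        payload["pages"] = pages  # "3", "0-2", or a list like [0, 3, 4]
    if include_image_base64:
        payload["include_image_base64"] = True
    return payload

def post_ocr(payload):
    """POST the payload with Bearer auth and return the parsed JSON response."""
    request = urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())

if __name__ == "__main__":
    result = post_ocr(pdf_payload("https://example.com/sample.pdf", pages="0-2"))
    print("\n\n".join(page["markdown"] for page in result.get("pages", [])))
```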

Example #3: Process a PDF File And Parse the Response

As you can see above, the model returns markdown containing the recognized text with formatting elements preserved (headings, italics, bold text, etc.), along with the locations of images within the text and, if you have enabled the include_image_base64 option, the images themselves in base64 format. However, the markdown is returned as a single string with escaped newline characters and other string artifacts, so you might need to parse the output separately to get clean markdown containing only the formatted text and images. In this example, we’ve written code that does this for us.

Step-by-step example explanation
  • Send OCR request The ocr_process() function sends a POST request to the AIML API with the URL of a PDF document. It asks for OCR results including embedded base64 images.

  • Receive structured OCR output The API returns a JSON response containing extracted Markdown text and optional base64-encoded images for each page.

  • Create output directory The script creates an output_images/ folder to store images extracted from the base64 data.

  • Replace image placeholders For each Markdown block, the script finds image placeholders like ![img-0.jpeg](img-0.jpeg) and replaces them with local links to newly saved images.

  • Detect image format The script checks the base64 image header (data:image/png;base64, etc.) to determine whether to save the image as .png or .jpg.

  • Decode and save images The base64 image is decoded and saved to a file in the output_images/ folder.

  • Combine Markdown All Markdown blocks from all pages are joined into a single .md file (output.md), separated by horizontal rules.

  • Done The final Markdown file includes properly linked images and is ready for use or preview.
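The steps above can be sketched as follows. This version works on an already-fetched response dictionary (the request itself is shown in the earlier examples), and the assumed response shape — a `pages` list whose entries carry `markdown` and `images` with `id` and `image_base64` fields — is inferred from the example responses, so verify it against your actual output:

```python
import base64
import os

def save_images_and_link(page, out_dir):
    """Decode each base64 image on a page and rewrite its Markdown placeholder."""
    markdown = page["markdown"]
    for image in page.get("images", []):
        data = image["image_base64"]  # e.g. "data:image/jpeg;base64,...."
        header, _, encoded = data.partition(",")
        # Detect the image format from the data-URI header.
        ext = ".png" if "png" in header else ".jpg"
        filename = os.path.splitext(image["id"])[0] + ext
        with open(os.path.join(out_dir, filename), "wb") as fh:
            fh.write(base64.b64decode(encoded))
        # Replace placeholders like ![img-0.jpeg](img-0.jpeg) with local links.
        markdown = markdown.replace(f"]({image['id']})", f"]({out_dir}/{filename})")
    return markdown

def response_to_markdown(response, out_dir="output_images"):
    """Join all page blocks into one Markdown string, separated by horizontal rules."""
    os.makedirs(out_dir, exist_ok=True)
    blocks = [save_images_and_link(page, out_dir) for page in response["pages"]]
    return "\n\n---\n\n".join(blocks)

if __name__ == "__main__":
    # result = post_ocr(...)  # obtain the OCR response as in the earlier examples
    # with open("output.md", "w", encoding="utf-8") as fh:
    #     fh.write(response_to_markdown(result))
    pass
```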

Response before parsing
Response after parsing

Contents of the output.md file:

Content of output_images subfolder

How it looks in any Markdown viewer:

It looks almost like the original PDF, but all the text has been recognized, and the markdown is easy to reuse, for example by embedding it in a web page. Enjoy!
