diff --git a/pages/generative-apis/how-to/query-ocr-models.mdx b/pages/generative-apis/how-to/query-ocr-models.mdx
new file mode 100644
index 0000000000..5bccc4ffac
--- /dev/null
+++ b/pages/generative-apis/how-to/query-ocr-models.mdx
@@ -0,0 +1,120 @@
+---
+title: How to query OCR models
+description: Learn how to interact with powerful OCR models using Scaleway's Generative APIs service.
+tags: generative-apis ai-data ocr-models ocr-api
+dates:
+  validation: 2026-04-14
+  posted: 2026-04-14
+---
+import Requirements from '@macros/iam/requirements.mdx'
+
+Scaleway's Generative APIs service allows users to interact with powerful OCR (Optical Character Recognition) models hosted on the platform.
+
+OCR models can extract structured text from documents such as PDFs and images, preserving formatting and layout in the output.
+
+
+OCR models are currently available via the [OCR API](https://www.scaleway.com/en/developers/api/generative-apis/#path-ocr-beta-create-a-text-extraction) only and are not yet integrated into the Scaleway console playground.
+
+
+
+
+- A Scaleway account logged in to the [console](https://console.scaleway.com)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
+- Python 3.7+ installed on your system
+
+## Query OCR models via API
+
+You can query the models programmatically using your favorite tools or languages.
+In the example that follows, we will use the MistralAI Python client.
+
+### Install the MistralAI SDK
+
+Install the MistralAI SDK using pip:
+
+```bash
+pip install mistralai
+```
+
+### Initialize the client
+
+Initialize the MistralAI client with your base URL and API key:
+
+```python
+from mistralai.client import Mistral
+
+# Initialize the client with your server URL and API key
+mistral = Mistral(
+    server_url="https://api.scaleway.ai", # Scaleway's Generative APIs service URL
+    api_key="" # Your unique API secret key from Scaleway
+)
+```
+
+
+This code sample requires `mistralai >= 2.0.0`. For `mistralai <= 1.12.4` (also named `v1`), replace `from mistralai.client import Mistral` with `from mistralai import Mistral`.
+
+
+### Generate an OCR text extraction
+
+You can now generate a text extraction.
+In the example below, the sample PDF file, [scaleway-impact-report-10-pages.pdf](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-impact-report-10-pages.pdf), is sent to the OCR model via a public URL. The extracted text from each page is written to a local Markdown file.
+
+```python
+# Generate a text extraction using the 'mistral-ocr-2512' model
+FILE_URL = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-impact-report-10-pages.pdf"
+MODEL = "mistral-ocr-2512"
+
+res = mistral.ocr.process(
+    model=MODEL,
+    document={
+        "document_url": FILE_URL,
+        "type": "document_url",
+    }
+)
+
+filename = FILE_URL.split("/")[-1].split(".")[0]
+with open(f"{filename}.md", "w") as f:
+    for page in res.pages:
+        f.write(page.markdown)
+
+# Print the generated response
+print(f"File processed. Result markdown file stored in: {filename}.md")
+```
+
+Once the script completes, a Markdown file named `scaleway-impact-report-10-pages.md` is created in the current directory, containing the extracted and formatted text from each page of the PDF.
+
+
+You can replace `FILE_URL` with the URL of any publicly accessible PDF or image file.
+For example, you can provide a file from Object Storage using an [Object Storage pre-signed URL](https://www.scaleway.com/en/docs/object-storage/how-to/access-objects-via-https/).
+
+
+Alternatively, you can provide a local PDF file encoded in Base64 format.
+
+```python
+import base64
+
+FILE_PATH = "path/to/your/file.pdf"
+MODEL = "mistral-ocr-2512"
+
+with open(FILE_PATH, "rb") as file:
+    file_content = file.read()
+    encoded_file = base64.b64encode(file_content).decode("utf-8")
+
+res = mistral.ocr.process(
+    model=MODEL,
+    document={
+        "document_url": f"data:application/pdf;base64,{encoded_file}",
+        "type": "document_url",
+    }
+)
+
+filename = FILE_PATH.split("/")[-1].split(".")[0]
+with open(f"{filename}.md", "w") as f:
+    for page in res.pages:
+        f.write(page.markdown)
+
+# Print the generated response
+print(f"File processed. Result markdown file stored in: {filename}.md")
+```
+
+Refer to the dedicated [OCR API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-ocr-beta-create-a-text-extraction) for a full list of all available parameters.
\ No newline at end of file
diff --git a/pages/generative-apis/menu.ts b/pages/generative-apis/menu.ts
index 4dc37eaebf..c71605a70f 100644
--- a/pages/generative-apis/menu.ts
+++ b/pages/generative-apis/menu.ts
@@ -42,7 +42,11 @@ export const generativeApisMenu = {
           label: 'Query audio models',
           slug: 'query-audio-models'
         },
-        {
+        {
+          label: 'Query OCR models',
+          slug: 'query-ocr-models'
+        },
+        {
           label: 'Query reranking models',
           slug: 'query-reranking-models'
         },
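
Review note on the two Python samples in `query-ocr-models.mdx`: both derive the output name with `split("/")[-1].split(".")[0]`, which would shorten a name such as `my.report.v2.pdf` to `my`, and both write `page.markdown` for consecutive pages with nothing between them. The sketch below shows one possible alternative using only the standard library. The helper names `markdown_name` and `join_pages` are hypothetical and not part of the `mistralai` client. The blank line between pages assumes `page.markdown` does not already end with a newline, which the page does not state.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def markdown_name(source: str) -> str:
    """Return '<stem>.md' for a URL or a POSIX-style local path.

    Hypothetical helper: the query string of a URL (for example the
    signature of a pre-signed URL) is ignored, and dots inside the
    file name are kept.
    """
    path = urlparse(source).path
    stem = PurePosixPath(unquote(path)).stem
    # Fall back to a fixed name when the source has no file component.
    return f"{stem or 'ocr-output'}.md"


def join_pages(pages) -> str:
    """Concatenate the `markdown` attribute of each page, separated by a blank line."""
    return "\n\n".join(page.markdown for page in pages)
```

With these helpers, the end of either sample could become `with open(markdown_name(FILE_URL), "w", encoding="utf-8") as f: f.write(join_pages(res.pages))`. Windows-style paths with backslashes are not handled by this sketch.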