Reducing the size of a Docling PyTorch Docker image


Over the last couple of days I’ve been working on optimizing the Docker image size of a PDF processing microservice. The service uses Docling, an open-source library developed by IBM Research that can extract text from PDFs and various other document types; internally, Docling uses PyTorch. Here’s a simplified version of our FastAPI microservice that wraps Docling’s functionality:

import os
import shutil
from pathlib import Path
from docling.document_converter import DocumentConverter
from fastapi import FastAPI, UploadFile

app = FastAPI()
UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)
converter = DocumentConverter()

@app.post("/")
async def root(file: UploadFile):
    file_location = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_location, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)
    result = converter.convert(Path(file_location))
    md = result.document.export_to_markdown()
    return {"filename": file.filename, "text": md}
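One detail worth hardening in the snippet above: file.filename comes straight from the client and can contain path separators, so joining it into UPLOAD_DIR as-is would let a crafted filename write outside the directory. A minimal sketch of the fix (the sanitize_filename helper is my naming, not part of the service):

```python
from pathlib import Path

def sanitize_filename(filename: str) -> str:
    """Strip any directory components a client may have smuggled in."""
    name = Path(filename).name  # keeps only the final path component
    if not name or name in {".", ".."}:
        raise ValueError(f"unusable filename: {filename!r}")
    return name

# A traversal attempt collapses to just the base name:
# sanitize_filename("../../etc/passwd") -> "passwd"
```

In the handler, this would replace the bare file.filename in the os.path.join call.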

The microservice workflow is straightforward:

  • Files are uploaded to the uploads directory
  • Docling converter processes the uploaded file and converts it to markdown
  • The markdown content is returned in the response

Here are the dependencies listed in requirements.txt:

fastapi==0.115.8
uvicorn==0.34.0
python-multipart==0.0.20
docling==2.18.0

You can test the service using this cURL command:

curl --request POST \
  --url http://localhost:8000/ \
  --header 'content-type: multipart/form-data' \
  --form file=@/Users/shekhargulati/Downloads/example.pdf

On the first request, Docling downloads the required model from HuggingFace and stores it locally. On my Intel Mac, the initial request for a 4-page PDF took 137 seconds, while subsequent requests took less than 5 seconds. For production environments, a GPU-enabled machine is recommended for better performance.
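In a container, that slow first request can be avoided by moving the model download to build time, so the weights are baked into an image layer. A hedged sketch; the docling-tools CLI and its models download subcommand ship with recent Docling releases, but verify against your version (an alternative is a RUN step that executes one sample conversion during the build):

```dockerfile
# Pre-fetch Docling's models into the image layer so the first request
# doesn't pay the download cost. Assumes docling-tools is on PATH after
# pip-installing docling -- check this against your Docling version.
RUN docling-tools models download
```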

The Docker Image Size Problem

Initially, building the Docker image with this basic Dockerfile resulted in a massive 9.74GB image:

FROM python:3.12-slim
RUN apt-get update \
    && apt-get install -y
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

docling-blog  v1  51d223c334ea   22 minutes ago   9.74GB

The bloat comes from PyTorch’s default pip installation, which bundles CUDA libraries and other GPU-related dependencies that aren’t needed for CPU-only deployments.

The Solution

To optimize the image size, modify the pip installation command to download only CPU-related packages using PyTorch’s CPU-specific package index. Here’s the optimized Dockerfile:

FROM python:3.12-slim
RUN apt-get update \
    && apt-get install -y \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Building with this optimized Dockerfile reduces the image size significantly:

docling-blog v2 ac40f5cd0a01   4 hours ago     1.74GB

The key changes that enabled this optimization:

  1. Added --no-cache-dir so pip doesn’t keep copies of downloaded packages in the image layer
  2. Used --extra-index-url https://download.pytorch.org/whl/cpu so pip resolves the CPU-only PyTorch wheels; this is where almost all of the savings come from
  3. Added rm -rf /var/lib/apt/lists/* to clean up the apt package lists

This optimization reduces the Docker image size by approximately 82%, making it more practical for deployment and distribution.
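The headline number is simple arithmetic on the two docker images readings:

```python
# Sizes reported by docker images for the two builds.
v1_gb = 9.74  # default PyTorch wheels, with CUDA
v2_gb = 1.74  # CPU-only wheels
reduction = (v1_gb - v2_gb) / v1_gb
print(f"{reduction:.0%}")  # prints 82%
```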

