Reducing the size of a Docling PyTorch Docker image


Over the last couple of days I’ve been working on optimizing the Docker image size of a PDF processing microservice. The service uses Docling, an open-source library developed by IBM Research that can extract text from PDFs and various other document types; internally, Docling uses PyTorch. Here’s a simplified version of our FastAPI microservice that wraps Docling’s functionality:

import os
import shutil
from pathlib import Path
from docling.document_converter import DocumentConverter
from fastapi import FastAPI, UploadFile

app = FastAPI()
UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)
converter = DocumentConverter()

@app.post("/")
async def root(file: UploadFile):
    file_location = os.path.join(UPLOAD_DIR, file.filename)
    with open(file_location, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)
    result = converter.convert(Path(file_location))
    md = result.document.export_to_markdown()
    return {"filename": file.filename, "text": md}
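One detail worth hardening in the snippet above: file.filename comes straight from the client and can contain path separators, so joining it into UPLOAD_DIR as-is would let a crafted filename write outside the directory. A minimal sketch of the fix (the sanitize_filename helper is my naming, not part of the service):

```python
from pathlib import Path

def sanitize_filename(filename: str) -> str:
    """Strip any directory components a client may have smuggled in."""
    name = Path(filename).name  # keeps only the final path component
    if not name or name in {".", ".."}:
        raise ValueError(f"unusable filename: {filename!r}")
    return name

# A traversal attempt collapses to just the base name:
# sanitize_filename("../../etc/passwd") -> "passwd"
```

In the handler, this would replace the bare file.filename in the os.path.join call.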

The microservice workflow is straightforward:

  • Files are uploaded to the uploads directory
  • Docling converter processes the uploaded file and converts it to markdown
  • The markdown content is returned in the response

Here are the dependencies listed in requirements.txt:

fastapi==0.115.8
uvicorn==0.34.0
python-multipart==0.0.20
docling==2.18.0

You can test the service using this cURL command:

curl --request POST \
  --url http://localhost:8000/ \
  --header 'content-type: multipart/form-data' \
  --form file=@/Users/shekhargulati/Downloads/example.pdf

On the first request, Docling downloads the required model from HuggingFace and stores it locally. On my Intel Mac, the initial request for a 4-page PDF took 137 seconds, while subsequent requests took less than 5 seconds. For production environments, a GPU-enabled machine is recommended for better performance.
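In a container, that slow first request can be avoided by moving the model download to build time, so the weights are baked into an image layer. A hedged sketch; the docling-tools CLI and its models download subcommand ship with recent Docling releases, but verify against your version (an alternative is a RUN step that executes one sample conversion during the build):

```dockerfile
# Pre-fetch Docling's models into the image layer so the first request
# doesn't pay the download cost. Assumes docling-tools is on PATH after
# pip-installing docling -- check this against your Docling version.
RUN docling-tools models download
```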

The Docker Image Size Problem

Initially, building the Docker image with this basic Dockerfile resulted in a massive 9.74GB image:

FROM python:3.12-slim
RUN apt-get update \
    && apt-get install -y
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

docling-blog  v1  51d223c334ea   22 minutes ago   9.74GB

The bloat comes from PyTorch’s default pip installation, which bundles CUDA libraries and other GPU-related dependencies that aren’t needed for CPU-only deployments.

The Solution

To optimize the image size, modify the pip installation command to download only CPU-related packages using PyTorch’s CPU-specific package index. Here’s the optimized Dockerfile:

FROM python:3.12-slim
RUN apt-get update \
    && apt-get install -y \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Building with this optimized Dockerfile reduces the image size significantly:

docling-blog v2 ac40f5cd0a01   4 hours ago     1.74GB

The key changes that enabled this optimization:

  1. Added --no-cache-dir so pip doesn’t keep copies of downloaded packages in the image layer
  2. Used --extra-index-url https://download.pytorch.org/whl/cpu so pip resolves the CPU-only PyTorch wheels; this is where almost all of the savings come from
  3. Added rm -rf /var/lib/apt/lists/* to clean up the apt package lists

This optimization reduces the Docker image size by approximately 82%, making it more practical for deployment and distribution.
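The headline number is simple arithmetic on the two docker images readings:

```python
# Sizes reported by docker images for the two builds.
v1_gb = 9.74  # default PyTorch wheels, with CUDA
v2_gb = 1.74  # CPU-only wheels
reduction = (v1_gb - v2_gb) / v1_gb
print(f"{reduction:.0%}")  # prints 82%
```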

