Using boto3 with Jython


A few days ago I needed to use boto3 with Jython. boto3 is the AWS SDK for Python, which you can use to work with the various Amazon cloud APIs, including EC2. Jython is the JVM implementation of Python. We were packaging our Jython scripts, boto3, and its dependencies inside a JAR. boto3 and Jython work well together in the normal setup, i.e. when boto3 can load its data model files from the filesystem. This breaks when you package your script and its dependencies inside a JAR, because the model files are then no longer on the filesystem; they are only available on the classpath. In this post, I will show how we worked around this limitation with a custom data loader.

Using boto3 with Jython

If you don’t have to package your Jython scripts inside a JAR file, using boto3 with Jython is easy. Create a new virtualenv and install boto3 with pip as shown below.

$ virtualenv venv --python=python2.7
$ source venv/bin/activate
$ pip install boto3

Download the Jython standalone JAR from the Jython website. Then add the virtualenv's site-packages directory to Jython's module search path by setting JYTHONPATH, and start Jython, as shown below.

$ export JYTHONPATH=~//venv/lib/python2.7/site-packages
$ java -jar jython-standalone-2.7.0.jar

Once you are inside the Jython REPL, check that the boto3 library is available.

$ java -jar jython-standalone-2.7.0.jar
Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_75
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>>
>>> from boto3.session import Session

If the import completes without an error, boto3 is available inside your Jython REPL.

You can always inspect sys.path to check whether the library's directory is on the module search path.

>>> import sys
>>> sys.path
['', '~/playground/boto3/venv/lib/python2.7/site-packages', '~/playground/jython/Lib', '~/playground/jython/jython-standalone-2.7.0.jar/Lib', '__classpath__', '__pyclasspath__/']
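
As a rough aid for the check above, the following sketch reports which search-path entry (if any) provides a given top-level package. The helper name find_on_path is made up for this illustration, and it only looks at plain directories, so it would not see packages that live inside a JAR or zip.

```python
import os
import sys


def find_on_path(package, search_path=None):
    """Return the first search-path entry that contains `package`, or None.

    Hypothetical helper: it only inspects real directories, so entries such
    as '__classpath__' or paths that point inside a JAR are skipped.
    """
    if search_path is None:
        search_path = sys.path
    for entry in search_path:
        # An empty entry means the current directory.
        base = os.path.expanduser(entry) or os.curdir
        if os.path.isdir(os.path.join(base, package)):
            return entry
        if os.path.isfile(os.path.join(base, package + ".py")):
            return entry
    return None


if __name__ == "__main__":
    # Prints the matching sys.path entry, or None if boto3 is not found.
    print(find_on_path("boto3"))
```

If this returns None for boto3 in the plain (non-JAR) setup, JYTHONPATH is probably not pointing at the right site-packages directory.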

Now you can use the boto3 library to talk to the AWS EC2 API.

Using boto3 with Jython from inside a JAR

If you package your Jython scripts and the boto3 library inside a JAR and then try to execute your code through Java's scripting API, you will get the exception shown below.

botocore.exceptions.DataNotFoundError: Unable to load data for

This means boto3 is unable to load its data model files. It expects them to be on the filesystem, but because the script and its dependencies are packaged inside a JAR, boto3 can’t find them there and throws the exception.
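
To make the failure mode concrete, here is a small sketch that uses a zip archive as a stand-in for the JAR (a JAR is a zip file). The archive name, the entry path, and the model content are invented for the illustration. A path that points inside the archive does not exist as far as the filesystem is concerned, yet the entry can still be read through the archive itself, which is roughly what a classpath lookup does.

```python
import json
import os
import tempfile
import zipfile

# Illustrative entry path and content; not meant to match the real model files.
ENTRY = "boto3/data/ec2/2015-04-15/resources-1.json"


def build_fake_jar(directory):
    """Create a tiny zip archive that plays the role of the application JAR."""
    jar_path = os.path.join(directory, "app.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr(ENTRY, json.dumps({"service": "ec2"}))
    return jar_path


def exists_on_filesystem(jar_path):
    """Ask the filesystem for the entry, as a file-based loader would."""
    return os.path.exists(os.path.join(jar_path, *ENTRY.split("/")))


def read_from_archive(jar_path):
    """Read the same entry through the archive instead."""
    with zipfile.ZipFile(jar_path) as jar:
        return json.loads(jar.read(ENTRY).decode("utf-8"))


if __name__ == "__main__":
    jar = build_fake_jar(tempfile.mkdtemp())
    print(exists_on_filesystem(jar))  # the filesystem cannot see the entry
    print(read_from_archive(jar))     # the archive can still serve it
```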

We solved this problem by writing our own custom data loader. When you create a boto3.session.Session, you can pass it an instance of botocore.session.Session that uses a custom data loader.

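from boto3.session import Session

# aws_access_key_id and aws_secret_access_key hold your credentials.
# session_with_custom_loader is the botocore session built in the next listing.
session = Session(aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key,
                  botocore_session=session_with_custom_loader)
ec2 = session.resource('ec2', region_name='us-east-1', use_ssl=False)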

You can register your custom loader when creating the botocore session, as shown below.

from botocore.session import Session

session_with_custom_loader = Session()

session_with_custom_loader.lazy_register_component('data_loader',
                                       lambda: create_loader())
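
The point of lazy registration, as I understand it, is that the factory runs only when the component is first requested, and the result is reused afterwards. The sketch below illustrates that idea with a made-up LazyRegistry class. It is not botocore's implementation, only a model of the behaviour.

```python
class LazyRegistry(object):
    """Toy component registry with deferred construction (hypothetical)."""

    def __init__(self):
        self._components = {}
        self._factories = {}

    def lazy_register_component(self, name, factory):
        # Store the zero-argument factory; nothing is built yet.
        self._factories[name] = factory
        self._components.pop(name, None)

    def get_component(self, name):
        # Build on first access, then serve the cached instance.
        if name in self._factories:
            self._components[name] = self._factories.pop(name)()
        return self._components[name]


if __name__ == "__main__":
    built = []
    registry = LazyRegistry()
    registry.lazy_register_component(
        "data_loader", lambda: built.append("loader") or object())
    print(len(built))  # the factory has not run yet
    first = registry.get_component("data_loader")
    second = registry.get_component("data_loader")
    print(len(built), first is second)
```

This is why the lambda in the listing above can refer to create_loader before the loader is needed: it is not called at registration time.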
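
The custom loader mirrors the interface of botocore's own loader but delegates the lookups to a Java helper class.

import os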

from botocore.loaders import instance_cache

def create_loader():
    return Loader()


import com.shekhar.ec2.support.ClasspathBasedBotoLoader as ClasspathBasedBotoLoader

class Loader(object):
    def __init__(self):
        self._cache = {}
        self._search_paths = []

    @property
    def search_paths(self):
        return self._search_paths

    @instance_cache
    def list_available_services(self, type_name):
        return ClasspathBasedBotoLoader.listAvailableServices()


    @instance_cache
    def determine_latest_version(self, service_name, type_name):
        return max(self.list_api_versions(service_name, type_name))

    @instance_cache
    def list_api_versions(self, service_name, type_name):
        return ClasspathBasedBotoLoader.listApiVersion(service_name, type_name)


    @instance_cache
    def load_service_model(self, service_name, type_name, api_version=None):
        if api_version is None:
            api_version = self.determine_latest_version(
                service_name, type_name)
        full_path = os.path.join(service_name, api_version, type_name)
        return self.load_data(full_path)

    @instance_cache
    def load_data(self, name):
        import json

        return json.loads(ClasspathBasedBotoLoader.loadFile(name))
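
To show the contract this loader has to satisfy without needing a JVM, here is a sketch of the same interface backed by a zip archive instead of the Java helper. ZipBackedLoader, the archive layout (root/service/api_version/type_name.json), and the sample data are assumptions made for the illustration; in the real setup these lookups go through ClasspathBasedBotoLoader.

```python
import json
import os
import tempfile
import zipfile


class ZipBackedLoader(object):
    """Loader-shaped class that reads model files from a zip archive.

    Assumed layout: <root>/<service>/<api_version>/<type_name>.json
    """

    def __init__(self, archive_path, root="data"):
        self._archive_path = archive_path
        self._root = root
        self._cache = {}

    def _entries(self):
        # Split every entry under the root into its path components.
        prefix = self._root + "/"
        with zipfile.ZipFile(self._archive_path) as archive:
            names = archive.namelist()
        return [n[len(prefix):].split("/") for n in names if n.startswith(prefix)]

    def list_available_services(self, type_name):
        wanted = type_name + ".json"
        return sorted(set(p[0] for p in self._entries()
                          if len(p) == 3 and p[2] == wanted))

    def list_api_versions(self, service_name, type_name):
        wanted = type_name + ".json"
        return sorted(set(p[1] for p in self._entries()
                          if len(p) == 3 and p[0] == service_name and p[2] == wanted))

    def determine_latest_version(self, service_name, type_name):
        # Versions are ISO dates here, so the lexical maximum is the newest.
        return max(self.list_api_versions(service_name, type_name))

    def load_service_model(self, service_name, type_name, api_version=None):
        if api_version is None:
            api_version = self.determine_latest_version(service_name, type_name)
        # Zip entries always use forward slashes, whatever the host OS.
        return self.load_data("/".join([service_name, api_version, type_name]))

    def load_data(self, name):
        if name not in self._cache:
            entry = "%s/%s.json" % (self._root, name)
            with zipfile.ZipFile(self._archive_path) as archive:
                self._cache[name] = json.loads(archive.read(entry).decode("utf-8"))
        return self._cache[name]


def build_sample_archive(directory):
    """Write a small archive with two made-up ec2 versions and one s3 version."""
    path = os.path.join(directory, "models.zip")
    with zipfile.ZipFile(path, "w") as archive:
        for service, version in [("ec2", "2015-03-01"), ("ec2", "2015-04-15"),
                                 ("s3", "2006-03-01")]:
            archive.writestr("data/%s/%s/service-2.json" % (service, version),
                             json.dumps({"service": service, "version": version}))
    return path
```

One detail worth noting: names inside a zip or JAR use forward slashes on every platform, so this sketch joins the parts with "/" rather than os.path.join.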

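ClasspathBasedBotoLoader is implemented in Java and uses the TrueZIP library to query the JAR file. Only listAvailableServices is shown here; listApiVersion and loadFile, which the Python loader also calls, are not included in the listing.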

import de.schlichtherle.truezip.file.TFile;

import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClasspathBasedBotoLoader {

    public static List<String> listAvailableServices() throws Exception {
        List<String> availableServices = new ArrayList<>();
        List<TFile> tFiles = serviceDir();
        for (TFile tFile : tFiles) {
            availableServices.add(tFile.getName());
        }
        Collections.sort(availableServices);
        return availableServices;
    }

    private static List<TFile> serviceDir() throws Exception {
        List<TFile> tFiles = new ArrayList<>();
        URL url = ClasspathBasedBotoLoader.class.getClassLoader().getResource("boto3/data");
        TFile tFile = new TFile(url.toURI());
        TFile[] listFiles = tFile.listFiles();
        tFiles.addAll(Arrays.asList(listFiles));
        return tFiles;
    }

}
