Deploying a Pretrained Custom Keras Model Using Amazon SageMaker

This guide may not apply as-is to the newest versions of the SageMaker SDK and TensorFlow. At the time of writing, the latest TensorFlow version is 2.5, but since only TensorFlow 2.1.0 had solid support and compatibility for deployments, TensorFlow 2.1.0 is used here.


You can find the related Jupyter notebook here, which walks through the steps one by one.

Getting Things Ready - Things to note

Versions in use

  • TensorFlow 2.1.0
  • SageMaker SDK 2.52.1
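A quick way to confirm that a notebook kernel matches the versions above is to query the installed package metadata. This is a minimal standard-library sketch (Python 3.8+); the distribution names `tensorflow` and `sagemaker` are assumptions, since TensorFlow may be installed under a different name such as `tensorflow-gpu`.

```python
from importlib import metadata

# Versions this guide was written against (from the list above)
EXPECTED_VERSIONS = {"tensorflow": "2.1.0", "sagemaker": "2.52.1"}


def check_versions(expected):
    """Return {package: (expected, installed)} for every package whose
    installed version differs from the expected one.

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

Calling `check_versions(EXPECTED_VERSIONS)` returns an empty dict when the kernel matches; anything else tells you which package to pin.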

Training the custom Model

Here in the notebook, we’ll use Keras's built-in functions to keep training simple.

I'll build a small classifier that tells cats and dogs apart.


Usually, the first thing I check is what data I have and what format it is in, so that is the first step here. I’ll collect images of different cats and dogs and put them into two separate folders, cat and dog.

+-- notebook.ipynb
+-- data/
|   +-- train/
|       +-- cat/
|       +-- dog/
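Before loading anything, it helps to check that each class folder actually contains images and that the classes are roughly balanced. A minimal standard-library sketch; the set of accepted file extensions is an assumption, so adjust it to your data.

```python
import os

# File extensions treated as images in this sketch (an assumption; adjust as needed)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_per_class(root):
    """Map each class sub-folder of `root` (e.g. data/train) to its image count."""
    counts = {}
    # Sorted, because Keras assigns label indices to class folders alphanumerically
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[class_name] = sum(
            1
            for name in os.listdir(class_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Running `count_images_per_class("data/train")` on the layout above returns a dict with one entry each for `cat` and `dog`.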

Then we’ll load these images using Keras's ImageDataGenerator. Its validation_split option splits the data into training and validation sets; I’ll use 80% of the images for training and 20% for validation (validation_split=0.2).

from tensorflow.keras.preprocessing.image import ImageDataGenerator
batch_size = 32  # example value
# Rescale pixels to [0, 1] and hold out 20% of each class for validation
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_dataset = train_datagen.flow_from_directory("data/train",
                                          target_size = (224, 224),
                                          batch_size = batch_size,
                                          class_mode = 'binary',
                                          subset = 'training')
test_dataset = train_datagen.flow_from_directory("data/train",
                                          target_size = (224, 224),
                                          batch_size = batch_size,
                                          class_mode = 'binary',
                                          subset = 'validation')

Now the training and validation (test) sets hold 80% and 20% of the whole dataset, respectively. Time to train!
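To see roughly how many images end up in each subset, the per-class arithmetic can be sketched as below. This assumes, as I understand the Keras implementation, that validation_split is applied within each class folder and the held-out count is truncated to an integer; treat it as an approximation, not a re-implementation of ImageDataGenerator.

```python
def split_counts(class_counts, validation_split=0.2):
    """Return (training, validation) image counts for per-class splitting.

    For each class with n images, int(validation_split * n) go to
    validation and the remaining images go to training.
    """
    training = validation = 0
    for n in class_counts.values():
        n_val = int(validation_split * n)
        validation += n_val
        training += n - n_val
    return training, validation
```

For example, `split_counts({"cat": 100, "dog": 57})` returns `(126, 31)`: 20 + 11 validation images and 80 + 46 training images.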

Model Selection and Training

For simplicity's sake I’ll use an existing model from Keras Applications, but the process is similar if you want to build and train your own model. You can find all the models available in Keras Applications, along with some sample usage, here. I’ll use the VGG16 model in this example: judging by the metrics and model size listed in the Keras Applications table, it should be enough for our use case.

We’ll load the model with ImageNet weights using tf.keras.applications.VGG16(include_top=False, weights="imagenet"). include_top=False drops VGG16's fully connected classifier layers so we can add our own. You can find more options in the Keras documentation here.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Model

def make_model():
    # VGG16 convolutional base with ImageNet weights, without its classifier layers
    base_model = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
    x = base_model.output

    # add a global spatial average pooling layer
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dense(1024, activation='relu')(x)

    # A single sigmoid unit for binary classification. flow_from_directory assigns
    # labels alphabetically (cat=0, dog=1), so the output is the probability of "dog".
    predictions = keras.layers.Dense(1, activation='sigmoid')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    return model
model = make_model()
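To make the head added on top of VGG16 in make_model concrete, here is a NumPy sketch of what it computes: GlobalAveragePooling2D averages each feature map over its spatial dimensions, then a ReLU dense layer and a single sigmoid unit produce a probability. The weights are random placeholders, and the 7x7x512 feature-map shape assumes a 224x224 input to VGG16 without its top layers.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def head_forward(feature_map, w1, b1, w2, b2):
    """Forward pass of the classification head for a batch of feature maps.

    feature_map: (batch, height, width, channels), e.g. VGG16 conv output
    w1, b1: Dense(1024, relu) kernel (channels, 1024) and bias (1024,)
    w2, b2: Dense(1, sigmoid) kernel (1024, 1) and bias (1,)
    Returns probabilities of shape (batch, 1).
    """
    pooled = feature_map.mean(axis=(1, 2))  # GlobalAveragePooling2D -> (batch, channels)
    hidden = relu(pooled @ w1 + b1)         # Dense(1024, activation='relu')
    return sigmoid(hidden @ w2 + b2)        # Dense(1, activation='sigmoid')


rng = np.random.default_rng(0)
features = rng.standard_normal((2, 7, 7, 512))  # placeholder conv features
w1 = rng.standard_normal((512, 1024)) * 0.01
b1 = np.zeros(1024)
w2 = rng.standard_normal((1024, 1)) * 0.01
b2 = np.zeros(1)
probs = head_forward(features, w1, b1, w2, b2)
print(probs.shape)  # (2, 1)
```

Because pooling removes the spatial dimensions, the head works for any input image size; only the channel count (512 for VGG16) has to match the first dense kernel.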

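Then we compile the model, choosing binary_crossentropy as the loss and rmsprop as the optimizer; pick these to suit your own requirements, e.g. model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy']). Including metrics=['accuracy'] is what makes val_accuracy available to ReduceLROnPlateau below.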
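For reference, binary cross-entropy is the average negative log-likelihood of the 0/1 labels under the predicted probabilities: -(1/N) * sum(y*log(p) + (1-y)*log(1-p)). The plain-Python sketch below computes it by hand. Clipping predictions away from 0 and 1 follows the same idea as Keras (whose default epsilon is 1e-7), but this is an illustration, not Keras's own code.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    Predictions are clipped to [eps, 1 - eps] so log() never sees 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

For example, `binary_crossentropy([1, 0], [0.9, 0.1])` is -ln(0.9), about 0.1054: both predictions are 90% confident and correct, so the loss is small.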


Here we add a couple of callbacks: EarlyStopping stops training once the validation loss has not improved for 10 epochs, and ReduceLROnPlateau halves the learning rate (down to a minimum of 1e-6) whenever the validation accuracy has not improved for 3 epochs.

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

earlystop = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
learning_reduce = ReduceLROnPlateau(patience=3, monitor="val_accuracy", verbose=1, min_lr=0.000001, factor=0.5)
callbacks = [earlystop, learning_reduce]
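The two callbacks are easier to reason about with a concrete run, so here is a simplified plain-Python simulation of the logic they apply: stop when the monitored loss has not improved for `stop_patience` epochs, and multiply the learning rate by `factor` (never going below `min_lr`) when the monitored accuracy has not improved for `lr_patience` epochs. It is not Keras's code; it ignores Keras's `min_delta` and `cooldown` options, and the starting learning rate of 0.001 is an assumption (RMSprop's default in tf.keras).

```python
def simulate_callbacks(val_losses, val_accuracies, initial_lr=0.001,
                       stop_patience=10, lr_patience=3, factor=0.5, min_lr=1e-6):
    """Replay per-epoch validation metrics through simplified EarlyStopping and
    ReduceLROnPlateau logic.

    Returns (epochs_run, learning_rates), where learning_rates[i] is the rate
    in effect after epoch i + 1.
    """
    best_loss, loss_wait = float("inf"), 0
    best_acc, acc_wait = float("-inf"), 0
    lr = initial_lr
    learning_rates = []
    for epoch, (loss, acc) in enumerate(zip(val_losses, val_accuracies), start=1):
        # ReduceLROnPlateau(monitor="val_accuracy"): higher is better
        if acc > best_acc:
            best_acc, acc_wait = acc, 0
        else:
            acc_wait += 1
            if acc_wait >= lr_patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                acc_wait = 0
        learning_rates.append(lr)

        # EarlyStopping(monitor="val_loss"): lower is better
        if loss < best_loss:
            best_loss, loss_wait = loss, 0
        else:
            loss_wait += 1
            if loss_wait >= stop_patience:
                return epoch, learning_rates
    return len(learning_rates), learning_rates
```

With a validation loss that stops improving after epoch 2 and a flat accuracy, this halves the rate at epochs 5, 8 and 11 and stops after epoch 12, well before the `epochs = 200` ceiling.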

Then carry out training:

  • steps_per_epoch / validation_steps => how many batches to draw in a single training / validation epoch
  • epochs => maximum number of training cycles (EarlyStopping may end training sooner)
  • validation_data => the validation data separated from the rest of the data in the earlier step
  • callbacks => objects that run extra functionality during training (here, stopping training early and reducing the learning rate)

history = model.fit(train_dataset,
                    steps_per_epoch = train_dataset.samples // batch_size,
                    validation_steps = test_dataset.samples // batch_size,
                    epochs = 200,
                    validation_data = test_dataset,
                    callbacks = callbacks)

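You can save the trained model in either the .h5 (HDF5) format or the TensorFlow SavedModel format. With the h5 format, however, I ran into a problem with the SageMaker deployment: even though the model was deployed to the SageMaker endpoint, the trained weights weren't reflected on the endpoint. So I recommend sticking to the TensorFlow SavedModel format, e.g. model.save(saved_model_name, save_format="tf"), where saved_model_name is the directory you load the model from in the next step.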

Loading Model

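Load the trained model from the path where the SavedModel is stored. Because the conversion step that follows uses the TF1-style SavedModelBuilder with a graph-mode session, disable eager execution before loading.

import tensorflow as tf
from tensorflow import keras
tf.compat.v1.disable_eager_execution()  # graph mode for the export below; call in a fresh kernel before any other TF code
model = keras.models.load_model(saved_model_name)  # load from the TensorFlow SavedModel format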

Convert the TensorFlow model to the TF protobuf (SavedModel) format, laid out and packaged the way SageMaker expects for deployment.

def convert_to_aws(loaded_model):
    """Given a pre-trained Keras model, convert it to the TF protobuf (SavedModel)
    format and save it in the directory structure that AWS expects.

    loaded_model must have been loaded after calling
    tf.compat.v1.disable_eager_execution(), so that its trained weights live in
    the graph-mode Keras session exported below.
    """
    from tensorflow.python.saved_model import builder
    from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
    from tensorflow.python.saved_model import tag_constants
    import tensorflow as tf
    import os
    import shutil
    import tarfile

    if tf.executing_eagerly():
        raise RuntimeError("Call tf.compat.v1.disable_eager_execution() before loading the model.")

    # Start from a clean export directory
    dirpath = 'export'
    if os.path.exists(dirpath) and os.path.isdir(dirpath):
        shutil.rmtree(dirpath)

    # This is the file structure which AWS expects. Cannot be changed.
    model_version = '1'
    export_dir = 'export/Servo/' + model_version

    # Build the Protocol Buffer SavedModel at 'export_dir'
    saved_model_builder = builder.SavedModelBuilder(export_dir)

    # Create prediction signature to be used by TensorFlow Serving Predict API
    signature = predict_signature_def(
        inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

    # Reuse the Keras session that holds the trained weights; running
    # global_variables_initializer() here would overwrite them with fresh values
    session = tf.compat.v1.keras.backend.get_session()

    # Save the meta graph and variables
    saved_model_builder.add_meta_graph_and_variables(
        sess=session, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
    saved_model_builder.save()

    # Create a gzipped tarball of the export directory
    with tarfile.open('model_deploy.tar.gz', mode='w:gz') as archive:
        archive.add('export', recursive=True)
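Before uploading, it can be worth checking that the archive really has the export/Servo/1/ layout that convert_to_aws creates, with a saved_model.pb file and a variables/ folder inside. A minimal standard-library sketch; this check is my own helper, not part of the SageMaker SDK.

```python
import tarfile


def check_servo_layout(tar_path, version="1"):
    """Return a list of problems found in the model tarball (empty if it looks OK).

    Expects export/Servo/<version>/saved_model.pb and
    export/Servo/<version>/variables/ inside the gzipped tar.
    """
    prefix = f"export/Servo/{version}/"
    with tarfile.open(tar_path, mode="r:gz") as archive:
        names = {name.rstrip("/") for name in archive.getnames()}
    problems = []
    if prefix + "saved_model.pb" not in names:
        problems.append(f"missing {prefix}saved_model.pb")
    has_variables = any(
        name == prefix + "variables" or name.startswith(prefix + "variables/")
        for name in names
    )
    if not has_variables:
        problems.append(f"missing {prefix}variables/")
    return problems
```

`check_servo_layout('model_deploy.tar.gz')` should return an empty list before you run the upload step.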

Upload the converted model to the default SageMaker S3 bucket.

import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model_deploy.tar.gz', key_prefix='model')

# Print the name of the default bucket the model was uploaded to
print(f"Bucket name is: {sagemaker_session.default_bucket()}")
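upload_data returns the S3 URI of the uploaded file, so inputs could be passed straight to model_data in the next step instead of rebuilding the path by hand. If you do build it manually, the documented form for a single file is s3://{bucket}/{key_prefix}/{file name}. The helper below is hypothetical (not part of the SDK) and just assembles that string; "my-bucket" in the example is a placeholder.

```python
import posixpath


def s3_model_uri(bucket, key_prefix, file_name):
    """Build s3://<bucket>/<key_prefix>/<file_name>, the URI format that
    Session.upload_data reports for a single uploaded file."""
    key = posixpath.join(key_prefix.strip("/"), file_name)
    return f"s3://{bucket}/{key}"
```

For example, `s3_model_uri("my-bucket", "model", "model_deploy.tar.gz")` gives `"s3://my-bucket/model/model_deploy.tar.gz"`, the same shape as the model_data value built below.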

Create a SageMaker Model. (This is what the SageMaker endpoint will use to run inference.)

!touch #create an empty python file
import boto3, re
import tensorflow as tf
from sagemaker import get_execution_role

# the (default) IAM role you created when creating this notebook
role = get_execution_role()

# Create a Sagemaker model (see AWS console>SageMaker>Models)
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model_deploy.tar.gz',
                                  role = role,
                                  framework_version = tf.__version__,
                                  entry_point = '')

Deploy the model

Here we deploy an endpoint connected to the SageMaker model created in the previous step.

deployment_instance_type = "ml.m4.xlarge"
# Deploy the SageMaker model to an endpoint
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type=deployment_instance_type)
endpoint = predictor.endpoint_name

Here, endpoint holds the name of the endpoint, which you can also find in the AWS console.

Running Inference

Time to test the deployed model. First we convert the image to a NumPy array of the correct shape to send to the endpoint.

Note: image = image.resize((224,224), Image.NEAREST). Keras's image preprocessing resizes with nearest-neighbour interpolation by default, whereas Pillow's resize uses bicubic interpolation by default in recent versions, which blends neighbouring pixels. To keep both consistent, pass Image.NEAREST to Pillow.
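To see what nearest-neighbour resizing does, here is a NumPy sketch: each output pixel copies the input pixel whose area contains the output pixel's centre, so no new colour values are invented, unlike bicubic interpolation. This illustrates the idea only; Pillow's and Keras's exact pixel-mapping rules may differ at the edges, so do not expect bit-identical output.

```python
import numpy as np


def resize_nearest(image, out_height, out_width):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_height, in_width = image.shape[:2]
    # Map each output pixel centre back onto the input grid and take the pixel it falls in
    rows = np.floor((np.arange(out_height) + 0.5) * in_height / out_height).astype(int)
    cols = np.floor((np.arange(out_width) + 0.5) * in_width / out_width).astype(int)
    rows = np.minimum(rows, in_height - 1)
    cols = np.minimum(cols, in_width - 1)
    return image[rows[:, None], cols[None, :]]
```

Upscaling `[[1, 2], [3, 4]]` to 4x4 simply duplicates each value into a 2x2 block, which is why the output only ever contains the original pixel values.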

import numpy as np
import sagemaker
from sagemaker.tensorflow.model import TensorFlowPredictor
from PIL import Image
from tensorflow import keras

def _image_file_to_tensor(image_file):
    """Read an image file and convert it to a batched NumPy array.

    Resizes the image to the size and format the model expects for inference.

    Args:
        image_file: Path to the image file.
    """
    image = Image.open(image_file).convert('RGB')
    # Match Keras's default nearest-neighbour resizing
    image = image.resize((224,224), Image.NEAREST)
    # Match the rescale=1./255 applied by ImageDataGenerator during training
    image = np.asarray(image, dtype=np.float32) / 255.0
    # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    image = np.expand_dims(image, axis=0)
    return image

predictor = TensorFlowPredictor(endpoint, sagemaker_session)
model = keras.models.load_model(saved_model_name)
data = _image_file_to_tensor('data/train/cat/nice_cat.jpg')
# SageMaker predictions: .predict sends the data to our endpoint
print(f"Predictions from Sagemaker {predictor.predict(data)}")
# Actual Keras predictions
print(f"Predictions from Keras {model.predict(data)}")
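The TensorFlow Serving endpoint returns JSON of the form {"predictions": [[...]]}, while model.predict returns a NumPy array. The helpers below are hypothetical and assume the response has already been deserialized to a dict, as the predictor does by default. The first compares the two outputs with a tolerance, since runs on different hardware or TensorFlow builds can differ in the last decimal places. The second turns the sigmoid output into a class name, assuming the alphabetical cat=0/dog=1 order that flow_from_directory uses; the 0.5 threshold is a common default, not something the original notebook specifies.

```python
import numpy as np


def predictions_match(sagemaker_response, keras_predictions, atol=1e-5):
    """Compare a deserialized TF Serving response with Keras model.predict output."""
    remote = np.asarray(sagemaker_response["predictions"], dtype=np.float64)
    local = np.asarray(keras_predictions, dtype=np.float64)
    return remote.shape == local.shape and np.allclose(remote, local, atol=atol)


def to_label(probability, class_names=("cat", "dog")):
    """Map a sigmoid probability to a class name (index 1 when probability >= 0.5)."""
    return class_names[int(probability >= 0.5)]
```

Usage: `predictions_match(predictor.predict(data), model.predict(data))` should be true, and `to_label(model.predict(data)[0][0])` gives the predicted class.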

Keras and SageMaker should give you the same predictions, allowing for tiny floating-point differences. Enjoy!