Amazon SageMaker Deployment
This is a quick guide to deploying your trained models using the Amazon SageMaker model hosting service.
Deploying a model in SageMaker is a three-step process:
- Create a model in SageMaker
- Create an endpoint configuration
- Create an endpoint
For more information on how models are deployed to Amazon SageMaker, check out the documentation here.
We will be using the Amazon SageMaker Python SDK which makes this easy and automates a few of the steps.
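Under the hood, each of those three steps corresponds to a low-level SageMaker API call. Purely for illustration, a minimal boto3 sketch might look like the following (the container image URI, S3 path, role ARN and resource names are placeholders, not values from this guide):
import boto3

sm = boto3.client('sagemaker')

# 1. Create a model: point SageMaker at a serving container image and the model artefact in S3
sm.create_model(
    ModelName='fastai-pets-model',
    PrimaryContainer={
        'Image': '<account>.dkr.ecr.<region>.amazonaws.com/<serving-image>:<tag>',  # placeholder
        'ModelDataUrl': 's3://<bucket>/sagemaker/fastai-pets-model/model.tar.gz',   # placeholder
    },
    ExecutionRoleArn='arn:aws:iam::<account>:role/<sagemaker-role>',                 # placeholder
)

# 2. Create an endpoint configuration: choose the instance type and count
sm.create_endpoint_config(
    EndpointConfigName='fastai-pets-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'fastai-pets-model',
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
    }],
)

# 3. Create an endpoint from that configuration
sm.create_endpoint(EndpointName='fastai-pets-endpoint',
                   EndpointConfigName='fastai-pets-config')
The model.deploy() call we use later in this guide performs the equivalent of these calls for you.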
Pricing
SageMaker deployment pricing information can be found here. In short: you pay an hourly rate depending on the instance type that you choose. Be careful because this can add up fast - for instance, the smallest P3 instance costs >$2000/month. Also note that the AWS free tier only provides enough hours to run an m4.xlarge instance for 5 days.
Set up your SageMaker notebook instance
You will need a SageMaker notebook instance on which you have trained your fastai model. To set up a new SageMaker notebook instance with fastai installed, follow the steps outlined here.
Ensure you have the Amazon SageMaker Python SDK installed in the kernel named Python 3. An example command to run is the following:
pip install sagemaker
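To confirm the SDK is importable from that kernel, you can run something like the following in a notebook cell:
import sagemaker
print(sagemaker.__version__)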
Per-project setup
Train your model on your notebook instance
Create a Jupyter notebook for your project on your SageMaker notebook instance and train your fastai model there.
An example based on the pets lesson 1 exercise is the following:
from fastai.vision import *

# Download and extract the Oxford-IIIT Pets dataset
path = untar_data(URLs.PETS)
path_img = path/'images'
fnames = get_image_files(path_img)

# Labels are encoded in the file names, e.g. 'Abyssinian_1.jpg'
pat = re.compile(r'/([^/]+)_\d+.jpg$')
bs = 64

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=299, bs=bs//2).normalize(imagenet_stats)

# Train a resnet50 classifier, then unfreeze and fine-tune
learn = create_cnn(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(8)
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6,1e-4))
Export your model
Now that you have trained your learn object, you can export it, which saves the model weights along with the data configuration, with the following command:
learn.export(path_img/'models/resnet50.pkl')
Zip model artefacts and upload to S3
Now that we have exported our model artefacts, we can package them into a tar.gz archive and upload it to S3.
import tarfile

# Package the exported learner into model.tar.gz, the archive format SageMaker expects
with tarfile.open(path_img/'models/model.tar.gz', 'w:gz') as f:
    t = tarfile.TarInfo('models')
    t.type = tarfile.DIRTYPE
    f.addfile(t)
    f.add(path_img/'models/resnet50.pkl', arcname='resnet50.pkl')
Now we can upload them to S3 with the following commands.
import sagemaker
from sagemaker.utils import name_from_base

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
prefix = f'sagemaker/{name_from_base("fastai-pets-model")}'

# Upload the archive to the session's default S3 bucket; returns the S3 URI of the artefact
model_artefact = sagemaker_session.upload_data(path=str(path_img/'models/model.tar.gz'), bucket=bucket, key_prefix=prefix)
Create model serving script
Now we are ready to deploy our model to the SageMaker model hosting service. We will use the SageMaker Python SDK with the Amazon SageMaker open-source PyTorch container as this container has support for the fast.ai library. Using one of the pre-defined Amazon SageMaker containers makes it easy to write a script and then run it in Amazon SageMaker in just a few steps.
To serve models in SageMaker, we need a script that implements four methods: model_fn, input_fn, predict_fn and output_fn.
- The model_fn method needs to load the PyTorch model from the saved weights on disk.
- The input_fn method needs to deserialize the invoke request body into an object we can perform prediction on.
- The predict_fn method takes the deserialized request object and performs inference against the loaded model.
- The output_fn method takes the result of prediction and serializes it according to the response content type.
The methods input_fn and output_fn are optional; if omitted, SageMaker will assume the input and output objects are in NPY format with Content-Type application/x-npy.
For more information on how PyTorch model serving works, check out the project page here.
An example script to serve a vision resnet model can be found below:
import logging, requests, os, io, glob, time, json
from fastai.vision import *

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

JSON_CONTENT_TYPE = 'application/json'
JPEG_CONTENT_TYPE = 'image/jpeg'

# loads the model into memory from disk and returns it
def model_fn(model_dir):
    logger.info('model_fn')
    path = Path(model_dir)
    learn = load_learner(model_dir, fname='resnet50.pkl')
    return learn

# Deserialize the Invoke request body into an object we can perform prediction on
def input_fn(request_body, content_type=JPEG_CONTENT_TYPE):
    logger.info('Deserializing the input data.')
    # process an image uploaded to the endpoint
    if content_type == JPEG_CONTENT_TYPE: return open_image(io.BytesIO(request_body))
    # process a URL submitted to the endpoint
    if content_type == JSON_CONTENT_TYPE:
        # the body arrives as raw bytes, so parse the JSON before reading the URL
        img_request = requests.get(json.loads(request_body)['url'], stream=True)
        return open_image(io.BytesIO(img_request.content))
    raise Exception('Requested unsupported ContentType in content_type: {}'.format(content_type))

# Perform prediction on the deserialized object, with the loaded model
def predict_fn(input_object, model):
    logger.info("Calling model")
    start_time = time.time()
    predict_class,predict_idx,predict_values = model.predict(input_object)
    print("--- Inference time: %s seconds ---" % (time.time() - start_time))
    print(f'Predicted class is {str(predict_class)}')
    print(f'Predict confidence score is {predict_values[predict_idx.item()].item()}')
    return dict(class_name = str(predict_class),
                confidence = predict_values[predict_idx.item()].item())

# Serialize the prediction result into the desired response content type
def output_fn(prediction, accept=JSON_CONTENT_TYPE):
    logger.info('Serializing the generated output.')
    if accept == JSON_CONTENT_TYPE: return json.dumps(prediction), accept
    raise Exception('Requested unsupported ContentType in Accept: {}'.format(accept))
Save the script into a Python file, e.g. serve.py.
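Since you are working in a Jupyter notebook, one convenient way to do this is the %%writefile cell magic, placed at the top of the cell that contains the script:
%%writefile serve.py
# paste the full serving script from the section above into this cell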
Deploy to SageMaker
First we need to create a Predictor class that accepts JPEG images as input and returns JSON. The default behaviour is to accept a NumPy array; with the SageMaker Python SDK v2 we override this by passing a serializer and deserializer to the Predictor.
from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer
from sagemaker.deserializers import JSONDeserializer

class ImagePredictor(Predictor):
    def __init__(self, endpoint_name, sagemaker_session=None):
        # Send raw JPEG bytes as-is and parse the JSON response from the endpoint
        super().__init__(endpoint_name, sagemaker_session=sagemaker_session,
                         serializer=IdentitySerializer(content_type='image/jpeg'),
                         deserializer=JSONDeserializer())
We need to get the IAM role ARN to give SageMaker permissions to read our model artefact from S3.
role = sagemaker.get_execution_role()
In this example we will deploy our model to the instance type ml.m4.xlarge. We will pass in the name of our serving script, e.g. serve.py, and the S3 path of the model that we uploaded earlier.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(model_data=model_artefact, name=name_from_base("fastai-pets-model"),
                     role=role, framework_version='1.0.0', py_version='py3',
                     entry_point='serve.py', predictor_cls=ImagePredictor)

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
It will take a few minutes for SageMaker to provision the endpoint and make it ready for inference.
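If you want to monitor the endpoint while it is being created, one option is to query its status with boto3 (this sketch assumes the SDK v2 Predictor, which exposes endpoint_name):
import boto3

sm = boto3.client('sagemaker')
# Status is 'Creating' while provisioning and 'InService' once ready
status = sm.describe_endpoint(EndpointName=predictor.endpoint_name)['EndpointStatus']
print(status)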
Test the endpoint
Now you can make inference calls against the deployed endpoint with a call such as:
url = <some url of an image to test>
img_bytes = requests.get(url).content
response = predictor.predict(img_bytes); response
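The serving script also accepts a JSON body containing an image URL. The ImagePredictor above only sends JPEG bytes, but you can exercise the JSON path directly through the SageMaker runtime API; a minimal sketch using boto3:
import boto3, json

runtime = boto3.client('sagemaker-runtime')

# Send a JSON body with an image URL; the endpoint downloads and classifies the image
result = runtime.invoke_endpoint(EndpointName=predictor.endpoint_name,
                                 ContentType='application/json',
                                 Accept='application/json',
                                 Body=json.dumps({'url': url}))
print(json.loads(result['Body'].read()))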
Local testing
If you want to test the endpoint before deploying it to SageMaker, you can run the same deploy command but change the instance_type parameter value to local.
predictor = model.deploy(initial_instance_count=1, instance_type='local')
You can call predictor.predict() exactly as before, but it will call the local endpoint.
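Given the hourly pricing mentioned earlier, remember to tear down the hosted endpoint once you are done with it. One way to do this with the SDK is:
# Delete the endpoint (and its endpoint configuration) so the instance stops incurring charges
predictor.delete_endpoint()

# Optionally remove the model resource as well
model.delete_model()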