I build software with Python, Django, ReactJS & React-Native. Every day. All day long.

Django-Storages and Amazon S3

How to serve your media files via Amazon's Simple Storage Service

Yesterday I migrated the media files of Publishizer.com to Amazon's Simple Storage Service. This relatively simple task took me almost five hours and was quite a frustrating experience, so I thought I better write this down for later reference.

1. Create a Group

  • Go to https://console.aws.amazon.com/ and login
  • From the list of services select IAM
  • Click at Groups and then Create New Group
  • I named my group Webservers
  • At the second step Permissions I selected Custom Policy
  • I named the policy Webservers-S3 and provided the following code

Webserver Group Policy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
{
    "Version": "2014-04-10",
    "Statement": [
        {
        "Effect": "Allow",
        "Action": "s3:*",
        "Resource": "*"
        }
    ]
}

This should allow the group to do anything with the S3 service.

2. Create a User

  • Now that we have a group with generous permissions, click at Users
  • Click at Create New User.
  • I named my user Webserver
  • Make sure that Generate an access key for each User is checked
  • In the pop-up window click at the link Show User Security Credentials
  • Note down the two keys, you will need them in your Django settings later
  • Click at the new user and then at Add User to Groups in the bottom pane
  • Select the Webservers group.

3. Create a Bucket

  • Go back to the Console
  • Select S3
  • Click at Create Bucket - give it a name and a region
  • Click at your new bucket and then at Properties at the top right
  • Click at Permissions and at Add more permissions
  • Select Authenticated Users and grant List, Upload/Delete, View Permissions
  • Click at `Edit bucket policy'

Bucket Policy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::bucketname/*",
                "arn:aws:s3:::bucketname"
            ]
        }
    ]
}

Make sure you replace bucketname with your bucket name.

This should allow anyone to access the files if they have the URL. For Publishizer this is necessary because we will soon allow people to embed campaign widgets in their websites. If you want to make sure that your media files can only be accessed from your own server, you can create a more restrictive policy that requires a certain IP or the request referrer header to be yourdomain.com.

4. Test Your Bucket

At this point in time you should be able to upload stuff into your bucket and access it via a web browser. You can try to manually upload an image and then access it via https://bucketname.s3.amazonaws.com/filename.png.

5. Install Django-storages

Although it seems to be quite dated, everyone still seems to use it, so I decided to use django-storages.

Note that I am NOT using S3 for static files. This is because it is a pain in the ass because it slows down deployments. I know that there are tricks where you would only download file headers from Amazon in order to figure out if a static file needs replacement and then you would store this information in a cache but I have not managed to get this working. Our static files (js, css, a few tiny images) would be cached by our user's browsers anyways, so serving them ourselves would only slow down the user for a few milliseconds and only during the first visit. That's OK with me. It's the media files (i.e. book covers) that are big and many so it makes sense to offload them to Amazon's unlimited disc space.

  • In your Django project add django-storages and boto to the requirements.txt
  • Add 'storages' to your INSTALLED_APPS setting.

Now it gets a bit complicated.

First of all, create a s3utils.py file somewhere in your project:

1
2
3
4
"""Custom S3 storage backends to store files in subfolders."""
from storages.backends.s3boto import S3BotoStorage

MediaRootS3BotoStorage = lambda: S3BotoStorage(location='media')

I do this because I might want to have several subfolders in my bucket in the future (i.e. one for media files, one for static files, one for something completely different that does not come from Django). For now I will only have one folder for media files.

Here is what I have added to my local_settings.py file:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
USE_S3 = False
AWS_ACCESS_KEY = 'XXXX'
AWS_SECRET_ACCESS_KEY = 'XXXX'
AWS_STORAGE_BUCKET_NAME = 'bucketname-dev'
AWS_QUERYSTRING_AUTH = False
S3_URL = 'https://%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME

if USE_S3:
    DEFAULT_FILE_STORAGE = 'myproject.s3utils.MediaRootS3BotoStorage'
    THUMBNAIL_DEFAULT_STORAGE = 'myproject.s3utils.MediaRootS3BotoStorage'
    MEDIA_URL = S3_URL + '/media/'

MEDIA_ROOT = os.path.join(PROJECT_ROOT, '../..',  'media')
STATIC_ROOT = os.path.join(PROJECT_ROOT, '../..', 'static')
  • This gives me the chance to turn off S3 when doing local development. If I really need to test something against S3, I can turn it on but use a another bucket for development work.
  • Locally I would use bucketname-dev but on the real server of course I would use the real bucketname. That means I have setup two identical buckets.
  • DEFAULT_FILE_STORAGE is the setting that activates the magic. From now on all FileFields will upload their content to Amazon.
  • I'm using easy_thumbnails and the setting THUMBNAIL_DEFAULT_STORAGE makes sure that my thumbnails get uploaded to Amazon as well.
  • Usually I set MEDIA_URL to /media/ in my normal django settings. When we activate S3, we must change that setting so that it is an absolute path to Amazon's S3 service.
  • You can leave MEDIA_ROOT and STATIC_ROOT as you would usually have it. When you switch off S3, everything will just continue to work like it did before.

6. Test Your Setup

You should be able to run your local development server now and upload a file via your app. It should end up in your Amazon S3 bucket. If it does not work, you might need to create a file $HOME/.boto with the following content:

1
2
3
[Credentials]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

I'm not sure though if this is really needed (after all you have those settings in your Django settings already). I just remember that I ran into problems and this got me one step further.

7. Upload Your Current Media Files to S3

If you upgraded a legacy app, you will now have the problem that all your media files are still located on your webserver but you want them to be on Amazon's cloud before you make the switch. Thankfully there is a tool similar to rsync which syncs a local folder with a S3 folder.

  • On your webserver download and unpack s3cmd
  • cd into the tool's folder and install the tool via python setup.py install
  • Run s3cmd --configure and add your AWS access keys.

Finally, this is how I uploaded all my existing media files to S3:

1
s3cmd sync --delete-removed --skip-existing ~/webapps/media/* s3://bucketname/media/
  • The first parameter is the path to your media folder
  • The second parameter is your bucketname and the sub-folder that you want to use (if any)
  • Find out more about s3cmd sync

8. Refactor Where Necessary

In our Django projects we usually have a setting:

  • `FULL_DOMAIN = 'https://example.com'

Then we add a context processor that adds this setting to all templates. This is very helpful for email templates, because here you cannot rely on the {% static "img/test.png" %} tag because it would generate a relative path. When your customer opens the email in his email client, those relative paths would, of course, be broken. Therefore we refer to static files in our email templates like this:

1
<img src="{{ FULL_DOMAIN }}{% static "img/test.png" %}" />

This will no longer be sufficient if you set USE_S3 = True. To work around this I created a templatetag like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
from django import template
from django.conf import settings

register = template.Library()

@register.assignment_tag
def get_full_domain(consider_s3=True):
    if consider_s3:
        if settings.USE_S3:
            return ''
    return settings.FULL_DOMAIN

So basically if we are using S3, we don't return anything, if we are not using S3, we return the FULL_DOMAIN setting.

My email template would now look like this:

1
2
{% load project_tags %}
<img src="{% get_full_domain %}{% static "img/test.png" %}" />

By the way: I know that Django provides the Sites framwork for stuff like this but we never use it and since we are always only creating single site Django projects, having the site name in the Database is a maintenance nightmare. It makes much more sense to us to set the sitename in the settings.

9. Flip The Switch

After this odyssey, you should be able to change the local_settings.py on your production server like above and set USE_S3 = True. Then run your deployment and visit your site.

Voila! Images are now served via Amazon! Now if this is not a good reason to lean back and experience a good bottle of wine...

 
comments powered by Disqus