June 21, 2017

Private files for your Rails app using S3

We've had a look at how to secure your files in your Rails application with Paperclip, and it is a great way to get started. As your application grows though, you will likely want to store your files somewhere else. Moving that state off the servers that handle your requests helps in two ways. It makes scaling out easier, because any one of your application servers can access and use a file, and it helps with disaster recovery: if your server goes down, your files are safe in another service. So where should we put them?

Amazon S3

As ever, the current ubiquitous way to do something on the web is with Amazon. Their service for files is Simple Storage Service (S3). If you don't already have an AWS account, you'll need to head over there and sign up. S3 stores your files in buckets. Once you have an account set up, you should create a bucket for your application to store its files in, probably one per environment, starting with development.

Connecting to S3

Now that we have a bucket, we need to connect the application to S3. This means we have configuration to manage. As ever, I reach for dotenv to manage my environment variables in a Rails app. So we need to add it to our Gemfile:

# Gemfile
# omitted
gem 'dotenv-rails'
# omitted

Now that we have that, we need to run bundle, and then we are good to get going.

S3 Configuration

We now have dotenv in our Rails application which means we are ready to configure our connection to S3:

# .env
AWS_ACCESS_KEY_ID = access_key_id
AWS_SECRET_ACCESS_KEY = secret_access_key
AWS_BUCKET = bucket-name
AWS_REGION = bucket-region

This is the information you'll need from your S3 account to be able to connect to your bucket. Now that we have the information that we need, we need to tell Paperclip to use S3 instead of our file system on the application server.
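A missing value here tends to show up later as a confusing S3 error, so it can help to check the settings when the app boots. The following is a minimal sketch in plain Ruby. REQUIRED_S3_KEYS and missing_s3_settings are hypothetical names made up for illustration, not part of dotenv or Paperclip.

```ruby
# Hypothetical helper, not part of dotenv or Paperclip: lists the S3
# settings from .env that are absent or blank.
REQUIRED_S3_KEYS = %w[
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  AWS_BUCKET
  AWS_REGION
].freeze

# `env` defaults to the real environment; pass a Hash to try it out.
def missing_s3_settings(env = ENV)
  REQUIRED_S3_KEYS.select { |key| env[key].to_s.strip.empty? }
end

# Example with a partly filled-in environment.
example = { 'AWS_BUCKET' => 'bucket-name', 'AWS_REGION' => '' }
puts missing_s3_settings(example).inspect
```

One way to use this would be to raise from an initializer when the returned list is not empty, so a misconfigured environment fails at boot rather than on the first upload.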

Paperclip and S3

There are a few ways that you can connect to your bucket using Paperclip. I prefer setting default settings that are loaded when your application loads:

# config/application.rb
module SecureDownloads
  class Application < Rails::Application
    # Initialize configuration defaults for originally generated Rails version.
    config.load_defaults 5.1

    # Settings in config/environments/* take precedence over those specified here.
    # Application configuration should go into files in config/initializers
    # -- all .rb files in that directory are automatically loaded.

    # Don't generate system test files.
    config.generators.system_tests = nil

    config.paperclip_defaults = {
      storage: :s3,
      s3_permissions: 'private',
      s3_region: ENV['AWS_REGION'],
      s3_credentials: {
        bucket: ENV['AWS_BUCKET'],
        access_key_id: ENV['AWS_ACCESS_KEY_ID'],
        secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
      }
    }
  end
end

As you can probably tell, the section that we've added here to the default application.rb is the config.paperclip_defaults hash. If you have more than one Paperclip class in your application, you might not want to go with this approach, or you can just override any settings in another class if you need. Most of the stuff in here is pretty self-explanatory. It's all our settings that we have set up in our .env file, making sure that the files we put in our bucket are private so they can't be seen or downloaded by those who aren't allowed.

Now that we have our configuration to connect Paperclip with S3, we need to tell the Image class what it needs to know to use it. Let's have a look at the changes we need:

# app/models/image.rb
class Image < ApplicationRecord

  belongs_to :user

  has_attached_file :asset, styles: { thumb: "200x200>" }, url: ':s3_domain_url', path: 'assets/:class/:id/:style.:extension'

  validates_attachment_content_type :asset, content_type: /\Aimage\/.*\z/

  def s3_path(style: nil)
    asset.s3_object(style).presigned_url("get", expires_in: 10.seconds)
  end

  def s3_download_path
    asset.s3_object.presigned_url("get", expires_in: 30.seconds)
  end

end

Let's go through the changes that we have here. First up we have changed the call to has_attached_file, adding in url: ':s3_domain_url', path: 'assets/:class/:id/:style.:extension'.

This tells our Image that the URL we need for it is an S3 URL of the type that matches our bucket, in the style of https://bucketname.s3.amazonaws.com, and gives it the path that we want to use inside our bucket. These let our app know where to put our image when we upload it, and what URL to use after it is uploaded.
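To make the path pattern concrete, here is a rough sketch of how the placeholders get filled in. It is a simplified stand-in written in plain Ruby, not Paperclip's real interpolation code. The interpolate_path name and the sample values (id 42, a jpg file) are made up for illustration. Paperclip's :class placeholder gives the underscored, pluralized model name, which is why Image appears as images.

```ruby
# Simplified stand-in for Paperclip's interpolation (not the gem's code):
# replaces each :placeholder in the pattern with the matching value.
def interpolate_path(pattern, values)
  pattern.gsub(/:(\w+)/) { values.fetch(Regexp.last_match(1).to_sym).to_s }
end

# Sample values for a hypothetical Image record with id 42.
key = interpolate_path(
  'assets/:class/:id/:style.:extension',
  class: 'images', id: 42, style: :thumb, extension: 'jpg'
)
puts key # prints "assets/images/42/thumb.jpg"
```

That string is the key of the object inside the bucket, so each style of each image gets its own predictable location.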

We have added a few methods as well.

def s3_path(style: nil)
  asset.s3_object(style).presigned_url("get", expires_in: 10.seconds)
end

We are defining s3_path so that we can get to our image inside our private bucket. We do that with a presigned_url called on s3_object, which we pass one of our styles to, and we have it expire in 10 seconds. That should be long enough for a page to render before the URL stops being accessible. We need the presigned_url to load our images because it is an authenticated request to a URL provided by AWS, which lets us get to the file despite it being private. After 10 seconds that URL will no longer work, and anyone using it will get an Access denied message from S3.
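The expiry is carried in the URL itself. Assuming the SDK signs with Signature Version 4, the query string includes X-Amz-Date (when the URL was signed) and X-Amz-Expires (its lifetime in seconds). The sketch below only reads those two values to show the arithmetic. The presigned_url_expired? helper and the sample URL are made up for illustration, and S3 does the real check on its side.

```ruby
require 'uri'

# Illustrative only: reads the signing time and lifetime that a
# Signature Version 4 presigned URL carries in its query string.
def presigned_url_expired?(url, now: Time.now.utc)
  params = URI.decode_www_form(URI.parse(url).query.to_s).to_h
  stamp = params.fetch('X-Amz-Date')
                .match(/\A(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z\z/)
  raise ArgumentError, 'unexpected X-Amz-Date format' unless stamp

  signed_at = Time.utc(*stamp.captures.map(&:to_i))
  now >= signed_at + Integer(params.fetch('X-Amz-Expires'))
end

# Made-up URL containing only the two parameters this sketch reads.
url = 'https://bucketname.s3.amazonaws.com/assets/images/42/thumb.jpg' \
      '?X-Amz-Date=20170621T120000Z&X-Amz-Expires=10'

puts presigned_url_expired?(url, now: Time.utc(2017, 6, 21, 12, 0, 5))  # prints false
puts presigned_url_expired?(url, now: Time.utc(2017, 6, 21, 12, 0, 30)) # prints true
```

Because the lifetime is part of the signed URL, changing expires_in in the model only affects URLs generated afterwards.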

Our other method:

def s3_download_path
  asset.s3_object.presigned_url("get", expires_in: 30.seconds)
end

It is the same, except that we are giving someone a little longer to sort out their download, and it explicitly asks for the original. We could write one method that both displays an Image and serves it for download, but having two makes it very clear what is for what. It should also avoid the possibility of serving an original Image instead of a thumbnail in the case that only one style is for purchase and the others are for marketing. The times used for expires_in in both of these instances are arbitrary and could easily be changed to whatever suits you and your application's needs.

Updating the interface

The first thing that we can do now that we have our images on S3 and have our presigned URLs to display them is get rid of our secure_image_display route and its action in the controller. So we can remove:

# config/routes.rb
get '/images/:id/display', to: "images#display", as: "secure_image_display"

And:

# app/controllers/images_controller.rb
def display
  @image = Image.find(params[:id])
  send_file @image.asset.path(:thumb)
end

We needed these when we had to get the files off our own application server, so we needed a way to get to them. Now our s3_path method will do that job for us, so let's use it. We need to change the pages that show our image thumbnails to use it:

<% # app/views/users/show.html.erb %>
<h1>Images offered by <%= @user.name %></h1>

<% @user.images.each do |image| %>
  <%= link_to image_tag(image.s3_path(style: :thumb)), user_image_path(current_user, image) %>
<% end %>

<% # app/views/images/show.html.erb %>
<h1><%= @image.asset_file_name %> offered by <%= @image.user.name %></h1>

<% unless @image.user == current_user %>
  <%= form_for [current_user, @image], url: user_image_purchase_path(current_user, @image), method: :post do |f| %>
    <%= f.submit "Purchase" %>
  <% end %>
<% end %>

<%= image_tag @image.s3_path(style: :thumb) %>

Downloading a Purchase

All that we have left now is to have our User be able to download their purchases. We have two options for this. We can link straight to our file on S3 and let the browser try to display it inline, or we can force it to download to the user's filesystem. Depending on what you are building, either might be appropriate, so let's take a look at both of them.

Opening in the browser

If we want our purchase to be opened in the browser, we need to change our purchased page:

<h1>Images purchased by <%= current_user.name %></h1>

<% current_user.purchased_images.each do |purchase| %>
  <%= link_to image_tag(purchase.image.s3_path(style: :thumb)), purchase.image.s3_download_path %>
<% end %>

By using our s3_download_path directly in the page, a click will attempt to open this straight in the browser if it can. If you want to go this route, you can remove the following code:

# config/routes.rb
get :download

# app/controllers/images_controller.rb
def download
  image = Image.find(params[:image_id])
  send_file image.asset.path
end

As we won't be serving the file download, this means that we can get rid of the code that does. You don't want dead code lying around your application. It's confusing when you try to come back to your app. Just remove it and bring it back with source control if you need to.

Now we have our other option.

Forcing the file to download

If you want to make sure that the file downloads to the user's computer rather than opening in the browser, the change we want to make is a little different. This time we will be keeping the route and controller action. We do need to add our s3_path to our purchased page though.

<% # app/views/users/purchased.html.erb %>
<h1>Images purchased by <%= current_user.name %></h1>

<% current_user.purchased_images.each do |purchase| %>
  <%= link_to image_tag(purchase.image.s3_path(style: :thumb)), user_image_download_path(current_user, purchase.image) %>
<% end %>

In this, we are displaying our thumbnail from S3, and then we are linking to the download action on our images controller. We already have the route hooked up. We just need to change how we serve the file.

# app/controllers/images_controller.rb
require 'open-uri'

# omitted code
def download
  image = Image.find(params[:image_id])
  # Fetch the original from S3 via its presigned URL (URI.open needs Ruby 2.5+)
  data = URI.open(image.s3_download_path)
  send_data data.read, type: data.content_type, filename: image.asset_file_name
end

To download our image, we get our Image from the database as normal, then open the presigned URL from S3 via the s3_download_path method that we defined. We take the data from that and send it to the user's computer with send_data. This will initiate a download in their browser, and they will get their file.

There are tradeoffs in using this method of downloading though. The file will get downloaded to your server and then passed off to the user. If these are big files, then that could be a problem as they will experience a delay, and it will block an instance of your application while it is downloading. If they are small files, then this should be fine. As ever, different things work well in different situations.
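One possible way around that tradeoff, which is not what this article's code does, is to ask S3 itself to serve the file as an attachment. The SDK's presigned_url accepts a response_content_disposition parameter for GET requests, and S3 then returns that value as the Content-Disposition header. The sketch below assumes that behaviour. forced_download_url is a hypothetical helper name, and the object passed in is expected to be the s3_object from the attachment.

```ruby
# Hypothetical helper: builds a presigned GET URL that asks S3 to send
# the file with a Content-Disposition: attachment header, so the browser
# downloads it without the bytes passing through the Rails server.
# Assumes `s3_object` responds to presigned_url like the SDK's S3 object
# and that `filename` contains no double quotes.
def forced_download_url(s3_object, filename:, expires_in: 30)
  s3_object.presigned_url(
    :get,
    expires_in: expires_in,
    response_content_disposition: %(attachment; filename="#{filename}")
  )
end
```

A controller action could then redirect to forced_download_url(image.asset.s3_object, filename: image.asset_file_name) instead of reading the file and calling send_data. The cost is that the user's browser talks to S3 directly, so this only suits apps where that is acceptable.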

Which way should you use?

That depends on your app and your requirements. Personally I'd probably prefer to send the download to the user, mainly because users then won't be moved out of the app. They will get their file and be able to go on with what they are doing, rather than being sent to an external service that they may or may not have heard of, which gives a more seamless application experience. If I were going to use the showing in the browser method, I'd do a bit more work and show it in the interface of the application, rather than just linking to the S3 URL, but this article is a bit more proof of concept than that.

This should get you started with S3 and storing files. It is a much better solution for your application as it grows to use multiple application servers: your files won't be tied to any one server, giving you the flexibility to send a download to your user from any of your servers.

Given the tradeoffs mentioned though, there is certainly room for improvement here. If you have a way that you've done it that seems more flexible, especially with larger files, I'd love to discuss it, either in the comments or feel free to send me an email.