Website delivery
with CloudFront

14th September 2014
(initially published the 5th March 2014)

The main appeal of a static website is that it can be stored in the cloud, that is to say on a content delivery network able to serve our data with excellent performance and at a very low cost, with further benefits in terms of lightness, security and reliability.

This article shows how to host a static website (whether built with a generator like Jekyll, Pelican or Hyde, or “by hand”) on Amazon Web Services, using Amazon S3 to store the website files and CloudFront to deliver them. The goal is a website that is fast from anywhere, reliable, secure, highly scalable and inexpensive (with low traffic, the hosting will cost you about one dollar a month).

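To this end, we will use three AWS services, each covered in its own section below: S3 to host the files, CloudFront to serve them, and Route 53 to attach our domain name.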

Hosting on S3

After having created an AWS account, go to the management console, then to S3: this service will store the website files.

Creating buckets

The website will be located at www.domain.tld. The root of the domain name, domain.tld, will redirect to this location. Create two buckets (with Create bucket) named after the expected URLs: domain.tld and www.domain.tld.

Bucket hosting the website

In the www.domain.tld properties, activate website hosting (Static Website Hosting then Enable website hosting): you can choose a home page (index.html) and an error page (error.html). The website will be available from the Endpoint URL: keep it, we will need it in the following section.
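The same setup can also be scripted instead of clicked through. Here is a minimal sketch, assuming s3cmd is already installed and configured (see the Prerequisite section further down). The run helper, the DRY_RUN variable and the create_site_bucket name are illustrative, not part of s3cmd:

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, run it otherwise.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Create a bucket and turn on static website hosting for it.
create_site_bucket() {
    bucket="s3://$1"
    run s3cmd mb "$bucket"
    run s3cmd ws-create --ws-index=index.html --ws-error=error.html "$bucket"
    run s3cmd ws-info "$bucket"   # shows the website endpoint of the bucket
}

# Usage (prints the commands without contacting AWS):
#   DRY_RUN=1 create_site_bucket www.domain.tld
```

Running it once with DRY_RUN=1 shows the exact commands before anything is created on the account.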

Redirecting bucket

In the domain.tld properties, choose Static Website Hosting, then select Redirect all requests and provide www.domain.tld as the target. This bucket will stay empty.

At the Endpoint URL, we can now see the files hosted in the www.domain.tld bucket. We can upload those files from the AWS console, but the last part explains how to upload them with a single command line.

Serving data with CloudFront

S3 hosts our data in a single location. Data stored in Dublin will be delivered quite fast to a visitor located in Paris (about 200 ms to load the home page of this website) but more slowly to New York (500 ms) or Shanghai (1,300 ms).

Amazon CloudFront is a CDN, serving content to end users with high availability and high performance. The access time falls below 100 ms in Paris, New York and Shanghai.
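To get comparable figures for your own site, one option is to time a request with curl from each location. This is a minimal sketch, assuming curl is installed; fetch_time is an illustrative name, and the result depends heavily on where the command is run:

```shell
#!/bin/sh
# Print the total time, in seconds, needed to fetch a URL.
# The response body is discarded; only curl's timing is printed.
fetch_time() {
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# Usage: compare the S3 endpoint with the CloudFront-backed domain.
#   fetch_time http://www.domain.tld.s3-website-eu-west-1.amazonaws.com/
#   fetch_time http://www.domain.tld/
```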

The trade-off is a propagation delay between an upload and its update on CloudFront. We will see in the last part how to notify CloudFront of any modification.

Creating the distribution

In the AWS management console, choose CloudFront, Create Distribution, then Web.

In Origin Domain Name, we provide the address previously copied, similar to www.domain.tld.s3-website-eu-west-1.amazonaws.com. The field will automatically suggest another value (like www.domain.tld.s3.amazonaws.com); don’t click on it: with that origin, URLs ending with / would not lead to /index.html. Choose an ID in Origin ID.

Leave the other fields at their default values, except Alternate Domain Names, where we provide our domain name: www.domain.tld. Indicate the home page in Default Root Object: index.html.

Our distribution is now created: we can activate it with Enable. The InProgress status means CloudFront is currently propagating our data; when it is done, the status will become Deployed.

Domain name with Route 53

Creating the zone

The Route 53 service will allow us to use our own domain name. In the AWS management console, select Route 53 then Create Hosted Zone. In Domain Name, enter your domain name without any sub-domain: domain.tld.

Redirecting DNS

Select the newly created zone, then the record of type NS. Its Value field gives four addresses. Ask your registrar to point the domain's DNS to them.
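Once the registrar has been updated, the delegation can be checked from a terminal. This is a small sketch, assuming the dig utility is available; name_servers is a hypothetical helper name:

```shell
#!/bin/sh
# List the name servers currently published for a domain, one per line.
name_servers() {
    dig +short NS "$1" | sort
}

# Usage:
#   name_servers domain.tld
```

The output should eventually match the four addresses shown in Route 53; DNS changes can take some time to become visible.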

Domain with CloudFront

Back in Route 53, in domain.tld, create three record sets with Create Record Set:

Of course, it is possible to redirect sub-domains to other services (with NS, A and CNAME records) and to use mails (MX records).

Now, a user going to domain.tld or www.domain.tld is directed (thanks to Route 53) to the bucket of the same name, which redirects to www.domain.tld (thanks to S3). This address leads directly (thanks to Route 53) to the CloudFront distribution, which serves the files stored in the www.domain.tld bucket. All that remains is to send our website to Amazon S3.
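The chain described above can be observed from the command line. This is a minimal sketch, assuming curl is installed; show_redirect is an illustrative name:

```shell
#!/bin/sh
# Print the HTTP status code of a URL, followed by the redirect target if any.
show_redirect() {
    curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$1"
}

# Usage:
#   show_redirect http://domain.tld/       # expected: a redirect to www.domain.tld
#   show_redirect http://www.domain.tld/   # expected: a 200 status, no target
```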

Deploying Jekyll to the cloud

We will now create a shell script which builds the website, compresses it, sends to Amazon S3 the files which have been updated since the previous version, and notifies CloudFront of the changes.

Prerequisite

We will use s3cmd to sync our bucket with our local files. Install the latest development version (at least version 1.5 is needed), which allows us to invalidate files on CloudFront.

On Mac OS, with Homebrew, install s3cmd with the --devel option, and gnupg for secure transfers:

brew install --devel s3cmd
brew install gpg

Now we have to give s3cmd access to our AWS account. In Security Credentials, go to Access Keys and generate an access key and its secret key. Then configure s3cmd with s3cmd --configure.

In order to optimize images, we will install jpegoptim and optipng:

brew install jpegoptim
brew install optipng

Building Jekyll and compressing files

We first build Jekyll into the _site folder:

jekyll build

The jekyll-press plugin will optimize HTML, CSS and JS files. If jpegoptim and optipng are installed, we can also optimize images:

find _site -name '*.jpg' -exec jpegoptim --strip-all -m80 {} \;
find _site -name '*.png' -exec optipng -o5 {} \;

Then, in order to improve performance, we compress HTML, CSS and JS files with Gzip, that is to say all files outside the static/ folder:

find _site -path _site/static -prune -o -type f \
-exec gzip -n "{}" \; -exec mv "{}.gz" "{}" \;
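Since the compressed files keep their original names, nothing in a directory listing shows which ones are actually gzipped. This is a small sketch for checking, assuming only gzip and find are available; is_gzipped and list_uncompressed are illustrative names:

```shell
#!/bin/sh
# Succeed if the file holds valid gzip data, whatever its extension.
# Reading from stdin avoids any check on the file name suffix.
is_gzipped() {
    gzip -t < "$1" 2>/dev/null
}

# List the files under a directory that are NOT gzip-compressed.
list_uncompressed() {
    find "$1" -type f | while IFS= read -r f; do
        is_gzipped "$f" || echo "$f"
    done
}

# Usage, after the compression step:
#   list_uncompressed _site
```

After the compression step, only the files under _site/static should be listed.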

Uploading files to Amazon S3

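We use s3cmd to upload the website; only the updated files will be sent. The commands below rely on the following options: --acl-public makes the uploaded files publicly readable, -M guesses the MIME type of each file, --add-header adds an HTTP header to the stored files, --exclude skips the files matching a pattern, and --cf-invalidate invalidates the uploaded files on CloudFront.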

We first send static files, stored in static/, assigning them a ten-week cache duration (6048000 seconds):

s3cmd --acl-public --cf-invalidate -M \
      --add-header="Cache-Control: max-age=6048000" \
      sync _site/static s3://www.domain.tld/
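The max-age value is expressed in seconds, which makes it easy to get wrong by a factor of ten. A short sketch of the arithmetic, with illustrative variable names:

```shell
#!/bin/sh
# Cache durations in seconds, as expected by the Cache-Control header.
WEEK=$((7 * 24 * 3600))          # 604800 seconds
STATIC_MAX_AGE=$((10 * WEEK))    # 6048000 seconds, i.e. ten weeks

echo "Cache-Control: max-age=$STATIC_MAX_AGE"
```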

Then we send the other files (HTML, CSS, JS…) with a one-week cache duration (604800 seconds):

s3cmd --acl-public --cf-invalidate -M \
      --add-header 'Content-Encoding:gzip' \
      --add-header="Cache-Control: max-age=604800" \
      --exclude="/static/*" \
      sync _site/ s3://www.domain.tld/

Finally, we clean the bucket by deleting the files which have been removed from the local folder, and we invalidate the home page on CloudFront (--cf-invalidate doesn’t do it):

s3cmd --delete-removed --cf-invalidate-default-index \
      sync _site/ s3://www.domain.tld/
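Because --delete-removed deletes remote files, it can be worth previewing the operation first. This is a sketch using s3cmd's --dry-run option, which reports what would be transferred or deleted without doing it; preview_sync is an illustrative name:

```shell
#!/bin/sh
# Show what a sync with --delete-removed would do, without changing the bucket.
preview_sync() {
    s3cmd --dry-run --delete-removed sync "$1" "$2"
}

# Usage:
#   preview_sync _site/ s3://www.domain.tld/
```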

Deploy in one single command

We put these commands in a single file, named _deploy.sh, located in the Jekyll folder:

#!/bin/sh
# Building Jekyll
jekyll build

# Optimizing images first, then compressing the files outside static/
# (the image tools cannot read files that are already gzipped)
find _site -name '*.jpg' -exec jpegoptim --strip-all -m80 {} \;
find _site -name '*.png' -exec optipng -o5 {} \;
find _site -path _site/static -prune -o -type f \
      -exec gzip -n "{}" \; -exec mv "{}.gz" "{}" \;

# Sync static media
s3cmd --acl-public --cf-invalidate -M \
      --add-header="Cache-Control: max-age=6048000" \
      sync _site/static s3://www.domain.tld/

# Sync the other files (HTML, CSS, JS)
s3cmd --acl-public --cf-invalidate -M \
      --add-header 'Content-Encoding:gzip' \
      --add-header="Cache-Control: max-age=604800" \
      --exclude="/static/*" \
      sync _site/ s3://www.domain.tld/

# Delete removed files
s3cmd --delete-removed --cf-invalidate-default-index \
      sync _site/ s3://www.domain.tld/

You only have to execute sh _deploy.sh to update the website. A few minutes may be required in order to update CloudFront data.

Stats

Although our site is static and served by a CDN, it is still possible to analyze the logs if you do not want to use a JavaScript-based system such as Piwik or Google Analytics. Here, we will automate the task (retrieving the logs, processing them and displaying the statistics) from a server (in our example, a Raspberry Pi running Raspbian) using Awstats.

Retrieving logs

Let’s start by activating log creation on our CloudFront distribution. In the AWS Management Console, select the Amazon S3 service and create a “statistics” bucket which will store the logs until they are retrieved. Then, in CloudFront, select the distribution that serves our website, then Distribution settings, Edit, and select the statistics bucket in the Bucket for Logs field.

We create a local folder to receive these logs; we can then retrieve the logs and empty the bucket using s3cmd:

mkdir ~/awstats
mkdir ~/awstats/logs
s3cmd get --recursive s3://statistics/ ~/awstats/logs/
s3cmd del --recursive --force s3://statistics/

Installing and configuring Awstats

We begin by installing Awstats and copying its default configuration file (where www.domain.tld is your domain name):

sudo apt-get install awstats
sudo cp /etc/awstats/awstats.conf \
        /etc/awstats/awstats.www.domain.tld.conf
sudo nano /etc/awstats/awstats.www.domain.tld.conf

In this configuration file, change the following settings to tell Awstats how to process the CloudFront logs (where user is your username):

# Processing multiple gzipped log files
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /home/user/awstats/logs/* |"
# Formatting of the logs generated by CloudFront
LogFormat="%time2 %cluster %bytesd %host %method %virtualname %url %code %referer %ua %query"
LogSeparator="\t"
# Domain names (website and Cloudfront)
SiteDomain="www.domain.tld"
HostAliases="REGEX[.cloudfront\.net]"
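Before handing the logs to Awstats, a quick look at the raw data can confirm that the columns line up with the LogFormat above. This is a rough sketch, assuming the retrieved logs are gzipped, tab-separated files stored directly in the folder, with header lines starting with #, and that the status code is the ninth column (which is where %code falls once %time2 is counted as the date and time columns). count_status_codes is an illustrative name:

```shell
#!/bin/sh
# Count requests per HTTP status code in a folder of gzipped CloudFront logs.
count_status_codes() {
    gzip -dc "$1"/*.gz | awk -F '\t' '
        /^#/ { next }                 # skip the header lines
        { count[$9]++ }               # assumption: status code is column 9
        END { for (code in count) print code, count[code] }
    ' | sort
}

# Usage:
#   count_status_codes ~/awstats/logs
```

If the first column of the output does not look like HTTP status codes, the column positions (and therefore LogFormat) probably need adjusting.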

Finally, we copy the images which will be displayed in the reports:

sudo cp -r /usr/share/awstats/icon/ ~/awstats/awstats-icon/

Generating stats

Once this configuration is done, it is possible to generate the statistics as static HTML files using:

/usr/share/awstats/tools/awstats_buildstaticpages.pl \
    -dir="$HOME/awstats/" -update -config=www.domain.tld

The statistics are now readable from the awstats.www.domain.tld.html file. It is then possible to publish it, send it to a server or email for example.

Regular updating

To automate the generation of statistics at regular intervals, create a stats.sh script with nano ~/awstats/stats.sh that retrieves the logs and generates the statistics:

#!/bin/sh
# Retrieving logs
s3cmd get --recursive s3://statistics/ ~/awstats/logs/
s3cmd del --recursive --force s3://statistics/
# Generating stats
/usr/share/awstats/tools/awstats_buildstaticpages.pl \
    -dir="$HOME/awstats/" -update -config=www.domain.tld

We make this file executable, then create a cron task in our own crontab (with sudo, the task would go into root's crontab, where ~ no longer points to our home folder):

chmod u+x ~/awstats/stats.sh
crontab -e

To generate the statistics every six hours, for example:

0 */6 * * * ~/awstats/stats.sh
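In the hour field, */6 matches every hour from 0 to 23 that is divisible by 6. A short sketch that spells this out:

```shell
#!/bin/sh
# Enumerate the hours matched by "*/6" in the hour field of a crontab entry.
hour=0
hours=""
while [ "$hour" -le 23 ]; do
    [ $((hour % 6)) -eq 0 ] && hours="$hours $hour"
    hour=$((hour + 1))
done
echo "runs at minute 0 of hours:$hours"
# prints: runs at minute 0 of hours: 0 6 12 18
```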

Published on the 14th September 2014 by Sylvain Durand.