How to allow Cloudflare to fetch content from a Backblaze B2 private bucket

Storing Content in Backblaze B2 for a CDN

Many customers have expressed interest in hosting static data for their websites (ranging from minified JavaScript applications to multi-hour 8K video) in Backblaze B2 because of its security, reliability, and affordability. One way to ensure performance and availability is to route requests through a CDN (Content Delivery Network) such as Backblaze's Bandwidth Alliance partner Cloudflare, taking advantage of Cloudflare's performance and the free data transfer between Backblaze B2 and Cloudflare.

How does a CDN like Cloudflare work?

Cloudflare leverages DNS (Domain Name System) so that content requests come to Cloudflare's servers. Through caching and private high-speed links, Cloudflare ensures high availability and reliability from storage. A website's domain name is registered with Cloudflare (and transferred from its domain name registrar), so that Cloudflare becomes responsible for serving content from that domain. Behind the scenes, Cloudflare allows a website's domain to be aliased to some other domain, so that a user may see images and content from https://www.coffeemaniacs.com when those images and that content are actually being served from Backblaze B2 (https://f345.backblazeb2.com/file/coffemaniacs-storage).

Backblaze Buckets are on the Internet Securely

Although all buckets are addressable from the internet, only public buckets can be accessed by just anybody. By default, Backblaze B2 storage is private, which means that access requires authentication. Backblaze's various integration partners have incorporated this security into their tools to keep Backblaze B2 as user-friendly as possible while still maintaining security.

Website Content from Secure Buckets

Putting these elements together: customers who serve data from their website want to store their photos, videos, and other digital content in a private bucket that is available through (and only through) their website. When hosting a website directly, adding the authentication required to pull data from Backblaze B2 is straightforward. Fronting a website through Cloudflare is slightly more complex: now Cloudflare has to access private buckets to retrieve and cache data, which means Cloudflare has to authenticate its requests to Backblaze B2.

Web Workers for the Win

Cloudflare offers web workers, small JavaScript snippets that allow rewriting HTTP and HTTPS requests on the fly. These make it straightforward to add authentication headers to content requests, and so authenticate the link between Cloudflare and Backblaze B2. Even better, Cloudflare's web workers can be uploaded directly to Cloudflare's servers, which makes automating the process straightforward. Web workers are available at all plan levels (including the free plan), for a nominal charge (please see Cloudflare's site for their pricing and plans). At the free level, only one script is possible, but one is enough to allow Cloudflare access to otherwise private data.

Web Workers, Updates, and Authorizations: Working Together

One solution is to use a Web Worker to rewrite requests on the fly, adding an Authorization header. Although an authorization is good for any number of requests, B2 authorizations eventually expire (the longest period of time an authorization can persist is 7 days). Manually updating the authorization each week would be a chore: this is something that should happen automatically. Fortunately, both Backblaze B2 and Cloudflare offer APIs that can automate the process. The procedure here uses a Python script and the B2 APIs to get a new B2 download authorization, good for 7 days from the moment the script runs. After embedding that authorization into a JavaScript snippet, the Python script uses the Cloudflare APIs to upload the snippet as the worker. By using a scheduler such as cron on Linux or macOS, or schtasks on Windows, an administrator can automate running the script every day or two, thus ensuring the authorization code is always current. This article contains the complete script, as well as instructions on modifying it with the right parameters.

Setting up to enable access to your private bucket

Building this connection requires:

  • A Backblaze account
  • An ApplicationKey and ApplicationKeyID that gives read access to the private bucket
  • The name of the Backblaze B2 file server for the bucket
  • A Cloudflare account
  • A top-level internet domain (such as www.pawneeparks.org)
  • Cloudflare Web Worker access
  • The Cloudflare API key
  • The Cloudflare Zone ID
  • A Python3 program to get a refreshed authorization token and upload it to Cloudflare along with the worker code (available at GitHub). (Once Cloudflare's recently announced Workers KV comes out of beta, we will update this article to store the authorization key in this distributed database.)
  • A server running cron (or something similar) to run the Python3 update script at regular intervals

Step by Step

Gather the information needed about your Backblaze B2 account

If you do not already have a Backblaze account, sign up for one here. Backblaze B2 includes 10 gigabytes of free storage and does not require a billing method to get started exploring the possibilities.

Screenshot_2018-10-02_08.33.09.png

 

After creating (or signing in to) your Backblaze B2 account, go to My Account on the top menu, and then select Buckets in the left-hand menu.

Screenshot_2018-10-02_08.42.41.png

This will open the bucket UI screen, and if this is a new account, it will have no buckets (yet).

Screenshot_2018-10-02_10.10.35.png 

Redirecting traffic requires having a bucket. Clicking on 'Create Bucket' brings up the bucket creation dialog.

Screenshot_2018-10-02_10.19.23.png

Choose a name (not the one in the dialog). Bucket names must be globally unique across all Backblaze B2 accounts. Choosing a name already in use will return an error; should this happen, simply choose another name. Users of the redirected content will not see this name.

Screenshot_2018-10-02_10.20.01.png

As long as we are here, get the BucketId for this bucket (this is a globally unique identifier). Next, upload a file (any file will do). Click on Upload/Download.

Screenshot_2018-10-02_10.22.10.png

 

Click on upload, and send a text or HTML file up to Backblaze B2 (we will retrieve it later as part of testing the integration). This file will be referred to later as uploadedFile.html.

Screenshot_2018-10-02_12.04.46.png

 Just drag and drop a file, and upload it to the Backblaze B2 bucket.

Screenshot_2018-10-02_12.12.43.png

Find the white i in a small gray circle at the far right of the file listing (circled in green), and click on that. It gives information about the file: we are looking for the fileserver for this account (all buckets from this account will utilize this particular fileserver). 

Screenshot_2018-10-02_12.14.35.png

As shown, the fileserver for this bucket is https://f001.backblazeb2.com. Make a note of the fileserver, as this is the top-level domain that is the target of the remap. Also note that the filepath is <fileserver>/file/<BucketName>/filename — this pattern is used to source content through the remapped domain (more detail on this later).
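The URL pattern described above can be sketched in Python. This is only an illustration and not part of the script later in this article: the helper names `b2_file_url` and `cdn_file_url` and the bucket name `pawneeparks-static` are hypothetical, while the fileserver and the domain are the ones used in this article.

```python
from urllib.parse import quote

def b2_file_url(fileserver, bucket_name, filename):
    """Build the direct B2 URL: <fileserver>/file/<BucketName>/filename."""
    # quote() percent-encodes characters such as spaces; '/' is left alone
    # so file names with folder-style prefixes keep their structure.
    return fileserver.rstrip('/') + '/file/' + quote(bucket_name) + '/' + quote(filename)

def cdn_file_url(cdn_domain, bucket_name, filename):
    """Build the same path on the domain that Cloudflare serves."""
    return b2_file_url('https://' + cdn_domain, bucket_name, filename)

# 'pawneeparks-static' is a made-up bucket name used only for this example
print(b2_file_url('https://f001.backblazeb2.com', 'pawneeparks-static', 'uploadedFile.html'))
print(cdn_file_url('static.pawneeparks.org', 'pawneeparks-static', 'uploadedFile.html'))
```

Only the host changes between the two URLs; the /file/<BucketName>/filename path stays the same on both sides of the remap.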

Next, get (or create) an applicationKeyId and applicationKey to generate authorization tokens. Once the file information dialog is dismissed, click on the 'Buckets' menu item in the left-hand menu to return to the main Buckets page. Near the top of the page is a link to Show Account Key and Application Key.

Screenshot_2018-10-02_12.46.37.png

Click on this link to go to the key management screen.

Screenshot_2018-10-02_13.02.46.png

Although it is possible to use the AccountId and AccountMasterKey, it is preferable to use a key with less access. Scrolling down this screen a bit gives:

Screenshot_2018-10-02_10.38.28.png

This will create an ApplicationKeyID and ApplicationKey (and this one is scoped to provide full read and write access to the bucket). Please note that the ApplicationKey is displayed exactly once. Although another key can be created with similar permissions, this particular key cannot be regenerated. This key gives access to your bucket, and should be kept securely. Application keys enable a great deal of flexibility in granting access to your stored content.

Screenshot_2018-10-02_10.40.38.png

Note the ApplicationKey (again: this is the only time it will be displayed) and the ApplicationKeyId for the bucket. The ApplicationKeyId, along with the KeyName, are listed (as are all keys created for an account).

This is all the information required from Backblaze.

 

Set up the Cloudflare account

If you do not already have a Cloudflare account, sign up for one at Cloudflare.

Screenshot_2018-10-01_16.01.33.png

and sign up:

Screen_Shot_2018-10-01_at_4.07.09_PM.png

After signing up, register your top-level domain with Cloudflare by going to 'add record'.  Ensure the record type is CNAME. This will map your top level domain (here static.pawneeparks.org) to the source that Cloudflare will fetch content from.

Click on 'add record', and then make sure the cloud icon directly to the right of the 'Automatic TTL' choice box and directly to the left of the 'Add Record' button is orange; if it is gray, click it once to change the setting from 'DNS only' to 'DNS and HTTP Proxy'.

The next step requires adding billing information and a subscription to web workers. After this is accomplished, go to the workers page, and launch the editor.

Screenshot_2018-10-02_14.44.26.png

Do not modify the default script (there is no need). However, the script must be saved by clicking the 'save' button (circled in green). Clicking this button may not appear to do anything, but it is absolutely required for the next step, which is to specify the route on which the worker script is enabled.

Screenshot_2018-10-02_14.53.20.png

Click on the 'routes' tab (circled in green) to show the routes.

Screenshot_2018-10-02_15.04.43.png

Click 'Add Route' to add a route, and it should show up.

Screenshot_2018-10-02_15.31.31.png

and the script will show as enabled for this route:

Screenshot_2018-10-02_15.21.33.png

 

The routing is set up, and the worker script will be taken care of automatically by the Python script, but we will need an API key for Cloudflare. Click on 'Dashboard' to return to the main Cloudflare interface, and then go to the Overview.

Screenshot_2018-10-02_15.44.05.png

 

Several things require attention here. First, SSL should be set to Full (Strict) to ensure that Cloudflare verifies certificates from Backblaze B2 storage. Next, the Zone ID is required to upload our worker script, as is the API key. After noting the Zone ID, click on 'Get your API key'.

Screenshot_2018-10-02_15.55.21.png

and scroll down to the bottom of this page. The final section is API Keys. The required information is the global API key. Click View to display the API key, and make note of it. Cloudflare will require account verification.

Screenshot_2018-10-02_16.00.58.png

It will reveal your API key:

Screenshot_2018-10-02_16.01.58.png

Please note: under no circumstances display your API key in a public forum.
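One way to keep keys out of the script file, and out of anything that might be shared, is to read them from environment variables. The sketch below is an illustration under assumptions, not part of the script in this article: the helper names and the variable name `CF_API_KEY` are hypothetical.

```python
import os

def load_secret(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is not set."""
    value = environ.get(name, '')
    if not value:
        raise RuntimeError('Missing required environment variable: ' + name)
    return value

def mask(secret, visible=4):
    """Hide all but the last few characters so a secret can appear in logs."""
    if len(secret) <= visible:
        return '*' * len(secret)
    return '*' * (len(secret) - visible) + secret[-visible:]

# Stand-in environment for demonstration; a real run would use os.environ.
# CF_API_KEY is a hypothetical variable name.
fake_env = {'CF_API_KEY': 'c641673a3ae68de751172aab8805a3579eca6'}
print(mask(load_secret('CF_API_KEY', fake_env)))
```

If the Backblaze B2 keys are loaded the same way, they would need `.encode('utf-8')` before use, because the script later in this article expects them as bytes.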

 

Setting up the authorized web worker with Cron

Since Backblaze B2 authorization tokens expire, feeding a CDN requires updating the authorization token. The simplest way to do this is to replace the entire worker script with a new one, where the authorization token is hard-coded into the Javascript. To make this easier, here is a script which, given the identities, identifiers, and keys, will get a new authorization token from Backblaze, embed it into a web worker script that will add the token as a header to incoming requests, and then upload the script to Cloudflare. By default, the script's authorization tokens are valid for one week (the maximum possible time). If this script is scheduled to run once a day, then the script can be missed for five days before the authorization token expires.
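The relationship between token lifetime and run interval is simple arithmetic, sketched here as a rough check. The function name `spare_seconds` is hypothetical, and the one-week upper bound is the maximum stated above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MAX_VALID_SECONDS = 7 * SECONDS_PER_DAY  # one week, the maximum stated above

def spare_seconds(valid_seconds, run_interval_seconds):
    """Seconds left on a token at the moment the next scheduled run is due.

    A negative result means the token expires before the next run.
    """
    if not 1 <= valid_seconds <= MAX_VALID_SECONDS:
        raise ValueError('token lifetime must be between 1 second and one week')
    if run_interval_seconds <= 0:
        raise ValueError('run interval must be positive')
    return valid_seconds - run_interval_seconds

# One-week token, refreshed once a day
margin = spare_seconds(7 * SECONDS_PER_DAY, 1 * SECONDS_PER_DAY)
print(margin / SECONDS_PER_DAY)  # prints 6.0
```

With a one-week token and a daily run, each new token still has six days left when the next run is due, which is the slack that absorbs skipped runs.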

Setting up a Python script to run at regular intervals is beyond the scope of this guide, as is setting up a Python3 environment. The Python3 script below gets a B2 authorization token and uploads a web worker that authorizes Cloudflare requests to the private bucket.

This script requires some customization, as a number of values are specific to the user.

cloudflareEmail
The email address registered as the account owner in Cloudflare
bucketSourceId
The hexadecimal bucket identifier for the source bucket in Backblaze B2
bucketFilenamePrefix
The filename prefix (if any) for which the B2 ApplicationKey is valid
cfZoneId
The Zone ID for the Cloudflare account
cfAppKey
The Cloudflare global API key noted earlier
b2AppKey
This is the Backblaze B2 Application Key to authorize access to the Backblaze B2 bucket. This is the secret key that is displayed exactly one time and never again.
b2AppKeyId
This is the Backblaze B2 Application Key ID. It is also a long string, but it is displayed in the list of existing keys.
maxSecondsAuthValid
The number of seconds for which the authorization is valid when created. The default script value is a week, which is the maximum time-to-live of any authorization token. The valid time may be set to a smaller value. However, this should be weighed against how often the cron job will run. If the cron job runs once a day, then if it is skipped for a day or two or even five, the authorization remains active if the authorization token lasts for a week.

import requests
import base64
import json

# Debug output includes the authorization tokens; set to False for scheduled runs
flagDebug = True

cloudflareEmail = 'fearlessleader@pottsylvania.gov'
bucketSourceId = 'cdb0bd378798e11f6427041b'
bucketFilenamePrefix = ''
cfZoneId = '625b68ff559a2fa5247c9c51e3c6374d'
cfAppKey = 'c641673a3ae68de751172aab8805a3579eca6'
# the preceding 'b' causes these to be treated as binary data
# for b64 encoding.
b2AppKey = b'K000uBzMpPUsL0zM32R9MEgpU9yT4IoQ'
b2AppKeyId = b'000d0da781f4e4b0000000033'

# An authorization token is valid for not more than 1 week
# This sets it to the maximum time value
maxSecondsAuthValid = 7*24*60*60 # one week in seconds

### DO NOT CHANGE ANYTHING BELOW THIS LINE ###

baseAuthorizationUrl = 'https://api.backblazeb2.com/b2api/v2/b2_authorize_account'
b2GetDownloadAuthApi = '/b2api/v2/b2_get_download_authorization'

cfUploadWWUrl = "https://api.cloudflare.com/client/v4/zones/" + cfZoneId + "/workers/script"

idAndKey = b2AppKeyId + b':' + b2AppKey
b2AuthKeyAndId = base64.b64encode(idAndKey)
basicAuthString = 'Basic ' + b2AuthKeyAndId.decode('UTF-8')
authorizationHeaders = {'Authorization' : basicAuthString}
resp = requests.get(baseAuthorizationUrl, headers=authorizationHeaders)

if flagDebug:
    print (resp.status_code)
    print (resp.headers)
    print (resp.content)

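# Parse the b2_authorize_account response
jData = json.loads(resp.content)

bAccountAuthToken = jData["authorizationToken"]
bFileDownloadUrl = jData["downloadUrl"]
bPartSize = jData["recommendedPartSize"]
bApiUrl = jData["apiUrl"]

# The account token above is only used to call the B2 API. Request a
# download authorization for the bucket that lasts maxSecondsAuthValid.
# The application key needs the shareFiles capability for this call.
resp = requests.post(bApiUrl + b2GetDownloadAuthApi,
                     headers={'Authorization' : bAccountAuthToken},
                     json={'bucketId' : bucketSourceId,
                           'fileNamePrefix' : bucketFilenamePrefix,
                           'validDurationInSeconds' : maxSecondsAuthValid})

# This is the token that gets embedded in the worker script below
bAuToken = json.loads(resp.content)["authorizationToken"]

if flagDebug:
    print("authorizationToken: " + bAuToken)
    print("downloadUrl: " + bFileDownloadUrl)
    print("recommendedPartSize: " + str(bPartSize))
    print("apiUrl: " + bApiUrl)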

workerTemplate = """addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})
async function handleRequest(request) {
    // Copy the incoming headers and add the B2 authorization token
    const authToken = '<B2AUTH_TOKEN>'
    const b2Headers = new Headers(request.headers)
    b2Headers.append("Authorization", authToken)
    const modRequest = new Request(request.url, {
        method: request.method,
        headers: b2Headers
    })
    const response = await fetch(modRequest)
    return response
}"""

workerCode = workerTemplate.replace('<B2AUTH_TOKEN>', bAuToken)


#Can now update the web worker
#curl -X PUT "https://api.cloudflare.com/client/v4/zones/:zone_id/workers/script" -H
#"X-Auth-Email:YOUR_CLOUDFLARE_EMAIL" -H "X-Auth-Key:ACCOUNT_AUTH_KEY" -H
#"Content-Type:application/javascript" --data-binary "@PATH_TO_YOUR_WORKER_SCRIPT"

cfHeaders = { 'X-Auth-Email' : cloudflareEmail,
              'X-Auth-Key' : cfAppKey,
              'Content-Type' : 'application/javascript' }

# cfUploadWWUrl was built from cfZoneId near the top of the script
resp = requests.put(cfUploadWWUrl, headers=cfHeaders, data=workerCode)

if flagDebug:
    print(resp)
    print(resp.headers)
    print(resp.content)
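
Once the worker has been uploaded, the test file uploaded earlier can be fetched to confirm the integration. The following is a minimal sketch using only the Python standard library, under several assumptions: the function names and the bucket name `pawneeparks-static` are hypothetical, the User-Agent value is arbitrary, and the expectation that a private bucket refuses a direct, unauthenticated request with an HTTP error status is an assumption rather than something the script above checks.

```python
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request, including error statuses."""
    # An explicit User-Agent is set in case the default one is rejected.
    request = urllib.request.Request(url, headers={'User-Agent': 'b2-cdn-check'})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is still useful.
        return err.code

def describe(direct_status, cdn_status):
    """Summarize the two status codes in plain words."""
    if cdn_status != 200:
        return 'CDN request failed with HTTP ' + str(cdn_status)
    if direct_status == 200:
        return 'CDN works, but the bucket also answers without authorization'
    return 'CDN works and direct access is refused'

def check_integration(fileserver, cdn_domain, path):
    """Fetch the same file directly from B2 and through the CDN domain."""
    direct = fetch_status(fileserver + path)
    cdn = fetch_status('https://' + cdn_domain + path)
    return describe(direct, cdn)
```

With the values used in this article, the call would look like check_integration('https://f001.backblazeb2.com', 'static.pawneeparks.org', '/file/pawneeparks-static/uploadedFile.html'), with the bucket name replaced by the real one.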

3 Comments

  • kc

    From the user donnacha (Repost)

    It would be smart to add some rough idea of how much the Cloudflare workers will cost. In fact, you should lead with that information because no-one wants to walk into yet another billable relationship with no idea of the fees they might rack up. That will be the main thing deterring most of your readers from trying this.

    As far as I can see, it is possible to use workers with any level of Cloudflare account, including the free plan, but the minimum cost is $5 per month PER DOMAIN. There is no free tier for workers.

    Your $5 per month gets you 10 million requests of 5 milliseconds of CPU Processor Time. It is not clear whether or not 5ms is sufficient for the script provided in this article. Users on the paid "Pro" and "Business" Cloudflare plans get 10ms and 50ms respectively.

    The article mentions that the authorization can be set to occur once per week, but what does this mean in practice?

    Does the connection between Backblaze and Cloudflare only have to be authorized 4 or 5 times per month? Do we actually only need 5 of our 10,000,000 requests?

    Or does each asset have to be individually authorized? Does the script run every time an asset is requested? Can each request end up costing several of our 10m credits if 5ms is not sufficient processing time?

    Your $5 also gets you only ONE SCRIPT, meaning that you cannot use up your pool of 10 million requests with other clever tricks, such as having contact forms in your static site. The Cloudflare site does not mention if it is possible to buy slots for additional scripts.

    Thinking from the user's point of view and adding this information to your article would make it far more useful, would result in more users deciding to try it, and would make the article more viral because people like to write, tweet, post, and podcast about the things they actually try.

    Edited by donnacha 3 days ago
  • Nathan Verrilli

    Hi Donnacha -

    5 milliseconds is more than enough time to execute the javascript that gets uploaded. Please note that my script is python, and needs to be run once every two or three days to upload a new script with an updated authorization: the authorization I create lasts only for 7 days. The javascript webworker is embedded within that script, and is uploaded directly to Cloudflare.

    The webworker is run for every request to the private bucket (it has to be, to add the authorization parameter to the URL).

    The authorization the python script creates and uploads is good for 7 days (and any number of authorizations within that time period). Depending on the authorization key permissions, the authorization can be for a single file, for any file with a specified prefix, or for the entire bucket (that's a user decision).

    As of this moment in time, 7 days is the maximum time an authorization is valid.

    I anticipate our making this easier once (a) we determine what 'easier' means, and (b) have the time to make that happen.

    Your comments have heavily influenced my thinking on (a), and I am considering how to re-word portions of this article to make those points clearer. Thank you for taking the time to comment; I will certainly use your thoughts to make the article better.

    Cheers,
    Nathan

    Edited by Nathan Verrilli
  • donnacha

    Nathan, thank you for taking the time to respond here and by email, much appreciated.

    I wish you the best of luck with this. What I forgot to highlight in my comment was how exciting this exploding area of tech is, and I am grateful to all the guys like you who are sweating the details to make entirely new forms of infrastructure possible. It feels very much like the early days of the Web. Keep up the good work.
