Working at a startup poses interesting challenges for those working in product development.
There are always jobs to be done, but entering the market quickly, bringing your product up to competitive feature parity and building out the "why we're different" moat requires ruthless prioritization, creative and short feature timelines, and a willingness to roll forward or pivot at short notice.
For the engineers (myself included), that can tend towards what I will term "code-complete vision": the idea that, given a problem, an engineer proposes and builds a vision-perfect version of their solution. I sometimes think of it as the engineer being their own consumer.
This approach has a time and place (and I can count the number of times I've under-engineered a solution, including the problem that led to today's post), but in a startup the trade-off comes at the cost of:
Time-to-market.
Feature validation.
Feedback cycles from the end-user.
Without overstating the obvious, there is always a fine balance between feature release and engineering. Underbake your cake, and you won't serve the customer what they want. Spend too long in the oven, and the customers may have already eaten and be satisfied elsewhere.
At Visibuild, we work closely with our customers and remain transparent so that our finger is on the pulse and we can prioritize what needs to be done as effectively as possible. Once those decisions have been made, the internal team puts together the vision of the feature and works backwards, slicing the end-goal into actionable pieces that bring value to the customer as soon as possible.
One feature that I recently built out was the ability for customers to bulk export PDFs of their "Visis" (a universal umbrella term that covers a project's inspections, issues, tasks and non-conformance reports).
Bulk PDF exports for Visis in the UI
This feature was heavily requested, and we wanted to slice up the end-result into iterations that enabled us to get the feature out to users quicker.
We split this feature into two iterations:
The first iteration would introduce the customer-facing UI on the web application and utilize the backend flow we already had for emailing out PDF exports for a single Visi as an email attachment.
The second iteration would focus on storing the generated ZIP of PDFs remotely and replacing the email attachment with a link to the download.
The problem with the first flow is the limitation of email attachments. There is an attachment size limit that we knew we would eventually hit as project data gets heavier and exports for those Visis become larger, and as usage of the feature grows, hitting that limit becomes inevitable. This was a known assumption when defining our first iteration, so we capped bulk exports at a maximum of fifty Visis requested at one time. The cap is not ideal for us or our customers, so rolling forward to download links is the goal.
Note: it is still possible for the job to fail even with an export cap of fifty, but we knew the cap would help mitigate the risk.
With the second iteration kept high on the prioritization list, the breathing space after the release of the first iteration gave us the chance to spike out a solution for replacing the emailed ZIP of PDFs.
Spiking the solution
The current flow looks something like this:
Old flow
To clarify, the backend setup for the first iteration would generate the PDFs in-memory and eventually generate the ZIP folder with each PDF inside. The requirements for the second iteration release looked like this:
No longer send the ZIP as an email attachment for bulk exports.
Store the generated ZIP folder remotely.
Manage the lifecycle of the asset (from the moment of request to the removal of the folder).
Set up a way to manage a secure download link to the asset (if valid).
The new flow can be pictured like so:
New flow
This blog post goes over the process I went through to build a working spike that emulated our stack closely enough to validate the concept.
Our current stack uses React.js on the frontend and Ruby on Rails on the backend, and is hosted on Amazon Web Services. Because of this, the rest of the blog post has the following two goals:
Building out a Rails + React.js project that can demonstrate the ability to download remote ZIP folder assets securely.
Setting up an infrastructure folder to organize and deploy an infrastructure stack to AWS using the AWS CDK.
I deemed that going deep into the processes that we already had set up (emailing and generating the PDFs) was out-of-scope for this project.
The spike starts by cloning an existing project and building from there.
$ git clone https://github.com/okeeffed/demo-rails-with-react-frontend demo-aws-sdk-s3-gem
$ cd demo-aws-sdk-s3-gem
At this stage, you could install the required dependencies with bundle and yarn, but the next step is to get the project ready for the AWS CDK.
$ mkdir infra
$ cd infra
$ npx cdk init app --language typescript
At this stage, we have the foundations set up for the frontend, backend and infrastructure.
The next logical step is to focus on the infrastructure for hosting the assets that we wish to serve through a download link.
Building out the infrastructure for hosting our assets
For this particular spike, I am suggesting that we use AWS S3 as the destination for our assets. This blog post won't explain S3 too deeply, but the S3 features I am interested in are:
Lifecycle rules let us set a rule to expire a bucket's assets, which means every uploaded asset is ephemeral from the moment it is stored. Removing assets after a set amount of time keeps storage costs down, which is a huge win for those on a tight budget. It does mean we need to track the expiry ourselves, but we will do that later on when we move to the Rails application.
Pre-signed URLs mean we can use our access key (which we will scope to this S3 bucket only) to generate a URL that anyone holding it can use to download the asset. We can surface this on the frontend as the download link, so the client downloads their file directly from the frontend application (where a link is valid and ready for download).
To create our bucket, let's edit the default example stack in the file infra/lib/hello-cdk-stack.ts:
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as s3 from "aws-cdk-lib/aws-s3";

export class InfraStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, "DemoBlogRailsAssetsBucket", {
      bucketName: "demo-blog-rails-assets-bucket",
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      encryption: s3.BucketEncryption.KMS,
      bucketKeyEnabled: true,
      lifecycleRules: [
        {
          enabled: true,
          expiration: cdk.Duration.days(1),
        },
      ],
    });
  }
}
This stack is straightforward, but the options we are passing cover the following:
Specifically setting the bucket name to be demo-blog-rails-assets-bucket.
Setting the removal policy of the bucket to DESTROY. This means that when we tear down the stack, it will also delete the bucket so that we do not receive any unexpected costs (for production applications, you will NOT want this).
I am setting S3 server-side bucket encryption for objects at rest. The options can be found on the docs. This is optional for the demo, but worth exploring if you plan to set this up for production.
I am setting a lifecycle rule to remove an uploaded asset one day after upload. This is contrived for demo purposes, but it is worth setting up sensible defaults for your production application.
Once this is done, you can deploy the infrastructure from the infra folder.
Please note that you must have your AWS credentials set up for the deployment to work. I won't cover this, but I personally use aws-vault to manage my accounts locally.
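From the infra folder, deploying is a single command (if the account and region have not been used with the CDK before, you may also need to run npx cdk bootstrap first):

$ npx cdk deploy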
If everything deploys successfully, you can double check your bucket is there using the AWS CLI:
$ aws s3 ls | grep rails
2022-09-30 13:41:17 demo-blog-rails-assets-bucket
At this point, we can move back to the Rails application and set things up there.
Setting up for Ruby
For the demo, we will also add a few more gems that will aid in emulating the functionality that we want:
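The Gemfile additions look like the following (version constraints are omitted here and the gems are added at the top level for simplicity in this spike; pin versions and group them as you see fit):

gem 'aws-sdk-s3'
gem 'dotenv-rails'
gem 'faker'
gem 'parallel'
gem 'rubyzip'
gem 'sidekiq'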
Once they are in your Gemfile, run bundle to install the gems.
What each Gem is used for:
aws-sdk-s3 is our SDK to upload files and generate pre-signed URLs.
dotenv-rails allows us to load environment variables from a local .env file.
faker will generate some fake data for us.
parallel is used for a helper script to upload zip files in parallel.
rubyzip helps us generate ZIP folders with the files we are creating.
sidekiq is our asynchronous background job gem. You can see a basic implementation on my blog post here.
Note: if you are using a different version of Ruby than specified in the .ruby-version and Gemfile, be sure to update those values.
Setting up the dummy assets
The first thing we will do is write a script to add some objects to our bucket. This is essentially a quick sense-check that things work as we expect at this current point.
In bin/sync-assets, add the following:
#!/usr/bin/env ruby
# frozen_string_literal: true

require 'dotenv/load'
require 'zip'
require 'faker'
require 'aws-sdk-s3'
require 'parallel'
require 'rack/mime'
require 'active_support/isolated_execution_state'
require 'active_support/core_ext/numeric/time'
require 'securerandom'

unless ENV['ASSETS_S3_BUCKET']
  puts 'ASSETS_S3_BUCKET environment variable is not set'
  exit 1
end

files = [
  { title: 'file1', body: Faker::Lorem.paragraph },
  { title: 'file2', body: Faker::Lorem.paragraph },
  { title: 'file3', body: Faker::Lorem.paragraph }
]

# Write the file body into a single text entry inside a temporary ZIP file.
def generate_zip(file)
  # Pass the extension separately so the temp file actually ends in ".zip"
  # and the MIME type lookup below resolves correctly.
  temp_file = Tempfile.new([file[:title], '.zip'])

  Zip::OutputStream.open(temp_file) do |zos|
    zos.put_next_entry("#{file[:title]}.txt")
    zos.puts(file[:body])
  end

  temp_file
end

zips = files.map do |file|
  generate_zip(file)
end

# Upload each ZIP under a random object key, in parallel.
Parallel.each(zips, in_threads: 5) do |zip|
  object_key = SecureRandom.uuid

  Aws::S3::Object.new(ENV['ASSETS_S3_BUCKET'], "#{object_key}.zip").upload_file(
    zip,
    {
      content_type: Rack::Mime.mime_type(File.extname(zip.path))
    }
  )
end
This script will do the following:
Generate three ZIP files, each containing a single text file.
Upload them to S3.
Run bin/sync-assets in the command line to execute the Ruby script.
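Assuming the script has been made executable and ASSETS_S3_BUCKET is available (exported in your shell or set in a local .env file), that looks like:

$ chmod +x bin/sync-assets
$ bin/sync-assets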
If there are no errors, you can validate that this was successful by heading to the S3 bucket in the AWS portal or checking the files using the AWS CLI:
$ aws s3api list-objects-v2 --bucket demo-blog-rails-assets-bucket
# You will get back something like this
{
  "Contents": [
    {
      "Key": "60df0050-abb3-4397-ae4b-ffef637ae682.zip",
      "LastModified": "2022-10-12T06:30:27+00:00",
      "ETag": "\"b87c63f5300caa176b930780679590ce\"",
      "Size": 186,
      "StorageClass": "STANDARD"
    },
    {
      "Key": "a51226cd-ad72-4975-a218-59d5250813ab.zip",
      "LastModified": "2022-10-12T06:30:27+00:00",
      "ETag": "\"91267f5c8ba69738489ccafabe4f4c2e\"",
      "Size": 182,
      "StorageClass": "STANDARD"
    },
    {
      "Key": "a63e4be5-93c2-44da-b198-769baae353af.zip",
      "LastModified": "2022-10-12T06:30:27+00:00",
      "ETag": "\"f0f70aec61ef1cce2cebb6cce5b4c5e9\"",
      "Size": 182,
      "StorageClass": "STANDARD"
    }
  ]
}
All three expected ZIP files have been uploaded (it may not be obvious from the metadata alone, but we can infer it from the three objects returned).
At this point, we know we have files in the bucket and can confirm that the infrastructure is working!
Creating a limited policy for our bucket
Now that we have set up our bucket and checked that it works with our script, it is time to create a programmatic AWS access key that we can use for writing to and reading from our new S3 bucket.
Log into the AWS portal and head to IAM. Once there, the first thing we want to do is create a new policy.
Create new policy
We want to select S3 as the service.
Select service
We then want to add GetObject, PutObject access for our new bucket that we've created.
Adding the access we need
Add the specific bucket
Since we are encrypting with KMS, we also need to add the kms:Decrypt and kms:GenerateDataKey capabilities (as outlined in the AWS support guide).
Add an additional permission block and follow the same process as above (this time for the KMS service), selecting those two actions. Leave the resource set to all resources.
With the KMS permissions needed
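Stitched together, the resulting policy document looks roughly like this (the bucket ARN is based on the bucket name we chose earlier; adjust it if yours differs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::demo-blog-rails-assets-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
      "Resource": "*"
    }
  ]
}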
Creating an AWS Access Key for our bucket
Next, in IAM, select users and create a new user. We want the user to have programmatic access.
Adding a new user
On the next page, add the new policy that we created to give the limited access to the specific S3 bucket that we want.
Add policy to the user
Create the user and take note of the keys that AWS gives you.
Once that is all done, take the access key ID and secret access key and add them to a new .env file in the root of our Rails application, which we will read in development.
Ensure that .env is ignored and not staged by Git.
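The .env file ends up looking something like the following. The AWS_* variable names are the standard ones the AWS SDK reads, ASSETS_S3_BUCKET matches the scripts in this post, and the values shown are placeholders:

# Placeholder values - replace with the key, region and bucket for your account
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=your-bucket-region
ASSETS_S3_BUCKET=demo-blog-rails-assets-bucket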
We can now begin on making changes to the Rails application.
Setting up the additional files for our Rails application
Given that we've pulled the code from a previous blog post, there are some things worth noting:
The components controller file app/controllers/components_controller.rb was generated and components#index is the root route.
Our application entry point for the frontend is app/javascript/application.ts, which imports the base React.js application from app/javascript/components/application.tsx.
With that understanding out of the way, we want to make some modifications to our Rails app by adding some new models, controllers and jobs. Run the following from the CLI in the root folder:
$ bin/rails g model ExportAsset ref_id:string content:string s3_url:string status:string expires_at:datetime
$ bin/rails db:migrate
$ bin/rails g controller api/v1/jobs create
$ bin/rails g controller api/v1/assets index
$ bin/rails g sidekiq:job upload_asset
The above does the following (in order):
Creates a new model ExportAsset. We will use this to track our asset export status, as well as whether or not it has expired.
We run the migration for ExportAsset.
We create an Api::V1::Jobs controller to schedule a new background job.
We create an Api::V1::Assets controller for us to query when we land on our downloads page.
We scaffold out a new job UploadAssetJob for generating a ZIP file, then uploading that file to S3.
The POST endpoint to schedule our upload job
Let's first update our controller in the file app/controllers/api/v1/jobs_controller.rb:
require 'faker'
require 'securerandom'

class Api::V1::JobsController < ApplicationController
  def create
    @asset = ExportAsset.new(content: params[:content], status: 'pending', ref_id: SecureRandom.uuid)

    if @asset.save
      UploadAssetJob.perform_async(@asset.id)
      render json: { message: 'Accepted', id: @asset.ref_id }, status: :accepted
    else
      render json: { errors: @asset.errors.full_messages }, status: :unprocessable_entity
    end
  end
end
This API endpoint creates a new ExportAsset entity with a pending status and schedules the background job. If the asset saves successfully, it returns a "202 Accepted" response; if it does not, the validation errors are returned.
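As a sanity check once the app is running (and with CSRF protection disabled for development, which we cover further down), the endpoint can be exercised directly with curl; the content value here is arbitrary:

$ curl -X POST http://localhost:3000/api/v1/jobs \
    -H "Content-Type: application/json" \
    -d '{"content": "Hello, Visibuild"}'
# Returns something like {"message":"Accepted","id":"<ref_id uuid>"}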
Creating our job to generate a zip and upload the asset
In app/sidekiq/upload_asset_job.rb, add the following:
require 'zip'
require 'aws-sdk-s3'

unless ENV['ASSETS_S3_BUCKET']
  puts 'ASSETS_S3_BUCKET environment variable is not set'
  exit 1
end

class UploadAssetJob
  include Sidekiq::Job

  def perform(asset_id)
    # find_by (rather than find) so the guard below handles a missing record instead of raising
    asset = ExportAsset.find_by(id: asset_id)
    return unless asset

    begin
      sleep 10 # simulate waiting-to-process time
      asset.update(status: 'processing')

      zip = generate_zip(asset.content)
      upload_to_s3(asset, zip)

      sleep 10 # simulate processing time

      s3_url = generate_presigned_url(asset)
      asset.update(status: 'completed', expires_at: Time.now + 1.minute, s3_url:)
    rescue StandardError
      asset.update(status: 'failed')
    end
  end

  private

  # Write the content into a single text entry and return the ZIP as a binary string.
  def generate_zip(content)
    string_io = Zip::OutputStream.write_buffer do |zos|
      zos.put_next_entry('content.txt')
      zos.puts(content)
    end

    string_io.string
  end

  def upload_to_s3(asset, zip)
    Aws::S3::Object.new(ENV['ASSETS_S3_BUCKET'], "#{asset.ref_id}.zip").put(
      {
        body: zip,
        content_type: 'application/zip'
      }
    )
  end

  # Generate a pre-signed GET URL that matches the one-minute expiry tracked on the asset.
  def generate_presigned_url(asset)
    s3 = Aws::S3::Resource.new
    object_key = "#{asset.ref_id}.zip"
    s3.bucket(ENV['ASSETS_S3_BUCKET']).object(object_key).presigned_url(:get, expires_in: 60) # 1 minute
  end
end
The background job has been written to do the following:
Simulate processing time with sleep.
Update the status to be processing.
Add the uploaded content to a zipped text file.
Upload that file to S3.
Generate a pre-signed URL.
Set that pre-signed URL and expires_at field in the asset.
Some things to note about this code:
sleep is specifically for emulation in development for this spike. Using that in production is an obvious footgun.
This job workflow will be iterated upon and changes will be made prior to the final implementation. Some of the error handling and functionality here is contrived. Use this to get a feel for things and then use best practices.
Now that our code can manage the lifecycle emulation and upload assets to our S3 bucket, let's create an API endpoint to get the status of the job.
The GET endpoint for asset state
Finally, we need a way for our download page to set up the download link, or to provide information if the status of the job is not completed.
In app/controllers/api/v1/assets_controller.rb, update the code to be the following:
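The controller body here is a sketch inferred from the contract that the download page (shown further down) expects: it looks the asset up by its ref_id, returns a 404 when it does not exist, and reports an expired status once expires_at has passed.

class Api::V1::AssetsController < ApplicationController
  def show
    asset = ExportAsset.find_by(ref_id: params[:id])
    return render json: { error: 'Not found' }, status: :not_found unless asset

    # Treat a completed asset whose pre-signed URL has lapsed as expired
    if asset.status == 'completed' && asset.expires_at && asset.expires_at < Time.now
      return render json: { status: 'expired' }, status: :ok
    end

    render json: { status: asset.status, url: asset.s3_url }, status: :ok
  end
end

Note that although the generator scaffolded an index action, the route we define later uses show, which is what the frontend calls with the ref_id.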
Now that we have our backend business logic set up, we can move to the frontend to tie it all together.
Setting up the frontend
Our production application at Visibuild does not use the ESBuild setup that I have cloned from my other blog post, so for the spike I ended up using React Router to manage both the home and download pages. This code is contrived to test out the business logic, so please take what I am doing with React Router here with a grain of salt.
First, install React Router with the Node package manager of your choosing:
$ yarn add react-router-dom
Next, we can override the code from the cloned repository in app/javascript/components/application.tsx:
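I won't reproduce the original file exactly, but a sketch along these lines works, assuming a React 18 createRoot mount and a root element in the Rails view (the mount element id is an assumption and may differ in the cloned repository):

import * as React from "react";
import { createRoot } from "react-dom/client";
import { BrowserRouter, Routes, Route } from "react-router-dom";
import { HomePage } from "./HomePage";
import { DownloadPage } from "./DownloadPage";

function App() {
  return (
    <BrowserRouter>
      <Routes>
        <Route path="/" element={<HomePage />} />
        <Route path="/download" element={<DownloadPage />} />
      </Routes>
    </BrowserRouter>
  );
}

// Mount the app once the DOM is ready; "root" is a placeholder element id.
document.addEventListener("DOMContentLoaded", () => {
  const rootElement = document.getElementById("root");

  if (rootElement) {
    createRoot(rootElement).render(<App />);
  }
});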
The above code will help us load the correct page component for the / and /download routes.
Again, I must emphasize that this is contrived work. I am unclear on best practices for React Router when using Hotwire (and we do not use Hotwire at Visibuild).
We now need to add our missing page files for the home page and download page.
Adding in our home page
In app/javascript/components/HomePage.tsx, add the following:
import * as React from "react";
import axios from "axios";
import { Link } from "react-router-dom";

export function HomePage() {
  const [status, setStatus] = React.useState<string | null>(null);
  const [id, setId] = React.useState<string | null>(null);

  const handleSubmit = React.useCallback(
    (e: React.FormEvent<HTMLFormElement>) => {
      e.preventDefault();

      // currentTarget is typed as the form element, so FormData accepts it directly
      const data = new FormData(e.currentTarget);

      axios
        .post("/api/v1/jobs", {
          content: data.get("content"),
        })
        .then(({ data }) => {
          setStatus("success");
          setId(data.id);
        })
        .catch(() => setStatus("error"));
    },
    [setStatus, setId]
  );

  return (
    <div>
      <h2>Welcome to home</h2>
      <h3>Create a job to upload to S3</h3>
      <form onSubmit={handleSubmit}>
        <input type="text" name="content" placeholder="Content" />
        <input type="submit" value="Submit" />
        {status === "success" && (
          <>
            <p>Job created successfully</p>
            <Link to={`/download?ref_id=${id}`}>Go to download page</Link>
          </>
        )}
        {status === "error" && <p>Job creation failed</p>}
      </form>
    </div>
  );
}
This basic page will present an input to start a new job to zip up the text value and upload it to S3.
If successful, it will render a download link. If there is an error, it will just let us know.
Next on the jobs-to-do list is to create our downloads page that we link to after the successful scheduling of a job.
Creating the downloads page
In app/javascript/components/DownloadPage.tsx, add the following:
import * as React from "react";
import axios from "axios";
import { useSearchParams } from "react-router-dom";

export function DownloadPage() {
  const [searchParams] = useSearchParams();
  const [status, setStatus] = React.useState<string | null>(null);
  const [url, setUrl] = React.useState<string | null>(null);

  // The effect callback is intentionally not async: useEffect expects a
  // cleanup function (or nothing) to be returned, not a Promise.
  React.useEffect(() => {
    axios
      .get(`/api/v1/assets/${searchParams.get("ref_id")}`)
      .then(({ data }) => {
        setUrl(data.url);
        setStatus(data.status);
      })
      .catch((err) => {
        if (err.response.status === 404) {
          return setStatus("not_found");
        }

        setStatus("failed");
      });
  }, [searchParams, setUrl, setStatus]);

  return (
    <div>
      <p>File download</p>
      {status === "not_found" && <p>File not found</p>}
      {status === "pending" && <p>Job is pending</p>}
      {status === "processing" && <p>Job is processing. Check again soon.</p>}
      {status === "failed" && <p>Job failed</p>}
      {status === "completed" && url && (
        <a href={url} download>
          Download
        </a>
      )}
      {status === "expired" && (
        <p>The URL for the export has expired. Please re-order.</p>
      )}
    </div>
  );
}
When you head to /download and provide a query param for ref_id, it will make a request to search for that asset, and if the asset is ready with a status of completed, it will provide a download link.
The value for that download link will be our pre-signed S3 URL to download the zip file asset!
The above code could do with a refactor, but I am leaving it in (as I would normally leave a refactor for an actual implementation).
We are almost ready to run the application, but just need to do some clean-up for the Rails router and CSRF token check.
Setting up the Rails router
At this point, we can head to our router and make some changes so that our API routes are set up, the /download path is connected, and the components#index action from this particular Git clone serves as our home page.
Rails.application.routes.draw do
  get 'jobs/create'

  root 'components#index'
  get 'download', to: 'components#index'

  # Define your application routes per the DSL in https://guides.rubyonrails.org/routing.html
  namespace :api do
    namespace :v1 do
      resources :jobs, only: [:create]
      resources :assets, only: [:show]
    end
  end
end
Visiting /download will render the components#index action, which in turn lets React Router do its thing. Again, I am unsure about this particular implementation detail with regard to Hotwire best practices and React Router, but I figure it is out of scope for the spike and we can leave it be.
Finally, let's sort out the CSRF token check in development.
Disabling CSRF in development
We need to update the application config so that CSRF forgery protection is disabled in development. We can do that in config/application.rb:
require_relative 'boot'

require 'rails/all'

# Require the gems listed in Gemfile, including any gems
# you've limited to :test, :development, or :production.
Bundler.require(*Rails.groups)

# Load dotenv only in development or test environment
Dotenv::Railtie.load if %w[development test].include? ENV['RAILS_ENV']

module DemoRailsWithReactFrontend
  class Application < Rails::Application
    # Initialize configuration defaults for originally generated Rails version.
    config.load_defaults 7.0

    # Enable us to send requests without auth token
    config.action_controller.default_protect_from_forgery = false if ENV['RAILS_ENV'] == 'development'
  end
end
This is not something you want disabled in production environments. I haven't set this project up to manage the token correctly in development, and will ignore it for the sake of the spike. There are plenty of resources on setting up your React.js frontend to pass the CSRF token.
Running the application
At this point, we can run bin/dev to boot up the Rails app and head to port 3000.
bin/dev is set up to run Procfile.dev for us, which will start up ESBuild, the Rails server and our Sidekiq server.
Note: Sidekiq requires Redis to be configured.
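If you do not already have Redis running locally, a throwaway Docker container (or a local redis-server) on the default port is enough for the spike:

$ docker run --rm -p 6379:6379 redis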
Once the server is running, we can head to http://localhost:3000 and see our application running.
Running application
Enter a value, select submit and, when successful, we will get our accepted response from the backend.
Content accepted
When accepted, the link that displays will link through to our download page. Select it now and you will end up on a page with a pending response.
Job pending
If you remember, our job was designed to emulate processing time with sleep, so after ~10 seconds, if you reload the page you will see the job in the processing state.
Job process
After the job has begun processing, another sleep will take us into the completed stage.
Job completed
The completed stage has a download link to our new asset. If you click the link, the download will begin for the zip file.
Once downloaded, if you open the zip file, you will get our content.txt file that was generated with the input text we sent.
Downloaded
We can also confirm that the asset was uploaded to S3 with the same AWS CLI call we made earlier:
$ aws s3api list-objects-v2 --bucket demo-blog-rails-assets-bucket
# Notice that we now have four items! The new entry is the export we just generated.
{
  "Contents": [
    {
      "Key": "370d66b7-28c8-464e-8940-e53fd03aaef3.zip",
      "LastModified": "2022-10-12T09:10:23+00:00",
      "ETag": "\"f0f726f898163d0f934046033389a6c6\"",
      "Size": 139,
      "StorageClass": "STANDARD"
    },
    {
      "Key": "60df0050-abb3-4397-ae4b-ffef637ae682.zip",
      "LastModified": "2022-10-12T06:30:27+00:00",
      "ETag": "\"b87c63f5300caa176b930780679590ce\"",
      "Size": 186,
      "StorageClass": "STANDARD"
    },
    {
      "Key": "a51226cd-ad72-4975-a218-59d5250813ab.zip",
      "LastModified": "2022-10-12T06:30:27+00:00",
      "ETag": "\"91267f5c8ba69738489ccafabe4f4c2e\"",
      "Size": 182,
      "StorageClass": "STANDARD"
    },
    {
      "Key": "a63e4be5-93c2-44da-b198-769baae353af.zip",
      "LastModified": "2022-10-12T06:30:27+00:00",
      "ETag": "\"f0f70aec61ef1cce2cebb6cce5b4c5e9\"",
      "Size": 182,
      "StorageClass": "STANDARD"
    }
  ]
}
Notice that now we have four items in the Contents array, whereas we had three before. We can infer that our zip file is the new entry.
If you wait out the expiry time that we set in the database, you can reload and see the expired state on the downloads page.
Expired
If the pre-signed URL has expired but the user is still on a page with a download link, clicking the link will display the standard AWS expired-object response.
AWS expired link
There is also a not-found state in my code example, although you will likely want to redirect to a 404 page when an asset does not exist instead of showing a message like I have here.
Not found
The S3 bucket after expiry time has elapsed
The last important piece of the puzzle is the validation that the objects are automatically cleared when the lifecycle policy time we set for the bucket elapses.
Below is a screenshot of the bucket one day after the initial work was completed. You can see that the bucket is now empty, woohoo!
Lifecycle expired
Cleanup
Now that we are done, we can tear down the AWS S3 bucket that we created to ensure we aren't paying money for unused spike assets.
$ npx cdk destroy
This will remove the bucket (since we set its removal policy to DESTROY at the beginning of this post), as well as any other resources in that CloudFormation stack.
The bucket will need to be empty before it can be destroyed. Either write a script to clean the bucket up, use the UI, or wait out the one day for the assets to be automatically removed by the lifecycle rule.
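If you would rather not wait, the AWS CLI can empty the bucket in one command; this permanently deletes the objects, so double-check the bucket name first:

$ aws s3 rm s3://demo-blog-rails-assets-bucket --recursive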
Wrap up
The focus today was to follow along with the raw process that I go through when spiking out new features and/or iterations.
We successfully set up a project that demonstrates a rudimentary version of serving download links with an associated lifecycle status. We also created an S3 bucket that removes files after a set period of time, which is great for cost optimization and keeping the number of stored assets manageable.
Spikes like this one give us the opportunity to explore technology and see what a first version looks like. From here, I will review the models and lifecycle flow, talk with the product team, and iron out any security and execution concerns. Once the product team here at Visibuild is aligned, we will move into action with a more polished version of this workflow.