Tuesday, February 14, 2012

Choosing the right browser local storage

So your new shiny HTML5 web app needs to persistently store data in the browser. What do you store where?

Good ol' caching

Just because you're building a fancy-pants offline HTML5 app doesn't mean you should overlook the classics. Lots of content can be happily cached the old-fashioned way: by setting appropriate expiration headers. This works great for things like images, especially when their presence is not absolutely critical for the functioning of the app. This lets you conserve your more sophisticated persistent caches for things that are truly critical. Like...
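
As a minimal sketch of what "appropriate expiration headers" can look like, assuming a server where you set response headers yourself (the helper name farFutureHeaders and the one-year lifetime are illustrative assumptions, not taken from any framework):

```javascript
// Hypothetical helper: build far-future caching headers for a static asset.
// `now` is passed in so the result is deterministic and easy to check.
function farFutureHeaders(maxAgeSeconds, now) {
  var expires = new Date(now.getTime() + maxAgeSeconds * 1000);
  return {
    'Cache-Control': 'public, max-age=' + maxAgeSeconds,
    'Expires': expires.toUTCString()
  };
}

var ONE_YEAR = 365 * 24 * 60 * 60;
var headers = farFutureHeaders(ONE_YEAR, new Date());
// With Node's http module you would apply each pair via res.setHeader(name, value).
```

Long lifetimes like this only make sense for assets whose URL changes whenever their content does.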

Application Cache

The application cache is driven by the oft-misunderstood cache manifest file. This is where you should cache your main page, critical javascript, and stylesheets. Don't spend your application cache space frivolously, because once you hit quota, not all browsers will even give the user the opportunity to grant more. Chrome and Safari, for example, will simply refuse to cache your content. Storage quotas are an area that is poorly documented and rapidly changing.

As long as you keep it to code and markup, staying under quota shouldn't be difficult. You don't need to worry about versioning, as the browser takes care of all that. However, you should probably incorporate version tracking into your build & release process, and make your clients publish their version numbers to the server so you can see who's on what version.
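
One way to tie version tracking into a build step is to stamp the version into the manifest itself, since browsers re-fetch the cached files when the manifest's bytes change. The following is only a sketch: the function name, file paths, and version string are made up for illustration.

```javascript
// Hypothetical build-step helper: render a cache manifest as a string.
// Changing the version comment changes the manifest bytes, which is what
// prompts browsers to re-download the listed files.
function buildManifest(version, files) {
  var lines = ['CACHE MANIFEST', '# version ' + version, '', 'CACHE:'];
  files.forEach(function (f) { lines.push(f); });
  // Let everything not listed above go to the network.
  lines.push('', 'NETWORK:', '*');
  return lines.join('\n') + '\n';
}

var manifest = buildManifest('2012-02-14.1', [
  '/index.html', '/assets/app.js', '/assets/app.css'
]);
```

The same version string can be baked into the app's Javascript so each client can report it back to the server.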

HTML5 Storage

The localStorage object is a highly convenient place to stick your actual data models. It's one big persistent key-value store. But beware, there are pitfalls as your data grows! Specifically:

localStorage is subject to a quota that cannot be increased in most browsers (typically 5*2^20 bytes). Again, quotas are an area where browsers are still weak and evolving.
You may be using twice as much storage as you think, because strings are stored as UTF-16 at two bytes per character. So you really get 5*2^19 characters. And remember to include the cost of your keys, not just your values.
If you decide to migrate from localStorage to one of the other options described below, prepare for a painful rewrite, because localStorage is synchronous, and the others are asynchronous. Too bad Javascript doesn't have first-class continuations.
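
To see how close you are to the limit, you can add up keys and values yourself. This is a rough sketch that assumes two bytes per UTF-16 code unit, as described above; the function name is hypothetical, and real browsers may account for storage slightly differently.

```javascript
// Rough estimate of the bytes used by a localStorage-like object,
// counting keys as well as values at two bytes per UTF-16 code unit.
function estimateStorageBytes(storage) {
  var chars = 0;
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    var value = storage.getItem(key) || '';
    chars += key.length + value.length;
  }
  return chars * 2;
}

// In a browser: estimateStorageBytes(window.localStorage)
```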

webSQL (sqlite)

This is an abandoned web standard, so you might be tempted to skip it. However, it is the most widely available option for storing structured data, and not likely to go away just yet. It has the benefit of an expandable quota in WebKit-based browsers. There are a few pitfalls to avoid:

You pick an expected size when a database is first created. The browser checks this size against the available quota and may prompt the user to grant more.
The aforementioned interaction only happens on creation, and you can't delete a database once it's created. So if you later update your code to ask for more quota, you will be silently ignored. The openDatabase call will succeed, but your quota hasn't really increased, and when you fill it up the user may or may not be offered the opportunity to expand it.
You can't list existing databases to figure out where all your quota went (at least not from within Javascript, though you can usually find the underlying sqlite files on disk).
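
Given the first two pitfalls, it seems safest to ask for the largest size you expect to need the very first time. A minimal sketch, where env stands in for window and the database name, description, and 50 MB figure are all made-up values:

```javascript
// Sketch: request the expected maximum size on first creation, because a
// later, larger request may be silently ignored.
// `env` stands in for `window`; name, description and size are hypothetical.
function openAppDatabase(env) {
  var expectedBytes = 50 * 1024 * 1024;
  // openDatabase(name, version, displayName, estimatedSize)
  return env.openDatabase('appdata', '1.0', 'Application data', expectedBytes);
}

// In a WebKit-based browser: openAppDatabase(window)
```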

IndexedDB

The successor to webSQL with the W3C's holy blessing. Only available on Chrome, Firefox, and IE10. No iOS support, so I haven't used it extensively.

Filesystem API

You get to read and write files and directories, all within a sandboxed filesystem. This is great for large chunks of data, especially binary chunks. The downsides:

Only supported in Chrome so far.
Naively reading and writing a few tens of thousands of small files (to mimic the localStorage key-value store, for example) is slow, so you'll need to implement your own more intelligent on-disk data structures. If this sounds like unnecessarily reinventing the database, that's because it is. Don't use the Filesystem API for this kind of data, just use one of the above databases.

Conclusion: this stuff is still half-baked, but worth using anyway

The benefits of making your application work offline can be significant, and a combination of the above techniques can cover any modern browser. The biggest pain point is quota management, since most browsers seem to lag behind on adding user interface elements to control all this stuff. In many cases it's not even possible to revoke a quota decision once made without hacking on the browser's files directly.

Friday, February 10, 2012

Many sites block access from Amazon EC2

I'm a generally happy customer of Amazon Web Services. So when I needed to set up a VPN server, I figured EC2 would be a fine place to stick it. Unfortunately, this has some unintended consequences for VPN users whose browsing traffic gets routed out through Amazon's IP space.

Many high-profile sites (including Yelp and the whole Stack Overflow family) block access from EC2. This can lead to pretty unfriendly errors.

I've even seen a site that just breaks subtly when some assets load and others are blocked.

I can only assume EC2 is home to enough badly behaved crawlers and content-stealing bots that they ruined it for the rest of us. I've seen others comment on the difficulty of sending email from EC2 due to reputation problems, but I haven't seen much comment on this HTTP blacklisting. For me it's just an inconvenience, but if I were trying to build a search engine it would make EC2 unusable.

Saturday, October 1, 2011

Sproutcore Bindings, Observers, and Performance

Here's another Sproutcore performance tip that applies when you have computed properties depending on other computed properties. Consider this controller:

Assume that the subpath property is updated to reflect our current URL (I'm not showing that code here, but check out sproutcore-routing). We figure out what month and year to show by parsing this URL fragment, or by providing suitable defaults based on the current time. Then we combine them into a date, which we use elsewhere to draw the appropriate calendar.

This all works fine, but it suffers a performance problem. To understand why, you must understand a little about how computed properties are implemented, and the difference between Observers and Bindings.

Computed Properties are Invalidated by Observers

An Observer is essentially a hook that runs immediately whenever you change a property. Computed properties like month, year, and firstDate are immediately invalidated by Observers whenever their dependencies change, which is good because it prevents a lot of potential race conditions. However, this means that every time subpath changes, firstDate will get invalidated even if month and year haven't really changed values. This is an inevitable side-effect of the fact that we want both

immediate invalidation of cached properties when their dependencies change, and
lazy evaluation of computed properties.

Bindings to the Rescue

This is one of the reasons we have Bindings. Bindings are implemented on top of Observers, but they are smarter and lazier. A Binding connects two properties (hereafter called "left" and "right"). The Binding will notice (through an Observer) that its left property has been invalidated. It will then wait until the end of the current run loop (which helps aggregate changes), and then it will read the left property (triggering re-evaluation). If it sees the same value it already had before, it will do nothing to the right property, and any computed properties that depend on the right property will not get invalidated.

Which brings us to the solution: By inserting bindings between our computed properties, we ensure that firstDate only re-evaluates when the month or year has truly changed. This in turn prevents everything else that depends on firstDate from getting re-evaluated and re-rendered every time an irrelevant change is made to subpath.
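
The difference between the two policies can be illustrated with a toy model in plain Javascript. This is not Sproutcore's API: it leaves out lazy evaluation and the run loop, and the "year/month" URL layout and every name in it are assumptions made purely for illustration. It only shows the value-comparison step that a Binding adds.

```javascript
// Parse "year/month" from a path such as "2011/10/event/7" (assumed layout),
// falling back to the current month when nothing matches.
function parseYearMonth(subpath) {
  var m = /(\d{4})\/(\d{1,2})/.exec(subpath);
  if (m) { return { year: Number(m[1]), month: Number(m[2]) }; }
  var now = new Date();
  return { year: now.getFullYear(), month: now.getMonth() + 1 };
}

function makeController(compareFirst) {
  var year = null, month = null, firstDate = null, recomputes = 0;
  return {
    setSubpath: function (subpath) {
      var ym = parseYearMonth(subpath);
      // Binding-style policy: if the parsed values are unchanged, stop here
      // so nothing downstream is invalidated.
      if (compareFirst && ym.year === year && ym.month === month) { return; }
      year = ym.year;
      month = ym.month;
      firstDate = new Date(year, month - 1, 1);
      recomputes++;
    },
    firstDate: function () { return firstDate; },
    recomputes: function () { return recomputes; }
  };
}

var eager = makeController(false); // Observer-style: always invalidate
var bound = makeController(true);  // Binding-style: compare values first
['2011/10', '2011/10/event/7', '2011/10/event/8', '2011/11'].forEach(function (p) {
  eager.setSubpath(p);
  bound.setSubpath(p);
});
// eager.recomputes() is 4, bound.recomputes() is 2
```

Only the two changes that actually alter the month cause the second controller to rebuild its date.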

Classes like Date make this technique even more important, because two Date objects representing the exact same time are not === to each other, or even ==. So even if your drawing code uses a Binding to watch the date, it will detect a change every time firstDate re-evaluates, regardless of whether it truly represents a different date.
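
A quick illustration of the Date comparison behavior in plain Javascript. Comparing getTime() values is one possible workaround, offered here as a suggestion rather than anything Sproutcore does for you:

```javascript
var a = new Date(2011, 9, 1);
var b = new Date(2011, 9, 1);

// Both operators compare object identity when both operands are objects.
var strictlyEqual = (a === b);                    // false
var looselyEqual = (a == b);                      // false
// Comparing the underlying timestamps does what you probably meant.
var sameInstant = (a.getTime() === b.getTime());  // true
```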

Wednesday, September 21, 2011

Shrinking the stock Ubuntu AMIs for EC2

Ubuntu provides official Amazon Machine Images (AMIs) for use on the EC2 cloud. By default you get a root filesystem of 8GB. If you're keeping all your application data on a separate partition (often a wise idea), the 8GB root partition may be more than you need.

You can shrink the root volume by stopping the instance, detaching the volume, attaching it to another running instance, and rsyncing its contents into a fresh, smaller volume. There are detailed instructions on the Ubuntu forums. However, there is one extra critical step that's required on at least the newest Ubuntu images (Natty) that I haven't seen documented anywhere.

By default, Ubuntu locates the root filesystem by volume label, not by device name. Take a look in /etc/fstab and you'll see which label it's looking for (currently "uec-rootfs"). So after you copy into the new volume, do

e2label /dev/xvdg uec-rootfs

(assuming /dev/xvdg is your new volume).

Thursday, September 15, 2011

Unit Testing Javascript with the Rails Asset Pipeline

The Rails 3.1 Asset Pipeline offers new possibilities for unit testing your Javascript/CoffeeScript assets. By making it easy to preprocess, bundle, and serve your code, Rails leaves you with very little glue to write.
Here's one strategy I've been using that offers several benefits that were previously hard to combine:

write both code and tests in any combination of pre-processable languages, relying on Sprockets and the asset pipeline to automatically find and compile everything.
tests can execute in the browser with a nice graphical UI.
the exact same tests can execute from the shell with no browser. I'm using therubyracer.

I have an "app/assets/javascripts/test.js" manifest file like this:
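A minimal sketch of such a manifest, using standard Sprockets directives. The jasmine and application entries are assumptions about what else is on the asset path; only the require_tree line follows directly from the directory layout described next:

```javascript
// app/assets/javascripts/test.js (sketch)
//= require jasmine
//= require application
//= require_tree ./test
```
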
And I have an "app/assets/javascripts/test" directory that contains Jasmine specs. All the tests defined in there will automatically be included.
To run in the browser, you just need <script type="text/javascript" src="/assets/test"></script>. But for maximum debuggability, put it into an ERB template and use <%= javascript_include_tag "test" %>. That way you can add debug_assets=1 to the page's query string to get useful filenames and line numbers in your stack traces.

To run from the shell, we can grab our source directly out of the asset pipeline and stick it into a Javascript execution environment:
Of course there are still a few more details to take care of to trigger the tests, produce nice console output, and set Rake's exit status. Here is a full example config.

Wednesday, September 14, 2011

Keeping a Sproutcore 2.0 App Fast

Our application makes heavy use of the excellent open-source Sproutcore 2.0 library. One thing I've discovered about Sproutcore is that if you want your application to be as responsive as possible, there are a few tweaks you can make to keep re-rendering to a minimum.

One example is related to Sproutcore's #if and #unless template helpers. They are simple, clean, and nice to use. But as the amount of complex content that you're wrapping inside them grows, they can get expensive and cause perceptible delays as they rerender. Instead of re-rendering complex content, it's much cheaper to just hide it when you don't want it to be visible.
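
Stripped of any template machinery, the hide-instead-of-rerender idea comes down to toggling one style property. A plain Javascript sketch (the setVisible name is hypothetical and not part of Sproutcore):

```javascript
// Hypothetical helper: show or hide an element in place instead of
// tearing down and rebuilding its contents.
function setVisible(element, visible) {
  // An empty string clears the inline override, so whatever display value
  // the stylesheet specifies applies again.
  element.style.display = visible ? '' : 'none';
}

// In a browser: setVisible(document.getElementById('details'), false)
```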

Here is an implementation of the helpers #showIf and #showUnless. You can use them like #if and #unless, but instead of rerendering their content, they just show or hide it by toggling the CSS display property to none. (Except these don't offer {{else}}; you need to use a #showIf & #showUnless together for the same effect.)

If you're wondering why my Javascript is so bizarre, it's because it's CoffeeScript. You can easily compile this to Javascript without installing anything by heading over to the CoffeeScript site, clicking "Try CoffeeScript", and pasting in the above code.