Saturday, August 31, 2013

How to Deploy ownCloud with PostgreSQL, nginx, php-fpm, and chroot

ownCloud is a promising open-source, self-hosted alternative to Dropbox. While I can't quite endorse it for mission-critical, large-scale deployment, I look forward to watching it mature. Recently, I spent about half a day getting it all working in my preferred deployment environment: nginx instead of Apache, PostgreSQL instead of MySQL, and everything carefully chrooted for better security isolation.

  1. On Debian Stable (Wheezy), you need these packages:
    apt-get install postgresql-9.1 nginx-light php5-fpm php5-gd php-xml-parser php5-intl php5-pgsql
  2. Download the ownCloud source tarball. I used owncloud-5.0.10.tar.bz2.
  3. Unpack it as /var/lib/owncloud/public.
  4. Make sure root owns all of it.
    chown -R root /var/lib/owncloud/public
  5. Create a dedicated owncloud user.
    adduser --system --no-create-home owncloud
  6. Give the owncloud user ownership of just the config and apps directories:
    chown -R owncloud:nogroup /var/lib/owncloud/public/config /var/lib/owncloud/public/apps
  7. Make the ownCloud data directory outside the web root, to make it less likely that a webserver misconfiguration will leave you with gaping security holes:
    mkdir /var/lib/owncloud/data
    chown owncloud:nogroup /var/lib/owncloud/data
    
  8. Put the bare minimum needed for a working chroot into /var/lib/owncloud. Exactly what counts as the bare minimum varies; something like this (a sketch) covers name resolution, the local timezone, a tmp directory, and the device nodes PHP expects:
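    # a sketch -- adjust to your needs; php5-pgsql talks TCP, so no db socket is required
    mkdir -p /var/lib/owncloud/dev /var/lib/owncloud/etc /var/lib/owncloud/tmp
    chmod 1777 /var/lib/owncloud/tmp
    cp /etc/resolv.conf /etc/hosts /etc/localtime /var/lib/owncloud/etc/
    mknod -m 666 /var/lib/owncloud/dev/null c 1 3
    mknod -m 666 /var/lib/owncloud/dev/random c 1 8
    mknod -m 666 /var/lib/owncloud/dev/urandom c 1 9
    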
  9. Create the PostgreSQL user and database. Normally I prefer to use peer authentication locally, so that passwords aren't necessary. But I don't think php5-pgsql supports that, so pick a password:
    sudo -u postgres createuser --pwprompt owncloud
    sudo -u postgres createdb -O owncloud owncloud
    
  10. Create a php-fpm process pool (this is where the chroot actually happens) by putting something like this into /etc/php5/fpm/pool.d/owncloud.conf:
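    ; a sketch -- tune the pm.* numbers and the socket path for your machine
    [owncloud]
    user = owncloud
    group = nogroup
    listen = /var/run/php5-fpm-owncloud.sock
    listen.owner = www-data
    listen.group = www-data
    chroot = /var/lib/owncloud
    pm = dynamic
    pm.max_children = 5
    pm.start_servers = 2
    pm.min_spare_servers = 1
    pm.max_spare_servers = 3
    ; these paths are interpreted inside the chroot:
    php_admin_value[upload_tmp_dir] = /tmp
    php_admin_value[session.save_path] = /tmp
    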
  11. Configure nginx. This example puts ownCloud into a subdirectory, so that it will be accessible like https://example.com/owncloud. Details will vary with your setup, but the core of it looks something like this, dropped inside your HTTPS server block:
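    # a sketch -- the socket path must match the pool config above
    location ^~ /owncloud/ {
        alias /var/lib/owncloud/public/;
        index index.php;
        client_max_body_size 512m;

        location ~ ^/owncloud/(?<script>.+?\.php)(?<pathinfo>/.*)?$ {
            include fastcgi_params;
            # php5-fpm is chrooted to /var/lib/owncloud, so hand it
            # chroot-relative paths, not real filesystem paths:
            fastcgi_param SCRIPT_FILENAME /public/$script;
            fastcgi_param PATH_INFO $pathinfo;
            fastcgi_param HTTPS on;
            fastcgi_pass unix:/var/run/php5-fpm-owncloud.sock;
        }
    }
    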
  12. Now restart nginx and php5-fpm:
    service nginx restart
    service php5-fpm restart
    
  13. Visit https://yourdomain.com/owncloud. You should be greeted with the ownCloud configuration page, where you'll need to choose PostgreSQL as your database, enter your database username (owncloud) and password, and set the data directory to /data (php-fpm is chrooted into /var/lib/owncloud, so that's /var/lib/owncloud/data on the real filesystem).
  14. Remember to schedule background tasks. Rather than put a working shell in my chroot, I just use the web-cron method. Edit the ownCloud user's crontab with "crontab -e -u owncloud" and add this:
    */5 *  *  *  * curl --resolve yourdomain.com:443:127.0.0.1 https://yourdomain.com/owncloud/cron.php
    
  15. Optional step for the security paranoid: ownCloud is running chrooted in a very spartan jail, but it still has access to the local loopback interface and the local network. I use iptables to prevent it from talking to anything other than my PostgreSQL database (this breaks some automatic update / app-store features in ownCloud):
    iptables -A OUTPUT -d 127.0.0.1/32 -p tcp -m owner --uid-owner owncloud -m tcp --dport 5432 -j ACCEPT
    iptables -A OUTPUT -m owner --uid-owner owncloud -j REJECT --reject-with icmp-port-unreachable
    

Tuesday, March 13, 2012

Faster Rails Asset Precompilation

If you have multiple Javascript assets getting compressed by Uglify, your assets:precompile step might take a long time. Mine was adding several minutes to every deployment.

The assets:precompile task actually compiles everything twice: once with digest-based filenames, and once with plain filenames. But as long as you're careful to always use the Rails-provided asset path helpers, you don't need the plain filenames, and you can use this task instead to cut your precompilation time roughly in half:


namespace :assets do
  namespace :precompile do
    # Run only the digest-filename compilation pass, skipping the
    # duplicate plain-filename pass. invoke_or_reboot_rake_task is
    # the same helper Rails' own assets.rake uses internally.
    task :digest_only do
      invoke_or_reboot_rake_task "assets:precompile:primary"
    end
  end
end
Just drop that into a .rake file in your lib/tasks directory and invoke it with rake assets:precompile:digest_only.

Tuesday, February 14, 2012

Choosing the right browser local storage

So your new shiny HTML5 web app needs to persistently store data in the browser. What do you store where?

Good ol' caching

Just because you're building a fancy-pants offline HTML5 app doesn't mean you should overlook the classics. Lots of content can be happily cached the old-fashioned way: by setting appropriate expiration headers. This works great for things like images, especially when their presence is not absolutely critical for the functioning of the app. For example, an image served with a far-future lifetime (one year here) needs nothing more exotic than:
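
Cache-Control: public, max-age=31536000

This lets you conserve your more sophisticated persistent caches for things that are truly critical. Like...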

Application Cache

The application cache is driven by the oft-misunderstood cache manifest file. This is where you should cache your main page, critical javascript, and stylesheets. Don't spend your application cache space frivolously, because once you hit quota, not all browsers will even give the user the opportunity to grant more. Chrome and Safari, for example, will simply refuse to cache your content. Storage quotas are an area that is poorly documented and rapidly changing.

As long as you keep it to code and markup, staying under quota shouldn't be difficult. You don't need to worry about versioning, as the browser takes care of all that. However, you should probably incorporate version tracking into your build & release process, and make your clients publish their version numbers to the server so you can see who's on what version.
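
A minimal manifest, referenced from your page with something like <html manifest="app.appcache">, can be as simple as this (the version comment is the conventional hook for a build process to bump):

CACHE MANIFEST
# version 42 -- changing anything in this file, even this comment, makes clients refetch

CACHE:
application.js
application.css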

HTML5 Storage

The localStorage object is a highly convenient place to stick your actual data models. It's one big persistent key-value store. But beware, there are pitfalls as your data grows! Specifically:

  1. localStorage is subject to a quota that cannot be increased in most browsers (typically 5*2^20 bytes). Again, quotas are an area where browsers are still weak and evolving.
  2. You may be using twice as much storage as you think because the encoding is UTF-16. So you really get 5*2^19 characters. And remember to include the cost of your keys, not just your values.
  3. If you decide to migrate from localStorage to one of the other options described below, prepare for a painful rewrite, because localStorage is synchronous, and the others are asynchronous. Too bad Javascript doesn't have first-class continuations.
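
Pitfall 1 means every write has to be prepared for failure. A sketch (model is whatever you're persisting, and handleQuotaExceeded is a hypothetical hook of your own):

try {
  localStorage.setItem('model:42', JSON.stringify(model));
} catch (e) {
  // Quota exhausted (QUOTA_EXCEEDED_ERR in most browsers); evict
  // something or degrade gracefully -- you can't request more space.
  handleQuotaExceeded(e);
}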

webSQL (sqlite)

This is an abandoned web standard, so you might be tempted to skip it. However, it is the most widely available option for storing structured data, and not likely to go away just yet. It has the benefit of an expandable quota in webkit-based browsers. There are a few pitfalls to avoid:

  1. You pick an expected size when a database is first created. This size is checked against the available quota, and the browser may prompt the user to grant more.
  2. That interaction only happens on creation, and you can't delete a database once it's created. So if you later update your code to ask for more quota, you will be silently ignored: the openDatabase call will succeed, but your quota hasn't really increased, and when you fill it up the user may or may not be offered the opportunity to expand it.
  3. You can't list existing databases to figure out where all your quota went (at least not from within Javascript, though you can usually find the underlying sqlite files on disk).
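
For reference, the size request happens in the openDatabase call itself, and per pitfall 2, only the first call for a given database counts (names here are illustrative):

// Ask for 50MB up front -- you won't get another chance later:
var db = openDatabase('myapp', '1.0', 'My App Data', 50 * 1024 * 1024);
db.transaction(function (tx) {
  tx.executeSql('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)');
});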

IndexedDB

The successor to webSQL, with the W3C's holy blessing. Only available in Chrome, Firefox, and IE10. No iOS support, so I haven't used it extensively.

Filesystem API

You get to read and write files and directories, all within a sandboxed filesystem. This is great for large chunks of data, especially binary chunks. The downsides:

  1. Only supported in Chrome so far.
  2. Naively reading and writing a few tens of thousands of small files (to mimic the localStorage key-value store, for example) is slow, so you'll need to implement your own more intelligent on-disk data structures. If this sounds like unnecessarily reinventing the database, that's because it is. Don't use the Filesystem API for this kind of data, just use one of the above databases.
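
For the big binary chunks it is good at, usage looks something like this (Chrome-only, hence the webkit prefixes; arrayBufferOfData stands in for your own data):

window.webkitRequestFileSystem(window.TEMPORARY, 10 * 1024 * 1024, function (fs) {
  fs.root.getFile('chunk-0001.bin', { create: true }, function (entry) {
    entry.createWriter(function (writer) {
      // Chrome still uses the prefixed BlobBuilder API here:
      var builder = new WebKitBlobBuilder();
      builder.append(arrayBufferOfData);
      writer.write(builder.getBlob());
    });
  });
});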

Conclusion: this stuff is still half-baked, but worth using anyway

The benefits of making your application work offline can be significant, and a combination of the above techniques can cover any modern browser. The biggest pain point is quota management, since most browsers seem to lag behind on adding user interface elements to control all this stuff. In many cases it's not even possible to revoke a quota decision once made without hacking on the browser's files directly.

Friday, February 10, 2012

Many sites block access from Amazon EC2

I'm a generally happy customer of Amazon Web Services. So when I needed to set up a VPN server, I figured EC2 would be a fine place to stick it. Unfortunately, this has some unintended consequences for VPN users whose browsing traffic gets routed out through Amazon's IP space.

Many high-profile sites (including Yelp and the whole Stack Overflow family) block access from EC2. This can lead to pretty unfriendly errors.

I've even seen a site that just breaks subtly when some assets load and others are blocked.

I can only assume EC2 is home to enough badly behaved crawlers and content-stealing bots that they ruined it for the rest of us. I've seen others comment on the difficulty of sending email from EC2 due to reputation problems, but I haven't seen much comment on this HTTP blacklisting. For me it's just an inconvenience, but if I were trying to build a search engine, it would make EC2 unusable.

Saturday, October 1, 2011

Sproutcore Bindings, Observers, and Performance

Here's another Sproutcore performance tip that applies when you have computed properties depending on other computed properties. Assume we have a subpath property that is updated to reflect our current URL (I'm not showing that code here, but check out sproutcore-routing). We figure out what month and year to show by parsing this URL fragment, or by providing suitable defaults based on the current time. Then we combine them into a date, which we use elsewhere to draw the appropriate calendar. Consider a controller along these lines (a sketch; names beyond subpath, month, year, and firstDate are illustrative):
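
App.calendarController = SC.Object.create({
  subpath: '',   // kept in sync with the URL elsewhere, via sproutcore-routing

  month: function () {
    var m = parseInt(this.get('subpath').split('/')[0], 10);
    return isNaN(m) ? new Date().getMonth() + 1 : m;
  }.property('subpath').cacheable(),

  year: function () {
    var y = parseInt(this.get('subpath').split('/')[1], 10);
    return isNaN(y) ? new Date().getFullYear() : y;
  }.property('subpath').cacheable(),

  firstDate: function () {
    return new Date(this.get('year'), this.get('month') - 1, 1);
  }.property('month', 'year').cacheable()
});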

This all works fine, but it suffers a performance problem. To understand why, you must understand a little about how computed properties are implemented, and the difference between Observers and Bindings.

Computed Properties are Invalidated by Observers

An Observer is essentially a hook that runs immediately whenever you change a property. Computed properties like month, year, and firstDate are immediately invalidated by Observers whenever their dependencies change, which is good because it prevents a lot of potential race conditions. However, this means that every time subpath changes, firstDate will get invalidated even if month and year haven't really changed values. This is an inevitable side-effect of the fact that we want both

  1. immediate invalidation of cached properties when their dependencies change, and
  2. lazy evaluation of computed properties.

Bindings to the Rescue

This is one of the reasons we have Bindings. Bindings are implemented on top of Observers, but they are smarter and lazier. A Binding connects two properties (hereafter called "left" and "right"). The Binding will notice (through an Observer) that its left property has been invalidated. It will then wait until the end of the current run loop (which helps aggregate changes), and then it will read the left property (triggering re-evaluation). If it sees the same value it already had before, it will do nothing to the right property, and any computed properties that depend on the right property will not get invalidated.

Which brings us to the solution: by inserting bindings between our computed properties, we ensure that firstDate only re-evaluates when the month or year has truly changed. This in turn prevents everything else that depends on firstDate from getting re-evaluated and re-rendered every time an irrelevant change is made to subpath. Rewritten with bindings, the controller looks something like this (again a sketch):
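
App.calendarController = SC.Object.create({
  subpath: '',

  month: function () {
    var m = parseInt(this.get('subpath').split('/')[0], 10);
    return isNaN(m) ? new Date().getMonth() + 1 : m;
  }.property('subpath').cacheable(),

  year: function () {
    var y = parseInt(this.get('subpath').split('/')[1], 10);
    return isNaN(y) ? new Date().getFullYear() : y;
  }.property('subpath').cacheable(),

  // The bindings below only propagate at the end of the run loop, and
  // only when the value has actually changed:
  boundMonth: null,
  boundMonthBinding: '.month',
  boundYear: null,
  boundYearBinding: '.year',

  firstDate: function () {
    return new Date(this.get('boundYear'), this.get('boundMonth') - 1, 1);
  }.property('boundMonth', 'boundYear').cacheable()
});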

Classes like Date make this technique even more important, because two Date objects representing the exact same time are not === to each other, or even ==. So even if your drawing code uses a Binding to watch the date, it will detect a change every time firstDate re-evaluates, regardless of whether it truly represents a different date.

Wednesday, September 21, 2011

Shrinking the stock Ubuntu AMIs for EC2

Ubuntu provides official Amazon Machine Images (AMIs) for use on the EC2 cloud. By default you get a root filesystem of 8GB. If you're keeping all your application data on a separate partition (often a wise idea), the 8GB root partition may be more than you need.

You can shrink the root volume by stopping the instance, detaching the volume, attaching it to another running instance, and rsyncing its contents into a fresh, smaller volume. There are detailed instructions on the Ubuntu forums. However, there is one extra critical step that's required on at least the newest Ubuntu images (Natty) that I haven't seen documented anywhere.

By default, Ubuntu locates the root filesystem by volume label, not by device name. Take a look in /etc/fstab and you'll see which label it's looking for (currently "uec-rootfs"). So after you copy into the new volume, do

e2label /dev/xvdg uec-rootfs
(assuming /dev/xvdg is your new volume).
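
For reference, the fstab entry being matched looks something like this (exact options may differ on your image); running e2label with no label argument prints the current label, which is a quick way to confirm the change took:

LABEL=uec-rootfs   /    ext4    defaults    0 0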

Thursday, September 15, 2011

Unit Testing Javascript with the Rails Asset Pipeline

The Rails 3.1 Asset Pipeline offers new possibilities for unit testing your Javascript/CoffeeScript assets. By making it easy to preprocess, bundle, and serve your code, Rails leaves you with very little glue to write.
Here's one strategy I've been using that offers several benefits that were previously hard to combine:
  • write both code and tests in any combination of pre-processable languages, relying on Sprockets and the asset pipeline to automatically find and compile everything.
  • tests can execute in the browser with a nice graphical UI.
  • the exact same tests can execute from the shell with no browser. I'm using therubyracer.
I have an "app/assets/javascripts/test.js" manifest file like this:
And I have an "app/assets/javascripts/test" directory that contains Jasmine specs. All the tests defined in there will automatically be included.
To run in the browser, you just need <script type="text/javascript" src="/assets/test.js"></script>. But for maximum debuggability, put it into an Erb template and use <%= javascript_include_tag "test" %>. That way you can append ?debug_assets=1 to the page URL to get useful filenames and line numbers in your stack traces.

To run from the shell, we can grab our source directly out of the asset pipeline and stick it into a Javascript execution environment:
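Something like this is the core of it (a sketch; the window and console stubs are assumptions about what the test framework needs):

require 'v8'   # therubyracer

source = Rails.application.assets['test.js'].to_s   # compiled by Sprockets
context = V8::Context.new
# Stub just enough browser environment to keep the framework happy:
context.eval('var window = this; var console = { log: function () {} };')
context.eval(source)
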
Of course there are still a few more details to take care of to trigger the tests, produce nice console output, and set Rake's exit status. Here is a full example config.