When I read "Why Doesn't Sandstorm Just Run Docker Apps?", I instantly knew I wanted to pick Kenton Varda's brain. This week I had the honor of spending some time with Kenton.
In this week's podcast interview, we are talking with Kenton about the recent launch of Sandstorm. We go into:
- Kenton's background at Google
- Why NOT use Docker in Sandstorm?
- When crowd-funding $50,000... what will you use it for?
- What's the future of Sandstorm?
Kenton, can you tell us about your background?
I worked for Google for seven and a half years, ending at the end of last year. I worked on a few things there. I worked on search infrastructure for a while, helping to launch universal search. For a while I worked on sharing and access control for things like Google Docs: when you click the share button in Google Docs, I wrote the infrastructure that enforces the settings you see there. I'm probably best known for my work on Protocol Buffers, the data interchange format that all of Google's servers use to talk to each other and that is used for a lot of Google's storage. I didn't invent Protocol Buffers, but I rewrote it and open sourced it. Now there are probably thousands of companies using it, including Twitter and even gaming companies like Blizzard.
What is Sandstorm? Where did it come from, how did you come up with it, and can you give us some examples of what you can do with it?
For the past year and a half since I left Google, I have been working on a couple of things. One is Cap'n Proto, a replacement for Protocol Buffers that also includes an RPC system. Sandstorm is built on top of that. Sandstorm is a radically easier way to run personal instances of web apps. It lets you have your own personal server and install apps to it entirely through a web interface, like installing apps on a mobile phone. There's no editing of config files or running commands at a command line, so an end user, even a non-technical user, can actually use Sandstorm and run apps on it. You can run apps like email, document editors, or blogging platforms; anything that involves personal data owned by a particular user makes sense on Sandstorm.
Would you use Sandstorm on a cloud like Amazon or Rackspace or CenturyLink Cloud? Or would you use it on your own laptop, how does that work?
Sandstorm is open source, so you can run it anywhere you want. We think most users will probably want to use it in the cloud, so they can reach their server wherever they are, regardless of the current state of their internet connection. You certainly wouldn't run it on a laptop, because a laptop is not running most of the time. You could, and many users do, run it on a machine on their home network for increased privacy. What we hope to see is a combination of some users doing that and a lot of users using cloud hosting services. We plan to offer a managed hosting service ourselves for something like $5 a month to help fund development. So it's up to you. Either way, you can install the software you want; there are no limits.
Besides blogging and email applications, how big is the catalog of software? And how long has Sandstorm been around?
I started working on Sandstorm at the beginning of this year, so it hasn't been around very long. It became usable a couple of months ago, and we've been porting a lot of apps to it. Most of the apps that exist right now are ones we ported ourselves from open source apps. Since it's based on containers, we can take any Linux-based tech stack and port it to Sandstorm. We have about 10-15 apps right now: blogging apps like WordPress and Ghost, and document and spreadsheet editors like Etherpad and EtherCalc. We ported Apache Wave, formerly known as Google Wave, so you can now run that on Sandstorm. You can try out the whole list at demo.sandstorm.io, where you can install any of these apps and see for yourself.
So using Sandstorm would be an alternative to setting up Ghost or WordPress on your own machine?
Yes. It is much easier than setting it up yourself, because you just point and click. You click install and you have the app running. It takes about 5 seconds to spin up a container.
One of the things I wanted to talk to you about is that you have a different point of view about Linux containers. The trendy thing is to go with Docker, and Sandstorm is not built on Docker. Why not? Why did you decide not to go with Docker? Can you tell us more about that?
Sandstorm uses the same Linux kernel features that Docker uses, but it turns out there is not much reason for us to actually use Docker. The kernel features involved are actually pretty simple to use; it's only a couple dozen lines of C++ code that we use to set up our container. What Docker does is provide a configuration layer on top of that, with enough features to create completely transparent containers in which you can host arbitrary existing Linux distros. That's not what Sandstorm is trying to do. Sandstorm is willing to say, OK, apps in our containers have to be tweaked a bit to understand our container environment. For example, they have to store their data under /var, because that's the only directory that is going to be writable in our containers. Or they have to understand that procfs isn't going to be available to them, because we turn it off to reduce the attack surface of the Linux kernel. In any case, apps written for Sandstorm need to have a user interface: you can't just ssh into a Sandstorm app, so the only way for a user to configure it is through that interface. So apps have to be specifically packaged for Sandstorm to make any sense on Sandstorm, and if that is the case, we might as well set up our containers in this one particular way and require all apps to conform to it.
So you wouldn't, say, set up a Hadoop cluster with Sandstorm. It's not meant for programming tasks, it's meant more for web application tasks?
Right, Sandstorm is meant to run web apps for users. Unlike Docker, its interface is a web interface rather than a command-line interface, and it is meant specifically for the kinds of apps that have private user data in them. We do not see Sandstorm as a competitor to Docker at all, because it is aiming at a completely different use case.
How complex can the apps within Sandstorm be? You can do a WordPress application, but can you do multi-tier applications? How complex can you get if you are developing apps for Sandstorm?
Can you define what you mean by multi-tier?
Can you have micro-services in a bunch of containers within Sandstorm that talk to each other? Are WordPress and MySQL in the same container? Can you tell us a bit more about the backend implementation?
If you are using something like MySQL, you actually want to run it in the same container as the application, because you don't expect users to understand what MySQL is and why they need to start it up. That said, we do encourage apps to use SQLite instead, because when you have a small database and only one app talking to it, there's not much reason to have a whole MySQL instance. Apps will be able to talk to each other; this is a part we are still working on, but they talk to each other through Cap'n Proto RPC. The idea is that we are introducing a user interface called the "powerbox", which is a way to introduce apps to each other. One app says "hey, I implement this Cap'n Proto interface, here's an object reference implementing it" and passes it off to the system. Another app says "hey, I need an object implementing this particular Cap'n Proto interface" and asks the system for one. The system then displays a picker to the user saying "here are all the apps you can hook this up to, which one do you want to use?" The system has privileged knowledge of all your apps without having to pass that knowledge on to the app making the request. When you choose one, you are implicitly saying that you want the requesting app to have permission to talk to the other app, so there's no need for a separate permission prompt. The requesting app just gets that object reference and can start making requests to it. That's how you can hook up multiple containers to talk to each other.
So you could imagine building a more traditional distributed application that does some big data crunching by creating a bunch of instances and hooking them up, but that's really not what Sandstorm is going for. It's more about user-facing apps.
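The powerbox flow described above can be modeled in a few lines. The Python below is a toy sketch under stated assumptions, not Sandstorm's implementation. Real grains exchange Cap'n Proto object references; here an interface is just a string and offers are plain Python objects. `Powerbox`, `offer`, `request`, and the `pick` callback (a stand-in for the user-facing picker) are all hypothetical names. What the sketch does show is the key property: only the system sees every candidate, and the user's choice doubles as the permission grant.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Offer:
    grain: str      # hypothetical: the app instance making the offer
    interface: str  # assumption: interfaces are identified by plain strings
    obj: Any        # the object reference being offered


class Powerbox:
    """Toy model of a powerbox; all names here are hypothetical."""

    def __init__(self, pick: Callable[[list[str]], int]) -> None:
        self._offers: list[Offer] = []
        # Stand-in for the picker UI: receives candidate grain names and
        # returns the index of the one the user chose.
        self._pick = pick

    def offer(self, grain: str, interface: str, obj: Any) -> None:
        """An app tells the system: 'I implement this interface, here's an object.'"""
        self._offers.append(Offer(grain, interface, obj))

    def request(self, interface: str) -> Optional[Any]:
        """An app asks the system for any object implementing `interface`."""
        candidates = [o for o in self._offers if o.interface == interface]
        if not candidates:
            return None
        # Only the system (and the user) sees the full list of matching apps.
        choice = self._pick([o.grain for o in candidates])
        # The user's choice is the permission grant: the requester receives
        # exactly one reference and learns nothing about the other candidates.
        return candidates[choice].obj


class CalendarStub:
    def __init__(self, owner: str) -> None:
        self.owner = owner


box = Powerbox(pick=lambda names: names.index("work-calendar"))
box.offer("home-calendar", "Calendar", CalendarStub("home"))
box.offer("work-calendar", "Calendar", CalendarStub("work"))
print(box.request("Calendar").owner)  # prints work
```

Because the picker runs inside the trusted system, the requesting app never needs a "list all apps" permission. This is the design point the answer above emphasizes.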
Does it do anything with the persistence layer? If you are using a SQLite database and a container crashes, is that a problem? How do you deal with persistence; if a disk drive crashes, is there a backup?
Right now the code we have runs on a single machine, and each container gets a private slice of the filesystem. If you want to make backups, there is a button in the user interface that downloads a zip of the app's storage, and you can re-upload that later. We will develop a more sophisticated backup system for when you want to run Sandstorm on a cluster of machines. Data will be written not only to the local filesystem but also to other machines before the HTTP request returns, so you know you are not going to lose data if one particular machine dies.
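The download-and-re-upload backup described above can be illustrated with a Python sketch built on the standard `zipfile` module. It makes one hypothetical assumption: a grain's private storage is an ordinary directory on the host, for example the tree the app sees as its writable /var. `backup_grain` and `restore_grain` are illustrative names, not Sandstorm APIs, and the archive layout is not Sandstorm's actual backup format.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import Path


def backup_grain(storage_dir: Path) -> bytes:
    """Pack a grain's private storage directory into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(storage_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the grain root so the archive is portable.
                zf.write(path, path.relative_to(storage_dir).as_posix())
    return buf.getvalue()


def restore_grain(archive: bytes, target_dir: Path) -> None:
    """Unpack a previously downloaded backup into a storage directory."""
    target_dir.mkdir(parents=True, exist_ok=True)
    root = target_dir.resolve()
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            # Refuse entries that would escape the grain's directory ("zip slip").
            if not (root / name).resolve().is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {name}")
        zf.extractall(root)
```

The path check matters because restoring means accepting an uploaded file from outside the sandbox. A crafted entry such as `../../etc/passwd` must not be able to write outside the grain's slice of the filesystem (`Path.is_relative_to` requires Python 3.9+).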
Like Flynn, which is a Docker-based Platform as a Service, you have chosen a crowd-funding approach. You are almost finished with a $50K crowd-funding raise, and it looks like you are getting pretty close, but anybody listening, if you think this is cool stuff, please go support it. What are you going to do with that money? What are your goals for the $50K?
I'm going to use the money to feed myself! Software development is expensive, and we do have to pay developers, so that's a big part of what we will use the money for. We will probably have to raise money from other places as well. If you pay into our crowd-funding campaign now, you are basically pre-paying for service: if you choose one of the service perks, you are pre-paying for managed Sandstorm hosting, which will become available early next year. We also have perks for people who want to self-host, like t-shirts and stickers.
So it's kind of a consumer-style app store front end for containerized web apps? If people are interested in that concept, this could be a managed version of it from some really smart people. How big is the team?
We have four people: me, two other developers who have been porting a lot of the apps and adding new features, and my co-founder Jade, who handles developer relations.
What are the biggest challenges that you see in adopting Linux containers today, for regular developers in corporations?
It depends on what you are trying to do with them. One misconception I think a lot of people have is about the security of Linux containers. There's a conflict between security and transparency. If you want a fully transparent system that appears to the guest app as if it were a complete Linux system running on bare metal or a VM, you have to enable a lot of kernel features, which gives the container a large attack surface. Linux kernel vulnerabilities that would allow an app inside the container to break out are found on a fairly regular basis. For a lot of people that doesn't really matter: if you are running apps that you more or less trust not to be malicious, then Docker is pretty awesome. You can run what feels like VMs a lot more efficiently than actual VMs. But you do have to keep in mind that Linux containers with all the kernel features enabled are not a good sandbox right now. That is why in Sandstorm we actually disable a lot of Linux kernel features. We don't mount procfs, we don't mount sysfs, and we use seccomp to disable a lot of system calls that apps really have no business using. That way we get closer to something that is not going to be broken quite as often.
So one of Sandstorm's benefits is that it is more secure than Docker?
Yes, because it has a smaller attack surface it is a lot less likely that an app is going to be able to find an exploit and break out.
Where did the Sandstorm name come from?
One way to look at it is that Sandstorm is a sandbox in the cloud. Another is that app instances on Sandstorm are meant to be very fine-grained: you can imagine a bunch of grains of sand flying around, and we actually call our app instances "grains". Our containerization is designed to allow super fine-grained instances, so not only does every user have their own instance of an app, but for document-oriented apps every document can be a separate instance. There are a couple of reasons why that is a good idea. One is that Sandstorm itself can implement sharing on those instances, so every app doesn't need its own sharing model. Another is that if you have one public document and one super-secret document that you don't want anyone to know about, you don't have to worry that someone with access to the public document might exploit a bug in the app to get at the private one; Sandstorm enforces that security for you. To make containers that fine-grained, we have to do things like make sure that all of the app's code and libraries are shared read-only among all instances, and aggressively shut down instances that aren't in use. If you have a per-user instance of an app, the user is not going to be using it 99% of the time, so it shouldn't be running and using up RAM during that time.
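The "aggressively shut down instances that aren't in use" policy can be sketched as an idle-timeout reaper. The Python below is a hypothetical model, not Sandstorm's code. `GrainSupervisor`, `touch`, `reap_idle`, and the 60-second timeout in the usage are all assumptions, and flipping a `running` flag stands in for actually starting or stopping a container. The clock is injectable so the behavior can be checked without waiting.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Grain:
    grain_id: str
    last_used: float
    running: bool = True


class GrainSupervisor:
    """Hypothetical idle-timeout policy for per-user or per-document grains."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.grains: dict[str, Grain] = {}

    def touch(self, grain_id: str) -> Grain:
        """Record a request; (re)start the grain if it was shut down."""
        now = self.clock()
        grain = self.grains.get(grain_id)
        if grain is None:
            grain = Grain(grain_id, now)
            self.grains[grain_id] = grain
        grain.last_used = now
        grain.running = True  # a real system would launch the container here
        return grain

    def reap_idle(self) -> list[str]:
        """Stop grains idle longer than the timeout; return the stopped ids."""
        now = self.clock()
        stopped = []
        for grain in self.grains.values():
            if grain.running and now - grain.last_used > self.idle_timeout:
                grain.running = False  # a real system would stop it, freeing RAM
                stopped.append(grain.grain_id)
        return stopped


fake_now = [0.0]
sup = GrainSupervisor(idle_timeout=60, clock=lambda: fake_now[0])
sup.touch("doc-public")
sup.touch("doc-secret")
fake_now[0] = 90
print(sup.reap_idle())  # ['doc-public', 'doc-secret']
```

Stopping idle grains is cheap in this model because, as the answer notes, an app's code and libraries are shared read-only across instances. Restarting a grain on the next `touch` only has to bring back that grain's private state.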
Do you see a corporate version of Sandstorm in the future, or is this really for personal use?
We think corporations will have a lot of interest in this, because many corporations can't really use SaaS offerings today. Some of them feel Google is a competitor, so they don't want to use Google Apps; others can't because of regulations. In the United States we have things like HIPAA, and in other countries things are even stricter: in Germany it's really hard to put data on another company's servers because of the regulations there. There are a lot of companies that really want to keep their data in-house on their own machines but have a lot of trouble maintaining that, since it's a lot of work. So we think Sandstorm will make a lot of sense for them, and we plan to eventually build an enterprise offering optimized for those environments.
What Linux container projects are you most excited about? What are you watching, what open source projects do you think are very promising, and what should we be looking out for?
Well, I think Sandstorm is exciting! To be honest, I obviously know all about Docker and Docker is pretty cool for the use case it targets. I haven't kept up enough on the other container projects out there.
Just curious: while you are developing Sandstorm, if you ever see some cool stuff, let me know. I'm always looking out for cool technologies. So what's next for Sandstorm? What's the roadmap, where is it all going?
It looks like we're going to be funded, so next up we have to build the hosting service we've been selling in the perks. That means making Sandstorm able to use multiple machines and treat them as a single instance, so we can actually scale up to however many users we get. It also means implementing things like quotas, so that one user can't consume all our resources. We're also adding features like the ability for apps to export APIs, which we get a lot of requests for, because it will let apps talk not only to apps outside of Sandstorm but also to client-side apps. For instance, if you want a mobile client that talks to your server, there needs to be an API for it to talk to. Currently, the only way you can access an app is through a browser, and it's not really possible to export an API through that path. That's the thing about Sandstorm apps: since they are intended to be sandboxed and to prevent leakage of data, we do not allow an app to talk arbitrarily to the rest of the world. It does not get arbitrary network access; it has to go through Sandstorm's features for communicating, such as exposing APIs or sending email. So we have to expand those features and make it easier to add new protocols to Sandstorm, which we will be doing through a notion of "cloud device drivers", which are really more like protocols that we support.
What do you think the future of Linux containers is, how are they changing the technology world? Do you think that containers will take over virtual machines, how do you think the balance of that is going to work out?
I think containers make much more sense than virtual machines for most use cases. Obviously they are much cheaper: why should you be running a whole separate kernel for every application? That's a lot of waste, on top of the emulation layer and everything else you need there. There are the security issues, but over time that will improve; we will find and fix the security holes, and we will find better ways to lock down containers. Long term, I would say VMs won't make a lot of sense in a few years, just because of the huge waste of resources in using them to do what you could do with containers.
Any more on the crowd-funding perks?
In addition to the hosting perks, there are things like stickers and t-shirts, and you can get App Store credit. Our App Store is going to support a "pay what you want" model for open source apps, which we hope will encourage more open source web app development, so you'll get credit that you can actually give to these open source apps in the future. There were seats on the "app committee", which is going to direct our porting efforts, but they've sold out, unless you want to splurge on the Key Individual Sponsorship level, at which point we will actually put your picture on our team page, and you'll get an app committee seat and a bunch of other goodies. There are also a few LAN Party invites left. You may have heard of my epic LAN Party house; it's been on the internet a few times.
Where can you find information about the LAN Party?
It's all on the campaign page; just look at the perk list.
And where can people find out more about Sandstorm?
Sandstorm.io. We are @sandstormio on Twitter, we are also on Facebook and Google+. If you go to Sandstorm.io there are all the buttons for all those.