Sunday, March 25, 2012

How a web application can download and store over 2GB without you even knowing it

I have been experimenting with the HTML5 offline application cache some more over the last few days, doing boundary tests in an attempt to learn more about browser behaviour in edge cases.

One of these experiments was testing the cache quota.

Two weeks ago, I blogged about generating and serving an offline application manifest using ASP.NET MVC. I reused that code to add hundreds of 7MB PDF files to the cache.
public ActionResult Manifest()
{     
    var cacheResources = new List<string>();
    var n = 300; // Play with this number

    // The query string makes each URL unique, so every entry is cached as a separate resource
    for (var i = 0; i < n; i++)
        cacheResources.Add(Url.Content("~/Content/book.pdf") + "?" + i);

    var manifestResult = new ManifestResult("1")
    {
        NetworkResources = new string[] { "*" },
        CacheResources = cacheResources
    };

    return manifestResult;
}
I initially tried adding 1000 PDF files to the cache, but this threw an error: Chrome failed to commit the new cache to the storage, because the quota would be exceeded.

After lowering the number of files several times, I hit the sweet spot. I could add 300 PDF files to the cache without breaking it.

Looking into chrome://appcache-internals/, I can see that the cache for this one single web application now takes up a whopping 2.2GB.


As a user, I had no idea that the website I was browsing was downloading a suspicious amount of data in the background. Neither Chrome (17.0.963.83) nor any other desktop browser that I know of warned me. I would expect the browser to ask for my permission when a website wants to download and store such an excessive amount of data on my machine.

Something else I noticed is that other sites now fail to commit anything to the application cache because the browser-wide quota has been exceeded. I'm pretty sure this 'first browsed, first reserved' approach will be a source of frustration in the future.
To handle this scenario we could use the applicationCache API to listen for quota errors, and inform the user to browse to chrome://appcache-internals/ and remove other caches in favor of the new one. This feels sketchy though; shouldn't the browser intervene in a more elegant way here?
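
A minimal sketch of that idea follows. It assumes the error event exposes a `reason` property that is set to "quota" when storage runs out; browsers that only fire a plain error event offer no way to tell a quota failure apart from any other failure. The names `watchForQuotaErrors` and `notify` are made up for this example.

```javascript
// Sketch only: `watchForQuotaErrors` and `notify` are hypothetical names.
// `appCache` is expected to behave like window.applicationCache.
function watchForQuotaErrors(appCache, notify) {
  function onError(event) {
    // Assumption: the error event exposes `reason`, with the value "quota"
    // when the cache could not be stored. Without it we can only report
    // a generic failure.
    var reason = event && event.reason;

    if (reason === 'quota') {
      notify('This site could not be stored for offline use because the ' +
             'browser cache is full. In Chrome, open ' +
             'chrome://appcache-internals/ to remove other caches.');
    } else {
      notify('This site could not be stored for offline use.');
    }
  }

  appCache.addEventListener('error', onError, false);

  // Return a function that stops listening again.
  return function stop() {
    appCache.removeEventListener('error', onError, false);
  };
}

// Only wire it up where the API exists.
if (typeof window !== 'undefined' && window.applicationCache) {
  watchForQuotaErrors(window.applicationCache, function (message) {
    window.alert(message);
  });
}
```

Passing the cache object in as a parameter keeps the handler easy to try out with a stand-in object, since the real API is only available inside a browser.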


What are your thoughts? What would you want your browser to do in these scenarios?

Saturday, March 17, 2012

Sent from my phone

According to Matt from 37signals the line "Sent from my iPhone" at the bottom of an email means this:
Let’s be honest. “Sent from my iPhone” really means “I’m not going to bother to proofread and correct this because it would take me an extra 30 seconds.”
I agree. I too use this line as an excuse to write a terse message and omit proper salutations.

However, I also think these four simple words greatly helped the viral growth of the mobile phone. Having early adopters brag about how they're sending emails from their fancy new phone must have been an invaluable form of word-of-mouth advertising.

I think it's safe to say that those four words are first a marketing technique, and only then a convenient way of letting your correspondents know that there might be a typo in your message. If not, the default line wouldn't include the brand name.

I'm starting to ramble, though. What I really wanted to share is a snippet from the much-recommended book The Lean Startup, in which Eric Ries tells the story of how Hotmail used a similar technique in 1996 to fuel its viral engine of growth.
But everything changed when they made one small tweak to the product. They added to the bottom of every e-mail the message "P.S. Get your free e-mail at Hotmail" along with a clickable link.
Within weeks, that small product change produced massive results. Within six months, Bhatia and Smith had signed up more than 1 million new customers. Five weeks later, they hit the 2 million mark. Eighteen months after launching the service, with 12 million subscribers, they sold the company to Microsoft for $400 million.
I find it incredibly intriguing how a few words can have such an impact. Maybe you do too.

Wednesday, March 14, 2012

HTML5 Offline Web applications as an afterthought in ASP.NET MVC

Recently I prototyped a mobile web application using ASP.NET MVC, jQuery Mobile and some HTML5 features. One of the key goals was to find out how far you can push a web 'application' until the browser starts getting in the way. Working disconnected is one of these things that appear to be a major showstopper at first.

However - to my surprise, honestly - the HTML5 Offline Web applications API seems to be widely implemented across modern browsers already. Not all of them, though. Looking into the specifics, the API itself is fairly straightforward. At its core, you will find the manifest file, which dictates which files should be cached by the browser. The API provides other useful events and methods for inspecting the status of the cache and swapping the cache for a newer version, but they are out of scope today. A useful resource to read up on the full API can be found here, and a working example implementation can be found here.

The manifest file

Back to the manifest file. A manifest file could look like this.
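
A minimal sketch, with hypothetical file names and URLs that are not taken from the actual application:

```text
CACHE MANIFEST
# v1 (comment lines start with a number sign)

CACHE:
/Content/site.css
/Scripts/app.js

NETWORK:
/api/

FALLBACK:
/ /offline.html
```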


The first line in the file should say CACHE MANIFEST. If you want to write comments, you should prefix the lines with a number sign.

In the CACHE section you declare which files should be cached. An important and interesting note is that these files will be served from the cache, even if you're online.

In the NETWORK section you declare which resources the browser should always try to download from the server and never serve from the cache. As a consequence, these resources are unavailable when the user is offline.

In the last section, the FALLBACK section, you can define fallback resources to be used when the user is offline.

Serving and generating the manifest file

Now that we've got all this theory out of the way, let's look at generating and serving the manifest file using ASP.NET MVC.

I started by adding a ResourcesController with one action named Manifest.
public class ResourcesController : Controller
{             
    public ActionResult Manifest() { }
}
This action should serve a text file, using a specific cache-manifest MIME type. To accommodate this I created a new action result, which inherits from the FileResult class and sets the content type.
public class ManifestResult : FileResult
{
    public ManifestResult(string version)
        : base("text/cache-manifest") { }    
}
I also made this same class (for the sake of example) responsible for formatting and writing the manifest file to the output stream. That's why I added a few extra properties to the manifest result, one for each section and one for versioning. Versioning the file comes in handy when you want to expire the cache, because it only expires when the manifest file changes.
public class ManifestResult : FileResult
{
    public ManifestResult(string version)
        : base("text/cache-manifest")
    {
        CacheResources = new List<string>();
        NetworkResources = new List<string>();
        FallbackResources = new Dictionary<string, string>();
        Version = version;
    }

    public string Version { get; set; }

    public IEnumerable<string> CacheResources { get; set; }

    public IEnumerable<string> NetworkResources { get; set; }       

    public Dictionary<string, string> FallbackResources { get; set; }        
}
To write the file to the output stream, I had to override the WriteFile method.
protected override void WriteFile(HttpResponseBase response)
{
    WriteManifestHeader(response);            
    WriteCacheResources(response);
    WriteNetwork(response);
    WriteFallback(response);
}

private void WriteManifestHeader(HttpResponseBase response)
{
    response.Output.WriteLine("CACHE MANIFEST");
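    // Parentheses matter here: + binds tighter than ??, so the null check has to be grouped explicitly
    response.Output.WriteLine("#V" + (Version ?? string.Empty));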
}

private void WriteCacheResources(HttpResponseBase response)
{
    response.Output.WriteLine("CACHE:");           
    foreach (var cacheResource in CacheResources)
        response.Output.WriteLine(cacheResource);
}

private void WriteNetwork(HttpResponseBase response)
{
    response.Output.WriteLine();
    response.Output.WriteLine("NETWORK:");            
    foreach (var networkResource in NetworkResources)
        response.Output.WriteLine(networkResource);
}

private void WriteFallback(HttpResponseBase response)
{
    response.Output.WriteLine();
    response.Output.WriteLine("FALLBACK:");
    foreach (var fallbackResource in FallbackResources)
        response.Output.WriteLine(fallbackResource.Key + " " + fallbackResource.Value);
}
In the CACHE section I wanted to include all my static resources, meaning the contents of the Scripts and Content folders. To do this in a simple and low-maintenance fashion I introduced the GetRelativePathsToRoot method. This method takes the path of a virtual folder, recursively scans its content and returns a list of relative paths for each file.
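// Requires: using System.IO; using System.Linq;
private IEnumerable<string> GetRelativePathsToRoot(string virtualPath)
{
    var physicalPath = Server.MapPath(virtualPath);
    var absolutePaths = Directory.GetFiles(physicalPath, "*.*", SearchOption.AllDirectories);

    // Strip the physical root and switch to forward slashes so each entry is a proper URL path
    return absolutePaths.Select(
        x => Url.Content(virtualPath + x.Replace(physicalPath, "").Replace('\\', '/'))
    );
}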
For the Content folder, the result could look something like this.


To add pages to the CACHE section, I used the Url.Action method.

For the NETWORK resources, I added an asterisk, which basically means that any resource not listed in the manifest may still be fetched from the network when the user is online. I didn't specify any fallback resources in this example.
public ActionResult Manifest()
{
    var pages = new List<string>();
    pages.Add(Url.Action("SomeAction", "ControllerName"));    

    var scriptsPaths = GetRelativePathsToRoot("~/Scripts/");
    var contentPaths = GetRelativePathsToRoot("~/Content/");

    var cacheResources = new List<string>();
    cacheResources.AddRange(pages);
    cacheResources.AddRange(contentPaths);
    cacheResources.AddRange(scriptsPaths);
    
    var manifestResult = new ManifestResult("1.0")
    {
        NetworkResources = new string[] { "*" },
        CacheResources = cacheResources
    };            

    return manifestResult;
}
Setting up a route and including the manifest

Now that we are able to generate and serve a manifest file, we should set up a specific route for the manifest file; some browsers aren't very forgiving and expect it to have a specific name and location: /cache.manifest.
routes.MapRoute("cache.manifest", "cache.manifest", new { controller = "Resources", action = "Manifest" });
The last step I had to take was to include a reference to the manifest file in the html element.
<html manifest="@Url.RouteUrl("cache.manifest")">
Poor man's testing

To verify whether all of this works, you can look at the console of the Chrome developer tools. You should see something like this.


That console logging has proven to be extremely useful when debugging the manifest file.

You could also just browse to the manifest file to inspect its content. Don't mind this screenshot too much, obviously there's plenty of cleaning up to do in my Scripts folder.
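
As a rough illustration of the format the WriteFile method above produces, the response for the Manifest action might look something like the following. The file names are hypothetical placeholders, and the paths assume the application runs at the site root with the default route.

```text
CACHE MANIFEST
#V1.0
CACHE:
/ControllerName/SomeAction
/Content/Site.css
/Scripts/app.js

NETWORK:
*

FALLBACK:
```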


Summary

In this post I showed you a technique I came up with to take advantage of ASP.NET MVC to easily generate, maintain and serve an HTML5 Offline Web application manifest file:
  • Create a controller and action that can serve the file
  • Create a new action result, which returns the correct MIME type and formats the file
  • Set up a specific route
  • Include a reference to the manifest in the html tag

Remember, this is a proof of concept; it's not perfect. I look forward to any feedback you might have!

Sunday, March 11, 2012

Learning: the Hacker Way

I have had a fair number of discussions on continuous learning and knowledge sharing over the past few days. It became rather obvious that a lot of us have developed our own techniques, but also that most of us are probably still in search of more efficient ones. Having gone through several phases myself, I would like to share my current way of learning: the Hacker Way.

Here are some snippets taken from a recent letter from Mark Zuckerberg addressed to the Facebook shareholders.
Hacking is an inherently hands-on and active discipline. Instead of debating for days whether a new idea is possible or what the best way to build something is, hackers would rather just prototype something and see what works. There's a hacker mantra that you'll hear a lot around Facebook offices: Code wins arguments.
I like to believe learning should be a hands-on activity as well. Basically, stop consuming, start producing. Don't get me wrong, I do think there is value in reading blog posts (I might be slightly biased on this one), reading books and watching videos, but I find that this value is marginal compared to what you gain by actually doing it.

I remember I wanted to step up my JavaScript game two years ago, and ordered a book. A few months after finishing the book, I finally had the chance to implement something in JavaScript, but I couldn't. I remembered some syntax, and most concepts, but I couldn't do it. Eventually I really learned JavaScript by doing it on the job, having to constantly fall back on the book and Stack Overflow, making costly mistakes along the way.

It's not a terrible idea to pick up a new technology on the job, but it very much depends on the environment. If you're in the consultancy business, there often is no room to pick up a brand new technology on the job. Customers expect you to know what you're doing.
Imagine you really want to get into mobile development, and you were able to get in the race for a new mobile project. Getting interviewed by the customer, you have to admit that you haven't built something for mobile yet, but you did read a book and you find mobile development really interesting. It's really hard to sell that. If you are able to show something you made, and talk about things that work and things that don't work in the real world, the customer can get a far better feel for whether you would be a good fit for the project.
The Hacker Way is an approach to building that involves continuous improvement and iteration. Hackers believe that something can always be better, and that nothing is ever complete. 
Hacker culture is also extremely open and meritocratic. Hackers believe that the best idea and implementation should always win - not the person who is best at lobbying for an idea or the person who manages the most people. 
This is where it becomes interesting and where you can really make the difference as a company.

Depending on various parameters, you often don't have the room to experiment on the day job, but if you're building something by yourself or in a small team without these constraints, you can go wild: experiment, innovate and fail often. Not only do you build experience faster, but you also (maybe indirectly) challenge existing practices and build a deeper understanding of the methodologies and technologies you use. Also, having the freedom to innovate, aspiring to get closer to the Silver Bullet, can bring you, the company and the industry to the next level.

This is just the first step of the cycle though. The next step should be to get these little projects and acquired experiences out there. Share them with your peers and the community, educate those who want to listen and ask for feedback from those who care, hopefully fueling the inspiration of others.

A welcome side-effect might be that the results could be used to extend your personal and your company's portfolio. This can help you prove that you can do more than just talk the talk; talk is cheap, after all.

I want to know from you what the flaws are in my way of thinking. Is the described technique naive, too idealistic or unrealistic?
Which technique has proven to work best for you?