This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Tuesday, August 15, 2006

Tutorial: How to Profile and Optimize Ajax Applications

A few folks have asked me recently how to profile an Ajax application and improve its performance, so I thought I would detail some specific and general strategies in a tutorial here.

I spent the last two weeks doing exactly this for the HyperScope project. HyperScope was suffering from very slow page load times, and some of its operations were slow as well. I decided to launch into a profiling and performance session to improve it.

Premature Optimization is the Root of All Evil...

The first thing to know about profiling happens far, far before you do an optimization session, at the beginning of your project; it is one of the most challenging phases because it requires you to balance two different approaches that people struggle with.

The first rule is to avoid premature optimization at all costs. I have literally been at companies that I've seen destroyed by premature optimizations. I can't name names, but just know that doing premature optimization can massively destroy your design and create such a complicated system that when you actually come to your performance and profiling phase, you don't know where to start. The most important thing is to build the right system. Don't take short cuts based on what you believe will affect its performance; build a system that is maintainable, that satisfies getting your functionality out the door, that satisfies getting your system built quickly, whatever your important things are, but don't do premature optimization.

...But Order of Magnitude is Important

On the other hand, it's very important to do what I call Order of Magnitude calculation, which is close in spirit to Big O analysis in more formal computer science. These are the decisions that will affect performance by an order of magnitude: the difference between 100 milliseconds and 1,000 milliseconds, or between a couple of seconds and 10, 20, or 30 seconds. These decisions are tricky, and you should make them early on.
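To make "order of magnitude" concrete, here is a minimal sketch of the arithmetic. The function name ordersOfMagnitudeApart is hypothetical, and the timings in the comment are just the illustrative figures from the paragraph above, not measurements.

```javascript
// How many powers of ten separate two timings (both in the same unit).
// Hypothetical helper for illustration only.
function ordersOfMagnitudeApart(fasterMs, slowerMs) {
  if (fasterMs <= 0 || slowerMs <= 0) {
    throw new Error("timings must be positive");
  }
  return Math.abs(Math.log10(slowerMs / fasterMs));
}

// Example: 100 ms versus 1,000 ms is about one order of magnitude apart.
// ordersOfMagnitudeApart(100, 1000)
```

A difference well below 1 on this scale is usually not worth an architectural decision; a difference of 1 or more usually is.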

In general, what I do is build simple prototypes that each exhibit the major characteristics of a candidate architecture. I see how these prototypes perform in an order of magnitude way before I decide on my architecture. Some of this is just intuition and experience, but any area of the system that you think is high risk for performance problems should get a simple prototype before you commit to a design. You shouldn't go crazy about this.

One of the major decisions I had to make for the HyperScope project was the kind of architecture I was going to use. One of my options was to create a sort of DOM walker architecture where I implemented all of Engelbart's functionality by walking over a DOM tree. My other alternative was to use XSLT on the client side to render an XML data structure according to the addressing and viewspec options in force, then blit the HTML results to the screen. The third alternative was to somehow use HTML in conjunction with XSLT and XPath. Each of these architectures has its strengths and weaknesses, but I wanted to understand their order of magnitude performance characteristics before I chose one. I prototyped all three, and I found that the DOM walker architecture performed one or two orders of magnitude worse than the other two. I ended up settling on the XSLT architecture because it was faster, handled some important edge conditions, and worked across Internet Explorer better than the HTML + XSLT/XPath architecture.

Scalability vs. Performance

A quick note. Many programmers confuse scalability with performance. Scalability has to do with how your performance changes as you add more of something, such as more users. Your performance can be very poor even though you have linear scalability, and this is where people get confused. For example, Enterprise JavaBeans promised linear scalability; what they didn't say was that the overall performance was pretty lousy to start with in most cases. It's important to separate these two terms. This entire discussion is about end-user performance, not about scalability or how an application changes as you add users. That's a whole other topic.

The Profiling Infrastructure, or, The Scientific Mind

After you've designed your architecture and built your system, you might find that early builds of your system are slow. At a certain point, you can decide to continue building all of your functionality or you can decide to do a profiling session.

At this point, the most important thing is to approach the task like a scientist. You want to approach this with a fresh mind. Like a scientist, you need to collect data. You need to create a profiling infrastructure.

So your first step is creating a profiling infrastructure, which basically means you need a way to calculate your total page load time and the time it takes to run important functions. The total should include fetching all of your resources, executing all of your page load functionality, and doing your actions, and you should also have a way to capture the time spent in individual segments. It's important to differentiate optimizing page load time from optimizing some operation after the page has already loaded; you should measure these two things separately.

Let's focus on page load time first. You need to create a profiling infrastructure that can capture the instant your page starts to load, the amount of time it takes to pull all of its resources, such as Javascript and so on, and finally whatever operations occur on page load.

In HyperScope, since I use the Dojo project, I use the dojo.profile library. The one tricky thing is that dojo.profile can't capture the total page load time, because obviously it can't start up before Dojo is loaded. If you crack open HyperScope, you will see that at the very, very top, before everything else, there is a script block that captures the initial start time with a new Date().getTime() call. I save this value on the window object as a variable, such as window.docStartTime. Later on, after everything has finished, I capture the end time and subtract the start time from it, which gives me the total document time.
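A minimal sketch of that start/end bookkeeping follows. It is not HyperScope's actual code: only the docStartTime name and the new Date().getTime() call come from the description above, while markStart, totalLoadTime, and the injected clock are hypothetical names chosen so the logic can also run outside a browser.

```javascript
// Record the earliest timestamp we can get. In a browser this would run in
// a script block at the very top of the page, before any library loads,
// with globalObj being window.
function markStart(globalObj, now) {
  globalObj.docStartTime = now();
}

// Call this once all page-load work has finished. Returns elapsed
// milliseconds since markStart ran.
function totalLoadTime(globalObj, now) {
  return now() - globalObj.docStartTime;
}

// A clock based on the same call the post uses.
function wallClock() {
  return new Date().getTime();
}

// Assumed browser wiring:
//   markStart(window, wallClock);                 // first script on the page
//   ...
//   var elapsed = totalLoadTime(window, wallClock); // after page-load work
```

Passing the clock in as a function is only a convenience for testing; in a page you could call new Date().getTime() directly.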

Something else that is important is having a way to toggle your profiling infrastructure on and off without editing your source code. Again in HyperScope, I added a URL flag that you can put in the browser location bar, such as ?profiling=true. If that flag is on, then I will dump out the profiling results. This is important because you need to be able to switch profiling on and off without editing your code, especially on a production server where you want to see how the network and its particular characteristics affect performance.
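One way such a flag could be checked is sketched below. The profiling=true parameter comes from the post; the function name isProfilingEnabled and the idea of passing in the query string (in a browser, window.location.search) are assumptions for illustration.

```javascript
// Returns true when the query string contains profiling=true.
// "search" is the part of the URL from the "?" onward, for example
// window.location.search in a browser.
function isProfilingEnabled(search) {
  var query = search.replace(/^\?/, "");
  if (query === "") {
    return false;
  }
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var parts = pairs[i].split("=");
    if (parts[0] === "profiling" && parts[1] === "true") {
      return true;
    }
  }
  return false;
}

// Assumed browser usage:
//   if (isProfilingEnabled(window.location.search)) { /* dump results */ }
```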

Now that you've got a profiling infrastructure you can start collecting data like a scientist.

Collecting Data

There are a couple of important things about collecting data. The first thing you need to know is you should always prime the pump. That's basically saying you should always take many readings of your data, such as five or ten readings, and you should throw the first three away. This is called priming the pump because you never know how different parts of the infrastructure may affect initial performance; for example, if you are hitting a SQL database, the first call might cache the SQL results, so subsequent calls are faster, or some of the information from the disk might stay in memory in a memory cache.
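Here is a minimal sketch of priming the pump on a set of readings: drop the warm-up runs, then summarize the rest. The function name summarizeReadings and the sample numbers in the comment are hypothetical, not real HyperScope measurements.

```javascript
// Discard the first warmupCount readings and summarize what is left.
// readings is an array of timings in milliseconds, in the order taken.
function summarizeReadings(readings, warmupCount) {
  if (readings.length <= warmupCount) {
    throw new Error("need more readings than warm-up runs");
  }
  var kept = readings.slice(warmupCount);
  var sum = 0;
  var min = kept[0];
  var max = kept[0];
  for (var i = 0; i < kept.length; i++) {
    sum += kept[i];
    if (kept[i] < min) { min = kept[i]; }
    if (kept[i] > max) { max = kept[i]; }
  }
  return { kept: kept, mean: sum / kept.length, min: min, max: max };
}

// Example with made-up numbers: six readings, first three thrown away.
//   summarizeReadings([900, 700, 650, 400, 420, 380], 3)
```

Looking at min and max alongside the mean helps you notice when the remaining readings are still unstable and you need more runs.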

Browser Caches, Local Proxies, and Remote Servers

Once you've primed the pump, you need to differentiate two things in your test runs. You should measure the time when you clear the browser cache before each run. This simulates hitting a site for the very first time. You should also get readings for pulling from the browser cache. You'll get two different numbers, and both are important.

The other thing you should do is get readings against a local server, where you're running a web server on your own machine, so you can see the characteristics without the network getting in the way. You should also put the application up on a remote web server on a real network and see how that affects performance, because network latency will kill you. The Internet has terrible latency in general, even if the bandwidth of a particular connection is fast. Make sure the local and remote web servers are the same software; if production is Apache, run Apache locally, not Jetty or something else. Also try to use the same configuration options.

Finding Bottlenecks, or Hunt the Wumpus

Once you've got your data, in general what you'll find is you'll have one, two, or three bottlenecks that consume 30 to 60 percent of the total time. You'll start by attacking these bottlenecks.
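To spot those bottlenecks in your profiling data, it helps to rank each segment by its share of the total. The sketch below assumes you have already collected non-overlapping segment timings in milliseconds; the function name rankBottlenecks and the segment names in the comment are hypothetical.

```javascript
// Given an object mapping segment name -> milliseconds, return the segments
// sorted from slowest to fastest with each one's percentage of the total.
// Assumes the segments do not overlap in time.
function rankBottlenecks(timings) {
  var names = Object.keys(timings);
  var total = 0;
  for (var i = 0; i < names.length; i++) {
    total += timings[names[i]];
  }
  var ranked = names.map(function (name) {
    var ms = timings[name];
    var percent = total === 0 ? 0 : Math.round((ms / total) * 100);
    return { name: name, ms: ms, percent: percent };
  });
  ranked.sort(function (a, b) { return b.ms - a.ms; });
  return ranked;
}

// Example with made-up numbers:
//   rankBottlenecks({ applyXslt: 250, fetchResources: 600, insertHtml: 150 })
```

The first one or two entries in the result are where to start experimenting.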

Every time you attack a bottleneck, you will experiment with some proposed solution; you'll turn your profiling on and find that, for example, a different kind of DOM method or a different strategy actually gives you better or worse performance. The most important thing is to attack the low-hanging fruit, the bottlenecks, and not get lost in small, meaningless performance gains. Optimizing creates complexity in general, so only sacrifice simplicity if it buys a good performance gain. At a certain point, you'll hit the law of diminishing returns, where you have good enough start-up performance and good enough application performance, and it won't be worth adding more complexity.

Specific Ajax Performance Tips

Now that you know some general principles about profiling and about setting these things up, I want to share some specific things that tend to affect Ajax performance, especially start-up time.

One of them is very surprising. In the Ajax applications that I have profiled and tuned, a single bottleneck has accounted for 50 or 60% of the start-up time: fetching too many resources on page load. When you first load a page, the browser needs to fetch many different resources. It needs to grab a lot of Javascript files, image files, and perhaps an XSLT file, as in HyperScope, and the latency over the network will massively kill your application. This even affects loading resources from the browser cache, because the browser generally needs to ask the web server whether something has changed, and it gets back lots of 304 Not Modified responses that each still pay the latency cost.
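A back-of-the-envelope model shows why the number of requests matters so much. This is a deliberately crude sketch under stated assumptions: every request costs one full round trip, the browser runs a fixed number of requests in parallel, and download time is ignored. The function name and the numbers in the comment are hypothetical.

```javascript
// Rough latency cost of fetching resourceCount files when each request
// costs one round trip and the browser issues parallelConnections at once.
// Ignores bandwidth, server time, and anything else.
function estimateLatencyCost(resourceCount, roundTripMs, parallelConnections) {
  var rounds = Math.ceil(resourceCount / parallelConnections);
  return rounds * roundTripMs;
}

// Example with made-up numbers: 40 files, 100 ms round trip, 2 at a time,
// versus everything merged into 1 file.
//   estimateLatencyCost(40, 100, 2)
//   estimateLatencyCost(1, 100, 2)
```

Even in this simplified model, the cost grows with the number of files rather than with their size, which is why a cached page full of 304 responses can still feel slow.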

So, one of your big strategies should be to merge as many resources as you can into one file, and have the client grab just this file. Going back to HyperScope, I found this held true. I had a tremendous amount of Javascript that was application-oriented. I had third-party libraries such as Dojo and Sarissa. I had an XSLT file that rendered everything. I had Dojo Widgets and custom Dojo Widgets that had their own template files, and so on. What I did was I created a build system that would merge all of these things together into one file, inlining them together.

You want to make sure that you've got the ability to compile all of your resources together in an optimized state, into one big Javascript file, for example. But you also want to ensure that you've got some flag that lets you load all of your resources dynamically, without being merged together, because a build cycle can take a very long time and one of the strengths of Ajax and Javascript is the fast dev cycle. That's very important to keep in mind.

Just to give you an example, in HyperScope's build file, I am doing a whole range of tasks. I'm taking all of my HyperScope Javascript and all of the Dojo files that I need and merging them together into one Javascript file called all.js. Then, I take a third-party library, Sarissa, which I "Dojoified" so that it can be merged into all.js as well. I also inline my XSLT file into a Javascript variable, so again, it's not a separate file. I inline all of my Dojo Widgets, so that all of their code is there. I also use a regular expression in my Ant build file that changes all of my paths so that this will work during production time as well as during debug time.
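The core of the merge step can be sketched in a few lines. This is not the Ant build described above; it is a plain JavaScript illustration of the same idea, concatenating scripts and inlining an XSLT file as a string variable. The function name buildBundle, the shape of its arguments, and the variable name passed in are all hypothetical.

```javascript
// Merge several scripts into one bundle and inline an XSLT document as a
// Javascript string variable so it no longer needs its own request.
// scripts: array of { name: string, source: string }, in load order.
function buildBundle(scripts, xsltSource, xsltVarName) {
  var parts = scripts.map(function (script) {
    return "// ---- " + script.name + " ----\n" + script.source;
  });
  // JSON.stringify produces a quoted string literal with quotes and
  // newlines escaped, so the XSLT text survives being embedded in code.
  parts.push("var " + xsltVarName + " = " + JSON.stringify(xsltSource) + ";");
  // The ";" separator guards against files that omit a trailing semicolon.
  return parts.join("\n;\n");
}

// Assumed usage in a build script: read each file from disk, call
// buildBundle, and write the result out as all.js.
```

A real build would also handle compression and path rewriting; this only shows the merging and inlining.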

All of this together, massively, massively helps load time. I have found that this bottleneck affects other apps as well; it's the best place to start. At Rojo, I found very similar characteristics and created a similar system. So, this is something that you'll want to do.

Other things you can do are to use Dojo's Javascript compressor, create a custom Dojo profile, and make sure to turn on GZip compression on your web server.

Finally, if you are generating dynamic HTML from XSLT, or pulling it from a remote web server, and then inserting it into your document in one giant chunk, keep in mind that the size of this HTML will greatly affect how long it takes to put it into the document. Cleaning up this HTML and making it much smaller will improve performance. In HyperScope, I used to write out inline event handlers in the HTML I generated from my client-side XSLT, such as onmouseover, onmouseout, etc., on each row. Moving these out and creating a single event handler on the HTML's container that catches these events greatly improved performance, because the overall size of the HTML was reduced.
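The single-handler approach relies on events bubbling up from a row to its container. A minimal sketch of the lookup involved is below; it is not HyperScope's code, and the function name findRow, the isRow test, and the highlight call in the comment are hypothetical.

```javascript
// Walk up from the element that received the event until we reach a row
// (as decided by isRow) or hit the container. Returns the row or null.
function findRow(target, container, isRow) {
  var node = target;
  while (node && node !== container) {
    if (isRow(node)) {
      return node;
    }
    node = node.parentNode;
  }
  return null;
}

// Assumed browser wiring, with one handler on the container instead of
// one per row:
//   container.onmouseover = function (evt) {
//     evt = evt || window.event;
//     var target = evt.target || evt.srcElement;
//     var row = findRow(target, container, function (n) {
//       return n.className === "row";
//     });
//     if (row) { highlight(row); }  // highlight is a hypothetical function
//   };
```

With this in place the generated rows carry no handler attributes at all, which is where the HTML size savings come from.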

Perceived Performance

Once you've hit the wall of actual performance improvements, but things are still too slow, you will need to work on perceived performance. There are lots of strategies here, but I'll use one from HyperScope again to show you. In HyperScope, if you're working with a large document, it can take a while to apply all of Engelbart's addressing options and viewspecs. In the past, I would wait for everything to be finished against one of these documents before blitting the results to the screen. At a certain point, I was running out of ways to make the actual performance better, so I started pushing the results to the screen as I got them for each row, before the whole document was finished, using a JavaScript setInterval that would run every few milliseconds. The total amount of time is roughly the same, but the perceived speed is dramatically faster. This is for two reasons: first, the browser doesn't 'lock up' while all the HTML is inserted into the document, thanks to the window.setInterval; second, the user sees what they need at the top of the screen earlier.
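The incremental approach can be sketched as follows. This is an illustration of the technique, not HyperScope's implementation: the function name renderIncrementally, the batch size, and the injected timers object (window in a browser) are assumptions.

```javascript
// Push rows to the screen a batch at a time so the browser can repaint
// between batches. appendRow is whatever inserts one row into the page.
// timers must provide setInterval and clearInterval (window does).
function renderIncrementally(rows, appendRow, batchSize, intervalMs, timers) {
  var next = 0;
  var id = timers.setInterval(function () {
    var end = Math.min(next + batchSize, rows.length);
    for (; next < end; next++) {
      appendRow(rows[next]);
    }
    if (next >= rows.length) {
      timers.clearInterval(id); // all rows are on screen, stop ticking
    }
  }, intervalMs);
  return id;
}

// Assumed browser usage, 20 rows every 10 ms:
//   renderIncrementally(rows, function (html) {
//     container.innerHTML += html;
//   }, 20, 10, window);
```

The batch size and interval are knobs to tune with your profiling data: batches that are too large bring back the lock-up, while batches that are too small add timer overhead.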

Knowing When to Stop

All of this work can pay off. For the HyperScope project, I was able to create an actual performance improvement of about 70%. In addition, perceived performance improved, as the user saw a document much quicker even though parts of the document were still being blitted to the screen in the background.

Knowing when to stop is the important, final point. With HyperScope, there were more bottlenecks that I could have attacked. However, we only have a few weeks of development left, and there is still more important functionality and QA that needs to be addressed. Once you've brought performance to a level that is solid, you should evaluate how this new performance fits into your timeline, other features, etc. Once you've hit your other important milestones you can return to profiling if it is needed and you have time.
