16 May 2013

Simulating Real World Latency during Automation

So why not use JMeter to run performance tests?

The main problem with JMeter is that it doesn't give a good assessment of front-end performance.  It can generate load and report response times for data retrieval, URI endpoints, page loads, and so on.

But... it's harder to see how long it takes an AJAX menu to pop up after login, or, for a "one page site" that loads all of its data asynchronously, to determine the per-browser performance hit of loading dynamic HTML5 elements.

For this reason, I came up with a process of simulating and reporting the latency during browser automation tests.

I approached this by:
  •     Recording the time between a submit/save/delete and the next rendered screen or alert
  •     Setting up a Latency Generator
  •     Configuring the Automation Framework with Jenkins to dynamically kick off latency, launch the automation, and capture the results to a report.

Recording Latency In The Tests

First I looked at how to capture the time it takes to perform an action.  I had hoped there was a Cucumber gem to do this... The ones I found really didn't help me much; they solved a different set of problems. I wanted to know how long it takes from the click of a button to the load of the next screen.

I realized I needed to write my own wrapper.

It's really very simple.  I set an instance variable to the current time, like:
@start = Time.now

This is run right before the action... for example:
@start = Time.now
@browser.div(:id=>"submit").click

Then I add a "wait until" in Watir (waitFor in Groovy/Geb) like this:
Watir::Wait.until { @browser.alert.exists? }

In the above example it's waiting for an alert window to appear.  If your next screen had a field or button instead, you'd do the wait until on that element being loaded.
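For example, if the next screen were identified by a (hypothetical) dashboard button, the wait might look like:
Watir::Wait.until { @browser.button(:id => "dashboard").exists? }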

Just after the Wait Until, I add the end of the timer:
@end = Time.now - @start

with an output to the display:
p "It took #{@end} seconds to load XYZ screen."

While it may not be accurate down to the millisecond, it is quite useful for judging latency and how it changes across different bandwidth connections.

I can see that I get a typical time of 0.83 seconds on some submit action.  Then I change the bandwidth, rerun, and see that I am now at 2 seconds on average.
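Putting those pieces together, here is a minimal sketch of a reusable wrapper (the method name and element IDs are my own, and it assumes @browser is the Watir browser created in the Before hook):

# Times an action: runs the block (the click plus the wait for the next screen)
# and reports how long it took.
def time_action(description)
  @start = Time.now
  yield
  @end = Time.now - @start
  p "It took #{@end} seconds to load #{description}."
  @end
end

# Example usage inside a step definition:
time_action("the confirmation alert") do
  @browser.div(:id => "submit").click
  Watir::Wait.until { @browser.alert.exists? }
end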

I have since modified this to now output the performance results to a CSV file.  More details on that in a later post.

Setting Up Netem

Second, I added in bandwidth throttling along with latency and packet loss.  This is handled via a Linux tool called Netem, which lets me generate latency, packet loss and bandwidth limits.  For example, you could discover how long it takes to go from clicking "submit" to the rendered "dashboard" on a DSL line, or for a visitor from Europe.  Or how long it takes from clicking save to getting the save confirmation for a user in Asia.

Netem requires Linux, so I chose to set up a dedicated Linux VM to run Netem... and set up a proxy on that VM, so that the browser could connect to it for the bandwidth simulation.

Setting Up the VM

I had to set up a Linux VM to act as a proxy. I used Squid as the proxy server; Squid defaults to port 3128.  There are many tutorials online on setting up Squid (like http://www.cyberciti.biz/tips/howto-rhel-centos-fedora-squid-installation-configuration.html).

Then I modified iptables (i.e. sudo vi /etc/sysconfig/iptables) and added an INPUT rule accepting traffic on port 3128.

Afterwards I restarted iptables.

To test it, I manually pointed the browser's proxy settings at my proxy host and port, then ramped the latency on the VM up to extreme values and verified that browser performance degraded.

Netem Commands

I took various target markets for the company I work with (Asia, Europe and US) and gathered some latency reports from the IT dept.

I also took some target customer bandwidth profiles (10Mbit, 768k, 128k, etc.)

Last, I got a rough idea of what we might see for packet loss in the real world.

Here are some examples:

US traffic simulation:

   tc qdisc add dev eth0 root netem delay 80ms 10ms

Asia traffic simulation:

   tc qdisc add dev eth0 root netem delay 160ms 70ms

Bandwidth throttling (two alternative approaches: one using tbf, one using cbq):

    tc qdisc add dev eth1 root handle 1:0 tbf rate 200kbit buffer 1600 limit 3000
    tc qdisc add dev eth1 root handle 1: cbq avpkt 1000 bandwidth 10Mbit

Simulate 768k down and 128k up:

    tc qdisc replace dev eth0 root handle 1:0 tbf rate 768kbit burst 2048 latency 100ms
    tc qdisc replace dev eth1 root handle 2:0 tbf rate 128kbit burst 2048 latency 100ms

Kill Netem:

To end any running Netem rules, I use:
tc qdisc del dev eth0 root

Automating Netem as part of Cucumber Automation

First, I made a change in the env.rb file within the features folder.  In that file, in the Before block, I added the :proxy branch shown below:

def environment
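  # ENVI defaults to 'proxy', so the proxy branch below runs unless a different ENVI is passed on the command line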
  (ENV['ENVI'] ||= 'proxy').downcase.to_sym
end

Before do  |scenario|
  p "Starting #{scenario}"
  if environment == :int
    @browser = Watir::Browser.new(:remote, :url=>"http://[my qa selenium grid server]:4444/wd/hub", :desired_capabilities=> browser_name)
    @browser.goto "http://[my integration test env]:8080"
  elsif environment == :local
    @browser = Watir::Browser.new browser_name
    @browser.goto "http://[my integration test env]:8080"
  elsif environment == :proxy
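    # Point Firefox at the Squid proxy on the Netem VM so its traffic picks up the simulated latency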
    profile = Selenium::WebDriver::Firefox::Profile.new
    proxy = Selenium::WebDriver::Proxy.new(:http => "[my centos VM running netem and squid goes here]:3128")
    profile.proxy = proxy
    driver = Selenium::WebDriver.for :firefox, :profile => profile
    @browser = Watir::Browser.new(driver)
    @browser.goto "http://[my integration test env]:8080"

  else
    @browser = Watir::Browser.new browser_name
    @browser.goto "http://[alternate test env]:8080"
  end
end

So now, when I run the command:

cucumber features/performance_test_Asia.feature ENVI=proxy

it will kick off Cucumber and launch Firefox, which will use the proxy we set up on the Linux VM.  That Linux VM, if Netem is generating the expected latency for Asia, will simulate real world response times.

Configuring Jenkins

Now that the automation works, we want to turn Netem latency on before a test and off after a test.  The best way to do this that I've found is to use Jenkins.

Jenkins configuration on the Netem VM

In my case I have that Linux VM with the proxy and Netem, and I put Jenkins on it.  I created jobs pertaining to Netem, like:
1. Start Netem with Latency for Asia (70ms-140ms)
2. Start Netem with Latency for US-Michigan (10ms-80ms)
3. Start Netem with Latency for Europe (70ms - 100ms)
4. Start Netem with 0.3 percent packet loss
5. Stop Netem

Each job just runs a shell command on that Linux VM, i.e. a standalone job that simply runs, for example:
sudo tc qdisc add dev eth0 root netem delay 100ms 70ms

The stop Netem job simply runs:
 sudo tc qdisc del dev eth0 root


In my case, I needed to use the SSH plugin for Jenkins. This allows me to run sudo commands. So although I'm not SSHing anywhere remote, the SSH plugin lets me authenticate on the same box with the account that has sudo privileges.  I have more on setting that up in a separate blog post.

At any rate, here's what you do next... go to a job, like "Start Netem with Latency for Asia...", and in the job details right-click the Build link and copy out the URL.  Do this for each job and keep the build URLs handy.

These URLs will look something like: 
http://[Your Jenkins]:8080/view/[Your Project]/job/Stop%20Netem/build?delay=0sec

Jenkins configuration on the Automation Environment

Step 1

Back at the main automation suite (hopefully you have Jenkins running jobs there; if not, you'll need to set Jenkins up), create a job.  This will be a freestanding job. All it will do is call a script to hit those URLs you copied.

How do you do that? Well, you could curl it if your automation Jenkins is on a Linux environment.  Or you could use wget... or script it in Ruby/Python/Groovy... In my case, I use JMeter: I simply have a JMeter script that hits that URL.
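If you'd rather not pull JMeter in just to fire one HTTP request, a few lines of Ruby will do the same thing. This is only a sketch built around the placeholder URL from earlier; swap in your real Jenkins host and job path, and note that a locked-down Jenkins may also require a user/API token:

require 'net/http'
require 'uri'

# Hit the Jenkins build URL that starts (or stops) Netem on the proxy VM.
# Replace the bracketed placeholders with your actual Jenkins server and project.
uri = URI.parse("http://[Your Jenkins]:8080/view/[Your Project]/job/Stop%20Netem/build?delay=0sec")
response = Net::HTTP.get_response(uri)
p "Jenkins responded with #{response.code}"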

So to recap:
In the automation environment, I have a parent Jenkins job. It starts Netem by calling a JMeter script that hits the appropriate Netem job URL on the Linux VM.  As long as this environment can talk to the Linux proxy server, this will work fine.

Step 2

Next I create another job in the automation Jenkins. This job will execute the Cucumber script.  It is a simple standalone job that runs the command line:
cucumber ENVI=proxy features/my_asia_performance.feature

Step 3

After creating that Cucumber job, I link it as a child of the job we made in Step 1.  In this way, Step 1 sends the command to turn on Netem on the Linux VM.  That job is then configured (in the job's configuration, as a post-build action) to launch another project when it finishes. That project will be the automation job we made in Step 2.

Step 4

After setting the Cucumber automation as a child project of the Netem job in Step 1, we now make the last job.  This will again be a standalone job; it will simply stop Netem.  It will be a child of the Cucumber job/project.

So create a new Jenkins job, and have it curl/wget or use a script to hit the URL on the Linux Jenkins that launches the Netem kill job.

Step 5

Edit the Cucumber job, so that after it finishes, it will launch a project - this being the Netem stop/kill job you created in Step 4.

All done.

To run it, simply run the first job in step 1.

That job will turn Netem on with the appropriate latency; when it finishes, it will automatically launch the browser automation job (which records the response times); and after that job finishes, it will launch the job that turns the Netem latency off.

Piping the Output Out to CSV

Most people don't want to read through log output to find response times.  I got a request to keep the data I was gathering in a common flat file, so I decided to go with CSV.

What I did was use Ruby's native CSV class to handle this:

  require 'csv'

  CSV.open("C:\\#{locale}_#{$TestTime}.csv", "ab") do |csv|
    csv << ["Save Action", "#{@CF_save_end}", Time.now]
  end
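Here @CF_save_end is just the elapsed time captured the same way as the timer described earlier; roughly, the step looks like this (the element IDs are illustrative, not from a real app):

  @start = Time.now
  @browser.button(:id => "save").click
  Watir::Wait.until { @browser.div(:id => "save_confirmation").exists? }
  @CF_save_end = Time.now - @start   # this value ends up in the CSV row above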

I'll go into more detail on this in my next blog post.