

The Myths of Network Latency


This article is also available as an episode of "The ROOT Cause" podcast series on iTunes.

There is a great deal of confusion surrounding the concept of Latency. This is not surprising, as it is really many different concepts discussed as if they were one. Latency affects every area of the enterprise, including networks, servers, disk systems, applications, databases and browsers. This article describes the different areas in which Latency occurs and how to differentiate between them. Such differentiation will improve the accuracy of all testing and troubleshooting, whether manual or automated.

The importance of measuring latency is becoming increasingly apparent to the IT industry. We see new products coming to market that claim to be able to monitor latency in various forms. Maybe they can, and maybe they only kinda can. With all the variables and distributed components involved in modern enterprise networks, it is far too easy to combine completely different issues into one metric. That drastically reduces the value of the metrics, or worse, sends you off on a wild goose chase. Tools are only tools and, as in any other situation, they are only as good as the professional using them. Their output needs to be analyzed with an eye on the big picture. Their implementation needs to be well thought out and spot-on correct.

Many methods exist for measuring and calculating these metrics; they will be covered in future articles. Here we focus only on breaking out the different areas and types of latency that affect performance.

NETWORK LATENCY: Everyone loves to blame the network, especially with regard to latency. Bad LAN or WAN design can cause all sorts of issues. However, at the time of this writing in 2008, those issues tend to be more of a "Go-No-Go" type of problem. Network designs will block or allow communication, true, but they seldom slow it down anymore (although there are exceptions). If it is too slow, distance is usually the cause. But don't blame a 300 millisecond Ping Time between Europe and Asia on a bad WAN. Distance matters. "Ye cannae change the laws of physics, Captain."
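A quick back-of-the-envelope calculation shows how much of that Ping Time is just physics. The figures below are illustrative assumptions, not measurements: light in fiber propagates at roughly two-thirds of its vacuum speed, and the path is taken to be about 9,000 km one way (real routes are longer, and routing and queuing add delay on top):

```python
# Rough lower bound on round-trip time imposed by distance alone.
# Assumptions (illustrative): ~9,000 km of fiber one way, and light
# in fiber propagating at roughly 2/3 the speed of light in vacuum.
C_VACUUM_KM_S = 300_000          # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3             # typical propagation factor in fiber
DISTANCE_KM = 9_000              # assumed one-way path length

one_way_s = DISTANCE_KM / (C_VACUUM_KM_S * FIBER_FACTOR)
rtt_ms = 2 * one_way_s * 1000
print(f"Best-case RTT from distance alone: {rtt_ms:.0f} ms")
# → Best-case RTT from distance alone: 90 ms
```

Even under those generous assumptions, 90 ms of the round trip is irreducible. Add real-world routing, longer cable paths and queuing, and a 300 ms Ping over that distance is not a broken WAN; it is geography.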

The first step is to break out the various areas that can be a part of Network Latency.

Round Trip Response Time (RTT): Also known as Network Latency, RTT is determined by TCP at the beginning of the connection setup (the Three-Way Handshake). Since there is minimal overhead at this point, the time this takes should represent the true transport time. However, there are common designs that will change this.
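As a minimal sketch of the idea, the handshake can be timed directly: a plain TCP connect() returns once the Three-Way Handshake completes, so with no application payload involved the elapsed time approximates one RTT. The host and port below are placeholders, not a recommended target:

```python
import socket
import time

def handshake_rtt_ms(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Approximate network RTT by timing the TCP Three-Way Handshake.

    connect() returns once SYN -> SYN/ACK -> ACK completes; because no
    application data is exchanged, the elapsed time is close to one RTT.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000

# Example (hypothetical host):
# print(f"handshake RTT: {handshake_rtt_ms('example.com'):.1f} ms")
```

A single sample is noisy; in practice you would take several connects and use the minimum or median. And as the next two sections note, the value you get is only end-to-end to whatever answers the SYN.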

Proxy Servers: Do you have a Proxy Server or WAN Optimizer near your client? If so, the RTT for that TCP connection is not end-to-end to the Server Side, it is only end-to-end to that Proxy or WAN Optimizer. Sure, you can account for this in your calculations—but only if you have RTT from that Proxy to the Server side. It is very doable, but requires planning.

Multi-Tier Design: What about the Network Latency between Tier One and Tier Two, or Tier Three? If they are all in the same data center there may be no significant latency, assuming the design is correct in that data center. (Of course, if you are troubleshooting, that isn't a wise assumption.) However, you will see Tiers in different locations. In such cases, this aspect of latency is important.

SERVER LATENCY: Memory, Disk System, CPU, design and usage all have a significant impact on how quickly the servers themselves can process requests, or make them. This metric must be separated from the other latency metrics to properly diagnose bottlenecks. In my article series titled "Baselining-StressTesting-PerformanceTesting-OhMy!" I introduced the topic of BESTlining (as opposed to Baselining). In a nutshell, it is a way to measure aspects of Server Latency by removing as much Network Infrastructure from the picture as possible.
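One hedged sketch of that idea, assuming an HTTP service with some request path you can poll (the URLs below are hypothetical): time the same request from the server's own loopback interface and from a client across the network. The loopback figure approximates Server plus Application Latency with the network removed; the difference approximates the network's share.

```python
import time
import urllib.request

def time_request_ms(url: str, samples: int = 5) -> float:
    """Median wall-clock time for a simple GET, in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()                       # include transfer of the body
        times.append((time.perf_counter() - start) * 1000)
    return sorted(times)[len(times) // 2]     # median resists outliers

# Run once from the server itself (loopback: network removed) and once
# from a client across the network. Hypothetical URLs:
# local_ms  = time_request_ms("http://127.0.0.1:8080/health")
# remote_ms = time_request_ms("http://app.example.com/health")
# print(f"server-side: {local_ms:.1f} ms, network share: {remote_ms - local_ms:.1f} ms")
```

This is a crude instrument compared to a proper BESTlining exercise, but it illustrates the principle: measure with the Network Infrastructure out of the picture first, then add it back in.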

APPLICATION LATENCY: Application Latency is different from Server Latency, but it is often bundled into Server metrics because it is difficult to drill down deep enough to separate them. To better understand the difference, picture a WebSphere JVM's performance compared to that of the Operating System of the physical server on which it is hosted. Just looking at response time doesn't properly separate those issues. If you are just gathering performance metrics, this may not matter. However, if you are troubleshooting a problem, it matters a great deal. Application Latency is more frequently the source of problems than many organizations realize.

Another factor that can cloud this metric is Database Latency. Database Latency should not be lumped in with Application Latency. Database optimization is critical, but if an application is sending out sloppy calls, those calls will slow things down no matter how well the database itself is tuned.

Protocol usage is another area to explore when measuring or troubleshooting Application Latency. I will cover this in more detail in an upcoming article on Network Utilization. While it is often considered a network utilization issue, to a large degree it is the Application that controls how the protocols are used. For example, when an application uses many TCP connections and small packet sizes, it is usually a result of how the code was written. It may have been a non-issue when the components were near each other, but over a 70 millisecond or higher WAN link it can bring an application to its knees. To make matters worse, WAN Optimizers are far less successful in resolving this particular type of problem.
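The arithmetic behind that "brought to its knees" claim is worth spelling out. The request counts below are made-up illustrations, but the principle holds for any chatty protocol: every sequential request-and-wait exchange costs a full round trip, regardless of how small the payload is.

```python
# Illustrative arithmetic: why chatty protocols suffer on a high-latency link.
# Assumptions (made up for illustration): a 70 ms RTT WAN link, and an
# application that issues each small request only after the previous
# reply arrives (strictly sequential, no pipelining).
RTT_MS = 70
SEQUENTIAL_REQUESTS = 500   # e.g., one tiny query per row or per object

total_s = SEQUENTIAL_REQUESTS * RTT_MS / 1000
print(f"Network wait alone: {total_s:.1f} seconds")   # prints 35.0 seconds

# The same data fetched in a handful of batched calls:
BATCHED_REQUESTS = 5
batched_s = BATCHED_REQUESTS * RTT_MS / 1000
print(f"Batched: {batched_s:.2f} seconds")            # prints 0.35 seconds
```

On a LAN with a sub-millisecond RTT, those 500 round trips cost half a second and nobody notices. Move the tiers 70 ms apart and the same code spends 35 seconds doing nothing but waiting, which no WAN Optimizer can fully hide.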

DATABASE LATENCY: Database Latency is a frequent cause of trouble. Fragmentation, inadequate indexing and many other database design factors can slow down response time. Again, this is often lumped into either Server or Application Latency, but it should not be. It is a separate variable.

BROWSER and WORKSTATION LATENCY: If you are receiving many rows of tabular data, or many images, or large Java Applets, or anything else of this type, you are tasking your personal workstation and its browser. This is frequently the main culprit and should be looked at quickly when different locations, or different users within the same location, experience trouble that others do not. Additional factors include:

  • Spyware running on the PC (particularly with laptops that travel)
  • Disk fragmentation
  • Older workstations
  • Browser settings
  • Operating system versions and patch levels

SUMMARY: Latency is not monolithic, although it is often treated that way. Time invested in accurately measuring these various aspects of latency will save you hours, days, weeks or even months of work. Last but not least, please remember that ALL of this CAN be measured and brought together into an accurate picture. It isn't hard; it just requires the correct set of skills and some open-source software like Wireshark.

Related Topics:

The Myth of Network Utilization
The Myth of Automated Metrics
The Myths of Network Utilization & Automated Metrics--Combined
Plastic Lock Security
How IT Vendors Direct IT Best Practices


Copyright © 1999-2013 Barry Koplowitz