SaaS Performance Breaks: How Can Enterprises Protect Themselves?

eWEEK DATA POINTS: For many enterprises, every passing moment of a downed SaaS application translates into lost dollars and lost productivity. Here are some pointers on how to guard against that.


Software-as-a-service (SaaS)-based applications are now the lifeblood of most organizations, but they’re certainly not foolproof. Breaks in performance (speed, availability, reachability) are occurring more frequently for popular applications. The most recent widely publicized examples include Microsoft 365, Salesforce and NetSuite.

These breaks are due, in part, to the ever-expanding internet infrastructure, which naturally increases the surface area for performance glitches. Combined with the rapid adoption of SaaS applications worldwide, this can strain even the most robust and reputable service providers.

SaaS apps are increasingly used to support employee productivity in areas such as customer relationship management (CRM), salesforce automation, collaboration, and supply chain and inventory management. Enterprise users have become so reliant on SaaS apps that when these slow down or become unavailable altogether, key departments—and in some cases, an organization’s entire revenue-generating engine—go idle.

SaaS apps provide many benefits, including convenience and speed of implementation, but it is incumbent upon enterprise users to proactively insulate themselves from inevitable performance hiccups.

This eWEEK Data Points article, with industry information provided by Catchpoint CEO Mehdi Daoudi, presents six keys for doing this. Catchpoint provides digital experience monitoring on a unified platform.

Data Point No. 1: Prioritize Your Most Critical End-User (Employee) Locations

Large global enterprises with employees spread across numerous regional offices need to know where their most significant concentrations of employees are and monitor as physically close to them as possible. There are many performance-impacting elements standing between the SaaS service source (cloud) and employees. A geography-specific problem anywhere in this chain (CDNs, DNS providers, regional ISPs, local ISPs, transit networks, etc.) can degrade the employee experience.

The more closely SaaS performance can be monitored from multiple locations, the better. Users should be wary of the notion that monitoring from only two or three geographies within a large region (like the U.S.) is sufficient. In addition to providing a more granular view into localized performance, more measurements from more places allow IT teams to make comparisons that ultimately boost performance for all employees.

For example, performance should be relatively consistent for employees in roughly the same geographies (such as Boston and New York in the Northeast). For companies using regional CDNs to support SaaS application and data delivery, volatile performance ranges within the same general geography may signify that a contracted CDN is not optimally configured. These users then have an opportunity (and data-based proof) to highlight this to their CDN partners, who can make modifications to maximize performance more uniformly.
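The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from any monitoring product: the sample data, the function name flag_inconsistent_locations and the 50 percent tolerance are all assumptions chosen for the example.

```python
from statistics import median

# Hypothetical sample data: median page-load time in milliseconds per
# monitoring location, grouped by region. All values are illustrative.
REGION_SAMPLES = {
    "us-northeast": {"Boston": 820, "New York": 790, "Philadelphia": 1650},
    "us-west": {"Seattle": 900, "San Francisco": 870, "Los Angeles": 910},
}


def flag_inconsistent_locations(samples, tolerance=0.5):
    """Return (region, location, value, regional_median) tuples for every
    location whose load time exceeds its regional median by more than
    `tolerance` (0.5 means 50 percent). The tolerance is an assumption."""
    flagged = []
    for region, by_location in samples.items():
        regional_median = median(by_location.values())
        for location, value in by_location.items():
            if value > regional_median * (1 + tolerance):
                flagged.append((region, location, value, regional_median))
    return flagged


# Philadelphia is flagged: 1650 ms is more than 50 percent above the
# us-northeast median of 820 ms. No us-west location is flagged.
outliers = flag_inconsistent_locations(REGION_SAMPLES)
```

A list like this is the kind of data-based proof an enterprise could bring to a CDN partner, although a real check would use many measurements per location rather than a single number.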

If an organization has employees scattered around the world, then it must ensure that all employees have a first-class experience. Employees expect the same level of performance they’d get at headquarters, regardless of their actual location.

Data Point No. 2: Cover a Diverse Range of Network Vantage Points

The rise in BYOD means it’s no longer sufficient to monitor performance only for traditional desktop use. You also have to monitor from cellular and Wi-Fi networks; if you don’t, you could have a major telemetry blind spot.

Wireless networks are notorious for producing blackouts. A recent example was an expired Ericsson SSL certificate, which caused widespread data outages for O2 and SoftBank mobile services in December.

Cloud-based monitoring (measuring the round trip speed of packets traveling from the cloud to an online service and back again) has received a lot of attention recently, but relying solely on the cloud for monitoring is never a good idea, especially in the case of SaaS applications that are hosted on the same cloud providers.

That’s because tests run from the cloud to a cloud-located service enjoy some form of dedicated network connection as well as preferential data routing. Think of it like a VIP's cleared traffic route through a crowded city. This streamlined data path is far removed from that of an average employee, who receives content only after a long, circuitous route through ISPs, CDNs, wireless networks and various other pathways. In other words, if you are relying solely on cloud-based monitoring to measure the performance of a SaaS application, you are likely getting a skewed metric.

Data Point No. 3: Pay Close Attention to Micro-outages

Micro-outages are outages that are relatively short in duration (less than an hour) and/or only impact isolated user/employee segments. The ability to detect micro-outages improves as the number of monitoring nodes and network vantage points grows. Having this ability is crucial, because micro-outages are often the first sign that something is going wrong, providing a warning signal that a more widespread outage may be imminent.
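As a rough sketch of the duration half of that definition, the Python below groups consecutive failed probes from each location into outage windows and marks any window shorter than an hour as a micro-outage. The probe format, the function names and the sample data are hypothetical, and the sketch does not attempt the "isolated user segment" half of the definition.

```python
from datetime import datetime, timedelta

# Threshold taken from the definition above: less than an hour.
MICRO_OUTAGE_LIMIT = timedelta(hours=1)


def _window(location, start, end):
    """Build one outage record and classify it by duration."""
    return {"location": location, "start": start, "end": end,
            "is_micro": (end - start) < MICRO_OUTAGE_LIMIT}


def find_outages(probes):
    """Group consecutive failed probes per location into outage windows.

    `probes` is an iterable of (timestamp, location, ok) tuples. An outage
    starts at the first failed probe and ends at the next successful probe
    from the same location (or at the last failure if none follows).
    """
    outages = []
    open_windows = {}  # location -> (start, last failed probe)
    for ts, location, ok in sorted(probes):
        if not ok:
            start, _ = open_windows.get(location, (ts, ts))
            open_windows[location] = (start, ts)
        elif location in open_windows:
            start, _ = open_windows.pop(location)
            outages.append(_window(location, start, ts))
    # Outages still in progress when the data ends.
    for location, (start, last_failure) in open_windows.items():
        outages.append(_window(location, start, last_failure))
    return outages


# Illustrative data: probes every 10 minutes from two hypothetical nodes.
t0 = datetime(2019, 1, 1, 0, 0)
step = timedelta(minutes=10)
SAMPLE_PROBES = [
    (t0, "London", True), (t0 + step, "London", False),
    (t0 + 2 * step, "London", False), (t0 + 3 * step, "London", True),
    (t0, "Tokyo", True), (t0 + step, "Tokyo", True),
]
sample_outages = find_outages(SAMPLE_PROBES)
```

With the sample data, only the London node reports an outage, and it lasts 20 minutes. With a single monitoring node in Tokyo, that event would have gone unseen, which is why more vantage points improve detection.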

The most recent case in point was Facebook's global outage of more than 14 hours on March 13. While this widespread outage began at around 12:06 p.m. ET on Wednesday, it was actually preceded by a micro-outage that lasted from 12:02 a.m. to approximately 12:39 a.m. ET earlier that day.

It’s impossible to say whether the escalation from micro-outage to full-blown outage could have been avoided. But even if not, detecting the micro-outage would have provided a forewarning and given Facebook a head start to act on the issue and to tell customers proactively that it was aware of the problem and working to resolve it. Micro-outages in SaaS applications can give users a similarly valuable heads-up.

Additionally, micro-outages can be silent SLA killers. Increasing internet complexity means a green light on a SaaS service provider’s external dashboard is no longer a guarantee that employees are enjoying excellent performance. We know of companies that have collected millions in SLA non-compliance penalties from SaaS providers thanks to more comprehensive monitoring capabilities.
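To see how short outages can quietly add up to an SLA breach, consider the arithmetic in the sketch below. The 99.9 percent target, the 30-day period and the outage durations are assumptions made up for illustration; real SLA terms (measurement windows, exclusions, remedies) vary by contract.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Downtime budget, in minutes, implied by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


def measured_availability(outage_minutes, period_days=30):
    """Availability (percent) given a list of observed outage durations."""
    total_minutes = period_days * 24 * 60
    return 100 * (total_minutes - sum(outage_minutes)) / total_minutes


def sla_breached(outage_minutes, sla_percent, period_days=30):
    """True when the observed downtime exceeds the downtime budget."""
    return sum(outage_minutes) > allowed_downtime_minutes(sla_percent, period_days)


# A 99.9 percent target over 30 days allows roughly 43.2 minutes of downtime.
budget = allowed_downtime_minutes(99.9)

# Three hypothetical micro-outages of 20, 15 and 12 minutes total 47 minutes,
# which exceeds that budget even though no single outage came close to it.
breached = sla_breached([20, 15, 12], 99.9)
```

None of the three outages in the example would necessarily show up on a provider's status page, yet together they push measured availability below the target.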

Data Point No. 4: Reject Mean Time to Innocence (MTTI); Embrace Mean Time to Repair (MTTR)

For many enterprises, every passing moment of a downed SaaS application translates into lost dollars and lost productivity. During this time, many organizations fall prey to human nature by focusing on mean time to innocence, or MTTI. In this scenario, the parties involved (for example, the IT infrastructure team, the network team, the SaaS service provider and the IT administrative team) are more focused on proving that their own domain isn’t the source of the problem. Of course, what really matters is a culture focused on mean time to repair (MTTR): swiftly identifying and repairing the actual problem.

Any SaaS-based enterprise must marry performance monitoring with advanced diagnostics that show precisely where the problem lies—whether with the SaaS provider, the enterprise’s own data center or something in between. The ability to precisely pinpoint the source of an issue (versus weeding out all potential sources through a lengthy process of elimination) enables the appropriate personnel to quickly work on a fix.

If the source of the problem lies with the SaaS provider, the enterprise user can flag the problem while possessing the data to back it up. If the source lies within the enterprise, IT teams can immediately start resolving it. Even if the source lies somewhere out in the “internet wild”—a local ISP, a slow CDN, or something outside both the provider’s and enterprise’s respective zones of control—at least the user can proactively communicate to impacted employees.
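The three cases above can be expressed as a simple triage rule over probe results. The sketch below is a deliberately naive heuristic, offered only to make the reasoning concrete; the FaultDomain names, the triage function and the idea of one internal probe plus several external networks are assumptions, not a description of how any particular diagnostics tool works.

```python
from enum import Enum


class FaultDomain(Enum):
    NONE = "no fault detected"
    PROVIDER = "SaaS provider"
    ENTERPRISE = "enterprise network or data center"
    IN_BETWEEN = "in between (ISP, CDN, transit)"


def triage(internal_ok, external_ok_by_network):
    """Guess the likely fault domain from one round of probes.

    internal_ok: result of a probe run from inside the enterprise network.
    external_ok_by_network: dict mapping a hypothetical external network
    name (ISP, mobile carrier, backbone) to that probe's result.
    """
    if not external_ok_by_network:
        raise ValueError("need at least one external vantage point")
    external = list(external_ok_by_network.values())
    if not any(external):
        # The service fails from every outside network.
        return FaultDomain.PROVIDER
    if all(external):
        # Reachable from everywhere outside; only the inside view differs.
        return FaultDomain.NONE if internal_ok else FaultDomain.ENTERPRISE
    # Some outside networks succeed and others fail.
    return FaultDomain.IN_BETWEEN


# Example: the service is reachable from two external networks but not
# from inside the office, which points at the enterprise's own domain.
likely = triage(False, {"isp-a": True, "mobile-b": True})
```

A rule this simple will misjudge some real incidents, but it shows why multiple vantage points matter: with only an internal probe, the three cases are indistinguishable.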

Data Point No. 5: Explore Performance Optimization Opportunities Behind Your Own Firewall

SaaS-based enterprises often have more available techniques than they realize to optimize performance through slight adjustments made behind their own firewall. One example is SaaS configurations—the administrative tasks IT teams must handle to get the SaaS application up and running, including employee onboarding, creating group memberships, delegating access privileges and more. We have seen examples of SaaS performance improving tremendously based on how the service is configured for a particular employee location.

In addition to optimizing configurations, there are other data center-based techniques, including network overlay services (controlling and accelerating connectivity to SaaS applications), data reduction (identifying and eliminating repetitive transmission of duplicate data), traffic shaping (classifying SaaS applications and data to prioritize the most critical traffic) and more.

Data Point No. 6: Always Trust Yourself the Most

Most SaaS service providers should be commended for delivering overall excellent levels of performance, especially as their workloads increase at staggering rates. However, the lesson for organizations is to always trust yourself the most, even when working with highly reputable, established SaaS providers. Having a bird’s-eye view into your employees’ experiences, from close proximity and multiple network perspectives, combined with advanced diagnostics and proactive internal data center tweaks, is the foundation for a SaaS monitoring strategy that protects users and keeps their employees productive.

If you have a suggestion for an eWEEK Data Points article, email cpreimesberger@eweek.com.