A Clear Path Forward
Global leader in network and people connectivity uses AppDynamics to gain critical visibility into customer-facing platforms, avoid catastrophic service outages, and reduce MTTD by up to 50%
Improved MTTD by as much as 50%
Achieved service availability of 99.9%
Improved performance by 40%
Maintaining a delicate balance
For nearly 30 years, Cisco Systems has united people, businesses and machines around the world and beyond. Through its comprehensive portfolio of networking, security, computing and telephony solutions, the name Cisco is synonymous with connectivity.
Today, the company markets its products and services to tens of thousands of customers in more than 115 countries and employs a small army of technology and customer experience professionals to support it all. But even the world-class network of back-end collaboration, communication, and monitoring systems behind this extensive operational footprint experiences issues. And when it does, the global leader in connectivity is sent scrambling to figure out what happened, how to fix it, and how to prevent it from happening again.
“Supporting a global customer base that uses a wide range of products and services requires more than just a single tool or application. It involves an integrated ecosystem of customer-facing and internal tools that all have to work together seamlessly to help us identify product issues and resolve them before they become full-blown problems,” says Yatin Wadhavkar, Information Security and Application Monitoring Team Lead at Cisco.
Cisco Systems is the worldwide leader in designing, manufacturing, and selling Internet Protocol-based networking and associated services. It provides a broad line of products for transporting data, voice, and video within buildings and across campuses.
Cisco’s Security Cloud Operations spans multiple product offerings and sits within its Security Business Group. Cisco Cloud Web Security (CWS) provides industry-leading security and control for the distributed enterprise. Users are protected everywhere, all the time, when using CWS through Cisco worldwide threat intelligence, advanced threat defense capabilities, and roaming user protection.
The team is responsible for managing the site’s performance and capacity, quickly finding the root cause of issues and fixing problems fast to ensure seamless service.
Worldwide support demands visibility on a global scale
Customer support at scale is easy to talk about but difficult to achieve in practice. Cisco’s Support Case Manager (SCM) platform is one of the company’s primary means of connecting both internal and external customers who need support with the teams capable of providing it. Because it links dozens of critical back-end applications and tools, if one suffers an unexpected outage or service degradation, the entire system suffers.
“SCM is a mission-critical service because it’s a direct entry point for our customers to open a new case, ask a question, and review open cases with a support specialist for any of the products or services they’ve purchased,” says Naveen Kumar Kodali, Senior Technical Program Manager, Business Intelligence and Analytics at Cisco. “If a customer can’t get through to tech support easily, it’s going to create problems.”
Keeping SCM operating at peak capacity and performance is easier said than done, however. The sheer volume of applications and services connected within it makes it nearly impossible for a team of human engineers to manage every inch of its web of dependencies. In the past, the team had primarily maintained the systems indirectly, relying on alerts from logs or customer complaints along with notifications from the disparate solutions that monitored its various components.
Operating these disparate monitoring tools created huge visibility gaps that risked one connected service taking the entire platform down, and it left the support organization able only to react to issues as they arose. The team needed a solution that would let it identify issues proactively and take corrective action before they became full-blown problems.
Responsiveness is an essential component of effective customer-facing support. But the extent of a team’s ability to respond to customer inquiries and help requests is often a function of how much visibility it has into the various systems that could impact the customer’s experience.
“SCM touches everything from our Salesforce.com instances and other business-critical applications to our customer-facing website and support portal,” explains Wadhavkar. “Despite how important it is to keep each component of this network up and running, we never had visibility into how any of the upstream and downstream connected systems were performing or impacting the customer experience.”
For the SCM support team, limited visibility into system and app dependencies resulted in a variety of unwanted, difficult-to-address issues. Over the course of a year, SCM experienced nearly 30 incidents or service interruptions that took as many as 60 hours to resolve, costing the Cisco Technical Assistance Center (TAC) valuable time and opportunities to address customer concerns and troubleshoot their issues.
“Every day we handle nearly 8,000 issues and support over 150,000 users worldwide,” says Kodali. “The impact of an outage or even downtime on the customer experience can’t be overstated. Any delays in response times make it impossible for our customers to operate normally, and that reflects badly on our whole organization.”
Eager for a way to avoid similar performance issues in the future, Cisco turned to AppDynamics for help. TAC leaders engaged with the AppDynamics professional services team to map SCM’s complicated dependency tree, then instrument and implement an end-to-end monitoring and optimization solution that would save time for the internal team and eliminate future frustrations for Cisco customers.
Specifically, the professional services team immediately identified key dependencies — those directly connected that would have the most immediate and severe impact on the service if they went down. With the help of more than 20 subject matter experts, the team implemented 15 core AppDynamics capabilities mapped directly to the most vulnerable parts of SCM.
“The AppDynamics professional services team was invaluable in helping us specifically define what the solution needed to look like and build a reasonable timeline for getting it done,” says Seshagirirao Surapaneni, Technical Program Manager. “They guided us away from a ‘catchall’ philosophy to a ‘catch critical’ one, which is essential for helping us get to the exact posture we need to have and eliminate noise from too many alerts from too many places. Though we probably could have handled the transition internally, having someone else there to do it so we could focus on more immediate needs was a lifesaver.”
After the initial conversation, TAC leaders agreed that no other solution provided the full-stack observability and automation and remediation capabilities AppDynamics is known for. With AppDynamics the SCM team can proactively monitor health and performance across virtually every layer of the platform — application, network, database, and user end points — from a single source.
Unlike before when the team was inundated with unnecessary noise and noncritical alerts from various disparate monitoring tools, AppDynamics consolidates data into a convenient, configurable dashboard that gives anyone on the team real-time visibility into server performance, Kubernetes clusters, and the statuses of MySQL, Oracle, and MongoDB databases without further instrumentation.
“AppDynamics has been a game changer for us because it not only helps us keep track of whether a service is available, it also allows us to see if it’s performing as our internal or external customers expect,” Wadhavkar states. “The most critical piece is that we’re also able to measure the minimum time it takes to detect if there is an issue, which will influence how we go about resolving an issue and how long that resolution will take.”
Each service covered by AppDynamics can be finely tuned with specific thresholds and performance benchmarks whose alerts are integrated with Webex, so the team never misses an important notification about a potential problem and can proactively take steps to prevent service degradation or outages. At the same time, TAC and SCM support team leaders can leverage AppDynamics’ log analytics capabilities to better understand potential points of failure, project how they’ll impact both the business and the user experience across key performance metrics, and take action in advance to prevent failures from happening.
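The idea of per-service thresholds that drive notifications can be illustrated with a minimal Python sketch. This is not the AppDynamics API or Cisco's configuration: the `Threshold` class, the metric names, and the numeric limits are all hypothetical, chosen only to show how a "catch critical" rule set stays quiet until a metric crosses a defined line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical rule for a metric where a higher value is worse."""
    metric: str
    warning: float
    critical: float


def classify(value: float, rule: Threshold) -> str:
    """Return 'critical', 'warning', or 'ok' for one metric reading."""
    if value >= rule.critical:
        return "critical"
    if value >= rule.warning:
        return "warning"
    return "ok"


def alerts_to_send(samples: dict, rules: list) -> list:
    """Build messages only for metrics that breach a threshold."""
    messages = []
    for rule in rules:
        if rule.metric not in samples:
            continue  # no reading for this metric, nothing to evaluate
        level = classify(samples[rule.metric], rule)
        if level != "ok":
            messages.append(f"[{level.upper()}] {rule.metric}={samples[rule.metric]}")
    return messages


# Illustrative limits only; real values would come from each service's baseline.
RULES = [
    Threshold("response_time_ms", warning=800, critical=2000),
    Threshold("error_rate_pct", warning=1.0, critical=5.0),
]

if __name__ == "__main__":
    sample = {"response_time_ms": 950, "error_rate_pct": 0.2}
    for message in alerts_to_send(sample, RULES):
        print(message)
```

In a real deployment the resulting messages would be handed to whatever notification channel the team uses; the sketch stops at building them so that it makes no assumptions about a messaging integration.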
“The level of transparency and visibility we gain with AppDynamics is a night-and-day difference from what we had in the past,” says Clement Joseph, Site Reliability Engineering Lead. “Having all that information in one place helps us get ahead of issues before they become business- and customer-impacting problems. Now if we receive an alert and think it's critical, we’re able to inform customers and our frontline agents so they're aware of it even before they start to experience any effects from it.”
Adopting AppDynamics has worked wonders for customer-centric Cisco, dramatically improving end-to-end visibility across its SCM environment and beyond and helping the company to enhance the customer experience for internal and external audiences alike.
Since deploying AppDynamics, the SCM team has seen its service availability jump to nearly 100% while simultaneously averting five documented major incidents to date. “Since we’ve started using it, we’ve achieved a 50% reduction in our mean time to discovery (MTTD) because of better data and analytics, averted dozens of potential incidents that could have further impacted our customers, and significantly expanded our monitoring capabilities to every layer of our tech stack,” Kodali says. “It allows us to be everything our customers expect the global leader in connectivity to be — connected, responsive, and always on their side.”
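For readers unfamiliar with the metric, the short Python sketch below shows one common way a detection-time average and its percentage reduction can be computed. The incident timestamps are invented purely for illustration and are not Cisco data, and the function names are hypothetical; the case study does not describe how the team calculates the figure.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """Average gap between when each incident began and when it was detected.

    `incidents` is a list of (started, detected) datetime pairs.
    """
    gaps = [detected - started for started, detected in incidents]
    return sum(gaps, timedelta()) / len(gaps)


def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Invented example data: two incidents before and two after a monitoring change.
BEFORE = [
    (datetime(2021, 1, 5, 9, 0), datetime(2021, 1, 5, 9, 40)),     # 40 minutes
    (datetime(2021, 2, 11, 14, 0), datetime(2021, 2, 11, 15, 20)),  # 80 minutes
]
AFTER = [
    (datetime(2021, 7, 3, 10, 0), datetime(2021, 7, 3, 10, 20)),   # 20 minutes
    (datetime(2021, 8, 19, 16, 0), datetime(2021, 8, 19, 16, 40)),  # 40 minutes
]

if __name__ == "__main__":
    before = mean_time_to_detect(BEFORE)
    after = mean_time_to_detect(AFTER)
    print(f"before: {before}, after: {after}")
    print(f"reduction: {percent_reduction(before, after):.0f}%")
```

With these made-up values the average drops from 60 minutes to 30 minutes, which is the kind of change a 50% reduction describes.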