NTT Communications Case Study Interview

NTT Communications had been maintaining and operating customer systems using multiple monitoring tools. By introducing Zabbix, the company achieved a significant cost reduction while automating and streamlining its operations.

Objective

To automate and streamline maintenance operations by consolidating multiple monitoring solutions

Requirements

The system must support redundant configurations

Lower the maintenance cost of running the monitoring managers

Approach

Consolidate the servers of three monitoring tools into Zabbix

Integrate with automation tools to automate fault isolation and recovery

Outcome

Significantly improved availability with redundant configurations

Lower costs and higher efficiency through integration of the monitoring systems

Aiming for Operation Without Human Intervention

NTT Communications, a major telecommunications carrier, is a global company that provides ICT solutions to corporations worldwide. It has developed a variety of businesses around the world, including network services, data centers, cloud services, and security services.

Its wide range of services includes support for the integration, maintenance, and operation of customer systems. For these support operations, there is strong demand to reduce costs through automation.

Mr. Satoshi Imai belongs to Group 2 of Operation Management, part of Solution Services, a department within the Business Solution Division. The group aimed to operate with as little human intervention as possible and was working on automation. As part of that effort, Mr. Imai turned his attention to Zabbix.

Using Three Monitoring Tools at Once Led to Server Sprawl

Mr. Imai’s department monitors the systems that NTT Communications has built for its customers and is responsible for their maintenance, recovery, and operation. Before introducing Zabbix, the department used monitoring tools from three different vendors at the same time.

Mr. Imai recalls the state of affairs at the time: “Because we were using three tools, we had prepared two servers for each project. Setting up servers for every project was a burden, and because each server ran a different version, we couldn’t standardize the settings.”

Another issue was that each server ran in a single, non-redundant configuration. “If even one server went down, I would be contacted, even in the middle of the night. Sometimes I even rushed over by taxi. And although we had a large space for all those servers, heat was also a concern.” (Mr. Imai)

Even more pressing was the decision to relocate Mr. Imai’s department in 2021. “When we relocated the site, the servers would have to move too, but stopping monitoring in order to relocate them would have been difficult. So we decided to set up new servers at the new location, run them in parallel with the old ones for a while, and then shut down the original servers,” says Mr. Imai.

However, setting up new servers for every project would have meant an enormous number of servers, so the department decided to introduce Zabbix and consolidate them.

Fortunately, NTT Communications has the ZABICOM team, which has been providing customers with Zabbix configuration, maintenance, and support for over ten years. Mr. Imai decided to build a redundant configuration of Zabbix with the help of the team.

Consolidating 12 Servers into 1

Mr. Imai describes what they required from the new monitoring system. “First of all, we couldn’t change the services we were providing to customers, so we focused on keeping our existing monitoring capabilities. In addition, the system needed to support a redundant configuration. You can make almost any server redundant in some way, but our idea was to run the database on its own server. Also, Company N’s monitoring system ran only a single process, which made concurrent monitoring impossible, so we looked for a monitoring manager that could overcome that limitation. We also checked whether it would be suitable for automating the operations we use internally.” Zabbix met all of Mr. Imai’s requirements.

Approval was granted easily because consolidating the servers was expected to reduce costs significantly. “When I first migrated the monitoring servers for the ten companies I’m in charge of, twelve servers were consolidated into one. That alone told us the cost reduction would more than justify the migration,” says Mr. Imai. “Now that we have brought monitoring for 30 companies into Zabbix, I think we’ve achieved a 20% cost reduction compared to before.”

A Plan to Increase Database Speed

Mr. Imai, NTT Communications Corporation

During the rollout there were some difficulties, such as previously unknown bugs in the storage, but the introduction of Zabbix itself went smoothly. To increase the speed of the database, which is critical to Zabbix performance, the team purchased a DB server with high write speed and connected it directly with a DAC cable. In addition, the disks were configured as RAID 10, the fastest of the available redundant configurations.

One difference between Company N’s tool, which they had been using until then, and the newly introduced Zabbix worried Mr. Imai: the timing of monitoring. “With Company N’s tool, we monitored every five minutes, and if an abnormality was detected, we retried every minute. In other words, if there was an issue, the customer would be notified within six minutes at the latest, or in as little as one minute. Zabbix didn’t have this retry function, so I was concerned about how we would deal with that difference,” Mr. Imai explains.

To address this, Mr. Imai decided to guarantee notification within six minutes, the longest notification time under the previous tool. “With Zabbix, we decided to monitor at three-minute intervals and to tell our customers that a notification would follow two consecutive alerts. By doing this, we were able to absorb the difference in functions.” (Mr. Imai)
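The arithmetic behind matching the six-minute worst case can be sketched as follows. This is a simplified model, not the actual scheduling logic of either tool: it assumes a failure can begin at any moment between two scheduled checks, and it reads “two alerts” as two consecutive failed checks. The function name notification_window is hypothetical.

```python
# Simplified, hypothetical model of notification latency.
# Assumptions: the failure starts at an arbitrary moment between two scheduled
# checks, the next scheduled check is the first one to fail, and each further
# confirming check runs `confirm_interval` minutes after the previous one.

def notification_window(poll_interval: int, confirm_interval: int,
                        failed_checks: int) -> tuple:
    """Return (best_case, worst_case) minutes from failure to notification."""
    confirmations = failed_checks - 1          # checks needed after the first failure
    extra = confirmations * confirm_interval   # time spent confirming the failure
    best = extra                   # failure starts just before a scheduled check
    worst = poll_interval + extra  # failure starts just after a scheduled check
    return best, worst


# Company N's tool: check every 5 min, one retry 1 min later
print("Company N:", notification_window(5, 1, 2))   # Company N: (1, 6)
# Zabbix setup described above: check every 3 min, notify after 2 failed checks
print("Zabbix:", notification_window(3, 3, 2))      # Zabbix: (3, 6)
```

Under these assumptions both setups share the same six-minute worst case; only the best case changes, from one minute to three. In Zabbix, a "two consecutive failures" condition would typically live in the trigger expression (for example, a max(#2)=0 check on a ping item), though the article does not show the trigger that was actually used.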

In addition, a common template of monitoring items was created to make monitoring setup more efficient. This sped up the work and made it possible to standardize monitoring. “Since everything is tied to a template, new projects are linked to the server automatically. With Company N’s tool, we had to set up monitoring items one by one, but now all we need to do is apply the template. Recently the number of devices and ports to monitor has increased, and doing this manually would take a huge amount of time, so automation was essential,” says Mr. Imai.
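Template-based setup is also what makes onboarding scriptable. The sketch below builds a Zabbix 5.0 JSON-RPC host.create request that links a new host to a shared template at creation time. The host name, IP address, group ID, template ID, and auth token are hypothetical placeholders, and the article does not say that NTT Communications registers hosts this way.

```python
import json

# Minimal sketch of a Zabbix 5.0 "host.create" JSON-RPC request body.
# All values passed in below (host name, IP, IDs, token) are hypothetical.

def host_create_request(host, ip, group_id, template_id, auth_token, request_id=1):
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host,
            "interfaces": [
                {
                    "type": 1,       # 1 = Zabbix agent interface
                    "main": 1,       # default interface of this type
                    "useip": 1,      # connect by IP rather than DNS name
                    "ip": ip,
                    "dns": "",
                    "port": "10050",
                }
            ],
            "groups": [{"groupid": group_id}],
            # Linking the shared template brings in its items, triggers, and
            # graphs, so nothing has to be configured item by item.
            "templates": [{"templateid": template_id}],
        },
        "auth": auth_token,  # Zabbix 5.0 carries the session token in the body
        "id": request_id,
    }


req = host_create_request("customer-a-router01", "192.0.2.10",
                          "50", "10001", "<auth-token>")
print(json.dumps(req, indent=2))
```

In use, this body would be POSTed to the frontend’s api_jsonrpc.php endpoint after obtaining a token with user.login; sending it is left out so the sketch stays self-contained.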

Significant Reduction of Maintenance Outages by Using Redundant Configuration

In this way, a redundant monitoring system built on Zabbix was completed. Thanks to the increased availability, no one has to rush to work at night anymore, and operations run smoothly.

“Working together with automation tools, it contributes to automating fault isolation and recovery. Because of the redundant configuration, the number of maintenance outages has also dropped greatly. We can also apply vulnerability fixes without stopping monitoring, so we now do that once a week.” (Mr. Imai)

Previously, operators had to check three different screens because three tools were running concurrently; now a single screen is enough. Not only the number of servers but also the number of monitoring PCs was reduced: where each person once had four or five computers, there is now roughly one per person.

Mr. Imai points out the advantages of Zabbix over other systems, such as concurrent monitoring, easy redundant configuration, and simple integration through the API. “Because we can increase the number of processes, we can monitor targets concurrently in batches of 100 or 200. And when the Zabbix server approaches its performance limit, adding a Zabbix proxy makes it easy to scale out, so hundreds to thousands of companies can be monitored from a single screen,” explains Mr. Imai.

Mr. Imai says that the map and parent-child functions are also useful. “The map function shows the network configuration and displays areas where anomalies occur in red, so you can see at a glance where in the network they are occurring. With the parent-child function, a failure of an upstream device doesn’t generate alerts for every device behind it, so the number of notifications is reduced.”
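The parent-child idea, which most likely corresponds to Zabbix trigger dependencies, can be illustrated with a small model: an alert for a device is suppressed whenever some device upstream of it is also down. This is a hypothetical sketch of the concept, not Zabbix’s implementation; the topology and device names are invented for illustration, and it assumes each device has at most one parent and no cycles.

```python
from typing import Dict, Optional, Set

# Hypothetical model of parent-child alert suppression.
# Assumptions: each device has at most one upstream parent, the topology has
# no cycles, and a child's alert is suppressed if any upstream device is down.

def alerts_to_send(parent: Dict[str, Optional[str]], down: Set[str]) -> Set[str]:
    """Return the down devices whose alerts should actually be sent."""
    def upstream_is_down(device: str) -> bool:
        p = parent.get(device)
        while p is not None:        # walk up the chain toward the root
            if p in down:
                return True
            p = parent.get(p)
        return False

    return {d for d in down if not upstream_is_down(d)}


# Invented topology: a core router feeds an access switch serving two servers
topology = {
    "core-router": None,
    "access-switch": "core-router",
    "server-1": "access-switch",
    "server-2": "access-switch",
}

# The switch fails, so both servers behind it become unreachable as well
print(alerts_to_send(topology, {"access-switch", "server-1", "server-2"}))
# {'access-switch'}
```

Three devices are down, but only the root cause produces a notification, which is the reduction Mr. Imai describes.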

Mr. Imai also appreciates that user privileges are separated into three categories. With Company N’s tool, privileges were not differentiated and every user had administrator rights, so operators had occasionally deleted settings by mistake. “With Zabbix, you can set three types of users: users who can only view, users who can add monitoring settings, and users who can control everything. This helps prevent operational errors,” says Mr. Imai.
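In Zabbix 5.0, these three levels appear to correspond to the numeric user type field (1 = Zabbix user, 2 = Zabbix admin, 3 = Zabbix super admin); later releases replaced that field with user roles. The sketch below maps the three levels to those types and builds a user.create request. The level names, account name, password, group ID, and token are hypothetical, and the mapping is an assumption rather than NTT Communications’ actual setup.

```python
# Hypothetical mapping of the three privilege levels to Zabbix 5.0 user types.
USER_TYPES = {
    "viewer": 1,         # Zabbix user: mainly views monitoring data
    "configurator": 2,   # Zabbix admin: can add and edit monitoring settings
    "administrator": 3,  # Zabbix super admin: full control
}

def user_create_request(alias, password, usrgrp_id, level, auth_token, request_id=1):
    """Build a Zabbix 5.0 "user.create" JSON-RPC body (placeholders only)."""
    if level not in USER_TYPES:
        raise ValueError(f"unknown privilege level: {level!r}")
    return {
        "jsonrpc": "2.0",
        "method": "user.create",
        "params": {
            "alias": alias,              # login name field in Zabbix 5.0
            "passwd": password,
            "type": USER_TYPES[level],
            "usrgrps": [{"usrgrpid": usrgrp_id}],
        },
        "auth": auth_token,
        "id": request_id,
    }


req = user_create_request("operator01", "<password>", "7", "viewer", "<auth-token>")
print(req["params"]["type"])   # 1
```

Keeping day-to-day operators at the lowest type means a mistaken click cannot delete monitoring settings, which is the failure mode described above.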

Mr. Imai also notes that because graphs are available by default and graph images can be inserted into reports all at once, “the work of issuing monthly reports to customers has become more efficient.”

Expanding Adoption and Expectations for New Features

Although it is not supported in Zabbix 5.0, the version currently in use, Mr. Imai would like to use the geomap function after a future upgrade. The geomap function shows on a map which locations have alerts, so they can be identified at a glance, making it easier to monitor the stores of a retail chain. “Up until now, when a failure occurred, I would draw a diagram and explain it myself. We used to check every fault ourselves, but with this function we could visualize them and respond more quickly,” says Mr. Imai, optimistic about the update.

Mr. Imai served as the pilot, introducing Zabbix within the scope of the projects he was in charge of; adoption is now expanding across the department.

“From now on, we plan to integrate everything into Zabbix, except where it isn’t suitable for security reasons. Currently, we’re using Zabbix to monitor about 30 companies, but we would like to expand that to about 100 in the future.” (Mr. Imai)

Operation with Zabbix, which has increased availability and made reports easier to read, has been well received by customers. Mr. Imai’s efforts, which began on a small scale, are now improving the efficiency of the entire department and leading to higher customer satisfaction.

System Overview

Number of Zabbix servers: 2
Number of Zabbix proxies: 2
Redundancy: Yes (active-active)
Number of sites: 1,000
Number of monitored devices: 40,000
Number of triggers: 41,000
Number of items: 130,000
Number of users: 75
NVPS (new values per second): 426
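As a rough consistency check, the item count and NVPS together imply an average update interval of about five minutes. This assumes every one of the 130,000 items is enabled and collected at a fixed interval, so that NVPS is approximately the number of items divided by the average interval; in practice intervals differ per item, so this yields only an average.

```python
# Back-of-the-envelope check on the System Overview figures.
# Assumption: NVPS ~= items / average update interval (all items enabled,
# each collected at a fixed interval).
items = 130_000
nvps = 426

avg_interval_s = items / nvps
print(f"Average update interval: {avg_interval_s:.0f} s "
      f"(about {avg_interval_s / 60:.1f} min)")
# Average update interval: 305 s (about 5.1 min)
```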

NTT Communications Corporation

As a "DX Enabler™" that contributes to the realization of customers’ digital transformation, NTT Communications strives to solve customers’ management issues and create a smarter society through the use of ICT.

Through the integration of our global operations in July 2019, we expanded both our service lineup and our coverage area.

We will support our customers’ global businesses with even more sophisticated systems and solutions.

Head office: Tokyo, Japan
Founded: 1999
Employees: 9,000 (as of July 2022)
Capital: 230.9 billion JPY
