
Introducing Geotab's server operations and on-call team

Last updated on May 26, 2022 in Productivity by Matt Broughall |  2 minute read

Geotab's Server Operations On-Call team responds to critical requests 24/7, keeping all servers and systems operational for customers.

Geotab has a large and sophisticated server infrastructure, with thousands of servers in cloud-hosted environments all working together. For many companies, telematics has gone from a useful service to a mission-critical function, so keeping these servers and systems operational for customers is critical. Geotab systems are well tested and built to high standards of reliability, but they rely in part on third-party components, and engineers sometimes make mistakes that impact customers.


Geotab’s engineers take every issue or failure extremely seriously. Geotab has recently established a team to further enhance the response to and resolution of such issues. The designated 24/7 Server Operations On-Call team, formerly a sub-team within Engineering Support, is always one alarm away from solving any critical problem. The team is expected not only to deal with the daily on-call requests, but also to work towards preventing future issues from arising in the first place.

Who are the Server Operations On-Call engineers?

The Server Operations engineers are Geotab’s best troubleshooters. They are recruited internally and chosen for their ability to thrive under pressure, their deep understanding of Geotab’s systems and, most importantly, their excellent troubleshooting skills. They are required to make business-critical decisions independently on a regular basis. If a production system goes down, for whatever reason, team members will be woken up at any hour (night or day) to work on the issue until it is fully resolved or, at the least, a workaround has been found that restores service while the root cause is identified and corrected.

What is the troubleshooting process?

The Server Operations On-Call team is Geotab’s first line of defense, making sure the company meets its Service Level Agreements regarding uptime. In practice, this means responding 24/7 to any of the automated alarms that monitor Geotab’s 1000+ production servers, as well as to any critical issues raised by Resellers or Strategic Partners.


Once an “on-call” event is triggered, the Server Operations engineer will triage the issue and, if they are unable to resolve it on their own, will escalate to the relevant subject matter expert in departments such as Internal Development, MyGeotab Development, Security, IT, Development Operations, and more. All of Geotab’s technical teams have a 24/7 on-call rotation to monitor all service disruptions, encouraging fast resolution and minimising customer impact.


Using the vast number of performance metrics available to them, the team has built many tools and dashboards to help assess the state of whichever machine or service triggered the on-call. Millions of queries are made daily to help make sure all systems are operational. In the event of a failure, each type of issue has a specific troubleshooting strategy. The team works constantly to improve its monitoring tools and eliminate false positives.
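To illustrate the idea of one troubleshooting strategy per issue type, here is a minimal Python sketch of a lookup table that maps an alarm type to its first response step. The issue types, function names and host name are hypothetical and chosen only for illustration; they do not describe Geotab's actual tooling.

```python
from typing import Callable, Dict


# Hypothetical strategies; none of these names come from Geotab's systems.
def restart_service(host: str) -> str:
    return f"restart the affected service on {host}"


def free_disk_space(host: str) -> str:
    return f"clear old logs and temporary files on {host}"


def triage_manually(host: str) -> str:
    return f"no runbook for this alarm on {host}; triage manually"


# Each known issue type maps to exactly one strategy.
RUNBOOKS: Dict[str, Callable[[str], str]] = {
    "service_down": restart_service,
    "disk_full": free_disk_space,
}


def first_step(issue_type: str, host: str) -> str:
    """Look up the strategy for an issue type, falling back to manual triage."""
    strategy = RUNBOOKS.get(issue_type, triage_manually)
    return strategy(host)


if __name__ == "__main__":
    print(first_step("disk_full", "srv-01"))
```

Keeping the mapping in one table means a new issue type can be covered by adding a single entry, while unknown alarms still get a defined fallback.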

What is the “War Room”?

The team’s escalation policy culminates in what is called a “War Room.” On rare occasions when on-call teams are unable to resolve, or see a clear path to resolving, a service degradation, or if there is a widespread outage, the team initiates its “War Room” policy. All Software Development leads and the CEO are engaged, no matter the time of day. Depending on the hour, participants meet in either a boardroom or a virtual room and stay until they come up with a solution, or at least a temporary workaround.


Regardless of what level of escalation is required, the Server Operations engineer who received the initial on-call will stay involved with the issue until it is fully resolved, both to orchestrate the efforts of the many teams involved and to promote accountability.
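The escalation path described in this post (on-call engineer, then subject matter expert, then War Room) can be sketched as a small decision function. This is only an illustration under stated assumptions: the class, field and level names are hypothetical, and the function assumes it is called when the current level has not been able to resolve the issue on its own.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    # Hypothetical names for the three stages described in the post.
    ON_CALL_ENGINEER = 1
    SUBJECT_MATTER_EXPERT = 2
    WAR_ROOM = 3


@dataclass
class Incident:
    resolved: bool = False
    clear_path_to_resolution: bool = True
    widespread_outage: bool = False
    level: Level = Level.ON_CALL_ENGINEER


def next_level(incident: Incident) -> Level:
    """Return the level that should handle the incident next.

    Assumes the current level could not resolve the issue by itself.
    """
    if incident.resolved:
        return incident.level  # nothing left to escalate
    if incident.widespread_outage:
        return Level.WAR_ROOM  # a widespread outage goes straight to the War Room
    if incident.level is Level.ON_CALL_ENGINEER:
        return Level.SUBJECT_MATTER_EXPERT
    if not incident.clear_path_to_resolution:
        return Level.WAR_ROOM
    return incident.level  # experts have a clear path; stay at this level


if __name__ == "__main__":
    print(next_level(Incident()).name)
```

Whatever level the function returns, the engineer who received the initial on-call stays attached to the incident, so the sketch models who is brought in rather than who hands off.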

How does the team help improve future products?

In addition to responding to active on-calls, the team applies its knowledge and experience to the development of new iterations of Geotab products. Working closely with both customers and the Development team, they are in a unique position to assist ongoing efforts to build more robust and reliable solutions.


When the Server Operations team identifies an outage or failure of the service, they dig down to its root cause, understanding at the most fundamental level how it happened and how it affected the customer. The developers then make changes to the code to prevent it from happening again. The Server Operations team helps validate the changes, making sure the problem has indeed been fixed.


The Server Operations On-Call engineers form the backbone of Geotab’s commitment to reliability, providing customers, resellers and partners with dependable service now and in the future.


See also: The ups and downs of server monitoring



Geotab's blog posts are intended to provide information and encourage discussion on topics of interest to the telematics community at large. Geotab is not providing technical, professional or legal advice through these blog posts. While every effort has been made to ensure the information in this blog post is timely and accurate, errors and omissions may occur, and the information presented here may become out-of-date with the passage of time.
