Zensai is dedicated to upholding a reliable and secure environment for our customers.
This article describes the operational procedures we have put in place so that Learn365 customers and partners can rely on a safe, stable, and well-performing platform at all times and in all places.
In this article
- Site Reliability Engineering culture
- 24/7 operation and incident management
- Handling of customer inquiries
Site Reliability Engineering culture
With the goal of creating scalable and reliable software systems, Zensai has fully adopted the Site Reliability Engineering (SRE) culture and structure in our Support and SRE team to ensure platform stability, reliability, and security at all levels.
Role of the Support and SRE team
SRE applies aspects of software engineering to operations and infrastructure. A key principle behind it is to task highly skilled and experienced technical people with ensuring operational reliability.
Our SRE engineers are specialists with a broad skillset who understand the whole application stack, the underlying infrastructure, and the database structure. This creates a team structure and a healthy dynamic in which support engineers work together with SRE engineers to answer customer inquiries, solve issues, analyze problems, discover trends, and ensure reliability by preventing issues from escalating.
Because technical skills are a priority in our approach to operations, support engineers at all levels are highly trained in Microsoft technologies and complete Secure Code and SecOps training courses every year.
Monitoring
With a mission to keep systems running and performing optimally at all times, we constantly monitor sites and services across all eight regional datacenters in which Learn365 is available to ensure they are operating within healthy metrics.
Because we constantly track and analyze platform metrics, most incidents are discovered before they manifest. The continuous capture of data also gives us insight into how to improve the platform for optimal security, stability, and reliability. This knowledge is put to use by the Support and SRE team itself, which is also tasked with managing and developing these improvements.
We react to incidents on all levels. If a customer environment shows unhealthy metrics, we contact the customer so we can find a solution. If an issue has a broader impact, we address it immediately and communicate it via the Learn365 Health Status Page.
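The two reaction paths described above can be illustrated with a minimal sketch. It is not Zensai's actual tooling: the metric (`error_rate`), the thresholds, and every name in the code are hypothetical and chosen only to show the idea of contacting a single customer versus posting a platform-wide notice.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not Zensai's actual thresholds.
MAX_ERROR_RATE = 0.05      # assumed upper bound for a "healthy" error rate
BROAD_IMPACT_SHARE = 0.5   # assumed share of unhealthy environments that counts as "broad"


@dataclass
class EnvironmentReading:
    customer: str      # hypothetical customer environment identifier
    region: str        # hypothetical datacenter region label
    error_rate: float  # fraction of failed requests in the sampling window


def is_healthy(reading: EnvironmentReading) -> bool:
    """A reading is healthy when its error rate stays within the assumed bound."""
    return reading.error_rate <= MAX_ERROR_RATE


def triage(readings: list[EnvironmentReading]) -> list[str]:
    """Return the follow-up actions for one batch of readings."""
    unhealthy = [r for r in readings if not is_healthy(r)]
    if not unhealthy:
        return []
    # Broad impact: communicate once, publicly, instead of customer by customer.
    if len(unhealthy) / len(readings) >= BROAD_IMPACT_SHARE:
        regions = sorted({r.region for r in unhealthy})
        return [f"post to status page: {', '.join(regions)}"]
    # Isolated impact: reach out to each affected customer directly.
    return [f"contact customer: {r.customer}" for r in unhealthy]
```

In this sketch, a single unhealthy environment among many results in direct customer contact, while a batch in which at least half of the environments are unhealthy results in one status page notice that names the affected regions.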
24/7 operation and incident management
To ensure timely detection of and response to potential security incidents, our Security Operations Center (SOC) provides 24/7 monitoring of data activity on the Learn365 platform worldwide.
The SOC houses an incident response team that continuously monitors, oversees, and analyzes the platform for abnormal activity to ensure that potential cybersecurity incidents are correctly identified, analyzed, defended against, investigated, and reported on.
Handling of customer inquiries
Thanks to our constant monitoring and proactive engagement with customers, most issues are handled at an early stage, often before the customer has even noticed them.
When customers do experience issues with Learn365 functionality or have general questions, a wealth of resources and a team of support engineers are ready to assist.
Complete product documentation, answers to frequently asked questions, and our Answer Bot are available from our customer Help Center at all hours, every day of the year, to provide detailed product information. If customers experience broken functionality or general issues, our support engineers are ready to assist with 24/5 support, so that all customer solutions operate as well as possible at all times.