Data center maintenance and operations audits – when, why and how should they be carried out?
Regular audits of data center operations and maintenance contracts are essential throughout their life cycle in order to check that procedures are in line with service quality objectives, to optimize them, to update them when contracts are renewed, as well as to reduce costs. When, why and how should they be carried out?
Why and under what circumstances should a maintenance audit be carried out?
As cornerstones of organizations’ information systems, data centers are assets that need to be maintained and improved over the years to ensure that they achieve expected levels of service continuity, security and energy efficiency – that is the whole point of the operations and maintenance actions undertaken.
Whether carried out in-house or outsourced, these actions are punctuated by regular milestones. Many circumstances may therefore prompt a maintenance and operations audit of a data center: a change of personnel at the site, recurring performance or availability problems, facilities that no longer meet changing needs, a desire to reduce operating costs, or a review of what has worked well and what has not when the operations and maintenance contract comes up for renewal.
In all cases, the purpose of a maintenance and operations audit is to measure the difference between what is expected and what is actually done, in order to define the levers for potential improvements, as well as to have a concrete basis for defining or redefining priorities.
What does this audit involve?
The maintenance and operations audit is a very comprehensive procedure covering contractual, procedural and, of course, technical aspects. It involves:
- analyses (audit of the room, inventory of technical equipment and operating and maintenance procedures and protocols);
- assessment of the existing situation (audit of maintenance and of contracts signed with the various service providers, assessment and testing of their work, identification of weaknesses in security, business continuity and incident recovery);
- proposals for potential areas for improvement (recommendations and improvement programs, estimation of achievable gains: service quality, reduction of operating costs, etc.).
It is conducted in two phases, one theoretical and the other practical.
Phase 1: documentary study
To build a comprehensive overview of the facilities, procedures and interventions, a data center maintenance and operations audit begins with a study of the as-built records, including sizing, site operations and alignment with functional needs (Tier III, etc.). It then analyzes the operating documents: procedures implemented at the site (access, escalation, response cards, etc.), services (maintenance service request tickets), compliance with service level agreements (SLAs) for response and repair times, monitoring of energy consumption and of the spare parts inventory, as well as significant events (staff changes, on-call periods, breakdowns, replacements, main corrective and preventive tasks carried out, quotations issued by service providers, etc.).
The purpose of this documentary study is to verify that, over a given period, operations and maintenance have been carried out according to industry standards and as planned. For instance, documents that are missing or inadequately completed may indicate a lack of discipline in performing maintenance and operations tasks, which calls for closer examination during the site visit.
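To illustrate the SLA check performed during the documentary study, here is a minimal Python sketch. It assumes the maintenance service request tickets can be exported with opening, response and repair timestamps. The `Ticket` fields, the 4-hour response target and the 24-hour repair target are hypothetical; the real values come from the contract, and repair time is assumed here to be measured from ticket opening.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    """One maintenance service request (field names are hypothetical)."""
    ticket_id: str
    opened: datetime
    responded: datetime
    repaired: datetime


# Hypothetical contractual targets; take the real ones from the SLA.
RESPONSE_TARGET = timedelta(hours=4)
REPAIR_TARGET = timedelta(hours=24)


def sla_compliance(tickets, response_target=RESPONSE_TARGET,
                   repair_target=REPAIR_TARGET):
    """Return (response_rate, repair_rate, breaching_ticket_ids)."""
    if not tickets:
        raise ValueError("no tickets to audit")
    response_ok = 0
    repair_ok = 0
    breaches = []
    for t in tickets:
        # Both delays are measured from ticket opening (an assumption).
        met_response = (t.responded - t.opened) <= response_target
        met_repair = (t.repaired - t.opened) <= repair_target
        response_ok += met_response
        repair_ok += met_repair
        if not (met_response and met_repair):
            breaches.append(t.ticket_id)
    n = len(tickets)
    return response_ok / n, repair_ok / n, breaches
```

The list of breaching ticket identifiers gives the auditor a starting point for the interventions worth examining more closely on site.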
Phase 2: technical site inspection
A technical inspection of the site is required, regardless of the result of the documentary study. Its purpose is to check that the documentation is consistent with the current state of the site (equipment in operation and work undertaken) and to carry out a complete inventory of the equipment (electrical supply systems, air-conditioning, UPS units, etc.), recording its condition (elements not loaded, dirty filters, etc.).
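The consistency check between documentation and site can be sketched as a simple reconciliation of two inventories. This is only an illustration, assuming each piece of equipment carries a unique identifier; the function name, the dictionary layout and the identifiers used in the usage note are hypothetical.

```python
def reconcile_inventory(documented, observed):
    """Compare the documented inventory with what was seen on site.

    documented: dict mapping equipment id -> description from the records
    observed:   dict mapping equipment id -> condition noted during the visit
                ("ok" when nothing was found)
    """
    documented_ids = set(documented)
    observed_ids = set(observed)
    return {
        # Listed in the records but not found during the inspection.
        "missing_on_site": sorted(documented_ids - observed_ids),
        # Found on site but absent from the records.
        "undocumented": sorted(observed_ids - documented_ids),
        # Documented equipment whose observed condition is not "ok".
        "condition_notes": {
            eid: observed[eid]
            for eid in sorted(documented_ids & observed_ids)
            if observed[eid] != "ok"
        },
    }
```

For example, calling `reconcile_inventory({"CRAC-1": "cooling unit", "PDU-3": "power distribution"}, {"CRAC-1": "dirty filter", "CRAC-2": "ok"})` reports `PDU-3` as missing on site, `CRAC-2` as undocumented, and a condition note for `CRAC-1`.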
In general, this technical inspection should also be an opportunity to understand the addition of equipment over time (why, how, to meet what needs, what maintenance is involved?), to assess the supervision tools and alarms in place, and to concretely assess the corrective and preventive maintenance services provided (variance compared with the contents of maintenance service request tickets).
In conclusion, the information gathered during the audit should give organizations a better understanding of the costs associated with their data center and help them make the right operations and maintenance decisions, in order to improve performance and extend the lifespan of their equipment as much as possible.