Running IT Infrastructure

What is IT Infrastructure?

This question can be approached from a number of angles, and whilst it is useful to have a standard definition, I think it is best answered from the ground up within each particular organisation. The definition may be shaped by how responsibility and function are segmented, and by the degree of teamwork, trust and comfort across technology disciplines.

An important pillar supporting good engineering and infrastructure management strategy is well expressed as the “3Ms”, i.e. “Monitor, Measure, Manage”.

This approach enables better issue reporting, which in turn can feed into capacity planning and staff resourcing. It also allows us to define quality metrics and build a strategy for infrastructure improvement.

Monitoring (& Alerting)

Applying a coherent approach to monitoring availability (response time, load, latency) and defining your alert conditions is essential.

Protocols such as SNMP and WMI, and even a simple ICMP ping, can serve well in the process of watching over your platforms. These protocols are also widely supported by open source monitoring software.
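
As a rough illustration of a probe paired with an alert condition, the sketch below times a TCP connection and maps the result onto an alert state. It swaps in a plain TCP connect rather than ICMP, SNMP or WMI, purely because it needs no extra libraries or privileges. The function names, the state names and the 0.5 second warning threshold are my own assumptions, not taken from any particular monitoring tool.

```python
import socket
import time
from typing import Optional


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connection failed."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def classify(response_time: Optional[float], warn_after: float = 0.5) -> str:
    """Map a probe result onto a simple alert state (names are illustrative)."""
    if response_time is None:
        return "CRITICAL"  # service unreachable
    if response_time > warn_after:
        return "WARNING"   # reachable, but slower than the assumed threshold
    return "OK"
```

A scheduler would call probe_tcp at a fixed interval and raise an alert when classify returns anything other than "OK". In practice you would probably also want to require several consecutive failures before alerting, to avoid paging on a single dropped probe.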

Centralised collection of logging messages to enable analysis and reporting is a good addition to any DevOps or Sysadmin toolbox. For centralising log analysis, possibilities include rsyslog, Logstash, Fluentd and Flume. These and other tools continue to evolve, but they are a starting point for building your own logging facility.

It is very useful to assess the volume of log messages heading into any central platform, because handling large volumes of log messages will certainly put IO and processing demands on your monitoring server. An approach which batches or caches messages, so that access to the log store can be asynchronous, could be useful. The size and complexity of your requirements will steer the necessary architectural choices.
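
To make the asynchronous idea concrete, here is a minimal sketch using the queue-based handlers in Python's standard logging module rather than any of the tools named above. The application thread only places records on an in-memory queue, and a background thread drains the queue into the real (and possibly slow) log store. The helper name build_async_logger is hypothetical, and the target handler stands in for whatever store you actually use.

```python
import logging
import logging.handlers
import queue


def build_async_logger(name: str, target: logging.Handler):
    """Return (logger, listener); records are queued, then written by a background thread."""
    log_queue = queue.Queue(-1)  # -1 means unbounded; a bounded queue would apply back-pressure
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # The caller's thread only enqueues the record, so it never waits on the log store.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener thread takes records off the queue and hands them to the target handler.
    listener = logging.handlers.QueueListener(log_queue, target)
    listener.start()
    return logger, listener


if __name__ == "__main__":
    logger, listener = build_async_logger("app", logging.StreamHandler())
    logger.info("service started")
    listener.stop()  # processes anything still queued, then stops the thread
```

The same shape applies at larger scale: a local buffer or message queue sits between the producers and the central store, so a burst of messages fills the buffer instead of stalling the systems that generate them.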

From a high level we should expect to monitor in the following broad classes:

Internally Hosted services

  • Power and Physical environment for key systems
  • Production, staging, and development environments
  • Servers (OS + DB)
  • Systems (Application)
  • Tools (CI, Build, QA/Test)

External and Cloud based services

  • Production, staging, and development environments
  • Servers (OS + DB)
  • Systems (Application)
  • Tools (CI, Build, QA/Test)

Data Network (Internet links and internal)

Links, interfaces, memory and load on:

  • Routers
  • Switches
  • Firewalls
  • VPN Access Devices

Voice Network

  • Routers
  • PSTN Bearers
  • SIP Trunks
  • Voice Gateways
  • Call Traffic Volumes

Measuring – Issue Reporting, Quality Metrics and Capacity Planning

Analysis of incident reports, investigation into QA/test failures and performance issues should provide extra insight into areas which may need new or renewed infrastructure focus.

We are especially interested in patterns and clusters of reports which could suggest underlying problems not apparent from a monitoring perspective. Such a scenario could occur when the duration of an outage is shorter than the monitor probe interval, as is typical when a network is impacted by a microburst while the monitoring system samples only once a minute or less often.
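
The effect of the probe interval can be shown with a small calculation. The sketch below assumes probes fire at fixed, evenly spaced instants and that a probe only notices an outage if it fires while the outage is in progress. The function names and the example figures are illustrative only.

```python
import math


def outage_detected(outage_start: float, outage_duration: float,
                    probe_interval: float, probe_offset: float = 0.0) -> bool:
    """True if a probe fires during the outage.

    Probes are assumed to fire at probe_offset + k * probe_interval for k = 0, 1, 2, ...
    All times are in seconds.
    """
    outage_end = outage_start + outage_duration
    # Index of the first probe that fires at or after the start of the outage.
    k = max(0, math.ceil((outage_start - probe_offset) / probe_interval))
    first_probe = probe_offset + k * probe_interval
    return first_probe < outage_end


def detection_probability(outage_duration: float, probe_interval: float) -> float:
    """Chance of catching an outage that starts at a uniformly random time."""
    return min(1.0, outage_duration / probe_interval)
```

With a 60 second probe interval, a 200 millisecond outage starting at t = 61 s falls between the probes at 60 s and 120 s and goes unseen, and on average only about one such event in 300 would coincide with a probe. This is why clusters of user reports can reveal problems that the monitoring graphs never show.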

Quality metrics can be defined which allow us to measure our platforms against an SLA or our own internal targets. Without the “3Ms” it will be extremely hard to measure the progress of improvements, or to credibly state that nn.n% availability has been reached.
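
As a simple worked example of the nn.n% figure, availability over a reporting period can be derived from the recorded outage durations. This is a sketch under the assumption that outages do not overlap and that all downtime counts equally. Real SLAs often exclude planned maintenance, which is not modelled here, and the function names are my own.

```python
from typing import Iterable


def availability_percent(period_seconds: float, outage_seconds: Iterable[float]) -> float:
    """Percentage of the period during which the service was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    downtime = sum(outage_seconds)
    return 100.0 * (period_seconds - downtime) / period_seconds


def allowed_downtime_seconds(period_seconds: float, target_percent: float) -> float:
    """Downtime budget implied by an availability target over the period."""
    return period_seconds * (100.0 - target_percent) / 100.0
```

For a 30 day period (2,592,000 seconds), outages totalling 1,296 seconds give 99.95% availability, while a 99.9% target allows roughly 43 minutes of downtime over the same period.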

Capacity planning, as part of infrastructure management and deployment, improves once we understand system and network use, because logging, alerting and monitoring show us which areas of infrastructure are under unplanned load.
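
One very simple way to turn monitoring data into a capacity estimate is to fit a straight line through recent utilisation samples and see when it crosses a threshold. The sketch below does this with an ordinary least-squares fit. It assumes growth is roughly linear, which is often not true, so treat the result as a prompt for investigation rather than a forecast. The function name and sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def days_until_threshold(samples: Sequence[Tuple[float, float]],
                         threshold: float) -> Optional[float]:
    """Estimate days from the last sample until usage reaches the threshold.

    samples holds (day, usage) pairs in time order. Returns None when the
    trend is flat or falling, or when there is too little data to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    slope = sxy / sxx
    if slope <= 0:
        return None  # usage is not growing
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - samples[-1][0])
```

For example, disk usage samples of 50%, 52%, 54% and 56% on days 0 to 3 grow by two points a day, so an 80% threshold would be reached about 12 days after the last sample.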

Managing – Typical Components in Infrastructure

Below is a sample list of components frequently seen in the infrastructure definition. Depending on the size of both the organisation and the installed technology, these can all fall to one team to manage, or be handled in a more traditional fashion, i.e. by separate Network, Server and Desktop teams.

New approaches to improving the management of infrastructure technology have led to practices such as DevOps, configuration management and automation. These approaches effectively target the efficiency of managing the components.

Data Networks

  • Switches
  • Routers
  • Firewalls
  • Internet Access
  • LAN, WAN, MAN

Network Services

  • DNS Services
  • Proxies
  • Security (IDS, Threat Management, Anti Spam)

Voice Network

  • VoIP
  • PSTN Interfaces
  • SIP Trunks
  • Call routing

Hosted Environments

  • Internal
  • External

Server (and Desktop) OS

Arguably the desktop parts should not be seen as infrastructure, but in some organisations they are lumped in with the rest of the infrastructure.

  • Build Standards
  • Patching and upgrades
  • Security
  • Build Automation

Application & Development

  • Dev and QA/Test environments
  • Build, Integration and QA/Test
  • Support of Tools for processes
  • Bug trackers
  • Work scheduler
  • Ticket Trackers
  • Product Systems Architecture
  • DBs
  • Component services

Summary

There are differing views on what constitutes infrastructure, shaped by preference, expertise, organisational structure, politics and policy. However, it is usually best to apply a common sense approach. This guide provides one possible approach and definition.

As with most frameworks, adapt it to fit your requirements and take the bits which are applicable, but remember to think about what you drop or add and assess the possible outcome of doing so.
