Why our server went down on Monday, 12th December 2011


As you are probably aware by now, we were the subject of a distributed denial of service (DDoS) attack which commenced on the evening of Monday, 12th December 2011, targeted at the IP address of our server. This article explains exactly what happened, sets out a timeline of events, and describes what we are doing to prevent it happening again. We also hope it answers some questions you might have about the process we went through to recover the sites.

What happened?

We run Wormly Server Monitoring on all our servers, which immediately alerts us by email in the event of any outage - whether this is because the server is offline, an error has been made with a site configuration, etc. - with various escalation routines if the problem does not resolve within certain timeframes.
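For readers curious how an escalation routine of this kind might work, here is a minimal sketch in Python. It is not Wormly's implementation; the thresholds, the contact actions and the name `actions_due` are all hypothetical and chosen purely for illustration.

```python
from datetime import timedelta

# Hypothetical escalation ladder: (time offline, action to take).
# These thresholds and actions are illustrative assumptions only.
ESCALATION_STEPS = [
    (timedelta(minutes=5), "email on-call engineer"),
    (timedelta(minutes=30), "SMS on-call engineer"),
    (timedelta(hours=2), "phone company director"),
]


def actions_due(time_offline: timedelta) -> list[str]:
    """Return every escalation action whose threshold has been reached."""
    return [
        action
        for threshold, action in ESCALATION_STEPS
        if time_offline >= threshold
    ]


if __name__ == "__main__":
    for minutes in (3, 5, 45, 180):
        print(minutes, "min offline ->", actions_due(timedelta(minutes=minutes)))
```

The idea is simply that the longer a problem stays unresolved, the more (and louder) alerts are triggered.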

At 2015hrs on Monday, 12th December 2011 we received notification that sites on our main Managed Virtual Server at Rochen Hosting had been offline for 5 minutes. This happens occasionally if services are being restarted or other maintenance tasks are being carried out, so while we weren't overly worried, we logged in to check what was causing the error message.

At 2015hrs we submitted a ticket to Rochen, asking them to investigate the outage, as we couldn't see any apparent cause.

At 2054hrs we received notification from Rochen as follows:

"We are currently experiencing a network-related issue on the UK MVS104 hardware node. An on-site technician is already investigating as we believe a physical network port may have died. Your server itself is still online, but no public (Internet) connectivity is available. We will provide further information as it is available, thank you."

Assured that Rochen were investigating what appeared to be a problem with their hardware, we sent out a tweet, Facebook message and email to advise clients that this was the reason our server was unavailable, and left Rochen to get on with fixing the problem - generally even the more serious issues are resolved in an hour or two.

At 0526hrs on Tuesday, 13th December 2011 we received notification from Rochen that our server's IP address was the target of a distributed denial of service attack which had caused serious degradation to their network and inconvenienced many hundreds of clients, and we were asked to identify any sites which might have been the cause.

As our clients are no doubt aware, we never host any websites which might be deemed to be controversial or the target for an attack of this scale.

At 0739hrs we responded to this effect, explaining that we were not aware of any site which would be the target of such an attack, and asking for further detailed information so that we could analyse the data to see both where the attack came from and where it was aimed. Unfortunately the only person who could deal with this response was their Chief System Engineer, who had been handling the investigation the previous evening. We did not receive a reply until 1042hrs, after we posted a further ticket at 1010hrs explaining that we were still waiting for a response and our server remained offline.

At 1158hrs we were advised by the Chief System Engineer that the attack consisted of UDP packets targeted at our IP address, rather than at one specific website we host. This meant that Rochen were not able to give us any further information on the attack, and despite requests for packet logs we were not able to look any deeper into the attack itself.

At this point we were given an ultimatum by Rochen:

  1. You can look for a 3rd party DDoS mitigation provider to cover each site on your shared IP. If you go this route we can assign a new IP for your shared sites.
  2. We can assist in providing full CPanel backups of your accounts for a migration to a new provider. 

Unfortunately, option 1 would incur very significant expense: an IP address for every site we host, plus around £100 per month PER SITE to run a DDoS mitigation system, based on some quotes we obtained. This is something that we could not afford to absorb, and would certainly not be willing to pass on to our clients.

Therefore, we were faced with little option but to pursue option 2 and move to another provider - urgently, as our sites had by this point in time been offline for over 15 hours.

At 1244hrs we placed a call with Heart Internet to commission a dedicated server which we planned to move all our accounts over to urgently. This process generally takes between 2 and 4 hours to complete, and by 1400hrs we had a bare-metal dedicated server available.

At 1244hrs we also requested full cPanel backups of all our hosting accounts at Rochen, which were commenced at 1338hrs (we were not able to access WHM at this point due to the measures put in place to stop the DDoS attack affecting other clients).

Between 1400hrs and 1800hrs our staff were drafted in to assist with building the new server, restoring, configuring and testing the accounts and sites as we rolled them out in the new environment.

By 1510hrs the backups were available to download from Rochen, and by 1800hrs the new server was available to begin restoring sites, at which point we set about moving them from the backup location to the new dedicated server.

Between 1800hrs and 0200hrs (now Wednesday, 14th December 2011) our staff meticulously uploaded, restored and checked all the sites which we needed to restore, ensuring that configurations were correct and testing basic features.

By 0500hrs the DNS entries started to reflect the new IP addresses for the nameservers, a process which can take anywhere up to 48 hours (but in reality is generally no more than 4-6 hours).
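As an illustration of how one might check whether a DNS change has reached a given machine, here is a minimal Python sketch. The function name `hosts_not_yet_updated`, the hostnames and the IP addresses are hypothetical placeholders (the addresses come from ranges reserved for documentation). It only reports what the local resolver sees; other networks may still have the old records cached.

```python
import socket
from typing import Callable, Iterable


def hosts_not_yet_updated(
    hostnames: Iterable[str],
    new_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> list[str]:
    """Return the hostnames that do not yet resolve to new_ip from this machine."""
    pending = []
    for name in hostnames:
        try:
            current_ip = resolve(name)
        except OSError:
            # The lookup failed entirely; treat it as not yet updated.
            current_ip = None
        if current_ip != new_ip:
            pending.append(name)
    return pending


if __name__ == "__main__":
    # Stand-in lookup table so the demo runs without touching the network.
    fake_records = {"site-a.example": "192.0.2.10", "site-b.example": "198.51.100.7"}
    still_old = hosts_not_yet_updated(
        ["site-a.example", "site-b.example"],
        new_ip="192.0.2.10",
        resolve=fake_records.__getitem__,
    )
    print("Still pointing elsewhere:", still_old)
```

Leaving out the `resolve` argument makes the function use the system resolver via `socket.gethostbyname`, which is what you would want for a real check.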

Why did this happen?

In all honesty, we genuinely do not know, and we do not feel it helpful to 'point the finger' without the appropriate evidence. We cannot tell (due to the nature of the attack and the lack of logs from Rochen) whether the DDoS attack was aimed at us as a company (hopefully not!), at one of our clients (again, hopefully not!) or as part of a wider campaign against Rochen Hosting themselves.

It is unlikely we will ever know who was behind the attack or their motives, but we are continuing to monitor the traffic on our new dedicated server in an attempt to pick up any ongoing attempts.

As for why the DDoS attack had such a severe effect, not only on our own managed virtual server but, by the sounds of it, on a large part of Rochen's infrastructure (including the servers which host their reseller packages, amongst others), that is something Rochen will need to answer. We haven't really had any explanation for this, and nor have other clients.

The harsh reality of the digital world in which we live is that instructions for taking control of botnets and targeting them at an IP address can be found on freely available websites, and even in books! Somebody with enough time, interest (and in some cases, money) could easily commandeer thousands of infected machines around the world and aim them at an IP address (all the more reason to move away from Microsoft and towards more secure operating systems, in our opinion!).

What are you doing to stop it happening again?

We previously had a Managed Virtual Server with Rochen, which means that while we have control over who is hosted on our server, we do not have any control over the main server configuration or any wider settings such as firewall rules.  This is why we were not able to intervene and stop the problem sooner but had to wait for Rochen to respond.

As we have the skills in-house through Ben Tasker, our IT Manager and Linux Specialist, and we are building our client base progressively, we took the decision to move to a dedicated server, which gives us the option to set the server up how we want it. It also means we can be more flexible with regards to altering settings for our clients, and can apply tougher firewall rules.

We can't help but notice that our hosting packages on Rochen have been growing more unstable over the past few months, with frequent outages of lesser duration and poor responses to support tickets at times. We have also noticed a growing number of people voicing their own concerns via social media, which is a worrying trend given that a strong, reliable hosting provider is critical to our business success.

Therefore we decided to evaluate our options and consider what other hosting providers could deliver a better service, while keeping control of our costs. We chose Heart Internet based on a few things - recommendations from other Joomla! professionals, clear and transparent green credentials, a good customer support reputation, and overwhelmingly, a high level of technical knowledge even from their sales staff, with the first sales contact we spoke to being able to talk about ddos mitigation and firewall rules, and giving us suggestions for how we could identify the problem and resolve it.

We are also addressing the issue of the time spent waiting for responses from the technical team at Rochen by moving to a company with a more robust support team - we will be monitoring this and will ensure that we raise any response times which are outside the SLA we expect.

As a further backup, we are in the process of creating a 'failover' server which will mirror our dedicated server, so that we have a fully functioning backup we can quickly switch to without the need to restore files, which takes a long time. We estimate this to be in place by the New Year at the latest.

Why did it take so long to resolve?

A large chunk of time was taken in identifying what the actual problem was and what needed to be done to resolve it - between 2015hrs on Monday, 12th December and 1158hrs on Tuesday, 13th December.

Once we were made aware of what the problem was, and the lack of any viable option for us to remain with Rochen, we acted swiftly to bring in staff and ensure that the migration to the new server was completed quickly and efficiently. Another large chunk of time was spent transferring the backups from Rochen to the new dedicated server, setting up the server, checking and testing all the sites, and ensuring that everything had transferred to the new server.

We had a couple of issues with DNS relating to our domain-specific nameservers, but these were resolved in the early hours of Wednesday, 14th December 2011.

Here's an overview:

  • Problem identified & ticket submitted: 2015hrs (Monday)
  • Response from Rochen: 2054hrs (Monday)
  • Notification of DDoS on our IP: 0526hrs (Tuesday)
  • Information on DDoS attack & required action provided: 1158hrs (Tuesday)
  • Backups requested from Rochen: 1244hrs (Tuesday)
  • New server commissioned: 1300hrs (Tuesday)
  • Backups commenced by Rochen: 1338hrs (Tuesday)
  • Staff available to assist with migration: 1400hrs (Tuesday)
  • Server ready for deployment: 1800hrs (Tuesday)
  • Commence retrieval of backups: 1800hrs (Tuesday)
  • Complete restoration of sites: 0200hrs (Wednesday)
  • DNS update largely complete: 0500hrs (Wednesday)
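Because the timeline spans three calendar days, the elapsed times are easier to see when worked out explicitly. The short Python sketch below does this for a few of the milestones, using the times given in the account above and assuming the dates 12th to 14th December 2011; the helper name `hours_minutes` is our own.

```python
from datetime import datetime

# Milestones from the account above (dates assumed to be 12-14 December 2011).
events = [
    ("Outage detected, ticket submitted", datetime(2011, 12, 12, 20, 15)),
    ("Notified of DDoS on our IP", datetime(2011, 12, 13, 5, 26)),
    ("Attack details and options received", datetime(2011, 12, 13, 11, 58)),
    ("Restoration of sites complete", datetime(2011, 12, 14, 2, 0)),
    ("DNS update largely complete", datetime(2011, 12, 14, 5, 0)),
]


def hours_minutes(start: datetime, end: datetime) -> tuple[int, int]:
    """Return the elapsed time between start and end as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


detected = events[0][1]
for label, when in events[1:]:
    hours, minutes = hours_minutes(detected, when)
    print(f"{label}: {hours}h {minutes:02d}m after detection")
```

On these figures, 15 hours 43 minutes passed before we knew what action was required, and the DNS update was largely complete 32 hours 45 minutes after the outage was first detected.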

We think we worked quite efficiently during this process - and of course it is important to balance getting work done quickly with ensuring that the server is set up safely and it is secure. We will be applying several other security hardening modifications in the next few days which will improve this further. There is one bug which we have identified and are working on, which is causing timeout errors when uploading files in Joomla! - this is something which Ben will be investigating (after he has had a few hours of sleep!).

Were my site/s at risk?

A denial of service attack is less about compromising a website or service, and more about making it unavailable for people to use. So, in this case, no - your websites were not at risk. However, we cannot overlook the fact that this did cause our server to be unavailable for an extended period of time, which puts your business at risk. This is why we are beefing up our disaster recovery plans, and raising our SLA requirements from our service providers to ensure that issues are dealt with more quickly.

How did you keep your customers informed?

This is the first major incident we have had since we put in place a system of communication to inform our clients about issues which affect their websites or services which we provide.

We have implemented the following systems to communicate with our clients during emergencies:

  • Email newsletter for clients (in addition to our general newsletter)
  • Twitter account (@ViryaTech)
  • Facebook page (www.facebook.com/ViryaTechnologies)
  • Support portal and support email

At the first sign of any outage which may adversely affect our clients' sites we send out a message on Twitter, Facebook and our email newsletter explaining what the problem is and what is being done to either work around or resolve it (in this case, the first message was sent at 2117hrs).

We then use these methods to stay in contact with our clients as the situation develops and eventually is resolved.

During this outage we sent a total of 8 messages on Facebook, 11 tweets, and two email newsletters (we realised that most of our clients use email on our servers, so this is not the most effective way of communicating during a server outage). We're keen not to spam you to death in what is already a stressful situation, so we hope that this was a balanced approach to keeping you up to date.

Of course, we are always keen to improve on our communication when problems arise, so if you have any suggestions for how we could improve this process (or indeed comments on whether you found it helpful) then please do let us know.

I hope that you have found this report helpful, open, and transparent and that it explains all the issues surrounding the past 48 hours.

We're just watching the dust settle and maybe catching a few zeds before getting back to business as usual. Please accept our apologies for any work which has been delayed as a result of our response to this issue and any ongoing catch-up in the aftermath; we are working as hard as we can to get back to normal operations.

Ruth Cheesley


Ruth is the owner and Director of Virya Technologies, having founded the company in 2002 as Essex Virus Removals and later rebranded to Suffolk Computer Services. She is primarily involved with managing the website design team and liaising with our clients from across the world.

Website: www.viryatechnologies.com

3 comments

  • Ruth Cheesley, Wednesday, 14 December 2011 20:20

    Andy, I can't comment on dedicated servers at Rochen, as I only had a managed virtual server and their dedicated servers are far in excess of what we could afford without causing a serious hike in our prices.

    Louisa - thank you for your comments and again apologies for any inconvenience over the period where we were transitioning.

    Ruth

  • Andy, Wednesday, 14 December 2011 14:41

    & I was thinking of changing my dedicated server with heart internet over to Rochen ?

    Doh!

  • Louisa Fox, Wednesday, 14 December 2011 13:50

    Well done. I think you have coped effectively and as efficiently as possible given the lack of information and circumstances outside your control.



