This morning we experienced a substantial disruption of service between 8:35 am and 9:10 am. The immediate trigger was a lines update run by one of our larger customers, an operation that would normally be of no consequence. The underlying cause was a recent temporary change to a message queue: an engineer had disabled the limit that normally restricts this particular process to a single instance, which inadvertently allowed several thousand individual processes to be spawned against a core database. As a result, the server ran out of resources and failed. The engineer who resolved the incident cannot recall why the job limiting the queue to one process was not re-enabled after the temporary change.
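For readers interested in the mechanism, the safeguard in question is a concurrency cap on a queue consumer. The following is a minimal, hypothetical sketch (the names `process_update` and `MAX_CONCURRENT` are illustrative, not our actual code) showing how a semaphore keeps such a process limited to one instance at a time:

```python
import threading

# Hypothetical illustration of the disabled safeguard: a concurrency
# cap so that only one instance of the queue consumer runs at a time.
MAX_CONCURRENT = 1
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

peak = 0      # highest number of simultaneously active workers observed
active = 0
_lock = threading.Lock()

def process_update(item):
    global peak, active
    # Without this guard, thousands of workers could hit the DB at once.
    with _slots:
        with _lock:
            active += 1
            peak = max(peak, active)
        # ... the actual database work for `item` would happen here ...
        with _lock:
            active -= 1

threads = [threading.Thread(target=process_update, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # 1 — the cap held, despite 50 queued requests
```

With the cap removed, every queued request spawns its own worker against the database, which is the resource exhaustion seen in this incident.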
We apologise for any inconvenience caused by this disruption to service.