ACM - Network Outage - INC113

Incident Report for Alene Candles IT

Postmortem

Symptoms:

Networked devices at ACM could not connect to the network because their IP address leases were not being renewed.

Observations:

  • 2020-07-15:

    • Initial incident was created and troubleshooting began
    • IT discovered that DHCP leases were not being renewed for local devices at ACM
    • IT reconfigured the local Silver-Peak appliance to reset the DHCP leases
    • Silver-Peak was restarted and devices began to come back online

Resolution:

IT reset the IP helper addresses on the ACM Silver-Peak controller, which point devices to the DHCP server. Technical steps: https://alene.itglue.com/3221095/docs/5451199

Root Cause Analysis:

A flood of DHCP requests, likely triggered by a brief ISP network brownout around 10:30 PM EDT the night before, overwhelmed the controller that handles routing. While the controller was bottlenecked, no addresses could be released to devices. IT had to reset all the IP helper addresses on the controller, and restart it, before DHCP traffic began flowing again.

Preventative measures are being handled by an outside project that will reduce the DHCP traffic going to the Silver-Peak.
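
As an illustration of the flood condition described in the root cause, the following is a minimal sketch of a sliding-window counter that flags when DHCP requests arrive faster than a chosen rate. It is not part of the Silver-Peak configuration or the fix applied in this incident. The class name, threshold, and window size are hypothetical, and the timestamps are assumed to come from whatever log or packet source is available.

```python
from collections import deque


class DhcpFloodDetector:
    """Hypothetical helper: flags a flood when more than `threshold`
    DHCP requests are seen within the last `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # request times still inside the window

    def record(self, timestamp: float) -> bool:
        """Record one request time (in seconds) and return True if the
        number of requests in the current window exceeds the threshold."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop requests that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


if __name__ == "__main__":
    # Assumed values for illustration only: more than 3 requests in 10 seconds.
    detector = DhcpFloodDetector(threshold=3, window_seconds=10)
    for t in (0, 1, 2, 3, 20):
        print(t, detector.record(t))
```

A counter like this only reports that a burst is happening. Reducing the DHCP traffic that reaches the controller remains the job of the preventative project mentioned above.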

Posted Jul 15, 2020 - 11:14 EDT

Resolved

This incident has been resolved. Please see Postmortem for details on the incident.
Posted Jul 15, 2020 - 10:52 EDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 15, 2020 - 09:32 EDT

Identified

The issue has been identified and a fix is being implemented.
Posted Jul 15, 2020 - 09:13 EDT

Investigating

We are currently investigating this issue.
Posted Jul 15, 2020 - 08:40 EDT

This incident affected: ERP System (IFS Cloud Access) and Network Systems (ACM (8860) Plant).