Knowledgebase Article Number
Published January 9th, 2020, at approximately 11 p.m. EST.
At Bastian Solutions, we continually strive to provide superb customer service at the highest standard, with support that is proactive as well as reactive. Transparency is a key component in achieving this objective. With that in mind, we are providing a full root cause analysis for the disruptions in service on January 8th, 2019, described below with approximate time stamps.
11:40 PM EST: De'Andra Guthrie, Support Specialist I, received a call from Brendan, one of our Controls Engineers, to bring Software up to speed on the issue and to request that we join a bridge. During this period, De'Andra reviewed logs and worked with our controls team to identify the issue.
12:24 AM EST: Our Support Specialist I began the escalation procedure to our Lead Support Specialist II, Krystal Nolan, to help investigate why imports were not processing. We restarted the import services, then the app server, and then AOR and the AOR PC.
12:47 AM EST: Support Specialist I transitioned responsibility to the Lead Support Specialist II. The SGWS IT team believed there was an issue with the database, as they saw a critical error. While they investigated the database issue, our Lead Support Specialist II found an error in AOR: AOR was looking for a wave and subwave that did not exist, specifically Wave 2 and Subwave 90.
1:20 AM EST: SGWS IT team confirmed the database was operating with no issues. Lead Support Specialist II confirmed there was no wave 2 in the common tables in the database.
1:34 AM EST: At this point, our Lead Support Specialist II began escalation to our Support Analyst I, Jason Whittle. Our Support Analyst I assisted the Specialist with additional troubleshooting steps. It was requested that we reach out to the Project Manager.
2:00 AM EST: Software Support Manager, Travis Coleman, was notified of the issue and reached out to our Lead Support Specialist II and Support Analyst I to get details on what error we were seeing and what we had done to investigate the issue. Our Software Support Manager communicated with our Unified Support Manager, David Strawser.
2:40 AM EST: Software Support Manager, Travis Coleman, joined the bridge and took over as primary in troubleshooting the issue. Our Software Support Manager reviewed internal documentation to determine where that wave/subwave could be found, since the wave_header and sub_wave_header contained no waves. We confirmed that the Archive had run successfully the night before.
3:10 AM EST: Joe from SGWS let us know that a D11 and D12 report for the active wave was showing wave 2. At this point, our Software Support Manager started using the report to determine where the value was being queried.
3:25 AM EST: While our Software Support Manager continued investigating, we had our Support Analyst I engage a development resource.
3:35 AM EST: IT from SGWS let us know that the D11 and D12 report for the active wave was showing wave 2. The developer joined to find the issue. Our Software Support Manager identified that the current wave and subwave were being determined when the application queried the automation_journal entries for the newest record with activity_type equal to 60058.
3:50 AM EST: Our Software Support Manager consulted with the Developer. It was determined that these entries should not exist and should have been archived when the Wave from the night before was archived. Bastian deleted these entries from the database and had SGWS start the line. This reset the current wave/subwave to 0/1. The line was releasing, but started on subwave 2. Bastian attempted to reset the sequence back to subwave 1, and the line started.
4:03 AM: The line was releasing, but the D12 Panda was now going offline and not printing labels.
4:15 AM: A replacement printer was tried, but its IP was not set up correctly, so the old printer was reinstalled. It could be pinged, but it dropped connection intermittently. We confirmed it was not a controls or software issue; it would be either an internal hardware issue or a network issue. Bastian worked with SGWS to troubleshoot the printer problems.
5:20 AM: They mentioned that route 78 was missing cases. We found that we had imported all the files that were sent, so the missing case files were never sent to import. Based on our internal investigation, all routes and orders that were sent in the file appear to have imported successfully into Exacta.
7:00 AM: We stayed on the bridge to ensure there were no further issues, even though the lanes had been releasing since 4:03 AM.
AOR was looking for the next wave/subwave greater than wave 2 and subwave 90. This was due to automation journal entries that should have been archived the night before. We deleted the records and started the line, and cases started dispensing.
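The lookup behind this root cause can be illustrated with a minimal sketch. Only the automation_journal name and the activity_type value 60058 come from this analysis; the column names (id, wave, sub_wave), the use of the highest id as "newest", and the SQLite in-memory database are assumptions made purely for illustration and do not represent the actual application code or schema.

```python
import sqlite3

JOURNAL_ACTIVITY_TYPE = 60058  # activity_type value named in this analysis


def build_demo_db():
    """Create an in-memory database with a hypothetical automation_journal layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE automation_journal ("
        "id INTEGER PRIMARY KEY, activity_type INTEGER, "
        "wave INTEGER, sub_wave INTEGER)"
    )
    # Two leftover 60058 entries plus one unrelated entry (all values invented).
    conn.executemany(
        "INSERT INTO automation_journal (activity_type, wave, sub_wave) "
        "VALUES (?, ?, ?)",
        [(60058, 2, 89), (60058, 2, 90), (60001, 3, 1)],
    )
    conn.commit()
    return conn


def current_wave(conn):
    """Return (wave, sub_wave) from the newest 60058 entry, or None if there is none."""
    # Assumption: "newest" means the highest id.
    return conn.execute(
        "SELECT wave, sub_wave FROM automation_journal "
        "WHERE activity_type = ? ORDER BY id DESC LIMIT 1",
        (JOURNAL_ACTIVITY_TYPE,),
    ).fetchone()


if __name__ == "__main__":
    print(current_wave(build_demo_db()))
```

In this sketch, leftover entries keep reporting wave 2 and subwave 90 even when no such wave exists elsewhere, which mirrors the behavior observed during the incident.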
We deleted approximately 60 records with activity type equal to 60058. We then started the line, and cases started dispensing.
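As a rough sketch of how leftover records of this kind could be identified before removal, the example below selects journal entries with activity type 60058 whose wave has no matching row in wave_header, then deletes them. The table layouts, column names, and the join condition are hypothetical, and SQLite is used only so the example is self-contained; the actual cleanup during the incident was performed directly against the production database.

```python
import sqlite3


def build_journal_db():
    """Create hypothetical wave_header and automation_journal tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wave_header (wave INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE automation_journal ("
        "id INTEGER PRIMARY KEY, activity_type INTEGER, wave INTEGER)"
    )
    # wave_header is left empty, as it was during the incident.
    conn.executemany(
        "INSERT INTO automation_journal (activity_type, wave) VALUES (?, ?)",
        [(60058, 2), (60058, 2), (60058, 2), (60001, 2)],
    )
    conn.commit()
    return conn


def stale_entry_ids(conn):
    """Return ids of 60058 entries whose wave is absent from wave_header."""
    rows = conn.execute(
        "SELECT j.id FROM automation_journal AS j "
        "LEFT JOIN wave_header AS w ON w.wave = j.wave "
        "WHERE j.activity_type = 60058 AND w.wave IS NULL "
        "ORDER BY j.id"
    ).fetchall()
    return [row[0] for row in rows]


def delete_entries(conn, ids):
    """Delete the given journal entries in one transaction; return how many were requested."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM automation_journal WHERE id = ?",
            [(entry_id,) for entry_id in ids],
        )
    return len(ids)


if __name__ == "__main__":
    db = build_journal_db()
    ids = stale_entry_ids(db)
    print("stale entries:", ids)
    print("deleted:", delete_entries(db, ids))
```

Listing the candidate ids first, and deleting only those ids, keeps the removal reviewable and leaves unrelated journal entries untouched.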
Next Steps and Preventive Actions:
Our commitment to you:
Bastian Solutions understands the impact this disruption had on your organization's operations. Our primary objective is to provide our clients with superb customer service, and we assure you that we are taking the measures required to prevent a recurrence.