At Bastian Solutions, we continually strive to provide superb customer service at the highest standard, with support that is proactive as well as reactive. Transparency is a key component in achieving this objective. With that in mind, we are providing a full root cause analysis for the service disruptions on March 18th, 2020, described below.
Summary:
11:24 PM EST Brendan, Controls Support Engineer III, received a call from Joseph. He reported that the D12 PandA was intermittently either not printing a label at all or printing the wrong label.
Brendan engaged our Software Support Team: Krystal Nolan, Support Team Lead, and De'Andra Guthrie, Support Specialist I.
Working with Vic from SGWS, we attempted an AOR restart. At this point, D11 also quit printing labels.
11:54 PM EST Software Support escalated to Evan Sturgis, Support Analyst III. Software Support worked to confirm whether a test print was received by the printer and printed successfully.
12:23 AM EST Brendan engaged Panther Support and had them join the bridge. Panther initially diagnosed a timing issue. Further troubleshooting eliminated this as a possibility, because the Zebra printer did not flash to indicate that it had data when new cases were inducted.
12:35 AM EST Brendan saw two FIFOs that had data in them and attempted to clear them. This data is used to track all the cases between the case dispenser exit and the PandA induction. We were able to get everything back in sequence, but this took approximately an hour of work.
1:24 AM EST The Software Support Manager was notified of the issue and continued receiving updates throughout the night, making sure we had all the appropriate resources involved.
2:32 AM EST Brendan was able to confirm that data was being passed to AOR through the following tags: Print_W_data[11,1], Print_W_data[11,2], Print_W_data[12,1], and Print_W_data[12,2]. After a high number of reprints, four labels suddenly printed successfully. The labels were for cases that were not at the PandA, due to the issue at 12:35 AM EST.
3:12 AM EST We continued testing but were still seeing the no-print issue. Brendan confirmed that all data was being passed as expected. Evan and Krystal were able to see us sending prints, but nothing was being received by the PandA.
Evan escalated to our Software Development Manager for assistance.
3:21 AM EST Evan was able to identify an issue where some print jobs were taking too long. Evan and De'Andra began working with a Software Development Team Lead. Evan tried to print from Bartender directly but saw errors related to connecting to the Seagull License Server.
3:30 AM EST We had approximately ten successful prints from both printers before the no-print issue returned.
3:45 AM EST Evan and Krystal saw that the AOR machine had run out of disk space due to the increased logging enabled to capture and address the issues from earlier in the week. They deleted the logs and restarted the Bartender services and the AOR workstation.
4:15 AM EST The reprint stations after the PandAs began printing in the wrong format.
5:53 AM EST Evan reached out to Travis Zirnheld, Project Manager Team Lead, to see if he had experienced any of these issues during the implementation. Reprints had generic data with the Bartender watermark due to the licensing issues.
7:00 AM EST We were unable to physically access the Bartender database due to permission issues. Bastian would resume working with Joseph.
11:15 AM EST Joseph called the support hotline the following morning to resume working on the issues. Brandon Smith, Support Specialist I, and Travis Coleman, Software Support Manager, assisted. Ultimately, we were able to access the Bartender files with no permission issues. We continued re-importing the files from the previous night and running tests. We did this on two different occasions.
Root Cause:
- Sequence issues releasing from case dispenser due to data being cleared.
- Bartender was unable to resolve the DNS name of the Seagull license server.
- AOR had increased logging to capture and resolve issues from earlier in the week. Logging was left enabled for development to review the following day, and the AOR machine ran out of disk space.
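
The DNS failure listed above can be checked from the affected machine with a few lines of Python. This is only a minimal sketch using the standard library; the function name can_resolve is our own for illustration and is not part of Bartender or the Seagull licensing tools, and the actual license server host name must be supplied by whoever runs it.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        # getaddrinfo performs the same name lookup a client application would.
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False
```

Calling can_resolve with the configured license server name on the AOR workstation would show whether name resolution, rather than the license service itself, is the failing step.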
Resolution:
- Resolved the license connection issue.
- Deleted AOR logs and readjusted the log level.
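
As an illustration of how verbose logging can be kept from filling a disk, the sketch below uses size-based rotation from Python's standard logging module. It is not the AOR logging configuration; the function name, logger name, and defaults are assumptions chosen to mirror the fifteen-file, 100 MB figures in the preventive actions. Note that it caps the total number of files rather than the number per day, which is a stricter limit.

```python
import logging
from logging.handlers import RotatingFileHandler

MAX_BYTES = 100 * 1024 * 1024  # 100 MB per file (figure taken from the preventive actions)
MAX_FILES = 15                 # active file plus 14 rotated backups


def build_capped_logger(path, name="aor_example", level=logging.INFO,
                        max_bytes=MAX_BYTES, max_files=MAX_FILES):
    """Return a logger whose files on disk never exceed max_files * max_bytes."""
    logger = logging.getLogger(name)  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=max_files - 1
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this approach the oldest file is discarded on rotation, so raising the log level for troubleshooting increases how quickly history is overwritten rather than how much disk is consumed.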
Next Steps and Preventive Actions:
- Introduction of elevated support for all critical issues through 12/31/2020. Details to be sent in a separate email.
- Cap the log files at a maximum of fifteen files per day at 100 MB each.
- Chris Bratten and Linda Grady will work with SGWS on an alternative solution to print and manually apply labels.
- We suggest switching all license server references from DNS names to IP addresses.
- Implement a proactive monitoring solution to send alerts on low disk space.
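
To make the disk space alert concrete, a check of the following kind could run on a schedule. This is a hedged sketch only: the function names and the 10% default threshold are assumptions for illustration, not the monitoring solution that will be deployed.

```python
import shutil
from typing import Optional


def free_fraction(path: str) -> float:
    """Return the fraction of the volume containing path that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # treat an unreadable or empty volume as having no free space
    return usage.free / usage.total


def low_disk_alert(path: str, min_free_fraction: float = 0.10) -> Optional[str]:
    """Return an alert message when free space is below the threshold, else None."""
    frac = free_fraction(path)
    if frac < min_free_fraction:
        return (
            f"LOW DISK SPACE on {path}: {frac:.1%} free "
            f"(threshold {min_free_fraction:.0%})"
        )
    return None
```

A scheduled task could call low_disk_alert on the log volume and forward any non-empty message by email or to a paging system.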
Our commitment to you:
Bastian Solutions understands the impact this disruption had on your organization's operations. Our primary objective is to provide our clients with superb customer service, and we assure you we are taking the measures required to prevent a recurrence.