At Bastian Solutions, we continually strive to provide superb customer service at the highest standard, offering not only reactive but proactive support. Transparency is a key component in achieving this objective, and with that in mind, we are providing a full root cause analysis for the disruptions in service on March 14 and 16, 2020, described below.
Summary - batch picking issue:
Saturday, March 14
10:07 AM EST Adore Me called, stating they were having trouble doing a batch pick on port 8: nothing was happening when they scanned.
10:19 AM EST The on-call technician (OCT) connected to the site and then to port 8, started pulling logs, and began investigating.
10:35 AM EST OCT rebooted port 8 and then called the site back.
10:44 AM EST After the call dropped, an email was sent asking the site to verify whether they were still having issues on the port.
10:58 AM EST The site replied with a screenshot of the port, with the Touch screen showing blank pick information.
11:11 AM EST - 11:38 AM EST OCT looked for documentation on the process and did further investigation. Around 11:38 AM, the OCT noticed that picks had occurred for this order within the previous 10 minutes.
11:40 AM EST OCT attempted to call the site back and left a voicemail stating that we had seen pick confirm messages for that port, and asking them to call back if there was still any trouble.
11:48 AM EST The site called back and confirmed that the issue was now resolved.
The site also reported a new issue on port 3: when trying to batch pick tote 0187, the screen showed blank and the bin only came halfway down without presenting.
12:22 PM EST Received an update from the Controls/AutoStore team. They confirmed that continuous pick was running fine and that the issue was isolated to batch picking, with no errors in the AutoStore console while the robot was waiting halfway down. It was suggested that we restart services on that port, try to batch pick this tote from a different port, and/or try to batch pick a different tote from this port. They did not believe it was an AutoStore issue.
OCT continued to investigate.
1:24 PM EST The site tried tote 0187 on both ports 3 and 8, with the same result. They tried to batch pick a different tote on port 3; that tote did present, but the screen did not show pick information.
1:41 PM EST The site sent an email stating they were unable to do "any" batch pick orders. Up to this point, the issue was thought to affect only the one tote, 0187.
~2:00 PM EST OCT informed escalation resources that assistance might be needed on this issue and continued investigating.
2:41 PM EST OCT restarted (bounced) the AutoStoreTaskReconciliation service.
2:52 PM EST - 4:45 PM EST Escalation communicated with the OCT via text to get more information on the situation while the OCT continued to pull logs and investigate. Escalation called the OCT around 3:20 PM, got home at 3:43 PM, and continued the conversation online. An error was found in the log indicating that the inventory container was not part of the batch that AutoStore had. We wanted the site to try more examples to check whether the same error would appear on those picks, but were told that the shift was over and that we could revisit the issue on Monday.
Monday, March 16
8:21 AM EST Adore Me called back, initially stating that they had workstations waiting for work. After connecting, it was found that AutoStore was in maintenance mode; once that was changed, the stations started receiving work, and our T1 technician did not have to do anything further to get the work to show up. The site then brought up the batch picking issue from Saturday, which was still occurring, this time on port 4. T1 began to investigate.
9:21 AM - 10:09 AM EST T1 escalated the issue to a T2, who was brought up to speed and started assisting while T1 remained on the phone with the site. T2 began consulting with Brad R. on the issue; tote 0187 was still the one being used for batch picking. Logs from the station the site was using were reviewed. The site then informed T1 that they were just going to cancel the wave and ended the call.
11:17 AM EST T2 continued to look at logs and work with Brad. It was recommended that we send this on to our Dev group, as there did appear to be an issue with getting work for that tote. All relevant logs were consolidated and the issue was sent on to Dev.
Summary - waiting for work issue:
Monday, March 16
From email:
- Picking was showing "Work Complete" this morning when we started, and no one was able to pick in the morning while there were waves available and more than 4k tasks on the AutoStore console.
- AutoStore has been super slow all day and we were not able to understand what happened. Some ports were being stopped with "No work available" while there were a lot of waves and tasks.
8:21 AM EST Adore Me called back, initially stating that they had workstations waiting for work. After connecting, it was found that AutoStore was in maintenance mode. Once that was changed, the stations started receiving work; our T1 did not have to do anything further to get the work to show up.
Summary - AOR issues:
Monday, March 16
From email:
- The AOR crashed and we had to restart it with the help of Bastian's support, but without explanation.
- The divert wasn't working; all packages were getting diverted to hospital after the AOR restarted.
10:05 AM EST Adore Me called and reported there was an AOR fault showing on the HMI.
10:22 AM EST T1 restarted the AOR application and confirmed the fault was cleared; the system appeared to be working normally from there.
10:34 AM - 10:41 AM EST Adore Me called and stated that everything going over the scale that started with the letter D was coming back with the same negative weight every time. T1 connected and was then told that it was now working, and the call was ended.
Root Cause:
The batch picking issue is being sent over to be reviewed by the Development Team, and once a cause is found, a promote will be developed and installed on the system to prevent this from occurring again.
For the waiting-for-work issue, the AutoStore team determined that AutoStore went into service mode after a robot failure and was running again by 8:35 AM. This is what caused the workstations to be waiting for work. No mention was made on any call, email, or submitted ticket of any AutoStore slowness issue, either on Saturday or on Monday, and the AutoStore team confirmed that they also received no calls or tickets about such an issue.
For the AOR issues, it appears that AOR locked up and crashed at 9:22 AM: the AOR logs stopped recording at 9:22:05 and resumed at 10:19:54. The event viewer showed the following error: "System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown." This error is rare; per the event history, the last time it occurred was 6/3/19, and it happened only once in all of 2019. This appears to be an application crash due to system memory allocation. Given the rarity of the error, we do not recommend any further action at this time.
The only mention of any kind of divert issue we are aware of is the one that was called in where the scale was returning negative values. That issue resolved itself almost as soon as it was called in, before we could even connect to the system. We believe this was simply AOR catching up to the volume and sorting out the diverts.
Resolution:
The batch picking issue is as yet unresolved but is being sent on to be reviewed and corrected by our Dev team.
Taking AutoStore out of maintenance mode allowed the stations to receive work; nothing further was done by Bastian to get the work to show.
Restarting the AOR application resolved the HMI fault and the divert issue.
Next Steps and Preventive Actions:
Bastian Dev resources will be engaged to correct the batch picking issue and to provide a promote that accomplishes this.
Due to the rarity of the AOR crash, no further action is recommended at this time.
Our commitment to you:
Bastian Solutions understands the impact of the disruption that occurred and affected your organization's operations. Providing our clients with superb customer service is our primary objective, and we assure you that we are taking the required preventive measures to prevent recurrence.