Knowledgebase Article Number
June 24th, 2022, at 4:02 a.m. EST.
At Bastian Solutions, we continually strive to provide superb customer service to the highest standard, offering support that is proactive as well as reactive. Transparency is a key component in achieving this objective, and with that in mind, we are providing a full root cause analysis of the service disruption on June 24th, 2022, described below.
The ticket generated for this case is documented as ticket number 195724.
Friday, June 24, 2022
Initial call: 1:08 AM EST
Amber called the support line because the site could no longer perform putaways or cycle counts on the Exacta Touch system. During troubleshooting, it was deemed necessary to restart the AutoStore services on the AutoStore controller PC.
De'Andra made an outbound call to confirm with Amber that it was OK to restart the AutoStore system. The AS Service Console on the application server was restarted. The AutoStore services were stopped in the following order: XHandler, XPlanner, ASDriver. Once the services were stopped, they were started again in the reverse order: ASDriver, XPlanner, XHandler. When support attempted to restart the AS Service Console, it crashed several times and would not allow a restart. Amber then informed support that the robots were crashing, and the site engaged Controls directly.
04:02 AM EST
Mike Fontaine called the Bastian Support line. He stated that someone from Bastian Support had caused issues with their AutoStore: roughly 20 robots were offline (in service mode), and robots were now crashing. He mentioned that someone on-site had already been talking to Bastian Support about an issue they were experiencing, and that he had told them to call Bastian because he (Mike) does not know anything about Exacta. He said that support took control, and shortly afterward robots were colliding. Brendan, the Controls support member who took the call, said he would reach out to AutoStore Support and to the Software team to determine what was occurring. Call time: 2 minutes, 32 seconds.
04:27 AM EST
Brendan called Mike back with an update: he had reached out to AutoStore Support, and Alex Suarez would log in and contact Mike shortly. Between the initial call and this callback, Brendan caught up on what the Software team had been troubleshooting in order to better understand the issue. He then confirmed further details of the incident with Mike. Mike stated that he could see in the overview where the AutoStore had been shut down, and that whoever shut it down via remote access (support or otherwise) had put robots into service mode. He stated that his team was either cleaning or in the maintenance office/shop at the time, so none of them could have put these robots into service.
Mike also stated during this call that he could see what the person who remoted in had done: they shut down the AutoStore, and he then saw "AutoStore backup completed" on the overview. He said the 3 robots they actually had off the grid were still shown as off the grid, but 10 other, seemingly random robots were shown as off the grid while still physically out on the grid. He saw the system being restarted remotely, and crashes occurred soon after. Brendan stated that he was bringing together all parties who had been supporting the system before the incident, and that Alex Suarez would be calling shortly. Call time: 4 minutes, 3 seconds.
04:33 AM EST
Alex Suarez was now online and trying to remote into J&J while also preparing to call Mike back. Alex had changed his J&J password earlier that day and was having technical difficulties logging in. Alex messaged Brendan that he might need assistance if he had to log into their system. From there, the plan was to guide Mike over the phone to get his AutoStore back up and running without any further accidents.
Mike gave Alex a recap of the issues so far. He stated that as the system was restarting, they saw 10 robots go into service mode.
The first step in triaging the issue was to confirm which robots were actually off the grid versus on the grid. Mike went back up to the service platform to perform this check.
Mike reported that 5 robots were physically out at this time: 27, 66, 70, 73, and 98.
Mike reported that the following robots were in service mode but had already been moved to Planner: 8, 21, 22, 24... Alex advised putting them into Console mode rather than Planner.
While reviewing the list of robots that were in service mode but not off the grid, Mike noticed that robot 21 was shown near the service door but was not physically there. He sent his team out to look for it to prevent another crash.
During this time, Alex had Mike visually inspect the robots closest to him to confirm that their positions matched what the computer screen showed. This was to ensure the system's view of the grid was roughly correct, apart from the robots now in service mode. Mike confirmed that the robots he could see physically matched their locations on the computer. Mike's team also did a quick walkthrough on the grid to ensure everything looked in place.
Mike had to make another call, so we disconnected, and Alex said he would call him back in roughly 5 minutes. Call time: 19 minutes, 23 seconds.
04:58 AM EST
On the callback, Mike told Alex that robot 21 was not found on the grid because it had been off the grid for a while. It was not stored on the service grid and had been accidentally put into Planner because the team had forgotten it was out of service.
Once his team was safely off the grid, we had them close the AutoStore service grid doors, and we tried starting the AutoStore in service mode. On startup, AutoStore failed to start, reporting illegal position errors on 2 robots: 105 and 24. This was because they were on chargers when they were set to service mode and then back to Console. The team had to go out onto the grid and bring those 2 robots out.
Once we were able to start the system, Mike moved the robots closest to him to confirm that the system was operating properly and that those robots were in the correct locations. We moved robots manually for roughly 10 minutes.
Once Mike was confident that the remaining robots were in their correct locations, we performed another test to validate their positions using the Test Track feature in Service Console. This feature is used during commissioning to drive robots slowly around the grid, checking for any obstructions. We had all robots perform the test track procedure one cell at a time, so that if any robot was out of place, it would bump into another only at low speed. During the procedure, robot 22 faulted with an illegal track error because it was sitting on a charger and overhanging the grid. It was also one of the robots that had been switched to service mode, which is why this error occurred. A member of the maintenance team retrieved the robot and brought it into the service area. This was the only robot that faulted; no other crashes occurred, and the remaining robots began working once the test track procedure was completed. AutoStore was running autonomously, allowing production to continue, at roughly 5:45 AM EST. Call time: 51 minutes, 36 seconds.
AutoStore log files were pulled from the AutoStore controller, and Event Viewer logs were pulled from both the AutoStore controller and the AutoStore application server. A case was opened with AutoStore (case 37166), and these files were shared with AutoStore along with a description of the issues we observed. Alex analyzed the logs and found that multiple robots were put into service mode at exactly the same time, which a person cannot do within the constraints of the AutoStore Service Console software. Alex held discussions with AutoStore Support engineers and their Development team until the root cause was identified.
On June 24th, at the time the robots were all set to service mode, AS Service Console version 1.2.6-1 was running on the application server. This is an old version of the console that does not operate properly with current versions of the AutoStore software. As the Software team reported, the console crashed several times and would not allow a proper restart.
All service PCs used by the Maintenance team on the AutoStore service platforms have the up-to-date version of Service Console, version 1.3.4-2.
The old version of Service Console on the application server, version 1.2.6-1, was uninstalled on 6/29/2022. As far as we could determine, no other instances of this version of Service Console exist on any other devices.
Next Steps and Preventive Actions:
Bastian Support will work to ensure that the AutoStore system at J&J stays up to date with the latest software and firmware versions released by AutoStore. AutoStore releases these updates roughly twice a year, once around March and once around November. After these new versions have been tested and validated on other AutoStore systems, we will share this information with the J&J team in preparation for installing them on their system. J&J's current AutoStore versions are under review, and proposed updates are being documented for J&J's review and approval.
Our commitment to you:
Bastian Solutions understands the impact this disruption had on your organization's operations. Providing our clients with superb customer service is our primary objective, and we assure you that we are taking the necessary preventive measures to avoid a recurrence.