At Bastian Solutions, we continually strive to provide customer service of the highest standard, with support that is proactive as well as reactive. Transparency is key to achieving this objective. With that in mind, we are providing a full root cause analysis of the service disruption on March 18, 2020, described below.
Summary:
2:52 PM EST Phillip Buress, Support Specialist I, received a call from Kuehne + Nagel reporting that users were unable to access continuous picking at workstations.
3:15 PM EST Once connected, Phillip was able to recreate the issue, getting the error in all modes (picking, putaway, and cycle count) in Exacta. Based on errors in the logs, Phillip identified the ExactaApplicationService or the ExactaUserAuthorization service as the possible culprit.
3:30 PM EST Because the errors indicated what was believed to be a communication issue, Bastian Solutions coordinated a reboot of the workstation and a restart of the services on the application server. The error was:
15:01:15.078 [18] [scalesg] FATAL Bastian.Exacta.Eas.EasService - Error signing into application fde32111-16b3-4cc9-9b62-5197c300c984 for user scalesg with result code FAILURE and message NHibernate.TransactionException: Transaction not connected, or was disconnected
3:45 PM EST Per Bastian escalation procedures, this issue was escalated internally to William Rosenberg, Support Analyst I, who was brought in to assist with troubleshooting and finding a resolution.
4:00 PM EST While coordinating a reboot of the application server with the on-site team, the Bastian team pinpointed the root cause after finding the following error in the Resource Lock logs:
15:16:42.868 [17] (null) ERROR Bastian.Exacta.Business.Persistance.UnitOfWork - NHibernate.Exceptions.GenericADOException: could not execute batch command. [SQL: SQL not available] ---> System.Data.SqlClient.SqlException: The transaction log for database 'Exactadb' is full due to 'AVAILABILITY_REPLICA'. at System.Data.SqlClient.SqlConnection.OnError (SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
The error above indicated that the transaction log was full.
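For illustration only, a log scan for this specific failure could look like the sketch below. It is not part of Exacta or of the tooling used during the incident; the function name `find_log_full_errors` is hypothetical, and the pattern is based solely on the message text quoted above.

```python
import re

# Matches the SQL Server "transaction log is full" message as quoted in the
# log excerpt above. The text after "due to" is the reason SQL Server gives
# for not being able to reuse the log.
LOG_FULL_RE = re.compile(
    r"The transaction log for database '(?P<database>[^']+)' "
    r"is full due to '(?P<reason>[^']+)'"
)


def find_log_full_errors(lines):
    """Return a (database, reason) pair for every line that matches."""
    hits = []
    for line in lines:
        match = LOG_FULL_RE.search(line)
        if match:
            hits.append((match.group("database"), match.group("reason")))
    return hits


# Excerpt of the error quoted above (leading logger fields omitted).
sample = (
    "System.Data.SqlClient.SqlException: The transaction log for database "
    "'Exactadb' is full due to 'AVAILABILITY_REPLICA'. "
    "at System.Data.SqlClient.SqlConnection.OnError"
)

print(find_log_full_errors([sample]))
# prints [('Exactadb', 'AVAILABILITY_REPLICA')]
```

A scan like this would have pointed at the database layer directly, rather than at the application services that surfaced the first error.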
4:15 PM EST The Bastian Support team located the log drives that had filled and began attempting to shrink the database logs.
The log drives filled up on both MSSQLWHTN200S and MSSQLWHTN300S.
4:25 PM EST The Bastian Support team brought in a DBA to help get the Kuehne + Nagel team back online as quickly as possible.
4:50 PM EST The database logs were cleared, and the Bastian team asked Chad at Kuehne + Nagel to confirm that they were operational again.
5:00 PM EST Confirmed with the on-site team that everything was operational.
Root Cause:
The test database was set up with the full recovery model, which records all transactions to a database log file. The transaction log is supposed to be cleared after each backup. However, the test database was added post-implementation, so it was not included in the backups and its transaction log was never cleared or shrunk. As a result, the transaction log for the test database filled the log drive on the production database server.
Per the FSD, the test server provided by Kuehne + Nagel was supposed to host all of the test Exacta services, IIS, and an instance of SQL Server that would host the test database.
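The condition described in this root cause (a database that is not in simple recovery and has no recent log backup) can be audited. The sketch below is one possible approach, not the check Bastian Solutions runs. The query reads the standard SQL Server catalog objects `sys.databases` and `msdb.dbo.backupset`; the function name, the 24-hour window, and the sample rows are assumptions made for illustration, and running the query (driver, credentials) is left out.

```python
from datetime import datetime, timedelta

# T-SQL that could supply the rows used below. In msdb.dbo.backupset,
# type = 'L' marks transaction log backups.
RECOVERY_AUDIT_SQL = """
SELECT d.name,
       d.recovery_model_desc,
       MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name AND b.type = 'L'
GROUP BY d.name, d.recovery_model_desc;
"""


def databases_at_risk(rows, now, max_age=timedelta(hours=24)):
    """Return names of non-SIMPLE databases with no recent log backup.

    rows: iterable of (name, recovery_model_desc, last_log_backup or None).
    max_age: hypothetical threshold; choose one that matches the backup plan.
    """
    at_risk = []
    for name, recovery_model, last_log_backup in rows:
        if recovery_model == "SIMPLE":
            continue  # simple recovery does not rely on log backups
        if last_log_backup is None or now - last_log_backup > max_age:
            at_risk.append(name)
    return at_risk


# Made-up sample rows; the names and dates are not from the incident.
sample_rows = [
    ("prod_db", "FULL", datetime(2020, 1, 1, 12, 0)),
    ("test_db", "FULL", None),       # added later, never backed up
    ("scratch_db", "SIMPLE", None),
]
print(databases_at_risk(sample_rows, now=datetime(2020, 1, 1, 15, 0)))
# prints ['test_db']
```

Running an audit like this whenever a database is added would catch one that was created after the backup plan was defined.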
Resolution:
The test database was moved to the simple recovery model and its logs were shrunk so that they would no longer fill the log drive on the database server. Moving the database to simple recovery mode will prevent future issues with its transaction log.
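As a rough sketch of what such a change involves, the helper below builds the T-SQL statements for switching a database to simple recovery and shrinking its log file. `ALTER DATABASE ... SET RECOVERY SIMPLE` and `DBCC SHRINKFILE` are standard SQL Server commands, but the helper name, database name, logical file name, and target size are hypothetical, and these are not necessarily the exact statements the DBA ran. The sketch also assumes the database is not part of an availability group, since SQL Server requires the full recovery model for those.

```python
def build_log_cleanup_sql(database, log_file, target_mb):
    """Return T-SQL statements that set SIMPLE recovery and shrink the log.

    database:  database name (hypothetical in the example below)
    log_file:  logical name of the log file, as listed in sys.database_files
    target_mb: desired log file size in megabytes
    """
    if target_mb < 0:
        raise ValueError("target_mb must be non-negative")
    db = database.replace("]", "]]")      # escape for [bracket] quoting
    logical = log_file.replace("'", "''")  # escape for N'...' literal
    return [
        f"ALTER DATABASE [{db}] SET RECOVERY SIMPLE;",
        f"USE [{db}];",
        f"DBCC SHRINKFILE (N'{logical}', {int(target_mb)});",
    ]


for statement in build_log_cleanup_sql("test_db", "test_db_log", 1024):
    print(statement)
```

The statements are generated rather than executed so that they can be reviewed before anyone runs them against a production server.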
Next Steps and Preventive Actions:
- Confirm SQL Server is set up on the test server per the Tech FSD.
  - This had not been configured. A Bastian Solutions DB Analyst installed the SQL Development version. Completed as of 3/19/2020.
- Migrate the test database to the test server SQL instance.
  - Completed as of 3/20/2020.
- Complete the proactive monitoring setup this week so that any drive with less than 15% free space is reported.
  - Completed as of 3/19/2020.
- Request an increase in available disk space for the log drives on both MSSQLWHTN200S and MSSQLWHTN300S.
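The 15% free-space rule in the monitoring item above can be expressed in a few lines. This is a minimal sketch using Python's standard `shutil.disk_usage`, not the monitoring product actually deployed; the function names and the path checked are assumptions.

```python
import shutil

FREE_SPACE_THRESHOLD_PERCENT = 15  # report drives with less free space than this


def is_low_on_space(total_bytes, free_bytes,
                    threshold_percent=FREE_SPACE_THRESHOLD_PERCENT):
    """Return True when free space is below threshold_percent of the drive."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return free_bytes * 100 < total_bytes * threshold_percent


def check_drive(path, threshold_percent=FREE_SPACE_THRESHOLD_PERCENT):
    """Check the drive that holds `path` against the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.total, usage.free, threshold_percent)


if __name__ == "__main__":
    # "." is a placeholder; a real check would target the SQL log drives.
    print("low on space:", check_drive("."))
```

A scheduled job could call `check_drive` for each log drive and raise an alert when it returns `True`.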
Our commitment to you:
Bastian Solutions understands the impact this disruption had on your organization's operations. Providing our clients with superb customer service is our primary objective, and we assure you that we are taking the required measures to prevent a recurrence.