Customers reporting not being able to sign in
Incident Report for LLC

Follow up on outage Monday afternoon

First off, I'd like to apologize for our outage on Monday, May 1, which began at approximately 18:10 EDT. We appreciate your business and know how important this service is to you and your patients.

While outages like this are rare, the past week has been especially spotty compared with our track record over the past 5 years, culminating in a larger issue on Monday, which I will describe in more detail here.

What happened

At a high level, there were three distinct issues:

  1. A bug in an automated update to a service we use
  2. Errors while implementing the fix
  3. A slow process to update the website

For a while, we specified exactly which version of a piece of software we use. We later moved to automatic updates so that our customers would always get the latest version, which means quicker bug fixes and the most up-to-date features. The downside is that if a new version has a bug, customers can be affected without warning.

After identifying the issue, we ran into several problems while implementing the fix. We have an automated system that builds, tests, and updates the website. This is great, as it automates a lot of manual work and can reduce the errors that come with manual processes. The downside is that when we made our change, the tests did not pass, so our update could not continue.

The third problem is that the process is slow. Depending on the changes, it can take anywhere from 10 to 20 minutes to go through this automated update process. After a few failed attempts to deploy our fix, the total time it took to get the fix live was substantial.
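As a rough illustration of how failed attempts add up, the sketch below multiplies the 10 to 20 minute range above by a number of deployment attempts. The count of three attempts is purely hypothetical (this report does not state the actual number), and the function name is made up for the example.

```python
def total_deploy_minutes(attempts, minutes_per_run):
    """Total time spent if every attempt runs the full pipeline."""
    return attempts * minutes_per_run


# Hypothetical number of attempts; the real count is not stated in this report.
ATTEMPTS = 3

# Range per run taken from the 10 to 20 minutes described above.
low = total_deploy_minutes(ATTEMPTS, 10)
high = total_deploy_minutes(ATTEMPTS, 20)

print(f"{ATTEMPTS} attempts: between {low} and {high} minutes")
```

Even a small number of retries turns a 10 to 20 minute process into a much longer delay, which is why a faster path for urgent fixes matters.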

Remediation steps

While we understand this incident was painful for you, our customers, we did learn a few valuable lessons that we will be implementing over the next month to help prevent this type of issue from happening again.

  1. We will be specific about which software version we use, which will allow us to better vet new releases of software we depend on.
  2. We will come up with a faster deployment process for when we need to make quick changes to fix errors.
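
To make the first step more concrete, here is a minimal sketch of one way to check that the versions actually in use match the versions we have vetted. It is an illustration only, not a description of our real tooling: the function name, the package name, and the version numbers are all hypothetical.

```python
def find_version_drift(pinned, installed):
    """Return {name: (pinned_version, installed_version)} for every mismatch.

    A dependency that is pinned but not installed is reported with None
    as its installed version.
    """
    drift = {}
    for name, wanted in pinned.items():
        actual = installed.get(name)
        if actual != wanted:
            drift[name] = (wanted, actual)
    return drift


# Hypothetical example data: the version we vetted vs. what is running.
pinned = {"example-service-client": "2.3.1"}
installed = {"example-service-client": "2.4.0"}

drift = find_version_drift(pinned, installed)
for name, (wanted, actual) in drift.items():
    print(f"{name}: pinned {wanted}, but found {actual}")
```

A check like this could run as part of the automated build so that an unvetted version is caught before it reaches customers.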

Thank you again for being a customer. We look forward to making improvements to our system to help avoid and mitigate these types of issues in the future.

Posted May 03, 2018 - 13:57 EDT

Calls and error rates have returned to normal. Marking this as resolved. We will write a full post mortem tomorrow.
Posted May 01, 2018 - 20:40 EDT
Our update has been fixed and calls are returning to normal. Please refresh your browser or clear the cache to ensure you have the latest updates to the website.
Posted May 01, 2018 - 20:06 EDT
We believe we have a fix in place. We are making preparations to update the website now. We will update in 5-10 minutes.
Posted May 01, 2018 - 19:45 EDT
Please allow for a little extra time. One step is taking longer than usual.
Posted May 01, 2018 - 18:55 EDT
Okay, we have identified the issue and are making an update to the website now. This should take about 10 minutes.
Posted May 01, 2018 - 18:31 EDT
We are continuing to investigate this issue.
Posted May 01, 2018 - 18:10 EDT
We are looking into this issue now and will update ASAP.
Posted May 01, 2018 - 18:10 EDT
This incident affected: Website.