DigiCert validation bug sets up 83,267 SSL certs for revoking

News
31 Jul 2024 | 6 mins
Browser Security, Web Development

DigiCert’s revocation incident, which has sent website admins scrambling, highlights the importance of thorough testing and serves as yet another reminder that process counts as much as code.


Monday turned into a hectic day for some admins whose sites’ SSL/TLS certificates came from DigiCert. The company announced that it was revoking a small percentage of certificates that it discovered were lacking proper Domain Control Verification (DCV).

DCV is the process through which site ownership is verified before a certificate is issued. 

“The service provided by DigiCert confirms that a website represents the entity that it purports to represent by issuing a digital certificate necessary to encrypt — or ensure privacy of — website data,” explained Luke Connolly, threat analyst at Emsisoft, in an email. “In confirming the owner, there are obviously important verification steps necessary according to standards of trust and cryptography.”

Affected customers had 24 hours to replace the 83,267 impacted certificates, which did not impress Fred Chagnon, principal research director at Info-Tech Research Group.

“My mantra is always, ‘It’s not what you do; it’s what you do about it,’” he said. “We can focus on DigiCert’s regression testing and criticize them for a hole in their process that would have allowed a change like this to squeak through undetected. Though the change window has been imposed by the CABF, this is a very short time period to manage communication, change, and implementation.” 

What went wrong

One of the validation methods approved by the Certification Authority Browser Forum (CABF), whose guidelines provide best practices for securing internet transactions in browsers and other software, involves the customer adding a DNS CNAME record that includes a random value supplied by its certificate provider. The provider, in this case DigiCert, then does a DNS lookup and verifies that the random value is as provided, confirming that the customer controls the domain.
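The exchange described above can be sketched in a few lines of Python. This is an illustrative model only, not DigiCert's implementation: the dictionary stands in for a real DNS lookup, and the names `PROVIDER_TARGET`, `new_random_value`, `record_name`, and `customer_controls_domain`, as well as the record layout itself, are assumptions made for the example.

```python
import secrets

# Hypothetical CNAME target operated by the certificate provider.
PROVIDER_TARGET = "dcv.ca.example"


def new_random_value() -> str:
    """Provider side: generate an unpredictable value for one validation request."""
    return secrets.token_hex(16)


def record_name(domain: str, random_value: str) -> str:
    """Name of the CNAME record the customer is asked to create (assumed layout)."""
    return f"_{random_value}.{domain}"


def customer_controls_domain(dns: dict[str, str], domain: str, random_value: str) -> bool:
    """Provider side: look up the record and confirm it matches what was requested.

    `dns` is a plain dict mapping record names to CNAME targets; it stands in
    for a real DNS query so the sketch stays self-contained.
    """
    return dns.get(record_name(domain, random_value)) == PROVIDER_TARGET


# The customer publishes the record, then the provider checks it.
value = new_random_value()
zone = {record_name("example.com", value): PROVIDER_TARGET}
print(customer_controls_domain(zone, "example.com", value))  # True
print(customer_controls_domain({}, "example.com", value))    # False: nothing published
```

Only someone able to edit the domain's DNS zone can publish a record containing the provider's random value, which is why a successful lookup is treated as evidence of control.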

The CABF requires that, in one format of the DNS CNAME entry, the random value be prefixed with an underscore, and DigiCert discovered that, in some cases, that character was not included, rendering the validation non-compliant. By CABF rules, those certificates must be revoked within 24 hours, with no exceptions.
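A check for that missing character is simple to express. The sketch below assumes the random value appears as the first label of the record name; the function names and the sample values are hypothetical and are not drawn from DigiCert's systems.

```python
def has_underscore_prefixed_value(record_name: str, random_value: str) -> bool:
    """True if the first DNS label is the random value with a leading underscore."""
    first_label = record_name.split(".", 1)[0]
    return first_label == f"_{random_value}"


def find_noncompliant(validations: list[tuple[str, str]]) -> list[str]:
    """Return the record names whose random value lacks the underscore prefix."""
    return [
        name
        for name, value in validations
        if not has_underscore_prefixed_value(name, value)
    ]


# Illustrative data: (record name, random value issued for it).
records = [
    ("_3f9a1c.example.com", "3f9a1c"),
    ("3f9a1c.example.org", "3f9a1c"),  # underscore missing
]
print(find_noncompliant(records))  # ['3f9a1c.example.org']
```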

However, DigiCert said in an update to its status page Tuesday, and in an email to customers, “Unfortunately, some customers operating critical infrastructure are not in a position to have all their certificates reissued and deployed in time without critical service interruptions. To avoid disruption to critical services, we have engaged with browser representatives alongside these customers over the last several hours. Based on these discussions, we are now in a position to delay revocations under exceptional circumstances.”

Since then, DigiCert updated its status page to read, “DigiCert continues to actively engage with customers impacted by this incident and many of them have been able to replace their certificates. Some customers have applied for a delayed revocation due to exceptional circumstances and we are working with them on their individual situations. We are no longer accepting any applications for delayed revocation.”

Customers granted exceptions will have until August 3 at 19:30 UTC to replace the affected certificates before they are revoked.

The root cause

In its root cause analysis of the issue, DigiCert highlighted what appeared to be some process failures during the creation of a modernized service-based system. In the old system, the underscore was automatically placed where it was required; however, as the new service-based architecture was constructed, that critical bit of code, along with the checks to ensure the underscore was where it should be, was omitted from one path.

Not only that, but, said DigiCert, “The omission of an automatic underscore prefix was not caught during the cross-functional team reviews that occurred before deployment of the updated system. While we had regression testing in place, those tests failed to alert us to the change in functionality because the regression tests were scoped to workflows and functionality instead of the content/structure of the random value.” And, it added, in another process failure, nobody compared the outputs from the new system with the outputs of the correctly functioning legacy system.

“Had we conducted those evaluations, we would have learned earlier that the system was not automatically adding the underscore prefix to the random value where needed,” DigiCert said. The problem was only discovered when a customer requested information about the random values.
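The two kinds of check DigiCert says were missing, one that inspects the content of the generated value and one that compares new output against legacy output, might look something like the following. It is a sketch under stated assumptions: `legacy_label` and `new_label` are stand-ins for the old and rewritten systems, and the hex-digit pattern is an invented format, not DigiCert's.

```python
import re

# Assumed shape for the example: an underscore followed by lowercase hex digits.
LABEL_PATTERN = re.compile(r"_[0-9a-f]+")


def legacy_label(random_value: str) -> str:
    """Stand-in for the legacy system, which added the underscore itself."""
    return "_" + random_value


def new_label(random_value: str) -> str:
    """Stand-in for the rewritten service under test."""
    return f"_{random_value}"


def check_label_structure(make_label) -> None:
    """Content-level check: assert on the value itself, not just on workflow success."""
    label = make_label("3f9a1c")
    assert LABEL_PATTERN.fullmatch(label), f"malformed label: {label!r}"


def check_parity(samples) -> None:
    """Differential check: the new system must reproduce the legacy output."""
    for value in samples:
        assert new_label(value) == legacy_label(value), value


check_label_structure(new_label)
check_parity(["3f9a1c", "00ff", "abc123"])
```

A generator that silently dropped the prefix, such as `lambda v: v`, would fail `check_label_structure` even though the surrounding workflow still completed without error.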

“It’s a small problem caused by some code that apparently wasn’t properly maintained in a software update of DigiCert’s code,” Emsisoft’s Connolly noted, “and while it appears unlikely that a security vulnerability will result, by the nature of the business it needs to be addressed ASAP. Unfortunately this results in a huge impact that leaves the businesses with affected certificates scrambling: It can take their websites offline until a new certificate is issued.”

“By all appearances it’s a failure in DigiCert’s QA process, and as such is similar to the recent massive outage caused by a CrowdStrike release,” he added. “It’s a reminder that behind every website and app there’s a team of software architects, developers, testers, managers, etc., any one of whom may make a mistake that results in massive headaches.”

What’s been done

DigiCert moved quickly to address the flaw, detailing things it’s doing to prevent its recurrence. First, it has consolidated and reviewed all random value generators, and it has simplified the user experience so that customers don’t need to know about the specific formats required. The correct ones are now automatically generated.

It has embedded compliance team members, who will review all applicable changes, into all Certificate Authority (CA) and Registration Authority (RA) sprint teams. Furthermore, it is in the process of increasing its test coverage to include compliance-based automated test cases and, later this year, will open-source its DCV code so the community can review it.

However, said Info-Tech Research Group’s Chagnon, “DigiCert has effectively downloaded a ticking time bomb onto their customers due to a problem of their own creation. DigiCert may spin a story that states that they take the security of their customers very seriously and need to show that they are rectifying the issue as soon as possible — but those statements will not be well taken by their customers that incur production outages due to certificates that were revoked based on a supply-chain issue they were still reacting to.”

Connolly was less judgmental, noting, “There are humans doing the work to design, create, and maintain all of these websites and apps, and therefore human errors can be introduced to a product of any company.” 

“This is an excellent example of how the infrastructure that we tend to take for granted and use on a daily basis can have a significant impact when things go wrong,” he said.