Scrubbing tokens from source code is not enough, as shown by the publishing of a Python Software Foundation access token with administrator privileges to a container image on Docker Hub.

A personal GitHub access token with administrative privileges to the official repositories for the Python programming language and the Python Package Index (PyPI) was exposed for over a year. The access token belonged to the Python Software Foundation’s director of infrastructure and was accidentally included in a compiled binary file that was published as part of a container image on Docker Hub.

“Although we encounter many secrets that are leaked in the same manner, this case was exceptional because it is difficult to overestimate the potential consequences if it had fallen into the wrong hands: one could supposedly inject malicious code into PyPI packages (imagine replacing all Python packages with malicious ones), and even to the Python language itself,” researchers from security firm JFrog, who found and reported the token, wrote in a report.

The incident shows that scrubbing access tokens from source code only, which some development tools do automatically, is not enough to prevent potential security breaches. Sensitive credentials can also be included in environment variables, configuration files and even binary artifacts as a result of automated build processes and developer mistakes.

The Python token leak was the result of laziness

Ee Durbin, the administrator of PyPI and director of infrastructure for the Python Software Foundation (PSF), wrote an incident report explaining how the leak happened. The leak involved the access token for Durbin’s own account, which had administrative privileges due to his role in the organization. In early 2023, Durbin was working on cabotage-app, a Docker-based tool developed by the PSF that is used to deploy PyPI and associated services on a Kubernetes cluster.
While working on the build portion of the codebase, he kept running into the API rate limits that GitHub enforces for anonymous access. In what he calls “an act of laziness,” Durbin decided to modify the source code locally to include an access token for his own account in order to bypass the default rate limits and finish the job faster. This was a quick fix, an alternative to configuring a localhost GitHub App to do the build instead of using the GitHub API.

While Durbin knew that adding personal access tokens (PATs) to source code is bad security practice, the change was made only to his local copy of the codebase and was never intended to be pushed remotely. In fact, the automated build and deployment script was supposed to revert local changes, which should have scrubbed the token.

What Durbin didn’t realize was that the token was also included in the .pyc (Python compiled bytecode) files generated as part of the build process, and that those files, stored in the __pycache__ folder, were not excluded from the final Docker image uploaded to Docker Hub.

After being notified by JFrog in late June, the PyPI security team revoked the token and reviewed all GitHub audit logs and account activity for signs that the token might have been used maliciously. No evidence of malicious use was found. The cabotage-app version containing the token was published on Docker Hub on March 3, 2023, and was removed on June 21, 2024, fifteen months later.

“Cabotage is now entirely self-hosting, which means that builds of the cabotage-app no longer utilize a public registry and deployment builds are initiated from clean checkouts of source only,” Durbin wrote.
“This mitigates the scenario of local edits making it into an image build outside of development environments, as well as removing the need to publish to public registries.”

Durbin said he will avoid creating personal access tokens for his account in the future unless absolutely needed, because aside from this one case, he doesn’t remember any other instances where such a long-lived token has been helpful. “This is a great reminder to set aggressive expiration dates for API tokens (if you need them at all), treat .pyc files as if they were source code, and perform builds on automated systems from clean source only,” he advised.

JFrog congratulated the PyPI security team for responding to its report and revoking the token within an impressive 17 minutes. While having perfect security is never possible, having a clear point of contact for security issues and a fast response time is critical to limiting the impact of security incidents for any organization.

Advice for developers

Aside from scanning binary artifacts and configuration files for potential secrets, developers should use the new fine-grained GitHub personal access tokens that were introduced two years ago instead of the classic ones. The new tokens enable users to choose the privilege levels and the specific repositories they provide access to.

“Creating the ‘one ring to rule them all’ is always a bad idea,” the JFrog researchers wrote in their report. “We highly recommend using this feature, as we frequently encounter situations where a token providing ultimate access to the entire infrastructure gets leaked within a side project or temporary ‘hello-world’ application.”

In addition, since 2021 GitHub tokens have a new format that includes a ghp_ prefix and a checksum, making it easier for automated tools to detect them.
Old-format GitHub tokens, which haven’t been deprecated and are still in use, are indistinguishable from SHA-1 hashes. Because such hashes are common in source code and pose no security risk, scanners may skip old tokens along with them. Developers are strongly advised to switch to the new token format.
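
The two technical points in this story, that compiled bytecode keeps string constants after the source is reverted and that the ghp_ prefix makes newer tokens easy to spot, can be illustrated with a short Python sketch. This is not the tooling used by JFrog or the PSF. The names FAKE_TOKEN, FAKE_COMMIT, settings.py, scan_file and demo are hypothetical, the token value is a made-up placeholder, and the pattern of 36 alphanumeric characters after the prefix is an assumption about the current token length that may not hold in the future.

```python
import py_compile
import re
import tempfile
from pathlib import Path

# Hypothetical placeholder values for illustration only, not real credentials.
FAKE_TOKEN = "ghp_" + "A1b2C3d4E5f6" * 3   # "ghp_" prefix plus 36 alphanumerics
FAKE_COMMIT = "a" * 40                     # shaped like a SHA-1 commit hash

# Assumed shape of a new-format token: the prefix makes it unambiguous.
NEW_FORMAT = re.compile(rb"ghp_[A-Za-z0-9]{36}")
# Any standalone run of 40 lowercase hex characters. An old-format token and a
# SHA-1 hash both look like this, so a match here cannot be classified.
HEX40 = re.compile(rb"(?<![0-9a-f])[0-9a-f]{40}(?![0-9a-f])")


def scan_file(path: Path) -> dict[str, list[bytes]]:
    """Search the raw bytes of any file, including binaries, for candidates."""
    blob = path.read_bytes()
    return {
        "new_format": NEW_FORMAT.findall(blob),
        "hex40": HEX40.findall(blob),
    }


def demo() -> dict[str, list[bytes]]:
    """Hard-code a token, compile to .pyc, revert the source, scan the .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "settings.py"
        src.write_text(
            f'GITHUB_TOKEN = "{FAKE_TOKEN}"\nLAST_COMMIT = "{FAKE_COMMIT}"\n'
        )
        # Write the bytecode file, as the interpreter does under __pycache__.
        pyc = Path(
            py_compile.compile(
                str(src), cfile=str(Path(tmp) / "settings.pyc"), doraise=True
            )
        )
        # Reverting the source does not touch the already-written bytecode:
        # the stale .pyc keeps the old constant until it is rebuilt or deleted.
        src.write_text("GITHUB_TOKEN = None\n")
        return scan_file(pyc)


if __name__ == "__main__":
    for kind, matches in demo().items():
        print(kind, matches)
```

Running the script should report the placeholder token under new_format even though settings.py no longer contains it, while the 40-character hex string lands under hex40, where a scanner has no way of telling a harmless commit hash from an old-format token.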