Publicly exposing sensitive data via misconfigured object storage (“buckets”) has been an issue for years, and multiple articles highlight the dangers. In 2022, Laminar Labs found that 21% of public-facing AWS buckets contain sensitive data. Their article goes into great detail about why that is a problem and how it can be mitigated. However, their research focused exclusively on AWS. I wanted to see how prevalent exposed buckets are across four major providers: AWS, Azure, Google and DigitalOcean.
I found 441.900 publicly accessible buckets in total: 312.000 (70,60 %) belonging to AWS, 50.400 (11,41 %) to Azure, 7.000 (1,58 %) to DigitalOcean and 72.500 (16,40 %) to Google. As scanning all of these would take an excessive amount of time and resources, I narrowed my scan down to about 40.000 buckets in total, sampling from each provider in proportion to the percentages above. This meant scanning 28.240 buckets from AWS, 4.564 from Azure, 632 from DigitalOcean and 6.560 from Google.
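The proportional split can be reproduced with a few lines of arithmetic. This is only a sketch: `SHARES`, `TARGET` and `allocate` are hypothetical names chosen for the example, and the shares are the rounded percentages quoted above. Because those percentages are rounded, the per-provider counts add up to 39.996 rather than exactly 40.000.

```python
# Illustrative sketch; all names here are hypothetical.
# Provider shares from the text, in hundredths of a percent (70,60 % -> 7060).
# Integers are used so the arithmetic is exact.
SHARES = {"AWS": 7060, "Azure": 1141, "DigitalOcean": 158, "Google": 1640}
TARGET = 40_000  # intended sample size


def allocate(target: int, shares: dict[str, int]) -> dict[str, int]:
    """Split `target` across providers in proportion to their share."""
    return {provider: target * share // 10_000 for provider, share in shares.items()}


allocation = allocate(TARGET, SHARES)
for provider, count in allocation.items():
    print(f"{provider}: {count}")
print("total:", sum(allocation.values()))
```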
Coming up with a reliable way to determine whether a bucket has sensitive data in it can be tricky. Searching for keywords such as “Passport” and “Birth Certificate” brings results, but it’s impossible to generate a comprehensive list of keywords that covers every kind of sensitive data, especially as filenames are not always descriptive (e.g. “45134.pdf” turning out to be a valid passport). Another way is to scan for certain file extensions. Some, like “.bak”, are usually a good indicator that there may be sensitive data in the bucket (in this case, a generic backup). Others, like “.pdf”, also work, but they are a less reliable indicator and potentially require manual analysis. When scanning file extensions, it’s important to exclude almost all image file extensions, as hosting images is a very common legitimate use case for public buckets, for example serving easily accessible images for a landing page. I’ve personally found buckets that stored valid passports as JPGs, so this will exclude certain buckets that do contain sensitive data but are not covered by keywords or other file extensions.
I decided to combine both approaches and assign a confidence value based on whether a keyword, a “suspicious” file extension, or both were found. If the confidence that a bucket included sensitive data was less than 60 %, I manually analyzed the contents to determine whether sensitive data was actually present and to prevent false positives. One such case was a bucket that was flagged by extension only, but simply hosted a lot of XML configuration files clearly meant for public consumption.
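To make the combined approach more concrete, here is a minimal sketch of how such a scoring step could look. It is not the scanner used for this research: the weights, the way the two signals are combined, and all names (`score_key`, `triage`, and so on) are assumptions made up for illustration. Only the example keywords, the “.bak” and “.pdf” extensions, the image exclusion and the 60 % threshold come from the text above.

```python
from pathlib import PurePosixPath

# Hypothetical weights, chosen only for illustration.
KEYWORD_WEIGHT = 0.8
KEYWORDS = ("passport", "birth certificate")
EXTENSION_WEIGHTS = {".bak": 0.7, ".pdf": 0.4}
# Image extensions carry no extension weight (common legitimate public content).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}
REVIEW_THRESHOLD = 0.6  # below this, a human looks at the bucket


def score_key(key: str) -> float:
    """Return a confidence in [0, 1] that one object key is sensitive."""
    lowered = key.lower()
    # Treat "_" and "-" as spaces so "birth_certificate" matches the keyword.
    name = lowered.replace("_", " ").replace("-", " ")
    ext = PurePosixPath(lowered).suffix
    keyword_score = KEYWORD_WEIGHT if any(k in name for k in KEYWORDS) else 0.0
    ext_score = 0.0 if ext in IMAGE_EXTENSIONS else EXTENSION_WEIGHTS.get(ext, 0.0)
    # Combine both signals as if they were independent pieces of evidence.
    return 1 - (1 - keyword_score) * (1 - ext_score)


def triage(keys: list[str]) -> str:
    """Classify a bucket by its highest-scoring object key."""
    confidence = max((score_key(k) for k in keys), default=0.0)
    if confidence == 0.0:
        return "no indicators"
    if confidence < REVIEW_THRESHOLD:
        return "manual review"
    return "flagged"


print(triage(["img/logo.png", "css/site.css"]))
print(triage(["docs/45134.pdf"]))
print(triage(["backups/site.bak"]))
```

With these made-up weights, a bucket that only contains an unremarkable PDF lands in manual review, which mirrors the case of “45134.pdf”: the filename alone cannot tell you whether it is a passport.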
Of 40.000 scanned buckets, I found that a staggering 11.900, or 29,75 %, included some form of sensitive data. Split by provider, 8.127 of 28.240 (28,78 %) AWS buckets included sensitive data, as did 1.639 of 4.564 (35,91 %) Azure buckets, 280 of 632 (44,30 %) DigitalOcean buckets and 1.854 of 6.560 (28,26 %) Google buckets. As mentioned in Methodology, I can’t be sure that this list is entirely accurate, so treat these figures as accurate to within roughly +/-2,5 %.
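The per-provider rates follow directly from the counts above. The snippet below simply recomputes them; `RESULTS` and `rate` are hypothetical names used for this example.

```python
# (buckets with sensitive data, buckets scanned) per provider, from the text.
RESULTS = {
    "AWS": (8_127, 28_240),
    "Azure": (1_639, 4_564),
    "DigitalOcean": (280, 632),
    "Google": (1_854, 6_560),
}


def rate(hits: int, scanned: int) -> float:
    """Share of scanned buckets with sensitive data, in percent (2 decimals)."""
    return round(100 * hits / scanned, 2)


for provider, (hits, scanned) in RESULTS.items():
    print(f"{provider}: {rate(hits, scanned)} %")

total_hits = sum(hits for hits, _ in RESULTS.values())
total_scanned = sum(scanned for _, scanned in RESULTS.values())
print(f"overall: {rate(total_hits, total_scanned)} %")
```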
Here’s a brief overview of sensitive data that I found in these buckets:
- Active passports and national ID cards
- Invoices from companies to customers, with the customers’ address clearly visible
- Residency permits from different countries
- Degrees from various schools and customers’ grades
- Certificates like those issued by Cambridge English
- Birth certificates
Why exposed buckets are an issue
A lot of different data can be exposed in public buckets, and it doesn’t always have to be end-user data like passports. I’ve personally seen internal emails providing details about upcoming promotional offers, active discount codes for online stores, invoices (including the associated contracts with additional terms) and other things. Having this data exposed could have a material impact on any business. Of course, exposing customer data such as ID cards, passports and residency information opens both the company and its customers up to extortion attempts. One such case was that of Finnish psychotherapy center Vastaamo, where the hacker first tried to extort the company itself and later moved on to individual patients.
What to do when data is exposed
If you’re a European company doing business in Europe, or a company serving European customers, you are almost certainly covered by the GDPR. If customer data like passports was exposed in the bucket and you’ve become aware of it, you will generally have 72 hours to report the data breach to the relevant Data Protection Authority. In addition, you may be required to notify the affected individuals if the breach poses a “high risk” to their rights and freedoms. Personally, I always recommend consulting a lawyer who specializes in GDPR, especially if you’re a company operating outside of Europe.
In addition, you’ll need to make sure that the data is no longer publicly accessible, verify that your applications still work, and check whether any data has been exfiltrated by a malicious actor. A cloud security consultant can help with these things if you don’t have the expertise in-house. If you’ve exposed sensitive data publicly and need help, I am available. You can schedule a meeting with me via Brevo where we can take a look at your situation.
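One quick sanity check after locking a bucket down is to request its root URL anonymously and see whether it still returns an object listing. The sketch below assumes an S3-style endpoint, where a publicly listable bucket answers with a `ListBucketResult` XML document and a private one answers with HTTP 403. The function names are made up for this example, and the check only covers listing: individual objects can still be publicly readable even when listing is denied, so this is not a substitute for reviewing the bucket policy itself.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def classify_listing_response(status: int, body: bytes) -> str:
    """Interpret the response to an anonymous GET on a bucket's root URL."""
    if status in (401, 403):
        return "private"
    if status == 404:
        return "not found"
    if status != 200:
        return "unknown"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unknown"
    # Drop the XML namespace prefix, e.g. "{...}ListBucketResult".
    tag = root.tag.rsplit("}", 1)[-1]
    return "publicly listable" if tag == "ListBucketResult" else "unknown"


def check_bucket(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` without credentials and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_listing_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still what we need.
        return classify_listing_response(err.code, err.read())
```

Calling `check_bucket` with your own bucket’s endpoint URL should return "private" once anonymous listing has been disabled. Only run it against buckets you own or are authorized to test.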
Exposed buckets continue to be an issue even in the year 2023, when most buckets should be private by default. It’s important to be vigilant and to make sure that you or your company do not accidentally expose private data in otherwise public buckets. Double-check your configurations, and do it frequently.