Storing Data in the Cloud: Avoiding Common Mistakes - The First Global Cybersecurity Observatory

Storing Data in the Cloud: Avoiding Common Mistakes

Author: Tzury Bar Yochay, Co-founder, Reblaze

In a recent survey by O’Reilly, 88 percent of respondents said their organizations use the cloud. Although usage of public cloud services varies from organization to organization, one of the most popular uses of the cloud is storage.

Data storage in the public cloud has been available since the mid-2000s. Today, you might expect that organizations would be proficient with its use; unfortunately, this is not always true.

Large-scale compromises of cloud-stored data are still reported frequently. Just in the last year or so, incidents have included Cultura Colectiva (which leaked 146 gigabytes of data about Facebook users), Chtrbox (which didn’t bother to password-protect an AWS database with 49 million records of Instagram users), the infamous Capital One breach (which exposed financial information for 100 million Americans and six million Canadians), and CyberNews’s discovery of a publicly accessible Google Cloud database (possibly from the US Census Bureau), with 800 gigabytes of personal user information and over 200 million detailed user records.

Such incidents are completely unnecessary - it is not that difficult to securely use cloud technologies. Nevertheless, many organizations are still falling short. In this article, we’ll look at some common mistakes in this area, and discuss best practices to use when storing data in the cloud. Along the way, we’ll briefly mention the relevant services offered by the top-tier platforms: Amazon Web Services, Google Cloud, and Microsoft Azure.

Three Considerations When Using Cloud Storage

When maintaining data in the cloud, there are three major tasks to consider: organizing resources, enforcing policies & permissions, and monitoring & reporting. Each is vitally important. Let’s discuss each one.

Organizing Resources

This task is foundational to the other two, but unfortunately it is often neglected. Many executives do not appreciate its importance.

An incorrect or incomplete organization of resources can jeopardize security in several ways. First, if data stores are not clearly structured, it becomes challenging to maintain a good overview of your resources - and you can’t control access to data if you aren’t certain about what data is being stored.

More importantly, a good structure will make it easy to apply appropriate levels of permissions to the various categories of data. On the other hand, if data is stored in sloppy or ad hoc collections, it’s a safe assumption that at least some of it is not being secured properly.

Best practices in data organization include a consistent naming convention, resource tagging (if available), and a logical hierarchy of storage locations (files, buckets, groups, etc.). As we’ll see in a moment, the hierarchy is especially important, because it can help enforce effective access control and organizational policies.
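As a rough illustration of how a naming convention and required tags might be checked automatically, here is a minimal Python sketch. The convention itself (team, environment, purpose), the required tag names, and the sample inventory are all hypothetical; each organization would define its own, and a real audit would read the inventory from the CSP rather than from a hard-coded dictionary.

```python
import re

# Hypothetical convention: <team>-<environment>-<purpose>, all lowercase,
# for example "billing-prod-invoices".
NAME_PATTERN = re.compile(r"^[a-z0-9]+-(dev|staging|prod)-[a-z0-9]+$")

# Hypothetical set of tags that every storage resource must carry.
REQUIRED_TAGS = {"owner", "data-classification"}


def audit_resource(name, tags):
    """Return a list of human-readable problems found for one storage resource."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"{name}: name does not follow <team>-<env>-<purpose>")
    for tag in sorted(REQUIRED_TAGS - set(tags)):
        problems.append(f"{name}: missing required tag '{tag}'")
    return problems


# Hypothetical inventory; in practice this would come from the provider's API.
inventory = {
    "billing-prod-invoices": {"owner": "finance", "data-classification": "confidential"},
    "tmp_bucket_2": {"owner": "unknown"},
}

for resource_name, resource_tags in inventory.items():
    for problem in audit_resource(resource_name, resource_tags):
        print(problem)
```

Resources that fail a check like this are good candidates for a closer look, since ad hoc names and missing ownership tags often go together with unreviewed permissions.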

All three of the major cloud service providers (CSPs) offer robust capabilities for organizing data storage, with slightly different relative strengths. AWS’ Simple Storage Service (S3) was the first major storage product offered to the public, and AWS offers resource groups and tagging to help organize and manage a large number of resources, including the ability to perform bulk actions. GCP offers Resource Manager, a dedicated product for hierarchically managing resources by project, folder, and organization. Azure’s strength is derived from Microsoft’s long history of networking products, and it can easily handle a mixed organization of physical data centers and Azure Cloud workloads, offering four built-in levels of management (management groups, subscriptions, resource groups, and resources) along with a tagging system.

All three CSPs also offer specialized products for storing secrets: special data such as API keys, database passwords, privileged account credentials, SSH keys, etc. Secrets have become a vital part of cloud service/microservice architectures, and they must be strictly protected, while at the same time being easily accessible to the services that need them. To accomplish this, AWS offers Secrets Manager (along with Security Token Service, which issues temporary credentials), Google includes Secret Manager, and Azure has Key Vault. All these products are capable of protecting secrets for cloud workloads.

Enforcing Policies and Permissions

Possibly the most common mistake in cloud security is to set incorrect storage access policies. Throughout the decade-plus history of the public cloud, most of the reported security incidents seemed to begin with data that was not properly secured. Even today, cloud users sometimes still leave themselves vulnerable, whether directly (as in the Chtrbox incident) or indirectly (as in the Capital One breach, where a WAF had overly broad data permissions which were then exploited in an SSRF attack).

Fortunately, the major CSPs have noticed this, and they have made it easier to correctly and securely use their storage capabilities. The top-tier platforms all make it straightforward to set and enforce appropriate policies for resource access.

An important best practice is to follow the principle of least privilege: each user should only have access to the resources that are absolutely necessary. In practice, this means that all data storage should be private by default. Then on an individual basis, access should be enabled for the fewest users necessary.

Most CSPs facilitate this practice with some variant of policy/permission inheritance; a given resource will inherit the security policies of its ‘parent’ in the hierarchy. Therefore, you can easily set restrictions that apply broadly, while granting narrow permissions to specific users for specific resources. This is where it pays off to have thoughtfully organized the data, as discussed previously; a good storage structure can make it simple to set correct policies and permissions for every data resource.
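The idea of inheritance combined with private-by-default access can be sketched in a few lines of Python. This is a simplified model, not any provider's actual evaluation logic: the hierarchy, the user names, and the additive rule (a grant on a parent applies to all of its children) are assumptions made for illustration, and real platforms add features such as explicit denies and conditions.

```python
# Hypothetical hierarchy: each resource maps to its parent (None for the root).
PARENTS = {
    "org": None,
    "org/finance": "org",
    "org/finance/invoices": "org/finance",
    "org/marketing": "org",
}

# Hypothetical grants set directly on a node: {resource: {user: permissions}}.
GRANTS = {
    "org/finance": {"alice": {"read"}},
    "org/finance/invoices": {"alice": {"write"}, "bob": {"read"}},
}


def effective_permissions(user, resource):
    """Collect the user's permissions on a resource and on all of its ancestors."""
    permissions = set()
    node = resource
    while node is not None:
        permissions |= GRANTS.get(node, {}).get(user, set())
        node = PARENTS[node]
    return permissions


def is_allowed(user, resource, permission):
    """Default deny: allow only if some level of the hierarchy grants the permission."""
    return permission in effective_permissions(user, resource)


print(is_allowed("alice", "org/finance/invoices", "read"))   # inherited from org/finance
print(is_allowed("bob", "org/finance", "read"))              # bob's grant is on the child only
```

Because nothing is granted at the root, every resource starts out private, and each grant is visible at the one level of the hierarchy where it was made.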

The Big Three cloud platforms all offer capable products in this area. They each integrate their authentication services (AWS’ Identity & Access Management, Google’s Cloud Identity & Access Management, and Azure Active Directory) with their storage products, which makes it straightforward to allow and restrict access for individual users or groups of users. Furthermore, they all support RBAC (Role-Based Access Control), which is the best practice for controlling and managing user permissions: instead of individual users having direct permissions, they are assigned to roles which have permissions. By creating the right roles and groups for each of your users, you can easily limit the impact of bad actors and provide exactly what your users need.
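To make the RBAC idea concrete, here is a minimal Python sketch in which users hold roles and only roles hold permissions. The role names, permission strings, and users are hypothetical and do not correspond to any provider's built-in roles.

```python
# Hypothetical roles and the permissions each one carries.
ROLE_PERMISSIONS = {
    "storage-viewer": {"object.get", "object.list"},
    "storage-editor": {"object.get", "object.list", "object.put"},
}

# Users are assigned roles; they never hold permissions directly.
USER_ROLES = {
    "alice": {"storage-editor"},
    "bob": {"storage-viewer"},
}


def can(user, permission):
    """Return True if any role assigned to the user includes the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(can("alice", "object.put"))
print(can("bob", "object.put"))
```

With this structure, changing what editors may do is a single change to the role, and an unknown user has no access at all because no role is assigned to them.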

Automated Monitoring and Reporting

By definition, the public cloud is internet-facing. Therefore, it is crucially important to monitor all resources that you have in the cloud. Monitoring and reporting allow you to flag anomalous actions within the system, which can help you to prevent attacks in real time. For example, if an internal service attempts to communicate with another service when there’s no legitimate reason for it to do so, this should be investigated immediately.

The best practice here is to automate this process, so that everything runs continuously and automatically with no human action required. It’s not enough to have somebody checking the logs regularly; anomalies need to be sent in real time to your SIEM/SOC, or whatever destination will ensure that someone sees them immediately and can take action on them.
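The service-to-service example above could be automated along the following lines. This Python sketch assumes a hypothetical allow-list of legitimate calls and a simple event format with "source" and "destination" fields; the alert sink is passed in as a function, with print standing in for a real SIEM/SOC integration.

```python
# Hypothetical allow-list of service-to-service calls considered legitimate.
ALLOWED_CALLS = {
    ("web-frontend", "orders-api"),
    ("orders-api", "orders-db"),
}


def find_anomalies(events):
    """Return events whose (source, destination) pair is not on the allow-list."""
    return [e for e in events if (e["source"], e["destination"]) not in ALLOWED_CALLS]


def process(events, send_alert):
    """Forward every anomalous event to the alert sink and return how many were sent."""
    anomalies = find_anomalies(events)
    for event in anomalies:
        send_alert(event)
    return len(anomalies)


# Hypothetical events; in practice these would be streamed from the platform's logs.
sample_events = [
    {"source": "web-frontend", "destination": "orders-api"},
    {"source": "web-frontend", "destination": "orders-db"},
]

# print stands in for a real alerting integration, which is out of scope here.
process(sample_events, print)
```

The point of the design is that no person needs to read the logs for the alert to fire; the check runs on every event and pushes only the exceptions to whoever is on call.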

Again, all three top-tier CSPs capably support the needs of most organizations. AWS’ CloudTrail and GuardDuty, Google’s Cloud Logging and Cloud Monitoring, and Azure’s Log Analytics allow you to monitor the events and activities within your cloud. Each platform also offers some capabilities for infrastructure verification: for example, AWS Config can be used to regularly scan and test S3 buckets for unwanted public data exposure.
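For a sense of what a public-exposure check looks for, the following Python sketch inspects bucket policy documents for Allow statements that apply to any principal. It is not AWS Config and does not call any AWS API: it assumes the policies have already been retrieved as dictionaries in the standard AWS policy JSON layout, the bucket names and ARNs are made up, and it ignores conditions, ACLs, and account-level public access settings that a managed service would also evaluate.

```python
def statement_is_public(statement):
    """True if an Allow statement applies to any principal ("*")."""
    if statement.get("Effect") != "Allow":
        return False
    principal = statement.get("Principal")
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_buckets(policies):
    """Return the sorted names of buckets whose policy has a public Allow statement."""
    exposed = []
    for bucket, policy in policies.items():
        statements = policy.get("Statement", [])
        if isinstance(statements, dict):  # a single statement may appear without a list
            statements = [statements]
        if any(statement_is_public(s) for s in statements):
            exposed.append(bucket)
    return sorted(exposed)


# Hypothetical policies keyed by bucket name.
sample_policies = {
    "public-assets": {
        "Statement": [
            {"Effect": "Allow", "Principal": "*", "Action": "s3:GetObject",
             "Resource": "arn:aws:s3:::public-assets/*"}
        ]
    },
    "internal-reports": {
        "Statement": [
            {"Effect": "Allow",
             "Principal": {"AWS": "arn:aws:iam::111122223333:role/reporting"},
             "Action": "s3:GetObject",
             "Resource": "arn:aws:s3:::internal-reports/*"}
        ]
    },
}

print(public_buckets(sample_policies))
```

A scheduled run of a check like this, with its results sent to the same alerting destination as other anomalies, is one way to catch an accidental public grant before someone else does.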

Conclusion

Most companies have adopted the cloud to some extent. But along with the cloud’s numerous benefits comes a challenge: securely using public-facing storage.

Fortunately, it is possible to use cloud storage correctly. Careful planning and forethought when organizing resource storage allow for properly setting and enforcing permissions & access policies, and setting up automated monitoring and reporting can facilitate the detection and interdiction of potential threats in real time. All three of the top-tier CSPs have mature product lines which support these activities.

If any of the information discussed above was not already familiar to you, then perhaps it’s time to audit your cloud storage usage and see where the vulnerabilities are, with the goal of closing the security holes ASAP. It’s far better to discover them on your own now, instead of reading about them later in the media!
