Feedback on AWS and Azure

Misconfigurations in cloud environments remain a source of major incidents and will keep recurring. The news continuously provides new examples: the leak of 1 billion citizens’ data linked to a leaked key, a phishing campaign using a Kaspersky AWS key, a misconfigured NoSQL database, 3 TB of sensitive airport data…

The objective of this article is to illustrate how to anticipate such scenarios by implementing a Control Tower, that is, a tool for continuous supervision of the configuration of Cloud resources.

To begin with, a little theory about logs

Cloud logs can be divided into 3 categories:

System logs: these are generated by the OS and applications hosted in IaaS/CaaS mode. The stakes are no different from those of a classic on-premise IS; only the log collection architecture may need to be adapted.

Security infrastructure admin logs: these include the logs of the security appliances, of the PaaS security services used by the customer, and of the network flows. For the appliances, nothing changes here either: they are the same well-known components already in use. For security PaaS services and network logs, however, it is necessary to implement a specific integration and adapt the detection scenarios.

Cloud Infra API logs: for each API call that creates, modifies or deletes a resource, the Cloud Service Provider generates a log.

These logs are accessible in dedicated managed services such as AWS CloudTrail, AWS Config or the Azure Activity Log.

The time taken to make the logs available will depend on the SLA of the CSP, but they are generally available within 15 minutes after the operation has been carried out.
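To make the third category more concrete, here is a minimal Python sketch that keeps only the calls that changed something from a batch of CloudTrail-style records. It relies on the `eventSource`, `eventName` and `readOnly` fields found in CloudTrail records; the sample records are invented for illustration, and treating a record without a `readOnly` flag as a change is an assumption of this sketch.

```python
from typing import Any

Record = dict[str, Any]


def mutating_events(records: list[Record]) -> list[Record]:
    """Keep records for calls that created, modified or deleted something."""
    # Assumption: a record with no readOnly flag is kept, to stay conservative.
    return [r for r in records if not r.get("readOnly", False)]


# Invented sample records, shaped like a small subset of CloudTrail fields.
sample_records = [
    {"eventSource": "s3.amazonaws.com", "eventName": "CreateBucket", "readOnly": False},
    {"eventSource": "s3.amazonaws.com", "eventName": "ListBuckets", "readOnly": True},
    {"eventSource": "ec2.amazonaws.com", "eventName": "AuthorizeSecurityGroupIngress", "readOnly": False},
]

changes = mutating_events(sample_records)
for record in changes:
    print(record["eventSource"], record["eventName"])
```

Only the records that survive this filter need to be evaluated against compliance rules, which keeps the volume to analyse manageable.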

Exploiting these logs enables you to move from manual, static compliance to automatic, continuous compliance.

What are the technical options for building a Control Tower?

There are three main options for a customer to implement a control tower:

Native (built-in)
Custom native
Cloud Security Posture Management (CSPM)

Native (built-in)

In the first case, the tools are activated by default by the Cloud Service Provider, sometimes free of charge, and use predefined alerts to assess the compliance of your environments and deliver a security score.

Examples include Trusted Advisor on AWS and Microsoft Defender for Cloud on Azure.

These native and non-customized solutions make it possible to initiate a control tower, but they are limited as they are a generic response to specific problems.

Custom native

Cloud providers offer many services that allow customers to build a compliance tool for their infrastructure. The available CSP tools can be customised to create specific compliance alerts and custom dashboards/KPIs.

This option requires allocating 10 to 40 man-days to the project in order to implement the monitoring infrastructure, define the first alerts and build the dashboards.

The use of several tenants, organizations or Clouds will require a specific architecture to be defined as there is no turnkey solution.

CSPM: Cloud Security Posture Management

Wavestone sees a booming CSPM market: MarketsandMarkets estimates that it will more than double between 2022 and 2027, from $4.2 billion to $8.6 billion.

CSPMs natively support numerous Cloud providers and provide their customers with numerous dashboards based on the major market reference frameworks. Customers can also easily define their own standards, policies and alerts.

Deploying this type of tool is very simple: it can be made available to the customer within a few days.

The recurring costs may however be significant: typically 3 – 5% of the Cloud bill in addition to the Cloud services to be activated (similar to the native and custom services option).
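As a rough order of magnitude, the 3 to 5% range can be applied to a Cloud bill as follows. This is only a sketch: the function name and the monthly bill of 200,000 are hypothetical, and the cost of the Cloud services to be activated is not included.

```python
def cspm_cost_range(monthly_cloud_bill: float, low_pct: int = 3, high_pct: int = 5) -> tuple[float, float]:
    """Return the (low, high) monthly CSPM licence estimate for a given Cloud bill."""
    return (monthly_cloud_bill * low_pct / 100, monthly_cloud_bill * high_pct / 100)


# Hypothetical monthly Cloud bill, in the currency of your choice.
low, high = cspm_cost_range(200_000)
print(f"Estimated CSPM cost: {low:,.0f} to {high:,.0f} per month")
```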

Detection will also be slightly slower, as the CSPM SLA adds to the CSP log generation SLA: detection time is typically 20 minutes to 1 hour.

What should my Control Tower monitor?

The major problem customers face when implementing a CSPM with the proposed alerts activated is the generation of tens or even hundreds of thousands of high-criticality alerts to process. Teams don’t know where to start and often feel discouraged. Care must be taken not to overload the security teams!

For the implementation of a control tower on a production Cloud IS, we recommend deploying security controls in waves of 10 – 15 at a time. To do this, you need to prioritise the most important topics. Below is an example of prioritisation:
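The mechanics of deploying in waves can be sketched in a few lines of Python. The control names and priorities here are placeholders, not the prioritisation proposed in this article, and the function name is hypothetical; only the idea of sorting by priority and cutting into batches is illustrated.

```python
def plan_waves(controls: list[tuple[str, int]], wave_size: int = 10) -> list[list[str]]:
    """Sort controls by priority (1 = most important) and split them into waves."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    # sorted() is stable, so controls with equal priority keep their input order.
    ordered = [name for name, _ in sorted(controls, key=lambda c: c[1])]
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


# Placeholder backlog: 25 hypothetical controls with arbitrary priorities 1 to 3.
backlog = [(f"control-{n:02d}", n % 3 + 1) for n in range(25)]
waves = plan_waves(backlog, wave_size=10)
print([len(wave) for wave in waves])
```

Each wave is only activated once the alerts from the previous one are under control, which avoids flooding the teams.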

Unfortunately, every rule has its exceptions! They are mainly linked to the existing Cloud estate, specific architectures or technical constraints. It is therefore essential to anticipate exceptions and the associated governance at the design stage:

Validation: by the local CISO and/or the global CISO
Expiration
Review: decentralised (locally or during annual global audits) or centralised (through continuous global monitoring)

Using tags on cloud resources is currently the easiest way to do this. However, be aware that some resources, such as IAM services, may not be compatible with tagging.
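A tag-based exception can carry its own validation and expiration data. The sketch below assumes two hypothetical tag keys (`compliance-exception-expiry` and `compliance-exception-approved-by`) and an ISO date format; adapt both to your own tagging policy.

```python
from datetime import date

# Hypothetical tag keys; choose names that match your own tagging policy.
EXPIRY_TAG = "compliance-exception-expiry"
APPROVER_TAG = "compliance-exception-approved-by"


def exception_status(tags: dict[str, str], today: date) -> str:
    """Return 'none', 'invalid', 'expired' or 'active' for a resource's tags."""
    if EXPIRY_TAG not in tags:
        return "none"
    if not tags.get(APPROVER_TAG):
        return "invalid"  # an exception must have been validated by someone
    try:
        expiry = date.fromisoformat(tags[EXPIRY_TAG])
    except ValueError:
        return "invalid"  # unreadable date: treated as no valid exception
    return "expired" if expiry < today else "active"


resource_tags = {EXPIRY_TAG: "2024-06-30", APPROVER_TAG: "local-ciso"}
print(exception_status(resource_tags, today=date(2024, 7, 1)))
```

A control would then suppress its alert only when the status is `active`, and raise a dedicated alert when it is `expired` or `invalid`.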

No matter which model is chosen, the issues to be addressed remain mainly the same:

Ensuring the legitimate use and application of exceptions
Defining specific exception indicators on at-risk subjects for top management
Setting up regular exception monitoring campaigns
Alerting on and handling exceptions when they expire

How to implement an effective remediation process?

The implementation of a control tower will generate numerous alerts, which will have to be corrected. The three possible options are listed below:

Deny

Why remediate when you can simply block non-compliant resources preventively?

With Azure Policy or AWS SCP, it is natively possible to block certain configurations and thus avoid generating new alerts.
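To give an idea of what such a preventive rule looks like, here is a sketch of an AWS SCP built as a Python dictionary and serialised to JSON. The choice of actions (preventing CloudTrail trails from being stopped or deleted) and the `Sid` are illustrative assumptions, not a rule taken from this article.

```python
import json

# Illustrative SCP: deny stopping or deleting CloudTrail trails.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectCloudTrail",  # hypothetical statement identifier
            "Effect": "Deny",
            "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
            "Resource": "*",
        }
    ],
}

scp_document = json.dumps(scp, indent=2)
print(scp_document)
```

Once attached to an organizational unit, a policy of this kind blocks the action before it happens, so no alert has to be handled afterwards.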

For use cases that are not covered, it is possible to set up checks on deployment templates in the CI/CD chains (this nevertheless requires a high level of maturity).
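A check on deployment templates can be as simple as a script run by the pipeline before deployment. The sketch below scans a CloudFormation-style template, already loaded as a Python dictionary, for security groups that open SSH or RDP to the internet. It assumes literal values in the template (no intrinsic functions such as `Ref`) and only looks at inline `SecurityGroupIngress` rules; the function names and the sample template are hypothetical.

```python
ADMIN_PORTS = {22, 3389}  # SSH and RDP
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def _rule_exposes_admin_port(rule: dict) -> bool:
    """True if an ingress rule opens SSH or RDP to any address."""
    cidr = rule.get("CidrIp") or rule.get("CidrIpv6")
    if cidr not in OPEN_CIDRS:
        return False
    protocol = str(rule.get("IpProtocol", "")).lower()
    if protocol == "-1":  # all protocols, all ports
        return True
    if protocol not in ("tcp", "6"):
        return False
    low, high = int(rule["FromPort"]), int(rule["ToPort"])
    return any(low <= port <= high for port in ADMIN_PORTS)


def exposed_admin_groups(template: dict) -> list[str]:
    """Return the logical IDs of security groups exposing SSH/RDP to the internet."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        if any(_rule_exposes_admin_port(rule) for rule in rules):
            findings.append(logical_id)
    return findings


# Hypothetical template with one non-compliant and one compliant group.
sample_template = {
    "Resources": {
        "BastionGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {"SecurityGroupIngress": [
                {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"},
            ]},
        },
        "WebGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {"SecurityGroupIngress": [
                {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "CidrIp": "0.0.0.0/0"},
            ]},
        },
    }
}

violations = exposed_admin_groups(sample_template)
print(violations)
```

In a CI/CD chain, a non-empty list of violations would typically fail the build so that the template is fixed before anything is deployed.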

A deny mechanism is rarely deployed on existing environments, as the risk of generating dissatisfaction among development teams is too high:

Existing non-compliant resources can no longer be modified
Development teams bear an additional burden because habits must change
…

Automatic remediation

Here, the aim is to correct deviant configurations directly and automatically, but beware of side effects!

To do this, it is possible to use the cloud provider’s native services (Azure Policy or AWS Systems Manager) or to develop functions for unsupported cases (AWS Lambda, Azure Functions or Azure Logic Apps).
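One way to limit side effects is to separate the decision from the action and to default to a dry run. The sketch below plans the remediation of a storage bucket whose public access block is incomplete, using the four flag names of the S3 public access block configuration. The function names are hypothetical and the actual provider API call is deliberately left out.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def plan_remediation(current: dict[str, bool], has_exception: bool) -> dict[str, bool]:
    """Return the flags that must be switched on; empty if nothing should change."""
    if has_exception:
        return {}  # a validated exception is never remediated automatically
    return {flag: True for flag in REQUIRED_FLAGS if not current.get(flag, False)}


def remediate(bucket: str, current: dict[str, bool], has_exception: bool, dry_run: bool = True) -> str:
    """Describe, and outside dry-run mode apply, the remediation for one bucket."""
    changes = plan_remediation(current, has_exception)
    if not changes:
        return f"{bucket}: nothing to do"
    if dry_run:
        return f"{bucket}: would set {sorted(changes)}"
    # The real provider API call would go here; it is left out of this sketch.
    raise NotImplementedError("apply step not implemented in this sketch")


print(remediate("demo-bucket", {"BlockPublicAcls": True}, has_exception=False))
```

Running in dry-run mode for a few weeks and reviewing the planned changes is a simple way to spot side effects before letting the function act on its own.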

Manual

Unfortunately, this is the most common solution, but also the most expensive in terms of human resources. Deviating configurations are remediated manually by the teams.

To guarantee the success of manual remediation, strong support from top management is needed to ensure the buy-in and motivation of the teams.

Implementing a Cloud OWASP-type dashboard highlighting the priorities of the moment is a good solution, allowing each person to take responsibility for their area. Each of the subjects on the dashboard can have one or more indicators.

However, having the support of management is not sufficient: it is also necessary to know who is responsible for each resource in order to ask them to make the changes. In a large international group, this is not easy. Our recommendation is to appoint at least one security officer per account/subscription, who should have detailed knowledge of the applications and the people responsible for the resources.

In parallel, it is necessary to implement an effective training and awareness programme. In order to minimise the number of alerts and avoid filling the bathtub faster than it empties, the development teams must be fully aware of the security requirements in the cloud.

To begin the remediation process, our advice is to start centrally with an adequately sized team in charge of implementing the control tower, but also of mobilising and training local relays, enabling local teams to monitor and manage compliance on their own.

Compliance alert or security alert?

Most companies consider that monitoring the compliance of their cloud resources is not a responsibility of the SOC teams. But the boundary is not so easy to define, especially given the number of security incidents in the cloud that stem from configuration errors: public exposure of a storage resource containing critical data, unconfigured MFA on an admin account, or RDP or SSH exposed on the internet.

Generating a security alert to the SOC will leverage existing processes and tools for 24/7 handling even if the SOC resources are not cloud experts.
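In practice, the boundary can be made explicit with a short routing table: a handful of high-risk rules go to the SOC, everything else goes to the compliance backlog. The rule identifiers and queue names below are hypothetical and simply mirror the examples given above.

```python
# Hypothetical rule identifiers for misconfigurations treated as security alerts.
SOC_RULES = {
    "storage-public-exposure",
    "admin-without-mfa",
    "rdp-ssh-open-to-internet",
}


def route_finding(rule_id: str) -> str:
    """Return the queue that should handle a finding for the given rule."""
    return "soc" if rule_id in SOC_RULES else "compliance-backlog"


for rule in ("admin-without-mfa", "missing-cost-center-tag"):
    print(rule, "->", route_finding(rule))
```

Keeping the SOC list short preserves 24/7 handling for the cases that really need it, without flooding analysts with routine compliance deviations.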

And finally, this will be a good opportunity to bring Cloud security and SOC teams together to improve security supervision by adapting it to the reality of the cloud.

Compliance in the Cloud, a new Paradigm
