Automated CTI-powered Purple Teams

Purple Teaming has become a key practice for organizations looking to assess and improve their detection and response capabilities. By bringing together offensive and defensive teams, Purple Team exercises help validate security controls, identify detection gaps, and strengthen incident response processes.

However, traditional Purple Team exercises provide only a snapshot of an organization’s security posture at a given time. In dynamic environments, where infrastructure, applications, security controls, and threats change continuously, assessment results can quickly become outdated. Organizations are therefore left with a critical question: are the detections that worked yesterday still effective today? To answer it, Purple Teaming must evolve from a periodic exercise into a continuous validation capability.

This article presents a modular workflow we developed to transform threat intelligence into automated adversary simulations. The workflow combines Caldera for attack orchestration, Mythic for realistic Command and Control simulation, and VECTR for measurable Security Operations Center (SOC) assessments. It only needs to be configured once and can then be executed whenever needed.

Wavestone’s Purple Team vision and expertise

At Wavestone, we have been performing Purple Team exercises for several years to help our clients ensure that their detection methodologies are not only theoretically sound but truly functional in practice.

The primary objective of our Purple Team approach is to identify technical attack scenarios that go undetected by current security controls, and to design tailored detection methods that close those gaps. Rather than simply simulating attacks, we systematically evaluate detection against three criteria: is the activity logged, has an alert been generated from those logs, and has the alert been properly handled by the SOC team.
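These three criteria form an ordered funnel: no alert can fire on activity that was never logged, and the SOC cannot handle an alert that was never raised. The minimal Python sketch below records one test result and reports the first stage at which detection broke down; the `DetectionResult` type and its field names are illustrative assumptions, not part of any tool described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionResult:
    """Outcome of one simulated technique (hypothetical structure)."""
    technique: str  # ATT&CK technique ID, e.g. "T1003.001"
    logged: bool    # Was the activity recorded by a log source?
    alerted: bool   # Did a detection rule raise an alert on those logs?
    handled: bool   # Did the SOC properly handle the alert?


def first_gap(result: DetectionResult) -> str:
    """Return the first failing stage, checked in funnel order."""
    if not result.logged:
        return "not logged"
    if not result.alerted:
        return "logged but no alert"
    if not result.handled:
        return "alert not handled"
    return "detected and handled"


results = [
    DetectionResult("T1003.001", logged=True, alerted=False, handled=False),
    DetectionResult("T1059.001", logged=True, alerted=True, handled=True),
]
for r in results:
    print(r.technique, "->", first_gap(r))
```

Checking the stages in order keeps the diagnosis unambiguous: a missing log source is reported as such, rather than surfacing later as a missing alert.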

Through regular meetings with the Blue Team, this structured assessment allows us to identify both quick wins (fast, high-impact improvements) and major projects that require deeper architectural changes or long-term investment, all tailored to our clients’ environments.

To do so, our Purple Team operations rely on multiple complementary approaches, each with its own strengths and limitations.

Unit Testing

Unit testing is the foundational approach, focused on testing specific, isolated TTPs to validate the effectiveness of individual detection rules. By executing these attacks without context or environmental adaptation, security teams can verify that specific log sources are correctly collected and that correlations and alerts trigger as expected. While highly effective for validating individual controls, unit testing provides a restricted view of an organization’s global defensive posture: success against a single, isolated technique does not guarantee the ability to detect and respond to a complex, multi-stage attack chain.

In addition, unit testing introduces several important biases that can distort the realism of detection assessments. First, it requires a trusted Blue Team contact who provides the necessary assistance and keeps escalation under control. This hinders the identification of some incident response gaps and greatly limits the secrecy of the operation.

Furthermore, once the SOC knows a Purple Team operation is underway, the incident response becomes biased, often for the worse. Since the surprise factor and the pressure of a real incident are absent, these tests do not accurately measure how the SOC would perform under the stress and ambiguity of a genuine, ongoing intrusion.

Trophy-Driven Engagements

Our second approach, trophy-driven engagements, allows us to assess detection through a more realistic scenario. These operations are designed for mature organizations, aiming to evaluate and elevate advanced detection processes and threat hunting capabilities rather than simply validating automated rules.

Similar to a Red Team operation, the offensive team executes a full-scale attack on the information system, pursuing predefined trophies rather than following a test list. One advantage of this approach is the ability to identify end-to-end attack scenarios.

Specifically, our trophy-driven engagements often follow the Red to Purple approach: as long as the Red Team remains undetected, the Blue Team is unaware of the operation, which forces a genuine and unscripted response. This provides a unique opportunity to evaluate the actual reactions of the security team, effectively bridging the gap between theoretical procedures and real incident response.

However, unlike unit testing, these engagements are not exhaustive: they do not aim to map every individual detection rule in the environment, but rather to test the organization’s overall resilience against a defined adversary, combining detection rules, escalation processes, threat hunting, and the team’s correlation capability.

Ultimately, trophy-driven engagements represent the final evolution in the Purple Team lifecycle, shifting the focus from “What are our detection flaws?” to “Would a real attacker actually be detected?”

SOC Assessments

SOC assessments focus on evaluating the operational readiness and performance of the SOC. Unlike the previous approaches, which validate detection, this approach measures the human and procedural capacity to detect, qualify, investigate, and remediate threats. It serves to validate that standard operating procedures and playbooks are effectively followed by analysts, while simultaneously identifying visibility gaps in logging and telemetry across the attack lifecycle.

However, SOC assessments often rely on structured scenarios that create a sense of artificiality. The engagement remains a trigger-and-response exercise, merely layered with procedural validation and human factor evaluation.

Because these tests are centered on known, pre-planned triggers, they do not force analysts to perform deep, investigative log correlation or to detect anomalous patterns across multiple, seemingly benign events. Such tests are still designed to “evaluate what is working today” rather than “what must work tomorrow”.

Finally, this scripted nature leaves no room for genuine Threat Hunting. The SOC is never pushed to proactively uncover the adversary’s long-term plan. By focusing on reactive playbook execution rather than the ambiguity of an evolving campaign, these assessments overlook an important aspect of human factor evaluation: whether analysts can detect a sophisticated threat that does not trigger a predefined, “noisy” alert.

The “T-Time Trap”

Despite their differences, all three approaches suffer from the same fundamental limitation: they evaluate an organization’s security posture at a specific point in time.

Modern information systems are constantly evolving. Infrastructure migrations, cloud transformations, software deployments, and changes to security tooling can all affect the effectiveness of detection and response capabilities. A detection rule validated during a Purple Team exercise may no longer function as expected following a routine infrastructure change.

At the same time, threat actors continuously adapt their tactics, techniques, and procedures. With the rise of AI-augmented attacks, the threat landscape shifts constantly: what was secure yesterday can be obsolete today.

Consequently, organizations must regularly reassess their defensive capabilities to ensure they remain aligned with the evolving threat landscape.

Yet the cost, complexity, and manual effort associated with traditional Purple Team engagements often prevent organizations from performing assessments at the required frequency. This creates a gap between security validation and operational reality, leaving defenders with only a periodic view of their true defensive readiness.

Empowering the Defense: Self-Service & CTI-Driven Automation

The limitations of traditional Purple Teaming raise an important question: how can organizations validate their defensive capabilities more frequently without significantly increasing costs and operational overhead?

The answer lies in shifting from consultant-driven assessments to defender-driven validation. Rather than waiting for periodic Purple Team engagements, security teams should be able to continuously assess their detection and response capabilities whenever needed.

CTI as the Engine of the Workflow

Cyber Threat Intelligence (CTI) provides a valuable source of information on how threat actors operate. By documenting adversaries’ tactics, techniques, and procedures (TTPs), CTI enables organizations to move beyond generic attack simulations and focus on realistic threat scenarios relevant to their environment.

Instead of being treated as static reports consumed once and archived, CTI can serve as the foundation for repeatable defensive assessments. Every newly identified technique, campaign, or threat actor profile can become an opportunity to validate existing security controls and identify detection gaps.

Translating TTPs into Automated Scenarios

While CTI identifies what adversaries do, organizations still need a way to reproduce those behaviors in a controlled and repeatable manner.

By translating documented TTPs into automated attack scenarios, security teams can continuously test their ability to detect and investigate activities associated with specific threat actors. While this translation effort must be performed once, the resulting scenarios can be executed repeatedly with minimal overhead, allowing organizations to validate their defenses whenever needed.
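As a minimal sketch of this one-time translation step, assuming a local ability library indexed by ATT&CK technique ID, the snippet below turns the techniques listed in a CTI report into an adversary profile whose fields mirror those of Caldera adversary YAML files (`id`, `name`, `description`, `atomic_ordering`). The ability IDs and the `ABILITY_INDEX` mapping are hypothetical placeholders. Techniques with no matching ability are returned separately, since those are the ones that still require manual implementation.

```python
import json
import uuid

# Hypothetical library index: ATT&CK technique ID -> IDs of abilities implementing it.
ABILITY_INDEX = {
    "T1082": ["ability-system-info"],
    "T1059.001": ["ability-psh-inmemory"],
    "T1003.001": ["ability-lsass-dump"],
}


def build_adversary(name, cti_techniques, index):
    """Build a Caldera-style adversary profile from CTI technique IDs.

    Returns (profile, unmapped), where unmapped lists the techniques
    that have no ability in the library yet.
    """
    ordering, unmapped = [], []
    for technique in cti_techniques:
        abilities = index.get(technique)
        if not abilities:
            unmapped.append(technique)
            continue
        for ability_id in abilities:
            if ability_id not in ordering:  # keep each ability once, in report order
                ordering.append(ability_id)
    profile = {
        "id": str(uuid.uuid4()),
        "name": name,
        "description": "Generated from CTI techniques: " + ", ".join(cti_techniques),
        "atomic_ordering": ordering,
    }
    return profile, unmapped


if __name__ == "__main__":
    profile, todo = build_adversary(
        "Example threat actor", ["T1082", "T1059.001", "T1566.001"], ABILITY_INDEX
    )
    print(json.dumps(profile, indent=2))
    print("Techniques still to implement:", todo)
```

In practice, the resulting dictionary would be serialized to YAML alongside the plugin’s other adversary definitions. Keeping report order is a simplification; a real profile may need reordering so that discovery abilities run before the abilities that consume their results.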

This approach significantly reduces the manual effort traditionally required to prepare and execute Purple Team exercises while ensuring consistency across assessments.

Enabling Autonomous Defensive Assessments

Automation empowers the Blue Team to operate more autonomously. Instead of depending on external engagements or dedicated Red Team resources, defenders can execute assessments themselves whenever operational changes occur.

For example, assessments can be triggered following major infrastructure migrations, the deployment of new security controls, or the publication of threat intelligence related to a relevant adversary.
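One way to wire such triggers is to have the event source (for instance a change-management hook or a CTI feed) start a Caldera operation programmatically. The sketch below assumes Caldera’s v2 REST API, where `POST /api/v2/operations` creates an operation and the API key is sent in a `KEY` header. The body fields shown (`adversary.adversary_id`, `source.id`, `planner.id`, `auto_close`) should be checked against the API documentation of your Caldera version, and the trigger names are hypothetical labels for the three events listed above.

```python
import json
import urllib.request

# Hypothetical labels for the trigger events described above.
TRIGGERS = {"infrastructure_migration", "new_security_control", "relevant_cti_published"}


def build_operation_request(base_url, api_key, trigger, adversary_id, source_id, planner_id):
    """Prepare (but do not send) a request that starts a Caldera operation."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    body = {
        "name": f"auto-{trigger}",
        "adversary": {"adversary_id": adversary_id},
        "source": {"id": source_id},      # fact source holding environment values
        "planner": {"id": planner_id},
        "auto_close": True,               # finish once the planner has nothing left to run
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v2/operations",
        data=json.dumps(body).encode("utf-8"),
        headers={"KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires a reachable Caldera server and real IDs):
# req = build_operation_request("http://caldera.local:8888", "<api key>",
#                               "infrastructure_migration", "<adversary id>",
#                               "<fact source id>", "<planner id>")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Separating request construction from sending keeps the trigger logic testable without a live server.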

This self-service approach enables organizations to validate their defensive posture at the required frequency, ensuring that detection capabilities remain aligned with both infrastructure changes and the evolving threat landscape.

Overcoming Market Automation Limits: The Caldera & Mythic Integration

While attack orchestration frameworks already exist, they often come with operational limitations. For instance, Caldera relies on generic agents that do not implement advanced Command and Control (C2) capabilities such as in-memory PowerShell execution, Inline Assembly execution, or Beacon Object Files (BOFs). As a result, while Caldera excels at automating adversary emulation scenarios, it may not accurately reproduce the tradecraft employed by sophisticated threat actors. Furthermore, in environments where realism is a key objective, the presence and behavior of the Caldera agent may allow defenders to quickly identify the exercise, limiting the fidelity of the assessment.

Conversely, modern Command and Control frameworks such as Mythic provide realistic adversary simulation capabilities and advanced execution methods, but they lack the orchestration and automation features required to perform repeatable Purple Team assessments at scale.

To bridge this gap, we developed the Mythic plugin for Caldera, which integrates directly with the Mythic C2 framework. The objective was to combine Caldera’s automation and orchestration capabilities with Mythic’s realistic Command and Control capabilities. Within this architecture, Caldera remains responsible for orchestrating CTI-driven attack scenarios, while Mythic provides the execution layer used to simulate advanced adversary tradecraft.

This integration enables organizations to automate complex attack chains while maintaining a level of realism closer to that of real-world intrusions.

Mythic Caldera plugin: Adversary Emulation Library

The plugin extends Caldera by integrating Mythic C2 and providing custom adversary profiles, fact sources, payloads and parsers. Together, these components enable operators to quickly turn threat intelligence into automated adversary emulation scenarios while significantly reducing the need for manual configuration.

The plugin includes 5 custom adversary profiles, each designed to emulate a distinct threat model and associated attacker tradecraft:

Insider: Simulates an internal attacker, such as a Windows Administrator, with TTPs implemented exclusively using Windows living-off-the-land binaries (LOLBins).
Cybercrime: Simulates an opportunistic attacker leveraging publicly available offensive tools and remote attack techniques conducted through the Mythic SOCKS5 proxy infrastructure.
APT: Simulates a sophisticated threat actor using advanced tradecraft, including low-level Windows API calls, Apollo built-in commands, and in-memory payload execution techniques.
Linux – Insider: Simulates an internal attacker, such as a Linux Administrator, with TTPs implemented exclusively using native Linux commands and utilities.
Linux – Cybercrime: Simulates an opportunistic attacker targeting Linux environments, with TTPs implemented using common open-source offensive tools.

To improve reusability, the plugin leverages Caldera fact sources to dynamically parameterize abilities. Instead of hardcoding environment-specific values, facts such as domain names, IP addresses, credentials, payloads, or operational parameters are injected at runtime. This approach allows the same adversary profile to be reused across multiple environments with minimal modifications.
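Caldera expresses these runtime parameters as `#{trait}` placeholders inside ability commands. The minimal sketch below shows the substitution idea with a single value per trait and fails loudly when a fact is missing. Caldera itself is more elaborate (for example, it can generate one link per combination of available fact values), so this illustrates the principle rather than its actual algorithm; the trait names are examples.

```python
import re

# Matches Caldera-style placeholders such as #{domain.name} or #{remote.host.ip}.
FACT_PATTERN = re.compile(r"#\{([\w.]+)\}")


def render_command(template, facts):
    """Replace every #{trait} in template with its value from facts."""
    missing = sorted({t for t in FACT_PATTERN.findall(template) if t not in facts})
    if missing:
        raise KeyError("missing facts: " + ", ".join(missing))
    return FACT_PATTERN.sub(lambda m: str(facts[m.group(1)]), template)


# The same ability template, reused in two environments by swapping the facts.
template = "nltest /dsgetdc:#{domain.name}"
print(render_command(template, {"domain.name": "corp.local"}))
print(render_command(template, {"domain.name": "lab.example"}))
```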

The library also includes a collection of payloads and parsers used to support advanced attack simulations. Payloads are automatically synchronized with Mythic and can be leveraged by abilities during operation execution, while parsers dynamically extract information from command outputs and transform it into facts that can be consumed by subsequent abilities.
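To illustrate the parser side, here is a standalone sketch that extracts IPv4 addresses from a command’s stdout and turns them into facts that a later ability could consume, for example through a `#{remote.host.ip}` placeholder. In Caldera, parsers are plugin classes exposing a `parse` method that returns fact relationships; the `Fact` class and the trait name below are simplified stand-ins, not Caldera’s own objects.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Simplified stand-in for a Caldera fact (trait/value pair)."""
    trait: str
    value: str


# Loose candidate pattern; each match is then validated with the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def parse_ipv4_facts(stdout, trait="remote.host.ip"):
    """Turn unique, non-loopback IPv4 addresses found in stdout into facts."""
    facts, seen = [], set()
    for candidate in IPV4_CANDIDATE.findall(stdout):
        try:
            address = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. "999.1.1.1" is not a valid address
        if address.is_loopback or str(address) in seen:
            continue
        seen.add(str(address))
        facts.append(Fact(trait, str(address)))
    return facts


sample = "Reply from 10.0.0.5\nReply from 10.0.0.5\nGateway 10.0.0.1\nlo 127.0.0.1"
print(parse_ipv4_facts(sample))
```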

Finally, the plugin provides a growing library of more than 180 reusable abilities covering a wide range of ATT&CK techniques. These abilities can be combined into adversary profiles or executed individually to validate specific detections and response procedures.

Mythic Caldera plugin: Caldera-Mythic Integration

At the core of the integration are two command-line interfaces (CLIs): apollo_exec.py and athena_exec.py. These CLIs interface with the Mythic API and are used by the Caldera agent, Sandcat, to programmatically task Apollo (Windows) and Athena (Linux) agents.

For example, the Apollo CLI takes a Mythic callback ID, a command, and optional arguments, and supports additional options that extend its execution behavior:

-uploads: upload files before execution
-downloads: download files after execution
-deletes: remove files after execution
-ps: import a PowerShell script in-memory before execution
-pid: specify a target process ID for process injection
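To make this interface concrete, the sketch below reconstructs the argument surface described above with `argparse` and orders the implied steps: uploads and in-memory PowerShell import before execution, downloads and deletions after. It is a hypothetical reconstruction, not the plugin’s actual apollo_exec.py. The option spellings follow the list above, while `nargs`, types, and step labels are assumptions, and the Mythic API call that would submit each step is deliberately left out.

```python
import argparse


def build_parser():
    """Argument surface of a hypothetical apollo_exec.py-like CLI."""
    parser = argparse.ArgumentParser(prog="apollo_exec.py")
    parser.add_argument("callback", type=int, help="Mythic callback ID")
    parser.add_argument("command", help="Apollo command to run, e.g. shell")
    parser.add_argument("args", nargs="?", default="", help="optional command arguments")
    parser.add_argument("-uploads", nargs="*", default=[], help="files to upload before execution")
    parser.add_argument("-downloads", nargs="*", default=[], help="files to download after execution")
    parser.add_argument("-deletes", nargs="*", default=[], help="files to remove after execution")
    parser.add_argument("-ps", help="PowerShell script to import in memory before execution")
    parser.add_argument("-pid", type=int, help="target process ID for injection")
    return parser


def to_task_plan(ns):
    """Order the steps implied by the options around the main command."""
    steps = [("upload", path) for path in ns.uploads]
    if ns.ps:
        steps.append(("import_powershell", ns.ps))
    steps.append(("execute", f"{ns.command} {ns.args}".strip()))
    steps += [("download", path) for path in ns.downloads]
    steps += [("delete", path) for path in ns.deletes]
    return {"callback": ns.callback, "pid": ns.pid, "steps": steps}


if __name__ == "__main__":
    print(to_task_plan(build_parser().parse_args()))
```

For instance, `apollo_exec.py 5 execute_assembly -uploads Rubeus.exe -deletes Rubeus.exe` would yield an upload, the execution, then a cleanup step.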

To streamline the interaction between Caldera and Mythic, the plugin implements two core functionalities:

Connect C2: generates the apollo_exec.py and athena_exec.py CLIs from the provided Mythic C2 configuration parameters, enabling communication with the Mythic API.
Sync Payloads: automatically registers the payloads required by Caldera operations on Mythic, including .NET assemblies, DLLs, executables, and Beacon Object Files (BOFs).

Mythic Caldera plugin: Execution Workflow

Within our workflow, MITRE Caldera is used as an orchestration platform rather than a traditional Command and Control (C2) server. The Caldera agent (Sandcat) is deployed on the same host as the Caldera server and is responsible for coordinating the execution of attack scenarios. Instead of executing abilities directly, it delegates their execution to the Mythic C2 infrastructure.

Depending on the nature of the technique being executed, TTPs are handled through one of two execution paths:

Network-based attacks: network-oriented TTPs, such as lateral movement or remote service interactions, are executed by the Caldera agent through a SOCKS5 proxy exposed by Mythic. Traffic is routed through the Apollo agent using tools such as proxychains.

System execution attacks: host-based TTPs are executed directly on compromised systems through Mythic agents. In this scenario, the Caldera agent uses the apollo_exec.py CLI to interact with the Mythic API, tasking the Apollo agent with the requested action.
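The routing decision between these two paths can be sketched as a function that builds the command line Sandcat would run on the Caldera host. This is a hypothetical illustration: the `kind` labels, the `python3 apollo_exec.py <callback> <command> [args]` invocation (following the CLI arguments described earlier), and the example network tool are assumptions. A proxychains configuration pointing at the SOCKS5 port exposed by Mythic is assumed to exist already (the binary is named proxychains4 on some distributions).

```python
import shlex


def build_invocation(kind, command, callback_id, args=""):
    """Return the argv Sandcat would execute for one ability (sketch).

    kind="network": run a network tool locally, tunnelled through the
        Mythic SOCKS5 proxy via proxychains.
    kind="host": task the Apollo callback through the Mythic API CLI.
    """
    if kind == "network":
        return ["proxychains", *shlex.split(command)]
    if kind == "host":
        argv = ["python3", "apollo_exec.py", str(callback_id), command]
        if args:
            argv.append(args)  # passed as a single argument, parsed by the CLI
        return argv
    raise ValueError(f"unknown execution path: {kind}")


print(build_invocation("network", "nxc smb 10.0.0.5 -u admin -p 'P@ss w0rd'", 1))
print(build_invocation("host", "shell", 4, "whoami /all"))
```

Building an argument list rather than a shell string avoids quoting issues when arguments contain spaces.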

Objective Measurement: Assessing SOC Progress Using VECTR

A major limitation of tools such as Caldera is their Red-Team-centric design. While they excel at orchestrating and executing attacks, they do not provide a user-friendly interface for Blue Team analysts to review, enrich, and track assessment results. Consequently, accessing and interpreting the outcome of Purple Team exercises can become tedious, particularly when multiple operations are conducted over time.

To address this challenge, we integrated VECTR into our workflow. VECTR is a Purple Team platform designed to centralize attack and detection data, providing a common operational picture for both Red and Blue Teams. By correlating adversary actions with defensive observations, it enables organizations to objectively measure detection capabilities and track their evolution over time.

To streamline this process, we developed the VECTR plugin for Caldera. Once triggered by the operator, the plugin automatically exports completed operations to VECTR as campaigns, enabling the automatic generation of attack graphs and MITRE ATT&CK heatmaps while eliminating hours of manual reporting effort.

VECTR Caldera plugin: Campaign Creation

The plugin extends Caldera by exporting completed operations as VECTR campaigns. During the export process, the plugin transfers operation steps, execution status, executed commands, MITRE ATT&CK technique mappings, timestamps, and command outputs (stdout/stderr).

The plugin displays available Caldera operations along with their execution status. Once an operation is completed, the operator can trigger the export with a single click after providing the VECTR connection parameters. The export process is performed asynchronously to avoid blocking the Caldera execution thread.

Once exported, the operation appears as a campaign in VECTR. To maintain traceability between both platforms and ensure campaign uniqueness, the campaign name is composed of the Caldera operation name followed by the first 8 characters of the corresponding operation identifier.

VECTR Caldera plugin: Campaign Enrichment

Each ability included in a Caldera operation is mapped to a corresponding test case within the VECTR campaign. As a result, every test case is automatically enriched with relevant Red Team information, including the associated ATT&CK technique, execution status, timestamps, commands, and operational metadata.

For abilities that were executed, command outputs (stdout/stderr) are exported to VECTR and attached as Red Team logs. These logs provide analysts with detailed visibility into the actions performed during the assessment and can be reviewed to better understand the execution flow and investigate detection opportunities.
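The mapping described in the last two sections (campaign naming, one test case per ability, Red Team logs attached only for executed abilities) can be sketched as follows. The input is a simplified, already-decoded view of a Caldera operation and the output is a neutral structure; neither reproduces the exact Caldera link schema nor VECTR’s API. The space used to join the operation name and the ID prefix is an assumption, since only the name and the first 8 characters are specified.

```python
from dataclasses import dataclass, field


@dataclass
class TestCase:
    """Neutral stand-in for a VECTR test case (hypothetical fields)."""
    name: str
    technique: str
    status: str
    command: str
    started: str
    finished: str
    red_team_logs: list = field(default_factory=list)


def campaign_name(operation_name, operation_id):
    # Operation name followed by the first 8 characters of its ID.
    return f"{operation_name} {operation_id[:8]}"


def to_campaign(operation):
    """Map a simplified Caldera operation to a campaign with test cases."""
    cases = []
    for link in operation["links"]:
        case = TestCase(
            name=link["ability_name"],
            technique=link["technique_id"],
            status=link["status"],
            command=link["command"],
            started=link.get("started", ""),
            finished=link.get("finished", ""),
        )
        if link["executed"]:  # only executed abilities produce output
            case.red_team_logs = [f"stdout: {link.get('stdout', '')}",
                                  f"stderr: {link.get('stderr', '')}"]
        cases.append(case)
    return {"name": campaign_name(operation["name"], operation["id"]), "test_cases": cases}


operation = {
    "name": "APT emulation",
    "id": "3f2b9c1e-8d4a-4b6e-9f00-1a2b3c4d5e6f",
    "links": [
        {"ability_name": "System info", "technique_id": "T1082", "status": "success",
         "command": "systeminfo", "executed": True, "stdout": "Host Name: WS01", "stderr": ""},
        {"ability_name": "LSASS dump", "technique_id": "T1003.001", "status": "skipped",
         "command": "<command>", "executed": False},
    ],
}
print(to_campaign(operation)["name"])
```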

Bringing It All Together: End-to-End Demonstration

The following video brings together all components presented throughout this article, illustrating an end-to-end automated Purple Team assessment workflow, from automated adversary emulation with Caldera and Mythic to the visualization of adversary activities and operational results within VECTR.

What’s next?

While the workflow significantly reduces the effort required to conduct Purple Team assessments, one manual step remains: translating threat intelligence into executable Caldera abilities and adversary profiles.

Today, this process requires analysts to review CTI reports, identify relevant TTPs, and manually implement the corresponding abilities within the adversary emulation library. Although this effort only needs to be performed once for each technique, it remains dependent on human expertise and can become time-consuming when operationalizing large volumes of threat intelligence.

Future work will focus on leveraging Artificial Intelligence to automate this process. By combining large language models with ATT&CK knowledge and existing ability templates, CTI reports could be automatically transformed into executable Caldera abilities, significantly accelerating the operationalization of threat intelligence and further reducing the effort required to maintain an up-to-date adversary emulation library.

This would complete the automation chain, enabling organizations to move from threat intelligence acquisition to automated adversary emulation and SOC assessment with minimal human intervention.
