AWS Detection Engineering: High-Signal Rules from CloudTrail and GuardDuty
A practical method for turning CloudTrail and GuardDuty into detections that fire on real attacker behavior, not noise. We map TTPs to ATT&CK, baseline normal, enrich for context, and wire findings to automated response.

Most AWS detection programs drown in their own telemetry. A single busy account emits tens of thousands of CloudTrail events an hour, and the naive instinct is to alert on anything that looks scary: a DeleteBucket here, an AttachUserPolicy there. The result is an alert queue nobody trusts and analysts who mute the channel by week two. Detection engineering is the discipline of inverting that: you start from how an attacker actually operates in AWS, express that behavior as a precise query over the right log source, and ruthlessly tune until the rule only fires when something is genuinely worth waking a human for.
This article walks the full loop for AWS-native detection: choosing CloudTrail management versus data events as your substrate, treating GuardDuty as a high-confidence signal rather than a raw feed, building rules from TTPs mapped loosely to MITRE ATT&CK for cloud, baselining and enriching to kill false positives, and finally turning a validated detection into automated response with EventBridge and Lambda. No third-party SIEM is required; everything here is AWS-native.
Know your substrate: CloudTrail management vs data events
CloudTrail records two broad classes of activity, and conflating them is the most common reason detections are either blind or unaffordable. Management events capture control-plane operations: IAM changes, security group edits, CreateAccessKey, AttachUserPolicy, UpdateAssumeRolePolicy, ConsoleLogin. These are low-volume and high-value, and CloudTrail records them by default in every account's 90-day event history, although you still need a trail to deliver them to S3 or CloudWatch Logs where detections can run. This is where identity and persistence attacks surface. Data events capture object-level activity: S3 GetObject, Lambda Invoke, DynamoDB item access. They are where exfiltration shows up, but they are high-volume and billed per event, so you enable them surgically on sensitive buckets and tables rather than account-wide.
- Management events: identity abuse, persistence, defense evasion, privilege escalation. Always on, cheap, your detection bread and butter.
- Data events: exfiltration and access-to-data TTPs. Enable selectively via advanced event selectors scoped to crown-jewel resources to control cost and noise.
- Organization trails: aggregate every member account into one S3 bucket so a compromised account cannot simply stop its own local trail and go dark.
- Centralize into a security-tooling account with the trail bucket locked down by SCP and an immutable Object Lock policy so an attacker who lands in prod cannot delete the evidence.
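The selective data-event scoping described in the list above can be sketched as a small helper that builds advanced event selectors. This is a minimal illustration, not a definitive configuration: the bucket name `example-crown-jewels` and the trail name in the trailing comment are hypothetical, and the selector keeps all management events alongside S3 object-level data events for the named buckets only.

```python
import json


def build_event_selectors(bucket_names):
    """Return advanced event selectors: all management events plus
    S3 object-level data events for the named buckets only."""
    # A trailing slash scopes the match to objects inside each bucket.
    prefixes = [f"arn:aws:s3:::{name}/" for name in bucket_names]
    return [
        {
            "Name": "All management events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Management"]},
            ],
        },
        {
            "Name": "S3 data events on crown-jewel buckets",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
                {"Field": "resources.ARN", "StartsWith": prefixes},
            ],
        },
    ]


# Hypothetical bucket name, for illustration only.
selectors = build_event_selectors(["example-crown-jewels"])
print(json.dumps(selectors, indent=2))

# To apply (needs boto3 and credentials; this call replaces the trail's
# existing selectors, which is why management events are listed too):
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="example-org-trail", AdvancedEventSelectors=selectors)
```

Keeping the selectors in code makes the set of monitored buckets reviewable, so adding a new crown-jewel resource is a visible change rather than a console click.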
For querying, you have two AWS-native paths. CloudWatch Logs Insights works well for near-real-time hunting if you ship the trail to a log group. For historical hunting over the S3 archive at scale, Athena over the CloudTrail table is the cost-effective choice. The detection logic is the same; only the dialect changes.
Build from TTPs, not from event names
A good detection encodes a behavior, not a single API call. AttachUserPolicy on its own is routine; platform teams do it constantly. What is anomalous is a principal that was created minutes ago immediately minting itself an access key and attaching AdministratorAccess. That sequence maps to ATT&CK Persistence (T1098 Account Manipulation) and Privilege Escalation, and it is the combination plus the newness of the actor that makes it high-signal. The query below hunts the correlated part of that pattern over CloudTrail in CloudWatch Logs Insights; the age of the actor is not something the query checks, so it has to come from enrichment.
# Scope to IAM calls that grant privilege or mint credentials
fields @timestamp, userIdentity.arn as actor, eventName, requestParameters.policyArn as policy, sourceIPAddress
| filter eventSource = "iam.amazonaws.com"
| filter eventName in ["CreateAccessKey", "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy", "CreateLoginProfile"]
| filter errorCode not in ["AccessDenied", "UnauthorizedOperation"]
# Keep policy attachments only when the policy ARN looks broad
| parse requestParameters.policyArn /(?<scope>Administrator|PowerUser|FullAccess|\*)/
| filter eventName = "CreateAccessKey" or ispresent(scope)
# Correlate: two or more qualifying actions by the same actor and source IP
| stats count() as actions,
    earliest(@timestamp) as firstSeen,
    latest(@timestamp) as lastSeen,
    count_distinct(eventName) as distinctApis,
    latest(policy) as lastPolicy
  by actor, sourceIPAddress
| filter actions >= 2
| sort lastSeen desc

The shape matters more than the exact lines. We scope to the IAM event source, allowlist only the privilege-granting and credential-creating calls, drop AccessDenied so failed probes do not flood the rule (track those separately as a recon signal), and require at least two correlated actions by the same actor and source IP inside the query's time window. Logs Insights has no list-valued aggregate, so the query reports the number of distinct APIs and the most recent policy ARN; pull the raw events for a matched actor to see the full sequence. That correlation is what converts a noisy single-event alert into something an analyst will believe. Run the same pattern in Athena against the CloudTrail S3 table for retrospective hunts when a new IOC drops.
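For the retrospective Athena hunt, a rough translation of the same pattern might look like the sketch below. It assumes a table named `cloudtrail_logs` created with the commonly documented CloudTrail schema, where `useridentity` is a struct, `requestparameters` is a JSON string, and `eventtime` is an ISO 8601 string; adjust the names to your own table definition. The helper only builds the SQL text; submitting it to Athena is left out.

```python
import re

HUNTED_EVENTS = (
    "CreateAccessKey", "AttachUserPolicy", "AttachRolePolicy",
    "PutUserPolicy", "CreateLoginProfile",
)


def build_athena_hunt(table: str, start_iso: str) -> str:
    """Build an Athena query mirroring the Logs Insights rule.

    `table` and `start_iso` are validated because they are interpolated
    into the SQL text rather than bound as parameters.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", start_iso):
        raise ValueError(f"expected a UTC ISO 8601 timestamp: {start_iso!r}")
    events = ", ".join(f"'{name}'" for name in HUNTED_EVENTS)
    return f"""
SELECT useridentity.arn AS actor,
       sourceipaddress,
       count(*) AS actions,
       min(eventtime) AS first_seen,
       max(eventtime) AS last_seen,
       array_agg(DISTINCT eventname) AS apis
FROM {table}
WHERE eventsource = 'iam.amazonaws.com'
  AND eventname IN ({events})
  AND (errorcode IS NULL
       OR errorcode NOT IN ('AccessDenied', 'UnauthorizedOperation'))
  AND eventtime >= '{start_iso}'
  AND (eventname = 'CreateAccessKey'
       OR regexp_like(json_extract_scalar(requestparameters, '$.policyArn'),
                      'Administrator|PowerUser|FullAccess'))
GROUP BY useridentity.arn, sourceipaddress
HAVING count(*) >= 2
ORDER BY last_seen DESC
""".strip()


# Hypothetical table name and start time, for illustration only.
query = build_athena_hunt("cloudtrail_logs", "2024-01-01T00:00:00Z")
print(query)
```

Unlike Logs Insights, Athena can aggregate the distinct API names per actor with `array_agg`, which is convenient when reviewing a long historical window.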
The other high-signal classics
A handful of management-event patterns deserve a dedicated rule in every environment because they are both rare in normal operations and central to real intrusions. Tune the principal allowlists to your account, but the events themselves are the canonical ones to watch.
- DeleteTrail, StopLogging, UpdateTrail, and PutEventSelectors that narrow coverage: ATT&CK Defense Evasion (T1562.008 Disable or Modify Cloud Logs). Anyone turning off the lights is either an attacker or a misconfiguration you also want to know about.
- PutBucketPolicy or PutBucketAcl that introduces a Principal of "*" or an allow to a foreign account, or a DeletePublicAccessBlock call that removes the guardrail against such grants: resource exposure and potential exfiltration staging.
- ConsoleLogin where additionalEventData.MFAUsed = "No" on a human principal, especially from a new ASN or geography: credential abuse.
- UpdateAssumeRolePolicy that adds an external account or overly broad principal to a trust policy: a stealthy cross-account persistence and privilege-escalation path that bypasses key rotation entirely.
- GetSecretValue or kms:Decrypt spikes from a principal that has never touched those secrets before: staging for exfiltration.
Treat DeleteTrail and StopLogging as page-immediately, not triage-tomorrow. They are almost never legitimate in a mature account, and they are the move an attacker makes right before the part you most need logged. Wire them straight to automated response and an on-call notification, and skip the queue.
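The trust-policy check from the list above lends itself to a small parser. The sketch below assumes the CloudTrail record for UpdateAssumeRolePolicy carries the new trust policy as a JSON string under `requestParameters.policyDocument`, and that you maintain your own set of trusted account IDs; the function names and account numbers are hypothetical.

```python
import json


def account_of(principal: str) -> str:
    """Extract the account ID from an ARN or a bare 12-digit account ID."""
    if principal.isdigit() and len(principal) == 12:
        return principal
    parts = principal.split(":")
    return parts[4] if len(parts) > 4 else ""


def external_principals(event: dict, trusted_accounts: set) -> list:
    """Return AWS principals in an UpdateAssumeRolePolicy event that are
    wildcards or belong to accounts outside trusted_accounts."""
    if event.get("eventName") != "UpdateAssumeRolePolicy":
        return []
    doc = (event.get("requestParameters") or {}).get("policyDocument", "{}")
    policy = json.loads(doc) if isinstance(doc, str) else doc
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be unwrapped
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            flagged.append("*")
            continue
        if not isinstance(principal, dict):
            continue
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        for p in aws:
            if p == "*" or account_of(p) not in trusted_accounts:
                flagged.append(p)
    return flagged


# Hypothetical event: one trusted and one foreign account in the trust policy.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam::111111111111:root",
                              "arn:aws:iam::999999999999:root"]},
        "Action": "sts:AssumeRole",
    }],
}
sample_event = {
    "eventName": "UpdateAssumeRolePolicy",
    "requestParameters": {"roleName": "example-role",
                          "policyDocument": json.dumps(trust_policy)},
}
print(external_principals(sample_event, {"111111111111"}))
```

Service principals (the `Service` key) are deliberately ignored here; whether to flag those is a tuning decision for your environment.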
GuardDuty as a signal source, not a firehose
GuardDuty is already a detection engine: it applies ML and threat intel to CloudTrail, VPC Flow Logs, DNS logs, and (with the relevant protection plans) S3, EKS audit logs, Lambda network activity, RDS login activity, and malware scans on EBS. The mistake is treating its findings as raw events to re-detect. Instead, consume findings as enriched, scored signals: each carries a severity, a resource, an actor, and an ATT&CK-aligned finding type such as PrivilegeEscalation:IAMUser/AnomalousBehavior or Exfiltration:S3/AnomalousBehavior. Your job as a detection engineer is to route, enrich, and correlate them, not to rebuild them.
- Use GuardDuty severity as a first-pass filter, but do not blindly trust the band. Fold in your own context (is the resource a crown jewel, is the actor a service role) to re-rank.
- Correlate GuardDuty findings with your CloudTrail rules: a GuardDuty UnauthorizedAccess finding plus your own CreateAccessKey rule firing on the same principal is a far stronger signal than either alone.
- Enable GuardDuty at the organization level with auto-enable for new accounts so coverage cannot silently lapse as the org grows.
- Suppress known-benign finding patterns with suppression rules rather than ignoring whole finding types, so you keep the signal while cutting the recurring noise.
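The re-ranking and correlation ideas in the list above can be sketched with a small normalized signal type. Everything here is illustrative: the `Signal` shape is a hypothetical normalization you would produce upstream from GuardDuty findings and your own rule hits, and the score adjustments and 30-minute window are arbitrary starting points, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    source: str      # "guardduty" or "cloudtrail-rule"
    principal: str   # ARN or user name, normalized upstream
    name: str        # finding type or rule name
    severity: float  # numeric severity, clamped to 0-10 when re-ranked
    time: datetime


def rerank(signal: Signal, crown_jewel: bool = False,
           service_role: bool = False) -> float:
    """Adjust the raw severity with local context (weights are arbitrary)."""
    score = signal.severity
    if crown_jewel:
        score += 2.0
    if service_role:
        score -= 1.0
    return max(0.0, min(10.0, score))


def correlate(signals, window: timedelta = timedelta(minutes=30)):
    """Return (guardduty, rule) pairs on the same principal whose
    timestamps fall within `window` of each other."""
    findings = [s for s in signals if s.source == "guardduty"]
    rule_hits = [s for s in signals if s.source == "cloudtrail-rule"]
    return [
        (g, r)
        for g in findings
        for r in rule_hits
        if g.principal == r.principal and abs(g.time - r.time) <= window
    ]


# Hypothetical signals for illustration.
t0 = datetime(2024, 1, 1, 12, 0)
signals = [
    Signal("guardduty", "user/alice", "UnauthorizedAccess", 5.0, t0),
    Signal("cloudtrail-rule", "user/alice", "new-key-plus-admin", 7.0,
           t0 + timedelta(minutes=10)),
    Signal("cloudtrail-rule", "user/bob", "new-key-plus-admin", 7.0, t0),
]
pairs = correlate(signals)
print(len(pairs), rerank(signals[0], crown_jewel=True))
```

The nested comparison is quadratic, which is fine for the handful of signals per window this is meant for; a larger volume would call for grouping by principal first.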
Baseline, enrich, and kill false positives
A rule that fires correctly but constantly is still a failed detection. Three techniques carry most of the tuning. First, baseline: derive the set of principals, source IPs, and regions that normally perform a sensitive action, and alert on deviation from that learned set rather than on the action itself. A first-seen access key from a known automation role is noise; a first-seen key from a brand-new user on a residential ASN is not. Second, enrich at detection time: join the actor ARN against your IAM inventory to know if it is human or machine, resolve the source IP to an ASN and geo, and tag the resource with its data classification. Enrichment is what lets a rule say "administrator policy attached to a new user from an unrecognized ASN" instead of just "AttachUserPolicy". Third, exclude structurally: filter out your CI/CD and IaC pipeline roles by ARN, since Terraform and CloudFormation legitimately perform the exact privileged actions your rules hunt for. An unexcluded pipeline role is the number-one source of false positives in AWS detections.
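As a rough sketch of the baseline and structural-exclusion steps, the snippet below learns a set of (actor, action, ASN) tuples from history and flags only unseen combinations from non-pipeline actors. It assumes events have already been enriched into flat dicts with hypothetical keys `actor`, `eventName`, and `asn`, and the pipeline role pattern is an invented example.

```python
from fnmatch import fnmatchcase

# Hypothetical ARN pattern for IaC pipeline sessions; replace with your own.
PIPELINE_ROLE_PATTERNS = [
    "arn:aws:sts::*:assumed-role/example-terraform-*/*",
]


def is_excluded(actor_arn: str, patterns=PIPELINE_ROLE_PATTERNS) -> bool:
    """True when the actor matches a known CI/CD or IaC role pattern."""
    return any(fnmatchcase(actor_arn, p) for p in patterns)


def build_baseline(history) -> set:
    """Learn which (actor, action, ASN) combinations are normal."""
    return {(e["actor"], e["eventName"], e["asn"]) for e in history}


def evaluate(event: dict, baseline: set, patterns=PIPELINE_ROLE_PATTERNS):
    """Return an alert string for a first-seen combination, else None."""
    if is_excluded(event["actor"], patterns):
        return None
    key = (event["actor"], event["eventName"], event["asn"])
    if key in baseline:
        return None
    return (f"first-seen: {event['eventName']} by {event['actor']} "
            f"from ASN {event['asn']}")


# Hypothetical enriched events for illustration.
history = [
    {"actor": "arn:aws:iam::111111111111:user/ops", "eventName": "CreateAccessKey", "asn": 16509},
]
baseline = build_baseline(history)
new_event = {"actor": "arn:aws:iam::111111111111:user/temp", "eventName": "CreateAccessKey", "asn": 7922}
pipeline_event = {
    "actor": "arn:aws:sts::111111111111:assumed-role/example-terraform-prod/run-42",
    "eventName": "AttachRolePolicy", "asn": 16509,
}
print(evaluate(new_event, baseline))
```

A real baseline would be rebuilt on a rolling window so that stale principals age out; that refresh is omitted here.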
Turn detections into automated response
The payoff of a high-signal rule is that you can trust it enough to act automatically. The AWS-native pattern is an EventBridge rule matching either a GuardDuty finding or a CloudTrail event delivered to EventBridge, targeting a Lambda (or a Step Functions workflow for multi-step containment). For the highest-confidence detections, response should be immediate and reversible: for a leaked access key, deactivate the key and attach an explicit deny policy to the principal; for a public-bucket change, re-apply the account public access block; for a disabled trail, re-enable logging and snapshot the actor's recent activity for the responder. Always notify in parallel via SNS so a human is looped in even when the machine acts first, and write every automated action back to a case so containment is auditable. Start with a new rule in alert-only mode, measure its precision against real incidents for a couple of weeks, and only then promote the trusted ones to auto-remediate.
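A minimal sketch of that pattern for the disabled-trail case follows. It assumes the StopLogging record exposes the trail under `requestParameters.name`, and it takes the AWS clients as optional arguments so the logic can be exercised without credentials; the `auto_remediate` flag, the topic ARN parameter, and the returned record shape are hypothetical choices for illustration. In a deployed Lambda the topic ARN and flag would more likely come from environment variables.

```python
import json

# EventBridge pattern matching trail tampering delivered via CloudTrail.
TRAIL_TAMPER_PATTERN = {
    "source": ["aws.cloudtrail"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudtrail.amazonaws.com"],
        "eventName": ["StopLogging", "DeleteTrail"],
    },
}


def handler(event, context=None, cloudtrail=None, sns=None,
            topic_arn=None, auto_remediate=False):
    """Re-enable a stopped trail (when allowed) and notify a human."""
    detail = event["detail"]
    trail = (detail.get("requestParameters") or {}).get("name")
    actor = (detail.get("userIdentity") or {}).get("arn", "unknown")
    action = "alert-only"

    # Only StopLogging is reversible here; a deleted trail needs a human.
    if auto_remediate and detail.get("eventName") == "StopLogging" and trail:
        if cloudtrail is None:
            import boto3  # imported lazily so tests can inject a fake client
            cloudtrail = boto3.client("cloudtrail")
        cloudtrail.start_logging(Name=trail)
        action = "re-enabled logging"

    record = {
        "eventName": detail.get("eventName"),
        "trail": trail,
        "actor": actor,
        "action": action,
    }
    # Notify in parallel so a human is looped in even when the machine acts.
    if sns is not None and topic_arn:
        sns.publish(TopicArn=topic_arn,
                    Subject="CloudTrail tampering detected",
                    Message=json.dumps(record))
    return record


# Hypothetical event for illustration; no AWS call is made in alert-only mode.
sample = {"detail": {
    "eventName": "StopLogging",
    "requestParameters": {"name": "example-org-trail"},
    "userIdentity": {"arn": "arn:aws:iam::111111111111:user/mallory"},
}}
print(handler(sample))
```

Defaulting `auto_remediate` to false mirrors the alert-only trial period: the same handler runs during measurement, and promotion to auto-remediation is a single configuration change.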
Detection engineering on AWS is not about collecting more. It is about collecting the right plane (management events broadly, data events surgically), expressing attacker behavior as correlated multi-event queries, leaning on GuardDuty for ML-derived signal, and tuning relentlessly with baselining and enrichment until every alert earns its interrupt. Get that loop right and a two-person team can defend an estate of hundreds of accounts, because the system only ever asks for human judgment when it has genuinely earned it.