Who Restarted Prod? ECS Audit in CloudTrail
Originally published at https://fortem.dev/blog/ecs-audit-log-compliance
Every ECS change — UpdateService, StopTask, RunTask — lands in CloudTrail with who, when, and from where. Three CLI commands find the culprit in under 2 minutes.
Use Case · June 16, 2026 · 8 min read
ecs-audit-log · ecs-compliance-logging · aws-ecs-cloudtrail
How to Find It in CloudTrail
Your ECS service restarted. Or a task was manually stopped. Or desiredCount dropped to zero and nobody admits it. The ECS console shows WHAT happened — not WHO. CloudTrail has the answer, and three CLI commands get you there in under two minutes.
TL;DR
- CloudTrail captures every ECS API call (UpdateService, StopTask, RunTask, RegisterTaskDefinition) with who, when, and from where.
- Event History is free for the last 90 days. Three CLI commands find the culprit in under 2 minutes.
- The userIdentity field tells you human vs CI/CD vs AWS service. Root account activity in ECS is always suspicious.
- Download the skill file: an AI agent runs the full fleet audit and produces a structured report automatically.
Why the ECS events tab doesn't tell you who did it
ECS events show WHAT happened — "service updated", "task stopped" — but not WHO. The userIdentity lives in CloudTrail, not in the ECS console. That's the gap most teams waste an hour trying to bridge.
You open the ECS service page. Under Events: "service my-api has started 1 tasks" at 14:23, "service my-api has stopped 1 running tasks" at 14:21. Something stopped your service and triggered a redeploy. The ECS console stops there — it doesn't record the API caller, the IAM identity, or whether it was a human clicking the console or Terraform applying a change.
| ECS Events tab | CloudTrail |
| --- | --- |
| Shows WHAT happened | Shows WHO did it, WHEN, and FROM WHERE |
| Service-level messages only | All API calls including StopTask, UpdateService, RunTask |
| No API caller info | userIdentity: human, CI/CD role, or AWS service |
| Only the 100 most recent service events | 90-day Event History, free |
| Not queryable | Searchable by event name, username, and resource; source IP is recorded in each event |
KEY INSIGHT: CloudTrail records every ECS management API call automatically, with no setup required. The 90-day Event History is free: you aren't paying extra for it; it's just there. The only thing missing is knowing where to look.
Three commands to find the culprit in under 2 minutes
aws cloudtrail lookup-events with AttributeKey=EventName filters to specific actions. Pipe through jq to extract userIdentity.userName, eventTime, and sourceIPAddress. Covers the last 90 days at no charge.
Find who stopped a task
    # Each CloudTrailEvent is a JSON document; jq reads them one after another.
    aws cloudtrail lookup-events \
      --lookup-attributes AttributeKey=EventName,AttributeValue=StopTask \
      --query 'Events[*].CloudTrailEvent' \
      --output text | \
    jq '{
      time: .eventTime,
      # IAM users carry userName directly; assumed roles carry the role name
      # under sessionIssuer. Anything else (Root, AWSService) falls back to the type.
      who: (
        if .userIdentity.type == "IAMUser" then .userIdentity.userName
        elif .userIdentity.type == "AssumedRole" then .userIdentity.sessionContext.sessionIssuer.userName
        else .userIdentity.type
        end
      ),
      from: .sourceIPAddress,
      via: .userAgent,
      task: .requestParameters.task
    }'
Find who updated a service (deployments, scale changes)
    # desiredCount is null when the call changed something other than the task count.
    aws cloudtrail lookup-events \
      --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateService \
      --query 'Events[*].CloudTrailEvent' \
      --output text | \
    jq '{
      time: .eventTime,
      who: (
        if .userIdentity.type == "IAMUser" then .userIdentity.userName
        elif .userIdentity.type == "AssumedRole" then .userIdentity.sessionContext.sessionIssuer.userName
        else .userIdentity.type
        end
      ),
      via: .userAgent,
      service: .requestParameters.service,
      desiredCount: .requestParameters.desiredCount
    }'
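Once the events are saved as JSON, the next step is usually to keep only the calls close to the timestamp shown in the ECS Events tab. The sketch below is one way to do that in Python, assuming the output of lookup-events with --output json has been loaded into a dict with an "Events" list whose items carry the record as a JSON string under "CloudTrailEvent". The function names, the 5-minute window, and the sample data are illustrative, not part of any AWS tooling.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_time(stamp: str) -> datetime:
    """Parse a CloudTrail eventTime such as '2026-06-16T14:21:07Z' as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def events_near(response: dict, incident: datetime, window_minutes: int = 5) -> list[dict]:
    """Return decoded events within +/- window_minutes of the incident, oldest first."""
    window = timedelta(minutes=window_minutes)
    # CloudTrailEvent is a JSON string inside the JSON response, so decode it again.
    decoded = [json.loads(item["CloudTrailEvent"]) for item in response.get("Events", [])]
    nearby = [e for e in decoded if abs(parse_time(e["eventTime"]) - incident) <= window]
    return sorted(nearby, key=lambda e: parse_time(e["eventTime"]))


# Hypothetical sample: one call just before the 14:21 stop, one from the morning.
sample = {
    "Events": [
        {"CloudTrailEvent": json.dumps({"eventName": "UpdateService", "eventTime": "2026-06-16T14:20:55Z"})},
        {"CloudTrailEvent": json.dumps({"eventName": "UpdateService", "eventTime": "2026-06-16T09:02:10Z"})},
    ]
}
incident = datetime(2026, 6, 16, 14, 21, tzinfo=timezone.utc)
hits = events_near(sample, incident)
print([f'{e["eventName"]} @ {e["eventTime"]}' for e in hits])
```

Only the 14:20:55 call survives the filter, which is the one worth decoding further.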
Narrow by specific user or role
    # Find everything a specific IAM user did in the last 24h
    # (GNU date first, BSD/macOS date as the fallback)
    aws cloudtrail lookup-events \
      --lookup-attributes AttributeKey=Username,AttributeValue=john.smith \
      --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-24H +%Y-%m-%dT%H:%M:%SZ)" \
      --query 'Events[*].{Time:EventTime,Event:EventName}' \
      --output table
Rate limit: lookup-events is capped at 2 requests per second per account per Region, and each request returns at most 50 events. The AWS CLI follows pagination automatically, so a broad query can issue many requests: cap it with --max-items and resume with --starting-token. If you're scripting across many event types, add a 0.5s sleep between calls.
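For scripts that page through results themselves, the spacing can live in one helper. This is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical callable you supply (for example a thin wrapper around an SDK call) that takes the previous NextToken, or None for the first page, and returns a dict with "Events" and optionally "NextToken". The `sleep` and `clock` parameters exist only so the helper can be exercised without waiting.

```python
import time
from typing import Callable, Iterator, Optional


def paged_events(
    fetch_page: Callable[[Optional[str]], dict],
    min_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield events from every page, spacing requests at least min_interval apart."""
    token: Optional[str] = None
    last_call: Optional[float] = None
    while True:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # stay under the 2 requests/second cap
        last_call = clock()
        page = fetch_page(token)
        yield from page.get("Events", [])
        token = page.get("NextToken")
        if not token:
            return


# Hypothetical two-page response, keyed by the token that requests each page.
pages = {
    None: {"Events": [{"EventName": "StopTask"}], "NextToken": "t1"},
    "t1": {"Events": [{"EventName": "UpdateService"}]},
}
naps: list[float] = []
names = [
    e["EventName"]
    for e in paged_events(pages.__getitem__, sleep=naps.append, clock=lambda: 0.0)
]
print(names, naps)
```

With the frozen clock used here, the helper requests one 0.5-second pause between the two pages and none before the first.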
Which ECS events map to which actions
UpdateService = scale change or deployment. StopTask = manual kill. RegisterTaskDefinition = new image or config. RunTask = standalone task launch. Each has a different userIdentity pattern worth knowing.
| Scenario | CloudTrail eventName | Who typically calls it |
| --- | --- | --- |
| Service scaled up/down | UpdateService | Human, CI/CD, autoscaler |
| Deployment triggered | UpdateService + RunTask | CI/CD pipeline |
| Task manually stopped | StopTask | Human, script, ECS agent |
| New task definition | RegisterTaskDefinition | CI/CD pipeline, human |
| Service created/deleted | CreateService / DeleteService | Human, Terraform |
| Cluster deleted | DeleteCluster | Human, Terraform |
The most ambiguous one is StopTask. It appears in CloudTrail when a human manually stops a task, when a script does it, and when ECS itself stops a task during a rolling deployment. Check userIdentity.invokedBy — if it says ecs.amazonaws.com, ECS triggered the stop internally during service orchestration, not a human.
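The invokedBy check can be applied to a batch of decoded StopTask events to separate orchestration noise from stops that need an owner. A minimal sketch, assuming each event is the decoded CloudTrail record as a dict; the function name, the two labels, and the sample events are illustrative.

```python
from collections import Counter


def stop_initiator(event: dict) -> str:
    """Label a StopTask event as ECS-internal or as an external caller."""
    identity = event.get("userIdentity", {})
    if identity.get("invokedBy") == "ecs.amazonaws.com":
        return "ecs-internal"
    return "external-caller"


# Hypothetical decoded events: one rolling-deployment stop, one manual stop.
stops = [
    {"eventName": "StopTask", "userIdentity": {"type": "AWSService", "invokedBy": "ecs.amazonaws.com"}},
    {"eventName": "StopTask", "userIdentity": {"type": "IAMUser", "userName": "john.smith"}},
]
tally = Counter(stop_initiator(e) for e in stops)
print(dict(tally))
```

Anything labelled external-caller is a candidate for the "who stopped it?" question; the rest can usually be set aside.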
Decoding userIdentity: human, CI/CD, or AWS service
userIdentity.type tells you who called the API: IAMUser = human, AssumedRole = CI/CD or Lambda, AWSService = autoscaler or ECS itself. Root type should never appear in ECS — alert immediately if it does.
| userIdentity.type | Meaning | How to extract the name |
| --- | --- | --- |
| IAMUser | Human with IAM credentials | .userIdentity.userName |
| AssumedRole | CI/CD, Lambda, or human via role | .userIdentity.sessionContext.sessionIssuer.userName |
| Root | AWS root account; alert immediately | type = Root is the signal |
| AWSService | AWS-owned service (autoscaling, ECS agent) | .userIdentity.invokedBy |
| AWSAccount | Cross-account call from another AWS account | .userIdentity.accountId |
| FederatedUser | Temporary credentials from sts:GetFederationToken (SSO sessions usually appear as AssumedRole instead) | .userIdentity.principalId |
The tricky one is AssumedRole. When a GitHub Actions pipeline runs aws ecs update-service, the CloudTrail event shows type: AssumedRole and the ARN of the role. The human-readable role name is in sessionContext.sessionIssuer.userName. That's the field to surface in your audit report — not the full ARN.
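The same decoding rules can be written as a small function for scripts that post-process saved events. A sketch, assuming `identity` is the userIdentity object of a decoded CloudTrail record; the function name, the "ROOT ACCOUNT" placeholder, and the sample role are illustrative.

```python
def describe_caller(identity: dict) -> tuple[str, str]:
    """Return (identity type, best human-readable name) for a userIdentity object."""
    kind = identity.get("type", "Unknown")
    if kind == "IAMUser":
        return kind, identity.get("userName", "")
    if kind == "AssumedRole":
        # The role name lives under sessionIssuer, not at the top level.
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        return kind, issuer.get("userName", "")
    if kind == "Root":
        return kind, "ROOT ACCOUNT"
    if kind == "AWSService":
        return kind, identity.get("invokedBy", "")
    if kind == "AWSAccount":
        return kind, identity.get("accountId", "")
    # Any other type: fall back to the principal ID.
    return kind, identity.get("principalId", "")


# Hypothetical identity for a pipeline role named deploy-prod.
pipeline = {
    "type": "AssumedRole",
    "arn": "arn:aws:sts::123456789012:assumed-role/deploy-prod/session-1",
    "sessionContext": {"sessionIssuer": {"type": "Role", "userName": "deploy-prod"}},
}
print(describe_caller(pipeline))
```

Returning the type alongside the name keeps Root and AWSService calls easy to filter on later.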
To distinguish console vs CLI vs Terraform, use the userAgent field:
| userAgent value | What called the API |
| --- | --- |
| console.amazonaws.com | AWS console (someone clicked) |
| aws-cli/2.* | AWS CLI (manual or script) |
| Terraform/1.* terraform-provider-aws/* | Terraform apply |
| github-actions/* | GitHub Actions CI/CD |
| ECS Console | ECS service console actions |
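These userAgent patterns translate directly into a lookup for audit scripts. The sketch below uses shell-style wildcards from Python's standard library; the labels are illustrative, and the Terraform pattern is deliberately loosened to match on the provider token anywhere in the string, which is an assumption about how real user agents are laid out rather than something the table states.

```python
from fnmatch import fnmatchcase

# (pattern, label) pairs checked in order; the first match wins.
AGENT_PATTERNS = [
    ("console.amazonaws.com", "AWS console"),
    ("aws-cli/2.*", "AWS CLI"),
    ("*terraform-provider-aws/*", "Terraform"),
    ("github-actions/*", "GitHub Actions"),
    ("ECS Console", "ECS console"),
]


def classify_agent(user_agent: str) -> str:
    """Map a CloudTrail userAgent string to a coarse tool label."""
    for pattern, label in AGENT_PATTERNS:
        if fnmatchcase(user_agent, pattern):
            return label
    return "unknown"


# Hypothetical user agent strings.
for agent in ("aws-cli/2.15.30 Python/3.11.8", "Terraform/1.7.0 terraform-provider-aws/5.40.0"):
    print(agent, "->", classify_agent(agent))
```

Unmatched strings come back as "unknown" so that new tools show up in a report instead of being silently mislabelled.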
KEY INSIGHT: If userIdentity.type is Root, stop everything else and investigate. Root credentials should never be used for routine ECS operations. A Root call in CloudTrail means either someone is using the root account directly (a security failure) or credentials were compromised.
Alerting in real time: EventBridge rule for critical ECS changes
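EventBridge can trigger a notification within seconds of a StopTask or UpdateService call, before you notice the incident. A Terraform rule plus a target is enough, provided an SNS topic exists and the account has a CloudTrail trail with logging enabled, which EventBridge requires for "AWS API Call via CloudTrail" events.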
Searching CloudTrail after an incident is reactive. EventBridge makes it proactive: you define a rule that matches specific CloudTrail events, and EventBridge triggers an SNS notification, Lambda, or Slack webhook immediately when the event occurs. For teams running 10+ ECS environments, catching a DeleteService before the on-call rotation starts saves significant incident response time.
Terraform: EventBridge rule for critical ECS events
    resource "aws_cloudwatch_event_rule" "ecs_critical" {
      name        = "ecs-critical-changes"
      description = "Alert on destructive or suspicious ECS API calls"

      event_pattern = jsonencode({
        source      = ["aws.ecs"]
        detail-type = ["AWS API Call via CloudTrail"]
        detail = {
          eventSource = ["ecs.amazonaws.com"]
          eventName = [
            "StopTask",
            "DeleteService",
            "DeleteCluster",
            "UpdateService"
          ]
        }
      })
    }

    # aws_sns_topic.alerts is assumed to be defined elsewhere, with a topic
    # policy that allows events.amazonaws.com to publish to it.
    resource "aws_cloudwatch_event_target" "ecs_critical_sns" {
      rule      = aws_cloudwatch_event_rule.ecs_critical.name
      target_id = "SendToSNS"
      arn       = aws_sns_topic.alerts.arn

      input_transformer {
        input_paths = {
          event = "$.detail.eventName"
          # Resolves only for AssumedRole callers; consider
          # $.detail.userIdentity.arn if IAM users or Root must be named too.
          who  = "$.detail.userIdentity.sessionContext.sessionIssuer.userName"
          time = "$.time"
          # Present for UpdateService and DeleteService only.
          service = "$.detail.requestParameters.service"
        }
        # The template must be valid JSON, hence the escaped inner quotes.
        input_template = "\"ECS alert: <event> on <service> by <who> at <time>\""
      }
    }
For UpdateService, add a second rule specifically for scale-to-zero: filter where requestParameters.desiredCount = 0. That's the most common accidental incident — someone running a cleanup script that hits the wrong environment.
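To make the scale-to-zero filter concrete, here is a sketch of the event pattern as a Python dict (ready to serialize into a rule) together with a local predicate that applies the same conditions to a saved event, which is handy for checking the filter against real samples before deploying it. The constant and function names are illustrative, and the predicate only mirrors this one pattern; it is not a general EventBridge matcher.

```python
import json

# Event pattern for UpdateService calls that set desiredCount to exactly 0.
SCALE_TO_ZERO_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ecs.amazonaws.com"],
        "eventName": ["UpdateService"],
        "requestParameters": {"desiredCount": [0]},
    },
}


def is_scale_to_zero(event: dict) -> bool:
    """Apply the same conditions as SCALE_TO_ZERO_PATTERN to one EventBridge event."""
    detail = event.get("detail", {})
    params = detail.get("requestParameters") or {}
    return (
        event.get("source") == "aws.ecs"
        and event.get("detail-type") == "AWS API Call via CloudTrail"
        and detail.get("eventSource") == "ecs.amazonaws.com"
        and detail.get("eventName") == "UpdateService"
        and params.get("desiredCount") == 0
    )


# Hypothetical event: a cleanup script scaling my-api down to nothing.
event = {
    "source": "aws.ecs",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "ecs.amazonaws.com",
        "eventName": "UpdateService",
        "requestParameters": {"service": "my-api", "desiredCount": 0},
    },
}
print(json.dumps(SCALE_TO_ZERO_PATTERN, indent=2))
print(is_scale_to_zero(event))
```

An UpdateService call that only changes the task definition carries no desiredCount, so it does not match.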
The Oct 2025 addition: ECS CloudTrail data events
Since October 2025, ECS supports CloudTrail data events for ContainerInstance agent API activity (ecs:Poll, ecs:StartTelemetrySession). These aren't in Event History — they require a CloudTrail trail or CloudTrail Lake.
AWS management events (UpdateService, StopTask, etc.) are what most teams need for incident response. The October 2025 addition is different: ECS now supports CloudTrail data events for ContainerInstance agent API calls — the low-level polling activity between the ECS agent and the control plane.
| | Management events | Data events (Oct 2025) |
| --- | --- | --- |
| What they capture | UpdateService, StopTask, RunTask, etc. | ecs:Poll, ecs:StartTelemetrySession, ecs:PutSystemLogEvents |
| Cost | Free (Event History) | Additional CloudTrail charges |
| In Event History? | Yes, 90 days | No, trail or Lake required |
| Who needs them | Everyone (incident response) | EC2 launch type, compliance auditing |
| Resource type | n/a | AWS::ECS::ContainerInstance |
For most ECS Fargate teams, data events aren't needed for incident response: management events cover UpdateService and StopTask, which is where incidents come from. Data events matter if you run EC2 launch type and need to audit ContainerInstance registration activity, or if compliance requires a full record of agent-to-control-plane communication. Enable them only if you have a specific requirement, because at scale ContainerInstance polling events generate significant volume and cost. Details in the ECS CloudTrail logging docs.
Download the skill file — let the AI agent do the audit
The skill file instructs an AI agent to pull all critical ECS CloudTrail events from the last 24 hours across every cluster in your account and produce a structured "who did what" report. Read-only — no changes applied.
The agent lists all clusters, runs lookup-events for each critical event type, decodes the userIdentity, and produces a structured output: "Service X was updated at HH:MM by role deploy-prod via GitHub Actions from IP 140.82.114.3." It also flags Root account activity, unexpected source IPs, and scale-to-zero incidents. For teams where "who did this?" is a recurring post-incident question, this is the 2-minute version of the 20-minute manual process.
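The three flags are simple to express in code. The following is a sketch of how such checks could look, not the skill file's actual logic: the summary keys (`identity_type`, `event`, `desired_count`, `source_ip`), the flag names, and the trusted network ranges are all hypothetical.

```python
import ipaddress


def audit_flags(summary: dict, trusted_networks: list[str]) -> list[str]:
    """Return the warning flags that apply to one summarized ECS event."""
    flags = []
    if summary.get("identity_type") == "Root":
        flags.append("root-activity")
    if summary.get("event") == "UpdateService" and summary.get("desired_count") == 0:
        flags.append("scale-to-zero")
    try:
        address = ipaddress.ip_address(summary.get("source_ip", ""))
    except ValueError:
        # sourceIPAddress can hold a service name such as ecs.amazonaws.com
        # instead of an IP address; skip the network check in that case.
        address = None
    if address is not None and not any(
        address in ipaddress.ip_network(net) for net in trusted_networks
    ):
        flags.append("unexpected-ip")
    return flags


# Hypothetical summary, checked against a VPN range the team considers trusted.
summary = {
    "event": "UpdateService",
    "identity_type": "AssumedRole",
    "desired_count": 0,
    "source_ip": "140.82.114.3",
}
print(audit_flags(summary, trusted_networks=["10.0.0.0/8"]))
```

Keeping the trusted ranges as an argument lets each environment supply its own list of office, VPN, or CI runner networks.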
"To identify the user who initiates a StopTask API call, view StopTask in AWS CloudTrail for userIdentity information."
— AWS Knowledge Center: Troubleshoot running task count changes in ECS
Book a 20-min fleet walkthrough: fortem.dev/book