AWS CI/CD & DevOps is the integrated ecosystem of services (CodeCommit, CodeBuild, CodeDeploy, CodePipeline, CloudFormation, CDK, and Systems Manager) that automates and governs the entire software delivery lifecycle on AWS. It solves the problem of manual, error-prone deployments and inconsistent infrastructure by giving teams a single-vendor, IAM-integrated, audit-trailed path from source code to production. Teams pick the AWS-native toolchain for tight integration with IAM, CloudWatch, and every AWS resource (no separate credentials plumbing), for compliance (all API calls logged to CloudTrail), and for the unified billing and support model. The strategic tradeoff versus GitHub Actions + Terraform is flexibility for simplicity: AWS-native tools are less portable but operationally tighter.
A fully managed, Git-compatible source control service. Repositories are hosted in AWS, are regionally redundant, and are encrypted at rest (KMS) and in transit (TLS). It was deprecated for new customers in July 2024; existing repos still work, but AWS is steering new workloads toward GitHub/GitLab with OIDC. Interview insight: Know why it's being deprecated and know the migration path, but also understand it deeply because many enterprises still run it.
There are no per-repository passwords. Access is governed entirely by IAM:
- Credentials: aws iam upload-ssh-public-key (SSH) or HTTPS Git credentials generated from the IAM console.
- IAM actions to scope: codecommit:GitPull, codecommit:GitPush, codecommit:CreateBranch, codecommit:DeleteBranch, codecommit:GetMergeCommit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"codecommit:GitPull",
"codecommit:GitPush"
],
"Resource": "arn:aws:codecommit:us-east-1:123456789012:MyRepo",
"Condition": {
"StringEqualsIfExists": {
"codecommit:References": [
"refs/heads/main",
"refs/heads/develop"
]
}
}
}
]
}
Repository events emitted to EventBridge include pullRequestCreated, pullRequestMerged, and referenceUpdated. Use EventBridge rules to fan out to multiple targets (Lambda, Step Functions, SQS, CodePipeline).
// EventBridge rule: trigger CodePipeline on push to main
{
"source": ["aws.codecommit"],
"detail-type": ["CodeCommit Repository State Change"],
"detail": {
"event": ["referenceUpdated"],
"referenceName": ["main"]
}
}
| Dimension | CodeCommit | GitHub |
|---|---|---|
| Auth | IAM (no secrets) | PATs / App credentials / OIDC |
| Pull Requests | Basic (no review apps) | Full code review ecosystem |
| Actions/CI | CodeBuild only | GitHub Actions (rich marketplace) |
| Deprecation | Deprecated for new customers 2024 | Active development |
| Compliance | CloudTrail-native | Requires audit log configuration |
| Cross-account | Via IAM roles | Via GitHub App / OIDC |
| Pricing | $1/active user/mo, $0.06/GB | Free tier generous; Enterprise pricing |
Migration path: clone the CodeCommit repo → push to GitHub → configure OIDC trust → update pipelines to use the GitHub source action.
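A minimal sketch of the repo move itself, assuming HTTPS access to CodeCommit and SSH access to GitHub (both repo URLs below are placeholders):
import subprocess

# Mirror-clone from CodeCommit, then push every branch and tag to GitHub
subprocess.run(
    ["git", "clone", "--mirror",
     "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyRepo", "MyRepo.git"],
    check=True,
)
subprocess.run(
    ["git", "push", "--mirror", "git@github.com:org/repo.git"],
    cwd="MyRepo.git",
    check=True,
)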
A fully managed, serverless build service. You pay per build minute. No servers to manage. Each build runs in a fresh, isolated Docker container. The build is defined by a buildspec.yml file (or inline YAML in the project configuration).
version: 0.2
# Environment variables available to all phases
env:
variables:
IMAGE_REPO_NAME: "my-app"
AWS_DEFAULT_REGION: "us-east-1"
# Inject secrets from Parameter Store (decrypted at build time)
parameter-store:
DB_PASSWORD: "/myapp/prod/db-password"
# Inject secrets from Secrets Manager
secrets-manager:
API_KEY: "myapp/api-key:API_KEY"
# Export variables to subsequent actions in CodePipeline
exported-variables:
- IMAGE_TAG
phases:
install:
runtime-versions:
nodejs: 18
python: 3.11
commands:
- npm ci
- pip install -r requirements.txt
pre_build:
commands:
# Log in to ECR
- aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
- export IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
- echo "Building image $IMAGE_REPO_NAME:$IMAGE_TAG"
build:
on-failure: ABORT # ABORT (default) or CONTINUE
commands:
- npm test
- docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
- docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
post_build:
commands:
- docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
# Write image definition file for CodeDeploy/ECS action
- printf '[{"name":"my-app","imageUri":"%s"}]' $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG > imagedefinitions.json
artifacts:
files:
- imagedefinitions.json
- appspec.yml
- taskdef.json
discard-paths: yes
# Secondary artifacts (multiple output locations)
secondary-artifacts:
test-reports:
files:
- "coverage/**/*"
base-directory: coverage
reports:
jest-reports:
files:
- "test-results.xml"
file-format: JUNITXML
cache:
paths:
- '/root/.npm/**/*'
- 'node_modules/**/*'
Build phase flow: SUBMITTED → QUEUED → PROVISIONING (container startup, ~30s) → DOWNLOAD_SOURCE → install → pre_build → build → post_build → UPLOAD_ARTIFACTS → FINALIZING → COMPLETED or FAILED. Key: if the build phase fails with on-failure: ABORT, post_build is skipped; with on-failure: CONTINUE, post_build still runs (useful for publishing test results even on failure).
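To see exactly which phase a build failed in after the fact, you can inspect per-phase status; a small boto3 sketch (the build ID is a placeholder for whatever start_build returned):
import boto3

codebuild = boto3.client("codebuild")

build_id = "my-app-build:example-build-id"  # placeholder; returned by start_build
build = codebuild.batch_get_builds(ids=[build_id])["builds"][0]
for phase in build["phases"]:
    # phaseType: SUBMITTED, PROVISIONING, DOWNLOAD_SOURCE, BUILD, POST_BUILD, ...
    print(phase["phaseType"], phase.get("phaseStatus"), phase.get("durationInSeconds"))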
| Type | vCPU | Memory | Use Case |
|---|---|---|---|
| BUILD_GENERAL1_SMALL | 2 | 3 GB | Simple builds, unit tests |
| BUILD_GENERAL1_MEDIUM | 4 | 7 GB | Standard Docker builds |
| BUILD_GENERAL1_LARGE | 8 | 15 GB | Heavy builds, ML model training |
| BUILD_GENERAL1_2XLARGE | 72 | 145 GB | Large parallel test suites |
| BUILD_LAMBDA_1GB–10GB | Varies | 1–10 GB | Lambda-backed builds (no Docker) |
CodeBuild containers run in Docker themselves. To build Docker images, you need privileged mode enabled on the project. Without it, docker build fails.
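Enabling privileged mode programmatically looks roughly like this; a boto3 sketch (update_project takes the full environment block, and the project name and image are assumptions):
import boto3

codebuild = boto3.client("codebuild")

codebuild.update_project(
    name="my-app-build",  # hypothetical project name
    environment={
        "type": "LINUX_CONTAINER",
        "image": "aws/codebuild/standard:7.0",
        "computeType": "BUILD_GENERAL1_MEDIUM",
        "privilegedMode": True,  # required so docker build can reach the Docker daemon
    },
)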
# In pre_build: authenticate to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com
# Use BuildKit for faster builds with better caching
export DOCKER_BUILDKIT=1
docker build --cache-from ACCOUNT.dkr.ecr.REGION.amazonaws.com/myapp:cache \
--build-arg BUILDKIT_INLINE_CACHE=1 \
-t myapp:$IMAGE_TAG .
# Push cache layer separately
docker push ACCOUNT.dkr.ecr.REGION.amazonaws.com/myapp:cache
Caching options:
- Local cache: cache: { type: LOCAL, modes: [LOCAL_SOURCE_CACHE, LOCAL_DOCKER_LAYER_CACHE, LOCAL_CUSTOM_CACHE] }. The Docker layer cache lives on the build fleet but is NOT guaranteed between builds.
- S3 cache: uploads the cache.paths contents to S3 after the build and restores them at the start of the next build. Works across all build environments.
- Registry cache: pull a previously pushed image with --cache-from; only changed layers are rebuilt.
VPC builds: for builds that need to reach private resources (RDS, ElastiCache, internal APIs), configure the CodeBuild project with a VPC, subnets, and security groups. The build container runs inside your VPC. See VPC & Networking for subnet selection and security group rules.
Environment variable precedence, highest to lowest: start-build override > project-level env vars > buildspec env vars. Never put secrets in plaintext project env vars; they show up in logs. Always use Parameter Store or Secrets Manager references.
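A sketch of overriding variables safely at start-build time with boto3: PARAMETER_STORE and SECRETS_MANAGER types are resolved when the build starts, so nothing sensitive is stored on the project (project name and parameter paths are assumptions):
import boto3

codebuild = boto3.client("codebuild")

codebuild.start_build(
    projectName="my-app-build",
    environmentVariablesOverride=[
        # Highest precedence: overrides project-level and buildspec values
        {"name": "DEPLOY_ENV", "value": "staging", "type": "PLAINTEXT"},
        # Resolved from Parameter Store / Secrets Manager at build time
        {"name": "DB_PASSWORD", "value": "/myapp/staging/db-password", "type": "PARAMETER_STORE"},
        {"name": "API_KEY", "value": "myapp/api-key:API_KEY", "type": "SECRETS_MANAGER"},
    ],
)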
A deployment service that automates application deployments to EC2, ECS, Lambda, or on-premises servers. It handles the mechanics of the deployment (stopping old versions, starting new ones, health checks) so you define the strategy, not the shell scripts.
In-Place (EC2/on-premises only) The running application on each instance is stopped, the new version is installed, and the application is restarted. Instance is out of service during update. Best for non-critical workloads or when rolling-restart downtime is acceptable.
Blue/Green A new set of instances (green) is provisioned, the app is deployed and validated there, then traffic is shifted from old (blue) to new (green). Blue instances can be kept for rollback or terminated. Zero-downtime, instant rollback.
A deployment group is the target set: a set of EC2 instances (via tags or Auto Scaling Groups), an ECS service/cluster, or a Lambda function. It also specifies the deployment configuration, optional load balancer / target group settings for traffic control, CloudWatch alarms, and automatic rollback behavior.
For EC2/on-premises (appspec.yml):
version: 0.0
os: linux
files:
- source: /dist # From the artifact
destination: /opt/myapp
permissions:
- object: /opt/myapp
owner: ec2-user
group: ec2-user
mode: 755
type:
- directory
- file
hooks:
BeforeInstall:
- location: scripts/stop_server.sh
timeout: 300
runas: root
AfterInstall:
- location: scripts/install_dependencies.sh
timeout: 600
runas: ec2-user
ApplicationStart:
- location: scripts/start_server.sh
timeout: 300
runas: ec2-user
ValidateService:
- location: scripts/health_check.sh
timeout: 300
Hook lifecycle for in-place EC2 deployments:
BeforeBlockTraffic → BlockTraffic → AfterBlockTraffic
→ ApplicationStop
→ DownloadBundle → BeforeInstall → Install → AfterInstall
→ ApplicationStart → ValidateService
→ BeforeAllowTraffic → AllowTraffic → AfterAllowTraffic
For ECS (appspec.yml):
version: 0.0
Resources:
- TargetService:
Type: AWS::ECS::Service
Properties:
TaskDefinition: <TASK_DEFINITION> # Replaced by CodePipeline action
LoadBalancerInfo:
ContainerName: "my-app"
ContainerPort: 80
PlatformVersion: "LATEST"
Hooks:
- BeforeInstall: "LambdaFunctionToValidateBeforeInstall"
- AfterInstall: "LambdaFunctionToValidateAfterInstall"
- AfterAllowTestTraffic: "LambdaFunctionToValidateAfterTestTrafficShifts"
- BeforeAllowTraffic: "LambdaFunctionToValidateBeforeTrafficShifts"
- AfterAllowTraffic: "LambdaFunctionToValidateAfterTrafficShifts"
For Lambda (appspec.yml):
version: 0.0
Resources:
- MyLambdaFunction:
Type: AWS::Lambda::Function
Properties:
Name: "MyLambdaFunction"
Alias: "MyLambdaFunctionAlias"
CurrentVersion: "1"
TargetVersion: "2"
Hooks:
- BeforeAllowTraffic: "PreTrafficLambdaFunction"
- AfterAllowTraffic: "PostTrafficLambdaFunction"
| Config | Description |
|---|---|
| CodeDeployDefault.AllAtOnce | All instances at once. Fastest, maximum downtime risk |
| CodeDeployDefault.HalfAtATime | 50% at a time. Half available during deploy |
| CodeDeployDefault.OneAtATime | One instance at a time. Slowest, safest |
| CodeDeployDefault.LambdaAllAtOnce | Lambda: all traffic immediately |
| CodeDeployDefault.LambdaCanary10Percent5Minutes | 10% for 5 min, then the remaining 90% |
| CodeDeployDefault.LambdaCanary10Percent30Minutes | 10% for 30 min, then the remaining 90% |
| CodeDeployDefault.LambdaLinear10PercentEvery1Minute | 10% every 1 min until 100% |
| CodeDeployDefault.ECSCanary10Percent5Minutes | ECS: same canary pattern |
| Custom | Define minimumHealthyHosts: { value: 75, type: FLEET_PERCENT } |
Configure the deployment group to roll back automatically when the deployment fails or a monitored CloudWatch alarm fires:
{
"autoRollbackConfiguration": {
"enabled": true,
"events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
},
"alarmConfiguration": {
"enabled": true,
"alarms": [{ "name": "5xxErrorRateAlarm" }]
}
}
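The same rollback configuration applied to an existing deployment group via boto3 (a sketch; the application and group names are assumptions):
import boto3

codedeploy = boto3.client("codedeploy")

codedeploy.update_deployment_group(
    applicationName="my-app",
    currentDeploymentGroupName="prod-deployment-group",
    autoRollbackConfiguration={
        "enabled": True,
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
    },
    alarmConfiguration={
        "enabled": True,
        "alarms": [{"name": "5xxErrorRateAlarm"}],
    },
)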
A fully managed continuous delivery service that models, visualizes, and automates release pipelines. A pipeline is a directed sequence of stages, each containing one or more actions. Pipelines are triggered by source changes and progress artifacts from stage to stage.
Stages: Logical grouping of actions (e.g., Source, Build, Test, Deploy). Stages execute sequentially. A stage can have parallel actions (within the stage). A stage can be blocked for manual approval.
Actions: The atomic unit of work. Types:
- Source: CodeCommit, S3, ECR, GitHub (via CodeStar Connection), Bitbucket
- Build: CodeBuild (invoke a build project)
- Test: CodeBuild, AWS Device Farm, third-party (BlazeMeter)
- Deploy: CodeDeploy, CloudFormation, ECS, Elastic Beanstalk, S3
- Approval: Manual approval (SNS notification → human approves/rejects in console/CLI)
- Invoke: Lambda function (custom logic)
Artifacts: Files passed between actions via S3 (the pipeline's artifact bucket). Each action can declare inputArtifacts and outputArtifacts. CodePipeline zips/unzips them automatically.
Source (GitHub) → produces [SourceArtifact]
  ↓
Build (CodeBuild) → consumes [SourceArtifact], produces [BuildArtifact]
  ↓
Approval (Manual) → notification via SNS
  ↓
Deploy (CodeDeploy) → consumes [BuildArtifact]
Resources:
AppPipeline:
Type: AWS::CodePipeline::Pipeline
Properties:
RoleArn: !GetAtt PipelineRole.Arn
ArtifactStore:
Type: S3
Location: !Ref ArtifactBucket
EncryptionKey:
Id: !Ref PipelineKMSKey
Type: KMS
Stages:
- Name: Source
Actions:
- Name: GitHub_Source
ActionTypeId:
Category: Source
Owner: AWS
Provider: CodeStarSourceConnection
Version: "1"
Configuration:
ConnectionArn: !Ref GitHubConnection
FullRepositoryId: "org/repo"
BranchName: main
OutputArtifactFormat: CODE_ZIP
OutputArtifacts:
- Name: SourceArtifact
- Name: Build
Actions:
- Name: Build_and_Test
ActionTypeId:
Category: Build
Owner: AWS
Provider: CodeBuild
Version: "1"
InputArtifacts:
- Name: SourceArtifact
OutputArtifacts:
- Name: BuildArtifact
Configuration:
ProjectName: !Ref CodeBuildProject
RunOrder: 1
# Parallel action in same stage
- Name: Security_Scan
ActionTypeId:
Category: Test
Owner: AWS
Provider: CodeBuild
Version: "1"
InputArtifacts:
- Name: SourceArtifact
Configuration:
ProjectName: !Ref SecurityScanProject
RunOrder: 1 # Same RunOrder = parallel
- Name: Approve_Prod
Actions:
- Name: Manual_Approval
ActionTypeId:
Category: Approval
Owner: AWS
Provider: Manual
Version: "1"
Configuration:
NotificationArn: !Ref ApprovalSNSTopic
CustomData: "Review staging results before promoting to prod"
ExternalEntityLink: "https://staging.example.com/health"
- Name: Deploy_Prod
Actions:
- Name: Deploy
ActionTypeId:
Category: Deploy
Owner: AWS
Provider: CodeDeploy
Version: "1"
InputArtifacts:
- Name: BuildArtifact
Configuration:
ApplicationName: !Ref CodeDeployApp
DeploymentGroupName: !Ref ProdDeploymentGroup
Within a stage, actions with the same RunOrder number execute in parallel. Actions with a higher RunOrder wait for all lower-order actions in the stage to succeed.
Stage: Build
RunOrder 1: [CodeBuild: unit-tests] [CodeBuild: lint] ← parallel
RunOrder 2: [CodeBuild: integration-tests] ← waits for RunOrder 1
To deploy from Account A's pipeline to Account B's resources:
1. In Account B, create a cross-account role whose trust policy trusts Account A's pipeline role (sts:AssumeRole).
2. Grant that role the codedeploy:* or CloudFormation permissions needed for the deployment.
3. In the pipeline's deploy action, specify the RoleArn of the cross-account role.
// Account A Pipeline's deploy action
{
"name": "Deploy_to_Prod",
"roleArn": "arn:aws:iam::ACCOUNT_B:role/CrossAccountDeployRole",
"actionTypeId": { "provider": "CloudFormation" },
"configuration": { ... }
}
| Provider | Trigger | Auth | Notes |
|---|---|---|---|
| CodeCommit | EventBridge (instant) | IAM role | Native, no credentials |
| GitHub.com | Webhook via CodeStar Connection | OAuth App | Requires one-time console auth |
| GitHub Enterprise | Webhook via CodeStar Connection | OAuth App | Self-hosted support |
| Bitbucket | Webhook via CodeStar Connection | OAuth App | Same as GitHub flow |
| S3 | Polling (1 min) or EventBridge | IAM role | For zip artifacts, ML models |
| ECR | EventBridge (image push) | IAM role | Trigger on new image tag |
Branch Strategy:         Pipeline Behavior:
----------------         ------------------
feature/*  --push-->     [Lint + Unit Tests only, no deploy]
                         (PR to develop triggers CodeBuild check)
develop    --merge-->    [Build → Test → Deploy to Dev]
                         (auto-deploy to dev environment)
release/*  --create-->   [Build → Test → Deploy to Staging]
                         (QA gate; no auto-prod)
main       --merge-->    [Build → Test → Manual Approval → Deploy to Prod]
                         (git tag triggers versioned release)
hotfix/*   --push-->     [Fast-track: Build → Test → Manual Approval → Prod]
                         (bypasses staging; requires 2 approvals)
Implementation: Use separate CodePipeline pipelines per long-lived branch. Feature branch CI is handled by a CodeBuild project triggered via EventBridge on any refs/heads/feature/* push, not a full pipeline.
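A sketch of that EventBridge wiring with boto3 (the rule, project, and role names are hypothetical; the target role needs permission to call codebuild:StartBuild):
import boto3

events = boto3.client("events")

# Match pushes to any feature/* branch in the CodeCommit repo
events.put_rule(
    Name="feature-branch-ci",
    EventPattern="""{
      "source": ["aws.codecommit"],
      "detail-type": ["CodeCommit Repository State Change"],
      "detail": {
        "event": ["referenceUpdated"],
        "referenceName": [{"prefix": "feature/"}]
      }
    }""",
    State="ENABLED",
)

# Start the CI-only CodeBuild project (no pipeline, no deploy)
events.put_targets(
    Rule="feature-branch-ci",
    Targets=[{
        "Id": "feature-ci-build",
        "Arn": "arn:aws:codebuild:us-east-1:111111111111:project/feature-ci",
        "RoleArn": "arn:aws:iam::111111111111:role/EventBridgeStartBuildRole",
    }],
)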
+------------------------------------------------------------------+
|  TOOLS ACCOUNT (111111111111)                                     |
|  - CodePipeline                                                   |
|  - CodeBuild                                                      |
|  - Artifact S3 bucket (KMS encrypted)                             |
|  - ECR (cross-account pull allowed)                               |
+--------------+--------------------------------+-------------------+
               | AssumeRole                     | AssumeRole
               v                                v
+---------------------------+    +-------------------------------+
|  DEV ACCOUNT (222222)     |    |  STAGING ACCOUNT (333333)     |
|  CrossAccountDeployRole   |    |  CrossAccountDeployRole       |
|  ECS Cluster / Lambda     |    |  ECS Cluster / Lambda         |
+---------------------------+    +-------------------------------+
                                             | Manual Approval
                                             v
                                 +-------------------------------+
                                 |  PROD ACCOUNT (444444)        |
                                 |  CrossAccountDeployRole       |
                                 |  ECS Cluster / Lambda         |
                                 +-------------------------------+
Key setup steps:
- ECR repository policy grants ecr:GetDownloadUrlForLayer (and related pull actions) to the Dev/Staging/Prod account principals.
- KMS key policy on the artifact key grants kms:Decrypt to all deployment account roles.
- Artifact bucket policy grants s3:GetObject to all deployment account roles.
Strategy 1: CodeBuild webhook (checks only)
# CodeBuild project filter groups
filterGroups:
- - type: EVENT
pattern: PULL_REQUEST_CREATED,PULL_REQUEST_UPDATED
- type: HEAD_REF
pattern: refs/heads/feature/.*
Builds run on PR open/update. Build status is reported back to GitHub as a commit status. No deployment.
Strategy 2: Ephemeral environments per PR
A Lambda function (triggered by CodePipeline or directly from GitHub webhook) calls CloudFormation create-stack with a stack name derived from the PR number. Teardown via a second Lambda on PR close/merge.
# Lambda: create ephemeral env
def handler(event, context):
pr_number = event['detail']['pullRequestId']
stack_name = f"feature-pr-{pr_number}"
cfn.create_stack(
StackName=stack_name,
TemplateURL=f"s3://{TEMPLATE_BUCKET}/feature-env.yaml",
Parameters=[
{'ParameterKey': 'PRNumber', 'ParameterValue': pr_number},
{'ParameterKey': 'ImageTag', 'ParameterValue': event['detail']['sourceCommit']}
],
Tags=[{'Key': 'ephemeral', 'Value': 'true'}],
Capabilities=['CAPABILITY_IAM']
)
Mental model: CloudFormation is a declarative state engine. You describe the desired state in a template (JSON/YAML). CloudFormation computes the diff between current state and desired state, then executes the changes in dependency order (using an internal DAG).
Core concepts:
- Change sets: preview exactly what an update will change before executing it; safer than running update-stack directly.
- Drift detection: reports DRIFTED / IN_SYNC / NOT_CHECKED at the stack level (MODIFIED / DELETED / IN_SYNC per resource). Does NOT auto-remediate.
- StackSets: deploy a stack across accounts and regions; requires AWSCloudFormationStackSetAdministrationRole in the admin account and AWSCloudFormationStackSetExecutionRole in target accounts.
- Nested stacks: child stacks (AWS::CloudFormation::Stack) in a parent stack. Enables template reuse and breaking up large templates. Parent passes parameters to child; child outputs are available via !GetAtt NestedStack.Outputs.OutputName.
# Parent stack with nested stacks
Resources:
NetworkStack:
Type: AWS::CloudFormation::Stack
Properties:
TemplateURL: !Sub "https://s3.${AWS::Region}.amazonaws.com/${TemplateBucket}/network.yaml"
Parameters:
VpcCidr: "10.0.0.0/16"
AppStack:
Type: AWS::CloudFormation::Stack
DependsOn: NetworkStack
Properties:
TemplateURL: !Sub "https://s3.${AWS::Region}.amazonaws.com/${TemplateBucket}/app.yaml"
Parameters:
VpcId: !GetAtt NetworkStack.Outputs.VpcId
SubnetIds: !GetAtt NetworkStack.Outputs.PrivateSubnetIds
# Change set workflow
# aws cloudformation create-change-set --stack-name prod-stack --template-body file://template.yaml --change-set-name update-v2
# aws cloudformation describe-change-set --stack-name prod-stack --change-set-name update-v2
# aws cloudformation execute-change-set --stack-name prod-stack --change-set-name update-v2
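Drift detection (from the core concepts above) is a two-step API; a minimal boto3 sketch against a stack named prod-stack:
import time
import boto3

cfn = boto3.client("cloudformation")

# Kick off drift detection and poll until it completes
detection_id = cfn.detect_stack_drift(StackName="prod-stack")["StackDriftDetectionId"]
while True:
    status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)

# Per-resource drift: IN_SYNC, MODIFIED, DELETED, or NOT_CHECKED
for drift in cfn.describe_stack_resource_drifts(StackName="prod-stack")["StackResourceDrifts"]:
    print(drift["LogicalResourceId"], drift["StackResourceDriftStatus"])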
DeletionPolicy and UpdateReplacePolicy:
MyDatabase:
Type: AWS::RDS::DBInstance
DeletionPolicy: Snapshot # Take snapshot before delete
UpdateReplacePolicy: Snapshot # Take snapshot if resource must be replaced during update
Properties:
...
Stack rollback behavior: If any resource fails to create/update, CloudFormation rolls back the entire stack to the last known good state. Exception: --disable-rollback flag (useful for debugging but dangerous in prod).
Mental model: CDK is a layer on top of CloudFormation. You write infrastructure in a real programming language (TypeScript, Python, Java, Go, C#), and CDK synthesizes it to a CloudFormation template. The power is reusable abstractions, type safety, and your IDE's autocomplete.
Construct levels:
- L1 (Cfn* resources): direct 1:1 mappings to CloudFormation resources, e.g. new ec2.CfnInstance(this, 'MyEC2', { ... }).
- L2 (curated constructs): sensible defaults and helper methods, e.g. new ec2.Instance(this, 'MyEC2', { instanceType: ec2.InstanceType.of(...) }). Most used in practice.
- L3 (patterns): opinionated multi-resource patterns, e.g. new ecs_patterns.ApplicationLoadBalancedFargateService(this, 'Service', { ... }). One-liner for ALB + ECS + IAM roles + logging.
// CDK stack example: ECS service with auto-scaling
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecr from 'aws-cdk-lib/aws-ecr';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';
import * as ecs_patterns from 'aws-cdk-lib/aws-ecs-patterns';
export class AppStack extends cdk.Stack {
constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const vpc = new ec2.Vpc(this, 'AppVpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(this, 'Cluster', { vpc });
const fargateService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, 'Service', {
cluster,
cpu: 512,
memoryLimitMiB: 1024,
desiredCount: 2,
taskImageOptions: {
image: ecs.ContainerImage.fromEcrRepository(
ecr.Repository.fromRepositoryName(this, 'Repo', 'my-app')
),
environment: {
NODE_ENV: 'production',
},
secrets: {
DB_PASSWORD: ecs.Secret.fromSecretsManager(
secretsmanager.Secret.fromSecretNameV2(this, 'DbSecret', 'myapp/db-password')
),
},
},
});
// Auto-scaling
const scaling = fargateService.service.autoScaleTaskCount({ maxCapacity: 10 });
scaling.scaleOnCpuUtilization('CpuScaling', {
targetUtilizationPercent: 70,
scaleInCooldown: cdk.Duration.seconds(60),
scaleOutCooldown: cdk.Duration.seconds(60),
});
}
}
CDK Aspects: Cross-cutting concerns applied to the entire construct tree. Use to enforce policies (e.g., all S3 buckets must have encryption, all lambdas must have X-Ray tracing).
import { IConstruct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
class EnforceEncryption implements cdk.IAspect {
visit(node: IConstruct): void {
if (node instanceof s3.CfnBucket) {
if (!node.bucketEncryption) {
cdk.Annotations.of(node).addError('S3 bucket must have encryption enabled');
}
}
}
}
// Apply to entire app
cdk.Aspects.of(app).add(new EnforceEncryption());
CDK Pipelines: A self-mutating pipeline built with CDK. The pipeline's first action is cdk deploy of the pipeline itself, so changing the pipeline definition just requires a push. Uses CodePipeline under the hood.
const pipeline = new pipelines.CodePipeline(this, 'Pipeline', {
synth: new pipelines.ShellStep('Synth', {
input: pipelines.CodePipelineSource.connection('org/repo', 'main', {
connectionArn: 'arn:aws:codestar-connections:...',
}),
commands: ['npm ci', 'npm run build', 'npx cdk synth'],
}),
});
// assumes AppStage exposes `loadBalancerDnsName` as a CfnOutput property
const devStage = new AppStage(this, 'Dev', {
  env: { account: '222222222222', region: 'us-east-1' }
});
pipeline.addStage(devStage, {
  post: [
    new pipelines.ShellStep('IntegTest', {
      commands: ['curl -f $ENDPOINT/health'],
      envFromCfnOutputs: { ENDPOINT: devStage.loadBalancerDnsName },
}),
],
});
pipeline.addStage(new AppStage(this, 'Prod', { env: { ... } }), {
pre: [new pipelines.ManualApprovalStep('PromoteToProd')],
});
CDK commands:
- cdk synth: synthesize the CloudFormation template (no deployment)
- cdk diff: show what will change vs the deployed stack
- cdk deploy: deploy (synth + create change set + execute)
- cdk destroy: destroy the stack
- cdk bootstrap: create the CDKToolkit stack (S3 bucket, ECR repo, IAM roles) in an account/region; required once per account/region before the first deploy
Terraform state management (critical topic):
# Remote state in S3 with DynamoDB locking
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "prod/us-east-1/app/terraform.tfstate"
region = "us-east-1"
encrypt = true
kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/..."
# DynamoDB for state locking (prevents concurrent applies)
dynamodb_table = "terraform-state-lock"
}
}
# DynamoDB table for locking (must have LockID as partition key)
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
AWS provider configuration:
provider "aws" {
region = "us-east-1"
# For cross-account: assume a role
assume_role {
role_arn = "arn:aws:iam::TARGET_ACCOUNT:role/TerraformDeployRole"
session_name = "terraform-deploy"
}
}
# Multiple providers for multi-region/multi-account
provider "aws" {
alias = "us_west"
region = "us-west-2"
assume_role {
role_arn = "arn:aws:iam::TARGET_ACCOUNT:role/TerraformDeployRole"
}
}
Workspace pattern for multi-env:
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
terraform apply -var-file="prod.tfvars"
Key Terraform vs CDK/CloudFormation tradeoffs:
| Dimension | Terraform | CDK/CloudFormation |
|---|---|---|
| Multi-cloud | Yes (provider model) | AWS only |
| State management | Explicit (S3/DynamoDB) | Managed by AWS |
| Language | HCL (simple) | Full language (TypeScript etc.) |
| Import existing | terraform import | cdk import (newer) |
| Drift handling | terraform plan shows drift | Drift detection + console |
| Rollback | Manual (re-apply old state) | Automatic on CloudFormation |
IAM access keys (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) stored in GitHub secrets are a security liability: they don't expire, can be exfiltrated, and require manual rotation. OIDC federation lets GitHub Actions obtain short-lived credentials by exchanging a GitHub-signed JWT for temporary AWS credentials via sts:AssumeRoleWithWebIdentity. The credentials last ~1 hour and scope is defined by the IAM role's trust policy.
Step 1: Create OIDC provider in AWS (once per account):
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
Step 2: Create IAM role with trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
},
"StringLike": {
// Scope to specific repo and branch (principle of least privilege)
"token.actions.githubusercontent.com:sub": "repo:org/repo:ref:refs/heads/main"
}
}
}
]
}
Step 3: GitHub Actions workflow:
name: Deploy to AWS
on:
push:
branches: [main]
permissions:
id-token: write # Required to request the JWT
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsDeployRole
role-session-name: github-actions-deploy
aws-region: us-east-1
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v2
- name: Build, tag, and push image to ECR
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
IMAGE_TAG: ${{ github.sha }}
run: |
docker build -t $ECR_REGISTRY/my-app:$IMAGE_TAG .
docker push $ECR_REGISTRY/my-app:$IMAGE_TAG
- name: Deploy to ECS
uses: aws-actions/amazon-ecs-deploy-task-definition@v1
with:
task-definition: task-definition.json
service: my-service
cluster: my-cluster
wait-for-service-stability: true
# Assume a deployment role in a different account
- name: Configure AWS credentials (prod account)
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::PROD_ACCOUNT:role/GitHubDeployRole
role-chaining: true # Uses current credentials to assume next role
aws-region: us-east-1
# .github/workflows/deploy-reusable.yml
on:
workflow_call:
inputs:
environment:
required: true
type: string
role-arn:
required: true
type: string
jobs:
deploy:
environment: ${{ inputs.environment }} # Uses GitHub Environment for approval gates
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ inputs.role-arn }}
aws-region: us-east-1
- run: echo "deploying to ${{ inputs.environment }}" # placeholder: replace with your deploy commands
ECR offers two scanning modes: basic scanning (free; on push or on demand) and enhanced scanning (powered by Amazon Inspector; continuous and on-push, covering OS and language packages).
# Enable enhanced scanning on registry level
aws ecr put-registry-scanning-configuration \
--scan-type ENHANCED \
--rules '[{
"repositoryFilters": [{"filter": "*", "filterType": "WILDCARD"}],
"scanFrequency": "CONTINUOUS_SCAN"
}]'
Gate on scan results in CI:
# Wait for scan to complete, fail build if CRITICAL findings
IMAGE_TAG=$CODEBUILD_RESOLVED_SOURCE_VERSION
aws ecr wait image-scan-complete \
--repository-name my-app \
--image-id imageTag=$IMAGE_TAG
FINDINGS=$(aws ecr describe-image-scan-findings \
--repository-name my-app \
--image-id imageTag=$IMAGE_TAG \
--query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
--output text)
if [ "$FINDINGS" != "None" ] && [ "$FINDINGS" -gt "0" ]; then
echo "CRITICAL vulnerabilities found: $FINDINGS. Failing build."
exit 1
fi
{
"rules": [
{
"rulePriority": 1,
"description": "Keep last 10 production images",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["prod-"],
"countType": "imageCountMoreThan",
"countNumber": 10
},
"action": { "type": "expire" }
},
{
"rulePriority": 2,
"description": "Expire untagged images after 7 days",
"selection": {
"tagStatus": "untagged",
"countType": "sinceImagePushed",
"countUnit": "days",
"countNumber": 7
},
"action": { "type": "expire" }
}
]
}
The default ECS service update strategy. ECS gradually replaces old tasks with new ones. Controlled by two parameters on the service:
- minimumHealthyPercent: minimum % of desired tasks that must stay running (e.g., 50 means half can be stopped to deploy).
- maximumPercent: maximum % of desired tasks that can run simultaneously (e.g., 200 means double capacity during rollout).
# Register new task definition
TASK_DEF_ARN=$(aws ecs register-task-definition \
--cli-input-json file://task-definition.json \
--query 'taskDefinition.taskDefinitionArn' --output text)
# Update service (triggers rolling deployment)
aws ecs update-service \
--cluster my-cluster \
--service my-service \
--task-definition $TASK_DEF_ARN \
--deployment-configuration "minimumHealthyPercent=50,maximumPercent=200" \
--force-new-deployment
ECS blue/green requires:
- An ALB (or NLB) with two target groups (blue and green) and a test listener.
- The ECS service created with deploymentController: { type: CODE_DEPLOY }.
- A CodeDeploy application (compute platform ECS) and a deployment group.
# appspec.yml for ECS blue/green
version: 0.0
Resources:
- TargetService:
Type: AWS::ECS::Service
Properties:
TaskDefinition: "<TASK_DEFINITION>"
LoadBalancerInfo:
ContainerName: "my-app"
ContainerPort: 80
Hooks:
- AfterAllowTestTraffic: "arn:aws:lambda:us-east-1:123456789012:function:ValidateGreen"
Traffic shifting for ECS: the test listener routes its traffic to the green task set first, so validation hooks can exercise green before any production traffic arrives. After the validation Lambda passes, production traffic shifts (AllAtOnce, Canary, or Linear). Blue remains available for immediate rollback until terminationWaitTimeInMinutes expires.
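For reference, the deployment-group side of this can be created with boto3; a hedged sketch that assumes the CodeDeploy application (compute platform ECS), both target groups, and the listeners already exist (every name and ARN below is a placeholder):
import boto3

codedeploy = boto3.client("codedeploy")

codedeploy.create_deployment_group(
    applicationName="my-ecs-app",
    deploymentGroupName="prod-ecs-bluegreen",
    serviceRoleArn="arn:aws:iam::123456789012:role/CodeDeployECSRole",
    deploymentConfigName="CodeDeployDefault.ECSCanary10Percent5Minutes",
    deploymentStyle={"deploymentType": "BLUE_GREEN", "deploymentOption": "WITH_TRAFFIC_CONTROL"},
    ecsServices=[{"serviceName": "my-service", "clusterName": "my-cluster"}],
    blueGreenDeploymentConfiguration={
        "deploymentReadyOption": {"actionOnTimeout": "CONTINUE_DEPLOYMENT", "waitTimeInMinutes": 0},
        "terminateBlueInstancesOnDeploymentSuccess": {
            "action": "TERMINATE",
            "terminationWaitTimeInMinutes": 60,  # keep blue around for an hour for instant rollback
        },
    },
    loadBalancerInfo={
        "targetGroupPairInfoList": [{
            "targetGroups": [{"name": "tg-blue"}, {"name": "tg-green"}],
            "prodTrafficRoute": {"listenerArns": ["PROD_LISTENER_ARN"]},
            "testTrafficRoute": {"listenerArns": ["TEST_LISTENER_ARN"]},
        }],
    },
)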
Mechanism: Two identical production environments. Traffic lives on blue. Deploy to green. Switch traffic (DNS, load balancer, Route 53 weighted routing) to green. Blue is standby for instant rollback.
Rollback: Flip traffic back to blue (seconds).
Cost: Double infrastructure during transition.
When to use: High-stakes deployments where instant rollback capability justifies the cost. Lambda (CodeDeploy does this natively), ECS, EC2 with ALB.
Mechanism: Route a small percentage of traffic (e.g., 5% or 10%) to the new version. Monitor metrics. Gradually increase percentage. If metrics degrade, rollback. If stable, shift 100%.
AWS implementations:
- Route 53 weighted records: Weight: 5 for the new version, Weight: 95 for the old.
- Lambda alias routing: aws lambda update-alias --routing-config AdditionalVersionWeights='{"2": 0.05}'
- CodeDeploy: the LambdaCanary10Percent5Minutes-style configs.
Mechanism: Update a fraction of instances/tasks at a time. At any point, some instances run the old version and some the new. No extra infrastructure cost, but requires backward compatibility during the transition.
Risk: Mixed-version state. Database schemas and API contracts must be backward compatible.
ECS rolling: Controlled by minimumHealthyPercent / maximumPercent.
Kubernetes: RollingUpdate strategy with maxUnavailable and maxSurge.
Mechanism: Code for both old and new behavior is deployed. A flag (config value) controls which path executes per user/request. Full deployment before any user sees the feature. Gradual rollout by flipping flags.
AWS services: AppConfig (part of Systems Manager) is AWS's native feature flag / config management service.
# AppConfig example: check flag value
import json
import boto3
appconfig = boto3.client('appconfigdata')
# Start session
session = appconfig.start_configuration_session(
ApplicationIdentifier='my-app',
EnvironmentIdentifier='production',
ConfigurationProfileIdentifier='feature-flags',
RequiredMinimumPollIntervalInSeconds=30
)
response = appconfig.get_latest_configuration(ConfigurationToken=session['InitialConfigurationToken'])
flags = json.loads(response['Configuration'].read())
if flags.get('new_checkout_flow', False):
return new_checkout(cart)
else:
return legacy_checkout(cart)
Comparison:
| Strategy | Downtime | Rollback Speed | Cost | Risk |
|---|---|---|---|---|
| Recreate | Yes | Fast (redeploy) | Low | High |
| Rolling | No | Slow (re-roll) | Low | Medium |
| Blue/Green | No | Instant | High (2x infra) | Low |
| Canary | No | Fast | Medium | Low |
| Feature flags | No | Instant (flag flip) | Low | Low |
The key pattern: define alarms on metrics that indicate unhealthy deployments, then configure CodeDeploy or CodePipeline to automatically rollback when these alarms fire.
Critical deployment metrics:
- HTTPCode_Target_5XX_Count (5XX error count) on ALB / API Gateway
- HealthyHostCount on target groups (drops during a bad deploy)
- Lambda Errors and Duration
- ECS CPUUtilization spike (runaway new task)
- Custom business metrics (published via PutMetricData)
# CloudWatch alarm for 5xx rate
aws cloudwatch put-metric-alarm \
--alarm-name "prod-5xx-rate-high" \
--metric-name "HTTPCode_Target_5XX_Count" \
--namespace "AWS/ApplicationELB" \
--statistic Sum \
--period 60 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:deploy-alerts" \
--dimensions "Name=LoadBalancer,Value=app/my-alb/1234567890abcdef"
Linking alarms to CodeDeploy rollback:
{
"alarmConfiguration": {
"enabled": true,
"ignorePollAlarmFailure": false,
"alarms": [
{ "name": "prod-5xx-rate-high" },
{ "name": "prod-healthy-hosts-low" }
]
},
"autoRollbackConfiguration": {
"enabled": true,
"events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
}
}
CloudWatch Synthetics Canaries: Scheduled Lambda functions that simulate user journeys, running on a schedule as frequently as every minute. If a canary fails, its alarm fires and triggers the rollback.
// Synthetics canary: test the deployed endpoint
const synthetics = require('Synthetics');
const syntheticsConfiguration = synthetics.getConfiguration();
const apiCanaryBlueprint = async function () {
const requestOptionsTemplate = {
hostname: process.env.ENDPOINT,
method: 'GET',
path: '/health',
port: 443,
protocol: 'https:',
};
await synthetics.executeHttpStep('Verify health endpoint', requestOptionsTemplate);
};
exports.handler = async () => {
return await apiCanaryBlueprint();
};
Container Insights for ECS deployments:
# Enable Container Insights on ECS cluster
aws ecs update-cluster-settings \
--cluster my-cluster \
--settings name=containerInsights,value=enabled
This gives you per-service CPU, memory, network, and storage metrics in CloudWatch, plus automated dashboards.
EventBridge + SNS pattern:
// EventBridge rule: notify on deployment state change
{
"source": ["aws.codedeploy"],
"detail-type": ["CodeDeploy Deployment State-change Notification"],
"detail": {
"state": ["FAILURE", "STOP", "SUCCESS"],
"deploymentGroup": ["prod-deployment-group"]
}
}
CloudTrail: Every CodePipeline, CodeBuild, CodeDeploy API call is logged to CloudTrail automatically. This is your immutable audit log: who started a deployment, when, with what source revision. Essential for compliance (SOC2, PCI-DSS). See Observability for CloudWatch dashboards and alerting on deployment metrics.
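A sketch of querying that audit trail with boto3 (lookup_events supports a small set of lookup attributes; here, filtering on the CreateDeployment event name):
from datetime import datetime, timedelta
import boto3

cloudtrail = boto3.client("cloudtrail")

# Who started deployments in the last 24 hours?
resp = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "CreateDeployment"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    MaxResults=50,
)
for event in resp["Events"]:
    print(event["EventTime"], event.get("Username"), event["EventName"])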
Hierarchical key-value store for configuration data and secrets. Two tiers: Standard (free, parameters up to 4 KB) and Advanced ($0.05/parameter/month, up to 8 KB, supports parameter policies such as expiration).
SecureString parameters are KMS-encrypted at rest.
# Write parameters
aws ssm put-parameter \
--name "/myapp/prod/db-password" \
--value "super-secret-password" \
--type SecureString \
--key-id "alias/myapp-key" \
--tier Advanced
aws ssm put-parameter \
--name "/myapp/prod/db-host" \
--value "mydb.cluster-xyz.us-east-1.rds.amazonaws.com" \
--type String
# Read in application
aws ssm get-parameter \
--name "/myapp/prod/db-password" \
--with-decryption \
--query 'Parameter.Value' \
--output text
Hierarchy and versioning: Parameters are versioned (1, 2, 3...). Pin to a specific version by appending it to the name (e.g., --name "/myapp/prod/db-password:3") or use parameter labels. Path-based hierarchy lets you call GetParametersByPath --path /myapp/prod/ to retrieve all prod configs at once.
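A boto3 sketch of pulling an entire environment's configuration in one pass (paginated, SecureStrings decrypted; the /myapp/prod/ paths follow the example hierarchy above):
import boto3

ssm = boto3.client("ssm")

# Fetch every parameter under /myapp/prod/, decrypting SecureString values
config = {}
paginator = ssm.get_paginator("get_parameters_by_path")
for page in paginator.paginate(Path="/myapp/prod/", Recursive=True, WithDecryption=True):
    for param in page["Parameters"]:
        config[param["Name"]] = param["Value"]

db_host = config["/myapp/prod/db-host"]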
In CodeBuild buildspec:
env:
parameter-store:
DB_PASSWORD: "/myapp/prod/db-password"
API_KEY: "/myapp/prod/api-key"
For secrets that require automatic rotation. More expensive than Parameter Store but handles rotation for RDS, Redshift, DocumentDB, and custom Lambda-based rotation.
import boto3, json
import psycopg2
def get_secret(secret_name):
client = boto3.client('secretsmanager', region_name='us-east-1')
response = client.get_secret_value(SecretId=secret_name)
# response['SecretString'] is JSON for DB credentials
return json.loads(response['SecretString'])
creds = get_secret('myapp/prod/rds-credentials')
connection = psycopg2.connect(
host=creds['host'],
port=creds['port'],
database=creds['dbname'],
user=creds['username'],
password=creds['password']
)
Parameter Store vs Secrets Manager:
| Feature | Parameter Store | Secrets Manager |
|---|---|---|
| Cost | Free (standard) / $0.05/param (advanced) | $0.40/secret/month + $0.05/10k API calls |
| Auto-rotation | No | Yes (Lambda-based) |
| Cross-account | Via resource policy | Via resource policy |
| Hierarchy | Yes (/app/env/key) | Via naming convention |
| Size limit | 4KB (8KB advanced) | 64KB |
| Best for | Config data, non-rotating secrets | Database passwords, API keys needing rotation |
Secure browser-based or CLI shell access to EC2 instances (and on-prem via Hybrid Activations) without SSH ports, bastion hosts, or SSH keys. Sessions are IAM-controlled and fully logged to S3/CloudWatch.
# Start interactive session to EC2 instance
aws ssm start-session --target i-0123456789abcdef0
# Port forwarding (replace SSH tunnel)
aws ssm start-session \
--target i-0123456789abcdef0 \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["5432"],"localPortNumber":["5432"]}'
# Tunnel to RDS via EC2 bastion (without SSH key)
aws ssm start-session \
--target i-0123456789abcdef0 \
--document-name AWS-StartPortForwardingSessionToRemoteHost \
--parameters '{"host":["mydb.xyz.us-east-1.rds.amazonaws.com"],"portNumber":["5432"],"localPortNumber":["5432"]}'
Requirements: EC2 instance must have AmazonSSMManagedInstanceCore policy and the SSM Agent running.
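To confirm instances are actually registered (agent running, instance profile attached), a quick boto3 check:
import boto3

ssm = boto3.client("ssm")

# Only instances with a working agent and the required IAM role show up here
for info in ssm.describe_instance_information()["InstanceInformationList"]:
    print(info["InstanceId"], info["PingStatus"], info.get("AgentVersion"))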
Execute commands on multiple instances simultaneously without SSH. Uses SSM Documents (JSON/YAML runbooks).
# Run a shell command across instances with tag Environment=prod
aws ssm send-command \
--document-name "AWS-RunShellScript" \
--targets "Key=tag:Environment,Values=prod" \
--parameters 'commands=["sudo systemctl restart myapp"]' \
--timeout-seconds 30 \
--output-s3-bucket-name my-ssm-logs \
--output-s3-key-prefix run-command-logs/
# Check command status
aws ssm list-command-invocations \
--command-id <command-id> \
--details
Automate OS patching across EC2 instances. Define Patch Baselines (which patches to approve/reject) and Maintenance Windows (when to apply them).
# Create patch baseline
aws ssm create-patch-baseline \
--name "prod-linux-baseline" \
--operating-system "AMAZON_LINUX_2" \
--approval-rules "PatchRules=[{PatchFilterGroup:{PatchFilters=[{Key=SEVERITY,Values=[Critical,High]}]},ApproveAfterDays=7}]" \
--approved-patches-compliance-level CRITICAL
# Create maintenance window: every Sunday 2 AM UTC
aws ssm create-maintenance-window \
--name "prod-patch-window" \
--schedule "cron(0 2 ? * SUN *)" \
--duration 4 \
--cutoff 1 \
--allow-unassociated-targets false
Part of Systems Manager. Validates configurations before deployment and enables safe, monitored config rollouts with automatic rollback.
# Create feature flags configuration
aws appconfig create-configuration-profile \
--application-id myapp \
--name feature-flags \
--location-uri hosted \
--type AWS.AppConfig.FeatureFlags
# Deploy with 10-minute bake time, rollback on CloudWatch alarm
aws appconfig start-deployment \
--application-id myapp \
--environment-id prod \
--deployment-strategy-id rolling-10-min \
--configuration-profile-id feature-flags \
--configuration-version 2 \
--description "Enable new checkout flow for 10% of users"
Q: What's the difference between CodeDeploy in-place and blue/green, and when would you choose each?
A: In-place stops the old application on each instance, installs the new version, and restarts it. The instance is briefly out of service during this window, so it requires careful management of minimumHealthyHosts to maintain capacity. It's cost-effective because no new infrastructure is provisioned. Blue/green provisions an entirely new environment, validates the deployment there, then shifts traffic, giving you zero downtime and an instant rollback (just flip traffic back). I'd use in-place for non-critical internal services or batch workers where brief unavailability is acceptable. I'd always use blue/green for user-facing production services, Lambda functions, and ECS services where availability and rollback speed matter.
Q: Walk me through how you'd set up a secure CI/CD pipeline from GitHub to AWS without long-lived credentials.
A: OIDC federation. First, create an IAM OIDC identity provider in AWS for token.actions.githubusercontent.com. Then create an IAM role with a trust policy that conditions AssumeRoleWithWebIdentity on the sub claim matching the specific repo and branch (e.g., repo:org/repo:ref:refs/heads/main); this prevents other repos from assuming the role. Attach the minimum-privilege policy needed (ECR push, ECS update-service, etc.). In the GitHub Actions workflow, set permissions: id-token: write, use aws-actions/configure-aws-credentials with role-to-assume. GitHub mints a JWT, exchanges it for temporary STS credentials (1 hour TTL). No secrets stored anywhere.
Q: How would you build a multi-account pipeline that promotes a tested artifact from dev to prod?
A: Single Tools/CICD account owns CodePipeline, CodeBuild, the ECR repository, the S3 artifact bucket, and the KMS key. The pipeline builds once, produces a versioned image, and promotes the same artifact through environments; it never rebuilds. For each target account (Dev, Staging, Prod), create an IAM role with a trust policy trusting the pipeline's execution role in the Tools account. Update the S3 artifact bucket policy and KMS key policy to allow Decrypt/GetObject by these cross-account roles. In the pipeline, each deploy action specifies RoleArn of the target account's role. Manual approval gates separate staging from prod. The key principle: build once, deploy many times.
Q: What's the difference between CloudFormation nested stacks and StackSets?
A: Nested stacks are about template decomposition within a single account/region. A parent stack references child stack templates: you break a large template into reusable modules (network, IAM, compute). They still create a single logical stack hierarchy. StackSets are about multi-account/multi-region deployment. A single StackSet definition deploys identical stacks to multiple accounts (member accounts in an AWS Organization) and multiple regions. It's for governance at scale, e.g., deploying security baselines or VPC configurations to all 50 accounts in your org in one operation. The admin account manages the StackSet; execution roles in each member account perform the actual stack creation.
Q: When would you use AWS CDK over Terraform? When would you use Terraform?
A: CDK when the team is already TypeScript/Python developers, when you want type safety, IDE support, and reusable L2/L3 constructs that encode best practices. CDK Aspects are powerful for enforcing compliance policies across the entire construct tree. CDK Pipelines give you a self-mutating pipeline that's a first-class CDK construct. I'd choose Terraform when the infrastructure spans multiple clouds (not just AWS), when the team has existing Terraform expertise and modules, or when you need strict control over the change lifecycle with plan/apply separation. Terraform's state model is more explicit and inspectable. Both are valid; the practical answer often comes down to team skill set and the existing ecosystem.
Q: How does CodeBuild handle secrets, and what are the common mistakes?
A: Secrets should come from Parameter Store (parameter-store key in buildspec env section) or Secrets Manager (secrets-manager key). CodeBuild automatically decrypts them at build start using the project's service role (which must have ssm:GetParameter and kms:Decrypt permissions). The common mistakes: (1) putting secrets in plaintext environment variables in the CodeBuild project console: they show up in build logs and in describe-projects output; (2) echoing environment variables in build commands: always mask with echo "***" or avoid echoing sensitive vars; (3) not scoping the IAM role: the CodeBuild role should only be able to access /myapp/prod/* parameters, not all parameters in the account; (4) forgetting --with-decryption when reading SecureString parameters via CLI in a build script.
Q: Explain the CodePipeline artifact model and how artifacts flow between stages.
A: Every pipeline has an S3 artifact bucket (specified at pipeline creation). When a source action (e.g., GitHub) triggers, it downloads the source code and zips it into an artifact in that S3 bucket, encrypted with KMS. The artifact has a logical name (SourceArtifact). The next stage's action declares this as an inputArtifact; CodePipeline downloads and unzips it into the build environment. That action can produce its own outputArtifact (e.g., BuildArtifact containing imagedefinitions.json). This continues through stages. Key point: artifacts are versioned per pipeline execution. If two parallel actions in a stage both produce artifacts, each has a distinct S3 key. Cross-account pipelines require the artifact bucket's KMS key to be accessible by the cross-account deployment role.
Q: What is SSM Session Manager and why is it better than a bastion host?
A: Session Manager is an IAM-controlled, agent-based mechanism for shell access to instances and port forwarding. The advantages over a bastion host: (1) No open inbound ports (the SSM agent initiates an outbound connection to the SSM endpoint); (2) No SSH keys to manage, distribute, or rotate; (3) Access is controlled entirely by IAM policies (who can call ssm:StartSession on which instance); (4) All session activity is audited (keystrokes are logged to S3/CloudWatch Logs); (5) No separate EC2 bastion to patch, maintain, and pay for. The agent needs outbound HTTPS to SSM endpoints (or VPC endpoints for fully private scenarios). For databases, the port forwarding feature replaces SSH tunnels completely.
Q: What deployment strategy would you use for a Lambda function that processes financial transactions?
A: CodeDeploy with canary or linear traffic shifting on a Lambda alias. I'd use LambdaCanary10Percent5Minutes to start: 10% of invocations go to the new version for 5 minutes while I monitor. I'd attach pre-traffic and post-traffic validation Lambda hooks; the pre-traffic hook runs integration tests against the new version before any live traffic is sent. I'd configure CloudWatch alarms on the new version's Errors and Duration metrics, linked to CodeDeploy for automatic rollback. If the error rate exceeds the threshold during the canary window, CodeDeploy automatically rolls back by updating the alias to 100% on the old version. The whole thing is defined in an AppSpec file and triggered by CodePipeline.
Q: How do you handle database migrations in a blue/green deployment?
A: This is the hardest part. The key constraint: during the switchover window, both blue (old) and green (new) versions may be processing requests, so the database must be compatible with both. The pattern is expand-contract migrations (also called parallel-change): (1) Expand: add new columns/tables as nullable with defaults, so both old and new code can read/write safely; (2) deploy new code (which starts writing to new columns); (3) backfill old rows; (4) Contract: after all old code is gone, run a second migration to add constraints, remove old columns, or make nullable columns required. Never do rename-and-drop in a single deploy. For blue/green specifically, run migrations before switching traffic; the old (blue) version must tolerate the expanded schema. Tools like Flyway and Liquibase support this with their migration + undo pattern.
The modern pattern for GitHub Actions → AWS is to use OpenID Connect (OIDC) to get temporary credentials via AssumeRoleWithWebIdentity. No AWS access keys stored in GitHub Secrets.
# Step 1: Create OIDC identity provider in AWS IAM
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
// Step 2: Create IAM role with trust policy for GitHub
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"},
"StringLike": {
// Restrict to your org/repo and optionally a specific branch
"token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
}
}
}]
}
name: Deploy to AWS
on:
push:
branches: [main]
permissions:
id-token: write # Required for OIDC
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsDeployRole
aws-region: us-east-1
# No access-key-id or secret-access-key needed!
- name: Deploy with CDK
run: npx cdk deploy --all --require-approval never
- name: Build and push to ECR
run: |
aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REGISTRY
docker build -t my-app .
docker push $ECR_REGISTRY/my-app:$GITHUB_SHA
env:
ECR_REGISTRY: 123456789012.dkr.ecr.us-east-1.amazonaws.com
Scope the sub condition to refs/heads/main: this prevents feature branches from deploying to prod. See also: IAM & Security for OIDC identity provider configuration and trust policy patterns.
CDK Pipelines creates a CodePipeline that can update itself before deploying your application. See Developer Tools for the full CDK ecosystem and construct library reference.
import * as cdk from 'aws-cdk-lib';
import * as pipelines from 'aws-cdk-lib/pipelines';
import { Construct } from 'constructs';
// Define your application stage (what gets deployed to each environment)
class ApiStage extends cdk.Stage {
constructor(scope: Construct, id: string, props?: cdk.StageProps) {
super(scope, id, props);
new ApiStack(this, 'Api'); // Lambda + API Gateway
new DatabaseStack(this, 'Database'); // DynamoDB
}
}
class PipelineStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const pipeline = new pipelines.CodePipeline(this, 'Pipeline', {
// Source: GitHub via OIDC (preferred) or CodeStar connection
synth: new pipelines.ShellStep('Synth', {
input: pipelines.CodePipelineSource.connection('my-org/my-repo', 'main', {
connectionArn: 'arn:aws:codestar-connections:us-east-1:...:connection/...',
}),
commands: [
'npm ci',
'npm run build',
'npm run test', // Unit tests before synth
'npx cdk synth',
],
primaryOutputDirectory: 'cdk.out',
}),
selfMutation: true, // Pipeline updates itself before deploying app
crossAccountKeys: true, // KMS keys for cross-account artifact encryption
dockerEnabledForSynth: true,
// Publish assets before pipeline execution (faster parallel deploys)
publishAssetsInParallel: true,
});
// Staging: auto-deploy, run integration tests
// (assumes ApiStage exposes `apiUrl` as a CfnOutput; adjust to your stage's actual outputs)
const stagingStage = new ApiStage(this, 'Staging', { env: { account: '111111111111', region: 'us-east-1' } });
pipeline.addStage(
  stagingStage,
  {
    pre: [
      new pipelines.ShellStep('LintAndTest', { commands: ['npm run lint', 'npm test'] }),
    ],
    post: [
      new pipelines.ShellStep('IntegrationTests', {
        envFromCfnOutputs: { API_URL: stagingStage.apiUrl },
commands: ['npm run test:integration'],
}),
new pipelines.ShellStep('LoadTest', {
commands: ['npm run test:load -- --threshold-p99=500ms'],
}),
],
}
);
// Production: manual approval gate + canary deployment
pipeline.addStage(
new ApiStage(this, 'Production', { env: { account: '222222222222', region: 'us-east-1' } }),
{
pre: [
new pipelines.ManualApprovalStep('ApproveProductionDeploy', {
comment: 'Review staging test results before deploying to production',
}),
],
post: [
new pipelines.ShellStep('SmokeTest', {
commands: ['curl -f $API_URL/health || exit 1'],
}),
],
}
);
}
}
When the pipeline detects changes to the pipeline definition itself (new stages, new steps, CDK version upgrade):
1. The Synth step runs to generate new CloudFormation templates, including the pipeline's own template.
2. The UpdatePipeline action applies the new pipeline CloudFormation definition and restarts the execution if the pipeline changed.
3. The remaining stages then run with the updated definition.
This means you never manually update the pipeline: just push CDK code changes.
# 1. Start LocalStack or real DynamoDB Local for dependencies
docker run -p 8000:8000 amazon/dynamodb-local
docker run -p 4566:4566 localstack/localstack # or use real AWS in dev environment
# 2. Set up environment variables for local testing
cat > env.json << 'EOF'
{
"OrdersFunction": {
"TABLE_NAME": "orders-dev",
"AWS_ENDPOINT_URL": "http://localhost:8000"
}
}
EOF
# 3. Invoke a single function locally
sam local invoke OrdersFunction \
--event events/create-order.json \
--env-vars env.json \
--docker-network my-local-net
# 4. Start local API Gateway (all functions)
sam local start-api \
--env-vars env.json \
--docker-network my-local-net \
--warm-containers EAGER # keep containers warm between invocations
# 5. Generate sample events
sam local generate-event apigateway aws-proxy > events/api-event.json
sam local generate-event sqs receive-message > events/sqs-event.json
sam local generate-event s3 put > events/s3-event.json
# 6. Build (using container for Lambda-compatible deps)
sam build --use-container --container-env-var PIP_EXTRA_INDEX_URL=...
# 7. Deploy with guided setup (first time)
sam deploy --guided
# Creates samconfig.toml with your choices
# 8. Fast iteration: sam sync (no CloudFormation, direct Lambda update)
sam sync --watch --stack-name my-app-dev
| Dimension | sam deploy | sam sync |
|---|---|---|
| Method | CloudFormation ChangeSet | Direct API calls |
| Speed | 1-3 minutes | 3-5 seconds |
| Drift | No drift | Creates drift from CF state |
| Rollback | Yes (CloudFormation) | Manual |
| Use for | CI/CD, staging, prod | Rapid local iteration only |
# SAM template with canary deployment
Resources:
ApiFunction:
Type: AWS::Serverless::Function
Properties:
AutoPublishAlias: live
DeploymentPreference:
Type: Canary10Percent5Minutes # 10% → wait 5 min → 100%
Alarms:
- !Ref ErrorRateAlarm
- !Ref LatencyAlarm
Hooks:
PreTraffic: !Ref PreTrafficCheck
PostTraffic: !Ref PostTrafficCheck
// PreTraffic hook: validate new version before any traffic
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();
const codedeploy = new AWS.CodeDeploy();
exports.handler = async (event) => {
  const functionToTest = process.env.NewVersion; // ARN of the new version
  // Run validation against the new version directly
  const result = await lambda.invoke({
    FunctionName: functionToTest,
    Payload: JSON.stringify({ path: '/health', httpMethod: 'GET' }),
  }).promise();
  const statusCode = JSON.parse(result.Payload).statusCode;
  // Report the result back to CodeDeploy; 'Failed' triggers automatic rollback
  await codedeploy.putLifecycleEventHookExecutionStatus({
    deploymentId: event.DeploymentId,
    lifecycleEventHookExecutionId: event.LifecycleEventHookExecutionId,
    status: statusCode === 200 ? 'Succeeded' : 'Failed',
  }).promise();
};
SSM Automation: runbooks such as AWS-CreateImage and AWS-StopEC2InstanceWithApproval; event-triggered automation.
| Criterion | Session Manager | Bastion Host |
|---|---|---|
| Setup | SSM Agent (pre-installed on Amazon Linux 2) | EC2 instance + key pair + SG rule |
| Network | No inbound port needed | Port 22 must be open from bastion |
| Cost | Free (included with SSM) | EC2 instance cost |
| Audit | Session logs to S3/CloudWatch | Manual logging (or Jumpcloud) |
| MFA | IAM MFA on AssumeRole | SSH key management |
| Verdict | Always prefer Session Manager | Legacy only |
describe-projects output and build logs. Always Parameter Store or Secrets Manager.aws cloudformation update-stack directly in production." Always create a change set first. Describe it. Review. Then execute. Skipping this in production is how you delete a database.latest tag for ECR images in production." Mutable, non-reproducible, impossible to rollback to a known version. Always use immutable tags (commit SHA or semantic version).docker build fails at the daemon socket connection. This is a common CodeBuild gotcha.